The ability to run this at scale is going to happen because we're getting better and better at creating silicon that specializes in doing things in a massively parallelized way. And the cost of energy, at the same time, is getting cheaper and cheaper along with it. When you multiply these two things together, the effect is that you'll be able to run these models. The same output as today will cost one-tenth as much, as long as you ride the energy and compute curve for the next few years. So that's just going to naturally happen.
Evidence from Stanford’s 2025 AI Index shows that the cost of querying a GPT‑3.5‑level model fell from about $20 per million tokens in November 2022 to around $0.07 by October 2024: a more than 280× reduction in roughly two years, far exceeding the ~10× drop Chamath predicted would occur over “the next few years.” (therightstack.com) This comparison holds model quality constant (using MMLU scores to match GPT‑3.5‑level performance), so it effectively measures the cost of producing the “same output,” not the cost of switching to weaker systems. (therightstack.com)

The same report attributes these declines largely to hardware and efficiency gains: at the hardware level, compute costs fell about 30% annually while energy efficiency improved roughly 40% per year, matching Chamath’s emphasis on increasingly specialized silicon and improved energy efficiency as the drivers of cheaper inference. (shaswat.dev) Independent write‑ups of the AI Index and related analyses likewise note that inference prices for GPT‑3.5‑ or GPT‑4‑class capability have dropped by one to several orders of magnitude, with some models now matching GPT‑3.5/GPT‑4o benchmark performance at around $0.07 per million tokens, down from ~$20 in late 2022. (techopedia.com)

While retail electricity prices themselves have not uniformly fallen, the effective energy and compute cost per inference has collapsed, thanks to far more efficient accelerators (e.g., H100‑class GPUs delivering several‑fold better inference throughput and performance per watt than A100‑generation hardware) and much smaller, optimized models that achieve the same benchmark scores. (bestgpusforai.com) Taken together, by the mid‑2020s the per‑inference cost of producing ChatGPT‑style outputs is well below one‑tenth of the early‑2023 level, achieved via exactly the kind of hardware and efficiency curve he described. His prediction is substantively correct; if anything, the cost decline has been far larger than he forecast.
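To make the size of the gap between prediction and outcome concrete, here is a minimal back‑of‑the‑envelope sketch in Python. It uses only the figures cited above (the $20 and $0.07 per‑million‑token prices and the ~30%/year compute‑cost and ~40%/year energy‑efficiency trends); the three‑year horizon assumed for “the next few years” is an illustrative reading of the quote, not a number from the report.

```python
# Back-of-the-envelope check of the cost-decline figures cited above.
# All inputs come from the AI Index summary quoted in this piece; the
# arithmetic is illustrative, not an independent data source.
from datetime import date

# GPT-3.5-level inference price, per the 2025 AI Index
price_nov_2022 = 20.00   # USD per million tokens, November 2022
price_oct_2024 = 0.07    # USD per million tokens, October 2024

reduction = price_nov_2022 / price_oct_2024                    # ~286x
years = (date(2024, 10, 1) - date(2022, 11, 1)).days / 365.25  # ~1.9 years

# Annualized decline implied by the observed reduction
annual_factor = reduction ** (1 / years)                       # ~19x cheaper/year

# Chamath's forecast: 10x cheaper over "the next few years"
# (assumed here to mean three years)
predicted_annual_factor = 10 ** (1 / 3)                        # ~2.15x cheaper/year

print(f"Observed: {reduction:.0f}x cheaper over {years:.1f} years")
print(f"Implied annual decline: ~{annual_factor:.0f}x per year")
print(f"Predicted annual decline: ~{predicted_annual_factor:.1f}x per year")

# Hardware-level trends cited in the same report, compounded over the
# same assumed three-year horizon:
compute_cost_factor = (1 - 0.30) ** 3   # ~0.34x of original compute cost
efficiency_gain = 1.40 ** 3             # ~2.7x more work per watt
print(f"3-yr compute cost: ~{compute_cost_factor:.2f}x of original")
print(f"3-yr energy efficiency gain: ~{efficiency_gain:.1f}x")
```

Running this shows why the verdict above holds: the observed price curve compounds at roughly 19× per year, versus the ~2× per year his 10×‑over‑three‑years forecast implies, and the hardware trends alone account for only part of the gap (the rest coming from smaller, better‑optimized models).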