Last updated Nov 29, 2025
Prediction
Friedberg
Between roughly December 2024 and June 2026 (18–36 months after June 2023), the cost of a given large AI model training run that would cost $100 million in mid-2023 will decline to about $1 million, i.e., around a 100x reduction in training cost over that period.
Play this out. $100 million of capital spent training today is $1 million spent doing training in 18 months. Yeah, three years, 18 to 36 months. Somewhere in that time frame is likely the time frame.
View on YouTube
Explanation

As of 30 November 2025, the prediction window (≈Dec 2024–Jun 2026) is still open, so it is too early to say definitively whether it will end up correct. That said, available evidence mostly points against a 100× cost drop so far.
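To make the prediction concrete, here is a rough sketch in plain Python, using only the $100M and $1M figures from the prediction itself, of the annual rate of cost decline that a 100× drop over 18–36 months would require:

    # Rough sanity check: what annual rate of cost decline does the prediction imply?
    # The $100M -> $1M figures come from the prediction itself.
    start_cost = 100e6   # mid-2023 training cost, USD
    end_cost = 1e6       # predicted cost by the end of the window, USD

    for months in (18, 36):
        years = months / 12
        total_drop = start_cost / end_cost            # 100x in both cases
        annual_factor = total_drop ** (1 / years)     # per-year cost reduction factor
        print(f"{months} months: costs must fall ~{annual_factor:.1f}x per year")

    # Output:
    # 18 months: costs must fall ~21.5x per year
    # 36 months: costs must fall ~4.6x per year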

1. Frontier training costs have not fallen 100×

Public estimates for training top models show rising (or at best slightly reduced) dollar costs for frontier-scale training runs, not a collapse from ~$100M to ~$1M:

  • The 2025 AI Index (summarized by Visual Capitalist) puts GPT‑4 (2023) at about $79M, Gemini 1.0 Ultra (2024) at ~$192M, Llama 3.1‑405B (2024) at ~$170M, Mistral Large (2024) at ~$41M, and Grok‑2 (2024) at ~$107M for a single training run. These are all still in the tens to hundreds of millions of dollars range, not near $1M. (visualcapitalist.com)
  • A detailed cost model from Epoch AI finds that the amortized cost to train the most compute‑intensive frontier models has grown ~2.4× per year since 2016, with the GPT‑4 and Gemini runs dominated by tens of millions of dollars in accelerator hardware and staff costs. It projects >$1B training runs by around 2027 if trends continue. (arxiv.org)
  • Other 2025 summaries similarly estimate GPT‑4‑class training in 2025 at roughly $50M–$200M per run, even after efficiency gains—far from a 100× cost collapse. (localaimaster.com)

Given that GPT‑4‑class runs in 2023 were on the order of $100M (the AI Index puts GPT‑4 at ~$79M, while other estimates exceed $100M), current public estimates for similar‑capability training runs remain well above $1M; in some cases they are higher than the 2023 figures.
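As a rough illustration of the gap, the sketch below projects the Epoch AI trend cited above (~2.4× cost growth per year) forward from a $100M mid‑2023 run. This assumes the trend simply continues, which is an extrapolation rather than a measurement:

    # Projecting the Epoch AI trend cited above (~2.4x/yr growth in frontier
    # training cost) forward from a $100M mid-2023 run. Assumes the trend
    # simply continues; actual costs vary by model and vendor.
    cost_2023 = 100e6
    growth_per_year = 2.4

    for years_ahead in (1.5, 3.0):
        projected = cost_2023 * growth_per_year ** years_ahead
        print(f"+{years_ahead:.1f} yr: ~${projected / 1e6:,.0f}M vs the predicted ~$1M")

    # Output:
    # +1.5 yr: ~$372M vs the predicted ~$1M
    # +3.0 yr: ~$1,382M vs the predicted ~$1M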

2. Some individual models claim big savings, but not clearly 100×, and the claims are contested

  • DeepSeek‑V3 / R1 (China): DeepSeek claims it trained V3 for about $5.6M in GPU‑rental‑equivalent cost using ~2,048 H800s over ~2.8M GPU‑hours, with performance comparable to GPT‑4‑class systems. (en.wikipedia.org) That’s roughly a 10–20× reduction vs GPT‑4‑style estimates (> $100M), but:
    • The figure explicitly excludes prior R&D, ablations, and infrastructure, and has been criticized as misleading; some analyses argue that total capital outlay may be orders of magnitude higher. (reddit.com)
    • Even if you take the $5–6M figure at face value, it is still short of the 100× drop (to ~$1M) the prediction specified.
  • MiniMax M1 (Shanghai): MiniMax has claimed that its M1 model—reported as competitive with top frontier models—was trained for about $534,700, nearly 200× cheaper than some >$100M GPT‑4/4o cost estimates. (ainvest.com) However, these numbers are self‑reported, not independently audited, and it’s unclear how directly comparable the task, scale, and quality are to the 2023 $100M reference runs.

These examples show pockets of large efficiency gains, with claimed reductions of roughly 10–200× for specific architectures and vendors (a rough comparison against the 100× bar is sketched after this list). But they are:

  • Not yet clearly established as industry‑wide norms.
  • Often measured on different architectures and training setups rather than “the exact same run that cost $100M in 2023 now costs $1M.”
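
Taking the self‑reported figures above at face value, and treating a ~$100M mid‑2023 run as the reference point (itself an approximation), the arithmetic against the 100× bar looks like this:

    # Claimed per-run costs from the section above, compared against the 100x bar.
    # Inputs are self-reported figures taken at face value, not audited numbers.
    reference_cost = 100e6          # ~$100M mid-2023 reference run
    claims = {
        "DeepSeek-V3 (claimed)": 5.6e6,      # GPU-rental-equivalent cost only
        "MiniMax M1 (claimed)": 534_700,     # self-reported, not independently audited
    }

    for name, cost in claims.items():
        multiple = reference_cost / cost
        verdict = "meets" if multiple >= 100 else "falls short of"
        print(f"{name}: ~{multiple:.0f}x cheaper -> {verdict} the 100x bar")

    # Output:
    # DeepSeek-V3 (claimed): ~18x cheaper -> falls short of the 100x bar
    # MiniMax M1 (claimed): ~187x cheaper -> meets the 100x bar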

3. Direct same‑model comparisons suggest slower cost decline

Where we do have like‑for‑like comparisons, the cost reductions are large but not on the 100× / 18–36‑month schedule Friedberg predicted:

  • Andrej Karpathy’s 2024–25 reproduction of GPT‑2 on a single 8×H100 node brought the training cost down to about $672, versus earlier estimates of tens of thousands of dollars for the original 2019 training, an ~90× reduction over roughly five years, not 100× in 1.5–3 years. (tomshardware.com)

This suggests that per‑FLOP or per‑model cost can fall substantially over time, but the observed pace so far is much slower than “100× cheaper in 18–36 months” for a fixed large model.
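
For a rough comparison of pace, the sketch below converts the ~90×‑over‑five‑years figure into an implied annual rate and sets it against the rate the prediction requires (both inputs are the estimates quoted above):

    # Implied annual pace of the like-for-like GPT-2 cost drop quoted above,
    # versus the pace the prediction requires. Inputs are the cited estimates.
    observed_drop, observed_years = 90, 5
    observed_rate = observed_drop ** (1 / observed_years)

    print(f"Observed (GPT-2 reproduction): ~{observed_rate:.1f}x cheaper per year")
    for months in (18, 36):
        required_rate = 100 ** (12 / months)
        print(f"Required for 100x in {months} months: ~{required_rate:.1f}x per year")

    # Output:
    # Observed (GPT-2 reproduction): ~2.5x cheaper per year
    # Required for 100x in 18 months: ~21.5x per year
    # Required for 100x in 36 months: ~4.6x per year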


Why this is labeled inconclusive rather than wrong

  • Friedberg’s window runs to June 2026, about 7 months after the current date (Nov 2025), so the prediction period is not over yet.
  • So far, public data on major frontier models shows no broad 100× drop in training cost for a run that would have cost ~$100M in mid‑2023; if anything, headline frontier runs are as expensive or more expensive. (visualcapitalist.com)
  • There are emerging claims (MiniMax, DeepSeek) that individual models with near‑frontier capability were trained at 10–200× lower marginal compute cost, but these are either below the 100× threshold (DeepSeek at ~$5–6M) or not yet well‑verified (MiniMax’s ~$0.5M claim).

Given the remaining time in the window and the lack of clear, widely accepted evidence that a $100M‑class 2023 run can now be reproduced for ~$1M, the fairest judgment today is “inconclusive (too early)”: current trends point against the prediction being met unless there is a dramatic further breakthrough in the next several months.