Last updated Nov 29, 2025
Within the next hardware generation or so (on the order of a few years from 2023), training new foundation or near‑foundation AI models will commonly take on the order of weeks rather than many months.
So to your point, this thing is going to be like people will be training models in weeks.
Explanation

Evidence on training times is mixed and depends heavily on what counts as a “foundation or near‑foundation” model and what “commonly” means.

1. Large frontier models still often take on the order of a few months

  • A 2024 survey table of major models estimates wall‑clock training times of about 95 days for GPT‑4 and 100 days for Gemini, i.e., roughly 3+ months, even on very large GPU/TPU clusters. These are canonical frontier‑level foundation models. (aimodels.fyi)
  • Meta’s open frontier‑scale model Llama‑3.1 405B reportedly used over 16,000 H100 GPUs and around 30.84 million GPU‑hours. Dividing GPU‑hours by GPU count implies a wall‑clock time of roughly 80 days (~11–12 weeks), i.e., between two and three months rather than just a few weeks (see the arithmetic sketch after this list). (venturebeat.com)
  • A 2025 overview article notes that training large LLMs like GPT‑4 still requires vast energy and that training “can take weeks or months,” which does not point to a decisive shift away from multi‑month runs for the biggest models. (livescience.com)
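
The ~80‑day figure for Llama‑3.1 405B above is derived rather than directly reported: it comes from dividing total GPU‑hours by the GPU count. A minimal sketch of that arithmetic, assuming the reported 30.84 million GPU‑hours were spread evenly over exactly 16,000 H100s at perfect utilisation (real runs include restarts and stragglers, so the true wall‑clock is somewhat longer):

```python
# Back-of-the-envelope wall-clock estimate from reported GPU-hours.
# Assumes perfect utilisation across the whole fleet (an idealisation),
# so this is effectively a lower bound on the actual training duration.

gpu_hours = 30.84e6   # reported total GPU-hours for Llama-3.1 405B
num_gpus = 16_000     # reported H100 count (actually "over 16,000")

wall_clock_hours = gpu_hours / num_gpus      # ~1,927 hours
wall_clock_days = wall_clock_hours / 24      # ~80 days
wall_clock_weeks = wall_clock_days / 7       # ~11.5 weeks

print(f"~{wall_clock_days:.0f} days (~{wall_clock_weeks:.0f} weeks)")
# -> ~80 days (~11-12 weeks), i.e., between two and three months
```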

These data suggest that for the largest GPT‑4/5‑class and Gemini‑class models, training remains in the multi‑month regime and has not clearly been reduced to just a few weeks.

2. Many sizable models do train in weeks – but that was already true before 2023

  • The same training‑cost table lists GPT‑3 at ~15 days, Llama‑1 at ~21 days, and Llama‑2 at ~35 days, all trained on large A100/V100 clusters. (aimodels.fyi)
  • Newer, mid‑scale models also achieve short training times: e.g., the 1.5‑Pints LLM (a from‑scratch pre‑train, not just a tiny toy model) was trained in 9 days while matching or beating other small LLMs, and is explicitly advertised as “pretraining in days, not months.” (arxiv.org)
  • Numerous domain‑specific foundation models (e.g., medical and retinal imaging foundation models) emphasize orders‑of‑magnitude compute reductions and training costs under roughly $100, which implies runs of well under a month on modest clusters (see the cost‑to‑wall‑clock sketch after this list). (arxiv.org)
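
The “well under a month” inference from a sub‑$100 training budget is simple arithmetic. A rough sketch, in which the hourly rate and cluster size are assumptions chosen for illustration rather than figures from the cited papers:

```python
# Rough conversion from a reported training budget to wall-clock time.
# The hourly rate and GPU count below are assumed for illustration; the
# cited papers report total costs, not rental rates.

budget_usd = 100           # "training costs under roughly $100"
usd_per_gpu_hour = 1.5     # assumed cloud A100-class rate (ballpark)
num_gpus = 8               # assumed modest single-node cluster

total_gpu_hours = budget_usd / usd_per_gpu_hour    # ~67 GPU-hours
wall_clock_hours = total_gpu_hours / num_gpus      # ~8 hours on 8 GPUs

print(f"~{total_gpu_hours:.0f} GPU-hours, ~{wall_clock_hours:.0f} h wall-clock")
# Even on a single GPU, ~67 GPU-hours is under three days -- nowhere near a month.
```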

So training substantial but non‑frontier models in one to several weeks is demonstrably feasible, but it was already feasible around GPT‑3 and early LLaMA, i.e., before the 2023 podcast date.

3. Why the prediction is hard to score cleanly

  • The normalized prediction says that within the next hardware generation (~a few years), training new foundation / near‑foundation models will commonly take weeks rather than many months.
  • In practice, the largest and most important frontier models in 2024–2025 still cluster around roughly 2–3+ months of training time, with no clear collapse to four weeks or less. (aimodels.fyi)
  • At the same time, a substantial number of non‑frontier but still serious foundation‑style models now train in under a few weeks, yet this was already happening (GPT‑3, LLaMA‑1) before the prediction. The mix of weeks vs. months for different scales and use cases looks broadly continuous from 2020–2025 rather than showing a decisive industry‑wide transition.
  • Terms like “near‑foundation” and “commonly” are not precisely defined: if one restricts the claim to GPT‑4/5‑class frontier models, the prediction looks mostly wrong; if one includes the broad class of mid‑sized open models and sector‑specific foundation models, it looks plausibly right but not meaningfully new (illustrated in the sketch after this list).
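
To make this scope‑dependence concrete, the sketch below applies the prediction’s weeks‑vs‑months test to the wall‑clock figures cited on this page under two readings of its scope. The 28‑day “weeks” threshold and the frontier/non‑frontier labels are assumptions chosen for illustration, not definitions taken from the prediction itself.

```python
# Illustrative scoring of the prediction under two readings of its scope.
# Durations are the wall-clock figures cited above; the 28-day threshold
# and the frontier flags are assumptions for illustration only.

cited_runs = {
    # model:          (wall-clock days, treated as frontier-scale?)
    "GPT-3":           (15,  False),
    "LLaMA-1":         (21,  False),
    "Llama-2":         (35,  False),
    "GPT-4":           (95,  True),
    "Gemini":          (100, True),
    "Llama-3.1 405B":  (80,  True),   # derived estimate from above
    "1.5-Pints":       (9,   False),
}

WEEKS_THRESHOLD_DAYS = 28  # assumed cut-off for "weeks rather than many months"

def share_trained_in_weeks(runs, frontier_only):
    """Fraction of the selected runs that finished within the threshold."""
    durations = [days for days, is_frontier in runs.values()
                 if is_frontier or not frontier_only]
    return sum(d <= WEEKS_THRESHOLD_DAYS for d in durations) / len(durations)

print("frontier-only reading:", share_trained_in_weeks(cited_runs, frontier_only=True))   # 0.0
print("broad reading        :", share_trained_in_weeks(cited_runs, frontier_only=False))  # ~0.43
# Narrow reading: no frontier run fits in "weeks", so the prediction looks wrong.
# Broad reading: weeks-scale runs are common -- but GPT-3 and LLaMA-1 show they
# already were before the 2023 prediction.
```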

Because:

  • There is no sharp, widely‑acknowledged shift from “many months” to “weeks” for the top foundation models, yet
  • Weeks‑scale training is demonstrably common for many significant models, but already was pre‑2023, and
  • The claim’s key qualifiers (“commonly”, “near‑foundation”, “many months”) are too vague to pin down a clear pass/fail,

the outcome cannot be determined cleanly as simply right or wrong. Hence the prediction is best classified as ambiguous.