“When this stuff becomes very valuable, is that when you really need a precise answer and you can guarantee that to be overwhelmingly right, that's the last 1 to 2%. That is exceptionally hard. And I don't think that we're at a place yet where these models can do that.” (Chamath, December 2022 podcast)
Evidence since late 2022 shows that even the best frontier LLMs have not reached the near‑perfect, guaranteeable reliability needed for the last 1–2% of high‑consequence use cases, matching Chamath’s prediction.
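Why that last 1–2% is so hard to do without becomes clear with simple arithmetic: a small per‑answer error rate compounds quickly over a multi‑step task. The sketch below is illustrative only; the 20‑step task length, the per‑step accuracy figures, and the independence assumption are choices made for the example, not numbers taken from the cited sources.

```python
# Illustrative only: how a small per-step error rate compounds over a
# multi-step task. The step count and accuracy figures are assumptions
# for this example, not values from any cited study, and errors are
# treated as independent for simplicity.

def chance_of_flawless_run(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step of a workflow is correct."""
    return per_step_accuracy ** steps

for accuracy in (0.98, 0.99, 0.999):
    p = chance_of_flawless_run(accuracy, steps=20)
    print(f"{accuracy:.1%} per-step accuracy -> {p:.1%} chance of a flawless 20-step run")

# Approximate output:
# 98.0% per-step accuracy -> 66.8% chance of a flawless 20-step run
# 99.0% per-step accuracy -> 81.8% chance of a flawless 20-step run
# 99.9% per-step accuracy -> 98.0% chance of a flawless 20-step run
```

At 98% per‑answer accuracy (a 2% error rate), roughly one in three 20‑step runs contains at least one mistake, which is exactly the regime in which the vendors and studies below still require human review.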
- Vendors explicitly warn against high‑stakes reliance. OpenAI’s GPT‑4 technical report and legal briefings summarizing it state that GPT‑4 “is not fully reliable” and can still hallucinate, and they recommend human review, grounding with additional context, or avoiding high‑stakes uses altogether. These documents stress that care is required “particularly in contexts where reliability is important.” (mondaq.com) OpenAI’s 2025 Operator system card goes further, saying the system proactively refuses high‑risk tasks like banking transactions or other high‑stakes decisions and enforces a supervised “watch mode” on sensitive sites, a clear acknowledgment that current models can’t be trusted to act autonomously in these domains. (openai.com)
- Persistent hallucinations in critical domains (law, medicine, scientific work):
  - Legal: A 2024 study on “legal hallucinations” finds that when asked specific, verifiable questions about random U.S. federal court cases, ChatGPT‑4 hallucinated in 58% of cases, and other models did even worse, leading the authors to warn against unsupervised legal use. (arxiv.org)
  - Medicine: A 2024 paper on patient‑summary generation reports that even carefully tuned models like GPT‑4 still generate non‑trivial numbers of medical hallucinations, with authors explicitly advising caution for clinical use because standard metrics don’t capture all errors. (arxiv.org)
  - Research summarization: Work on hallucinations in academic paper summaries finds that GPT‑4 and other frontier models regularly insert subtle but incorrect claims; automated methods are required just to detect these, and the authors again recommend caution. (arxiv.org)
- Hallucinations remain a fundamental, hard‑to‑eliminate problem. Recent coverage and research note that hallucinations are structural to how LLMs work and cannot be fully eliminated with current architectures. A 2024–2025 wave of reporting and studies emphasizes that even as models get more capable, hallucinations persist and can even increase for some new reasoning models (e.g., OpenAI’s o3 and o4‑mini showing higher hallucination rates than the older o1 model). Experts stress that this is especially problematic in domains like law, medicine, and finance where rare errors are unacceptable. (livescience.com)
- Regulatory and technical consensus that reliability is insufficient for unsupervised high‑risk use. The 2024 EU AI Act explicitly imposes strict robustness, risk‑management, and human‑oversight obligations on “high‑risk” AI, reflecting that current systems are not considered dependable enough for critical applications without strong controls. (en.wikipedia.org) Overviews of LLM limitations continue to list hallucinations and brittleness as key barriers to deployment in high‑stakes settings. (en.wikipedia.org) A minimal illustrative sketch of this oversight pattern follows the list.
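The common thread across the Operator system card and the EU AI Act’s obligations is a human‑oversight pattern: high‑risk actions are refused outright, sensitive ones require a human in the loop, and only low‑risk work runs autonomously. The sketch below illustrates that pattern only; the action names, risk categories, and policy are hypothetical and do not reflect any vendor’s or regulator’s actual implementation.

```python
# Hypothetical sketch of a human-oversight gate for an LLM agent.
# The risk categories, action names, and policy below are illustrative
# assumptions, not any vendor's or regulator's actual implementation.

from dataclasses import dataclass
from enum import Enum, auto


class Risk(Enum):
    LOW = auto()        # e.g., drafting text, summarizing a page
    SENSITIVE = auto()  # e.g., sending email: supervised confirmation step
    HIGH = auto()       # e.g., banking transactions: refused outright


# Assumed mapping from action types to risk levels.
ACTION_RISK = {
    "summarize_page": Risk.LOW,
    "send_email": Risk.SENSITIVE,
    "bank_transfer": Risk.HIGH,
}


@dataclass
class Decision:
    allowed: bool
    needs_human: bool
    reason: str


def gate(action: str) -> Decision:
    """Route an agent action through the oversight policy."""
    # Unknown actions default to high risk: refuse rather than guess.
    risk = ACTION_RISK.get(action, Risk.HIGH)
    if risk is Risk.HIGH:
        return Decision(False, False, "refused: high-stakes action")
    if risk is Risk.SENSITIVE:
        return Decision(True, True, "allowed only with human confirmation")
    return Decision(True, False, "allowed autonomously")


if __name__ == "__main__":
    for act in ("summarize_page", "send_email", "bank_transfer"):
        print(act, "->", gate(act))
```

The design choice carrying the weight is the default: any action the policy does not recognize is treated as high risk and refused, mirroring the conservative posture the sources describe.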
By late 2025, nearly three years after the December 2022 podcast, frontier LLMs are dramatically more capable but still cannot reliably deliver the last 1–2% of precision required for autonomous use in high‑consequence domains, and major developers and regulators openly treat this as an unsolved, exceptionally hard problem. That aligns closely with Chamath’s prediction, so the prediction holds for the (now‑elapsed) near‑to‑medium‑term window he was talking about.