Last updated Nov 29, 2025

E165: Vision Pro: use or lose? Meta vs Snap, SaaS recovery, AI investing, rolling real estate crisis

Fri, 09 Feb 2024 19:42:00 +0000
tech, ai
If Apple Vision Pro (or similar AR goggles) is deployed to the greenhouse/lab-technician workflows Friedberg describes (scanning QR codes on flowers, image/data capture, task lists), it will ultimately increase worker productivity in that job by roughly 10x compared to the current iPhone + scanner + printer setup, once the necessary software is built and adopted.
literally every aspect of this job will be massively improved and productivity will go up by ten x with these goggles. Will it happen in the next couple of weeks or months, I don't know, but my engineering team is looking into it.
Explanation

Available evidence by November 30, 2025 does not clearly confirm or falsify Friedberg’s prediction.

  • The precise scenario hasn’t really happened yet. Apple Vision Pro only launched in 2024 and, while Apple and partners highlight enterprise use cases (training, remote fieldwork, digital twins, industrial visualization), there is no public documentation of it being deployed in the specific greenhouse or lab‑technician workflows Friedberg described (QR codes on plants, task lists, in‑situ data capture) at scale, let alone with measured productivity multipliers. (apple.com)
  • Closest real-world analogs (non‑Apple smart glasses in agriculture/livestock) are still pilot‑level and mostly qualitative. Multiple studies and trials show that AR smart glasses can support QR‑code based data access, remote assistance, and hands‑free operation in farms and livestock settings, including detailed measurements of QR scanning performance. These papers and evaluations conclude that smart glasses are promising for improving workflow efficiency, but they do not report anything like a 10× increase in overall worker productivity versus phone‑ or paper‑based systems. (mdpi.com)
  • Greenhouse-focused pilots exist but lack quantified 10× gains. Smart-glasses projects in greenhouses (e.g., Tokyo tomato greenhouses using 5G + smart eyewear, and smart glasses assisting cluster‑tomato harvesting by assessing ripeness) report that the devices make tasks easier or support operation with fewer or less‑experienced workers, but public write‑ups do not quantify a roughly 10× per‑worker productivity improvement versus previous phone/scanner/printer workflows. (natureworldnews.com)
  • General XR/AVP productivity research is still early and mixed. Broader XR and Apple Vision Pro studies in professional contexts suggest some productivity and usability benefits, but also emphasize limited app ecosystems, integration friction with existing workflows, and user comfort constraints. These findings indicate that the long‑term productivity potential is not yet fully realized or measured in real industrial or agricultural deployments. (arxiv.org)
  • Friedberg’s own statement is explicitly long‑term and conditional. He frames the 10× improvement as what will happen if AR goggles like Vision Pro are properly deployed with the necessary software and adoption in those workflows, not as something expected within months. Secondary coverage repeats his view as a forward‑looking thesis about spatial computing in agriculture, not as a claim that has already been validated. (agtecher.com)

Because (1) the specific Vision Pro/AR‑goggles greenhouse workflow Friedberg described has not yet been widely implemented with mature software, and (2) existing field and academic studies in adjacent agricultural/livestock scenarios do not provide clear, quantitative evidence of ~10× productivity gains, the prediction cannot currently be tested. It remains a plausible but unproven long‑term claim, so the outcome is inconclusive rather than clearly right or wrong.

tech, markets
Apple will generate at least $100 billion in cumulative revenue from Apple Vision Pro hardware (current and next-generation models) within less than five years of launch (i.e., before February 2029).
I think they're going to sell $100 billion of Apple Vision pros, not this version, but this version plus the next version, probably over the next, I would guess for them to get to 100 billion in sales. It'll take them less than five years.
Explanation

The prediction is that Apple will generate at least $100 billion in cumulative revenue from Apple Vision Pro hardware (current and next‑generation models) within less than five years of launch, i.e., before around February 2029.

As of the current date (November 30, 2025):

  • Fewer than two years have passed since the initial Vision Pro launch in early 2024.
  • Public reporting and analyst estimates indicate Vision Pro sales are in the hundreds of thousands to low single‑digit millions of units, not tens of millions, and nowhere near $100B in revenue yet. At roughly $3,500 per unit, Apple would need on the order of ~28–29 million units sold to reach $100B (100,000,000,000 ÷ 3,500 ≈ 28.6 million).
  • There is no evidence that Apple has approached $100B in cumulative Vision Pro revenue so far; that would be an extraordinary, widely reported milestone if it had occurred.
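The unit arithmetic above can be checked with a short sketch (a flat $3,500 per-unit price is a simplifying assumption; a real revenue model would mix prices across generations and regions):

```python
# Units needed for $100B in cumulative Vision Pro hardware revenue,
# assuming a flat ~$3,500 price (an illustrative simplification).
TARGET_REVENUE = 100_000_000_000  # dollars
PRICE_PER_UNIT = 3_500            # dollars, approximate launch price

units_needed = TARGET_REVENUE / PRICE_PER_UNIT
print(f"{units_needed / 1e6:.1f}M units")  # → 28.6M units
```

At IDC's estimated ~400,000 units shipped in 2024, that target is roughly 70 years of sales at the current pace, which is why only a dramatically stronger next generation could make the prediction come true.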

However, the deadline for the prediction is February 2029, and that date has not yet arrived. The prediction could still come true if future Vision Pro generations sell much more strongly.

Therefore, it is too early to determine whether the prediction is ultimately correct or incorrect.

Jason @ 00:16:30
Inconclusive
tech
Apple will become the dominant vendor in the high-end VR/AR headset category created around devices like Vision Pro, effectively capturing the majority of the market versus competitors such as Meta and others over the coming years.
I think this is gonna they're gonna run the table on this.
Explanation

Jason’s claim was long‑term: Apple would "run the table" in the Vision Pro–style high‑end VR/AR category over the coming years, i.e., become the clear dominant vendor.

So far (end of 2025), available data show:

  • In 2024, Meta still held about 70–75% of global VR/MR headset unit shipments, with Apple only around 5%, making Apple a distant third overall rather than a dominant player. (trendforce.com)
  • IDC estimates Apple shipped roughly 400,000 Vision Pros in 2024, versus 5.6 million Meta headsets, and notes that premium devices (> ~$1,000) are only 5–6% of total VR shipments, underscoring that Apple’s ultra‑high‑end segment is still small and niche. (theverge.com)
  • Analysts do not yet describe Apple as having decisively overtaken Meta in any broad “high‑end XR” category; instead, Meta is still characterized as the dominant XR hardware platform by unit share. (x.pawanlabana.com)
  • In October 2025, reports indicated Apple paused development of a next‑generation Vision Pro to redirect resources toward AI smart glasses, explicitly citing weak momentum and strong competition from cheaper Meta Quest devices—hardly an indication that Apple has already "run the table". (reuters.com)
  • New high‑end competitors (e.g., Samsung’s Galaxy XR headset around $1,799) are just entering the market in late 2025, which further complicates any early verdict about long‑term dominance. (reuters.com)

However, the prediction’s horizon is explicitly multi‑year and extends beyond the ~1.75 years that have elapsed since February 2024. Apple could still pivot, release new high‑end models, or consolidate a larger share of the premium segment later. Because the claimed future state (Apple clearly dominating the high‑end XR category) is neither realized nor definitively ruled out yet, the fairest judgment at this time is that the outcome is too early to call.

Chamath @ 00:09:13
Inconclusive
tech, health
Over the coming years, negative outcomes currently observed in younger generations (e.g., depression, suicide, drug/SSRI dependence, reduced marriage and childbirth) will worsen, at least at the margin, as a byproduct of increasingly immersive consumer technologies (social media, VR/AR, etc.), rather than improve.
I suspect on the margin, if you were going to bet all of these things that we see in these young people today will get worse as a byproduct of technology, not necessarily get better.
Explanation

Available data through 2024–25 show some aspects of youth well‑being still at historically bad levels, but the short post‑prediction window and ambiguous causality make it too early to say Chamath’s technology‑driven worsening thesis is right or wrong.

Mental health and suicide: CDC’s 2023 Youth Risk Behavior Survey still finds about 40% of high school students reporting persistent sadness or hopelessness and roughly 1 in 5 seriously considering suicide—far worse than a decade ago—but several indicators improved slightly from 2021 to 2023 (e.g., persistent sadness 42%→40%, and some suicide‑risk measures falling in key subgroups). (cdc.gov) NSDUH data and SAMHSA’s 2024 report show that, between 2021 and 2024, serious suicidal thoughts, plans, attempts, and major depressive episodes among 12–17 year‑olds declined, not increased. (mentalhealthresources.org) So while levels remain high, they have not clearly worsened “on the margin” since early 2024.

Marriage and childbirth: Long‑run trends that Chamath referenced—later and less marriage, fewer children—have continued. Pew and Census show that by 2023 only 7% of 18–24‑year‑olds and 29% of 25–29‑year‑olds were married, far below 1990s levels, and the share of 25–34‑year‑olds who have ever married or live with a child has fallen markedly since the mid‑2000s. (pewresearch.org) CDC natality data report record‑low birth rates for teens and women 20–24 in 2023, with further declines for these age groups in 2024 even as overall births ticked up slightly—so reduced early‑life childbearing among younger cohorts has indeed persisted. (blogs.cdc.gov)

Causality from technology: The specific claim that these negative outcomes will worsen as a byproduct of increasingly immersive consumer technologies remains scientifically unsettled. Large‑scale and meta‑analytic work generally finds that overall screen or social‑media time has only small average associations with adolescent well‑being, explaining well under 1% of variance, although “problematic/addictive” use and cyberbullying are more strongly linked to depression and suicidality. (formative.jmir.org) At the same time, CDC analyses highlight family violence, abuse, and other home‑environment factors—not screens per se—as the dominant correlates of teen suicide attempts and poor mental health. (sfchronicle.com) This mix of evidence does not yet support a clear, dominant causal pathway from newer immersive tech (social, VR/AR) to worsening aggregate outcomes.

Timing: Chamath framed this as a trend unfolding "over the coming years." As of November 2025, we only have full national data through 2023 and partial/provisional data for 2024, with essentially no settled, population‑level outcome data yet for 2025 in the key domains (youth depression, suicide, medication dependence, etc.). Given data lags and the slow diffusion of truly immersive consumer tech, there has not been enough time—or clean causal evidence—to definitively judge whether things are getting worse because of technology versus continuing long‑running demographic and socioeconomic trends.

Because (1) some mental‑health indicators have slightly improved since 2021, (2) marriage/childbearing trends have continued but are driven by many non‑tech factors, and (3) the causal role of technology remains actively debated with only mixed evidence, the prediction’s core claim cannot yet be decisively confirmed or refuted. Hence the result is best classified as inconclusive (too early) rather than clearly right or wrong.

Chamath @ 00:42:14
Inconclusive
ai, economy
In the mature AI market, large foundational language models as a category will generate little to no direct economic value because powerful models will be broadly available for free (or effectively free), making it impossible to sustain high-margin, closed foundational model businesses trained primarily on open internet data.
I think foundational models will have no economic value. I think that they will be an incredibly powerful part of the substrate, and they will be broadly available and entirely free.
Explanation

As of November 30, 2025, there isn’t enough evidence to judge Chamath’s long‑run claim about a mature AI market.

What we observe today (2024–2025) points against the prediction in the short term:

  • Closed, proprietary foundation‑model companies are generating very large direct revenues from their models. OpenAI reports around $10B in annual recurring revenue by mid‑2025 from ChatGPT subscriptions and API usage, excluding Microsoft licensing deals, and is projecting much higher revenue by 2030. (thetechportal.com)
  • Anthropic has reached $3B+ in annualized revenue by May 2025 and is estimated at $5B ARR later in 2025, largely from pay‑per‑token API calls for Claude models; its valuation has risen to around $183B. (cnbc.com) These are exactly the kind of high‑margin, closed foundational‑model businesses the prediction said would be unsustainable.
  • A recent study on open vs. closed models finds that while open models can be up to 84% cheaper to run and perform comparably, they still account for only ~20% of usage and ~4% of revenue; most economic value is currently accruing to closed‑model providers. (itpro.com)

On the other hand, there is movement toward powerful models being broadly available for free or near‑free:

  • Meta’s Llama 3 series provides strong models (up to 70B+ parameters, later a 405B‑parameter variant) that are downloadable at zero license fee and usable for most commercial purposes under a community license, even if they are not truly open source. (gigazine.net)
  • Other vendors (e.g., Mistral) have also released high‑quality open‑weight models, and open‑weight downloads and usage are rising quickly. (en.wikipedia.org) This partially supports the idea that capable base models may become a cheap or free “substrate.”

Why the overall forecast is still inconclusive rather than right or wrong:

  • Chamath explicitly anchored his claim to the “mature” AI market. Today’s generative‑AI ecosystem is still in hyper‑growth: industry forecasts expect the total AI market to grow from about $131B in 2024 to roughly $642B by 2029, with generative AI growing at ~90% CAGR—classic signs of an early rather than mature market. (finance.yahoo.com)
  • In past tech waves (e.g., operating systems, databases, cloud infrastructure), it has often taken a decade or more for commoditization and open alternatives to erode margins. We are less than two years from the February 2024 prediction date, so current high revenues and non‑free access don’t yet prove that margin compression and commoditization won’t happen later.

Bottom line:

  • Today’s evidence contradicts the prediction if you interpret it as applying to the 2024–2025 market: foundational models are generating substantial direct economic value, and leading models are not “entirely free.”
  • But because Chamath framed it as a statement about a mature market with no clear time horizon, it’s too early to say whether the long‑run equilibrium will match his view. The trend could still move toward commoditized, mostly‑free base models with value shifting to applications and data, or it could stabilize with durable, profitable closed‑model providers.

Therefore, given the timeline and the explicit “mature market” qualifier, the prediction is best classified as inconclusive (too early to tell) rather than clearly right or wrong.

Chamath @ 00:44:45
Inconclusive
ai, economy, markets
Over time, open-source AI models will erode the economic value of general-purpose model providers to near zero, while (a) infrastructure "picks and shovels" providers (especially those with proprietary AI hardware and tokens-per-second services) and (b) owners of valuable proprietary datasets will capture most of the sustainable economic gains from AI.
So my refined thoughts today are sort of what my initial guess was when we started talking about AI a year ago, which is the picks and shovels. Providers can make a ton of money, and the people that own proprietary data can make a ton of money. But I think open source models will basically crush the value of models to zero economically. Even though the utility will go to infinity, the economic value will go to zero.
Explanation

As of November 30, 2025, there is not enough evidence to say Chamath’s long‑run structural prediction has clearly come true or is clearly false.

Why it’s too early to call:

  • The claim is explicitly long term (“over time”) and strong in form (“crush the value of models to zero economically”). Less than two years have passed since the February 2024 podcast, which is short for judging industry‑structure outcomes of this kind.
  • In that time, proprietary general‑purpose model providers have not seen their economic value approach zero. OpenAI’s annualized revenue hit about $10B by mid‑2025 with projections above $12B for 2025, and internal projections plus secondary sales talk value it in the hundreds of billions of dollars. (finance.yahoo.com) Anthropic likewise has reached multi‑billion‑dollar run‑rate revenue and raised at valuations around $60B and then ~$180B in 2025. (sacra.com) That is the opposite of “near zero” economic value so far, but it doesn’t rule out later commoditization.

Evidence on open source vs proprietary models:

  • Open‑source models have advanced rapidly. A 2025 benchmark finds the best open‑source LLMs are now only single‑digit points behind the top proprietary models on quality, while being ~7–8x cheaper per token and often faster—clear support for the technical and cost side of commoditization. (whatllm.org)
  • However, a recent study cited by the Linux Foundation reports that open models account for roughly 20% of usage but only about 4% of AI‑model revenue; enterprises still overwhelmingly pay for closed‑source APIs due to trust, compliance, and switching‑cost advantages. (itpro.com) That means open source has not yet eroded the bulk of model‑provider economics, even if it is exerting price pressure at the margin.

Evidence on “picks and shovels” and proprietary data owners:

  • The “picks and shovels” part of his thesis is strongly supported so far. Nvidia’s market cap has exploded into the multi‑trillion range on the back of AI‑data‑center GPUs, which now constitute the vast majority of its revenue, and hyperscalers are driving unprecedented AI capex. (markets.financialcontent.com) OpenAI’s own long‑term infrastructure deals (e.g., massive, multi‑hundred‑billion‑dollar cloud and data‑center commitments with Oracle and partners under the Stargate project) underline how much value is accruing to compute and data‑center providers. (group.softbank)
  • Data‑rich incumbents are indeed monetizing proprietary content with AI (for example, Thomson Reuters’ AI‑enhanced legal and tax products contributing to solid organic revenue growth in its core segments), but the scale of value captured here is still much smaller than that at Nvidia, cloud hyperscalers, or the leading model labs. (reuters.com) It’s directionally consistent with Chamath’s view but not yet clearly “most” of the sustainable gains.

Net assessment:

  • Central strong claim (“open source will drive the economic value of general‑purpose model providers to near zero”) is not borne out so far: model providers are currently among the most valuable and fastest‑growing companies in the sector.
  • Supporting sub‑claims (infra/picks‑and‑shovels win big; proprietary data is valuable; open source compresses prices and utility goes up) are partially supported by current evidence.
  • Because the prediction concerns the eventual industry structure and gives no explicit time horizon, the present data can’t definitively validate or falsify the end‑state he describes. Hence, the fairest classification today is “inconclusive (too early)” rather than clearly right or wrong.

Sacks @ 00:49:45
Inconclusive
ai, tech, markets
Over the next several years, OpenAI will maintain a performance lead over open-source models and other competitors sufficient for it to remain the leading commercial AI model provider and to be a financially successful company.
I do think there is an argument that OpenAI will stay in the lead and actually do quite well.
Explanation

As of November 30, 2025, parts of Sacks’s prediction are partially aligned with reality, but the forecast explicitly covers “the next several years,” so it’s too early to give a definitive verdict.

On performance and market leadership:

  • OpenAI has continued to ship frontier proprietary models (GPT‑4.5, GPT‑4.1, GPT‑5, GPT‑5.1, o‑series reasoning models). Usage data from Langfuse for October 2024–September 2025 show that the majority of the top‑used models in real applications are OpenAI models (GPT‑4o mini, GPT‑4o, GPT‑4.1, GPT‑4.1 mini, GPT‑5 variants, etc.), suggesting that OpenAI is still a dominant commercial provider on the application layer.
  • However, competitors have made substantial gains. Anthropic’s Claude Opus 4.5 is reported to beat Google Gemini 3 Pro and OpenAI’s GPT‑5.1 on some coding benchmarks (e.g., SWE‑Bench Verified) and to lead in enterprise AI adoption with about 32% share vs. ~25% for OpenAI. Open‑source and non‑OpenAI models (e.g., DeepSeek, Qwen, Llama) now top various academic and community leaderboards; ensembles of smaller open‑source models can outperform GPT‑4.1 on a majority of evaluated datasets in at least one published study. This makes the idea of a clear, uncontested performance “lead” more debatable.

On financial success:

  • OpenAI’s financial metrics look strong on growth and valuation: it raised about $40 billion at a $300 billion valuation in April 2025 and reported annualized revenue of roughly $12 billion by July 2025, up from $3.7 billion in 2024. In July 2025 there were tens of millions of paying ChatGPT subscribers, with projections of continued rapid revenue growth.
  • At the same time, the company is still losing large amounts of money and is projected to require over $200 billion in additional capital by 2030 to fund compute and infrastructure, according to HSBC estimates. Whether this trajectory ultimately counts as long‑term “financial success” is uncertain and depends on outcomes well beyond 2025.

Because (a) the claim is about maintaining a lead “over the next several years,” a period that has not yet fully elapsed, and (b) both competitive performance and long‑run financial sustainability are actively in flux, the most accurate assessment today is that the prediction’s ultimate truth value is inconclusive (too early to tell).

ai, markets
If OpenAI continues to keep even a modest quality lead over open-source models, it will capture the vast majority of the consumer-facing GPT/query market (analogous to Google’s dominance in search), with most users preferring OpenAI’s service over alternatives.
if OpenAI just maintains a little bit of a lead over open source, then it could basically win the vast, vast majority of the, call it, consumer search or consumer GPT market.
Explanation

Available data shows that OpenAI’s ChatGPT currently has a very large lead in direct consumer chatbot usage, but the prediction is conditional, vague in scope, and not time‑bounded, so it can’t be cleanly scored as fully right or wrong.

On the “market share” side:

  • ChatGPT is by far the most visited AI chatbot site globally, with ~46.6B visits from Aug 2024–Jul 2025; the next ten chatbots combined are under 10B, implying ChatGPT has ~80%+ of traffic among the top players.
  • Similarweb-based reporting shows ChatGPT getting nearly 6B visits in Aug 2025, about 8× Gemini and far ahead of other rivals.
  • Some estimates put ChatGPT at ~60% of the U.S. AI‑chatbot market.
  • In October 2025, Sam Altman reported 800M weekly active ChatGPT users, roughly double the combined weekly users of Meta AI, Gemini, Grok, Perplexity, and Claude. This strongly supports the idea that OpenAI has captured a dominant share of consumer-facing GPT/chatbot usage, at least in the narrow sense of standalone assistants.

On the “quality lead over open source” and scope assumptions:

  • Benchmarks generally still show GPT‑4o slightly ahead of open-source LLaMA 3.x models on broad reasoning/knowledge metrics such as MMLU and multimodal tasks, i.e., a modest average quality edge.
  • But top open-weight models (e.g., Llama 3.1 405B, DeepSeek V3/R1) now match or beat GPT‑4/4o on some benchmarks, including MMLU and certain reasoning tasks, indicating that the gap has narrowed and sometimes reverses depending on the test. That makes it unclear whether OpenAI still has the kind of clear, consistent “little bit of a lead” the quote assumes.
  • The prediction also refers to the broad “consumer search / GPT market,” not just standalone chatbots. In that wider space, AI assistants embedded in other products are significant: Meta AI reported almost 500M monthly users by late 2024, ByteDance’s Doubao is the top chatbot in China with over 150M MAUs, and various regional or app‑integrated assistants (Gemini in Google Search, Doubao in Douyin, etc.) capture substantial consumer query volume that doesn’t pass through OpenAI at all.

Because:

  1. the statement is conditional and probabilistic ("if" and "could"),
  2. key terms like “little bit of a lead” and “vast, vast majority” are not precisely defined, and
  3. the relevant market (standalone chatbots vs. all consumer AI query experiences) is contested,

we can’t decisively say the prediction has clearly succeeded or failed as of late 2025. It is best classified as ambiguous rather than definitively right, wrong, or merely “too early.”
ai, tech
OpenAI’s custom GPT platform and associated developer network effects will create a self-reinforcing ecosystem around ChatGPT that open-source model ecosystems will find difficult to match, leading to a durable advantage for OpenAI in capabilities and available applications.
So you have a classic developer network effect where you've got OpenAI aggregating hundreds of millions of consumers because they perceive that ChatGPT is the best. Then you've got developers wanting to reach that audience. So they build custom gpts on the OpenAI platform. That actually gives ChatGPT more capability. Yeah. And that's something that open source can't easily catch up with.
Explanation

The prediction was that OpenAI’s custom GPT platform and its developer network effects would create a self‑reinforcing ecosystem that open‑source models would find hard to match, giving OpenAI a durable advantage in capabilities and applications.

1. OpenAI did build a large custom‑GPT ecosystem
OpenAI’s GPT Store launched in January 2024; by early 2024 more than 3 million custom GPTs had been created, with on the order of ~150–160k public GPTs in the store and hundreds to thousands of new GPTs added daily. Academic work (GPTZoo) cataloged over 700k GPT instances shortly after launch, indicating substantial real‑world experimentation and usage around the custom‑GPT platform. (openai.com)
This supports the first part of the claim: OpenAI did generate a sizable developer ecosystem and a broad catalog of applications on top of ChatGPT.

2. But open‑source ecosystems have grown larger and highly competitive
By late 2024, Hugging Face already hosted over 1 million AI models; by August 2025, an empirical study found ~1.86 million models on the platform, documenting extensive “family trees” of fine‑tuned derivatives. (arstechnica.com)
Hugging Face reports massive enterprise usage and positions itself explicitly as a way for businesses to rapidly find and deploy open‑source models, showing that open ecosystems are heavily used in production, not just research. (nutanix.com)
A joint MIT–Hugging Face study cited by the Financial Times found that by 2025 China alone accounted for a larger share of global open‑model downloads than the U.S., underscoring that open‑weight ecosystems are vibrant, global, and scaling quickly. (ft.com)

3. Open models now match or surpass OpenAI on key capabilities
Multiple open or open‑weight models rival or beat OpenAI’s frontier models on important benchmarks:

  • DeepSeek‑R1, an open‑weight reasoning model, scores higher than GPT‑4o on several standardized tests: it achieves ~90.8% on MMLU and 97.3% on the MATH‑500 benchmark, beating GPT‑4o’s 88.7% MMLU and 76.6% MATH scores. (edenai.co)
  • A July 2025 survey of live benchmarks ranks DeepSeek‑R1 as the top open‑source model and 4th overall on Chatbot Arena, with extremely strong reasoning and coding metrics (e.g., ~99% on MATH‑500 and >90% on AIME‑style tasks), explicitly noting that it matches or outperforms OpenAI’s o‑series on several reasoning benchmarks while being far cheaper. (champaignmagazine.com)
  • Meta’s Llama 4 Maverick is reported to offer performance comparable to GPT‑4o on coding and reasoning, and Meta is rolling out Llama across major consumer surfaces like WhatsApp, Messenger and Instagram—making these open models widely accessible to end‑users and developers. (theverge.com)
  • Alibaba’s Qwen3 family (Apache‑2.0 open‑license) and related QwQ / Qwen2.5 models are released as open weights; Alibaba claims Qwen2.5‑Max outperforms GPT‑4o and leading open competitors on multiple benchmarks, and Qwen3 models provide reasoning and multimodal capabilities similar in spirit to OpenAI’s o‑series and GPT‑4o, with downloadable weights for developers. (en.wikipedia.org)
  • Open‑source reasoning models like AM‑Thinking‑v1, built on open Qwen2.5‑32B, report surpassing DeepSeek‑R1 itself on AIME and LiveCodeBench, further pushing the open frontier. (arxiv.org)

Collectively, these results show that by mid‑to‑late 2025, open‑weight models are not lagging far behind ChatGPT; in several high‑value domains (math, coding, reasoning) they match or exceed OpenAI’s best generally available models.

4. Open ecosystems for applications rival custom GPTs
Open‑source platforms provide their own powerful “developer network effects”:

  • Hugging Face’s Hub and Spaces support millions of models and thousands of hosted apps, with tooling (Transformers, Text‑Generation‑Inference, Spaces, etc.) designed to make deploying and sharing LLM‑based applications as easy as using a closed API. (arstechnica.com)
  • Large ecosystems have formed around open models and agent frameworks (e.g., DeepSeek R1, Llama, Qwen) for fine‑tuned assistants, local deployments, and enterprise‑specific agents, often favored because they avoid vendor lock‑in and reduce inference cost—key considerations for businesses choosing platforms. (leanware.co)

Meanwhile, although OpenAI’s GPT Store has hundreds of thousands of GPTs and significant usage, its growth plateaued relatively quickly and it remains tied to ChatGPT’s paid tiers and OpenAI’s policies, limiting its reach compared with globally downloadable open‑weight models that can be deployed anywhere. (originality.ai)

5. Net assessment vs. the original claim
Two years after the prediction (Feb 2024 → late 2025):

  • Yes, OpenAI retains one of the strongest proprietary consumer platforms, and custom GPTs meaningfully extend ChatGPT’s capabilities.
  • But open‑source ecosystems have not been left behind. On benchmarks and in real‑world deployments, open‑weight models like DeepSeek‑R1, Llama 4, Qwen3 and successors have caught up to or surpassed OpenAI’s widely available models in several capability areas, while the open model/application ecosystem (Hugging Face and others) is at least comparable in scale and diversity to the GPT Store.

Because open‑source ecosystems have matched and, in some niches, exceeded OpenAI’s capabilities and application diversity—contradicting the idea that they would find it “difficult to catch up” and that OpenAI would enjoy a clear, durable moat from custom GPT network effects—the prediction is best judged as wrong.

By sometime between February 2025 and August 2025, the quality of leading large language models trained primarily on the open internet—specifically OpenAI’s model, Meta’s Llama, Mistral, and xAI’s model—will have converged such that on standard third‑party benchmarks they achieve roughly the same performance level (no single model having a large, clear quality lead).
I think they're all going to converge to the same quality in the next, probably 12 to 18 months.
Explanation

By the end of the 12–18 month window (roughly Feb–Aug 2025), the gap between major labs’ models had narrowed but had not fully converged to “about the same quality,” and there were models with large, clear leads over others.

Key points:

  • Meta’s Llama 3.1 405B did reach near-parity with earlier GPT‑4-class models on many classic benchmarks, e.g. MMLU, GSM8K, HumanEval and MGSM, often matching or slightly beating GPT‑4o and Claude 3.5 Sonnet on individual tests.(llamaai.online) This is strong evidence for partial convergence between Meta and OpenAI on older, static benchmarks.

  • However, on widely used third‑party human‑preference benchmarks like LMSYS Chatbot Arena, Llama and Mistral still trailed the frontier. Llama 3.1 405B's text‑arena Elo is around 1260–1270 (rankedagi.com), whereas frontier closed models sit much higher (e.g. GPT‑4.5 and later GPT‑5 variants in the mid‑1400s, Gemini 2.5/3 Pro and Anthropic Claude 4.x similarly high). (analyticsvidhya.com) That ~150–200 Elo gap corresponds to a large win‑rate difference, contradicting the idea that all models are at roughly the same level.

  • Mistral’s best general models also remained noticeably weaker than top OpenAI/Google/Anthropic/xAI models on aggregate benchmarks. Independent leaderboards and evaluations put Mistral Large 2 at about 81% MMLU and a substantially lower Arena Elo than GPT‑4‑class systems, which are ~88–89% on MMLU and rated much higher in human preferences.(trustbit.tech) This again suggests a clear, measurable quality gap rather than full convergence.

  • xAI’s Grok 3, released Feb 2025, is explicitly reported as surpassing other frontier models on several hard benchmarks (AIME, GPQA, LiveCodeBench) and holding the top or near‑top Elo on Chatbot Arena (≈1400+), ahead of GPT‑4o and other leading systems.(twitter.com) That gives xAI a clear lead over Meta’s Llama and Mistral’s models on third‑party, human‑preference metrics, directly contradicting the claim that no model would have a large, clear advantage.

  • New reasoning‑centric benchmarks introduced in 2025 show substantial spread, not tight clustering. For example, AI4Math finds OpenAI's o3‑mini and DeepSeek R1/V3 above 70% accuracy on challenging university‑level math, while Llama 3.3 70B and GPT‑4o‑mini are below 40%. (arxiv.org) A separate cross‑lingual study on Cantonese/Japanese/Turkish reports that GPT‑4o and Claude 3.5 lead, while Llama 3.1 and Mistral Large 2 lag significantly in fluency and accuracy. (arxiv.org) These independent academic benchmarks show that models from these labs do not sit at a single, indistinguishable quality level.

  • Methodology critiques (e.g., of Chatbot Arena) and claims that open models “are catching up” do not erase the observed quantitative gaps. Papers and articles note that Arena can be gamed and that open vs. closed performance gaps have shrunk to roughly a one‑year lag, with Llama 3.1 reaching parity with earlier GPT‑4 variants. (time.com) But they still describe a meaningful frontier edge for the very best proprietary models over Llama and Mistral, and a strong lead for certain reasoning models (OpenAI's o‑series, Grok) in 2025.
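The win‑rate claim attached to the Elo gap above can be made concrete with the standard logistic Elo formula (Chatbot Arena's Bradley–Terry ratings follow essentially the same form). This is an illustrative sketch using the ~150–200 point gap cited above, not Arena's exact methodology:

```python
def elo_win_prob(rating_gap: float) -> float:
    """Expected win probability for the higher-rated model,
    per the standard Elo formula: E = 1 / (1 + 10^(-gap/400))."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))

for gap in (150, 200):
    print(f"Elo gap {gap}: higher-rated model expected to win "
          f"{elo_win_prob(gap):.0%} of head-to-head comparisons")
```

A 150–200 point gap implies the higher‑rated model is expected to win roughly 70–76% of pairwise comparisons, which is why the text treats it as a large win‑rate difference rather than rough parity.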

Taken together, the evidence shows directional convergence (gaps shrank, especially Meta vs. OpenAI on older benchmarks), but not the full convergence with no clear leader that Chamath predicted. There remained sizeable, well‑documented quality differences among OpenAI, Meta’s Llama line, Mistral, and xAI’s Grok as of mid‑ to late‑2025, so the prediction is best judged as wrong.

By sometime between August 2024 and November 2024, OpenAI’s frontier model and open‑source models like Llama and Mistral (trained on the open internet) will have reached essentially the same quality level on common evaluation benchmarks, eliminating OpenAI’s meaningful quality advantage that exists in February 2024.
they're all getting to the same quality code point and they will be there within the next 6 to 9 months.
Explanation

By the August–November 2024 window, open‑source frontier models had essentially matched OpenAI’s then‑frontier model GPT‑4o on the main academic benchmarks, so Chamath’s claim about benchmark quality parity is broadly borne out (even though OpenAI still had some experiential edge in user‑preference tests and product polish).

Timing and models in scope

  • In February 2024, OpenAI’s top public model was GPT‑4 / GPT‑4 Turbo, clearly ahead of open‑source models like Llama 2 and Mixtral on standard benchmarks and in Chatbot Arena rankings.
  • OpenAI released GPT‑4o in May 2024; through mid‑2024 it topped the LMSYS Chatbot Arena leaderboard, beating prior GPT‑4 variants and other proprietary models by a noticeable Elo margin, reinforcing OpenAI’s lead at that time. (arstechnica.com)
  • Meta released the Llama 3.1 family, including the 405B‑parameter model, on July 23–24, 2024, explicitly positioning it as a frontier‑scale open model. (radicaldatascience.wordpress.com) This is within the 6–9 month window from early February 2024 (and certainly in place by August–November 2024).

Benchmark parity: Llama 3.1 vs GPT‑4/4o

  • Meta’s and independent write‑ups report that Llama 3.1 405B’s base or chat variants match or slightly surpass GPT‑4/4o on many standard text benchmarks:
    • MMLU: Llama 3.1 405B ≈88.6 vs GPT‑4/4o ≈85–88.7 depending on setup. (unfoldai.com)
    • GSM8K (math): Llama 3.1 405B ≈96.8 vs GPT‑4o ≈94–96.1. (unfoldai.com)
    • IFEval, ARC, and several other knowledge/reasoning benchmarks show Llama 3.1 405B at or above GPT‑4/4o on many tasks, while GPT‑4o retains a small edge on others like HumanEval and some social‑science MMLU subsets. (unfoldai.com)
  • Meta’s own human evals vs GPT‑4o show roughly comparable quality: Llama 3.1 405B wins ~19% of comparisons, ties ~52%, and loses ~29% against GPT‑4o—i.e., the majority of interactions are ties, with only a modest edge for GPT‑4o. (unfoldai.com)
  • Multiple analyses and news pieces at the time describe Llama 3.1 405B as “competitive with the best closed‑source models” and note that it meets or exceeds GPT‑4o on several headline benchmarks, calling it “one of the best and largest publicly available foundation models” and “the first open‑source frontier model” that can beat closed models on various metrics. (aws.amazon.com)
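The win/tie/loss split from Meta's human evals can be reduced to a headline number. A small sketch using the 19%/52%/29% figures quoted above; splitting ties evenly is a common pairwise‑preference convention, not necessarily Meta's own reporting method:

```python
# Win/tie/loss rates for Llama 3.1 405B vs GPT-4o, as quoted in the text.
win, tie, loss = 0.19, 0.52, 0.29

# Counting ties as half a win for each side (a common convention):
llama_score = win + tie / 2
print(f"Llama preference share (ties split): {llama_score:.0%}")  # -> 45%

# Among decisive comparisons only, GPT-4o's win rate:
gpt4o_decisive = loss / (win + loss)
print(f"GPT-4o win rate on decisive comparisons: {gpt4o_decisive:.0%}")  # -> 60%
```

A 45/55 overall preference split (or a ~60% decisive‑comparison win rate for GPT‑4o) is consistent with the text's framing: a modest edge for OpenAI, not a clear tier gap.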

Mistral and other open models

  • Mistral Large (API‑hosted but trained on web data) launched in February 2024 and was already close to GPT‑4 on MMLU and other benchmarks, though typically a few points behind GPT‑4 on broad general‑knowledge tests (e.g., MMLU ~81 vs GPT‑4 ~86). (dailyai.com) This supports the broader pattern Chamath described: non‑OpenAI models rapidly closing the gap on standard evaluations.

Does this eliminate a “meaningful quality advantage”?

  • On common academic benchmarks that dominated 2022–2023 discourse (MMLU, GSM8K, HumanEval, etc.), the gap between OpenAI’s frontier model (GPT‑4o) and top open models (especially Llama 3.1 405B) had shrunk to low‑single‑digit percentage points by late July 2024, with leadership flipping back and forth depending on the specific test. (unfoldai.com)
  • Industry reporting at the time explicitly framed Llama 3.1 as “on par with” or “competing head‑to‑head with” GPT‑4o and Claude 3.5 Sonnet, rather than a clear tier below, which is consistent with Chamath’s claim that the former OpenAI advantage on those benchmarks had largely disappeared. (aws.amazon.com)

Caveats

  • User‑preference leaderboards like LMSYS Chatbot Arena continued to show OpenAI models (GPT‑4o and successors) at or near the top, with open models slightly lower—indicating that in real‑world “vibes‑based” comparisons, OpenAI still retained a modest edge, especially in overall polish and non‑text modalities. (arstechnica.com)
  • OpenAI later released GPT‑4.1 in April 2025, re‑establishing a clearer performance lead over Llama 3.1 405B on several newer benchmarks like SWE‑Bench and global MMLU, but that falls outside the August–November 2024 window. (aimodels.fyi)

Given that by late 2024 open‑source frontier models (most notably Llama 3.1 405B) were benchmark‑competitive with GPT‑4o—often within a point or two, sometimes ahead, sometimes behind—Chamath’s specific prediction that OpenAI’s earlier benchmark advantage would be gone within 6–9 months is best judged as right, with the nuance that OpenAI still held a small but real edge in many human‑preference and product‑level evaluations.

government, economy
If US office commercial real estate values fall by roughly two‑thirds as implied (from about $3T to $1.8T), resulting in large losses to pension and retirement funds, then within the following few years the US federal government and/or Congress will implement a material structured support or bailout program specifically aimed at cushioning retirees and pensioners from these office‑related losses (rather than allowing those write‑downs to fully hit beneficiaries).
You're not going to see governments let that happen. You're going to see the federal government. There's going to be some action at some point, and it's unlikely the office market is going to suddenly rebound overnight. If this stays the way it is, who's going to fill that hole for retirees and pensioners? Because we're not going to let that all get written down. Someone is going to step in and say, we've got to do something about this, and there's going to need to be some sort of structured solution to support retirees and pensioners, because that's ultimately who ends up holding the bag in this massive write down.
Explanation

As of November 30, 2025, the trigger in Friedberg's conditional prediction has not clearly been met, and its time window has not yet closed.

  1. Office values have fallen, but not by “roughly two‑thirds.” Nationally, average U.S. office sale prices are down about 37% from 2019 levels as of the end of 2024, with forecasts of roughly another 10% decline over the rest of the decade, not a 60–70% collapse in aggregate values. (businessinsider.com) Academic work on the “office real estate apocalypse” estimates nationwide office value destruction of roughly $500–550B from 2019 levels—substantial, but far short of wiping out two‑thirds of the asset class if the total stock is around the low–single‑digit trillions. (business.columbia.edu) A Washington Post summary of the same research notes about $557B in office value lost from 2019–2023, again implying a drop on the order of tens of percent, not two‑thirds. (washingtonpost.com)

  2. No targeted federal bailout for pensioners’ office losses has been enacted. Congress has held hearings on commercial real estate stress, where industry representatives explicitly said they were not seeking a bailout and instead asked for regulatory and accounting flexibilities. (marketwatch.com) There has been no new TARP‑style or Butch Lewis‑style program specifically designed to compensate retirees or pension plans for office‑sector write‑downs; existing pension assistance laws (e.g., the Butch Lewis Act’s multiemployer relief) predate this podcast and are not office‑specific. (en.wikipedia.org)

  3. The time horizon (“within the following few years”) is not over. The episode aired in February 2024, so fewer than two years have passed—well short of a typical “few years” reading—and the extreme valuation‑collapse scenario that was supposed to trigger a bailout has not occurred.
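The scale mismatch in point 1 can be checked with back‑of‑envelope arithmetic. A sketch using the ~$3T office stock assumed in the prediction statement and the ~$500–550B loss estimates cited above (both figures are approximations carried over from the text, not independent data):

```python
# Figures quoted above: ~$3T total US office stock (per the prediction),
# ~$550B of value destruction since 2019 (per the cited research).
total_office_value = 3.0e12
observed_loss = 0.55e12

# Loss implied by the predicted "roughly two-thirds" collapse:
implied_loss_two_thirds = total_office_value * 2 / 3
print(f"Loss implied by a two-thirds collapse: ${implied_loss_two_thirds / 1e12:.1f}T")

# Observed loss as a share of the assumed stock:
observed_decline_pct = observed_loss / total_office_value
print(f"Observed loss as share of ~$3T stock: {observed_decline_pct:.0%}")
```

On these assumptions, the observed ~$550B loss is roughly 18% of the stock, versus the ~$2T a two‑thirds collapse would imply, matching the explanation's "tens of percent, not two‑thirds" conclusion.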

Because the prerequisite scenario (a ~two‑thirds collapse in U.S. office values severely hitting pensions) has not fully materialized, and the envisaged multi‑year window for a policy response is still open, Friedberg’s conditional prediction cannot yet be judged right or wrong.

economy, markets
Over the ensuing period after February 2024, a significant portion of regional US banks holding office‑backed commercial real‑estate loan portfolios will experience material impairments and financial stress (including stock price declines and/or solvency concerns) as office values fall and loans are written down.
there's a huge amount of equity that's been written off. But in addition to that, there's a lot of debt holders who are in trouble too. And that debt is held by regional banks. So these commercial loan portfolios are significantly impaired.
Explanation

Evidence since February 2024 shows that many U.S. regional banks with sizable office‑backed commercial real‑estate (CRE) portfolios have, in fact, taken meaningful write‑downs and faced notable financial stress, even though the system overall has remained resilient.

• Office values and fundamentals deteriorated further as forecast: Moody’s projected office values would fall roughly 26% peak‑to‑trough through 2025, and office‑loan delinquencies climbed to multi‑year highs while other CRE segments remained relatively stable, making office the clear weak spot and a concentrated risk for smaller and regional banks. (cfodive.com)

• New York Community Bancorp/Flagstar—an archetypal regional CRE lender—suffered exactly the kind of portfolio impairment and stress Sacks described: a surprise $252m quarterly loss driven largely by multifamily and office loans, a $552m jump in provisions and $185m in charge‑offs, and a 38% one‑day stock plunge that took shares to multi‑decade lows and sparked fears of failure, followed by management turnover and an emergency capital raise. (commercialobserver.com)

• Other CRE‑heavy regionals also showed material impairment and strain. Valley National and Flagstar/NYCB recorded large CRE charge‑offs and provisions; for example, Flagstar logged about $388m of office‑loan charge‑offs and a roughly $930m net loss over the first nine months of 2024, while Valley’s credit charges on CRE cut full‑year 2024 net income by almost a quarter and forced it to shrink CRE exposure and bolster reserves, keeping its stock at a deep discount. (ft.com) KeyCorp’s CRE non‑performing loan rate more than doubled to 5.1%, another sign of significant office‑loan stress at a regional lender. (credaily.com)

  • Rating agencies explicitly flagged a broad cohort of regional banks as vulnerable because of CRE (especially office) portfolios. S&P and Moody's placed multiple CRE‑heavy regionals—Old National, WaFd, Peapack‑Gladstone, F.N.B., First Merchants, Fulton Financial, Valley National and others—on negative outlook or downgrade review, citing high CRE concentrations and the risk that falling property values and refinancing difficulties would impair capital and earnings. In total, about nine U.S. banks, or roughly 18% of those S&P rates, carried a negative outlook largely due to CRE exposure, a non‑trivial share of the sector. (investopedia.com)

• Market performance and analyst commentary show sector‑wide pressure tied to these CRE issues. The KBW Regional Banking Index has persistently lagged large‑bank and broad equity indices, with analysts and Moody’s pointing to regional banks’ outsized share of CRE lending and naming 27 regional/community banks whose CRE loans exceeded 2x tangible common equity; the article highlights Flagstar and Valley National as examples where CRE‑related credit charges drove major profit declines and stock underperformance. (ft.com)

• At the same time, regulators and data indicate the stress has been significant but not systemic: most banks’ overall credit metrics and capital remain solid, Q2 2024 provisions came in below expectations for many institutions, and by late 2025 Reuters characterizes regional banks as having largely “weathered” the CRE storm even as office loans remain a pronounced weak point and delinquency rates in office hit record levels. (spglobal.com)

Putting this together, the office‑backed CRE downturn did lead to sizeable write‑downs, elevated reserves, ratings pressure, and stock underperformance at multiple regional banks with concentrated exposure, and in at least one case to near‑solvency concerns. That fits Sacks' prediction of significant impairments and financial stress across a meaningful slice of the regional‑bank sector exposed to office loans, even though the impact stopped short of a full‑blown systemic crisis.

economy, markets
From 2024 onward, the US commercial and multifamily real estate sector will experience a prolonged, staggered ‘rolling’ crisis, with waves of distress and defaults occurring over multiple years as loans and leases sequentially come up for refinancing, rather than a single short, acute crisis concentrated in one year.
that's why there's a rolling crisis in real estate is because the debt rolls over time. It's not like everybody hits the wall and has to refinance at the same time.
Explanation

Sacks’ claim was about the shape of the US commercial and multifamily real estate downturn: not a single, short, 2008‑style crash centered in one year, but a drawn‑out, rolling crisis as loans and leases come due and have to be refinanced at higher rates.

Evidence since 2024 strongly matches that description:

  • Data on loan maturities show a large multi‑year “maturity wall” rather than a one‑year cliff. S&P Global estimated about $950B of CRE mortgages maturing in 2024, rising to nearly $1T in 2025 and peaking around $1.26T in 2027, explicitly noting the refinancing challenge will not be resolved quickly. (spglobal.com) The St. Louis Fed similarly highlighted that roughly $1.7T (≈30% of CRE debt) matures in 2024–2026, emphasizing refinancing/repricing risk over those years rather than in a single period. (stlouisfed.org)
  • Other institutional analyses reinforce this staggered, multi‑year timeline. Franklin Templeton, drawing on Trepp data, cites about $1.2T in CRE loans maturing in 2024–25 and another ~$1.8T in 2026–28, again framing it as a wall of maturities extending across several years. (franklintempleton.com) Industry and credit‑data firms (e.g., CRED iQ / Commercial Observer, GlobeSt) likewise show large securitized CRE maturities in 2025–26 and another wave in 2029, not a single‑year lump. (globest.com)
  • Banks and regulators describe the problem as prolonged pressure rather than a short shock. The Financial Times reported U.S. banks facing about $2T of maturing property debt over three years, with office and multifamily singled out as especially troubled. (ft.com) Reuters’ review of regional banks in late 2024 stresses that CRE challenges from deteriorating office loans and refinancing risks are expected to “persist for years” as large volumes of debt roll over. (reuters.com)
  • Distress has indeed been emerging in waves tied to maturities, not as a one‑off collapse. CMBS data show CRE delinquency rates rising through 2024–25, with multifamily delinquencies in particular jumping several percentage points year‑over‑year and office remaining the most stressed. (reddit.com) At the same time, lenders have repeatedly extended loans and pushed maturities out, which GlobeSt notes has literally moved the “maturity wall” from an expected peak in 2024 toward 2026—turning what could have been a single acute event into a drawn‑out refinancing squeeze. (globest.com)

By late 2025, we clearly see (1) a large block of CRE and multifamily debt coming due over multiple years, (2) refinancing at higher rates producing ongoing waves of distress and defaults, and (3) no single short, concentrated crisis year. That pattern is exactly what Sacks described as a “rolling crisis” driven by staggered debt rollovers, so his prediction is best judged right.

government, economy, markets
In response to mounting losses in US commercial real estate (particularly office) that threaten creditors and investors, the US Treasury under Janet Yellen (or equivalent federal authorities if she is no longer in office) will ultimately implement a bailout or support mechanism that protects the main creditors/investors exposed to these real‑estate debts, while avoiding an explicit direct bailout of the banks themselves.
Yeah, I mean Janet Yellen's just going to bail these folks out. I mean, she won't bail out the banks themselves, but she'll bail out the creditors. Obviously the people holding the bag, they'll get bailed.
Explanation

Available evidence through November 30, 2025 shows no U.S. Treasury–led bailout or dedicated support mechanism aimed at protecting the main creditors/investors in troubled U.S. commercial real estate debt, as Jason predicted.

Key points:

  • Janet Yellen and the Treasury/FSOC have repeatedly acknowledged commercial real estate (especially office) stress and losses, but framed it as manageable and not a systemic banking risk, focusing on supervisory monitoring and bank-by-bank risk management rather than new bailout facilities. (ktvz.com)
  • A 2024 House Oversight hearing on the health of commercial real estate markets explicitly records industry representatives saying that the CRE industry is “not here seeking a bailout of any sort” and that “there is no bailout”, with an emphasis on regulatory flexibility and private‑sector adjustments instead. (congress.gov)
  • Global and U.S. regulators (FSB, FSOC) have highlighted vulnerabilities in CRE and rising delinquencies, but their responses focus on enhanced oversight, data, and supervisory guidance, not on Treasury or Fed programs that make CRE creditors whole. (crefc.org)
  • By mid‑2025, office CMBS delinquencies had surpassed their 2008 peak, with commentary explicitly noting that “there’s no bailout coming this time” for office/CRE investors, indicating that creditors are bearing losses rather than being rescued by a federal backstop. (thebsideway.com)
  • No credible reporting or official documentation indicates the creation of any Treasury (or broader federal) program in 2024–2025 analogous to TARP/PPIP that targets commercial real estate loans or CMBS for the purpose of bailing out landlords, bondholders, or other CRE creditors; existing facilities like the Bank Term Funding Program were bank‑liquidity tools, created in 2023 and not designed as CRE‑creditor bailouts. (en.wikipedia.org)

Because the core predicted event—a Yellen‑era Treasury bailout/support mechanism specifically protecting CRE creditors/investors while avoiding an explicit bank bailout—has not occurred and current policy/public statements point in the opposite direction (no such bailout), the prediction is best classified as wrong as of November 30, 2025.