I think the open internet has enough data that there isn't going to be a monopoly on information by someone spending money for content from third parties. I think that there's enough in the open internet to give our all. Give us all kind of, you know, the security that we're not going to be monopolized away into some disinformation age.
As of November 30, 2025, no single company has achieved anything close to a monopoly on AI training data or on informational control via paid content licensing.
On the data supply side, there is an active, multi‑firm race to license training data: OpenAI, Google, Meta, Microsoft, Apple, Amazon and others all buy datasets from publishers, stock-image libraries, and brokers, rather than one firm dominating the market. Reuters describes a "generative data gold rush" in which all of these companies are licensing archives and specialized content, not just a single player. (reuters.com) OpenAI has signed numerous media deals (Associated Press, Axel Springer, News Corp, Future, Vox, The Atlantic, Axios, etc.), with at least one major deal (News Corp) reported as exclusive for that publisher’s journalism, but most others are explicitly non‑exclusive and confined to particular archives. (apnews.com) This gives OpenAI advantages, but not a monopoly over all high‑value human text.
Crucially, the open internet and open corpora remain very large and widely used. Common Crawl alone provides hundreds of billions of web pages as a free, open corpus used in many LLMs. (commoncrawl.org) The Allen Institute’s Dolma dataset (≈3T tokens) is fully open and powers the OLMo family of models, with both data and code released. (allenai.org) Harvard’s public‑domain books corpus and open datasets like The Pile further expand freely available high‑quality text. (wired.com) Competitive open‑ or semi‑open models such as Meta’s Llama 3, DBRX, and others, trained heavily on public data, demonstrate that state‑of‑the‑art systems can still be built without relying on one firm’s licensed archives. (techcrunch.com) Regulators and researchers do worry that large firms could use data and partnership "moats" to entrench power, and that crawler blocking by many news outlets (especially those rated as highly factual) may skew future training data. (time.com) But these are warnings about potential concentration, not findings that a single actor already controls information.
Because Friedberg’s claim is forward‑looking ("there isn’t going to be a monopoly" and we won’t be "monopolized away into some disinformation age"), it cannot be definitively verified or falsified after only ~21 months. What can be said is that, so far, the world looks more like his scenario than its opposite: multiple powerful firms, robust open data sources, and no single company with effective monopoly control over training data or information. However, future legal, economic, or regulatory shifts could still change this, so the prediction’s ultimate truth value remains inconclusive (too early) rather than clearly right or wrong.