Chamath @ 00:41:52 · Right
ai, venture
Within the coming few years, operators of websites or apps that accumulate unique, high‑quality datasets will commonly be able to license that data to AI model providers as an incremental revenue stream.
So if you're an entrepreneur building a website or building an app that has really unique training data or really unique data, you'll be able to license and sell that. And that'll be an incremental revenue stream to everything you do in the near future.
Explanation
Evidence by late 2025 shows that licensing proprietary datasets to AI model providers has become a standard, incremental revenue stream for many operators of websites and apps with unique data, matching Chamath’s prediction.
- Reddit explicitly frames AI data licensing as a new business line: its IPO filings describe aggregate contracts of about $203M for licensing user-generated content to AI firms (Google among them, with an OpenAI deal following later), with at least $66.4M expected to be recognized as 2024 revenue. Coverage emphasizes that these data deals are a distinct, fast‑growing revenue stream alongside advertising, not a one‑off windfall. (techcrunch.com)
- News publishers and media platforms (e.g., News Corp, Le Monde, Dotdash Meredith, Future, Axel Springer, Financial Times, Prisa Media, The Atlantic) have all signed multi‑year content licensing agreements with OpenAI and others, with reporting explicitly describing these as recurring revenue or “new revenue streams” built on their proprietary archives. The News Corp–OpenAI deal alone is reported to be worth over $250M across five years. (apnews.com)
- Broader cross‑industry trend: Surveys of 2024–2025 “proprietary data licensing” deals show a growing market where many data‑rich operators monetize specialized datasets—Reddit (social discussion), Shutterstock (images), Google–Stack Overflow (programming Q&A), Apple–Shutterstock (images), Tempus (clinical/genomic data), Meta–Reuters (news), and others—specifically for AI training and model improvement. These are presented as structured, repeatable commercial arrangements, not experimental pilots. (datafaire.ai)
- Smaller operators and entrepreneurs are participating. Market reports and Reuters coverage describe companies like Defined.ai aggregating niche, high‑signal datasets (e.g., medical imagery, crime‑scene photos, specialized audio) from individual entrepreneurs and content owners, then licensing them to major AI developers. The revenue is shared with those data owners, demonstrating that even relatively small operators with unique data can now sell into AI‑training pipelines. (investing.com)
- Legal disputes reinforce that licensing is now the expected path. Lawsuits by Reddit against Anthropic and Perplexity explicitly contrast alleged unauthorized scraping with Reddit’s licensed deals with Google and OpenAI, underlining that access to the dataset is a commercial product that counterparties are expected to pay for. (investopedia.com)
Taken together, by November 30, 2025 (less than two years after the March 2024 episode), there is a broad and growing ecosystem where operators of sites and apps with unique, high‑quality data routinely license that data to AI model providers as a distinct revenue line. While not every entrepreneur will succeed in doing so, it is clearly and commonly possible—and already happening at scale—so the core economic prediction is best judged as right.