Reddit Powers AI Training: Google, OpenAI Use Posts for LLMs

Reddit has emerged as a critical data source for training artificial intelligence models, striking lucrative deals with major tech companies to license its vast repository of user-generated content. In February 2024, Reddit signed a licensing agreement with Google, reportedly worth $60 million a year, that allows the search giant to train its AI systems on Reddit posts and comments. In May 2024, Reddit followed with a similar data-licensing deal with OpenAI, giving the ChatGPT maker access to Reddit's conversational data.

Reddit COO Jen Wong emphasized the company’s pivotal role in AI development at the CES technology conference, stating that Reddit is now “foundational to the training” of large language models (LLMs). The platform’s appeal lies in its unique collection of authentic, conversational content covering virtually every topic imaginable. CEO Steve Huffman explained that Reddit posts contain a wealth of “colloquial words about pretty much every topic” that are constantly updated, making them invaluable for teaching AI systems to think and communicate like humans.

The company, which went public in March 2024, has been investing aggressively in AI on several fronts. Beyond licensing its data, Reddit has built its own AI-powered features, including a translation tool and an AI-enhanced search function. In September, Reddit expanded the AI translation feature to more than 35 countries, including Brazil, Spain, Germany, Italy, the Philippines, and several Latin American nations. Wong said the feature has helped Reddit grow outside the United States at an "accelerated rate."

Reddit's AI ambitions extend to its advertising platform as well. In August 2024, the company acquired Memorable AI, an AI-powered advertising company, to strengthen its ad capabilities. Wong said Reddit plans to use AI to generate more "creative variants" and to make advertisements feel more authentic to Reddit's culture, which she described as making them "more Reddity."

Huffman revealed during The Wall Street Journal’s Tech Live event in October that Reddit is in discussions with “just about everybody” regarding potential AI partnerships, including Microsoft. This suggests the platform’s data licensing business model could expand significantly. The strategic pivot positions Reddit not just as a social media platform, but as a critical infrastructure provider for the AI industry, monetizing its 18+ years of accumulated human conversations and knowledge.

Key Quotes

AI itself, more broadly, is incredibly important to everything we’re doing

Reddit COO Jen Wong made this statement at CES, emphasizing how central AI has become to Reddit’s overall business strategy, from product development to monetization through data licensing deals.

foundational to the training of large language models

Wong described Reddit’s current role in the AI industry, highlighting how the platform’s conversational data has become essential infrastructure for developing advanced AI systems like ChatGPT and Google’s AI models.

colloquial words about pretty much every topic

CEO Steve Huffman explained why Reddit content is particularly valuable for AI training, noting that the platform’s authentic, constantly-updated discussions teach machines to communicate more naturally and cover an unprecedented breadth of subjects.

It’s made our core product better. People find a home on Reddit.

Wong discussed how Reddit’s own AI translation features have improved user experience and driven international growth, demonstrating that the company is both selling AI training data and using AI to enhance its platform.

Our Take

Reddit’s dual role as both AI data provider and AI product developer represents a sophisticated strategy that could redefine social media economics. The company has essentially monetized its most valuable asset—years of authentic human conversation—while simultaneously using AI to expand its global reach and advertising capabilities. The $60 million annual Google deal alone demonstrates the premium value placed on conversational training data in today’s AI arms race. However, this raises ethical considerations about whether users who contributed content years ago consented to having their words train commercial AI systems. As Reddit positions itself as “foundational” to LLM training, it may become indispensable to AI development, creating a powerful moat around its business. The acquisition of Memorable AI and plans to make ads “more Reddity” suggest the company understands that AI success requires maintaining platform authenticity while scaling globally.

Why This Matters

Reddit’s transformation into a foundational AI training resource represents a significant shift in how social media platforms monetize user-generated content and highlights the insatiable demand for high-quality training data in the AI industry. As large language models require massive amounts of diverse, conversational text to improve their human-like capabilities, platforms like Reddit that host authentic human discussions become increasingly valuable.

This development raises important questions about data ownership, user privacy, and consent. While Reddit users created this content, they may not have anticipated it would be used to train commercial AI systems. The deals also demonstrate how AI is reshaping business models across the tech industry, with data licensing becoming a major revenue stream.

For the broader AI ecosystem, Reddit's role as a training data provider could influence how future AI models handle context, slang, and nuanced human communication. The platform's archive of billions of comments across diverse topics gives AI systems real-world conversational patterns that more formal text sources cannot offer. As competition among AI companies intensifies, access to unique, high-quality training data like Reddit's becomes a critical competitive advantage, one that can command licensing fees in the tens of millions of dollars per year, as the Google deal shows.


Source: https://www.businessinsider.com/reddit-comments-ai-training-models-google-openai-jen-wong-huffman-2025-1