Meta has been using public Facebook and Instagram posts to train its AI models, dating back as far as 2007, according to revelations from the company’s global privacy director, Melinda Claybaugh, during an inquiry in Australia. This extensive data harvesting includes public posts, photos, and comments from adult users across both platforms, raising significant privacy concerns about how social media content is being repurposed for artificial intelligence development.
The scope of Meta’s AI training data collection is substantial. Any content posted with a “public” audience setting has potentially been ingested into Meta’s AI systems to improve their capabilities. This includes everything from old mirror selfies to status updates, comments, and shared photos spanning nearly two decades of social media history.
Regional disparities in user rights have emerged as a critical issue. While European Union users can opt out of having their data used for AI training purposes, no such option exists for users in the United States or Australia. This creates a two-tiered system where privacy protections depend on geographic location rather than universal user rights.
Meta’s approach to data usage follows specific guidelines. According to its privacy center, the company uses “public posts and comments on Facebook and Instagram to train generative AI models for these features and for the open-source community.” Meta explicitly states it doesn’t use posts with audience settings other than “public” for AI training purposes. Additionally, Meta claims it doesn’t train on private posts or direct messages, though the company does acknowledge using data from AI-related interactions, such as searches for AI stickers, questions asked to Meta AI, and conversations with AI characters.
Meta’s Chief Product Officer Chris Cox has previously defended this approach, explaining at Bloomberg’s Tech Summit that the company’s text-to-image model, Emu, produces “really amazing quality images” specifically because Instagram contains numerous photos of “art, fashion, culture and also just images of people and us.” He emphasized that Meta doesn’t train on private content shared only with friends.
Beyond user-generated content, Meta is exploring additional data sources. The company is reportedly considering deals with news publishers for access to more training data, including news articles, photos, and video content. This strategy mirrors moves by competitors like Google and OpenAI, which have already secured deals with numerous news publishers for AI training data. Users concerned about their content being used for AI training can set the audience on future posts to something other than “public,” though outside the EU there is currently no way to retract content Meta has already ingested.
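For users who want to audit which of their existing posts carry a Public audience, this check can in principle be scripted. The sketch below is hypothetical, not an official Meta tool: it assumes a user access token with the `user_posts` permission and the Graph API’s `privacy` field on Post nodes, and the API version in the URL is an assumption; all of these should be verified against Meta’s current Graph API documentation.

```python
# Hypothetical sketch: list which of your own Facebook posts have a
# Public audience, using the Graph API "privacy" field on Post nodes.
# The API version, the user_posts permission, and the field shape are
# assumptions to verify against current Graph API documentation.
import json
import urllib.request

GRAPH_URL = "https://graph.facebook.com/v19.0/me/posts"  # assumed version

def is_public(post):
    """True when a post's audience is Public (reported as EVERYONE)."""
    return post.get("privacy", {}).get("value") == "EVERYONE"

def public_posts(access_token):
    """Yield every post of yours whose audience is set to Public."""
    url = f"{GRAPH_URL}?fields=id,created_time,privacy&access_token={access_token}"
    while url:
        with urllib.request.urlopen(url) as resp:
            page = json.load(resp)
        for post in page.get("data", []):
            if is_public(post):
                yield post
        # Graph API pagination: follow the "next" URL until it is absent
        url = page.get("paging", {}).get("next")
```

Even where such a script works, it only identifies public posts going forward; it does not reveal or undo whatever Meta has already used for training.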
Key Quotes
We use public posts and comments on Facebook and Instagram to train generative AI models for these features and for the open-source community. We don’t use posts or comments with an audience other than Public for these purposes.
This statement from Meta’s privacy center defines the company’s official policy on AI training data. It establishes the boundary between what Meta considers fair game for AI training (public posts) versus protected content (non-public posts), though critics argue users never consented to this use when originally posting.
We don’t train on private stuff, we don’t train on stuff that people share with their friends, we do train on things that are public.
Meta’s Chief Product Officer Chris Cox made this statement at Bloomberg’s Tech Summit, attempting to reassure users about the company’s data practices. However, the definition of “public” content encompasses billions of posts users may have shared without anticipating they would become AI training material.
We may use the data from your use of AI stickers, such as your searches for a sticker to use in a chat, to improve our AI sticker models.
This admission from Meta’s September 2023 blog post reveals that even interactions with AI features themselves become training data. This creates a feedback loop where using Meta’s AI tools generates data to improve those same tools, raising questions about informed consent.
Our Take
Meta’s admission reveals the uncomfortable reality that social media platforms are sitting on goldmines of training data they’re now exploiting for AI development. What’s particularly concerning is the retroactive nature of this data usage—content posted years ago under different expectations is now fueling AI systems users never knew would exist.
The geographic inequality in opt-out rights is especially troubling and likely unsustainable. As AI regulation evolves globally, Meta may face pressure to extend EU-style protections worldwide or risk fragmented compliance nightmares and user backlash.
This situation also highlights a broader industry pattern: AI companies are scrambling for training data, leading to increasingly aggressive harvesting practices. Meta’s consideration of news publisher deals suggests publicly available user content alone may not suffice for competitive AI development. The AI industry’s insatiable appetite for data will continue testing the boundaries of privacy, consent, and fair use in ways regulators are only beginning to address.
Why This Matters
This revelation about Meta’s AI training practices represents a watershed moment for digital privacy and AI development. The fact that nearly two decades of public social media content has been repurposed for AI training without explicit user consent highlights the evolving relationship between social media platforms and artificial intelligence companies.
The geographic disparity in opt-out options exposes fundamental questions about data rights and regulatory frameworks. EU users enjoy protections under GDPR that American and Australian users lack, demonstrating how regulation directly impacts individual privacy rights in the AI era. This could accelerate calls for comprehensive AI and data privacy legislation in countries without such protections.
For the broader AI industry, Meta’s approach signals how valuable user-generated content has become as training data. The company’s willingness to mine 17 years of social media posts underscores the competitive pressure AI companies face to access diverse, high-quality datasets. This trend will likely intensify as AI models become more sophisticated and data-hungry, potentially transforming social media platforms from communication tools into massive AI training data repositories. The implications extend beyond privacy to questions about content ownership, fair compensation, and the ethical boundaries of AI development.
Related Stories
- Photobucket is licensing your photos and images to train AI without your consent, and there’s no easy way to opt out
- Meta’s AI advisory council is overwhelmingly white and male, raising concerns about bias
- Outlook Uncertain as US Government Pivots to Full AI Regulations
- Jenna Ortega Speaks Out Against Explicit AI-Generated Images of Her
Source: https://www.businessinsider.com/facebook-instagram-posts-meta-ai-training-how-to-opt-out-2024-9