OpenAI is engaged in a contentious legal battle to prevent discovery of files belonging to six current and former employees, including co-founder and former chief scientist Ilya Sutskever, in a high-profile copyright lawsuit brought by the Authors Guild and prominent authors.
In a heavily redacted letter filed Thursday in New York federal court, lawyers for the Authors Guild revealed that OpenAI has objected to designating six individuals as “custodians” in the lawsuit—a legal term for people who possess relevant evidence that must be turned over during discovery. While both sides have agreed on 24 custodians after multiple meetings, they have reached an impasse over these six names.
The disputed custodians include Ilya Sutskever, who played a pivotal role in the dramatic November 2023 firing of CEO Sam Altman before leaving OpenAI to launch a competing AI model company; Qiming Yuan, the pretraining data lead; technical staff members Jong Wook Kim and Shantanu Jain; former research scientist Cullen O’Keefe; and former science communicator Andrew Mayne, who has publicly written about the importance of books as training data for large language models.
The Authors Guild’s lawyers argue that Sutskever possesses documents relevant to the lawsuit and are asking the judge to compel OpenAI to include all six individuals in the discovery process. OpenAI has not publicly explained why it opposes including these employees, particularly Sutskever, whose involvement in the company’s early AI development and training processes could prove crucial to the case.
All six exhibits attached to the legal letter have been fully redacted at OpenAI’s request. The company justified these redactions by claiming the exhibits contain proprietary source code and “discussions between OpenAI employees describing detailed processes for training and testing ChatGPT models.”
This lawsuit is part of a growing wave of copyright litigation against OpenAI, with plaintiffs including bestselling authors George R.R. Martin (A Song of Ice and Fire), Jonathan Franzen, and David Baldacci. The central allegation is that OpenAI violated copyright law by using millions of books without permission to train the AI models behind products such as ChatGPT. The outcome of these cases could have far-reaching implications for the AI industry’s approach to training data and intellectual property rights.
Key Quotes
“discussions between OpenAI employees describing detailed processes for training and testing ChatGPT models”
OpenAI cited this as justification for redacting all six exhibits in the court filing, suggesting the company is particularly concerned about revealing internal processes for how it trained its AI models using potentially copyrighted material.
“Mayne has written publicly about the importance of books as training data for LLMs”
This statement from the court documents highlights why former science communicator Andrew Mayne is a key custodian—his public statements acknowledge the critical role of books in training large language models, potentially supporting the authors’ copyright claims.
Our Take
OpenAI’s fierce resistance to including Ilya Sutskever in discovery is telling. As a co-founder and former chief scientist who left under dramatic circumstances, Sutskever likely possesses communications that could illuminate OpenAI’s early decisions about training data—decisions that may not have fully considered copyright implications. The company’s willingness to fight this battle in court, rather than settle quietly, suggests either strong confidence in its legal position or deep concern about what discovery might reveal. This case exemplifies the AI industry’s reckoning with intellectual property: companies moved fast to build transformative technology, but may now face consequences for how they sourced the data that made it possible. The irony is stark—OpenAI advocates for AI transparency and safety publicly while fighting to keep its own practices secret in court.
Why This Matters
This legal battle represents a critical test case for the AI industry’s relationship with copyrighted content. The fight over Ilya Sutskever’s files is particularly significant because he was instrumental in OpenAI’s early development and would likely possess internal communications and documents revealing how the company approached training data acquisition and usage.
The case highlights the growing tension between AI innovation and intellectual property rights, with authors arguing that their creative works were used without permission or compensation to build billion-dollar AI systems. OpenAI’s aggressive stance in blocking access to these employees’ files suggests the company may be concerned about what internal documents could reveal about its training data practices.
The outcome could set precedents affecting the entire generative AI industry, potentially forcing companies to change how they source training data, negotiate licensing agreements with content creators, or face substantial liability. With multiple similar lawsuits pending against AI companies, this case could determine whether the current AI boom continues unimpeded or faces significant legal and financial constraints that reshape the industry’s business model.
Related Stories
- Elon Musk Drops Lawsuit Against ChatGPT Maker OpenAI, No Explanation
- OpenAI CEO Sam Altman Hints at Potential Restructuring in 2024
- OpenAI’s Valuation Soars as AI Race Heats Up
- Sam Altman’s Bold AI Predictions: AGI, Jobs, and the Future by 2025
- Elon Musk Warns of Potential Apple Ban on OpenAI’s ChatGPT