OpenAI is engaged in a contentious legal battle to prevent discovery of files belonging to six current and former employees, including co-founder and former chief scientist Ilya Sutskever, in a high-profile copyright lawsuit brought by the Authors Guild and prominent authors.
In a heavily redacted letter filed Thursday in New York federal court, lawyers for the Authors Guild revealed that OpenAI has objected to designating six individuals as “custodians” in the lawsuit—a legal term for people who possess relevant evidence that must be turned over during discovery. While both sides have agreed on 24 custodians after multiple meetings, they have reached an impasse over these six names.
The disputed custodians include Ilya Sutskever, who played a pivotal role in the dramatic November 2023 firing of CEO Sam Altman before leaving OpenAI to launch a competing AI model company; Qiming Yuan, the pretraining data lead; technical staff members Jong Wook Kim and Shantanu Jain; former research scientist Cullen O’Keefe; and former science communicator Andrew Mayne, who has publicly written about the importance of books as training data for large language models.
The Authors Guild’s lawyers argue that Sutskever possesses documents relevant to the lawsuit and are asking the judge to compel OpenAI to include all six individuals in the discovery process. OpenAI has not publicly explained why it opposes including these employees, particularly Sutskever, whose involvement in the company’s early AI development and training processes could prove crucial to the case.
All six exhibits attached to the legal letter have been fully redacted at OpenAI’s request. The company justified these redactions by claiming the exhibits contain proprietary source code and “discussions between OpenAI employees describing detailed processes for training and testing ChatGPT models.”
This lawsuit is part of a growing wave of copyright litigation against OpenAI, with plaintiffs including bestselling authors George R.R. Martin (A Song of Ice and Fire), Jonathan Franzen, and David Baldacci. The central allegation is that OpenAI violated copyright law by using millions of books without permission to train the AI models behind products such as ChatGPT. The outcome of these cases could have far-reaching implications for the AI industry’s approach to training data and intellectual property rights.
Key Quotes
“discussions between OpenAI employees describing detailed processes for training and testing ChatGPT models”
OpenAI cited this as justification for redacting all six exhibits in the court filing, suggesting the company is particularly concerned about revealing internal processes for how it trained its AI models using potentially copyrighted material.
“Mayne has written publicly about the importance of books as training data for LLMs”
This statement from the court documents highlights why former science communicator Andrew Mayne is a key custodian—his public statements acknowledge the critical role of books in training large language models, potentially supporting the authors’ copyright claims.
Our Take
OpenAI’s fierce resistance to including Ilya Sutskever in discovery is telling. As a co-founder and former chief scientist who left under dramatic circumstances, Sutskever likely possesses communications that could illuminate OpenAI’s early decisions about training data—decisions that may not have fully considered copyright implications. The company’s willingness to fight this battle in court, rather than settle quietly, suggests either strong confidence in its legal position or deep concern about what discovery might reveal. This case exemplifies the AI industry’s reckoning with intellectual property: companies moved fast to build transformative technology, but may now face consequences for how they sourced the data that made it possible. The irony is stark—OpenAI advocates for AI transparency and safety publicly while fighting to keep its own practices secret in court.
Why This Matters
This legal battle represents a critical test case for the AI industry’s relationship with copyrighted content. The fight over Ilya Sutskever’s files is particularly significant because he was instrumental in OpenAI’s early development and would likely possess internal communications and documents revealing how the company approached training data acquisition and usage.
The case highlights the growing tension between AI innovation and intellectual property rights, with authors arguing that their creative works were used without permission or compensation to build billion-dollar AI systems. OpenAI’s aggressive stance in blocking access to these employees’ files suggests the company may be concerned about what internal documents could reveal about its training data practices.
The outcome could set precedents affecting the entire generative AI industry, potentially forcing companies to change how they source training data, negotiate licensing agreements with content creators, or face substantial liability. With multiple similar lawsuits pending against AI companies, this case could determine whether the current AI boom continues unimpeded or faces significant legal and financial constraints that reshape the industry’s business model.
Related Stories
- Elon Musk Drops Lawsuit Against ChatGPT Maker OpenAI, No Explanation
- OpenAI CEO Sam Altman Hints at Potential Restructuring in 2024
- OpenAI’s Valuation Soars as AI Race Heats Up
- Sam Altman’s Bold AI Predictions: AGI, Jobs, and the Future by 2025
- Elon Musk Warns of Potential Apple Ban on OpenAI’s ChatGPT