OpenAI is facing mounting legal challenges over its use of copyrighted content to train ChatGPT, with a federal judge recently rejecting the company’s broad discovery requests in the high-profile lawsuit brought against it by The New York Times. The Times sued OpenAI in December 2023, alleging the AI company used its articles without permission to train its generative AI models.
As the case enters the discovery phase, Judge Ona T. Wang denied OpenAI’s request to access information about how the Times uses generative AI tools, including the outlet’s own AI development efforts and its views on the technology. The judge deemed these requests irrelevant to the copyright infringement claims at hand.
In her ruling, Judge Wang offered a compelling analogy: if a copyright holder sued a video game manufacturer for infringement, the manufacturer wouldn’t be entitled to “wide-ranging discovery concerning the copyright holder’s employees’ gaming history, statements about video games generally, or even their licensing of different content to other video game manufacturers.” This comparison effectively illustrates why OpenAI’s requests exceeded the scope of relevant discovery.
The case has already seen complications, with legal filings revealing that OpenAI engineers accidentally deleted evidence that Times lawyers had gathered from the company’s servers. The Times’ legal team spent over 150 hours searching through OpenAI’s training data for instances of infringement, storing their findings on virtual machines created by OpenAI. While most of the data has been recovered, the incident raised questions about evidence preservation, though Times lawyers stated there’s no reason to believe the deletion was intentional.
This lawsuit is part of a broader wave of copyright litigation against OpenAI. The company faces dozens of similar cases from media organizations including the New York Daily News, the Denver Post, and The Intercept. However, not all cases have proceeded—a federal judge recently dismissed lawsuits from Raw Story and AlterNet because the outlets failed to demonstrate “concrete” harm from OpenAI’s actions.
OpenAI also faces legal action from authors, including comedian Sarah Silverman, who filed a complaint in 2023 alongside over a dozen other writers. They allege OpenAI illegally used their books to train ChatGPT “without consent, without credit, and without compensation.” Silverman acknowledged the challenge ahead, noting that OpenAI represents “the richest entities in the world” with significant policy influence.
Meanwhile, some media organizations have chosen partnership over litigation. Axel Springer, Business Insider’s parent company, and other publishers have licensed their content to OpenAI in deals worth tens of millions of dollars, highlighting the industry’s divided approach to AI training data.
Key Quotes
“If a copyright holder sued a video game manufacturer for copyright infringement, the copyright holder might be required to produce documents relating to their interactions with that video game manufacturer, but the video game manufacturer would not be entitled to wide-ranging discovery concerning the copyright holder’s employees’ gaming history, statements about video games generally, or even their licensing of different content to other video game manufacturers.”
Judge Ona T. Wang used this analogy in her ruling to explain why she rejected OpenAI’s broad discovery requests about the New York Times’ use of AI technology, establishing that such information is irrelevant to the copyright infringement claims.
“Much of the material in OpenAI’s training datasets, however, comes from copyrighted works — including books written by Plaintiffs — that were copied by OpenAI without consent, without credit, and without compensation.”
This statement from the complaint filed by Sarah Silverman and other authors in 2023 encapsulates the core allegation against OpenAI—that the company systematically used copyrighted material to train ChatGPT without proper authorization or compensation to creators.
“They are the richest entities in the world, and we live in a country where that’s considered a person that can influence, practically create policy, let alone influence it.”
Comedian Sarah Silverman made this observation during a podcast with Rob Lowe, highlighting the power imbalance between individual creators and well-funded AI companies, and acknowledging the difficulty of taking on OpenAI in court given its substantial resources and influence.
Our Take
The judge’s video game analogy is particularly instructive, revealing how courts are approaching AI copyright cases through familiar legal frameworks rather than treating them as entirely novel. This suggests the legal system may not grant AI companies special exemptions from established copyright principles simply because their technology is innovative.
The accidental deletion of evidence, while reportedly unintentional, underscores the challenges of digital discovery in AI cases where massive datasets are involved. It also highlights potential vulnerabilities in how AI companies manage and preserve evidence.
Most significantly, the split between publishers who sue and those who partner reveals an industry at a crossroads. Those licensing content are betting on collaboration and guaranteed revenue, while litigants are fighting for broader principles about creator rights. The ultimate resolution of these cases will likely determine whether AI development proceeds through licensing partnerships or continues relying on broad interpretations of fair use.
Why This Matters
This legal battle represents a pivotal moment for the AI industry as courts begin defining the boundaries of fair use in the age of generative AI. The outcome will likely establish precedent for how AI companies can legally acquire training data and whether using copyrighted content without permission constitutes infringement.
The judge’s narrow interpretation of discovery requests suggests courts may limit how AI companies can use litigation to gather competitive intelligence about their opponents’ AI strategies. This could prevent companies from using copyright lawsuits as fishing expeditions into competitors’ technology development.
The stakes extend far beyond OpenAI and the Times. If courts rule that training AI models on copyrighted content requires licensing, it could fundamentally reshape the economics of AI development, potentially favoring well-funded companies that can afford extensive licensing deals while creating barriers for smaller competitors. Conversely, a ruling favoring OpenAI could accelerate AI development but potentially undermine content creators’ rights and revenue streams. The divergent approaches—some publishers suing while others partner—reflect the industry’s uncertainty about how to monetize content in the AI era.
Related Stories
- Elon Musk Drops Lawsuit Against ChatGPT Maker OpenAI, No Explanation
- Photobucket is licensing your photos and images to train AI without your consent, and there’s no easy way to opt out
- The DOJ’s Google antitrust case could drag on until 2024 — and the potential remedies are a ‘nightmare’ for Alphabet
- OpenAI CEO Sam Altman Hints at Potential Restructuring in 2024
- Outlook Uncertain as US Government Pivots to Full AI Regulations
Source: https://www.businessinsider.com/openai-copyright-lawsuit-new-york-times-video-game-company-2024-11