AI Legal Developments

Journalism vs. AI: The New York Times' Landmark Copyright Battle Against Microsoft and OpenAI

The New York Times sues Microsoft and OpenAI over unauthorized AI training on its articles—raising critical questions about copyright, journalism, and the future of information.

Galleo Team

A Publishing Powerhouse Challenges AI Giants

When The New York Times filed its copyright infringement lawsuit against Microsoft and OpenAI in December 2023, it marked a pivotal moment in the ongoing tension between traditional media and technology companies. Unlike other AI copyright cases, this battle pits America's newspaper of record, with more than 170 years of publishing history and a distinctive journalistic voice, against two of the world's most powerful technology companies.

The lawsuit, filed in the Southern District of New York, makes The Times the first major news organization to take legal action against AI companies over training data. At stake is not just compensation for past use but the very future of how journalism is valued, consumed, and monetized in an age of artificial intelligence.

Unique Concerns for Journalism

The Times' lawsuit raises concerns that are particularly acute for news publishers and that set it apart from other AI copyright cases:

Verbatim Reproduction and Competitive Threat

In cases involving fiction, the main concern is style imitation. Here, The Times presents evidence that ChatGPT can reproduce its articles nearly verbatim, including distinctive headlines, phrasings, and factual reporting. This creates a direct competitive threat, because the AI can potentially serve as a substitute for The Times' subscription-based content.

Court filings include examples where ChatGPT reproduces entire passages from specific articles, sometimes with minimal prompting. In one instance, the AI reportedly generated text that matched a Times article word-for-word, including unique phrasings and structure that could only have come from that specific piece.

Undermining the Subscription Model

The Times operates on a subscription-based business model that has successfully transitioned the publication into the digital era while many newspapers have failed. The lawsuit argues that AI models trained on its content threaten this model by allowing users to access Times-quality content without paying for subscriptions.

As The Times' legal team stated in court filings: "These tools were built by copying and using millions of The Times's copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more."

Attribution and Misinformation Concerns

The lawsuit also raises unique journalistic concerns about attribution and accuracy. The Times alleges that AI models may present information derived from its reporting without attribution, essentially "laundering" the source of information that required significant journalistic resources to develop.

More troublingly, according to the complaint, the AI sometimes presents hallucinated or inaccurate information while attributing it to The Times, a phenomenon described in the lawsuit as "hallucinating content with false attributions." This represents a distinct harm to a news organization whose credibility is its most valuable asset.

Failed Negotiations and Business Relationship

A distinctive aspect of The New York Times case is that it followed failed business negotiations. Unlike individual authors who had no prior relationship with AI companies, The Times engaged in lengthy discussions with OpenAI about potential licensing arrangements before filing suit.

These negotiations reportedly broke down when the parties couldn't agree on appropriate compensation, with The Times arguing that its high-quality journalism, fact-checking processes, and editorial standards provide uniquely valuable training data that commands premium licensing fees.

Microsoft's involvement adds another dimension, as the company has made significant investments in OpenAI and integrated its technology into products that compete directly with news providers for attention and advertising revenue.

First Amendment Dimensions

The case introduces complex First Amendment considerations that aren't present in other AI copyright lawsuits. As a news organization, The Times enjoys constitutional protections for its journalistic work. However, the defendants may argue that restricting AI training data also raises First Amendment concerns about limiting access to information.

This creates a fascinating constitutional tension: does the First Amendment primarily protect the press's ability to profit from its work, or does it protect tech companies' ability to process publicly available information?

Judge Sidney H. Stein, who is presiding over the case, will need to consider these competing constitutional claims alongside traditional copyright analysis.

Industry-Wide Implications

The outcome of this case could reshape the relationship between news publishers and technology companies in several ways:

  1. Licensing frameworks: A victory for The Times could establish mandatory licensing models for news content used in AI training, creating new revenue streams for struggling publishers.

  2. Content paywalls: Publishers might implement stronger technical measures to prevent AI crawling of their content, potentially fragmenting the web further.

  3. Differentiated legal treatment: Courts might establish that factual journalistic content receives different treatment under fair use analysis than creative works like fiction.

  4. Verification partnerships: To address hallucination concerns, AI companies might form partnerships with publishers to verify AI outputs on factual matters.

  5. Media concentration: Smaller publications without resources to litigate might face further disadvantages if only major players can negotiate favorable AI licensing terms.

What Distinguishes The Times' Legal Position

Several factors set The New York Times case apart from other AI copyright lawsuits:

  1. Market recognition: The Times has a clearly established market for licensing its content to other businesses, strengthening its claim that AI training represents a lost licensing opportunity.

  2. Content identification: The ability to clearly demonstrate that specific articles were used in training and can be reproduced provides stronger evidence of both use and harm.

  3. Commercial alternatives: The Times can point to existing licensing relationships with other technology companies, demonstrating that workable commercial alternatives to unauthorized use exist.

  4. Resource advantage: Unlike individual authors, The Times has the legal and financial resources to pursue protracted litigation against well-funded opponents.

What Happens Next

As the case proceeds through the courts, several key developments are anticipated:

  1. Motion to dismiss stage: Microsoft and OpenAI have filed motions to dismiss, arguing that training AI on publicly available articles constitutes fair use. Judge Stein's ruling on these motions will provide the first indication of how the court views these novel legal questions.

  2. Discovery challenges: If the case proceeds to discovery, determining exactly which Times articles were used in training datasets presents technical challenges that could shape how future cases approach evidence gathering.

  3. Potential settlement: Given the high stakes for both sides, settlement remains possible. Any settlement could establish industry norms for AI licensing of news content.

  4. Appeal likelihood: Regardless of outcome, the losing side will almost certainly appeal, potentially bringing these questions to the Second Circuit and eventually the Supreme Court.

Looking Ahead: Journalism in the AI Era

Beyond the legal outcome, this case represents a critical moment in defining how journalism will function in an AI-dominated information landscape. At its core are questions about the economics of quality journalism: Who will pay for the reporting, fact-checking, and editorial oversight that produces reliable information if AI systems can extract and repackage that value without compensation?

For news organizations beyond The Times, the case represents a potential lifeline in an era of declining revenues and newsroom contractions. If successful, it could establish that the value journalists create cannot be appropriated without compensation, potentially creating new revenue streams to support newsgathering operations.

For AI companies, the case highlights the need to develop sustainable relationships with content creators rather than treating their work as freely available training material. The outcome could accelerate efforts to develop alternative training methods or establish fair compensation models.

As Judge Stein considers these complex issues, his decisions will help shape not just copyright law but the future of how information is created, valued, and shared in our society. With dueling motions currently before the court, all parties are awaiting rulings that could provide initial guidance on these unprecedented legal questions.


This article is provided for informational purposes only and does not constitute legal advice. Businesses and individuals should consult with qualified legal counsel regarding their specific circumstances.