Authors Sue Adobe Over AI Training With Pirated Books
Tech giant Adobe finds itself embroiled in controversy as Oregon author Elizabeth Lyon files a class-action lawsuit accusing the company of using illegally obtained books to train its SlimLM artificial intelligence model. The case shines new light on the ongoing battle between content creators and tech companies over copyright protections in the AI era.
The Core Allegations
Lyon, who writes nonfiction writing guides, claims Adobe incorporated her pirated works into SlimLM's training data without permission or compensation. Court documents allege Adobe relied on problematic datasets that trace back to Books3, a collection of approximately 191,000 copyrighted books allegedly scraped from pirate sites.
The complaint specifically targets SlimPajama-627B, the dataset Adobe acknowledges using for SlimLM's development. While that dataset is publicly available, Lyon's legal team argues it inherits copyright issues from its predecessor, RedPajama, which directly incorporated Books3 content.
"Adobe took shortcuts," says Lyon's attorney Mark Rifkin. "They built commercial products using stolen creative work while bypassing proper licensing channels."
Industry-Wide Implications
This lawsuit doesn't exist in isolation. Several major tech players now face similar legal challenges:
- Apple confronted allegations last September regarding its Apple Intelligence system
- Anthropic settled a $1.5 billion case with authors just last month
- Salesforce received complaints in October about its AI training practices
The pattern suggests an industry-wide reckoning may be coming regarding how AI companies source their training materials.
Why This Case Matters
The outcome could reshape how tech firms approach AI development moving forward. Currently, many rely on massive datasets scraped from various online sources with questionable copyright status. A ruling against Adobe might force companies to:
- Implement stricter vetting processes for training data
- Develop new methods to compensate content creators
- Potentially limit what materials they can legally use
The stakes extend beyond financial penalties: the case is fundamentally about determining fair compensation models for the creative work fueling today's AI revolution.
The timing couldn't be more critical as generative AI becomes increasingly dependent on vast quantities of text data.
Key Points:
- Adobe faces class-action lawsuit over alleged use of pirated books in SlimLM training
- Case centers on controversial Books3 dataset containing ~191K copyrighted works
- Similar lawsuits emerging against Apple, Anthropic and Salesforce
- Outcome could redefine copyright standards for AI training materials
- Potential billion-dollar implications for tech industry practices