Authors Sue Adobe Over AI Training With Pirated Books
Tech giant Adobe finds itself embroiled in controversy as Oregon author Elizabeth Lyon files a class-action lawsuit accusing the company of using illegally obtained books to train its SlimLM artificial intelligence model. The case shines new light on the ongoing battle between content creators and tech companies over copyright protections in the AI era.
The Core Allegations
Lyon, who writes nonfiction writing guides, claims Adobe incorporated her pirated works into SlimLM's training data without permission or compensation. Court documents allege Adobe relied on problematic datasets that trace back to Books3, a collection of approximately 191,000 copyrighted books allegedly scraped from pirate sites.
The complaint specifically targets SlimPajama-627B, the dataset Adobe acknowledges using for SlimLM's development. While that dataset is publicly available, Lyon's legal team argues it inherits copyright issues from its predecessor, RedPajama, which directly incorporated Books3 content.
"Adobe took shortcuts," says Lyon's attorney Mark Rifkin. "They built commercial products using stolen creative work while bypassing proper licensing channels."
Industry-Wide Implications
This lawsuit doesn't exist in isolation. Several major tech players now face similar legal challenges:
- Apple confronted allegations last September regarding its Apple Intelligence system
- Anthropic settled a $1.5 billion case with authors just last month
- Salesforce received complaints in October about its AI training practices
The pattern suggests an industry-wide reckoning may be coming regarding how AI companies source their training materials.
Why This Case Matters
The outcome could reshape how tech firms approach AI development moving forward. Currently, many rely on massive datasets scraped from various online sources with questionable copyright status. A ruling against Adobe might force companies to:
- Implement stricter vetting processes for training data
- Develop new methods to compensate content creators
- Potentially limit what materials they can legally use
The stakes extend beyond financial penalties: the case is fundamentally about determining fair compensation models for the creative work fueling today's AI revolution.
The timing couldn't be more critical as generative AI becomes increasingly dependent on vast quantities of text data.
Key Points:
- Adobe faces class-action lawsuit over alleged use of pirated books in SlimLM training
- Case centers on controversial Books3 dataset containing ~191K copyrighted works
- Similar lawsuits emerging against Apple, Anthropic and Salesforce
- Outcome could redefine copyright standards for AI training materials
- Potential billion-dollar implications for tech industry practices