Authors have now joined the growing ranks of concerned creators suing tech developers over their much-hyped generative AI technology. And a pair of copyright class action suits recently filed on behalf of authors is raising broader questions about the most effective way to protect creators and creative industries—including authors and publishers—from the potentially disruptive aspects of AI.

Filed on June 28 and July 7 by the Joseph Saveri Law Firm on behalf of five named plaintiffs (Mona Awad and Paul Tremblay in one case, and Christopher Golden, Richard Kadrey, and comedian Sarah Silverman in the other), the suits claim that Microsoft-backed OpenAI (creators of ChatGPT) and Meta (creators of LLaMA) infringed the authors’ copyrights by using unauthorized copies of their books to train their AI models, including copies allegedly scraped from notorious pirate sites. While the authors’ attorneys did not comment for this story, a spokesperson for the firm suggested to Ars Technica that, if left unchecked, AI models built with “stolen works” could eventually replace the authors they stole from, and framed the litigation as part of “a larger fight for preserving ownership rights for all artists and creators.”

The authors join a spectrum of increasingly concerned creators on whose behalf the Saveri law firm has filed similar copyright-based lawsuits in recent months. In November 2022, the firm filed suit against GitHub on behalf of a group of software developers. And in January, the firm sued three AI image generators on behalf of a group of artists. Those cases are still pending—and, like most copyright cases involving new technology, they have divided copyright experts. Those who lean in favor of the tech side claim that using unlicensed copyrighted works to train AI is fair use. Those on the content creator side argue that questions of ownership and provenance cannot simply be waved away without major, far-reaching implications.

Neither Meta nor OpenAI has yet responded to the author suits. But multiple copyright lawyers told PW on background that the claims likely face an uphill battle in court. Even if the suits get past the threshold issues associated with the alleged copying at issue and how AI training actually works—which is no sure thing—lawyers say there is ample case law to suggest fair use. For example, a recent case against plagiarism detector TurnItIn.com held that works could be ingested to create a database used to expose plagiarism by students. The landmark Kelly v. Arriba Soft case held that the reproduction and display of photos as thumbnails was fair use. And, in the publishing industry’s own backyard, there’s the landmark Google Books case. One lawyer noted that if Google’s bulk copying and display of tens of millions of books was comfortably found to be fair use, it’s hard to see how using books to train AI would not be, while also cautioning that fair use cases are notoriously fact-dependent and hard to predict.

“I just don’t see how these cases have legs,” one copyright lawyer bluntly told PW. “Look, I get it. Somebody has to make a test case. Otherwise there’s nothing but blogging and opinion pieces and stance-taking by proponents on either side. But I just think there’s too much established case law to support this kind of transformative use as a fair use.”

Cornell Law School professor James Grimmelmann—who has written extensively on the Google case and is now following AI developments closely—is also skeptical that the authors’ infringement cases can succeed, and concurred that AI developers have some “powerful precedents” to rely on. But he is also “a little more sympathetic in principle” to the idea that some AI models may be infringing. “The difference between AI and Google Books is that some AI models could emit infringing works, whereas snippet view in Google Books was designed to prevent output infringement,” he said. “That inflects the fair use analysis, although there are still a lot of factors pointing to transformative use.”

Whether the AI in question was trained using illegal copies from pirate sites could also be a complicating factor, Grimmelmann said. “There’s an orthodox copyright analysis that says if the output is not infringing, a transformative internal process is fair use,” he explained. Nevertheless, some courts will consider the source, he added, noting that the allegedly “unsavory origins” of the copies could factor into a court’s fair use analysis.

In a June 29 statement, the Authors Guild applauded the filing of the litigation—but also appeared to acknowledge the difficult legal road the cases may face in court. “Using books and other copyrighted works to build highly profitable generative AI technologies without the consent or compensation of the authors of those works is blatantly unfair—whether or not a court ultimately finds it to be fair use,” the statement read.

Guild officials went on to note that they have been “lobbying aggressively” for legislation that would “clarify that permission is required to use books, articles, and other copyright-protected work in generative AI systems,” and for establishing “a collective licensing solution” to make getting permissions feasible. A subsequent June 30 open letter, signed by a who’s who of authors, urged tech industry leaders to “mitigate the damage to our profession” by agreeing to “obtain permission” and “compensate writers fairly” for using books in their AI.

But a permissions-based licensing solution for written works seems unlikely, lawyers told PW. And more to the point, even if such a system somehow came to pass, there are questions about whether it would sufficiently address the potentially massive issues associated with the emergence of generative AI.

“AI could really devastate a certain subset of the creative economy, but I don’t think licensing is the way to prevent that,” said Brandon Butler, intellectual property and licensing director at the University of Virginia Library. “Whatever pennies that would flow to somebody from this kind of a license is not going to come close to making up for the disruption that could happen here. And it could put fetters on the development of AI that may be undesirable from a policy point of view.” Butler said AI presents a “creative policy problem” that will likely require a broader approach.

On that score, there is growing agreement that the potential threat posed by AI to creators must be addressed—and with urgency. The striking writers of the Writers Guild of America (now joined by SAG-AFTRA, who went on strike on July 13) are at the forefront of pushing for guardrails on the use of AI via their labor contracts, for example. And this week, the Washington Post reported that the Federal Trade Commission is probing OpenAI for potential breaches of consumer protection law, sending the company some 20 pages of questions and record requests about its practices—including about how the company obtains the data it uses to train its AI.

Such approaches are more likely to yield progress for creators than copyright infringement litigation, lawyers told PW, though copyright law will certainly inform the debate. “Copyright law is not a good place to look for comprehensive solutions to big policy problems,” Grimmelmann said. “But it enables us to ask important questions.”