The balance between copyright and free speech is being challenged by generative AI (GAI), a powerful and enigmatic tool that mimics human responses to prompts typed into a chat box. The purpose of copyright law, according to the U.S. Constitution, is to “promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.” The problem is that GAI’s capacity to drive progress and innovation collides with the entertainment industry’s dependence on copyright to protect creative works.

Copyright law strikes a balance between those who create content and the public’s interest in having wide access to that content. It does this by granting authors a limited monopoly over the dissemination of original works: the exclusive rights to reproduce, distribute, and create derivative works based on copyrighted material. However, the concept of exclusive rights doesn’t map neatly onto artificially intelligent systems scraping ideas and facts from public websites.

Because copyright does not protect ideas, facts, procedures, concepts, principles, or discoveries described or embodied in works, copying alone doesn’t constitute copyright infringement. To prove copyright infringement, one must prove that the defendant had access to the copyrighted work and that the defendant’s work is substantially similar to protected aspects of the first work.

For AI output to infringe upon a book, it must have taken a substantial amount of copyrightable expression from the author’s work. When it comes to text, GAI is an artful plagiarist. It knows how to dance around copyright. The predictive model emulates, it doesn’t copy. Insofar as text generated in response to a prompt is not substantially similar—a legal term of art—to the data it is scraping, it is not an infringement. In other words, don’t overestimate the value of litigation.

The fair-use doctrine is another limitation on the exclusive rights of authors. Its purpose is to avoid the rigid application of copyright law in ways that might otherwise stifle the growth of art and science. Fair use is highly fact specific, which is another way of saying it’s a murky and contentious area of the law.

Several cases decided before the advent of GAI suggest fair use encompasses the ingestion and processing of books by GAI. For example, in 2015, in Authors Guild v. Google, the court ruled that Google’s digitizing of books without consent to create a full-text searchable database that displayed snippets from those titles was a transformative use that served a different purpose and expression than the original books.

Fair use favors transformative uses. Over time, however, the concept has stretched from using a protected work as a springboard for new insights or a critique of the original to taking someone else’s photographs or other images, incorporating them into a painting, and declaring the result a fair use.

In 2023, in Andy Warhol Foundation for the Visual Arts v. Goldsmith, the U.S. Supreme Court held that the claim to fairness is severely undermined “where an original work and copying use share the same or highly similar purposes, or where wide dissemination of a secondary work would otherwise run the risk of substitution for the original or licensed derivatives of it.” AI-generated works can devalue human-created content, but is that the kind of economic harm contemplated in the Supreme Court’s decision?

To sum up, courts must determine, on a case-by-case basis, whether substantial similarity exists and then engage in line drawing, balancing free expression against the rights of creators.

The tension between GAI and copyright will work itself out over time. While the publishing industry understandably has concerns, it must actively shape the future of AI by lobbying for legislation, licensing books for AI training, and creating bespoke AI models built on its own curated data sets. That requires putting publishers at the center of AI discussions about the credibility of information, attribution, bias, compensation, and transparency.

In an age of disinformation, an author’s brand, a publisher’s imprint, and the goodwill associated with them are valuable assets. I believe the industry is less vulnerable than many think. But, to quote Nick Lowe, “Where it’s goin’ no one knows.”

Lloyd Jassin is a publishing attorney, former publishing executive, and coauthor of The Copyright Permission & Libel Handbook.