The Internet Archive (IA) this week asked a federal judge to weigh in on a discovery dispute in the copyright infringement lawsuit over its program to scan and lend copies of books.

In an August 9 filing, IA attorneys told the court they are seeking monthly sales data for all books in print by the four plaintiff publishers (Hachette, Penguin Random House, HarperCollins, and Wiley) dating back to 2011. But the publishers, IA lawyers told the court, have balked at the sweeping request, reportedly countering that it goes well beyond what the case calls for.

In their pre-motion filing, IA lawyers insist the sales data is crucial to the IA’s fair use defense.

“Plaintiffs claim that the Internet Archive’s digital library lending has a negative effect on the market for or value of the works. The Internet Archive disagrees, and wishes to bring forward evidence showing that lending had little or no effect on the commercial performance of the books being lent, compared to books that were not lent,” IA lawyers told the court. “Specifically, in order to show that lending had little or no effect on commercial performance, the Internet Archive wishes to compare the commercial performance of books that were available for digital lending with books that were not available for digital lending.”

IA lawyers also attempt to explain the massive, sweeping scope of their request, conceding that they do not need a decade’s worth of monthly sales data for “each and every book” but only for the 127 works included in the suit as well as “one or more” books that could be deemed “comparable” for each of the 127 titles under scrutiny. But since the plaintiffs have “declined to identify books they regard as comparable,” IA attorneys claim, they should be compelled to produce data about all books so that the Internet Archive can “identify books it regards as comparable” and the parties can then “debate, on a level playing field, whether such books are or are not comparable.”

In a footnote, IA attorneys told the court they are seeking data including the “number of physical copies embodying works by distribution channel; the prices for those physical copies; the number of e-books embodying works by distribution channel; the number of e-book loans divided by distribution channel; prices for any transaction related to e-books; income from sales of physical books by distribution channel; and income from e-book transactions by distribution channel.”

The IA argues that monthly sales data (rather than annual sales data) is necessary because “sales of a particular book change so drastically within a year.” Further, without monthly data there is no way to tell the true impact on sales of the short-lived National Emergency Library, the controversial initiative in which the IA unilaterally removed lending limits on scanned titles during the early months of the pandemic in 2020.

In the filing, IA attorneys also suggest that accommodating the daunting request is not unduly burdensome.

“This is commercial data stored in databases, indexed by book,” the IA argues. “Plaintiffs were able to provide data about the works in-suit by accessing such databases; this motion simply seeks the result of querying the same systems for a larger set of books.”

The copyright infringement lawsuit was first filed on June 1, 2020, in the Southern District of New York, and is being coordinated by the Association of American Publishers. The AAP has compared the IA’s scanning and lending efforts to those of the world’s largest pirate sites. The plaintiff publishers are seeking damages for infringement, and they want the IA’s scanning and lending program shut down and any infringing scans destroyed.

Lawyers for the Internet Archive counter that their decade-old program is “sheltered by the fair use doctrine” and is designed to operate like a traditional library, guided by an untested theory called controlled digital lending (CDL). Under CDL, the IA and its partner libraries scan legally acquired print books and make the scans available for lending in lieu of the print copies, under rules designed to mimic traditional library lending: only one user can borrow a scan at a time; the scans are DRM-protected to prevent copying and to enforce limited loan periods; and a scan and the print book from which it is derived are not allowed to circulate at the same time, in order to maintain “a one-to-one owned-to-loan” basis.

At press time, no hearing date had been set.