The Past 25 Years of E-books

From the year after PW’s 125th anniversary issue, 1998—the year when the first handheld e-readers came out, the first library lending of e-books was launched, and e-books first got ISBNs—to today’s robust global e-book ecosystem and the innovations that keep pushing it in new directions, e-books have always been part of the conversation, good or bad.

The hype dates back much further than that.

A Very Old New Idea

The first published concept of an e-reader is generally credited to the writer Bob Brown, who in 1930 put out a manifesto advocating for the development of “readies” (reed-ees) to compete with movies, which had not long before launched “talkies.” He wanted “a simple reading machine which I can carry or move around, attach to any old electric light plug and read hundred-thousand-word novels in 10 minutes if I want to, and I want to.”

The first actual electronic books are generally credited to Michael S. Hart, a student at the University of Illinois who had access to a mainframe computer and Arpanet, the precursor to the internet. He started by creating digital text of the Declaration of Independence and publishing it on the network in 1971, which launched a robust e-book publishing venture called Project Gutenberg. There were actual e-books (not just that first pamphlet) produced for pleasure reading and distributed online for free 50 years ago. Project Gutenberg, a volunteer-driven project, is still going strong today, currently boasting over 60,000 books, mostly those whose U.S. copyright has expired.

I’m old enough to remember that first e-book. But many reading this article may at least remember when e-books began to flourish in the CD-ROM era. You may not remember DynaText, created in 1990 for delivering really big e-books, like technical manuals for aircraft, on CD-ROM. But I hope some of you remember Voyager, which launched Expanded Books in 1991. (A shout-out to one of its founders, Bob Stein, who is still expanding our vision of publishing to this day.) I would argue that Expanded Books, and several of its underlying technologies, really raised the bar, and raised expectations, for what e-books could be, including video, audio, and interactivity. Real books, on steroids. The future had arrived.

The First Generation of E-books

The 1990s saw a proliferation of e-books and e-book developments. Bibliobytes, a website for distributing free and for-sale e-books over the internet, was founded in 1993. (What, you thought Jeff Bezos invented that? Amazon was founded in 1994, to sell print books.) HTML, the markup language of the web—and of almost all e-books, unless you consider PDFs e-books—became a standard in 1995. Project Gutenberg published its 1,000th e-book in 1996. E-Ink, the display technology used to this day in e-readers like the Nook, Kobo, and Kindle, was invented in 1996 and commercialized in 1997, just as PW was about to celebrate its 125th anniversary. The e-book era was off and running.

When people think of e-books, they don’t just think of the digital content. They also think of the device on which they read it. The first handheld e-readers, NuvoMedia’s Rocket eBook and the SoftBook reader, came out in 1998, and others, like Mobipocket and the Sony Reader, soon followed. NetLibrary, one of the leading early distributors of e-books to both the academic and general public markets (and which, significantly, enabled e-books to be lent by libraries), was also founded in 1998. E-books got ISBNs in 1998. And Google was founded in 1998. Lots was happening when PW hit 125!

But Houston, we had a problem. Each of those e-readers used its own proprietary markup system. Publishers that wanted to distribute their e-books widely had to make a different version of the e-book for each e-reader—different electronic files, each coded in a different way. This was unsustainable.

Dismantling the Tower of Babel

The Open eBook Forum, a group composed of volunteer technologists, many employed by the e-reader companies or the service providers that created the e-book files, was formed to develop an open standard format that all e-books could use. The result, known as OEB and formally named the Open eBook Publication Structure, was published as OEBPS 1.0 in 1999.

One of the great virtues of OEB was that it was a standard built on standards: XML markup for the content, based on the HTML vocabulary and rendered by CSS, with Dublin Core metadata, packaged in a .zip container. For those unfamiliar with these acronyms, they are fundamental technologies, all open standards, very widely understood and used, all still dominant today. OEB was designed to be the opposite of a proprietary format—one format, free and open, that everyone could use.
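For readers who like to see the pieces, here is a minimal sketch of that "standard built on standards" idea. It is an illustration only, assuming today's EPUB-style layout as a stand-in for the original OEB packaging; the file names, the placeholder identifier, and the helpers `build_container` and `read_title` are hypothetical, and the result is not a complete, validated EPUB (it omits the navigation document, among other things).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Points a reading system at the package file (EPUB-style convention).
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="book/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Package file: Dublin Core metadata, a manifest of files, and a reading order.
# The identifier below is a placeholder, not a real one.
PACKAGE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Sample Book</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
"""

# Content: XML markup using the HTML vocabulary, styled by CSS.
CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <head><title>Chapter 1</title><link rel="stylesheet" href="style.css"/></head>
  <body><h1>Chapter 1</h1><p>It was a dark and stormy night.</p></body>
</html>
"""

STYLE_CSS = "h1 { font-size: 1.5em; }\n"


def build_container() -> bytes:
    """Zip the pieces together and return the archive as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # By EPUB convention the mimetype entry comes first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("book/package.opf", PACKAGE_OPF)
        zf.writestr("book/chapter1.xhtml", CHAPTER_XHTML)
        zf.writestr("book/style.css", STYLE_CSS)
    return buffer.getvalue()


def read_title(archive: bytes) -> str:
    """Read the Dublin Core title back out of the package file."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        root = ET.fromstring(zf.read("book/package.opf"))
    return root.findtext(".//{http://purl.org/dc/elements/1.1/}title")
```

Nothing here is proprietary: any zip tool can open the archive, any XML parser can read the metadata, and any browser engine can render the chapter.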

Did that suddenly put those proprietary e-reader companies out of business? Not at all. They could convert OEB to their own proprietary formats rather than force publishers to do that, or data conversion firms could do the job. It mostly worked. There was, predictably, a transitional phase for the industry to retool and retrain. Along the way, OEB evolved as real-world experience prompted improvements. The Open eBook Forum became the IDPF, the International Digital Publishing Forum, in 2004. Finally, in 2007, what was then OPS (Open Publication Structure) 2.0 was released—and renamed EPUB.

Another notable event in 2007: Amazon launched the Kindle. Which didn’t use EPUB.

The Kindle was based on a proprietary format: Mobipocket, which Amazon had acquired rather than reinventing e-reader software. For many years, as the other e-readers evolved to support EPUB (or went out of business), publishers still had to provide two different formats for their e-books: the Kindle format for Amazon and EPUB for everybody else. It wasn’t the Tower of Babel anymore, but it was a pain in the neck. (Today, Amazon wants publishers to provide EPUB 3, which it converts to what is really a constellation of Kindle formats for its various devices.) EPUB is now maintained by the W3C, the World Wide Web Consortium, which maintains the web standards on which EPUB is still based, like XML, HTML, and CSS. The IDPF became a part of the W3C in 2017.

To pick up on that blithe “or went out of business” comment: the e-book ecosystem changed from a group of small e-reader companies like Mobi, Rocket eBook, and SoftBook, and an e-reader from a big company, the Sony Reader, to today’s ecosystem dominated by big players, some of which sell devices, some of which provide e-reader software, and some of which do both.

Amazon is far and away the biggest; its various models of Kindle are very popular, and Kindle software enables Amazon’s e-books to be read on phones, laptops, and tablets as well. Apple iBooks and Google Play Books are both big players, each selling millions of e-books (but not selling dedicated e-reader devices). Kobo, having originated in Canada’s big Indigo bookstore chain, is also a big player, especially internationally; it is now owned by Rakuten, the Japanese e-commerce giant. Barnes & Noble’s NOOK is also popular, with a suite of devices and, unsurprisingly, a big selection of books.

There are a great many other e-reader systems and software in the world today; those I mentioned are just the most prominent. For example, OverDrive’s two e-readers are very widely used in public and school libraries, and SimplyE, a free, open source e-reader app developed by the New York Public Library, is also notable. Adobe Digital Editions software was once dominant, but has faded.

Why Can’t I Share My EPUB and Read It on All My Devices?

Despite the diversity mentioned above, this is no longer the Tower of Babel. Today most publishers are able to create one EPUB and distribute it to all of those retailers and aggregators. And with the prominent exception of Amazon, the e-book files that the players mentioned above actually deliver to end users are mostly EPUB files. Why can’t they be opened everywhere?

This is where I have to use the acronym never to be spoken: DRM, which stands for digital rights management. I work with lots of technologists. In that world, DRM is anathema. The technology world, and certain sectors of publishing (more on this below), are undergoing a major transition to openness. This is a Very Good Thing. In that world, DRM is viewed as access prevention software, which is precisely what it is.

But commercial publishers depend on DRM to protect their intellectual property. One of the virtues of digital publications is the ease with which they can be duplicated and distributed. While commercial publishers have embraced the digital world, that aspect of it can be seriously threatening: piracy is a real problem. They can’t sell a single book to a single person without the assurance that the digital file won’t get duplicated and shared in an ever-expanding flood of duplicates that undermines the sale of that book. (The reality is that DRM is often cracked; but the problem of piracy would be far worse without it.)

Most people mistakenly assume that commercial publishers are the ones who apply DRM to their books. Actually, no. The EPUBs that they send to all the players discussed above are DRM-free. That’s because each of those players needs its own DRM. They’re the ones that actually sell the books; they need to make sure the books can only be opened within their own walled gardens, their collection of e-readers, apps, and software. They also need to ensure that the e-books can only be opened, with minor exceptions, by the purchaser.

So what we have now is an e-book ecosystem that is almost entirely built on free, open standards, often with open source software, that is intricately interconnected, but with toll gates. Those toll gates are essential to the retail book supply chain: they enable everybody in that supply chain—from the retailers who actually sell the books, to the distributors and aggregators, the publishers, and the authors—to get paid.

But this is not true of all books. Because scholarly and scientific publishing has made such a significant commitment to open access in the past few years, new payment methods are evolving. These are largely in place for journals, particularly scientific journals: the cost of publication of an article is covered up-front by what are called APCs, article processing charges, paid by the authors, typically from the funds supporting their research. Books have proven much more complicated. Humanities authors don’t get funding like scientists do; they can take years to produce a single book; and the cost of publishing a book is many times that of publishing a journal article.

Today, in both scientific and humanities scholarship, funding models for books are evolving. A variety of complex contractual arrangements have been developed between large publishers and large library systems that bundle the costs of subscriptions and the costs of publication in order to make all the books (and journals) open access. University presses and other smaller scholarly publishers are developing innovative “subscribe to open” systems that enable books to be published open access when enough libraries commit to contributing to the publication costs of a book. What had been an unfulfilled expectation, that e-books should be easy to share, is becoming a success: one underappreciated by the general public but very much appreciated by scientists and scholars.

The Other Kind of Access: Accessibility

Providing access to books for people who have physical or cognitive disabilities that prevent them from reading print effectively, known as the print disabled, has been a longstanding, and long-neglected, problem in publishing. Decades ago, providing that access was extremely labor-intensive. Books were narrated and recorded (this was before audiobooks became common) or rekeyed as Braille, or OCR’d, proofread, and encoded in special tagging systems that made them usable with assistive technology, like screen readers. None of the solutions was ideal; all were expensive; and especially in education, students often couldn’t obtain proper course materials until the semester was well underway.

The advent of digital publishing was a huge step forward. Now, digital files could be obtained to eliminate the need to re-keyboard and re-proofread books, and those files could be encoded in the special coding systems for assistive technology. But hardly any books were originally published in those accessible forms; publishers were even reluctant to provide the digital files to services that could make them accessible, for fear of piracy. To be used by assistive technology, they couldn’t have DRM. And even when publishers did provide the files, a lot of remediation still needed to be done, because they almost always lacked essential aspects like image descriptions and proper navigation and coding. That still entailed a lot of work, a lot of cost, and a lot of delay.

What I think is one of the most important underappreciated successes of e-books is that EPUB is now designed to be properly accessible in the first place: books can be “born accessible.” They don’t require special coding; they just need to use the standard EPUB specifications correctly. If they have been tagged properly with HTML; if they have proper navigation, like a linked table of contents, and a clear reading order; if they have proper metadata; and if any images have image descriptions, then they are accessible out of the gate. The same EPUB everybody can buy can be accessible for print-disabled users. At the same time and the same cost.
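To make a few of those conditions concrete, here is a rough sketch of the kind of check a production workflow might run on a single chapter file. It is an illustration under stated assumptions, not a conformance test: the function name `accessibility_issues` and the three checks it performs (a declared language, a top-level heading, and alt text on images) are my own simplification, and real accessibility validation covers far more, including navigation, reading order, and metadata.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def accessibility_issues(xhtml: str) -> list:
    """Return a list of simple, human-readable problems found in one XHTML chapter."""
    root = ET.fromstring(xhtml)
    issues = []

    # Screen readers rely on a declared language to choose pronunciation.
    if not root.get("lang") and not root.get(XML_LANG):
        issues.append("no language declared on <html>")

    # Proper tagging: a chapter should expose at least one real heading.
    if next(root.iter(XHTML_NS + "h1"), None) is None:
        issues.append("no <h1> heading")

    # Image descriptions: every image needs an alt attribute.
    # An empty alt="" is allowed here because it marks a purely decorative image.
    for img in root.iter(XHTML_NS + "img"):
        if img.get("alt") is None:
            issues.append(f"image without alt text: {img.get('src')}")

    return issues
```

A chapter with `lang="en"`, an `<h1>`, and `alt` text on every image comes back with an empty list; remove any of the three and the function reports it.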

This is happening today. Especially in higher education, and to a lesser extent in scholarly and trade publishing as well, many publishers have made the commitment to make their EPUBs accessible, and many of the prepress and data conversion vendors who produce most of those EPUB files have learned how to make them accessible in the first place. (The exceptions are what are called “fixed-layout” e-books: many cookbooks and children’s books, comics, manga, and graphic novels. We have not yet solved the problem of making those accessible, but there is ongoing work in the task forces and working groups in the W3C to get this done.)

For most trade books and academic books, I would count this as a realized expectation: that e-books should be accessible. It is still underappreciated, and it’s far from universal, but the fact that standard EPUBs can be fully accessible is a huge success.

What About All Those Cool Things Expanded Books Did?

As e-books became mainstream—today, virtually all trade and scholarly books are published as EPUBs in addition to print (another underappreciated success)—the original expectation that they would replace print, because they would provide so much better functionality, gradually faded. Despite all those books being available as e-books, e-books have plateaued in the past few years at about 20%–25% of sales. And even as audiobooks have recently surged, almost no trade e-books have video, audio, or interactive features, despite the EPUB spec enabling those features—they’re still “just normal books.” Talk about an unfulfilled expectation!

This should not be a surprise. The traditional print book is a wonderful format for a typical novel or nonfiction trade book, as well as for most scholarly monographs. E-books are a welcome complement—I’m far from alone in buying, for travel, an e-book for my Kindle of a book I’m reading in print at home—but most commentators, including me (and I serve as the W3C Global Publishing Evangelist!), don’t expect them ever to replace print.

The underappreciated success of this vision for e-books is being realized outside of trade, primarily in higher education, and in training and some scholarship as well. Most of the leading higher education publishers are moving away from print in favor of online platforms that in fact do provide the sophisticated media and interactive features that most e-books lack. Many of those publishers have built those platforms on an EPUB foundation and still call the books e-books. And there are online scholarly e-book platforms that provide these features to enable researchers to convey their research in dynamic ways impossible in print.

The Good Work Continues

It’s a common mistake to assume that new developments displace what came before. What we have now is a much richer, more complex publishing ecosystem than when PW turned 125, thanks to e-books. Will we still have print when PW is 175? Of course. What you may be surprised to hear me say is that I think we will still have EPUBs that are very much the same as today’s EPUB 3. There are now many millions of EPUB 3s in the world, and virtually all e-reading systems and software are built for EPUB 3. When the IDPF, in one of its last acts, published EPUB 3.1, it was not backwards compatible with EPUB 3.0. Big mistake: it was virtually ignored by the industry. That was corrected after the IDPF became a part of the W3C. Today’s EPUB 3.2 is backwards compatible with EPUB 3.0—and it’s getting very broad acceptance. It does what most people need it to do—including being able to be accessible for the print disabled.

Does that mean the W3C just shuts down work on it? Not at all. There is in fact a new, improved version, EPUB 3.3, in the late stages of development. But the charter for the EPUB 3 Working Group was very strict on an important principle: EPUB 3.3 has to be backwards compatible with EPUB 3.2, the current version. It works; don’t break it.

At the same time, lots of work is being done both to further improve EPUB and potentially to move beyond it with alternative, not replacement, specifications. Making fixed layout EPUBs accessible is one example of an improvement underway. As for moving beyond EPUB without displacing it, the W3C developed an updated spec for a publication manifest, using different, and more modern, coding than the one in EPUB; the new audiobooks standard is the first profile of that manifest. It is hoped—and I expect—that it will soon be used to address both the needs of “visual narratives” (e.g., comics, manga, graphic novels), and, separately, to standardize the pathbreaking work being done in those higher education platforms I mentioned that offer browser-based content, media, and interactivity.
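For a sense of what that "more modern coding" looks like, here is a hedged sketch of an audiobook manifest, which is expressed as JSON rather than EPUB's XML package file. The property names and URLs below are written from my recollection of the W3C Publication Manifest and Audiobooks specifications and should be checked against those documents before use; the helper `audiobook_manifest`, the sample title, and the track file names are hypothetical.

```python
import json


def audiobook_manifest(title: str, language: str, tracks: list) -> dict:
    """Build a manifest dict; tracks is a list of (url, name, seconds) tuples."""
    return {
        # Assumed context and conformance URLs; verify against the W3C specs.
        "@context": ["https://schema.org", "https://www.w3.org/ns/pub-context"],
        "conformsTo": "https://www.w3.org/TR/audiobooks/",
        "type": "Audiobook",
        "name": title,
        "inLanguage": language,
        # The reading order plays the role of EPUB's spine.
        "readingOrder": [
            {
                "url": url,
                "name": name,
                "encodingFormat": "audio/mpeg",
                "duration": f"PT{seconds}S",  # ISO 8601 duration
            }
            for (url, name, seconds) in tracks
        ],
    }


manifest = audiobook_manifest(
    "Sample Audiobook",
    "en",
    [("track1.mp3", "Chapter 1", 1200), ("track2.mp3", "Chapter 2", 950)],
)
manifest_json = json.dumps(manifest, indent=2)
```

The design point is that the manifest describes a publication (its identity, language, and reading order) without dictating a zip package or a particular content format, which is what lets it serve audio today and, potentially, other kinds of publications later.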

Our e-book publishing ecosystem just keeps getting better.

Bill Kasdorf is principal of Kasdorf & Associates, founding partner of Publishing Technology Partners, W3C Global Publishing Evangelist, and a columnist for Publishers Weekly.