As the book industry continues its transition to digital delivery, the term “Big Data” has been showing up in discussions of publishing strategies going forward. Yesterday the Book Industry Study Group’s annual Making Information Pay conference, held at the McGraw-Hill Auditorium in Manhattan, offered a program that outlined the broad nature of Big Data (the ability to “tame vast amounts of data,” according to keynote speaker Jake Freivald of Information Builders), how to manage it and how it can help publishers run their businesses more efficiently.

While Freivald’s presentation provided a broad conceptual outline of Big Data and how businesses can make effective use of a vast and growing amount of raw data, Peter Collingridge, founder of the U.K. digital vendor Bookseer, offered a presentation on what he called “little data” that seemed more easily comprehensible and more applicable to the classic trade book marketing problem: how do you know if your marketing and promotional campaigns are working?

Big Data is a bit of a catchall phrase for the ever-increasing amount of raw data and information generated by digital networks, sensors and data-capturing technology. As publishing and, well, every other business and public service transition to digital delivery, these systems are generating “more information than we know what to do with,” according to Freivald, v-p of corporate marketing at Information Builders, a business intelligence and data analysis firm. “Information is a strategic asset,” Freivald emphasized, “but what do you want to do with it?”

From Terabytes to a 'Lottabytes'

What is Big Data? “It’s all the stuff we do online,” Freivald said. Wikipedia defines Big Data as massive “datasets too awkward to work with,” and Freivald pointed to data collected through retail and community sites online: “every interaction in an Apple Store, even from a brief encounter,” for instance. There’s data from mp3 downloads, satellite imagery, financial market info, documents, customer interactions and shopping data, and government data feeds; the list, like the sheer volume, keeps growing, seems endless and probably is. In fact, Big Data can be characterized by three critical aspects of magnitude: volume, velocity and variety.

Freivald said Big Data is the process of “taming vast amounts of data sets using new technology that opens fresh approaches to making decisions,” and joked that data analysis has gone from managing “terabytes” of information to “lottabytes” of information. All this data continues to grow, generated by disparate systems in a wide variety of structures, and publishers need to think strategically about how to manage it and use it to model business practices for the future. Key to wresting meaning from Big Data are new systems and technology aimed at “data visualization,” software that crunches these vast amounts of raw information and spits out “visualizations” of it at the other end. These visualizations usually take the form of dashboards, online panels that let managers easily create maps, charts, graphs and other easily digestible visual representations of what the data means. Most importantly, Freivald said, visualization turns Big Data into information that business managers (CEOs, marketing and promotion personnel and store managers) can use and manipulate. Big Data is not just for specialized data analysts or the IT department. “People who really need this information aren’t necessarily data analysts,” he said.

His point is that this kind of data, along with the technology to analyze it, can be used to create all manner of predictive models about customer and selling behavior, store traffic and shopping trends, indeed a whole range of models that can help publishers make better decisions about what they do. Freivald’s presentation was followed by one from Kyle Marx, v-p of business analytics at ReaderLink, formerly Levy Home Entertainment, a distributor that supplies books to non-book-channel retailers like Kmart, Walmart, Rite-Aid, Target, Safeway and other mass market retailers. ReaderLink has more than 23,000 accounts, visited by more than nine million shoppers every day, and Marx outlined how the company has evolved in dealing with a growing amount of data and how it has worked to use that data strategically.

Intuition vs. Data-Driven Decisions

But the presentation by Bookseer cofounder Collingridge offered an easily understood illustration of data analysis that almost anyone can see as valuable. He described Bookseer as a tool for “fixing” book marketing. While he called his own process “Little Data” (it’s unclear what magnitude of data differentiates Big and Little Data), Bookseer seems to do a lot of what Freivald talked about. It tracks the kinds of data that publishers have come to be obsessed with and displays these feeds across a timeline, superimposing one over another in a way that provides context and illustrates the relationship between one media event and another and, ultimately, their relationship to book sales. Bookseer is a web-based application that can simultaneously track Amazon book standings, Google searches, Nielsen BookScan sales reports, BitTorrent activity, Twitter, Facebook, NPR and other media outlets, and the list goes on. The service can then layer all these feeds in a single graph that allows a marketer to, say, easily assess whether a TV appearance, a price discount, an NPR interview, a newspaper review or whatever actually causes a spike in book sales, print or digital.

“Publishing has always been driven by supply,” Collingridge said, quoting Penguin CEO John Makinson, “rather than demand.” Collingridge said Bookseer allows publishers to replace “intuition” about what is happening in the book marketplace with real-time data that will help them “understand demand. It’s hard to know what marketing strategies work and which ones don’t, since there’s no proof.” Bookseer just may be the answer to that vexing problem.

There were also presentations by digital publishing consultant Brian O’Leary on metadata quality and best practices (metadata being the critical product information publishers must attach to digital content for it to be found in the digital marketplace) and by Kyusik Chung of Goodreads, an online reading community with eight million users, on how the site facilitates book discoverability. O’Leary gave preliminary results from a report on metadata that will be delivered at BEA. Apparently one-third of publishers surveyed don’t check the accuracy of their metadata at all, and he expressed concern that metadata is often altered “downstream” after it is entered, in ways that seem somewhat mysterious, ultimately affecting publishers’ ability to sell books effectively. Chung offered his own version of Big Data: Goodreads’ eight million members generated 20 million visits to the site last month. Members also have “bookshelves” and are asked to mark and place books in their accounts; nearly 300 million books have been “shelved,” and about 16 million books a month are “discovered” in this way on the site. Goodreads has also engineered a “recommendation engine”: once a reader “shelves” 20 books, the software points to other books the reader may like, and the service can be surprisingly useful. Indeed, Chung said members’ interest tends toward the “long tail” of backlist titles, while emphasizing that how readers find books “depends, and there are different tools for the head and for the long tail.”

Making Information Pay closed with an entertaining presentation by New York Times reporter Charles Duhigg, author of The Power of Habit: Why We Do What We Do in Life and Business, a look at research into how the brain functions to form habits as well as the cultural considerations that drive the formation of human habits. A habit, he explained, can be scientifically defined as a behavioral “loop” that includes a cue (say, stress), a routine (smoking), and a reward (stress relief), with those steps repeating over and over again. Duhigg offered a genial and smart discussion of research into how habits are formed and also drew a tentative link between habit research and the appeal of addictive videogames like Angry Birds and Halo, as well as the history of Procter & Gamble’s development of Febreze, a cleaning solvent that removes odors. (In an earlier presentation, BISG executive director Len Vlahos said the increasing adoption of tablets over dedicated e-readers suggested that “the competition for leisure time” between reading and, say, playing Angry Birds was a “problem now that leisure activities are available all the time with multimedia tablets.”)

Duhigg provided far too much thoughtful detail (not to mention amusing anecdotes) to recount here, but he suggested that studying the habit loop, the videogames and, yes, the development of Febreze could offer some clues to helping book reading compete with the distracting multimedia delights of videogames and the tablets we use to access them. He noted that videogame designers have discovered that when people are bored with a game or a section of a game, making that section more difficult, and so providing surprises, makes people more engaged with it. “There’s a surprise on every page of a difficult novel and that can be an advantage when it comes to forming habits,” he said.