Thanks to a new generation of software and computing tools, in the future, book marketing will be determined by data rather than by intuition.

If you're a publishing executive and you haven't been reading about Big Data, then you soon will be. Big Data is just what it sounds like: data collections of such enormous size that they are awkward, expensive, or impossible to process with conventional computing. Big Data also refers to the ability to use distributed computing (parsing out these huge data sets and processing them simultaneously on multiple computers) plus new software tools and deep analysis to create new kinds of predictive business models that will drive decisionmaking in the future.

But Big Data is also a broad and informal term used to refer to the vast amounts of raw data generated by global online networks and an ever-increasing variety of data-capturing digital technologies. "All the stuff we do online" is how Jake Freivald, v-p, corporate marketing at Information Builders, characterized it. Freivald, whose company is a business intelligence and data analysis firm very much involved in Big Data, spoke on the topic at BISG's recent Making Information Pay conference.

Big Data and Visualization

According to Edd Dumbill, program chair for the O'Reilly Open Source Convention and for Strata, an ongoing series of conferences focused on Big Data, Big Data can be characterized by the "volume, variety, and velocity" of the data sets in question. In Big Data, these three aspects are typically so large, and the data so varied, that conventional data processing is inadequate. At the same time, "visualization," or the ability to output all this data in some kind of visual and easily digestible form, is a critical element in the process that makes the results of Big Data useful to business managers and not just data specialists.

"Visualization is simply making data visual," explains Julie Steele, content editor for O'Reilly Media's Strata conferences. "It's using artistic elements, graphs, and charts to make data comprehensible. Visualization reduces large data sets to visual trends or patterns and relationships, with the goal of better decisionmaking, to get a better understanding of the world and of the future." The ability to process vast troves of data on customer behavior—data sets that would have been too expensive or taken far too long to process in the past—offers the potential to create predictive models for any business, and certainly for book publishing.

In an era when more people than ever are shopping online and consumers are making use of digital apps, e-books, and digital reading devices, all of which capture and transmit a wide variety of usage data back to publishers and retailers, "Big Data holds the promise of helping publishers make better decisions," Steele says. Publishers can get feedback on how long a reader stays on a certain page or why readers have stopped reading on a certain page. "E-books allow you to modify pricing, and data analysis will let you see how the market responds in real time, and make changes," Steele says. Indeed, the aggregation, processing, and deep analysis of this kind of data set gives publishers the ability to tie consumer purchases to a promotion, to their friends' purchases, to reviews, and more.
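The reading-behavior feedback described above can be illustrated with a small sketch. It assumes a hypothetical data feed that records the last page each reader reached; the function name `drop_off_pages` and the sample data are invented for illustration and do not describe any retailer's or publisher's actual system. The sketch shows where readers stop, which is only a starting point for asking why.

```python
from collections import Counter


def drop_off_pages(last_page_by_reader, total_pages, top=3):
    """Return (page, reader_count) pairs for the pages where the most
    readers stopped before reaching the end of the book."""
    stops = Counter(
        page for page in last_page_by_reader.values() if page < total_pages
    )
    return stops.most_common(top)


if __name__ == "__main__":
    # Hypothetical feed: reader id -> last page reached in a 300-page e-book.
    last_pages = {"r1": 42, "r2": 42, "r3": 300, "r4": 17, "r5": 42}
    for page, readers in drop_off_pages(last_pages, total_pages=300):
        print(f"page {page}: {readers} reader(s) stopped here")
```

A cluster of readers stopping on the same page is the kind of pattern an editor or marketer might then investigate by hand.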

Apache Hadoop and Distributed Computing

If O'Reilly Media's Strata conferences are any measure of the interest—Steele said O'Reilly held two Strata conferences in 2011 and is planning to hold five (in New York, London, San Francisco, Santa Clara, Calif., and China) in 2012—interest is growing as business leaders learn how Big Data can improve or even completely transform their businesses for the better. But even as interest in Big Data grows and businesses begin to look for ways to make use of it, Big Data is just beginning to assert its impact on decision making. Because of the technical expertise required as well as the scale of data collection, the initial focus on Big Data has been driven by big companies, though that is changing as smaller tech startups focus on the category. And while some book publishers and retailers are making use of Big Data approaches, most experts contacted for this story said it's still the very early days of book publisher involvement. So early, in fact, that book publishers either aren't generally on board or are simply unaware of the benefits of using Big Data techniques.

Todd Lipcon, an engineer with the Apache Hadoop project management committee, emphasizes that most publishers will make use of Big Data by way of vendors and middlemen rather than try to set up their own processing in-house. Apache Hadoop is described as the core technology driving the adoption of Big Data and the ability to process huge data sets quickly and economically. "Smaller companies may not need to use it but they still need to think about the kinds of data they do need to collect and whether collecting more data could help their business," says Lipcon.

Lipcon says that any company currently tracking social media is probably using Hadoop, and he points to Amazon and to Attributor, the digital piracy monitoring service, as other book-related companies that use it. The technology is also used by media ventures like Hulu, LinkedIn, Rakuten, and Spotify, according to the Hadoop Wiki. Apache Hadoop is an open source software framework (it is public and anyone can use it without a fee) designed to take on these tasks, and it is generally acknowledged to be the market leader. Hadoop was created six years ago by Doug Cutting; the Apache name is said to come from software lingo ("patching"), and Hadoop is named after his son's toy elephant. The software (including an important component, Hadoop MapReduce) allows a user to take a huge and variegated data file, split it up into equal parts, distribute it to a series of networked computers or clusters (from 20 to 1,000 or more, one expert said), and process the data faster and more cheaply than ever.
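The split, distribute, and combine pattern described above can be sketched in a few lines of Python. This is not Hadoop code and does not use Hadoop's API; it is a minimal illustration of the map-and-reduce idea on one machine, with worker threads standing in for the networked computers of a real cluster. All function names here are invented for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Split a list into n_chunks parts of nearly equal size."""
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def map_chunk(chunk):
    """Map step: count words in one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(records, n_workers=4):
    chunks = split_into_chunks(records, n_workers)
    # Assumption: threads stand in for the machines of a real cluster.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


if __name__ == "__main__":
    sample = ["big data big books", "data about books", "books books books"]
    print(word_count(sample, n_workers=2).most_common(3))
```

Because each chunk is processed without reference to the others, adding machines shortens the job roughly in proportion, which is the property that makes very large data sets tractable.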

Books and Big Data

"Publishers aren't quite there yet," says Dumbill, referring to the number of publishers he sees at the Strata conferences. Utilizing Big Data projections, he says, will mean a shift from "operational applications to creative and profitmaking applications, in other words, the ability to uncover [new business] opportunities. You have to dig deeply into this information and it will change how a business views data. You will find stuff that you might have missed in the past."

But Dumbill also makes it clear that Big Data "can also be controversial. You may find out that the grand old men that are supposed to know how the business works may not know it so well. It's a completely different way of doing business."

Big Data does require specific skills and it has led to the rise of the "data scientist," Dumbill says, a new category of specialist that offers "a recombination of old skills like math and statistics with programming skills, an entrepreneurial instinct and an investigative flair." Indeed, every expert with whom PW spoke emphasized the need for companies to be both creative and flexible.

"It's not about computers making decisions," Steele explains. "It's humans making decisions in collaboration with machines." In addition, Big Data is not designed only for data specialists. "Most businesses should give as many people in the organization as possible a feel for using data," Dumbill says, and while the process should be collaborative, it also "has got to start from the top, the leadership and the owner, or there will be resistance from the organization."

"Information is a strategic asset," Freivald said during his presentation at the MIP conference. "We collect a lot more information than we know what to do with. Businesses need to collect information that makes sense." Information Builders has only worked with one publisher, Walsworth, and the book distributor ReaderLink, but Freivald says, "We'd love to work with more."

The company deals primarily with firms from industries like manufacturing, health care, and finance, industries that easily generate the huge quantities of data that require a Big Data approach. But Freivald was quick to cite the importance of publishers collecting marketing data (who is your audience and what do they want?) as well as data on manufacturing processes and the inefficiencies around them.

Most importantly, while Freivald emphasizes that "data scientists" are necessary—"they know how to make Big Data accessible"—the results from Big Data are ultimately not aimed at just the data specialists. "Data scientists help structure the data so a store manager can look at the results and say, these products should be displayed differently," Freivald says. He emphasizes the importance of visualization and says the results of Big Data analysis are generally outputted through "dashboards," or online display panels, that present information in the form of maps, pie charts, graphs, and other easily digestible visual representations of what the data means. "Rows of columns and figures are hard to understand," he says. "If the end result is displayed as a point on a map, or a bar chart, it will help people make better decisions about their business. If the staff can't understand it, it's worthless."

Two Startups

Bookseer and CoverCake, two recent startups focused on data collection in the book market, have positioned themselves (with differing initial results) as Big Data solutions for book publishers, offering to provide data that will help them decide which promotions are working and which ones are not.

Launched less than a year ago, Bookseer is a market analysis and intelligence firm based in London and New York, designed to provide data services—with a focus on book marketing—specifically to the publishing market. Started by U.K. publishing veteran Peter Collingridge and technologist Stephen Betts, Bookseer can track and collect data from a wide variety of media outlets—from Twitter, Facebook, and blogs to Amazon sales, Nielsen BookScan sales reports, Google searches, and BitTorrent. After collecting the data, the Bookseer technology can superimpose a timeline of each data feed in a visual outline that allows a publisher to essentially connect the dots. If the marketing department launched an ad campaign or an author is appearing on Good Morning America, Bookseer can provide evidence that a campaign or media appearance very likely caused a spike in print or e-book sales—or didn't. Collingridge says Big Data has the potential to make book marketing a "demand-driven" practice rather than one driven by supply or by a publisher's intuition. Collingridge says, "[Publishers] have lots of marketing that doesn't work and yet they keep throwing money at it because there's never been a way to measure this stuff. Now, if something's not working, we can see it and try something else."
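The idea of laying a marketing event over a sales timeline can be sketched as follows. This is an illustrative assumption, not Bookseer's method: the function `sales_lift`, the three-day window, and the sample figures are all hypothetical. It compares average daily sales just before an event with average daily sales from the event onward.

```python
from datetime import date, timedelta
from statistics import mean


def sales_lift(daily_sales, event_day, window=3):
    """Ratio of mean sales in the `window` days starting on the event day
    to mean sales in the `window` days before it. Returns None when
    there is too little data to compare."""
    before = [
        daily_sales[event_day - timedelta(days=i)]
        for i in range(1, window + 1)
        if event_day - timedelta(days=i) in daily_sales
    ]
    after = [
        daily_sales[event_day + timedelta(days=i)]
        for i in range(window)
        if event_day + timedelta(days=i) in daily_sales
    ]
    if not before or not after or mean(before) == 0:
        return None
    return mean(after) / mean(before)


if __name__ == "__main__":
    # Hypothetical daily unit sales for one title, starting March 1, 2012.
    start = date(2012, 3, 1)
    units = [100, 110, 90, 100, 300, 280, 260, 120]
    daily_sales = {start + timedelta(days=i): u for i, u in enumerate(units)}

    # Hypothetical event: a TV appearance on March 5.
    lift = sales_lift(daily_sales, date(2012, 3, 5))
    print(f"sales lift around the appearance: {lift:.1f}x")
```

A ratio well above 1 is evidence that the event and the spike coincide. It does not prove the event caused the spike, which is why overlaying several data feeds on the same timeline matters.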

Bookseer is a subscription service that provides publisher clients with a password-protected logon and access to data collection results in real time. Subscriptions start at roughly $10 per title per month, with a 500-title minimum, or $5,000 a month.

"One hour after the data is collected, our clients can get access to see the impact of that data in real time," Collingridge says in a phone interview with PW. "If a publisher is launching a Facebook ad campaign or multiple promotions, it can see who is clicking where and what can be correlated to actual sales. If the publisher has launched multiple campaigns, it can stop the ones that aren't working and extend those that are."

"Our vision is that the Big Data approach is required," continues Collingridge, who also appeared at the recent BISG Making Information Pay conference. "We can track millions of points of data and each little point can be useful." And while this seems like the answer to a sales and marketing executive's dreams, Collingridge says his company has a big problem—Bookseer can't find enough clients. Although the venture has worked on projects for the U.K.'s Big Four publishers—Random House, Hachette, HarperCollins and Penguin—"It's been a struggle to get clients."

A veteran publishing figure in print as well as in digital publishing, Collingridge founded the digital publishing vendor Apt Studio and is a cofounder of the U.K. app developer Enhanced Editions. In a candid response, Collingridge says Bookseer grew out of the failure of the app publishing business. "At first the app business seemed to be working," he says, "but it was the data we were capturing from the apps that was really interesting."

"The apps business wasn't working because the marketing wasn't working," he tells PW. Consumers weren't buying enough apps and the publishers weren't promoting them properly, he says. "The book business is moving from a B2B industry to a B2C market and publishers know nothing about demand marketing or about consumers." Collingridge says, "So we decided to see if we could show them a better way."

Bookseer was designed for "publicists not M.B.A.s," Collingridge says. "The publishing people who get it are addicted to it and check it constantly." But he also laments, "Some publishers think they just haven't got the time to monitor the system properly." Nonetheless, Collingridge is convinced that he's created a marketing tool that's "a laser-guided missile system. But Bookseer also marks a cultural change. We underestimated how scared the industry might be of software that shows where people aren't doing a good job. But it also shows who is good at their jobs."

Collingridge says Bookseer had venture capital investors lined up but lost them because of struggles to attract customers for the service. Bookseer continues to operate, though now with a skeleton crew—Collingridge and Betts—running it from London and New York; in fact, Collingridge has taken a job as v-p of new product development at Safari Books Online. Still, Collingridge emphasizes, "We aren't prepared to let [Bookseer] go. It's not paying us right now but we believe in it. We're just a little bit early in the marketplace."

Similar in some ways to Bookseer, CoverCake offers publishers the ability to track and collect data from multiple social media platforms and output the data in the form of graphs and an aggregate score based on the total number of comments being made about a book across those platforms. The service can even track the popularity of genres as well as that of publishers or individual authors based on comments made on sites ranging from Amazon reviews to Goodreads, Twitter, and Facebook. Founded by Sujee Maniyam and Brian Sathianathan, CoverCake has signed a deal with Books-A-Million to use the CoverCake technology to enhance its merchandising and customers' shopping experience.
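An aggregate score built from comment counts can be sketched very simply. The version below is a guess at the general shape of such a score, not CoverCake's actual formula: it assumes a hypothetical stream of (title, platform) pairs, one per comment, and treats every comment as equal.

```python
from collections import Counter


def aggregate_scores(mentions):
    """Count comments per title overall and per platform.

    `mentions` is an iterable of (title, platform) pairs, one per comment.
    Returns (totals, per_platform): totals maps title -> comment count,
    per_platform maps title -> Counter of platform -> comment count.
    """
    totals = Counter()
    per_platform = {}
    for title, platform in mentions:
        totals[title] += 1
        per_platform.setdefault(title, Counter())[platform] += 1
    return totals, per_platform


if __name__ == "__main__":
    # Hypothetical sample of comments gathered from several platforms.
    sample = [
        ("Book A", "Twitter"), ("Book A", "Facebook"), ("Book A", "Twitter"),
        ("Book B", "Goodreads"), ("Book B", "Twitter"),
    ]
    totals, per_platform = aggregate_scores(sample)
    for title, score in totals.most_common():
        print(title, score, dict(per_platform[title]))
```

A production score would likely weight platforms or commenters differently; equal weighting is used here only to keep the example small.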

Maniyam, who has worked as a Big Data consultant, emphasizes that tracking books on social media is all about Big Data. "What would be Big Data for Google or Facebook isn't what it will be for everyone else," he says. "Any file size that can push the limits of computing is Big Data."

CoverCake, Maniyam says, operates to some degree like the brand monitoring used by big brands like Coke, Pepsi, or General Motors, "but it's more complicated for books because each book is different. There are thousands of books and we have to treat each book as a separate brand."

Maniyam points to a truth about book marketing these days: "Readers are increasingly online and very influential on social media, and publishers are actually engaging social media users." How do publishers decide which users to monitor or send advance galleys to? Maniyam says that CoverCake "can find the 100 social media users a publisher wants to engage." He points to the book and movie promotions around the bestselling YA series The Hunger Games, noting, "70%–80% of the social media conversation is about the movie, but we can pull out who is talking about the movie and who is talking about the book."

CoverCake has now started tracking blogs, and Maniyam says that, like Bookseer, it is also tracking media outlets and can map the effect of a wide variety of media appearances on sales channels. "The publishing industry doesn't generate data like a telecom," Maniyam says, "but with so many readers online, it's getting closer. Publishers may think they don't have to deal with Big Data, but really it's that they just don't need to build an in-house system. That takes serious engineering."

Looking forward, he says, "We're building a next-generation system. Publishers won't need to have three or four accounts to track all this stuff. They need to focus their marketing dollars on TV and on social media and we believe we can give them guidance on how to do it."