Ben Blatt's fascinating new book, Nabokov's Favorite Word is Mauve, combines statistical analysis and literature. Using a database of thousands of books and hundreds of millions of words, Blatt tackles questions ranging from which words our favorite authors favor, to which contemporary writer uses the most clichés, to the controversial topic of adverb usage. Read on for a look at some of Blatt's findings.

The first literary mystery to be solved by numbers was a 150-year-old whodunit finally put to rest in 1963. Two statistics professors learned of the long-running debate over a dozen contested essays from The Federalist Papers, and they saw that they might succeed where historians had failed. Both Alexander Hamilton and James Madison claimed to have written the same 12 essays, but who was right?

The answer lay in how each writer used hundreds of small words like but and what, which altogether formed a kind of literary fingerprint. The statisticians painstakingly cut up each essay and counted the words by hand—a process during which “a deep breath created a storm of confetti and a permanent enemy.” And by comparing hundreds of word frequencies, they came up with a clear answer after so many years of speculation: the contested essays were distinctly the work of James Madison.
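As a rough illustration of the fingerprint idea, here is a minimal Python sketch. It is not the statisticians' actual method: the marker list is illustrative (only "but" and "what" come from the description above; the other two words are assumptions), and the simple distance used to compare profiles is a stand-in for their far more careful statistical analysis.

```python
import re
from collections import Counter

# Illustrative marker words. "but" and "what" are named in the article;
# "upon" and "while" are assumptions added for the example.
MARKER_WORDS = ["but", "what", "upon", "while"]

def rates_per_thousand(text, markers=MARKER_WORDS):
    """Return each marker word's frequency per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {w: 0.0 for w in markers}
    counts = Counter(words)
    return {w: 1000 * counts[w] / total for w in markers}

def distance(profile_a, profile_b):
    """Sum of absolute rate differences; smaller means more alike."""
    return sum(abs(profile_a[w] - profile_b[w]) for w in profile_a)

def closer_author(disputed_text, text_a, text_b):
    """Return 'a' or 'b', whichever known sample the disputed text resembles more."""
    disputed = rates_per_thousand(disputed_text)
    d_a = distance(disputed, rates_per_thousand(text_a))
    d_b = distance(disputed, rates_per_thousand(text_b))
    return "a" if d_a <= d_b else "b"
```

With only a handful of markers and short samples, a comparison like this is noisy; the approach becomes persuasive only with hundreds of words and long texts, as in the study described above.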

Since 1963, similar methods have continued to yield major findings. Take, for instance, last year’s revelation that Shakespeare collaborated with Christopher Marlowe. And in the meantime, the technology involved has leapt from scissors and paper to computer and code, giving rise to a whole new field of study—the digital humanities.

In my new book, Nabokov’s Favorite Word is Mauve, I use simple data to whiz through hundreds of classics, bestsellers, and fan fiction novels to explore anew our favorite authors and how they write. I uncover everything from literary fingerprints and favorite words and tics, to the changing reading level of NYT bestsellers and how men and women write characters differently.

If you have a body of literature, stats can now serve as an x-ray. Here are a few fascinating examples from Nabokov’s Favorite Word is Mauve:

Writing Advice

There is a lot of writing advice out there. But it’s hard to test, and it’s often best to judge someone not by what they say but by what they do. Novelists may tell their adoring fans to do one thing, but do they actually follow their own advice? With data, we can find out, looking at everything from the overuse of adverbs to Strunk and White’s advice against qualifiers like “very” or “pretty.”

One of my favorite examples comes from Elmore Leonard’s 10 Rules of Writing, where Leonard offers the following rule about exclamation points: “You are allowed no more than two or three per 100,000 words of prose.” A writing rule in the form of a ratio is a blessing for a statistician, so I ran with it. Does Leonard practice what he preaches?

From a strict numerical view, no. Leonard wrote more than 40 novels, which totaled 3.4 million words. Had he followed his own advice, he would have been allowed only 102 exclamation points in his entire career. In practice, he used 1,651, which is 16 times as many as he recommends.
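The arithmetic behind those figures can be checked in a few lines of Python. This is only a sketch using the numbers quoted above; the function and variable names are made up for the example, and the allowance uses the generous end of the rule (three per 100,000 words).

```python
def allowed_exclamations(total_words, per_100k=3):
    """Exclamation points permitted under a 'per 100,000 words' rule."""
    return total_words / 100_000 * per_100k

total_words = 3_400_000   # Leonard's career word count, as quoted above
used = 1_651              # exclamation points he actually used

allowed = allowed_exclamations(total_words)   # about 102
ratio = used / allowed                        # about 16 times the allowance
rate = used / total_words * 100_000           # about 49 per 100,000 words

print(f"allowed: {allowed:.0f}, used: {used}, ratio: {ratio:.1f}x, rate: {rate:.1f} per 100k")
```

The last figure, the rate per 100,000 words, is the one that makes authors with very different output comparable.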

But looking deeper, we find that Leonard did follow the spirit of his own rule. Below are 50 novelists, representing a range of classic authors and bestselling authors. Elmore Leonard beats out everyone.

And the picture gets even more interesting when we look at how Leonard’s use changes over time. The chart below shows the number of exclamation points that Leonard used in each of his novels from the start of his career. He loved the exclamation point as a novice, but he slowly weaned himself off it over time.

Interestingly, after he delivered his exclamation point rule in 10 Rules of Writing, his use decreased even further (the one exception was Leonard’s sole children’s novel). He may have been a zealot: no one I looked at uses exclamation points at a rate lower than two or three per 100,000. But Leonard practiced what he preached: he got closer to his magic ratio than any other writer, especially in his final stretch of novels.

He was also on to something. I parsed thousands of amateur fan-fiction stories online and found that their writers were enthusiastic not only about their story universes but about exclamation points as well. The average published author relies on about one-fourth as many exclamation points as the average amateur writer.

How Cliché

The book world loves a good list: bestsellers, award winners, “best of the year” lists. But what other superlative lists are there to uncover out there in the literary world? In Nabokov’s Favorite Word is Mauve, I decided to ask: who uses the shortest sentences, who uses the most adverbs, who writes at the lowest grade level, and who relies on the most clichés?

I took all expressions mentioned in the 2013 book by Christine Ammer titled The Dictionary of Clichés. These are phrases like “fish out of water,” “dressed to kill,” and “not one's cup of tea”—4,000 phrases in total. To my knowledge, Ammer's book is the largest collection of English language clichés. I then scanned through the complete bibliographies of the same 50 authors mentioned above to see who used the most clichés.
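A scan like this can be sketched in Python as follows. This is a simplified illustration, not the code used for the book: the phrase list holds only the three examples quoted above, the matching is exact (so inflected variants such as “not my cup of tea” are missed), and normalizing per 100,000 words is an assumption made here so that authors with different output can be compared.

```python
import re

# Only the three phrases quoted in the article; the full list has about 4,000.
CLICHES = ["fish out of water", "dressed to kill", "not one's cup of tea"]

def normalize(text):
    """Lowercase, turn curly apostrophes into straight ones, collapse whitespace."""
    text = text.lower().replace("\u2019", "'")
    return re.sub(r"\s+", " ", text)

def count_cliches(text, cliches=CLICHES):
    """Count whole-phrase occurrences of every listed cliché in the text."""
    text = normalize(text)
    return sum(
        len(re.findall(r"\b" + re.escape(normalize(phrase)) + r"\b", text))
        for phrase in cliches
    )

def cliches_per_100k(text, cliches=CLICHES):
    """Cliché rate per 100,000 words (the normalization is an assumption)."""
    n_words = len(text.split())
    if n_words == 0:
        return 0.0
    return count_cliches(text, cliches) / n_words * 100_000
```

For example, `count_cliches("He felt like a fish out of water, dressed to kill.")` counts two matches.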

The answer: James Patterson.

You’d expect some recency bias in the dictionary of clichés (Jane Austen’s characters, unfortunately, weren’t ever described as “dressed to kill”). So I also looked at every single book that ranked on Publishers Weekly's bestselling books of the year since 2000. James Patterson can't blame his time period alone. Even compared to his contemporaries in genre and time, Patterson comes in with five of the 10 most clichéd books. He’s clearly making it work, though. On those PW lists, Patterson has 16 books, more titles than any other writer.

Start with a Bang

In response to a question on Twitter about her favorite first sentence in literature, novelist Margaret Atwood answered: Call me Ishmael. “Three words. Power-packed. Why Ishmael? It’s not his real name. Who’s he speaking to? Eh?”

Atwood emphasized the brevity of the Moby-Dick opener, and she is similarly concise in her own work. I compared the median length of her opening sentences to that of the 50 authors in the exclamation point chart above. Only one author, Toni Morrison, beats her out.

Opening sentences are far from an exact science, but keeping them short and powerful by rule of thumb is a smart place to start. Drawing from a range of sources, I assembled a list of the consensus top 20 opening sentences in literature. And of that list, 60% of the openers are short when compared to the book’s average sentence length.

But when you look at a much wider sample of literature, most authors in practice opt for long openers. In 69% of all of the books I looked at, the opening sentence is longer than the average sentence throughout the rest of the book. It might be that authors like Toni Morrison and Margaret Atwood are on to something as they keep their openers “power-packed.”
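The comparison described here, opening sentence versus the average sentence in the rest of the book, can be sketched in Python. This is a minimal illustration under stated assumptions: sentence length is measured in words, and the sentence splitter is deliberately naive (it breaks on ".", "!" or "?" followed by whitespace, so abbreviations and dialogue will trip it up). A real analysis would want a proper tokenizer.

```python
import re

def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def opener_is_longer(text):
    """True if the first sentence has more words than the mean of all later sentences."""
    sentences = split_sentences(text)
    if len(sentences) < 2:
        raise ValueError("need at least two sentences to compare")
    lengths = [len(s.split()) for s in sentences]
    rest_mean = sum(lengths[1:]) / len(lengths[1:])
    return lengths[0] > rest_mean

def share_with_long_openers(books):
    """Fraction of the given book texts whose opener is longer than the rest's average."""
    return sum(opener_is_longer(b) for b in books) / len(books)
```

Running `share_with_long_openers` over a list of full book texts would give a figure comparable in spirit to the 69% quoted above, though the exact number depends heavily on how sentences are split.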

Beach Weather

In one last example, let's return to Elmore Leonard's 10 Rules of Writing. His first rule (#1!) is “Never open a book with weather.” Apparently Leonard had strong feelings about this trope, but anyone who’s ever heard too many plays on the old saw, “it was a dark and stormy night,” will know where he’s coming from.

Leonard again lives up to his own advice. But there’s one author who completely flouts it, and it’s an example I love.

Danielle Steel, known for selling hundreds of millions of books, should also be known for talking about the weather. She started her first book off “It was a gloriously sunny day and the call from Carson Advertising came at nine-fifteen.” She’s never looked back.

Nearly half of her introductions involve weather, mostly benign, positive weather (“perfect deliciously warm Saturday afternoons,” “perfect balmy May evening,” “absolutely perfect June day,” or simply: “The weather was magnificent.”). But like Patterson, she has made her rule-breaking choice work. It’s a distinctive style that’s all her own, and it’s a quirk that at least this reader would never have been able to pin down without running the numbers first.