Can Big Data Predict the Next Bestseller?

Whether it’s music, sports, or the IT industry, everyone has begun to harness the power of big data.

And the publishing industry is not to be overlooked. Publishing, like the music business, is built on big-ticket hits.

It’s not easy, however, to predict which books will be bestsellers. It’s remained an enigmatic art form that only the most astute critics and publishing houses can master, relying on gut instinct and educated guesses.

Instinct serves the industry well at times, but when it comes to first-time authors, it almost always gets it wrong.

How do you combat this problem?

If only a computer algorithm could reliably identify best-selling texts with an accuracy of at least 80%…

As it turns out, one may exist. The “bestseller-ometer” is the subject of a new book, The Bestseller Code: Anatomy of the Blockbuster Novel, by Jodie Archer, a former research lead on literature at Apple, and Matthew L. Jockers, an associate professor of English at the University of Nebraska-Lincoln. The authors base the algorithm’s claimed accuracy on how well it picks out New York Times bestsellers from the past 30 years.

The workings – How does this algorithm work?

The bestseller-ometer attempts to identify the characteristics of best-selling fiction at scale by examining a large body of literature: say, over 20,000 novels. The project is a data-driven rebuttal to conventional wisdom about the secrets behind bestselling fiction. It also raises the possibility that publishers will one day turn to this technology instead of the traditional methods of selecting a potential bestseller.

The dawn of an idea – but was it really?

The algorithm developed by Jockers and Archer isn’t the first attempt to use big data to pick promising books. Inkitt, a Berlin-based startup that was behind the “first novel selected by an algorithm,” closely monitors reader reactions to stories posted on its web platform in order to identify potential bestsellers.

Jellybooks, a London-based company founded in 2011, tracks reader engagement in the literary production cycle right before books are published, using software that readers download in exchange for early access to a title.

The bestseller-ometer, on the other hand, distinguishes itself by combining literary scholarship with computational power. The Bestseller Code details the meticulous considerations that went into teaching a machine to read and unpack the micro-decisions involved in writing best-selling fiction at the level of diction and syntax.

The algorithm mirrors the analytical and interpretive decisions a reader makes while studying a single book in depth. It looks for repetition, word usage patterns, allusions, and thematic emphases.
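
To make the idea of "word usage patterns" and "repetition" concrete, here is a minimal Python sketch of how a program might pull a few surface features out of a text. It is not the method from The Bestseller Code: the function names, the choice of features, and the sample sentence are all assumptions made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def style_features(text, top_n=3):
    """Return a few simple, hypothetical surface features of a text."""
    words = tokenize(text)
    counts = Counter(words)
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        # Share of tokens that repeat an earlier word (1 - distinct/total).
        "repetition": 1 - len(counts) / len(words) if words else 0.0,
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "top_words": counts.most_common(top_n),
    }


# Made-up sample text, used only to show the output shape.
sample = "She ran. She ran fast. He did not run."
features = style_features(sample)
print(features)
```

A real system would compute features like these across thousands of novels and feed them to a classifier. This sketch only shows what one such measurement could look like.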

The elements – What the algorithm uses

The algorithm finds that bestsellers frequently employ an authoritative “voice,” as well as spare, plainspoken, often colloquial prose and declarative verbs that denote action-oriented, take-charge characters.

A less obvious element is what Archer and Jockers call “narrative cohesion.” Best-selling authors are known for it. For example, roughly one-third of a typical John Grisham novel is devoted to his signature subject: law and lawyers.
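
One rough way to picture the "one-third" figure is as the share of a novel's words that belong to a single topic. The Python sketch below estimates that share with a hand-written keyword list. This is a deliberately crude stand-in, not the authors' technique, and `LAW_TERMS`, `topic_share`, and the sample text are hypothetical names and data chosen for the example.

```python
import re

# Hypothetical keyword list standing in for a "law and lawyers" topic.
LAW_TERMS = {"lawyer", "lawyers", "court", "judge", "jury", "trial", "verdict"}


def topic_share(text, topic_terms):
    """Fraction of word tokens in `text` that belong to `topic_terms`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in topic_terms)
    return hits / len(words)


# Made-up sample text: 4 of its 11 words are in the keyword list.
sample = "The lawyer faced the judge. The jury waited for the verdict."
share = topic_share(sample, LAW_TERMS)
print(f"{share:.2f}")  # prints 0.36
```

Counting keywords ignores context and synonyms, so treat the number as a toy proxy for how much attention a text gives to a subject.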

The secret to bestselling – according to bestseller-ometer  

There were also some unexpected discoveries, such as the fact that sex rarely sells. In fact, it is a polarizing topic among audiences, and it is usually limited to a small percentage of the best-selling material.

Take, for example, Fifty Shades of Grey, which is packed with graphic erotica and a surprising plot twist. By that logic, the book should not have been a bestseller.

However, Jockers and Archer found that the book’s central theme is human connection, a theme common among bestsellers. Fifty Shades of Grey became a bestseller because it centers on emotional intimacy between characters.

The drawback

Publishers are often hesitant to invest in unknown authors when established bestselling authors such as J.K. Rowling and John Grisham are a safer bet. That’s where the bestseller-ometer comes in. The main concern with the algorithm, however, is that writers without any literary background could write pieces purely to satisfy its criteria.

The situation can go one of two ways: a good book that deserves to be a best-seller, or a bad book that meets the algorithm’s needs but not those of literature.


Big data, as I previously stated, is engulfing every aspect of life. With increased use comes increased demand – in any industry, it appears.

In the world of big data, every day is an adventure. Working with data, detecting issues, solving problems, and forecasting the future are all part of the job. If you’re interested in this field and think it’s the right fit for you, get a big data certification and start your journey into this exciting world.