How Big Data Can Tell You Which Book to Read Next

If you enjoy reading but still haven't picked your next book to cozy up with on that upcoming rainy day, your smartphone might be able to suggest one. Artificial intelligence (AI), fed by big data, can now rank works of literature to predict which will be the next bestseller. It is a kind of recommendation system based not on metadata but on the patterns and themes found in the books themselves.

Publishers around the globe are mining all kinds of data, including the books themselves, in search of the magic formula for evaluating a book's market potential. Through better-informed marketing, publishers hope to reach the right readers.

Machines Think from Data

But what is under the hood of such an AI? The answer is big data and machine learning (ML). Big data doesn't just mean lots of data; it also means that the data comes from many different sources and types (e.g., audio, video, images, and text) that are often unstructured, unlike traditional databases with well-defined fields. ML algorithms are statistical methods that use this multi-type, unstructured data to make predictions, such as which class an item belongs to. This is possible either by knowing ahead of time which classes exist and training the algorithm by example (supervised learning) or by letting the algorithm discover the underlying patterns itself (unsupervised learning).
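The difference between the two learning styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each book is reduced to two made-up features (say, the share of "love" words and the share of "danger" words), the genre labels are hypothetical, and nearest-centroid classification and k-means clustering stand in for the far larger models a publisher would actually use.

```python
def centroid(points):
    """Average a list of equal-length feature tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def distance_sq(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Return the key of the centroid closest to `point`."""
    return min(centroids, key=lambda k: distance_sq(point, centroids[k]))

# Supervised: the classes are known in advance and learned from labeled examples.
# Feature values and genre labels are invented for illustration.
labeled = {
    "romance": [(0.9, 0.1), (0.8, 0.2)],
    "thriller": [(0.1, 0.9), (0.2, 0.8)],
}
class_centroids = {label: centroid(pts) for label, pts in labeled.items()}
print(nearest((0.7, 0.3), class_centroids))  # romance

# Unsupervised: no labels; k-means groups the books on its own.
def k_means(points, seeds, rounds=10):
    centroids = dict(enumerate(seeds))
    for _ in range(rounds):
        groups = {k: [] for k in centroids}
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a group happens to be empty.
        centroids = {k: centroid(g) if g else centroids[k]
                     for k, g in groups.items()}
    return groups

unlabeled = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]
clusters = k_means(unlabeled, seeds=[unlabeled[0], unlabeled[1]])
print(clusters)
```

In the supervised half the algorithm is told what "romance" and "thriller" look like; in the unsupervised half it finds the same two groups without ever seeing a label.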

ML covers several general classes of algorithms. Methods include vector space techniques such as principal component analysis (PCA), k-nearest neighbors (KNN), and support vector machines (SVM); decision-tree-based techniques such as CART and random forests; gradient-based and Bayesian methods; artificial neural networks (ANNs); and others. Many tutorials on machine learning methods are available and can be very helpful.

ANNs were among the first algorithms applied to problems in artificial intelligence, as long ago as the 1940s. For many reasons their use has waxed and waned, but they have seen a resurgence of interest in the form of deep learning. The unprecedented advance of deep learning has led to what The New York Times calls the great awakening, exemplified by Google's ability to translate text into more than 100 languages.

How AI Uncovers Sentiment and Emotions from Text

Imagine automatically extracting the sentiment or emotional impact of a literary work. To understand a text, a task known as natural language processing (NLP), AI algorithms first need a mathematical representation that a machine can work with and that retains as much information about the text as possible. A simple representation called "bag-of-words" is, as the name implies, a collection of the words that appear together in a text, with no other connection between them, from which the frequency of words or word groups can be computed. This may provide enough information for classifying themes, but it fails badly at understanding sentences when word order matters. Two richer representations, Word2Vec and GloVe, encode each word as a vector learned from the words that surround it, and so capture information about context that bag-of-words discards. Tutorials on NLP representations, including one from TensorFlow on Word2Vec, cover these issues in more detail.
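A minimal sketch of the bag-of-words idea, using only Python's standard library, shows both its simplicity and its blind spot. The function name and the example sentences are illustrative choices, not part of any particular NLP toolkit.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Count how often each word occurs, ignoring case, punctuation, and order."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

a = bag_of_words("The dog bit the man.")
b = bag_of_words("The man bit the dog.")

print(a["the"])  # 2
print(a == b)    # True: the two sentences are indistinguishable
```

The two sentences mean very different things, yet their bags are identical, which is exactly why order-blind counts are enough for themes but not for understanding sentences.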

Once sentences are converted to a meaningful representation, a language model is needed that distinguishes positive emotions from negative ones. One method is supervised learning with deep neural networks, as has been done to classify movie reviews. Another is to allow the deep neural network to discover the emotional patterns by itself. This is the true power behind deep learning: its ability to teach itself and, with more data, to learn more!
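The supervised route can be illustrated with something much smaller than a deep neural network. The sketch below swaps in a naive Bayes classifier over word counts, trained on four invented review snippets; the training data, labels, and function names are all assumptions made for the example, and a real system would need thousands of labeled reviews.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train(examples):
    """Collect per-label word counts and label frequencies from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy training set, for illustration only.
TRAINING = [
    ("a wonderful and moving story", "positive"),
    ("I loved the brilliant ending", "positive"),
    ("a dull and boring plot", "negative"),
    ("I hated the terrible ending", "negative"),
]
model = train(TRAINING)
print(predict("a brilliant and wonderful plot", *model))  # positive
print(predict("boring and terrible", *model))             # negative
```

The principle is the same one a deep network follows at far greater scale: labeled examples go in, and a model that scores new text as positive or negative comes out.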

By the end of this process, the ML model has identified the major themes (from the word groupings) and the emotion throughout the text being analyzed. For an AI application that recommends a novel, these factors are the fundamental ingredients.

Recommending the Popular Novel from Big Data

So, how does the AI determine what we want to read? It turns out that we need certain emotional patterns to keep us engaged and interested while reading a novel. Kurt Vonnegut first described the curves of emotional plotlines in 1995. Now, with the help of AI sentiment and emotion analysis, such plotlines can be extracted quantitatively. By combining these plotline curves with the themes found in the text, Jodie Archer and Matthew Jockers from the Stanford Literary Lab claim to be able to detect the next novel everyone wants to read. From creating Animal Farm summaries to discovering who will be the next Danielle Steel, AI is revolutionizing what and how we will read in the future!
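To make "extracted quantitatively" concrete, here is a rough sketch of how an emotional plotline could be computed: score each sentence, then smooth the scores into a curve. It is not the method used by Archer and Jockers; the eight-word lexicon, the sample story, and the window size are hypothetical stand-ins for a trained sentiment model and a full-length novel.

```python
import re

# Hypothetical toy lexicon; real systems use far larger ones or a trained model.
LEXICON = {"happy": 1, "love": 1, "joy": 1, "hope": 1,
           "sad": -1, "fear": -1, "loss": -1, "death": -1}

def sentence_scores(text):
    """Score each sentence as the sum of the lexicon values of its words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z']+", s.lower()))
            for s in sentences]

def moving_average(values, window):
    """Smooth `values` with a trailing window of up to `window` items."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

story = ("They were happy and full of hope. Then came loss and fear. "
         "Death followed. In the end there was love and joy.")
raw = sentence_scores(story)
curve = moving_average(raw, window=2)
print(raw)    # [2, -2, -1, 2]
print(curve)  # [2.0, 0.0, -1.5, 0.5]
```

Plotted against position in the book, the smoothed values trace a fall and a recovery, the kind of shape Vonnegut sketched by hand.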

Carol M. Evenson

Data Security Consultant at Evenson Corporate Consulting
Carol Evenson is a data security consultant specializing in cloud management and process analysis. She currently assists organizations within the continental US and UK.