Computers and History: Beyond Science Fiction

A recent BBC article asks the provocative question: can computers replace historians?

But here is the biggest claim so far – crunching through the big data of history can help us spot patterns and work out where the world is heading next. That is what Kalev Leetaru, a data scientist at Washington’s Georgetown University, believes may be possible. Using a tool called Google Big Query, designed for interrogating vast collections of data, he has been sifting through a database of events stretching back to 1979. This is GDELT, which has collected media reports of events from innumerable sources in more than 100 languages for 35 years. “What we did here,” Leetaru explains, “was use this tool to shove in a quarter of a billion records and use this massive piece of software to just in a few minutes sift out the patterns in this data.” What he says he found was complex patterns of events repeating themselves over the years. He has looked at recent events in Egypt, in Ukraine, in Lebanon and tried to draw common patterns. …

This project does sound like science fiction and indeed Leetaru talks of Isaac Asimov’s writings about psychohistory as an inspiration. But the idea that computing power and artificial intelligence can now start replacing some intellectual disciplines as well as routine physical work will no doubt be greeted with anxiety as well as scepticism.

Skepticism arguably ought to be the default response, as the article is premised on a fundamental misunderstanding of what history is. We can see why when we look at the proof-of-concept post the BBC piece links to, titled “Beyond Psychohistory.” For those unfamiliar with Isaac Asimov’s science fiction novels, psychohistory is a method used by future societies in Asimov’s Foundation series to analyze history with the mechanical precision of classical physics. All science fiction, however, is a creature of the science of its time, and the science of the 1940s that animated Asimov’s ideas has since been superseded by new discoveries. Peter Turchin, a biologist turned historian, summarizes the defects of psychohistory in a recent post:

Asimov wrote Foundation in the 1940s – way before the discovery of what we now call ‘mathematical chaos.’ In Asimov’s book, Hari Seldon and psychohistorians develop mathematical methods to make very precise predictions years and decades in advance. Due to discoveries made in the 1970s and 80s we know that this is impossible.

In Asimov books Psychohistory, quite appropriately, deals not with individuals, but with huge conglomerates of them. It basically adopts a ‘thermodynamic’ approach, in which no attempt is made to follow the erratic trajectories of individual molecules (human beings), but instead models averages of billions of molecules. This is in many ways similar to the ideas of Leo Tolstoy, and indeed to cliodynamics, which also deals with large collectives of individuals.

What Asimov did not know is that even when you can ignore such things as individual free will, you still run against very strict limits to predictability. When components of a dynamical system interact nonlinearly, the resulting dynamics can become effectively unpredictable, even if they are entirely deterministic. For complex systems like human societies this possibility becomes a virtual certainty: they are complex and nonlinear enough and, therefore, must behave chaotically and unpredictably. This is, by the way, why weather cannot be predicted more than a few days in advance (and in Connecticut, where I live, not even a day in advance).
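Turchin's point about systems that are fully deterministic yet effectively unpredictable can be illustrated with a toy example. The sketch below uses the logistic map, a standard textbook case of chaos; it is not a model of any society, and it is not something Turchin or Leetaru uses. The function names, the growth parameter r = 4.0, the starting value, and the size of the perturbation are all assumptions chosen for illustration.

```python
def logistic_trajectory(x0, r=4.0, steps=100):
    """Iterate x_{n+1} = r * x_n * (1 - x_n) and return every state visited.

    The rule is completely deterministic: no randomness is involved.
    """
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def divergence(x0, eps=1e-10, r=4.0, steps=100):
    """Return the step-by-step gap between two runs that start eps apart."""
    a = logistic_trajectory(x0, r, steps)
    b = logistic_trajectory(x0 + eps, r, steps)
    return [abs(u - v) for u, v in zip(a, b)]


if __name__ == "__main__":
    gaps = divergence(0.2)
    # Print the gap every ten steps to watch a tiny measurement
    # error grow until the two runs have nothing in common.
    for step in range(0, len(gaps), 10):
        print(f"step {step:3d}: gap = {gaps[step]:.3e}")
```

Running the same starting value twice gives identical trajectories, which is the determinism. Shifting the starting value by one part in ten billion eventually produces a completely different trajectory, which is the limit on prediction: any forecast is only as good as a measurement of the present that can never be exact.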

Analytic predictioneers such as Jay Ulfelder do not think about societies in terms of what Leetaru dubs “psychohistorical equations governing all of human life.” Asimov assumed a certain scheme of aggregation (the “thermodynamic” approach) that does not accurately model the connections between components and subcomponents in the system. See, for example, the way Ulfelder looks at political risk scenarios in China. China is not simply the sum of all Chinese, but rather a system of coupled and interactive components. Furthermore, Turchin suggests that Asimov, as a fiction writer, grew uncomfortable with the implications of the Seldon Plan and psychohistory for human free will. Thus, Asimov added devices like The Mule, a being that could negate the powers of psychohistory by derailing the future that Seldon predicted. As it turns out, we are all Mules.

Consider the “golden triangle” that artificial intelligence and cognitive science researcher Ron Sun defines in his book on cognition, computers, and society. It offers an example of why the psychohistorical perspective is not particularly useful for analysts. First, humans have needs, which arise prior to strategic behavior. Second, we use conscious thought to realize those needs in our environments. Third, our actions change the underlying environment, which has both social components (other humans) and physical ones (the artificial and natural systems that humans inhabit). There are multiple possible ways of conceptualizing the links between the three elements of this triangle.

If an analytic tool really did exist that could model humans with all of the determinism suggested by the “thermodynamic” conception of history in Asimov’s books, we would live in a dramatically different world, one perhaps devoid of many of the things we like to think make us human. This doesn’t mean we ought to simply give up on seeking prediction and causal regularities. But it does mean that we ought to look to science, not science fiction, when we build tools and market them to industry, government, and academic users. Part of a scientific mindset is also the recognition that analysis, contrary to the BBC article’s title, is not solely an engineering process.

Take, for example, FiveThirtyEight.com’s woes applying GDELT to Nigerian kidnappings. As Caerus Analytics chief Erin Simpson noted, the context of the data matters: “[l]earn the data generating process. Learn the coding rules. Match it against some real world reporting. THEN publish #GDELT.” Ulfelder had argued much earlier that both the data generating process and the data itself had significant problems and uncertainties. Hmm… understanding how data is generated and categorized? Understanding the significant issues with the data itself? Sounds like a job for some of those historians that the BBC thinks computers might put out of business. :)