Big Data = Dropping the Big One?


Eminent network scientist Laszlo Barabasi recently penned an op-ed calling on fellow scientists to spearhead the ethical use of big data. Comparing big data to the atom bomb, Barabasi persuasively argued that the technologies and methodologies he and other social network theorists had created had far outstripped societal controls on their use.

Barabasi’s op-ed is part of a growing backlash against big data technologies and methodologies. While Barabasi and historian of science George Dyson have the historical perspective, technical insight, and scientific stature to write insightfully about the problems of pervasive data collection and algorithms that structure human decisions, other criticisms have been less than edifying. Frustrated Harvard Business Review blogger Andrew McAfee recently called on pundits to “stop sounding ignorant about big data.” Big data, McAfee points out, is held to unrealistic standards and is often the victim of strawmanning. Critics expect big data to eliminate uncertainty (spoiler: it doesn’t), overestimate the power of qualitative thinking, make broad criticisms against quantification itself, and overestimate the willingness of big data advocates to automate important decisions. Listening to some critics talk, you’d think that Palantir or Recorded Future = Skynet.

While insightful in many respects, Barabasi’s op-ed also fails to fully investigate the real implications of his Hadoop ~ ICBM analogy. First, many scientists sought to influence the use of nuclear weapons, understandably believing themselves the best informed about the dangers they posed. However, even the most effective of their well-meaning efforts were superseded by Cold War politics. It is within the American political system, teetering between fear of terrorism, fear of big government, love of capitalism, and fear of capitalism, that big data’s societal impact will be decided. And if the rising tide of anti-science sentiment is any indication, politicians couldn’t care less about science or the men and women who practice it. Scientists are no longer viewed as unimpeachable figures of authority, and it’s doubtful they ever really were in predictably populist America.

Second, if big data is a weapon of mass destruction, you aren’t going to see Hans Blix suddenly busting down the doors of startups for snap inspections of Apache software or NoSQL. The only things inherently more “dual use” than offensive cyber tools are big data technologies and methodologies. They are quickly becoming an integral part of modern business, academic research, and intelligence practice. Barabasi and others are correct that in a world in which the individual is more vulnerable than ever to government and corporate usage of data science, we arguably should try to mitigate current and potential harm. The problem with analogizing data to nukes (besides the fact that Google never destroyed a Japanese city) is that nuclear weapons are clumsy weapons of last resort that even bitter enemies had a stake in controlling, while data technologies are ubiquitous aspects of modern life.

While Barabasi and others may have pioneered the techniques industry and government demand, big data has long since ceased to be a purely academic endeavor. The men and women who use these techniques mostly aren’t scientists. Big data is heavily driven by corporate and government needs. Even the most talented PhDs often leave the academy to pursue higher salaries and greater freedom in the corporate world. Perhaps the best big data analogy is not to the atomic science of Einstein or Oppenheimer, but to the mathematics of Newton, Leibniz, and Fourier. Were they alive today, even these eminent scientists would be powerless to prevent their mathematics from being used for military operations research on how to kill more efficiently or from being fed into faulty, investor-bankrupting financial models. A Taylor series or a differential equation, once out in the wild, belongs to anyone with a pen, paper, and calculator. Likewise, with open-source tools like the Python machine learning library scikit-learn, anyone with the requisite technical training can use some canonical data science techniques.

Big data is certainly both marvelous and terrifying. It offers the opportunity to make money, make new scientific discoveries, and enhance political endeavors from development to national security. It also puts the individual at the mercy of companies and governments. But at the end of the day it is “neither an atomic bomb nor a holy grail.” It should neither be held to unrealistic standards nor feared as a weapon of mass destruction. And everyone who cares about the ethics of data, from the scientist to the layperson, must understand that control over its use is a function of the messy and dysfunctional domestic political scene and the anarchic international system.

What do you think?
