Challenges and Opportunities in Big Data From Industry and Academia Panel

After government executives announced over $200 million in Big Data initiatives last week, a panel of thought leaders from industry and academia, moderated by New York Times technology and innovation reporter Steve Lohr, discussed the current state and exciting future of Big Data research. The panel helped demonstrate why the research and development initiatives are so important and what they can hope to accomplish.

The first speaker was Alex Szalay, a professor in the Department of Physics and Astronomy at Johns Hopkins University who works with the Sloan Digital Sky Survey (SDSS). The SDSS is an attempt to map the entire northern sky, and has already obtained multi-color 3D images of over 930,000 galaxies and 120,000 quasars, or active, highly energetic galactic nuclei. This information has been made available to researchers and the general public, but like publicly available gene sequencing data, it is overwhelmingly large and difficult to work with. Szalay described some of the advances and possibilities that Big Data analytics bring to astronomy using this deluge of information, such as measuring the resonance frequencies of the universe to paint a picture of the Big Bang. Szalay concluded that, with the rise of Big Data in astronomy and physics, researchers must now dive into computer and data science as early as possible to build a multidisciplinary understanding.

Next was Dr. Daphne Koller of the Stanford AI Lab, who studies machine learning. Dr. Koller spoke about how Stanford offers free online classes to hundreds of thousands of students and how, by mining data such as how they watch lectures and what questions they ask, we can discover interesting new patterns in how people really learn. A better understanding of the learning process can make a tremendous global difference, as research has shown that tutored students perform better than 98% of lectured students. Information from projects such as Stanford's can help provide that personalized experience to more people who don't have the resources or opportunity to be tutored one-on-one, and machine learning can help because data-driven analysis often finds patterns that human researchers miss. One example was machine learning algorithms outperforming pathologists at diagnosing breast cancer cells under a microscope by finding more patterns in a wider range of tissues. Mining the data, however, is only the first step, and often analysis is 75% of the work, highlighting the importance of Big Data analytics.

Dr. James Manyika, a director of the McKinsey Global Institute, presented next on the findings of last year's “Big data: The next frontier for innovation, competition, and productivity” report. Currently, every company of appreciable size, roughly 1,000 employees or more, has at least as much data as the Library of Congress. The market is also full of inefficient structures meant to compensate for an imperfect understanding of data, such as the current model of car insurance, which puts drivers in broad categories rather than looking at granular driving habits. Given the growth of Big Data in industry, the report said that the American workforce would need 1.5 million more data-literate workers and between 140,000 and 190,000 with deep analytics skills, a major gap. Despite all of the economic advantages of harnessing Big Data, Dr. Manyika warned about possible risks, such as losing sight of the deeper meaning behind the data. He feared that, as we get a better understanding of patterns, we will put less effort into discovering their underlying causes.

The final speaker was Dr. Lucia Ohno-Machado, the Founding Chief of UC San Diego's Division of Biomedical Informatics, Director of the Biomedical Research Informatics for Global Health Program, and Editor in Chief of the Journal of the American Medical Informatics Association. Dr. Ohno-Machado noted that patients today give a tremendous amount of data to health care providers and hospitals, data that could be of great use to researchers if assembled together, for example in solving the puzzle behind autism. She proposed that patients could donate data to researchers as they donate tissue or organs, with informed consent. If properly handled and analyzed, such data can bring great advances in knowledge without violating privacy, but, like the other panelists, she stressed the need for more multidisciplinary data scientists in her field.