Genomics is a critically important discipline in contemporary biomedical research that takes a comprehensive approach to determining and understanding the structure and function of DNA to advance our understanding of life and our ability to make lives better for all. IT professionals can contribute to this domain through bioinformatics.
Bioinformatics, the interdisciplinary field focused on storing, retrieving, organizing and analyzing biological data and knowledge, is an intellectually stimulating topic for technologists. There is always more to learn, and progress in this field can have wide-ranging, incredibly virtuous impact.
We have been reporting on bioinformatics in our Government Big Data Newsletter (subscribe here) since it started three years ago. Some of the first use cases for Big Data we began highlighting were in this field, and the winners of our Government Big Data Solutions Award last year were selected for an incredibly powerful Hadoop-centric bioinformatics research system at the National Cancer Institute. There is so much going on in this field.
More recently, we noted a press release from the National Institutes of Health (NIH) on the topic of Big Data Centers of Excellence. The concept of centers of excellence for Big Data solutions is also catching on in industry (for example, Sears Holdings, the firm that owns Sears, Kmart, and Lands' End, has successfully leveraged its own center of excellence called MetaScale; we are interviewing MetaScale next week). It is great to see this activity, and we are sure it will pay off for the nation and for humanity overall.
We had a chance to talk with one of NIH's thought leaders on these Big Data centers. Dr. Mark Guyer is the deputy director of the NIH's National Human Genome Research Institute (NHGRI), and is involved in the initiation of a new NIH effort in data science, known as Big Data to Knowledge (BD2K). NHGRI is the NIH component that managed the mapping of the human genome and has continued to facilitate breakthrough after breakthrough in this field. The BD2K initiative is, however, much broader than genomics, and aspires to address all data types of interest to all of the components of NIH. In our conversation with Dr. Guyer we asked why the NIH has decided that centers of excellence like this are called for at this point in time, and we sought context that we hope will be of use to technologists seeking to design other large-scale "Big Data" solutions or centers of excellence.
CTOvision: Dr. Guyer, thank you for the time today. We wanted to start by asking a few questions about the recent press release from NIH announcing big data centers of excellence. Can you give us your views on why this initiative is called for at this time?
Dr. Guyer: These centers of excellence are the first of several interrelated actions that are planned under what we call the Big Data to Knowledge or BD2K program. The need for the centers of excellence articulated in the recent release comes about from very significant advances that have occurred over the last decade in biomedical research. For most of the history of biomedical research, the limiting factor was acquiring data to analyze. Now, new technologies have been developed that can generate enormous amounts of data, so data generation is no longer the limiting factor. For example, the ability to generate genomic sequencing data has improved more than a million-fold since the beginning of the Human Genome Project. Across biomedical research, the rate-limiting stage is now how to operate on the data. The goal of BD2K is to enhance the ability to analyze the ever-expanding amount of data of all types, enabling, by the end of this decade, a quantum leap in the ability of the biomedical research enterprise to maximize the value of the growing volume and complexity of biomedical data. The centers of excellence will help further the discipline of data science in the community by helping to develop and disseminate innovative analysis methods, tools and techniques.
CTOvision: Seems like your organization has always analyzed data, and has long been a pioneer in analyzing large quantities of data fast. What has changed that would require centers of excellence?
Dr. Guyer: Your readers may be interested in a 2012 report from the Working Group on Data and Informatics of the NIH's Advisory Committee to the Director, NIH that provides important context and recommendations in this area. The report captures the fact that NIH has long used centers of excellence as approaches to critically important challenges in many areas, and then recommends NIH leverage this historically successful model to enhance innovation in this area.
CTOvision: We have noted several organizations in industry standing up similar activities. For example, Sears Holdings formed a subsidiary specifically to be a center of excellence for Big Data (MetaScale). Cloudera has established a program for teaching organizations how to stand up their own centers of excellence. And Gartner has been reporting on new disciplines around centers of excellence across industry. Does activity from outside government inform your approach to your centers of excellence? How so?
Dr. Guyer: Activities from outside government most definitely inform our approach, and there are many methods for this to occur. Strategically I already mentioned the NIH director's advisory board. Operationally we also leverage tried and true methods like issuing requests for information from the public, including industry (which you can find at our website here). We also regularly hold workshops and other forums to help with information exchange. By the way, we have recorded a number of recent and upcoming workshops and provide those online so others can continue to engage with us even if they can't attend in person. I would also like to underscore that commercial firms can apply for grants in this area, as long as they (like everyone else) play by our rules about open information and understand the requirements, and we encourage any who can add value to apply to our process.
CTOvision: Your mission is critically important to us all and we know these centers of excellence will be focused on supporting your research. But intuitively the knowledge they produce and the lessons they learn will probably be broadly applicable, potentially being very virtuous for organizations elsewhere in government and industry. Will these centers be empowered and allowed to engage across government to share lessons learned?
Dr. Guyer: We definitely see widespread data sharing, including across government, as key to rapid progress. The power of research comes from its results being shared openly and broadly. For example, the rapid pre-publication of the data from the Human Genome Project, including from research centers in government, academia, industry and internationally, was responsible for the rapidity with which this groundbreaking effort was accomplished.
CTOvision: Thank you very much for the time. I know our readers will really appreciate this.
Dr. Guyer: My pleasure.