Everyone is talking about the data scientist. Harvard Business Review called it the “Sexiest Job of the 21st Century”. Friends from across industry and academia have also been using this term, and I have met many real data scientists who are making big differences for their organizations.
Except in the federal space. A near-universal gap exists across all Government Organizations these days. There are no real data scientists. Everyone needs one, and almost no one has one.
An ideal Data Scientist would be someone who understood the core mission and needs of the organization. He would have an advanced degree in computer science. He would be an accomplished statistician. He would understand the legal and political implications of everything that he did. He would know how to access all the old legacy data that is hidden away in the data center, wrapped in formats that are no longer accessible. He would be a visionary who could see what the future needs would be, and he would pick all the right data-recovery tools to enable it. Oh, by the way, he would also be frugal and stretch the budget so that his unfunded requirements could, in fact, be funded.
Why is this position so important? Even though our technology is getting better and better, the gap between meaningful data and the end user is getting bigger! The more money we throw at it, the worse the problem appears to be. Ask any of the Federal CTOs and you will probably get the same answer: trying to get the right information to the right people is getting harder every year. The problem isn’t collecting the information. It’s sifting through it with clarity and precision.
As we apply one solution on top of another solution, is there really anyone left in the organization that fully understands the assumptions that have been made along the way? Can anyone really say, with certitude, that they know HOW the algorithms sorted the data and served up the answers? Only our above-described Data Scientist would have a clear understanding of these important filters on the information, and what the impact is on the end users.
What will fix this? The biggest points of optimism we have seen are technologies that allow a focus on humans. For example, when it comes to infrastructure, foundations based on Cloudera can be configured so that human, mission-focused capabilities like search, data provenance, and rapid iterative query are built in. And tools like Pentaho can be used to ensure all the data is connected and that it is presented to users in very user-friendly ways. (Comprehensive, unified solutions like this that pull together all of today’s technologies will be a huge help.) This focus on users is what will empower agencies. This is not going to turn every analyst into a data scientist. In fact, it might actually remove the need for data scientists in the federal space. Sure, we can have statisticians, and subject matter experts, and computer scientists.
But if tools focus on users, do we really need data scientists?
Here are some more thoughts via friends on Twitter:
Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.
— Josh Wills (@josh_wills) May 3, 2012
“Want to be a data scientist? Learn: algorithms, statistics, linear algebra, SQL, Python, Java, Scala.”
— Andrea Villanes (@AndreaVillanes) April 11, 2014
What Makes the Perfect #DataScientist: http://t.co/WlKrlEZa9j by @Sooraj_Shah > #BigData #Analytics Talent with Domain Expertise
— Kirk Borne (@KirkDBorne) April 11, 2014
If you ever build a Pivot table in Excel, you are permitted to change your title to "Data Scientist"
— . (@cloud_opinion) April 8, 2014
"Actual work in Data Science entails having to speak truth to power (not fun, but the essence of the role)" @pacoid http://t.co/XHhlpZKKez
— dataScienceRetreat (@dataScienceRet) March 31, 2014