Artificial Intelligence (AI) continues to become a focus for many enterprises, and these organizations are increasingly realizing how important it is to have the right people and skills in place. In particular, there has been a significant increase in demand for data scientists. Companies are searching and competing for increasingly scarce data scientists as the talent gap for these skills continues to widen. This raises the question: are data scientists always needed by these organizations? Many companies confuse the roles of data scientist and data engineer. While these roles share some traits and skills, at their core they require two very different skill sets that are not easily interchangeable.
Data Scientists vs Data Engineers
About half a decade ago, we saw the emergence of the Data Scientist position. Data scientists traditionally have backgrounds in advanced math and statistics, advanced analytics, and increasingly machine learning / AI. The focus of data scientists is to extract useful insights from huge piles of data. Data scientists have usually learned programming out of necessity in order to run advanced analysis on data. As a result, the code that data scientists are usually tasked to write is minimal (R is a common language for them to use), and they work best when provided clean data to run advanced analytics on. A data scientist is a scientist who creates hypotheses, runs tests and analyses on the data, and then translates the results so that someone else in the organization can easily view and understand them.
Data engineers traditionally have a programming and technology background, and have previously been involved with data integration, middleware, and extract-transform-load (ETL) operations. The data engineer’s core skills are focused around big data and distributed systems. Programming is a core function of their job, in languages such as Java, Python, and Scala. Data engineers are given data from a wide range of systems in structured and unstructured formats and need to use their programming, integration, architecture, and systems skills to clean all the data and put it into a format and system that data scientists can then use. In this way, a data engineer is an engineer who designs, builds and arranges data.
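To make the data engineer's cleaning work concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two source systems, the field names, the sample values, and the rule of dropping rows with missing revenue are hypothetical, and real pipelines typically run on distributed systems rather than a single script.

```python
import csv
import io
import json

# Hypothetical raw inputs: a CSV export and a JSON feed from two different systems.
CSV_EXPORT = """customer_id,region,revenue
101, east ,1200.50
102,WEST,
103,East,980
"""

JSON_FEED = '[{"id": "104", "region": "west", "revenue": "450.25"}]'


def clean_record(customer_id, region, revenue):
    """Normalize one record; return None when revenue is missing."""
    revenue = str(revenue).strip()
    if not revenue:
        return None  # assumed rule: drop rows the analysis cannot use
    return {
        "customer_id": int(str(customer_id).strip()),
        "region": str(region).strip().lower(),
        "revenue": float(revenue),
    }


def extract_and_clean(csv_text, json_text):
    """Merge both sources into one uniform list of records."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append(clean_record(row["customer_id"], row["region"], row["revenue"]))
    for item in json.loads(json_text):
        rows.append(clean_record(item["id"], item["region"], item["revenue"]))
    return [r for r in rows if r is not None]


clean_rows = extract_and_clean(CSV_EXPORT, JSON_FEED)
```

The resulting `clean_rows` list has one consistent shape regardless of which system a record came from, which is the kind of clean input a data scientist can run analyses on directly.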
Where Does the Data Scientist fit in your Organization?
Many organizations need both data scientist and data engineer roles if they are trying to address problems that require data science solutions. However, the ratio between the two is rarely 1:1. For most organizations, it makes sense to have more data engineers than data scientists. The reason is that it simply takes more work to move and clean data than it does to conceptualize data models and run analyses against the data sets.
The organizational reporting structure for the data scientist is incorrect in most organizations. Data scientist roles frequently report to the technical team. However, this doesn’t make sense. The data scientist usually isn’t asking technology-specific or implementation-specific questions of the data. Often the challenges the data scientist is facing are line-of-business specific. As such, the data scientist should report to the strategic decision-making parts of the business that represent the specific lines of business that the data scientist is assisting.
If Data Scientists are Business-Centric Roles, Will we see Business-centric Tools for Data Scientists?
If we truly see data science and data engineering as separate roles in the organization, then it makes sense to think of the tools they need as separate as well. Rather than engineering and programming-centric tools, data scientists need data science-centric tools. Right now there’s a growing collection of these tools, often emerging from data or predictive analytics environments, that suit the needs of data scientists. However, it’s possible that even more business-centric tools might be appropriate, especially as data scientists become more embedded with the line of business. For example, decades ago, operating on large volumes of data in a spreadsheet-like format involved programming, but tools like Excel introduced features like pivot tables, and now business managers are going hog-wild with all sorts of analyses. And, as the talent gap for data scientists continues to widen, there is no doubt that we will see new tools created out of necessity to allow non-technical (read: business) people to run, test, and analyze data. Strategic business managers will begin to learn data science without needing or wanting programming or data integration experience. Traditional data scientists will still be needed to run very complex analyses of data. But for the most part, basic analysis will fall more to the business unit due to increasingly easy-to-use tools.
To read more about this topic and other subjects we cover, go to https://www.cognilytica.com/. We regularly write in detail about topics related to artificial intelligence.
As a master facilitator and connector who is well connected in the technology industry, Kathleen regularly meets with innovators in key markets and gets the opportunity to see the latest technologies from game-changing companies. You can learn more about her firm at Cognilytica and find her on Twitter at @kath0134.