Until recently, data science was a mostly academic pursuit and the subject of papers rather than practice. Over time, data science became an applied science with data scientists being paired with data engineers to develop production systems. We are now entering a new phase where much of the work being performed by data scientists (hyperparameter optimization, algorithm selection, etc.) is becoming automated.

The rise of APIs
Somewhat ironically, in this age of automation, the question has arisen whether human beings will remain part of the process of analyzing data and uncovering insights from it, or whether machines will carry out that task themselves.

In fact, the very work that data scientists have been called on to develop is today helping to replace them. Automation is removing the need for developers to be paired with traditional data scientists. The vehicle accelerating this transition is the API, or application programming interface, the mechanism by which different software platforms talk to each other.

Several companies have already launched automated machine learning services that offer data science through an API, such as Amazon SageMaker and Google Cloud AutoML.

Melvin Greer, chief data scientist for the Americas at Intel, has observed that data scientists spend about 85% of their time doing prep work. Solutions like Amazon SageMaker or Google Cloud AutoML radically change the role of the data scientist by taking on the heavy lifting required to build, train, and deploy machine learning (ML) models quickly. They make these capabilities simpler and allow developers with limited machine learning expertise to implement high-quality models specific to their company’s business needs.

The role of humans
For the last few years, we have been reading news stories about the data scientist shortage. In reality, those headlines might be overblown. It’s true that there’s great demand for this talent, and students are actively pursuing the field in the hopes of capitalizing on a lucrative career. But as APIs grow more sophisticated, fewer businesses will need to rely on (and want to pay for) the traditional (and expensive) data scientist skillset.

That role, the kind with a Ph.D., is starting to morph because anyone with access to an API can take on tasks that were once handled exclusively by data scientists and attain the same results. A data engineer who builds the data pipeline will no longer need to work with data scientists; they will just need access to an API. While the job of the data scientist won’t become obsolete, this level of automation will allow them to focus on higher-value and less technical work, such as helping companies identify new opportunities to grow the business, defining business problems, and figuring out how to make the best use of the data they have. This move to automation will also likely prompt data scientists to return to academia and to academic research initiatives.

The future of data science: Machine scale vs. human scale
With the arrival of the big data era, the need for employees who could work with new tools and larger scales of data became paramount. And while companies have ramped up, they have come to realize that the increasing demand for data, the amount of data being generated, and the ability to achieve strategic outcomes with that data all surpass human scale.

The commodification of the data science layer, however, has now moved the battle to the data layer itself. Businesses are now looking for ways to access unique data to feed into their machine-learning or artificial-intelligence systems.

As these systems become more robust, humans will rely on them more. Meanwhile, as we are already seeing, businesses are recasting themselves as data companies: a company like Allstate is no longer characterized as an insurer but rather as a “customer-centric data company.” The motivation is there for many more companies to become data-driven, and given the sheer volume of data, automation of the capabilities of data scientists is inevitable.

Gartner predicted that by 2020, more than 40% of data science tasks would be automated. We aren’t quite there yet. But as businesses make the leap from big data to AI and automation becomes increasingly sophisticated (every major cloud vendor is already investing in some type of AutoML initiative), fewer organizations will need the traditional data scientist, and data engineers will be able to harness the power of a Ph.D. through APIs.