- 16th April 2018
- Posted by: Manolis
DJ Patil, the first U.S. Chief Data Scientist, has joined the Advisory Board of DataScience.com, an enterprise platform designed to foster scalable collaboration between business, IT and data science teams. According to William Merchan, Chief Strategy Officer of DataScience.com, “He will be an active advisor to our product and engineering teams, and with his experience at LinkedIn, Greylock and The White House, he really has a pulse on the landscape.”
But beyond the data-powered advances in artificial intelligence and machine learning lies an important challenge for both scaling startups and modernizing enterprise companies: what good is data if you can’t find it, deploy it, or make it meaningful for your customers?
I asked Mr. Patil the questions that a CTO, CIO or VP of Engineering would want answered. But, as Mr. Merchan says, “DataScience is part of a trend toward internal marketplaces of knowledge,” so these issues are also important for non-technical executives and other business leaders to understand. Our conversation centered on four key themes that leaders should know about when building a sustainable data science program.
Forget the data scientist ‘shortage’ — focus on building the data science function.
“First, data science in business is a relatively new phenomenon. Everyone wants to hire a data scientist because there’s the potential of huge returns. Everyone wants to be a data scientist because the pay is good and the job opportunities appear to be endless. There are a lot of industries where the ability to use data well is going to change lives; I’m a big proponent of big data in healthcare, for instance.”
Patil continues, “But in this moment, most companies should be asking, ‘how do we do more with the resources we have?’ Data scientists are often being underutilized; their work sits on a laptop unused, or a business stakeholder doesn’t understand it, or it’s not getting shared with the rest of the team to inform future projects. We need the processes to improve. Now more than ever before, there are technologies that focus on making data science a central business function rather than something executives just talk about.”
The key driver for success is how well companies integrate data science and engineering teams.
The heart of the matter is how to integrate data science and engineering workflows. Both data scientists and engineers are highly technical, but each group’s tools are distinctive and valuable in their own ways.
“So many companies are struggling to maximize their return on their data science investment, and the divide between software engineering and data science is at the heart of that problem. If your company can’t get the work of data scientists — like the predictive models that power product recommendation engines or credit approvals — into production in a timely manner, you’re losing opportunities to improve your business, and probably quite a bit of money.”
“It’s a relatively new space, but the data science platform market is directly addressing this issue. When you’ve got data scientists who primarily code in Python and R, and use open source tools like scikit-learn, and then you ask them to hand off the models they’ve built to software engineers who write code in Java, and adhere to a waterfall model for testing and rollout, the time between hand off and implementation could be months. These two groups are never going to be perfectly aligned, but a platform that allows data scientists to deploy a model behind a REST API so an engineer can place that API code where it’s needed — without rewriting or refactoring — can be a huge time saver.”
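To make that hand-off more concrete, here is a minimal sketch in Python of what "a model behind a REST API" can look like. It is an illustration only, not how DataScience.com or any particular platform does it. The coefficients, the feature names (`income`, `debt_ratio`), the `/predict` path and the helper names are all hypothetical, and the sketch uses only the Python standard library in place of a real trained model and a production web framework.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical coefficients standing in for a model trained elsewhere
# (for example in scikit-learn). The numbers are illustrative only.
WEIGHTS = {"income": 0.00004, "debt_ratio": -3.0}
BIAS = -1.0


def predict(features):
    """Return an approval probability from a simple logistic model."""
    score = BIAS + sum(WEIGHTS[name] * float(features[name]) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))


class PredictHandler(BaseHTTPRequestHandler):
    """Answers POST /predict with a JSON body containing the model output."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "unknown endpoint")
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            features = json.loads(self.rfile.read(length))
            result = {"approval_probability": predict(features)}
        except (ValueError, KeyError, TypeError, OverflowError):
            # Malformed JSON, missing features, or unusable values.
            self.send_error(400, "expected JSON with keys: income, debt_ratio")
            return
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet; a real service would log requests


def make_server(port=0):
    """Build a server bound to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)
```

Calling `make_server(8000).serve_forever()` would start the service locally. An engineer working in Java or any other language then only needs to send a JSON POST request to `/predict`, so the model code itself never has to be rewritten.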
Tools solve part of the challenge, but shifting to good software engineering habits and practices is quite another matter. “Every data science team needs to be committing code to a repository on a regular basis. There’s a reason software engineers strive for continuous integration: It prevents the bugs and merge conflicts that arise when an engineer waits too long to commit new code to the master branch. If you have multiple data scientists working on a single project and they aren’t syncing changes, that’s going to slow down the entire process of deploying a finished product. You need a workflow where it’s easy to track changes and update code. A software platform that integrates with data science tools and the version control solutions software engineers already use every day, like GitHub or Bitbucket, is one possible solution.”
Use interactive tools to integrate data science teams with non-technical teams.
Integrating teams is going to be a big challenge, since it means upending existing workflows between different stakeholders. We know that people in organizations often resist change, and data science teams will have to make the case to other parts of the enterprise. I asked Mr. Patil how data scientists can present information more persuasively to stakeholders, especially those who are not subject matter experts.
“I think we’re hopefully moving away from a world where data science results live in PowerPoint presentations. The amazing thing about putting data science work into production is that the results of that work become dynamic. Why give a stakeholder a static data table when you can give him or her an interactive application or a tool they can use as needed? There are plenty of tools that make it easy for a data scientist to do this — for example, Shiny is an open source package that allows data scientists coding in R to build things like web apps without any development skills. With a web app, a non-technical user can get the information he or she requires by toggling controls or filling out fields. Models built by data scientists are also being integrated with the dashboards non-technical teams already use, like call center software or marketing automation systems.”
That’s an attractive scenario. However, I’ve encountered many companies and government agencies where the data is riddled with errors, or isn’t normalized, especially if the company grew through M&A. “If there are flaws in your source data, however, you’re always going to have a problem.” He continues, “Luckily, as the availability of data increases and companies get smarter about collecting, storing, and managing it, it will be easier for data science teams to fill gaps in their analyses.”
Don’t believe the hype — humans will be necessary for a long time.
Finally, we circled back to where we began: the headline-grabbing hype around artificial intelligence and machine learning. Having seen many hyped technologies come and go, I wondered what he thought was overhyped about data science.
“One of the predictions that keeps getting bandied around lately is that data scientists are going to automate themselves out of a job — or automate everyone else’s jobs. That simply isn’t true; if you look at cutting-edge applications of artificial intelligence at some of the world’s biggest companies, like Google or Facebook, it’s very clear that humans still need to be in the driver’s seat and in supportive roles. For example, Facebook is hiring 3,000 workers in the next year to do the things its AI-powered content filtering system can’t do.”
“On the data science side, in order to learn, AI systems need to be fed huge amounts of data by humans who understand context. In order for Google’s DeepMind to beat a human player in Go last year, researchers had to give the system 30 million game moves from human players and then pit the system against itself to generate moves capable of beating a human. What I’m saying is, we’re still nowhere near unleashing AI that creates, trains, and monitors itself with no oversight.”
“Instead, I think what we’ll see are advances in AI and machine learning that give data scientists the tools they need to do more, faster. That’s going to be a great value proposition for companies trying to bring data science into everything they do.”
DataScience.com is one of those startups that grew out of recognizing an internal need. They lacked enterprise-scale tooling that could serve business, data science and engineering stakeholders alike, so they built a company.
• Is building or buying a data science platform on your 2018 technical roadmap?
• If you are a leader considering where to make your first set of investments, do these challenges resonate with you?
• What approaches are you considering?