- 16th April 2018
- Posted by: Manolis
The field of data science is fairly young and evolving extremely rapidly. Finding people who can harness the tornado of big data tech is a major challenge. One of the up-and-coming vendors making data science more accessible is Domino Data Lab.
Datanami recently talked with Nick Elprin, the co-founder and CEO of Domino Data Lab, a data science software company based in San Francisco. Here is an edited transcript of the conversation.
Datanami: Hi Nick, thanks for talking with us today. Please start out by telling us about Domino Data Lab, and your vision of what a data science platform should be.
Nick Elprin: We launched our product at the beginning of 2014, so we’ve been in the market for almost four years. When we launched, we didn’t see other companies or other products talking about the problems we were solving. About two years ago we saw more competitors coming into the space.
From our perspective, we’re seeing the market mature. What we’re seeing with the companies we’re working with is a shift away from being focused on individual data scientists. It used to be, OK we’ve got to hire data scientists, we’ve got to make sure the data scientists have access to great tools and algorithms. Now it’s much more about, as an organization, how do we create more scalability, repeatability, more industrialization of our process for doing data science so that we can scale, so we can add more people and get commensurately more impact.
We see the transition being like what software engineering went through in the 80s and 90s, where it really matured as a discipline. That’s the whole thing we’re trying to facilitate.
DN: How do your customers even know that what they’re looking for is a data science platform?
NE: When we first launched, we had fairly visionary customers, classic early adopters, reach out to us. They told us they were encountering the problems that we address, and they hadn’t seen anybody solve them yet. They would look online, do Google searches, and find us.
In the beginning, they didn’t even necessarily have the words to describe the problems they had. But they would say things like “We need ways to increase collaboration among data scientists” or “We need ways to experiment and develop models faster” or “As an organization, we’re having a hard time scaling because our data scientists are re-inventing the wheel instead of building upon each other’s past work,” things like that.
At the beginning of this year, Gartner did a Magic Quadrant on data science platforms, and that really represented a shift in how mainstream this is becoming. After Gartner called us a visionary in that Magic Quadrant, more organizations knew about us and found us. So they’re reaching out to us proactively to say, we need something like what you guys are doing.
DN: What is Domino Data Lab’s vision for a data science platform?
NE: If you are a business and you want to actually take advantage of building data science models, building predictive models to automate parts of your business to do product recommendations, set pricing, optimize your supply chain – it’s very fertile ground for ways to use these new techniques.
If you want to do that, fundamentally you need two things: You need a way to enable a team of data scientists to rapidly build and experiment and iterate to develop these models. And then you also need to deploy them, to operationalize them, to productionalize them, so they’re actually impacting business decisions.
DN: How would you compare today’s data science platforms to the BI tools from the 90s and 00s?
NE: They (data science platforms) are the center, the hub, the nexus, where everything related to being a model-driven business or using predictions and using models to drive your organization – where all of that happens.
With business intelligence, you had analysts and executives looking at dashboards. But the best you could do was to present information to a human for that human to make a decision. And what data science is allowing is automated decision making. But for that to work, these models that are making the recommendations, making the predictions, have to be integrated into the existing technology systems making decisions.
If you’re an insurance company and have a new way of pricing a policy for someone who applies for car insurance, that model using all the great machine learning techniques has to be integrated into the system that generates quotes and sets prices and assesses risk.
So our vision of a data science platform is the system that allows data scientists to rapidly develop and experiment and build great new models. And it’s also, in the same place, easy ways to deploy and operationalize those models so the business can very rapidly integrate these new more powerful capabilities into their business.
DN: We have a ton of great technology and a plethora of machine learning frameworks. So what’s the stumbling block for data science? Is it stringing it all together?
NE: That’s exactly our point of view, which is the real problem is not the absence of a modeling framework. Microsoft has one and Amazon has one and Google has some and there are other product companies like H2O that make these great algorithms that you can download that are open source.
The big challenge is an organizational challenge. If you as a business want to accelerate your rate of testing ideas and developing more models, and you just hire a bunch more people, those people don’t have a system they can work in that allows them to effectively scale, to collaborate, to reuse past work rather than reinventing the wheel.
I mentioned software engineering earlier. There’s this famous book from the 1970s, “The Mythical Man-Month.” It was all about how you can’t just add software engineers to a project and expect it to linearly go faster. You need ways of orchestrating what do people do, how do projects get done, what is a mature or disciplined way of developing software. And that led to the notion of the software development lifecycle and that evolved into agile.
Our view is that data science is going through a similar kind of maturation process. It is not just about hiring a ton of data scientists. It’s about how do we give them a place and a way to work that actually lets us scale that team and that capability.
DN: Do you have dependencies to underlying frameworks? Or do you let data scientists work at a higher abstracted level?
NE: That’s kind of the beauty of the approach we’ve taken, and we think we’re the only one on the market who does this. We’ve built what we describe as a truly open platform. What we mean by that is as any of those new modeling frameworks come out, if Google is releasing a new edition of TensorFlow or whatever they’re doing, we’ve built our platform in such a way that a data scientist can use those things within our environment without us needing to change anything. We’re totally extensible, open, and tool and language agnostic.
That’s beneficial for a couple of reasons. One is it makes it easy to adopt our platform. Data scientists don’t have to learn a modeling framework. They can keep using the ones they’re already using. It also future-proofs our platform, so the companies who use us don’t have to worry: if a new innovative thing comes out next year, they can onboard that and use that as well and they won’t be stuck in a legacy tool.
DN: What specifically does your data platform do?
NE: Our product does three things. The first one is it provides a workbench for data scientists. It gives them a place to more easily run experiments, test ideas, track their results, compare which results are better and worse so they can know if they’re making progress. Basically it lets them accelerate experimentation, and that’s key to doing research faster and building models faster.
The second thing that we do is we have a way to productionalize or deploy or operationalize those models that get built in our workbench. We will automate a lot of the work required to package up the model and make it a production-grade API or service that can be integrated into business systems.
The third piece we provide is a collaboration hub. So all the work being done by individuals and teams across the whole organization is tracked, collaborative, sharable, reproducible. And that allows the organization to scale so that people aren’t re-inventing the wheel. Think of it like GitHub for data science.