- 16th April 2018
- Posted by: Manolis
One of my strongest beliefs is that companies that learn to make the most of their data by effectively building, managing, and evolving their data supply chains will gain a lasting competitive advantage. With so much data now available, companies have to treat their data as one of their most valuable assets. These data supply chains must operate as smoothly as any other system or distribution network.
Yet data supply chains present unique challenges. It’s very difficult to get a data supply chain working seamlessly because it must gather data from many sources, distill it into a useful form, and then be able to deliver the specific subset as needed to the business. Data is not one-size-fits-all, so your data supply chain must be as flexible as your data is diverse.
To build the best data supply chains, companies should recognize an asset they already have in their inventory, one they often overlook. Almost every company has a repository that is woefully underutilized as a source of business insights: backups.
Backups don’t just have to sit on a shelf and be pulled in only when other data is lost. In fact, they can drive innovation. How? Well, the whole process of what is now called data protection has become far more sophisticated. In this story, we’re going to use Commvault as an example of how data protection systems have created a central and comprehensive repository of data that can not only serve as a backup, but can also be the foundation for new ways of using data to create value.
In other words, we will explore how a modern data protection platform can help you build and run a data supply chain that supports new types of apps, AI, and data science.
How data protection has become a comprehensive data platform
In the past, data protection was all about backups. We all remember floppy disks and how no great late 80s tech movie could avoid involving some drama about the state of a backup. But for the enterprise writ large, backups have served as a key form of insurance. The whole backup system existed as a worst-case scenario setup, a way to transfer data to a safe place and then restore it if something went wrong.
But we need to expand how we think about backups to catch up with today’s technology. In the modern world, data protection platforms have gone far beyond traditional backups in the following ways.
Creating metadata catalogs. Today, a massive amount of metadata is captured, so companies know much more about where data came from and how it is being used. These catalogs can help companies:
- Analyze data usage
- Understand growth of data
- Track down data
- Observe and monitor data sprawl
- Establish thresholds and institute alerts about capacity limitations
- Use REST APIs to add data to a dynamic index (for example, adding GPS data to an entity such as an asset)
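As a rough illustration of the threshold-and-alert idea in the list above, here is a minimal Python sketch. The `CatalogEntry` record, its fields, and the 80% default threshold are all hypothetical and are not taken from Commvault or any other product; a real catalog would expose far richer metadata.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """One hypothetical metadata record: where data lives and how big it is."""
    source: str          # e.g. a file server or database name (made up here)
    size_gb: float       # space currently used by this source
    capacity_gb: float   # space allotted to this source


def capacity_alerts(entries, threshold=0.8):
    """Return (source, usage ratio) for sources at or above the threshold."""
    alerts = []
    for entry in entries:
        usage = entry.size_gb / entry.capacity_gb
        if usage >= threshold:
            alerts.append((entry.source, round(usage, 2)))
    return alerts


# Illustrative data only.
catalog = [
    CatalogEntry("file-server-01", size_gb=850.0, capacity_gb=1000.0),
    CatalogEntry("crm-database", size_gb=120.0, capacity_gb=500.0),
]
print(capacity_alerts(catalog))
```

With this sample data, only the first source crosses the default threshold. The same loop could feed a growth report if each entry also carried a timestamp.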
Using data crawls. Data protection platforms can also empower companies to crawl their data and create an index of the results, usable by anyone in the business, to find and categorize people, products, locations, and other vital information. Crawls support tasks such as:
- Entity identification and extraction
- Harvesting of data related to a particular analysis or AI use
- Identification of data needed for regulatory compliance
Establishing better search functionality within the data. Data protection platforms can create inverted indexes to make their data more searchable. Commvault’s dynamic index creates such indexes to make searches go faster.
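For readers unfamiliar with the term, an inverted index maps each word to the documents that contain it, so a search looks up words instead of scanning every document. The Python sketch below shows the general technique only; it makes no claim about how Commvault's dynamic index is built, and the document IDs and text are invented.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Illustrative documents only.
documents = {
    "backup-001": "Invoice for customer Acme, product Widget",
    "backup-002": "Support ticket from customer Globex",
    "backup-003": "Acme renewal contract",
}
index = build_inverted_index(documents)
print(sorted(search(index, "customer acme")))
```

The index is built once, and each query afterward costs a few set lookups and intersections, which is why this structure makes searches go faster.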
Serving as a transformation engine. The data within the platform can help to drive innovation across the business, as its accessibility allows users from data science to development to:
- Work with data masking
- Perform live Dev/Tests on cloud data
- Apply appropriate redaction techniques while still being able to use the data when it’s live and relevant
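To make masking and redaction more concrete, here is a small Python sketch under stated assumptions: the record layout, the `ssn` field name, and the `masked.example` domain are hypothetical, and the unsalted hash is for illustration only and would not be strong enough for real pseudonymization.

```python
import hashlib
import re

# Simple email pattern; good enough for a sketch, not a full RFC 5322 parser.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_email(match):
    """Replace an email with a stable pseudonym so the same input always
    maps to the same masked value (useful for joins in Dev/Test data)."""
    digest = hashlib.sha256(match.group(0).lower().encode("utf-8")).hexdigest()
    return f"user_{digest[:8]}@masked.example"


def mask_record(record, fields_to_redact=("ssn",)):
    """Return a copy of the record with sensitive fields redacted
    and any email addresses in string fields masked."""
    masked = {}
    for key, value in record.items():
        if key in fields_to_redact:
            masked[key] = "[REDACTED]"
        elif isinstance(value, str):
            masked[key] = EMAIL_PATTERN.sub(mask_email, value)
        else:
            masked[key] = value
    return masked


# Illustrative record only.
record = {"name": "Ann", "email": "ann@example.com", "ssn": "123-45-6789", "orders": 3}
print(mask_record(record))
```

Redaction removes a value outright, while masking swaps it for a consistent stand-in, so developers and data scientists can still work with realistic data.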
Operating as a workflow engine. Once the platform is fully operationalized, companies can use visual coding and other simplified methods to create workflows that automate and expedite processes. These include standard workflows and processes as well as third-party integrations with platforms such as ticketing systems.
Analyzing the use of data over time. Finally, because of the nature of data protection platforms, users can get multiple viewpoints of the same dataset across time to see what has occurred with it. Such temporal analysis offers valuable insights.
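One simple form of temporal analysis is comparing two point-in-time copies of the same dataset. The Python sketch below assumes each snapshot has already been loaded as a plain dictionary keyed by record ID; the function name and the sample data are hypothetical.

```python
def diff_snapshots(older, newer):
    """Compare two {record_id: value} snapshots taken at different times."""
    added = {k: newer[k] for k in newer.keys() - older.keys()}
    removed = {k: older[k] for k in older.keys() - newer.keys()}
    changed = {
        k: (older[k], newer[k])
        for k in older.keys() & newer.keys()
        if older[k] != newer[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative snapshots of a customer-status table.
monday = {"cust-1": "active", "cust-2": "trial"}
friday = {"cust-1": "active", "cust-2": "paid", "cust-3": "trial"}
print(diff_snapshots(monday, friday))
```

Here the comparison shows one new customer and one status change between the two snapshots. Running the same comparison across a longer series of backups would show how the dataset evolved.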
What these platforms and data lakes have in common
When we look at the capabilities a data protection platform like Commvault offers, we see that it has many properties that people have been striving to gain from data lake projects, such as:
- All important data kept in a repository with a common metadata layer
- Ensuring data is indexed and searchable
- The ability to run transformation jobs to analyze and distill data, and to use a workflow engine to manage execution of such jobs
- API access to data, supporting processing and retrieval
Granted, there are some key aspects of data lakes missing from data protection platforms, such as programming models for creating and running advanced analytics, and the ability to create new engines, such as SQL engines and machine learning technology that run on Hadoop.
But when you include data protection platforms as part of your data infrastructure, you gain a tremendously powerful component in a data supply chain. The platforms might not do everything, but they do a lot, and no one data repository can actually provide companies with everything they need.
Putting a data protection platform to work
Now let’s imagine how applications, AI, and data science can all be made more powerful with a data protection platform. Here’s what these platforms provide.
Understanding what you have. You have a comprehensive view and index of your data. There’s no more guessing about what you have and what’s missing. This can be helpful, for instance, when you’re in an app and want to know everything about a customer, or when you’re doing data science and need context about the data. The platforms provide a metadata repository that aids understanding.
Getting access to all the data. Because they are built to provide data recovery, data protection platforms have all your data. Once you’ve understood there might be something interesting in a particular dataset, the platform can give you direct access to the data itself and not just the metadata. This is a huge advantage, as you can get access to a lot of data that you couldn’t access otherwise. This expedites results, as applications, AI, and data scientists don’t have to wait around for data to be delivered; it’s readily available.
Extracting nuggets. Data protection platforms break through barriers. We all know that some data is harder to find and mine for value than other data. When all your data is consolidated in one place, this ornery data becomes more manageable. For instance, if you want to find all the places in your data where a product or customer was mentioned, you can run a crawl through the platform, retrieve the relevant data, and use it to feed analysis, apps, or AI.
Looking back in time. As I mentioned earlier, the temporal analysis that companies gain from data protection platforms is invaluable. You can see how data is changing over time, monitor key trends, document and track changes, and perform analysis based on this information, allowing you to make better decisions based on historical data.
Performing metadata analytics. The same temporal analysis can also be used on your metadata. Companies can look back at all metadata and understand the changes and relationships between data sets, as well as who has accessed data and when, to get a better sense of which data is most vital to the business.
A backup plan that is anything but
The great thing about a data protection platform is that it is created and updated automatically. Companies still have to work on the data to distill it and put it to use, but with such a platform, you’re starting with an incredibly powerful view of all the important data in your enterprise in one place.
Data protection platforms offer ready access to a vast amount of historical data that can add an untapped dimension to your data supply chain. In my view, app developers, AI experts, and data scientists who have access to a data protection platform will crush those who don’t have access to one.