- 16th April 2018
- Posted by: Manolis
Is Big Data a big threat to personal privacy? That is an easy conclusion to draw if you follow headlines about privacy breaches caused by data leaks. But if you look beneath the surface, you see that the picture is more complicated. Data itself is not the problem; poorly managed data is. To protect both your Big Data and privacy, proper data management is critical.
Big Data and Privacy in the News
Recent reports of hacks against organizations like Yahoo!, Los Angeles County and even the Navy may give consumers pause. They suggest that organizations that collect Big Data about their customers or constituents are easy targets for attackers seeking to steal and exploit private data.
If the data stored by tech giants and government agencies isn’t secured, is there any hope at all for the privacy of data collected by smaller organizations that lack the resources of these most recent cyberattack victims?
Data Is Not the Problem
The simplistic reaction to attacks like the ones described above is to assume that no data can be kept safe, and the only way to protect privacy is to avoid Big Data altogether.
But that would be like refusing to ride in cars because they sometimes have accidents, or never going outside because you might be allergic to bees.
When attacks like the ones above happen, data is not the cause or the problem. The real issue around Big Data and privacy is poor management of data.
Data Privacy and Security Challenges
In order to use data securely and protect privacy, you need to understand the privacy challenges associated with Big Data.
When you collect and analyze Big Data, you create a number of attack vectors that exist at different levels of the infrastructure that stores and processes the data. The vectors include:
- The physical and/or virtual servers that host the data.
- The databases in which the data lives permanently.
- The RAM in which the data is held while it is being processed rather than at rest.
- The network over which the data is transferred.
- The data analytics platform you use, like Hadoop or Spark.
- The user accounts that exist to work with the data.
If an attacker is able to break past the defenses that protect any one of these potential attack vectors, private data could be compromised.
Responsible Data Management is Key to Combining Big Data and Privacy
That’s why it’s so important to manage data carefully. With so many layers involved in data storage and analytics, a well-organized data management strategy is essential for making sure that data moves around efficiently and with minimal risk.
Security monitoring and analytics tools are an important resource for helping to secure data. But they’re only part of the solution. Security tools on their own aren’t a guarantee against privacy problems.
To mitigate the risk of a breach, you need to keep your Big Data operations as lean and efficient as possible. Data integration between different systems and formats should happen automatically to minimize the chance that a human error creates an opening for an attack. To keep your attack surface small, you should also minimize the number of data analytics tools that you use.
This requires a data management strategy that automatically converts data from the format in which it originates into a format that a solution like Hadoop or Spark can ingest. You should also strive to deliver data analytics results in real time so that you can discard data once it is no longer relevant, thereby minimizing the risk of the data being leaked.
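As a rough illustration of those two ideas, here is a minimal Python sketch. It assumes the source data arrives as CSV with a header row and that each record carries a `collected_at` timestamp; both the field names and the seven-day retention window are hypothetical choices for the example, not a recommendation. The first function converts CSV into JSON Lines, a line-delimited format that tools such as Spark can read, and the second drops records older than the retention window.

```python
import csv
import io
import json
from datetime import datetime, timedelta, timezone


def csv_to_json_lines(csv_text: str) -> str:
    """Convert CSV text (with a header row) into JSON Lines, one object per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row, sort_keys=True) for row in reader)


def purge_expired(records, now, max_age):
    """Keep only records whose 'collected_at' timestamp is within max_age of now."""
    cutoff = now - max_age
    return [
        r for r in records
        if datetime.fromisoformat(r["collected_at"]) >= cutoff
    ]


# Hypothetical sample data: two records collected on different days.
raw = (
    "user_id,collected_at\n"
    "1,2018-04-01T00:00:00+00:00\n"
    "2,2018-04-15T00:00:00+00:00\n"
)

# Step 1: automatic format conversion, with no manual editing involved.
records = [json.loads(line) for line in csv_to_json_lines(raw).splitlines()]

# Step 2: discard anything older than the (assumed) seven-day retention window.
now = datetime(2018, 4, 16, tzinfo=timezone.utc)
kept = purge_expired(records, now, timedelta(days=7))
print([r["user_id"] for r in kept])  # prints ['2']
```

In a real pipeline the conversion and the purge would run on a schedule or as part of ingestion, so that no one has to remember to do either by hand.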