Overview of Big Data Security
Data volumes are expected to keep rising in the future. A study from IBM predicted there would be as many as 2.72 million data science job openings to help companies cope with this amount of data, and that prediction has proven accurate. The increased usage of big data will shape how organizations understand and apply business intelligence and its security.
Security is a significant concern these days, independent of field and technology. Compared to other areas, Big Data sees security issues and attacks every single minute; these attacks can target different components of Big Data, such as the stored data or the data source.
So what is Big Data security? It is the collective term for all the measures and tools used to guard both the data and the analytics methods against attacks, theft, or other malicious activities that could harm or negatively affect them. Like other systems, big data can be compromised by attacks originating either online or offline.
What are the Challenges in Securing Big Data?
Below are some common challenges –
- Vulnerability to fake data generation
- Struggles of granular access control
- Often “points of entry and exit” are secured, but data security inside the system is not.
- Data Provenance
- Securing and protecting data in real-time
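The first challenge above, fake data generation, is often mitigated with input validation at ingestion time. A minimal sketch (the field names and plausible ranges below are illustrative assumptions, not part of any standard):

```python
# Toy input-validation sketch: flag records whose values fall outside
# expected ranges, a first line of defence against fake data injection.
# The schema (field names and ranges) is hypothetical.

EXPECTED_SCHEMA = {
    "temperature": (-50.0, 60.0),   # plausible sensor range, degrees C
    "humidity": (0.0, 100.0),       # percent
}

def is_plausible(record: dict) -> bool:
    """Return True if every expected field is present and in range."""
    for field, (lo, hi) in EXPECTED_SCHEMA.items():
        value = record.get(field)
        if not isinstance(value, (int, float)) or not lo <= value <= hi:
            return False
    return True

readings = [
    {"temperature": 21.5, "humidity": 40.0},   # plausible
    {"temperature": 999.0, "humidity": 40.0},  # likely fabricated
]
print([is_plausible(r) for r in readings])  # [True, False]
```

Real pipelines would combine such checks with source authentication and anomaly detection, but even simple range checks stop the crudest fabricated records.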
Every company has big data in its future, and every company will eventually be in the data business.
Author, Thomas H. Davenport
Why is Big Data Security important?
Today, almost every organization is looking to adopt Big Data, having seen its potential, and many use Hadoop to process these large data sets. Securing that data is the step they are most concerned about; regardless of organization size, everyone is trying to secure their data.
Hadoop processes multiple types of data, which are combined and stored in a Hadoop data lake, and the stored information is then processed accordingly. Because it stores different kinds of data from various sources, security becomes essential: almost every enterprise using big data holds some form of sensitive data that needs to be protected. Sensitive data can include users’ credit card details, banking details, and passwords.
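Sensitive fields such as card numbers should never be exposed in clear text to downstream consumers. A minimal masking sketch (the input format is an assumption; real systems would pair masking with encryption and tokenization):

```python
def mask_card_number(card: str) -> str:
    """Keep only the last four digits of a card number; mask the rest."""
    digits = card.replace(" ", "").replace("-", "")
    return "*" * (len(digits) - 4) + digits[-4:]

print(mask_card_number("4111 1111 1111 1234"))  # ************1234
```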
As we know, big data is not a small thing, and we can’t describe it purely in terms of size, even though size is one of its main features. To secure it, one can combine various strategies: keeping out unauthorized users and intrusions with firewalls, making user authentication reliable, training end users, and many others.
Big Data Security Best Practices
Today, a larger volume and variety of data is collected and stored than ever before, making big data a solution for every industry’s needs. Securing big data is not an easy task, but we can formulate some best practices for it, listed below –
The flow of big data comprises three layers – incoming, stored, and outgoing data – so we can structure our security accordingly and make each layer more secure.
- Continuously monitor and audit all access to sensitive data (first, identify all the sensitive data).
- Alert and react to intrusions and unlawful actions in real-time.
- Use the Latest Antivirus Protection.
- Secure Data Storage, we can implement a technique called secure untrusted data repository (SUNDR).
- Reliable hardware and software configurations.
- Implement the principle of least privilege (PoLP).
- Implement the data-in-transit and at-rest encryption throughout the Hadoop clusters.
- Enable encryption for NoSQL databases and all entry points.
- Implement intrusion protection systems and intrusion detection systems.
- Safeguard data by encrypting it while at rest.
- Implement appropriate logging mechanisms.
- Use a proper federation of authorization spaces.
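The SUNDR technique mentioned above rests on detecting tampering with stored data cryptographically. A much-simplified, standard-library-only sketch of the core idea – recording a digest at write time and verifying it at read time (real SUNDR uses signed hash trees across untrusted servers):

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 fingerprint of a stored blob."""
    return hashlib.sha256(data).hexdigest()

# The writer records the digest when the data is stored...
blob = b"quarterly-report-contents"
stored_digest = digest(blob)

# ...and the reader recomputes it later, detecting any modification.
tampered = blob + b"!"
print(digest(blob) == stored_digest)      # True  -> data intact
print(digest(tampered) == stored_digest)  # False -> tampering detected
```

This only detects modification; confidentiality still requires the at-rest encryption listed above.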
The Hadoop Distributed File System (HDFS) is a distributed file system. All types of files can be stored in the Hadoop file system.
Taken from Article, Introduction to Data Serialization in Apache Hadoop
Some strategies for granular access control are listed below –
- Identify mutable elements and immutable elements.
- Maintain access labels, and track admin data too.
- Use single sign-on, and maintain a proper labelling scheme.
- Maintain an audit layer/orchestrator.
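The labelling strategy above can be sketched as a per-field label check: each field carries an access label, and a reader only sees fields whose label falls within their clearances. The labels, fields, and values below are purely illustrative:

```python
# Toy granular (field-level) access control via access labels.
# Each field is stored as (label, value); labels here are hypothetical.
RECORD = {
    "name": ("public", "Alice"),
    "email": ("internal", "alice@example.com"),
    "ssn": ("restricted", "123-45-6789"),
}

def visible_fields(clearances: set) -> dict:
    """Return only the fields the caller's clearances allow them to see."""
    return {k: v for k, (label, v) in RECORD.items() if label in clearances}

print(visible_fields({"public"}))              # {'name': 'Alice'}
print(visible_fields({"public", "internal"}))  # name and email, but not ssn
```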
Big Data Security Tools
Big Data security should meet four critical criteria – perimeter security and authentication framework, authorization and access, data privacy, and audit and reporting.
Authentication – Required for guarding access to the system, its data, and its services. Authentication makes sure users are who they claim to be. Two levels of authentication need to be in place – perimeter and intra-cluster – Knox, Kerberos
Authorization – Required to manage access and control over data, resources and services. Authorization can be enforced at varying levels of granularity and in compliance with existing enterprise security standards.
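A common way to enforce authorization at varying granularity is role-based access control. A minimal sketch (the roles and permission names are hypothetical; Hadoop deployments typically express such policies in Apache Ranger rather than application code):

```python
# Minimal role-based authorization sketch; roles and actions are illustrative.
PERMISSIONS = {
    "analyst": {"read"},
    "admin": {"read", "write", "delete"},
}

def is_authorized(role: str, action: str) -> bool:
    """Check whether the given role is allowed to perform the action."""
    return action in PERMISSIONS.get(role, set())

print(is_authorized("analyst", "read"))    # True
print(is_authorized("analyst", "delete"))  # False
```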
Centralized Administration and Audit
It is required to maintain and report on activity in the system. Auditing is necessary for managing security compliance and other requirements such as security forensics. – Ranger
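In a Hadoop stack this auditing role is filled by Apache Ranger; as a generic illustration of the underlying idea, an audit trail simply records who did what, when, in a tamper-evident sink. A standard-library sketch using the `logging` module (the in-memory sink is for demonstration only):

```python
import logging

# Generic audit-trail sketch; a real deployment would ship these records
# to durable, append-only storage rather than an in-memory list.
audit = logging.getLogger("audit.demo")
audit.setLevel(logging.INFO)

records = []  # demonstration sink

class ListHandler(logging.Handler):
    def emit(self, record):
        records.append(self.format(record))

handler = ListHandler()
handler.setFormatter(logging.Formatter("AUDIT user=%(user)s action=%(action)s"))
audit.addHandler(handler)

def log_access(user: str, action: str) -> None:
    """Record who accessed what, for compliance and forensics."""
    audit.info("access", extra={"user": user, "action": action})

log_access("alice", "read:customer_table")
print(records[0])  # AUDIT user=alice action=read:customer_table
```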
Data at rest/in-motion Encryption
It is required to control unauthorized access to sensitive data, whether at rest or in motion. Data protection should be considered at the field, file, and network level, and appropriate methods should be adopted for each – HDFS and wire encryption.
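Within the cluster, Hadoop configures wire encryption itself (for example via `hadoop.rpc.protection` and `dfs.encrypt.data.transfer`); for a client application talking to services outside the cluster, in-transit encryption usually means TLS. A standard-library sketch of a correctly hardened client-side TLS context:

```python
import ssl

# Client-side data-in-transit encryption setup using the standard library.
# create_default_context() already enables certificate verification and
# hostname checking; we additionally enforce a modern protocol floor.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
```

The key design point is using the secure-by-default constructor rather than building a bare `SSLContext`, which would leave verification off.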
Comprehensive Approach to Big Data Security
With the adoption of advanced technologies, enterprises are concerned with assuring their big data approach, and are looking for avenues for Big Data assurance. To implement a holistic approach to Big Data security, go through the steps below:
- Learn more about Big Data Security Compliance.