What are Storm and Kerberos
Apache Storm Security plays an important role to manage the smooth functioning of the operational database. Before starting with how we can secure Storm with the help of Kerberos let us discuss what actually is Storm and Kerberos are:
Storm or we called it as apache storm is a distributed real-time computation system which is free and open source. As the number of IOT devices is increasing at an enormous range which results in high streams of data at a very short interval which results that we need very large data memories for storing, processing and analyzing these heavy data to get some actionable results. In this case, apache storm is the best method to deal with this situation. Apache storm helps in storing, processing, analyzing and publishing real-time data without storing any actual data.
Kerberos is an authentication protocol which helps us in providing a secure login to the network over the unsecured network. This uses the concept of the encrypted tickets and helps us reduce the amount of the password time sent through the network. This concept of keys is controlled by KDC(key distribution center). The generated ticket will be verified or we can say authentication takes place and then the secure connection will be established.
The architecture of Apache Storm
The main component of the Apache storm is the checkpoints named as spout and bolts. The coming stream of data is passed from one checkpoint to another where the filtering, analyzing and aggregation of the data takes place. After this process occurs then that filtered stream is passed for the people to view.
The streams of data are ejected by Data sources kept and then passes to spout and then bolt and finally to target and to provide Apache Storm Security
Now let us discuss the actual meanings of the components.
SPOUT: The stream of data that is emitted by the Data Source is taken by the spout. The main purpose of the Spout is to receive data from data source continuously and then transfer this received data into the actual format million tuples processed per second per node of tuples and then send this to Bolts for further processing.
BOLTS: The stream of data that is made into tuple form by spout is received by the bolts. To Perform all the processing functions on this stream is the work of bolts. The major functions performed by the bolts are filtering, joining, aggregation, connecting to the database, etc.
The main uses of this Storm are in analysis in real time, machine learning online, distributed Remote Procedure Call, and many more. We can assume the speed of the Storm is as noticed over a million tuples processed per second per node. The setting up of apache storm is easy and this will guarantee to process of the data.
How To Secure Storm with Kerberos
To provide Apache Storm Security, a common setup for handling big data projects is Kerberos. For Kerberos authentication of Storm we setup a KDC and Kerberos are configured at each node.
There are two main nodes i.e. Nimbus and Supervisor node. First of all, we have to make some changes to the storm.yaml file and then copy this changed storm.yaml on each Nimbus and Supervisor node in /home/mapr/.storm/ directory.
After this, we will generate a new Kerberos ticket that will be acting as a key to lock for entering these nodes.
Now let us see how we can implement this practically
.Storm.yaml file contains the configuration of apache storm and we need to add
yamlstorm.thrift.transport:"org.apache.storm.security.auth.kerberos.KerberosSaslTransportPlugin" java.security.auth.login.config: "/path/to/jaas.conf"
to enable Kerberos authentication.
By doing this Nimbus and Supervisor process will also be connecting to the zookeeper but we need the Kerberos authentication for this connection so we can add to the child opts of Nimbus, Supervisor
Now we have to generate a new Kerberos ticket that will be helping us to authenticate. We can generate it by the following command
kinit -kt /opt/mapr/conf/mapr.keytab -p mapr/<fqdn>@<realm>
where (fqdn) and (realm) depends on your specific environment.
The generated key will act as a ticket that will help us to log in to the secure cluster. We can log in by : maprlogin Kerberos
In case we are working on the UI phase of the storm then login is needed to be done as:
where is hostname refers to yours machine hostname and UI port are the same as a ui.port parameter which is available in a storm.yaml file.
A Comprehensive Approach
Apache Storm due to its comprehensive feature helps Enterprises to process data faster, solving complex data problem in very less time. However, providing security to secure operational data matters the most. To know more about Apache Storm we advise taking the following steps –