Apache Flink Security and its Deployment

Overview of Apache Flink Security

This article gives an overview of Apache Flink security. Flink's Kerberos-based security has three main goals:

To provide secure data access for jobs in the cluster through connectors.
To authenticate to ZooKeeper.
To authenticate to Hadoop components.

Unlike a Hadoop delegation token or a ticket cache entry, a Kerberos keytab does not expire after a fixed period. In production deployments, authentication to secure data sources often has to last a long time: days, weeks, or even months. Currently, a Flink cluster runs either with configured keytab credentials or with a Hadoop delegation token. If a specific job needs a different keytab, you can simply launch a separate Flink cluster with a different configuration. Multiple Flink clusters can run side by side in a YARN or Mesos environment.

How does Apache Flink Security work?

Conceptually, a Flink program may use first- or third-party connectors (HDFS, Cassandra, Flume, Kafka, Kinesis, etc.) that require an authentication method such as Kerberos, passwords, or SSL/TLS. Apache Flink provides first-class support only for Kerberos authentication, while aiming to make the security requirements of all connectors easy to satisfy. The connectors and services that support Kerberos authentication are Kafka (0.9+), HDFS, HBase, and ZooKeeper. Flink's security modules (implementations of org.apache.flink.runtime.security.modules.SecurityModule) are installed at startup. The following sections describe each security module.
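The module mechanism can be pictured with a small Python analogy. The real modules are Java classes implementing org.apache.flink.runtime.security.modules.SecurityModule; the Python classes and the install_all helper below are hypothetical and only mirror the idea of installing each module in turn at startup. Rolling back already-installed modules when one fails is a choice made for this sketch, not a statement about Flink's behavior.

```python
from typing import List


class SecurityModule:
    """Hypothetical stand-in for Flink's Java SecurityModule interface."""

    name = "base"

    def install(self) -> None:
        raise NotImplementedError

    def uninstall(self) -> None:
        raise NotImplementedError


class RecordingModule(SecurityModule):
    """Toy module that records install/uninstall calls in a shared log."""

    def __init__(self, name: str, log: List[str], fail: bool = False) -> None:
        self.name = name
        self.log = log
        self.fail = fail

    def install(self) -> None:
        if self.fail:
            raise RuntimeError(f"{self.name} failed to install")
        self.log.append(f"install:{self.name}")

    def uninstall(self) -> None:
        self.log.append(f"uninstall:{self.name}")


def install_all(modules: List[SecurityModule]) -> List[SecurityModule]:
    """Install modules in order; on failure, uninstall those already installed."""
    installed: List[SecurityModule] = []
    try:
        for module in modules:
            module.install()
            installed.append(module)
    except Exception:
        # Roll back in reverse order (a design choice of this sketch).
        for module in reversed(installed):
            module.uninstall()
        raise
    return installed
```

Keeping each concern (Hadoop, JAAS, ZooKeeper) in its own module lets them be enabled, ordered, and tested independently.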

Hadoop Security Module

The Hadoop security module uses Hadoop's UserGroupInformation (UGI) class to establish a process-wide login user context. This login user is then used for all interactions with Hadoop, including HBase, HDFS, and YARN. If Kerberos security is enabled, the login user carries whatever Kerberos credential is configured. Otherwise, the login user conveys only the identity of the OS account that launched the cluster.
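The identity rule described above can be sketched as a tiny hypothetical helper. It reads Flink's security.kerberos.login.principal key from a plain dictionary standing in for the configuration and otherwise falls back to the OS user. It deliberately ignores the ticket-cache case and is not how UGI itself is implemented.

```python
import getpass
from typing import Mapping, Optional


def resolve_login_user(conf: Mapping[str, str], os_user: Optional[str] = None) -> str:
    """Return the identity a process-wide login context would carry (hypothetical)."""
    principal = conf.get("security.kerberos.login.principal", "").strip()
    if principal:
        # Kerberos is configured: the principal is the login user.
        return principal
    # No Kerberos identity: fall back to the OS account running the process.
    return os_user if os_user is not None else getpass.getuser()
```

The os_user parameter exists only so the fallback can be exercised without depending on the machine's account setup.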

JAAS Security Module

This module provides a dynamic JAAS configuration to the cluster, which makes the configured Kerberos credentials available to components that rely on JAAS, such as ZooKeeper and Kafka. Users can also supply a static JAAS configuration file using the mechanism described in the Java SE documentation. Dynamic entries provided by this module may override static entries.
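A static JAAS configuration is a text file in the format documented for Java SE. The following sketch renders one login-context entry for the standard com.sun.security.auth.module.Krb5LoginModule; the keytab path and principal are placeholders, and the function name is hypothetical.

```python
def render_jaas_entry(context: str, keytab: str, principal: str) -> str:
    """Render one static JAAS login-context entry that uses Krb5LoginModule."""
    options = [
        "useKeyTab=true",      # read the key from a keytab file
        "storeKey=true",       # keep the key in the Subject's private credentials
        f'keyTab="{keytab}"',
        f'principal="{principal}"',
    ]
    body = "\n".join(f"  {opt}" for opt in options)
    # The last option ends with ';' and the whole entry ends with '};'.
    return (
        f"{context} {{\n"
        "  com.sun.security.auth.module.Krb5LoginModule required\n"
        f"{body};\n"
        "};\n"
    )


print(render_jaas_entry("Client", "/etc/security/flink.keytab", "flink@EXAMPLE.COM"))
```

The resulting file is typically passed to the JVM with -Djava.security.auth.login.config=<path>. Since dynamic entries from Flink's JAAS module may override static ones, keep the context names in mind when mixing the two.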

ZooKeeper Security Module

The ZooKeeper security module configures security-related ZooKeeper settings, namely the ZooKeeper service name (default: zookeeper) and the JAAS login context name (default: Client).
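The two settings correspond to the Flink configuration keys zookeeper.sasl.service-name and zookeeper.sasl.login-context-name. This sketch collects them with the defaults mentioned above and checks that the chosen login context appears in security.kerberos.login.contexts, so that Kerberos credentials are actually provided to it. The helper itself is hypothetical; verify the key names against the documentation for your Flink version.

```python
from typing import Dict, Mapping, Optional

ZK_DEFAULTS = {
    "zookeeper.sasl.service-name": "zookeeper",
    "zookeeper.sasl.login-context-name": "Client",
}


def zookeeper_security_settings(
    login_contexts: str, overrides: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Merge overrides into the defaults and validate the login context name."""
    settings = dict(ZK_DEFAULTS)
    if overrides:
        unknown = set(overrides) - set(ZK_DEFAULTS)
        if unknown:
            raise KeyError(f"unexpected keys: {sorted(unknown)}")
        settings.update(overrides)
    # security.kerberos.login.contexts is a comma-separated list, e.g. "Client,KafkaClient".
    contexts = {c.strip() for c in login_contexts.split(",") if c.strip()}
    if settings["zookeeper.sasl.login-context-name"] not in contexts:
        raise ValueError("ZooKeeper login context is not in security.kerberos.login.contexts")
    return settings
```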

What are the deployment modes in Apache Flink Security?

A secure Flink cluster can be deployed in two modes:

Standalone mode
YARN/Mesos mode

Standalone Mode

The steps for running a secure Apache Flink cluster in standalone/cluster mode are:

Add the security-related configuration options to the Flink configuration file on all cluster nodes.
Make sure that the keytab file exists on every cluster node at the path given by security.kerberos.login.keytab.
Deploy the Flink cluster.
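The first step, adding the security options on every node, can be sketched as generating one flink-conf.yaml fragment and copying it to all nodes. The keys are Flink's Kerberos login options; the helper names, keytab path, and principal are illustrative assumptions.

```python
from typing import Dict


def kerberos_keytab_conf(keytab: str, principal: str, contexts: str = "Client") -> Dict[str, str]:
    """Kerberos options for a keytab-based cluster (hypothetical helper)."""
    return {
        # Rely on the keytab rather than on any ticket cache.
        "security.kerberos.login.use-ticket-cache": "false",
        "security.kerberos.login.keytab": keytab,
        "security.kerberos.login.principal": principal,
        # JAAS login contexts that receive the credentials, e.g. "Client,KafkaClient".
        "security.kerberos.login.contexts": contexts,
    }


def to_flink_conf(entries: Dict[str, str]) -> str:
    """Render entries as 'key: value' lines, sorted so every node gets an identical file."""
    return "".join(f"{key}: {value}\n" for key, value in sorted(entries.items()))


print(to_flink_conf(kerberos_keytab_conf("/etc/security/flink.keytab", "flink@EXAMPLE.COM")))
```

Because the same keytab path is written on every node, the file referenced by security.kerberos.login.keytab must exist at that path on each of them.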

YARN/Mesos Mode

The steps for running a secure Flink cluster in YARN/Mesos mode are:

Add the security-related configuration options to the Flink configuration file on the client.
Make sure that the keytab file exists on the client node at the path given by security.kerberos.login.keytab.
Deploy the Flink cluster.
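In this mode the keytab only has to be present on the client, which ships it to the cluster. A pre-flight check like the following hypothetical one can catch a wrong security.kerberos.login.keytab path before deployment. It only checks that the file exists and is readable, not that it holds a valid key for the principal.

```python
import os
from typing import Mapping


def check_client_keytab(conf: Mapping[str, str]) -> str:
    """Return the configured keytab path, or raise if it is unusable on this machine."""
    path = conf.get("security.kerberos.login.keytab")
    if not path:
        raise KeyError("security.kerberos.login.keytab is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"keytab not found: {path}")
    if not os.access(path, os.R_OK):
        raise PermissionError(f"keytab not readable: {path}")
    return path
```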

Using kinit (YARN only)

In YARN mode, you can deploy a secure Flink cluster without a keytab by using only the ticket cache. This avoids the complexity of generating keytabs. The steps for running a secure Apache Flink cluster using kinit are:

Add the security-related configuration options to the Flink configuration file on the client.
Log in using the kinit command.
Deploy the Flink cluster.
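With kinit, the configuration drops the keytab entries and enables security.kerberos.login.use-ticket-cache. The sketch below also uses klist -s, which in MIT Kerberos exits with status 0 only when a valid, unexpired ticket cache exists, to confirm that the login worked before deploying. The helper names are hypothetical, and the command runner is injectable so the logic can be tested without Kerberos installed.

```python
import subprocess
from typing import Callable, Dict, List


def ticket_cache_conf() -> Dict[str, str]:
    # No keytab entries: credentials come from the Kerberos ticket cache.
    return {"security.kerberos.login.use-ticket-cache": "true"}


def kinit_command(principal: str) -> List[str]:
    # Run interactively, kinit prompts for the principal's password.
    return ["kinit", principal]


def has_valid_ticket(run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Return True if 'klist -s' reports a usable ticket cache."""
    try:
        result = run(["klist", "-s"], check=False)
    except FileNotFoundError:
        # klist is not installed on this machine.
        return False
    return result.returncode == 0
```

Usage: run subprocess.run(kinit_command("flink@EXAMPLE.COM"), check=True), then deploy only if has_valid_ticket() returns True.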

The new security features include:

Kerberos Authentication Support
Service Level Authorization
Transport Security (SSL/TLS)

Kerberos Authentication Support

Uses a cluster-level Kerberos identity that is keytab-based and shared by all jobs, so it is not job-specific.
Enables Kerberos authentication to data sources and sinks such as HDFS and Kafka.
Protects state data.
Supported in standalone and YARN deployment modes.

Service Level Authorization

Restricts access to your Flink cluster.
Protects all endpoints, including the control path, intra-cluster data transfer, and the web UI.
Uses a simple shared secret that is either configured or generated and may be stored on clients or in the cluster.
Supported in standalone and YARN deployment modes.
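The shared-secret idea can be illustrated generically: use a configured secret if one exists, otherwise generate a random one, and compare presented secrets in constant time. This Python sketch uses only the standard secrets and hmac modules. It illustrates the concept and does not reproduce Flink's implementation or any Flink configuration key.

```python
import hmac
import secrets
from typing import Optional


def get_shared_secret(configured: Optional[str] = None) -> str:
    """Use the configured secret if present, otherwise generate a random one."""
    if configured:
        return configured
    return secrets.token_hex(32)  # 32 random bytes, hex-encoded (64 characters)


def is_authorized(presented: str, expected: str) -> bool:
    # compare_digest does not reveal how many leading characters matched.
    return hmac.compare_digest(presented.encode(), expected.encode())
```

A constant-time comparison matters here because every endpoint checks the same secret, so a timing leak on any one of them would expose it.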

Transport Security (SSL/TLS)

Uses SSL for all connections.
Can be enabled on a per-endpoint basis.
Supported in standalone and YARN deployment modes.
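Per-endpoint SSL can be sketched with the option names used by Flink 1.6 and later, which separate internal connections between Flink processes from the external REST endpoint. Older releases used different per-component flags, so check the key names for your version. The helper is hypothetical, and a single password is reused for keystore, key, and truststore purely for brevity.

```python
from typing import Dict


def ssl_conf(keystore: str, truststore: str, password: str,
             internal: bool = True, rest: bool = False) -> Dict[str, str]:
    """Build Flink SSL options for the internal and/or REST endpoints."""
    conf = {
        "security.ssl.internal.enabled": str(internal).lower(),
        "security.ssl.rest.enabled": str(rest).lower(),
    }
    for scope, enabled in (("internal", internal), ("rest", rest)):
        if not enabled:
            continue  # no store settings for a disabled endpoint
        prefix = f"security.ssl.{scope}"
        conf[f"{prefix}.keystore"] = keystore
        conf[f"{prefix}.keystore-password"] = password
        conf[f"{prefix}.key-password"] = password
        conf[f"{prefix}.truststore"] = truststore
        conf[f"{prefix}.truststore-password"] = password
    return conf
```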

Installation of Apache Flink on AWS

Amazon Web Services (AWS) offers cloud computing services on which you can run Apache Flink.

EMR - Elastic MapReduce

The Amazon Elastic MapReduce (Amazon EMR) web service lets you quickly set up a Hadoop cluster and takes care of setting everything up. This makes it the recommended way to run Flink on AWS.

Create an EMR Cluster

Make sure to set up IAM roles when creating your cluster. This allows Flink to access your S3 buckets if required.

Installing Apache Flink on AWS EMR Cluster

After creating your cluster, connect to the master node and install Flink. Download a binary version of Flink that matches your EMR cluster from the download page. After extracting the Flink distribution, set the Hadoop configuration directory and you are ready to deploy Flink jobs via YARN:

HADOOP_CONF_DIR=/etc/hadoop/conf bin/flink run -m yarn-cluster examples/streaming/WordCount.jar
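When scripting job submission on the EMR master node, the same command can be assembled from Python. This hypothetical helper builds the argument list and an environment with HADOOP_CONF_DIR set, mirroring the command above, and assumes it runs from the extracted Flink directory. Note that -m yarn-cluster is the syntax of older Flink releases; newer releases use the -t option instead.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def flink_yarn_run_command(
    jar: str, hadoop_conf_dir: str = "/etc/hadoop/conf", flink_home: str = "."
) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and env for 'bin/flink run -m yarn-cluster <jar>'."""
    argv = [f"{flink_home}/bin/flink", "run", "-m", "yarn-cluster", jar]
    env = dict(os.environ)  # inherit the current environment
    env["HADOOP_CONF_DIR"] = hadoop_conf_dir  # where Flink looks for the Hadoop/YARN config
    return argv, env


def submit(jar: str) -> None:
    """Submit the job and raise CalledProcessError if the CLI fails."""
    argv, env = flink_yarn_run_command(jar)
    subprocess.run(argv, env=env, check=True)
```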

S3 - Simple Storage Service

Flink can use the Simple Storage Service (S3) for reading and writing data, as well as with the streaming state backends. You can use S3 objects by providing paths in the following format:

s3://<your-bucket>/<endpoint>
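To make the path format concrete, here is a small standard-library sketch that splits and joins s3:// paths. The part after the bucket (the <endpoint> placeholder above) can be a single file or a directory. The function names and the example bucket are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_path(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/path' into ('bucket', 'some/path')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URI: {uri}")
    if not parsed.netloc:
        raise ValueError(f"missing bucket in: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def join_s3_path(bucket: str, endpoint: str) -> str:
    """Build an s3:// path from a bucket name and a file or directory path."""
    return f"s3://{bucket}/{endpoint.lstrip('/')}"
```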

Set S3 FileSystem

Flink treats S3 as a FileSystem and interacts with it through a Hadoop S3 FileSystem client. Two popular S3 FileSystem implementations are available: S3AFileSystem and NativeS3FileSystem.

S3AFileSystem - A file system for reading and writing regular files that uses Amazon's SDK internally. It works with IAM roles.

NativeS3FileSystem - Also used for reading and writing regular files. It does not work with IAM roles, and the maximum object size is 5 GB.
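With the Hadoop-provided file systems, the implementation is chosen in Hadoop's core-site.xml through the fs.s3.impl property. The sketch below renders a minimal core-site.xml for either class using Python's xml.etree. The helper name is hypothetical, and how Flink locates this file (for example, through the Hadoop configuration directory) depends on the Flink version.

```python
import xml.etree.ElementTree as ET

S3_IMPLS = {
    "s3a": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "native": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
}


def core_site_xml(impl: str = "s3a") -> str:
    """Render a minimal core-site.xml that maps the s3:// scheme to one implementation."""
    if impl not in S3_IMPLS:
        raise ValueError(f"unknown implementation: {impl}")
    root = ET.Element("configuration")
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = "fs.s3.impl"
    ET.SubElement(prop, "value").text = S3_IMPLS[impl]
    return ET.tostring(root, encoding="unicode")
```

S3A is the default here because it works with IAM roles and has no 5 GB object limit.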

Configure Access Credentials

After setting up the S3 FileSystem, make sure that Apache Flink is allowed to access your S3 buckets.

Identity and Access Management (IAM) (Recommended)

IAM lets you securely grant Flink instances the credentials they need to access your S3 buckets.

Common issues in Installation of Apache Flink on AWS

Missing S3 FileSystem Configuration.
Amazon Web Services access key ID and secret access key not specified.
ClassNotFoundException
IOException
NullPointerException

A Comprehensive Approach

Real-time data processing enables enterprises to achieve real-time intelligence and real-time activity monitoring with very little delay. To learn more about real-time processing, we suggest the following steps:

Read more about Stream Processing with Apache Flink
Understand what Apache Flink is and its advantages
