Data Analytics Stack On Kubernetes for Streaming App

Setting up Analytics Stack for Streaming Applications

  • The SKACK Stack is an open source, full-stack platform for real-time analysis of big data. It consists of Apache Spark, Kubernetes, Akka, Apache Cassandra, and Apache Kafka.
  • GCP and GlusterFS act as the storage solution because they support multi-mount (the same volume can be mounted by several pods at once) and the data remains on all nodes of GlusterFS and GCP.
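
As a rough illustration of what multi-mount means on the Kubernetes side, the sketch below builds a PersistentVolumeClaim manifest with the `ReadWriteMany` access mode, which lets several pods mount the same volume read-write. The claim name, size, and storage class name are hypothetical placeholders, not values from the original setup; the storage class would have to exist in the cluster and be backed by storage that supports shared mounts.

```python
import json


def build_shared_pvc(name, size, storage_class):
    """Return a PersistentVolumeClaim manifest that many pods can mount at once."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteMany is the access mode for volumes shared across nodes.
            "accessModes": ["ReadWriteMany"],
            # Hypothetical storage class; it must be defined in the cluster.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # All three arguments are illustrative values only.
    pvc = build_shared_pvc("analytics-shared-data", "10Gi", "glusterfs-storage")
    print(json.dumps(pvc, indent=2))
```

Kubernetes accepts manifests in JSON as well as YAML, so the printed output could be saved to a file and applied with `kubectl apply -f`.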

Challenges of Setting Up a Multi-Node Cluster for SKACK

  • Set up and document a multi-node cluster for the SKACK Stack on Kubernetes.
  • Container environments are not persistent by default, so applications running in Kubernetes need persistent storage to keep their data.
    • Using Kubernetes to scale up Spark.
    • Using Kubernetes to scale up Cassandra.
    • Using Kubernetes to scale up Kafka.
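
The two concerns above, persistent data and scaling, usually meet in a StatefulSet: each replica gets its own PersistentVolumeClaim from a template, and scaling comes down to changing `replicas`. The following is a minimal sketch under assumptions, not the configuration used in this setup. The image tag, data path, and volume size are illustrative, and `build_stateful_set` and `scale` are hypothetical helper names.

```python
import copy
import json


def build_stateful_set(name, image, replicas, data_path, size):
    """Return a StatefulSet manifest whose replicas each get their own volume."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # assumes a headless Service with this name
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Data written here survives container restarts.
                        "volumeMounts": [{"name": "data", "mountPath": data_path}],
                    }],
                },
            },
            # One PersistentVolumeClaim is created per replica from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": size}},
                },
            }],
        },
    }


def scale(manifest, replicas):
    """Return a copy of the manifest with a new replica count."""
    scaled = copy.deepcopy(manifest)
    scaled["spec"]["replicas"] = replicas
    return scaled


if __name__ == "__main__":
    # Illustrative values: image tag, path, and size are assumptions.
    cassandra = build_stateful_set(
        "cassandra", "cassandra:4.1", 1, "/var/lib/cassandra", "5Gi"
    )
    print(json.dumps(scale(cassandra, 3), indent=2))
```

On a running cluster the same change can be made without regenerating the manifest, for example with `kubectl scale statefulset cassandra --replicas=3`. A Spark or Kafka workload would need its own image and settings; only the scaling mechanism carries over.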

Solution Offerings for Setting Up an On-Premises Kubernetes Cluster

To overcome the challenges mentioned above, set up a three-node on-premises Kubernetes cluster in which one node acts as the master and the other two act as workers.

The Cluster includes –

  • Kubernetes Master
  • Kubernetes Scheduler
  • Kubernetes Controller Manager

The setup also analyzes the cluster and reports to the API server in order to store metrics that cover resource utilization, availability, and performance.
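
To make "resource utilization" concrete, here is a small sketch that turns Kubernetes-style resource quantities (such as `250m` CPU or `512Mi` memory) into a utilization ratio. It assumes the usage and allocatable figures have already been read from the cluster. The sample values are invented for illustration, the helper names are hypothetical, and only a subset of the quantity suffixes Kubernetes allows is handled.

```python
# Suffix multipliers for a subset of Kubernetes quantity notation.
CPU_SUFFIXES = {"n": 1, "u": 1_000, "m": 1_000_000}  # to nanocores
MEMORY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def cpu_to_nanocores(quantity):
    """Convert a CPU quantity such as '250m' or '2' to nanocores."""
    if quantity[-1] in CPU_SUFFIXES:
        return int(quantity[:-1]) * CPU_SUFFIXES[quantity[-1]]
    return round(float(quantity) * 1_000_000_000)  # plain number means cores


def memory_to_bytes(quantity):
    """Convert a memory quantity such as '512Mi' or '1048576' to bytes."""
    if quantity[-2:] in MEMORY_SUFFIXES:
        return int(quantity[:-2]) * MEMORY_SUFFIXES[quantity[-2:]]
    return int(quantity)  # plain number means bytes


def utilization(used, allocatable, to_number):
    """Return used / allocatable as a fraction between 0 and 1 for sane inputs."""
    return to_number(used) / to_number(allocatable)


if __name__ == "__main__":
    # Invented sample figures for one worker node.
    print("cpu:", utilization("250m", "2", cpu_to_nanocores))
    print("memory:", utilization("512Mi", "2Gi", memory_to_bytes))
```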
