Persistent Storage Solution For Containers - Docker & Kubernetes

by Gursimran | December 25, 2017 |  Categories -  Docker, Kubernetes, Containers


 

Overview

 

Persistent storage is critical for running stateful containers. Kubernetes is an open source system for automating the deployment and management of containerized applications.

 

There are a lot of options available for storage. In this blog, we are going to discuss the most widely used storage options that can be used on-premises or in the cloud, such as GlusterFS, CephFS, Ceph RBD, OpenEBS, NFS, GCE Persistent Disk, AWS EBS and Azure Disk.

 


 

Prerequisites

 

To follow this guide you need -

 

  • Kubernetes

  • Kubectl

  • Dockerfile

  • Container Registry

  • Storage technologies that will be used -

    • OpenEBS

    • CephFS

    • GlusterFS

    • AWS EBS

    • Azure Disk

    • GCE persistent storage

    • CephRBD

 

Kubernetes

 

Kubernetes is one of the best open-source orchestration platforms for deployment, autoscaling and management of containerized applications.

 

Kubectl

 

Kubectl is a command-line utility used to manage Kubernetes clusters, either remotely or locally. To configure kubectl use this link.

 

Container Registry

 

A Container Registry is the repository where we store and distribute Docker images. Several registries are available online, including DockerHub, Google Cloud, Microsoft Azure, and AWS Elastic Container Registry (ECR).

 

Container storage is ephemeral, meaning all the data in the container is removed when it crashes or is restarted. Persistent storage is necessary for stateful containers running applications like MySQL, Apache, PostgreSQL, etc., so that we don’t lose our data when a container stops.

 

3 Types of Storage 

 

  • Block Storage - It is the most commonly used storage and is very flexible. Block storage stores chunks of data in blocks. A block is identified only by its address. It is mostly used for databases because of its performance.

  • File Storage - It stores data as files. Each file is referenced by a filename and has attributes associated with it. NFS is the most commonly used file system. We can use file storage where we want to share data with multiple containers.

  • Object Storage - Object storage is different from file storage and block storage. In object storage, data is stored as an object and is referenced by an object ID. It is massively scalable and provides more flexibility than block storage, but its performance is slower than block storage. The most commonly used object stores are Amazon S3, Swift and Ceph Object Storage.

 


 


 

7 Emerging Storage Technologies

 

OpenEBS

 

OpenEBS is a purely container-based storage platform available for Kubernetes. Using OpenEBS, we can easily use persistent storage for stateful containers, and the process of provisioning a disk is automated.

 

It is a scalable storage solution which can run anywhere, from cloud to on-premises hardware.

 

Ceph

 

Ceph is an advanced and scalable software-defined storage system that fits today’s requirements well, providing Object Storage, Block Storage and a File System on a single platform.

 

Ceph can also be used with Kubernetes. We can either use CephFS or CephRBD for persistent storage for kubernetes pods.

 

  • Ceph RBD is the block storage which we assign to a pod. Ceph RBD can’t be shared between two pods at a time in read-write mode.

  • CephFS is a POSIX-compliant file system service which stores data on top of a Ceph cluster. We can share CephFS with multiple pods at the same time. CephFS has been declared stable since the Ceph Jewel release.

 

GlusterFS

 

GlusterFS is a scalable network file system suitable for cloud storage. Like Ceph, it is software-defined storage that runs on commodity hardware, but it only provides a file system, which makes it similar to CephFS.

 

GlusterFS can be faster than Ceph because it uses a larger block size: GlusterFS uses a block size of 128 KB, whereas Ceph uses a block size of 64 KB.

 

AWS EBS

 

Amazon EBS provides persistent block storage volumes which are attached to EC2 instances. AWS provides various options for EBS, so we can choose the storage according to requirement depending on parameters like number of IOPS, storage type(SSD/HDD), etc.

 

We mount AWS EBS volumes in Kubernetes pods for persistent block storage using the awsElasticBlockStore volume type. EBS volumes are automatically replicated within their Availability Zone for durability and high availability.

 

GCEPersistentDisk

 

GCEPersistentDisk is durable, high-performance block storage used with Google Cloud Platform. We can use it either with Google Compute Engine or with Google Kubernetes Engine (formerly Google Container Engine).

 

We can choose from HDD or SSD and can increase the size of the volume disk as the need increases. GCEPersistentDisks are automatically replicated across multiple data centres for durability and high availability.

 

We mount a GCEPersistentDisk in Kubernetes pods for persistent block storage using the gcePersistentDisk volume type.

 

Azure Disk

 

An Azure Disk is also durable, high-performance block storage, like AWS EBS and GCEPersistentDisk. It provides the option to choose between SSD and HDD for your environment, along with features like point-in-time backup, easy migration, etc.

 

An AzureDiskVolume is used to mount an Azure Data Disk into a Pod. Azure Disks are replicated within multiple data centres for high availability and durability.

 

NFS

 

NFS (Network File System) is one of the oldest and most widely used file systems. It lets multiple machines share a single file system over the network.

 

There are several NAS devices available for high performance, or we can set up our own system to be used as a NAS. We use NFS as persistent storage for pods, and the data can be shared with multiple instances.

 


 

Deployment

 

Now we are going to walk through the deployment of the storage solutions described above. We are going to start with Ceph.

 

Ceph Deployment

 

For Ceph we need an existing Ceph cluster. We can deploy the Ceph cluster either on bare metal or in Docker containers. Then we install the Ceph client on a Kubernetes host.

 

For CephRBD we have to create a separate pool and user for the created pool.

 

  • Creating new/separate pool

 

# ceph osd pool create kube 200 200

 

  • Creating a user with full access to the Kube pool

 

# ceph auth get-or-create client.kubep mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=kube' -o /etc/ceph/ceph.client.kube.keyring

 

  • Get the authentication key from the ceph cluster for the user client.kubep

 

# ceph --cluster ceph auth get-key client.kubep

 

  • Creating a new secret in default namespace in kubernetes

 

# kubectl create secret generic ceph-secret-kube --type="kubernetes.io/rbd" --from-literal=key='AQBvPvNZwfPoIBAAN9EjWaou6S4iLVg/meA0YA==' --namespace=default

 

  • kube-controller-manager must have the privilege to provision storage, and it needs the admin key from Ceph to do that. To get the admin key -

 

# sudo ceph --cluster ceph auth get-key client.admin

 

  • Creating a new secret for admin in the ceph-storage namespace in Kubernetes

 

# kubectl create secret generic ceph-secret --type="kubernetes.io/rbd" --from-literal=key='AQAbM/NZAA0KHhAAdpCHwG62kE0zKGHnGybzgg==' --namespace=ceph-storage

 

  • After adding the secrets, we have to define a new Storage Class by copying the following content into a file named ceph-rbd-storage.yml
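
The manifest itself is not shown here, so the following is only a minimal sketch of what such a Storage Class could look like. It assumes the in-tree kubernetes.io/rbd provisioner that matches the era of this post (newer clusters typically use the Ceph CSI driver instead). The class name ceph-rbd and the monitor address are placeholders, not values from the original. Note that the user secret has to exist in the same namespace as the claim that uses it.

```shell
# Sketch only: class name and monitor address are placeholders.
cat > ceph-rbd-storage.yml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd                      # hypothetical class name
provisioner: kubernetes.io/rbd        # in-tree Ceph RBD provisioner
parameters:
  monitors: 10.0.0.1:6789             # placeholder: your Ceph monitor address(es)
  adminId: admin
  adminSecretName: ceph-secret        # admin secret created in the step above
  adminSecretNamespace: ceph-storage
  pool: kube                          # pool created with "ceph osd pool create kube"
  userId: kubep                       # Ceph user created for the pool
  userSecretName: ceph-secret-kube    # must exist in the namespace of the claim
EOF
```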

 

 

# kubectl create -f ceph-rbd-storage.yml --namespace=ceph-storage

 

  • Creating a volume using the rbd StorageClass in a file named ceph-vc.yml -

 

 

# kubectl create -f ceph-vc.yml --namespace=ceph-storage

 

  • Creating a volume claim using the rbd StorageClass in a file named ceph-pvc.yml -
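
As a rough illustration, a claim against an RBD-backed Storage Class might look like the sketch below. The claim name apache-pvc, the class name ceph-rbd and the requested size are assumptions made for this example; the class name has to match whatever Storage Class exists in your cluster.

```shell
# Sketch only: names and size are placeholders.
cat > ceph-pvc.yml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: apache-pvc              # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce             # RBD cannot be shared read-write between pods
  storageClassName: ceph-rbd    # hypothetical: must match your StorageClass
  resources:
    requests:
      storage: 2Gi              # placeholder size
EOF
```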

 

 

# kubectl create -f ceph-pvc.yml --namespace=ceph-storage

 

  • Now we are going to launch an Apache pod using the claimed volume. Create a new file named apache-pod.yml with the following content -
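
A pod that mounts a claimed volume could be sketched as follows. This assumes the public httpd image from Docker Hub and a claim named apache-pvc; both are illustrative choices rather than the original author's values.

```shell
# Sketch only: image, pod name and claim name are assumptions.
cat > apache-pod.yml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: apache-pod
spec:
  containers:
    - name: apache
      image: httpd:2.4
      ports:
        - containerPort: 80
      volumeMounts:
        - name: web-data
          mountPath: /usr/local/apache2/htdocs   # document root of the httpd image
  volumes:
    - name: web-data
      persistentVolumeClaim:
        claimName: apache-pvc                    # hypothetical: must match your claim
EOF
```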

 

 

# kubectl create -f apache-pod.yml --namespace=ceph-storage

 

  • For CephFS we are going to create the Ceph data and metadata pools and a new file system -

 

# ceph osd pool create cephfs_data 200

 

# ceph osd pool create cephfs_metadata 200

 

# ceph fs new newfs cephfs_metadata cephfs_data

 

  • Creating a new secret in the default namespace in Kubernetes for the Ceph admin user, using the admin key returned by the first command below -

 

# sudo ceph --cluster ceph auth get-key client.admin

 

# kubectl create secret generic ceph-secret --type="kubernetes.io/rbd" --from-literal=key='AQDkTeBZLDwlORAA6clp1vUBTGbaxaax/Mwpew==' --namespace=default

 

  • After adding the secrets, we have to copy the following content into a file named ceph-fs-storage.yml

 

 


 

GlusterFS Deployment

 

We can deploy a GlusterFS cluster either on bare metal servers or on containers using Heketi. After the deployment, we will create a GlusterFS volume.

 

  • Creating the following directory on the servers where we want to keep the data.

 

# mkdir -p /data/brick1/myvol

 

  • Then we will create a volume using -

 

# gluster volume create myvol replica 2 node1:/data/brick1/myvol node2:/data/brick1/myvol

 

# gluster volume start myvol

 

  • Now we have to create the gluster endpoints for Kubernetes by adding the following content to a file named gluster-endpoint.yaml
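
A minimal sketch of such an Endpoints object is shown below. The name glusterfs-cluster and the two IP addresses are placeholders standing in for node1 and node2; the port value only needs to be a legal port number.

```shell
# Sketch only: endpoint name and IP addresses are placeholders.
cat > gluster-endpoint.yaml <<'EOF'
apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster       # hypothetical name, referenced by pods later
subsets:
  - addresses:
      - ip: 10.0.0.11           # placeholder: address of node1
    ports:
      - port: 1                 # any legal port number
  - addresses:
      - ip: 10.0.0.12           # placeholder: address of node2
    ports:
      - port: 1
EOF
```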

 

 


# kubectl create -f gluster-endpoint.yaml

 

  • Create the gluster service in Kubernetes by adding the following content to glusterfs-service.yaml
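
One possible shape for this service is a selector-less Service whose name matches the Endpoints object, so that the manually created endpoints persist. The name glusterfs-cluster is an assumption and must match the name used for your endpoints.

```shell
# Sketch only: the service name is a placeholder and must match the Endpoints name.
cat > glusterfs-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: glusterfs-cluster
spec:
  ports:
    - port: 1
EOF
```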

 

 

 

# kubectl create -f glusterfs-service.yaml

 

  • Then we are going to launch an Apache pod using gluster as backend storage by adding the following content to the file apache-pod.yaml -
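
The pod definition could look roughly like this sketch, which uses the in-tree glusterfs volume type available in Kubernetes versions of this post's era. The endpoints name glusterfs-cluster and the httpd image are assumptions; myvol is the volume created earlier.

```shell
# Sketch only: image, pod name and endpoints name are assumptions.
cat > apache-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: apache-gluster
spec:
  containers:
    - name: apache
      image: httpd:2.4
      volumeMounts:
        - name: web-data
          mountPath: /usr/local/apache2/htdocs
  volumes:
    - name: web-data
      glusterfs:
        endpoints: glusterfs-cluster   # hypothetical: name of the Endpoints object
        path: myvol                    # GlusterFS volume name
        readOnly: false
EOF
```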

 

 

 

# kubectl create -f apache-pod.yaml

 


 

NFS Deployment

 

For an NFS server to be consumed by a Kubernetes pod, we first create a persistent volume by adding the following content to nfs-pv.yaml.
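
As an illustration, a statically defined NFS persistent volume might look like the sketch below. The server address, export path, volume name and capacity are all placeholders for your own NFS setup.

```shell
# Sketch only: server, path, name and size are placeholders.
cat > nfs-pv.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany             # NFS can be mounted read-write by many pods
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.0.10           # placeholder: NFS server address
    path: /exports/web          # placeholder: exported directory
EOF
```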

 

 

# kubectl create -f nfs-pv.yaml

 

  • Creating a persistent volume claim by adding the following content to nfs-pvc.yaml -
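
A matching claim could be sketched like this. The claim name and size are assumptions; the empty storageClassName is one way to make the claim bind to a pre-created volume instead of triggering dynamic provisioning.

```shell
# Sketch only: claim name and size are placeholders.
cat > nfs-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""          # bind to a pre-created PV, no dynamic provisioning
  resources:
    requests:
      storage: 5Gi
EOF
```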

 

 

 

# kubectl create -f nfs-pvc.yaml

 

  • Now we are going to launch web-server pod with NFS persistent volume by adding the following content in apache-server.yaml

 

 

 

# kubectl create -f apache-server.yaml

 


 

AWS EBS Deployment

 

For using Amazon EBS in Kubernetes pod first we have to make sure that -

 

  • The nodes on which kubernetes pods are running are Amazon EC2 instances.

  • The EC2 instances need to be in the same region and AZ as the EBS volume.

 

  • First, we have to create a storage class in Kubernetes for the EBS disk by adding the following content to aws-ebs-storage.yaml
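
A minimal sketch of an EBS-backed Storage Class, assuming the in-tree kubernetes.io/aws-ebs provisioner of this post's era, is shown below. The class name and the gp2 volume type are illustrative choices.

```shell
# Sketch only: class name and volume type are placeholders.
cat > aws-ebs-storage.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: aws-ebs-gp2             # hypothetical class name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2                     # general purpose SSD
  fsType: ext4
EOF
```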

 

 

 

# kubectl create -f aws-ebs-storage.yaml

 

  • Then we are going to create PVC by adding the following content in aws-ebs.yaml -

 

 

 

# kubectl create -f aws-ebs.yaml

 

  • Now we are going to launch an apache-webserver pod with AWS EBS as persistent storage by adding the following content to apache-web-ebs.yaml

 

 

 

# kubectl create -f apache-web-ebs.yaml

 


 

Azure Disk Deployment

 

  • To use an Azure Disk as persistent storage for Kubernetes pods, we have to create a storage class by adding the following content to a file named sc-azure.yaml
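
One possible Storage Class for Azure Disk, assuming the in-tree kubernetes.io/azure-disk provisioner and managed disks, is sketched below. The class name and the Standard_LRS account type are assumptions for illustration.

```shell
# Sketch only: class name and account type are placeholders.
cat > sc-azure.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-disk-standard     # hypothetical class name
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Standard_LRS
  kind: Managed
EOF
```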

 

 

 

# kubectl create -f sc-azure.yaml

 

  • After creating storage class we are going to create Persistent Volume claim by adding the following content in azure-pvc.yaml.

 

 

 

# kubectl create -f azure-pvc.yaml

 

  • Now we are going to launch an Apache webserver pod with Azure Disk as persistent storage by adding the following content to apache-web-azure.yaml.

 

 

# kubectl create -f apache-web-azure.yaml

 


 

GCEPersistentDisk Deployment

 

To use a GCEPersistentDisk in a Kubernetes pod, we first have to make sure that -

 

  • The nodes on which kubernetes pods are running are GCE instances.

  • The GCE instances need to be in the same GCE project and zone as the PD.

 

  • First, we have to create a storage class in Kubernetes for the GCEPersistentDisk by adding the following content to gcepd-storage.yaml.
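
A rough sketch of such a Storage Class, assuming the in-tree kubernetes.io/gce-pd provisioner, could look like this. The class name and the pd-standard disk type are illustrative; pd-ssd is the SSD alternative.

```shell
# Sketch only: class name and disk type are placeholders.
cat > gcepd-storage.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gce-pd-standard         # hypothetical class name
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard             # or pd-ssd
EOF
```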

 

 

 

# kubectl create -f gcepd-storage.yaml

 

  • We are going to create a PVC by adding the following content in gcepd-pvc.yaml.

 

 

 

# kubectl create -f gcepd-pvc.yaml

 

  • After creating the persistent volume claim, we are going to use it to store web data for the web server pod by adding the following content to the file named apache-gce.yaml and creating it -

 

# kubectl create -f apache-gce.yaml

 


 

OpenEBS Deployment

 

  • For OpenEBS cluster setup click on this link for setup guide. First, we are going to start the OpenEBS Services using Operator by -

 

# kubectl create -f https://raw.githubusercontent.com/openebs/openebs/master/k8s/openebs-operator.yaml

 

  • Then we are going to create some default storage classes by -

 

# kubectl create -f https://raw.githubusercontent.com/openebs/openebs/master/k8s/openebs-storageclasses.yaml

 

  • Now we are going to launch jupyter with OpenEBS persistent volume by adding the following content in the file named demo-openebs-jupyter.yaml -

 

 

 

# kubectl create -f demo-openebs-jupyter.yaml

 

This will provision the pv and pvc automatically for the jupyter pod.

 


 

Features Comparison

 

Storage Technologies   | Read Write Once | Read Only Many | Read Write Many | Deployed On       | Internal Provisioner | Format | Provider          | Scalability Capacity Per Disk  | Network Intensive
CephFS                 | Yes             | Yes            | Yes             | On-Premises/Cloud | No                   | File   | Ceph Cluster      | Up to Petabytes                | Yes
CephRBD                | Yes             | Yes            | No              | On-Premises/Cloud | Yes                  | Block  | Ceph Cluster      | Up to Petabytes                | Yes
GlusterFS              | Yes             | Yes            | Yes             | On-Premises/Cloud | Yes                  | File   | GlusterFS Cluster | Up to Petabytes                | Yes
AWS EBS                | Yes             | No             | No              | AWS               | Yes                  | Block  | AWS               | 16TB                           | No
Azure Disk             | Yes             | No             | No              | Azure             | Yes                  | Block  | Azure             | 4TB                            | No
GCE Persistent Storage | Yes             | No             | No              | Google Cloud      | Yes                  | Block  | Google Cloud      | 64TB/4TB (Local SSD)           | No
OpenEBS                | Yes             | Yes            | No              | On-Premises/Cloud | Yes                  | Block  | OpenEBS Cluster   | Depends on the Underlying Disk | Yes
NFS                    | Yes             | Yes            | Yes             | On-Premises/Cloud | No                   | File   | NFS Server        | Depends on the Shared Disk     | Yes

 


 

Conclusion

 

Each storage option described above provides different features, speed and flexibility. You have to choose according to your requirements. Persistent storage is necessary for stateful servers like MySQL, PostgreSQL, WordPress sites, etc.

 

There are a lot of options available for persistent storage. This is where we can help you to make the right decision. Reach out to us, tell us your requirement so that we can discuss and help you out.    

 


 

How Can XenonStack Help You?

 

Our DevOps Consulting Services provide DevOps Assessment and Audit of your existing Infrastructure, Development Environment and Integration.

 

Our DevOps Professional Services include -

 

Cloud Infrastructure Solutions

 

Get Cloud Consulting Services, Cloud Infrastructure Services, Cloud Migration Solutions, Application Migration to Cloud and Cloud Management Services all under one roof. XenonStack offers Cloud Infrastructure Solutions on leading Cloud Service Providers including Microsoft Azure, Google Cloud, AWS and on Container Environment - Docker & Kubernetes

 

Enterprise Kubernetes Services

 

Make your Cloud Native Transformation with Kubernetes. Unify your Container Management Solutions into a Kubernetes Solution with support for Multi-Cloud Environments including AWS, Google Compute Engine, Google Kubernetes Engine, Microsoft Azure and more.

 

Enterprise Continuous Monitoring Solutions

 

Our DevOps Solutions enables the visibility of Continuous Delivery Pipeline with Continuous Monitoring and Alerting for infrastructure, processes, applications and Hosts. Our Product, NexaTrace is a next generation Monitoring Product with Predictive Intelligence using Artificial Intelligence & Machine Learning for Log Analytics.

 
