XenonStack Recommends

Enterprise Digital Platform

CockroachDB Architecture and Performance Overview

Navdeep Singh Gill | 02 Sep 2022

CockroachDB Architecture and Performance Overview

What is CockroachDB?

It is a distributed SQL database built on a transactional and strongly-consistent key-value store. It supports consistent ACID transactions and provides a SQL API for structuring, manipulating, and querying data. It's completely open source. Cockroachdb scales horizontally, survives disk, machine, rack, and even data center failures with minimal latency disruption and no manual intervention.
An online database management system providing create, Read, Update, Delete (crud) operations that expose the graph data model. Click to explore about, Graph Database Architecture

Why it is important?

It offers fully-distributed ACID transactions, zero-downtime schema changes, and support for secondary indexes and foreign keys. It provides scale without sacrificing SQL functionality. It also supports JSON datatype to store NoSQL data.

What are the key features?

  • Simplified Deployment
  • Strong Consistency
  • Support of SQL
  • Distributed Transactions
  • Automated Scaling and Repair
  • High Availability
  • Open Source

How it Works?

It runs machines on two commands -
  • cockroach start with a --join flag for all of the initial nodes in the cluster, so the process knows all of the other machines it can communicate with cockroach init to perform a one-time initialization of the cluster.

Once the process is running, developers interact with CockroachDB through a SQL API. Send SQL RPC requests to any of nodes that are available due to the symmetric function that cockroach had, and this makes Cockroach easy to integrate with load balancers. After receiving SQL RPC requests, nodes convert them into operations that work with its distributed key-value store. As these RPCs start filling cluster with data, cockroach starts distributing data among available nodes, breaking the data up into 64 MiB chunks these chunks are also known as the range.

Each range replicated to at least three nodes. This way if any nodes go down, copies of the data is still there which can be used for reading operations and write operations as well as replicating the data to other nodes. If a node receives a read or writes request, it cannot directly serve. Cockroach finds the node that can handle the requests and communicates with it. Cockroach tracks everything and enables symmetric behavior for each node.

Any changes made to data in a range rely on a raft consensus algorithm to ensure a majority of its replicas or nodes agree to commit the change, ensuring industry-leading (best in this industry) isolation guarantees and providing application consistent reading regardless of which node used for communication. Ultimately, data is read from and written to disk using an efficient storage engine, which can keep track of the data's timestamp. It has the benefit of support the SQL standard AS OF SYSTEM TIME clause allowing and finding historical data for a period.


An online database management system providing create, Read, Update, Delete (crud) operations that expose the graph data model. Click to explore about, Database Testing Types

How does it scale?

It scales horizontally with minimum operator overhead. It runs on any local computer, single server, corporate development cluster, or a private-public cloud. Adding capacity in CockroachDB is as easy as pointing a new node at the running cluster. At the key-value level, it starts with a single, empty range. This unique range eventually reaches a threshold size (64 MiB by default) when data put in a database.

When that happens, the data split into two ranges, each range covering a contiguous segment of the entire key-value space and this process continues indefinitely as new data flows, existing ranges continue to split into new range aiming to keep a relatively small and consistent range size. When any cluster spans multiple nodes, newly split ranges are automatically rebalanced to nodes with more capacity. It communicates opportunities for re-balancing using a peer-to-peer gossip protocol by which nodes exchange network addresses, store capacity, and other information.

How does it survive failures?

It is designed to survive hardware and software failures from server restart to data center outages and accomplished without confusing artifacts typical of other distributed systems using replication as well as automated repair after failures.

Replication

It replicates data for availability and guarantees consistency between replicas using the Raft consensus algorithm. Various ways for defining the location of a cluster are -
  • Different servers tolerate server failures.
  • Different servers on different racks within a data center to tolerate rack power/network failures.
  • Different servers in different datacenters to endure large-scale network or power outages.
Database performance can be affected during the replication process done by itself, i.e., if the round-trip latency (delay during transferring of data) between data centers in more then database performance decreases and vice-versa.

Automated Repair

For short-term failures (server restart), it uses Raft to continue seamlessly as long as a majority of replicas remain available. For longer-term shortcomings (server/rack going down for an extended period or a data center outage), CockroachDB automatically re-balances replicas from the missing nodes using the unaffected replicas as sources. Capacity information from the gossip network identifies new locations in the cluster and all available nodes. Moreover, aggregate disk and network bandwidth of the cluster in the re-replication process done in a distributed fashion.

CockroachDB vs MongoDB vs PostgreSQL

  MongoDB PostgreSQL CockroachDB
Automated Scaling Yes No Yes
Automated Failover Yes Optional Yes
Automated Repair Yes No Yes
Strongly Consistent Replication No No Yes
Consensus-Based Replication No No Yes
Distributed Transactions No No Yes
ACID Semantics Document-only Yes Yes
Eventually Consistent Reads Yes Yes No
SQL No Yes Yes
Commercial Version Optional No Optional
Open Source Yes Yes Yes
Support Full Full Full

How to install?

The steps to install CockroachDB is defined below:

Download The Binary

  • Download its archive for Linux, and extract the binary -
$ wget -qO- https://binaries.cockroachdb.com/cockroach-v2.0.6.linux-amd64.tgz | tar xvz
  • Copy the binary into PATH so it's easy to execute cockroach commands from any shell -
$ cp -i cockroach-v2.0.6.linux-amd64/cockroach /usr/local/bin

Use Docker


$ docker version
$ sudo docker pull cockroachdb/cockroach:v2.0.6
  • Install Docker for Linux.
  • Confirm that the Docker daemon is running in the background -
  • Pull the image for the v2.0.6 release of CockroachDB from Docker Hub -

How to get geting Started?

 Below are the steps to get started with CockroachDB Clusters:

Start First Node

Command to execute:-

$ cockroach start --insecure --host=localhost
Output -
  • The --insecure flag makes communication unencrypted.
  • cockroach-data directory stores node data.
  • --host=localhost conveys node to listen only on localhost, with default ports used for internal and client traffic (26257). HTTP requests from the Admin UI (8080).

Add more nodes to the cluster

Execute commands in two new terminals to add two more nodes -

$ cockroach start --insecure --store=node2 --host=localhost --port=26258 --http-port=8081 --join=localhost:26257

$ cockroach start --insecure --store=node3 --host=localhost --port=26259 --http-port=8082 --join=localhost:26257

Test the cluster by adding data using SQL

Open a new terminal and start SQL shell using inbuilt SQL shell of CockroachDB, execute the command -

$ cockroach sql --insecure
Then make a database using SQL commands, and create a table in that database and insert data in the table. Open SQL shell in other nodes to check data is there or not, execute the command in new terminal -
  
$ cockroach sql --insecure --port=26258

Step 4 -> Stop the clusters

Once the testing of cluster done, switch to the terminal running the first node and press CTRL-C to stop the node. At this point, with two nodes are still online, the cluster remains operational because the majority of replicas are available and to verify that the cluster has tolerated this "failure," open the built-in SQL shell of nodes 2 or 3. Do this in the same terminal or a new terminal. Execute below command for removing nodes' data and want a fresh cluster for further testing -
$ rm -rf cockroach-data node2 node3

Overview of Admin UI 

Admin UI provides details about your cluster and database configuration and Real-Time metrics to monitor the following areas -
  • Node Map
  • Cluster Health
  • Hardware Metrics
  • Runtime Metrics
  • SQL Performance
  • Storage Utilization
  • Replication Details
  • Node Details
  • Database details
  • Statement details

Decisions Through Data-Small data, Predictive modeling expansion, and real-time analytics are three forms of data analytics Healthcare data will continue to accumulate rapidly. Click to explore about, Big Data and Predictive Analytics Solution

Concluding 

It is easy to get started, and it is a fully distributed SQL database. It works well with Containers like Kubernetes. It is compatible with Postgresql. It has a fault tolerant system, very sophisticated replica placement rules. It is Cloud-ready and fully Open Source. CockroachDB is an excellent database to use.