What is CockroachDB?
CockroachDB is a distributed SQL database built on a transactional and strongly-consistent key-value store. It supports consistent ACID transactions and provides a SQL API for structuring, manipulating, and querying data. It’s completely open source. CockroachDB scales horizontally and survives disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention.
Why Does CockroachDB Matter?
CockroachDB offers fully-distributed ACID transactions, zero-downtime schema changes, and support for secondary indexes and foreign keys. It provides scale without sacrificing SQL functionality. It also supports a JSON data type for storing NoSQL-style data.
Features of CockroachDB
- Simplified Deployment
- Strong Consistency
- Support of SQL
- Distributed Transactions
- Automated Scaling and Repair
- High Availability
- Open Source
How Does CockroachDB Work?
A CockroachDB cluster is brought up with two commands:
- cockroach start with a --join flag listing all of the initial nodes in the cluster, so the process knows all of the other machines it can communicate with.
- cockroach init to perform a one-time initialization of the cluster.
Once the CockroachDB process is running, developers interact with CockroachDB through a SQL API. Because all nodes behave symmetrically, SQL RPCs can be sent to any available node, which makes CockroachDB easy to integrate with load balancers.
After receiving SQL RPCs, nodes convert them into operations on CockroachDB’s distributed key-value store. As these RPCs start filling the cluster with data, CockroachDB distributes the data among the available nodes, breaking it up into 64 MiB chunks known as ranges. Each range is replicated to at least three nodes. This way, if a node goes down, copies of the data remain available for reads and writes, as well as for replicating the data to other nodes.
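To make the idea of ranges concrete, here is a minimal Python sketch of how a key could be mapped to the range that contains it and to the nodes holding that range. The split points, node names, and function names are hypothetical and chosen only for illustration; this is not CockroachDB's actual lookup code.

```python
from bisect import bisect_right
from typing import List

# Hypothetical split points: range i covers keys from RANGE_STARTS[i] up to
# (but not including) RANGE_STARTS[i + 1]. The first range starts at the
# empty string, i.e. the beginning of the keyspace.
RANGE_STARTS = ["", "g", "n", "t"]

# Hypothetical placement: each range is replicated to three nodes.
RANGE_REPLICAS = [
    ["node1", "node2", "node3"],
    ["node2", "node3", "node4"],
    ["node1", "node3", "node4"],
    ["node1", "node2", "node4"],
]


def find_range(key: str) -> int:
    """Return the index of the range whose key span contains `key`."""
    # bisect_right counts the start keys that are <= key; the last of
    # those start keys belongs to the range that owns the key.
    return bisect_right(RANGE_STARTS, key) - 1


def replicas_for(key: str) -> List[str]:
    """Return the nodes that hold a copy of the range containing `key`."""
    return RANGE_REPLICAS[find_range(key)]
```

Because the ranges cover contiguous, sorted segments of the keyspace, a binary search over the start keys is enough to locate the owning range.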
If a node receives a read or write request that it cannot serve directly, CockroachDB finds the node that can handle the request and communicates with it. CockroachDB tracks where the data lives, which is what enables symmetric behavior for each node.
Any change made to the data in a range relies on the Raft consensus algorithm to ensure that a majority of the range’s replicas agree to commit the change. This ensures strong isolation guarantees and gives applications consistent reads, regardless of which node they communicate with.
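The majority rule itself is simple arithmetic. The sketch below shows only that rule, not Raft as a whole (there is no leader election, log, or term handling here), and the function names are hypothetical.

```python
def quorum_size(replica_count: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replica_count // 2 + 1


def can_commit(acks: int, replica_count: int) -> bool:
    """A change commits once a majority of replicas has acknowledged it."""
    return acks >= quorum_size(replica_count)
```

With three replicas the quorum is two, so a range can keep committing writes while one replica is unreachable.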
Ultimately, data is read from and written to disk using an efficient storage engine, which keeps track of the data’s timestamp. This lets CockroachDB support the SQL standard AS OF SYSTEM TIME clause, which queries historical data as it existed at an earlier point in time.
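To illustrate why keeping timestamps makes historical reads possible, here is a small Python sketch of a store that retains every version of a value. It is a toy model under stated assumptions (integer timestamps, an in-memory dictionary, a hypothetical VersionedStore class), not a description of CockroachDB's storage engine.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple


class VersionedStore:
    """Toy key-value store that keeps every timestamped version of a key."""

    def __init__(self) -> None:
        # key -> list of (timestamp, value), kept sorted by timestamp
        self._versions: Dict[str, List[Tuple[int, str]]] = {}

    def put(self, key: str, timestamp: int, value: str) -> None:
        versions = self._versions.setdefault(key, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda version: version[0])

    def get_as_of(self, key: str, timestamp: int) -> Optional[str]:
        """Return the newest value written at or before `timestamp`."""
        versions = self._versions.get(key, [])
        timestamps = [ts for ts, _ in versions]
        idx = bisect_right(timestamps, timestamp)
        return versions[idx - 1][1] if idx else None
```

A read "as of" a timestamp simply ignores every version written after it, which is the behavior a historical query needs.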
How does CockroachDB scale?
It scales horizontally with minimal operator overhead. CockroachDB runs on a local computer, a single server, a corporate development cluster, or a private or public cloud. Adding capacity is as easy as pointing a new node at the running cluster.
At the key-value level, CockroachDB starts with a single, empty range. As data is put into the database, this range eventually reaches a threshold size (64 MiB by default). When that happens, the data splits into two ranges, each covering a contiguous segment of the entire key-value space. This process continues indefinitely: as new data flows in, existing ranges keep splitting into new ranges, aiming to keep the range size relatively small and consistent.
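The splitting process can be sketched in a few lines of Python. This is a simplification that assumes a range always splits exactly in half by size, whereas a real split happens at a key boundary; the function name is hypothetical and only the 64 MiB default comes from the text.

```python
from typing import List

MAX_RANGE_BYTES = 64 * 1024 * 1024  # 64 MiB, the default threshold quoted above


def split_oversized(range_sizes: List[int],
                    max_bytes: int = MAX_RANGE_BYTES) -> List[int]:
    """Keep halving ranges until none is larger than `max_bytes`.

    Returns the sizes of the resulting ranges (order is not meaningful).
    """
    if max_bytes < 1:
        raise ValueError("max_bytes must be at least 1")
    result: List[int] = []
    pending = list(range_sizes)
    while pending:
        size = pending.pop()
        if size > max_bytes:
            half = size // 2
            # Splitting never loses data: the two halves add up to `size`.
            pending.extend([half, size - half])
        else:
            result.append(size)
    return result
```

For example, a single 200 MiB range would first split into two 100 MiB ranges, each of which splits again, leaving four ranges of 50 MiB.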
When a cluster spans multiple nodes, newly split ranges are automatically rebalanced to nodes with more capacity. CockroachDB communicates opportunities for rebalancing using a peer-to-peer gossip protocol, through which nodes exchange network addresses, store capacity, and other information.
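As a rough illustration of capacity-based rebalancing, the Python sketch below picks a target node for a new replica from capacity figures that, in a real cluster, would arrive via gossip. Here they are modeled as a plain dictionary, and the selection rule (most free capacity wins) is an assumption made for the example, not CockroachDB's actual allocation logic.

```python
from typing import Dict, Set


def pick_rebalance_target(free_capacity: Dict[str, int],
                          current_replicas: Set[str]) -> str:
    """Pick the node with the most free capacity that holds no replica yet."""
    candidates = {
        node: free
        for node, free in free_capacity.items()
        if node not in current_replicas
    }
    if not candidates:
        raise ValueError("every known node already holds a replica")
    return max(candidates, key=lambda node: candidates[node])
```

Excluding nodes that already hold a replica keeps the copies of a range on distinct machines, so a single node failure cannot remove more than one copy.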
How does CockroachDB survive failures?
CockroachDB is designed to survive hardware and software failures, from server restarts to data center outages. It accomplishes this without the confusing artifacts typical of other distributed systems, using replication as well as automated repair after failures.
CockroachDB replicates data for availability and guarantees consistency between replicas using the Raft consensus algorithm. Replicas can be located in several ways, each tolerating a different class of failure:
- Different servers within a rack to tolerate server failures.
- Different servers on different racks within a data center to tolerate rack power/network failures.
- Different servers in different datacenters to endure large-scale network or power outages.
Replication can affect database performance. Because a change must be agreed on by a majority of replicas, the higher the round-trip latency (the delay in transferring data) between data centers, the lower the database performance, and vice versa.
For short-term failures (such as a server restart), CockroachDB uses Raft to continue seamlessly as long as a majority of replicas remain available. For longer-term failures (a server or rack going down for an extended period, or a data center outage), CockroachDB automatically rebalances replicas from the missing nodes, using the unaffected replicas as sources. Capacity information from the gossip network is used to identify new locations in the cluster, and re-replication is done in a distributed fashion using all available nodes and the aggregate disk and network bandwidth of the cluster.
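The following Python sketch shows how the state of each range could be classified after some nodes fail. The status labels, range IDs, and node names are hypothetical; the only rule taken from the text is that a range stays usable while a majority of its replicas is alive.

```python
from typing import Dict, List, Set


def classify_ranges(placement: Dict[str, List[str]],
                    dead_nodes: Set[str]) -> Dict[str, str]:
    """Label each range by how many of its replicas are still alive."""
    status: Dict[str, str] = {}
    for range_id, nodes in placement.items():
        live = [node for node in nodes if node not in dead_nodes]
        quorum = len(nodes) // 2 + 1
        if len(live) == len(nodes):
            status[range_id] = "healthy"
        elif len(live) >= quorum:
            # Still serving traffic, but a replacement replica is needed.
            status[range_id] = "under-replicated"
        else:
            status[range_id] = "unavailable"
    return status
```

Ranges labeled "under-replicated" are the ones that automated repair would copy to new locations, using the surviving replicas as sources.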
CockroachDB in Comparison
| Strongly Consistent Replication | No | No | Yes |
| Eventually Consistent Reads | Yes | Yes | No |
Installation of CockroachDB
Download The Binary
- Download the CockroachDB archive for Linux, and extract the binary –
$ wget -qO- https://binaries.cockroachdb.com/cockroach-v2.0.6.linux-amd64.tgz | tar xvz
- Copy the binary into PATH so it’s easy to execute cockroach commands from any shell –
$ cp -i cockroach-v2.0.6.linux-amd64/cockroach /usr/local/bin
Use Docker
- Install Docker for Linux.
- Confirm that the Docker daemon is running in the background –
$ docker version
- Pull the image for the v2.0.6 release of CockroachDB from Docker Hub –
$ sudo docker pull cockroachdb/cockroach:v2.0.6
Getting Started With CockroachDB Clusters
Step 1 -> Start First Node
Command to execute:
$ cockroach start --insecure --host=localhost
- The --insecure flag makes communication unencrypted.
- Node data is stored in the cockroach-data directory.
- --host=localhost tells the node to listen only on localhost, with default ports used for internal and client traffic (26257) and for HTTP requests from the Admin UI (8080).
Step 2 -> Add more nodes to the cluster
Execute commands in two new terminals to add two more nodes –
$ cockroach start --insecure --store=node2 --host=localhost --port=26258 --http-port=8081 --join=localhost:26257
$ cockroach start --insecure --store=node3 --host=localhost --port=26259 --http-port=8082 --join=localhost:26257
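The commands for additional local nodes follow a regular pattern: each node gets its own store directory and the next free port numbers, and joins the first node. A small Python sketch can generate them. The helper name is hypothetical, and it only assembles command strings using the flags shown in this section; it does not start anything.

```python
from typing import List

FIRST_SQL_PORT = 26257   # default port for internal and client traffic
FIRST_HTTP_PORT = 8080   # default port for the Admin UI


def local_node_commands(node_count: int) -> List[str]:
    """Build the `cockroach start` commands for a local insecure cluster."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    # The first node uses the defaults for store directory and ports.
    commands = ["cockroach start --insecure --host=localhost"]
    for i in range(2, node_count + 1):
        commands.append(
            "cockroach start --insecure "
            f"--store=node{i} --host=localhost "
            f"--port={FIRST_SQL_PORT + i - 1} "
            f"--http-port={FIRST_HTTP_PORT + i - 1} "
            f"--join=localhost:{FIRST_SQL_PORT}"
        )
    return commands
```

Calling local_node_commands(3) yields the first-node command from Step 1 followed by the two commands above; each one still has to be run in its own terminal.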
Step 3 -> Test the cluster by adding data using SQL
Open a new terminal and start CockroachDB’s built-in SQL shell by executing the command –
$ cockroach sql --insecure
Then use SQL commands to create a database, create a table in that database, and insert data into the table.
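As an illustration of what such a test could look like, the Python sketch below assembles a non-interactive cockroach sql invocation. The database name, table name, and values are made up for the example, and it assumes the --execute flag of cockroach sql, which is not otherwise used in this article. The function only builds the argument list; running it is left to the caller.

```python
from typing import List

# Hypothetical statements: the database, table, and values are illustrative.
STATEMENTS = [
    "CREATE DATABASE bank",
    "CREATE TABLE bank.accounts (id INT PRIMARY KEY, balance DECIMAL)",
    "INSERT INTO bank.accounts VALUES (1, 1000.50)",
    "SELECT * FROM bank.accounts",
]


def sql_command(statements: List[str], port: int = 26257) -> List[str]:
    """Build the argv for running statements through `cockroach sql`."""
    argv = ["cockroach", "sql", "--insecure", f"--port={port}"]
    for statement in statements:
        # Assumption: one --execute flag per statement.
        argv += ["--execute", statement]
    return argv


# To actually run the statements against a local node, one could use:
#   import subprocess
#   subprocess.run(sql_command(STATEMENTS), check=True)
```

Passing port=26258 or port=26259 to sql_command would send the same statements to one of the other local nodes instead.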
To check that the data is also visible from another node, open a SQL shell against that node by executing the command in a new terminal –
$ cockroach sql --insecure --port=26258
Step 4 -> Stop the cluster
Once you have finished testing the cluster, switch to the terminal running the first node and press CTRL-C to stop the node.
At this point, with two nodes still online, the cluster remains operational because a majority of replicas are available. To verify that the cluster has tolerated this “failure,” open the built-in SQL shell against node 2 or 3, in the same terminal or a new one.
If you want a fresh cluster for further testing, stop the remaining nodes and then execute the command below to remove the nodes’ data –
$ rm -rf cockroach-data node2 node3
Overview of Admin UI of CockroachDB
The Admin UI provides details about your cluster and database configuration, along with real-time metrics for monitoring the following areas –
- Node Map
- Cluster Health
- Hardware Metrics
- Runtime Metrics
- SQL Performance
- Storage Utilization
- Replication Details
- Node Details
- Database Details
- Statement Details
CockroachDB is a fully distributed SQL database that is easy to get started with. It works well with container orchestrators such as Kubernetes and is compatible with PostgreSQL. It is fault tolerant, with sophisticated replica placement rules, and it is cloud-ready and fully open source. For workloads that need SQL together with horizontal scale and resilience, CockroachDB is an excellent database to use.