m3db

What is M3DB

As the data load increases, a need to detect fraud out of that data also increases. This problem can be solved by tracking or analyzing that data in real time. For this problem, real-time databases are being developed at a very large scale and many companies are partying in by making them open-source. Real-time data is the need and Real-time query engines are the solutions. M3DB is developed by uber for their internal use and later they open sourced it under Apache. M3DB is Distributed Time series database inspired by Gorilla and Cassandra (tools by Facebook) that handles a large amount of data and obtain incremental results for different Matrics and written in Go Lang. As data at Uber is not particular to the same location and the same amount, so M3DB handles that with ease for solving Uber Use case. That uber case was not properly solved by other tools they were using before at Uber such as Graphite, Prometheus, etc. M3DB brings that handling of multi-million matrics per second with the persistence of some million aggregated matrics also.

How does it work?

  • M3 stack is used by uber for aggregating matrics and Time series database. In that stack, M3DB is placed with M3 coordinator.
  • M3 coordinator is managed alongside Prometheus and acts as a read/write endpoint between Prometheus and M3DB. This provides long term storage goals.
  • M3DB uses M3TSZ compression algorithm based on Facebook’s Gorilla with the different use case and provides a very scalable compression ratio of 1.45bytes/data point (as per Uber use case).

M3DB In-memory object layout is represented as a hierarchy.

  • DataBase: It will be only one for each M3DB process
  • Namespaces: Each Database has multiple namespaces (equivalent to tables)
  • Shards: Each namespace can have multiple shards based on horizontal partitions. They provide arbitrary distribution of time series data by using a murmur3 hash function.
  • Series: Series is contained by Shards and it acts as time series data. There can be multiple series.
  • Block: Compressed time series data is stored under these blocks that are part of some series. We can define blocks as named partitions according to time cap.
  • M3DB has persistent storage apart of in-memory storage (Active -buffers) as it is the way to recover data. In persistent storage, data is in uncompressed form and only unflushed data is present. Fileset files and commit logs are used to achieve this functionality whereas caching policies also exist to improve performance in this case.

Write flow

There is a number of steps that were followed while writing data and the first step is called made to write. When a call to the writeBatchRaw endpoint is made by the client, It will have to contain the following information:

  • Namespace
  • Series ID (byte blob)
  • Timestamp
  • Data

M3DB first checks for the namespace. If it exists then it generated a hash for that Series ID and verifies shards for that series ID. If that shard Exists then it will assign encoder that then uses M3TSZ to compress that time series data. At the same time data is also written to commitlogs until data is flushed to disk without any error.

Read Flow

Read request contains 3 components: Namespace, series ID, data. The data also gets reads from three places i.e. from active buffers, from the in-memory cache, and from disk respectively. There are three operations based on a cluster in M3DB i.e. Node Add, Node down and Node Remove. M3DB has some consistency levels maintained and defined for reading, write and connect. These are:

For Write:

  • One – One node success
  • The majority – Majority of nodes success
  • All – All nodes success

For Read:

  • One – Single node read
  • Majority – Majority nodes reads
  • All – All nodes successfully read
  • UnstrictMajority – Majority of nodes read without the restriction of failed read

For Connect:

  • Any – Any number of nodes connection
  • None – no node connection
  • One – Single node connection
  • The majority – Most of the node’s connection
  • All – All nodes connection

Filesets are used to store Info, summary, Indexes, Time series data, Bloom filters, Digests and checkpoints each with a different file. These are maintained for every shard.

Benefits of M3DB

  • Reduce the cost of resource development, ingestion, and storage.
  • M3 can process 500 million metrics per second with the persistence of up to 20 million aggregated metrics. Calculating this for 24 hours based structure it can process up to 45 TRILLION matrics a day.
  • Most of the times, data to be accessed is available in Active buffers. That makes it faster to read from there.
  • M3DB has one commit log for each database i.e. each namespace share one commit log. These commit logs are totally uncompressed and retain only that data which is not flushed to Persistent storage.
  • Snapshotting is used to create other compressed time series data in the form of fileSet structure but at a different location. These can help to remove the load from commit logs as they can be cleaned once snapshots of their data are created.
  • It has multiple caching policies that let us define what data is needed to cache and on which level.

Why M3DB Matters

  • M3DB provides the facility to monitor time series data at a faster rate with M3TSZ compression support which is based on Facebook’s Gorilla TSZ compression but with some differences.
  • If real-time matrics report needed to fetch then it can fetch data from active buffers at a fast rate as they are in-memory buffers.
  • Storage can be extended to any value with namespaces defined for data retention time. (currently 26 hours, which is more than the previous system that provides it for 2 hours).
  • Snapshotting, caching policies and deployment with Prometheus are some important points that make it different from others in processing and development way.

Implementation of M3DB

M3 can be deployed by one of the following ways:

  • M3DB single node Deployment: Useful for standalone local implementation. Need to setup m3dbnode and m3coordinator instances.
  • M3DB on Kubernetes: Useful when used on fast disks. Can be integrated with Prometheus
  • M3DB manual Deployment: How a deployment or architecture looks like
  • M3Query: By setting up M3query one can query M3DB easily as it is the part of M3QL and one can define aggregations and all by using M3Query.

Integrations

While Integrating M3DB it needs to set up some configurations. It can be integrated with one of the following and Grafana is used for querying results.

Prometheus

Prometheus provides the facility of matrics monitoring at large scale.

  • Define properties in the configuration file of M3 coordinator
  • Merge it on Prometheus
  • Query by using Grafana

Graphite

Graphite provides support for numeric time series databases and renders graphs for that data.

  • Define the Ingestion Pathway (using carbon plaintext protocol)
  • Handle aggregations by using default defined functions or Graphite functions

To get more deep insights, Get in Touch with us and stay updated. Our team will be happy to assist you.


Leave a Comment

Name required.
Enter a Valid Email Address.
Comment required.(Min 30 Char)