Elasticsearch vs. CtrlB

Telemetry data explosion

Data being generated is growing at a staggering rate of 23% YoY, whereas IT budgets grow by about 5% YoY in the best cases. The bigger challenge is that while the data keeps growing, the value extracted from it does not.

Data volume grows; hence, the cost to manage that volume grows, but the value derived from it plateaus.
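To make the gap concrete, here is a minimal sketch (using the 23% and 5% figures above as assumptions) of how the two curves diverge once both compound:

```python
# Rough illustration: 23% YoY data growth vs. 5% YoY budget growth, compounded.
data, budget = 1.0, 1.0
for year in range(1, 6):
    data *= 1.23
    budget *= 1.05
    print(f"Year {year}: data x{data:.2f}, budget x{budget:.2f}, gap x{data / budget:.2f}")

# After 5 years: data is roughly 2.8x, budget roughly 1.3x -- a ~2.2x gap.
```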

Challenges with managing Elasticsearch on larger data volumes

Let us first look at the architecture of the ELK stack to understand the underlying issue.

Typical ELK stack architecture with Data Nodes, Master Nodes, Logstash, and Kibana

  • Logstash collects logs from Kafka or other supported sources and writes them to stateless nodes called ES Writers, which put the data on the data nodes.
  • Kibana/Grafana sends a query to a stateless ES Query node, which fans it out to the data nodes that serve the query (both paths are sketched below).
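As a rough sketch of those two paths, assuming the Elasticsearch 8.x Python client, a local cluster, and hypothetical index names:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # hypothetical endpoint

# Write path: a stateless writer drains a batch (e.g. from Kafka) and bulk-indexes
# it; the cluster then routes each document to a shard on some data node.
events = [{"service": "checkout", "level": "error", "msg": "timeout calling payments"}]
helpers.bulk(es, ({"_index": "logs-2024.01.01", "_source": e} for e in events))

# Read path: Kibana (or any client) sends one search request; the coordinating
# node fans it out to every matching shard and merges the results.
resp = es.search(index="logs-*", query={"match": {"level": "error"}}, size=10)
print(resp["hits"]["total"])
```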

High operational overhead.

Managing 100+ data nodes is tedious and error-prone.

Cluster-wide operations can take days or weeks; one cannot simply isolate a few nodes and operate on them alone, and these operations impact read/write performance.

There are several single points of failure: during bulk indexing, if one indexer is slow, overall indexing throughput drops, because everything moves only as fast as the slowest node. With hundreds of nodes, a few of them will always be sitting at their P99 latency.
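The "slowest node sets the pace" point follows directly from fan-out. A back-of-the-envelope calculation (assuming each node independently responds slowly 1% of the time, i.e. its own P99) shows how quickly tail latency becomes the norm:

```python
# Probability that a query fanning out to all n_nodes hits at least one slow node.
def p_slow_fanout(n_nodes: int, p_slow: float = 0.01) -> float:
    return 1 - (1 - p_slow) ** n_nodes

for n in (10, 50, 100, 300):
    print(n, round(p_slow_fanout(n), 3))
# 10 -> ~0.10, 100 -> ~0.63, 300 -> ~0.95
```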

Difficult to handle log spikes.

Peak provisioning leads to unused resources, as teams have to size their clusters for peak load with log spikes in mind.

Log spikes lead to ingestion lag, i.e. loss of real-time visibility into the system: there is not enough capacity to ingest all the messages, so ingestion slows down or stops.

During an incident, when there is a spike, adding capacity makes ES start redistributing shards, which further reduces the capacity available at that moment.
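To illustrate how much manual babysitting this takes (not a fix for the underlying problem): a common stop-gap, sketched here with the Elasticsearch Python client, is to pause rebalancing while capacity is being added and re-enable it once the cluster is healthy again.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical endpoint

# Temporarily stop shard rebalancing so newly added capacity is not immediately
# consumed by data movement. Transient settings reset when the cluster restarts.
es.cluster.put_settings(transient={"cluster.routing.rebalance.enable": "none"})

# ... scale out / ride out the spike ...

# Re-enable rebalancing once the incident is over.
es.cluster.put_settings(transient={"cluster.routing.rebalance.enable": "all"})
```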

Multi-tenancy and data reliability.

ES rejects requests with field conflicts: if a field such as id is an integer in one message and a string in another, ES will reject the event that arrives later.

Visibility is also lost if the cluster is redeployed or the index rolls over: the new index may pick up the other type for the id field, and all the alerts/dashboards built on it break.
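A minimal reproduction of that failure mode, assuming a local cluster, the 8.x Python client, dynamic mapping, and a hypothetical app-logs index:

```python
from elasticsearch import Elasticsearch, BadRequestError

es = Elasticsearch("http://localhost:9200")  # hypothetical endpoint

# The first document dynamically maps "id" as a number.
es.index(index="app-logs", document={"id": 123, "msg": "user created"})

# A later document from another team sends "id" as a non-numeric string; with the
# field already mapped as a number, Elasticsearch rejects the event.
try:
    es.index(index="app-logs", document={"id": "user-123", "msg": "user created"})
except BadRequestError as err:
    print("event dropped:", err)
```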

Backups needed for data reliability add to the infra cost.

How CtrlB solves for observability at scale

CtrlB can be divided into two parts -

  • CtrlB Flow - An observability pipeline that lets developers/ops route data from any source to any destination while analyzing it in the stream.
  • CtrlB Explore - Queryable storage on top of S3/blob storage with sub-second latency, optimized for observability data.


Let us look at the architecture of CtrlB Explore.

Separation of compute & storage


  • Kafka writes to the stateless Interface Node, which in turn writes to the elastic compute nodes.
  • For queries, the Interface Node receives the query and fans it out to the elastic compute nodes, which scale out to serve the query and then shrink back (a conceptual sketch of this pattern follows).
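The sketch below is conceptual only and does not describe CtrlB's actual implementation; all names in it are hypothetical. It illustrates the pattern above: a stateless interface fans a query out to short-lived workers that each scan one segment held in object storage, then merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# fetch_segment is a hypothetical stand-in for an S3/blob read; here it just
# returns in-memory sample rows so the sketch runs on its own.
SAMPLE_SEGMENTS = {
    "s3://logs/segment-001": [{"level": "error", "msg": "timeout"},
                              {"level": "info", "msg": "ok"}],
    "s3://logs/segment-002": [{"level": "error", "msg": "oom"}],
}

def fetch_segment(uri):
    return SAMPLE_SEGMENTS[uri]

def scan_segment(uri, predicate):
    return [row for row in fetch_segment(uri) if predicate(row)]

def run_query(uris, predicate):
    # Worker count scales with the query, then the pool is torn down:
    # compute is elastic, while the data stays in object storage.
    with ThreadPoolExecutor(max_workers=len(uris)) as pool:
        partials = pool.map(lambda u: scan_segment(u, predicate), uris)
    return [row for part in partials for row in part]

print(run_query(list(SAMPLE_SEGMENTS), lambda r: r["level"] == "error"))
```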

CtrlB Advantages

  • Cut down your observability cost by up to 80-90%
  • Take control of your observability data and choose what is important for you and what is not. Pay for data value, not volume.
  • Eliminate vendor lock-in and give your teams the flexibility and freedom to use any tool they like.
  • Have a central place to govern data, retract PII, react to alerts faster, etc.

Interested in knowing more or ready to take control of your observability data? Reach out to us at - [email protected]

Data/Reference

Cost comparison for 90-day retention (chart not reproduced here).
