Elasticsearch vs. CtrlB
Telemetry data explosion
The volume of data being generated is growing at a staggering 23% YoY, whereas IT budgets grow by 5% YoY in the best case. The biggest challenge is that although the data keeps growing, the value derived from it does not.
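To put the two growth rates side by side, here is a minimal sketch of how they compound. It assumes the cost of handling a unit of data stays constant over time, which is a simplification; the function name `growth_gap` is only illustrative.

```python
def growth_gap(data_growth: float, budget_growth: float, years: int) -> float:
    """Ratio of data growth to budget growth after `years` of compounding.

    Assumption: the cost per unit of data stays flat, so the ratio can be read
    as how much further each budget dollar has to stretch.
    """
    return ((1 + data_growth) / (1 + budget_growth)) ** years


# Rates taken from the text: 23% YoY data growth, 5% YoY budget growth.
for years in (1, 3, 5):
    print(f"after {years} year(s): {growth_gap(0.23, 0.05, years):.2f}x")
```

Under these assumptions, the gap roughly doubles within five years.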
Challenges with managing Elasticsearch on larger data volumes
Let us first look at the architecture of the ELK stack to understand the underlying issues.
High operational overhead.
Managing 100+ data nodes is tedious and error-prone.
Cluster-wide operations can take days or weeks; they cannot be isolated to a few nodes, so they impact read/write performance.
There are several single points of failure; if you're doing bulk indexing and one of the indexers is slow, the indexing throughput goes down because the whole operation is only as fast as the slowest node. If you have hundreds of nodes, a few of them will always be hitting P99 latency.
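The point about tail latency at scale can be checked with a quick calculation. This is a rough sketch that assumes each node's latency is independent of the others, which real clusters only approximate; `prob_any_node_in_tail` is a hypothetical helper name.

```python
def prob_any_node_in_tail(num_nodes: int, percentile: float = 0.99) -> float:
    """Probability that at least one of `num_nodes` nodes is slower than its
    own `percentile` latency on a given request.

    Assumption: node latencies are independent.
    """
    return 1.0 - percentile ** num_nodes


for n in (1, 10, 100, 300):
    print(f"{n:>3} nodes: {prob_any_node_in_tail(n):.1%} chance of hitting a P99-slow node")
```

With 100 nodes, roughly 63% of fan-out requests would touch at least one node in its slowest 1%, and since a bulk request waits for the slowest node, that tail sets the overall throughput.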
Difficult to handle log spikes.
Teams have to provision their clusters for peak load with log spikes in mind, which leaves resources unused most of the time.
Log spikes lead to lag, that is, loss of real-time visibility into the system: when there is no capacity to ingest all the messages, ingestion slows down or stops.
If you add capacity during an incident-driven spike, ES tries to redistribute the shards, which in turn reduces the available capacity at that moment.
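The lag described above can be illustrated with a small backlog simulation. All traffic and capacity numbers here are hypothetical, and the model ignores the extra capacity loss from shard redistribution; it is a sketch of the mechanism, not a model of any real cluster.

```python
def simulate_backlog(incoming_per_sec, capacity_per_sec):
    """Return the number of events waiting to be ingested after each second."""
    backlog = 0
    history = []
    for incoming in incoming_per_sec:
        # Events beyond the ingest capacity queue up; the backlog never goes below zero.
        backlog = max(0, backlog + incoming - capacity_per_sec)
        history.append(backlog)
    return history


# Hypothetical traffic: 60 s steady, a 60 s spike, then 120 s steady again.
traffic = [8_000] * 60 + [20_000] * 60 + [8_000] * 120
capacity = 10_000  # events/s the cluster can ingest (assumed)

history = simulate_backlog(traffic, capacity)
peak_backlog = max(history)
peak_lag_seconds = peak_backlog / capacity  # time needed to ingest the backlog alone

print(f"peak backlog: {peak_backlog} events, about {peak_lag_seconds:.0f} s behind real time")
print(f"backlog 2 minutes after the spike ended: {history[-1]} events")
```

In this sketch, a cluster with 25% headroom over steady traffic still falls a full minute behind during a one-minute spike and has not caught up two minutes later, which is why teams end up provisioning for the peak.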
Multi-tenancy and data reliability.
ES rejects requests with field conflicts; if a field such as ID is an integer in one message and a string in another, ES will reject the event that arrives later.
System visibility is lost if the cluster is redeployed or the index rolls over. The new index might then start ingesting the other type of the ID field, and all the alerts/dashboards built on the original type will break.
Backups needed for data reliability add to the infra cost.
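The field-conflict behavior described in this section can be sketched with a toy model. `ToyIndex` is a hypothetical class, not Elasticsearch code: it assumes the first value seen for a field fixes its type and ignores the coercion rules a real mapping may apply.

```python
class ToyIndex:
    """Toy model of dynamic mapping: the first value seen for a field fixes its type."""

    def __init__(self):
        self.mapping = {}  # field name -> Python type of the first value seen
        self.docs = []

    def index(self, doc: dict) -> bool:
        """Store `doc` and return True, or return False if a field type conflicts."""
        for field, value in doc.items():
            expected = self.mapping.get(field)
            if expected is not None and type(value) is not expected:
                return False  # conflict: the later event is rejected
        for field, value in doc.items():
            self.mapping.setdefault(field, type(value))
        self.docs.append(doc)
        return True


# Two tenants send the same field name with different types.
index = ToyIndex()
accepted_first = index.index({"id": 42, "msg": "from service A"})
accepted_second = index.index({"id": "order-42", "msg": "from service B"})
print(accepted_first, accepted_second)

# After a rollover the new index starts empty, so whichever type arrives first wins.
rolled_over = ToyIndex()
string_first = rolled_over.index({"id": "order-43"})
integer_later = rolled_over.index({"id": 44})
print(string_first, integer_later)
```

In the second half, the service whose events were accepted before the rollover is now the one being rejected, which is how dashboards and alerts built on the original field type can silently lose their data.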
How CtrlB solves for observability at scale
CtrlB can be divided into two parts -
Let us look at the architecture of CtrlB Explore
CtrlB Advantages
Interested in knowing more or ready to take control of your observability data? Reach out to us at - [email protected]
Data/Reference
9 months ago: The issue with ES has been taken care of by OpenSearch tiered storage. https://aws.amazon.com/blogs/big-data/petabyte-scale-log-analytics-with-amazon-s3-amazon-opensearch-service-and-amazon-opensearch-ingestion/