Performance comparison between Cassandra versions 4.1 and 5

I expect you know that Apache Cassandra is an open-source distributed NoSQL database designed to process large amounts of data across many servers without a single point of failure. This article compares how performance changed between two consecutive major versions.

The main goals

From the perspective of performance and response time, we wanted to see the differences between Cassandra versions 4.1 and 5. We also wanted to be sure that new features such as UCS (Unified Compaction Strategy) with a relevant configuration, Java 17, etc. bring the expected benefits in standard scenarios.

We had seen a nice comparison between Cassandra versions 3 and 4, and the open question for us was whether the latest Cassandra version would bring a big performance boost or only cosmetic changes.

NOTE: We used the official measurement tool (‘cassandra-stress’) from the latest version, 5.0.2.

Management summary

We saw these key outputs from our testing for consistency level LOCAL_QUORUM:

  • Cassandra version 5 has on average 38% better performance and 26% better response time for write operations than Cassandra version 4.1
  • Cassandra version 5 has on average 12% better performance and 9% better response time for read operations than Cassandra version 4.1
  • Cassandra version 5 performs much better (can sustain higher throughput) than Cassandra version 4.1 on the same HW
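The article does not state how the percentages above were calculated. A minimal sketch of one common way to derive such figures, assuming "performance" means operations per second and "response time" means latency in milliseconds. The function names and the sample numbers are hypothetical and are not measurements from these tests.

```shell
#!/bin/sh
# Relative throughput gain of a new result over a baseline, in percent.
# Positive means the new version handled more operations per second.
throughput_gain_pct() {
    # $1 = baseline op/s, $2 = new op/s
    awk -v old="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (cur - old) / old * 100 }'
}

# Relative latency reduction, in percent. Positive means faster responses.
latency_reduction_pct() {
    # $1 = baseline latency in ms, $2 = new latency in ms
    awk -v old="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (old - cur) / old * 100 }'
}

# Hypothetical numbers for illustration only.
throughput_gain_pct 10000 12500    # prints 25.0
latency_reduction_pct 4.0 3.0      # prints 25.0
```

Note the opposite sign convention: throughput improves when it goes up, latency improves when it goes down, so the two functions subtract in opposite directions.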

We can say that performance has made very good progress, and we are looking forward to Cassandra version 5.1 with additional new features such as ACID, etc. You can see a sample of the test details below.

Test outputs

Here are a few sample outputs from the tests (publishing all the details would not make sense, so we show only samples to give a relevant picture):

Figure 1: Output of one run with CL=LOCAL_QUORUM, R/W operations (and compare selected outputs)


Figure 2: Output of one run with CL=LOCAL_QUORUM, R/W operations (and compare selected outputs)


Figure 3: Output of one run with CL=LOCAL_QUORUM, W operations (and compare selected outputs)


Details of write outputs


Figure 4: Compare write outputs for UCS level 4 vs STCS, CL=LOCAL_QUORUM


Figure 5: Compare write outputs for STCS vs STCS, CL=LOCAL_QUORUM


Figure 6: Compare write outputs for STCS vs STCS, CL=LOCAL_ONE, see performance degradation on Cassandra v4

Details of read outputs


Figure 7: Compare read outputs for UCS level 4 vs LCS, CL=LOCAL_QUORUM


Figure 8: Compare read outputs for LCS vs LCS, CL=LOCAL_QUORUM

Test setup/Environments setting

We built two similar clusters for Cassandra testing with these specifications:

Hybrid Cluster, 2x data center

  • Primary data center, on-prem (VM under VMware), 3x node
  • Secondary data center, Azure (VM), 3x node

SW

  • Oracle Linux
  • Java 11 (for Cassandra version 4.1.4) and
  • Java 17 (for Cassandra version 5.0.1)


Test setup/Key characteristics of tests

We used the standard/official performance testing tool ‘cassandra-stress’ (from the official Cassandra 5.0.2 distribution) and tuned the testing with these settings:

  • Test scenarios for Write and Read operations
  • Consistency levels LOCAL_ONE and LOCAL_QUORUM
  • Different compaction strategies: STCS and LCS with default values for Cassandra v4; STCS, LCS, and also UCS (with scaling 4, 8, and 10) for Cassandra v5
  • Client throughput driven by different numbers of threads (4, 8, 16, 24, 36, 54 and 81). We simulated common/expected throughput, not the peak variants used for crash tests.
  • Test duration typically 1 and 5 minutes
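The combinations listed above can be enumerated with a small loop. This is only a dry-run sketch that prints command lines and executes nothing. It assumes a fixed thread count per run via `-rate threads=N`, whereas the commands shown in this article use `threads<=100`. The `print_matrix` helper and the single placeholder node address are hypothetical, and the compaction and `-mode` options are left out for brevity.

```shell
#!/bin/sh
# Dry run: prints one cassandra-stress command line per combination of
# consistency level and thread count. Nothing is executed.
STRESS=./apache-cassandra-5.0.2/tools/bin/cassandra-stress   # path as used in the article
NODES="10.129.xx.xx"                                         # placeholder, elided in the article

print_matrix() {
    # $1 = operation (write or read)
    for cl in LOCAL_ONE LOCAL_QUORUM; do
        for threads in 4 8 16 24 36 54 81; do
            echo "$STRESS $1 duration=5m cl=$cl no-warmup -node $NODES -rate threads=$threads"
        done
    done
}

print_matrix write
```

With two consistency levels and seven thread counts this prints 14 lines per operation. Adding the compaction strategies as a third loop multiplies that again, which is why the number of test runs grows quickly.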


Typical test commands in ‘cassandra-stress’:

./apache-cassandra-5.0.2/tools/bin/cassandra-stress write duration=5m cl=LOCAL_ONE no-warmup -node 10.129.xx.xx,10.129.xx.xx,10.129.xx.xx -mode user=perf password=xxx prepared protocolVersion=4 connectionsPerHost=24 maxPending=384 -schema "replication(strategy=NetworkTopologyStrategy,factor=3)" "compaction(strategy=SizeTieredCompactionStrategy,max_threshold=32,min_threshold=4)" -rate "threads<=100" -reporting output-frequency=5s > "./stress-output/$curr_date/$curr_date v4 write_LOCAL_QUORUM_STCS_100xTHR.txt"        
./apache-cassandra-5.0.2/tools/bin/cassandra-stress read duration=5m cl=LOCAL_QUORUM no-warmup -node 10.129.xx.xx,10.129.xx.xx,10.129.xx.xx -mode user=perf password=xxx prepared protocolVersion=4 connectionsPerHost=24 maxPending=384 -rate "threads<=100" -reporting output-frequency=5s > "./stress-output/$curr_date/$curr_date v4 read_LOCAL_QUORUM_STCS_100xTHR.txt"        
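A short arithmetic sketch of what the two consistency levels in these commands imply, assuming the replication factor of 3 from the schema option applies to the data center local to the client. LOCAL_ONE waits for one local replica, while LOCAL_QUORUM waits for a majority of the local replicas. The `quorum` helper name is hypothetical.

```shell
#!/bin/sh
# Number of replicas that must acknowledge a request at quorum:
# floor(RF / 2) + 1, where RF is the replication factor in the local data center.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

quorum 3    # prints 2: with factor=3, LOCAL_QUORUM waits for 2 local replicas
```

So with three replicas each LOCAL_QUORUM request waits for two acknowledgements instead of one, which is worth keeping in mind when comparing LOCAL_ONE and LOCAL_QUORUM results.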

A few final notes

  • Better performance results show up primarily at higher throughput (for 54 and 81 threads)
  • Cassandra v5 is more stable (lower standard deviations in outputs) and better suited to higher throughput than version 4.1 on the same HW, see Figure 6.
  • With a different table schema, the test results may differ (lower or higher)
  • The right design and the selection of a relevant replication factor, compaction strategy, consistency level, etc. are critical (as usual)
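The stability remark above refers to standard deviations in the outputs. A minimal sketch of how such a figure could be computed from a series of throughput samples, for example one value per reporting interval. It assumes one numeric op/s value per input line; the `stability` helper and the sample values are hypothetical and not taken from these tests.

```shell
#!/bin/sh
# Reads one op/s sample per line on stdin and prints "mean stddev cv%".
# Uses the population standard deviation; cv% = stddev / mean * 100.
stability() {
    awk '{ n++; sum += $1; sumsq += $1 * $1 }
         END {
             if (n == 0) exit 1            # no samples, nothing to report
             mean = sum / n
             var = sumsq / n - mean * mean
             if (var < 0) var = 0          # guard against rounding error
             sd = sqrt(var)
             printf "%.1f %.1f %.1f\n", mean, sd, sd / mean * 100
         }'
}

# Hypothetical samples for illustration only.
printf '%s\n' 90 110 90 110 | stability    # prints 100.0 10.0 10.0
```

The last column (coefficient of variation) makes runs with different average throughput comparable: a lower value means a steadier run.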



#performancecompare #nosql #newsql #cql #cap #base #cassandra5 #cassandra4 #scylla #astradb #spanner #dynamodb #cockroachdb #cosmosdb #yugabytedb #datastax #elassandra #kairosdb #instaclustr #highavailability #consistencylevel


Dmitry Konstantinov

System Architect at NetCracker and Apache Cassandra Committer

2 months ago

Jiri Steuer Thank you for the report! If possible could you share cassandra.yaml configuration used for the test? (or at least mention changes compared to a default config, for example: did you use Trie memtable for 5.0.x ?)

Paul Brebner

Open Source Technology Evangelist at Instaclustr by NetApp

3 months ago

nice graphs and results, thanks Paul (PS Have you considered submitting a talk to the Performance Engineering track at Community over Code? This would be a good fit)

Sarma Pydipally

Experienced Oracle & Cassandra Data Engineer and Database Administrator

3 months ago

Amazing study. Thank you for publishing detailed report. I wish you tried "easy-cass-stress" tool instead of cassandra-stress. https://github.com/rustyrazorblade/easy-cass-stress

Congratulations on a great work and thank you for your contributions to this field.

Jiri Steuer

Architect, Data/App, MLOps+/AI/ML

3 months ago

#mlops #featurestore #vectordb
