ScyllaDB: The real-time big data database now available on IBM Power Systems

Chuck Calio

Retired IBM Z Banking Cloud Business Development Manager

Published: June 6, 2018

I recently had the pleasure of co-presenting with Bob Dever, ScyllaDB VP Marketing, and Eyal Gutkind, ScyllaDB Solutions Architect, to a large group of global IBM Cognitive Systems architects on the topic "ScyllaDB: The real-time big data database now available on IBM Power Systems". Key points of our talk included:

Typical NoSQL pain points – The usual way to add scale is to add more nodes. But as the number of nodes grows, so do the cost, complexity, and administration of the system, and there are more things that can fail.
Scylla Overview – ScyllaDB, the company behind the real-time big data database, was founded by the creators of the KVM hypervisor. Scylla is a drop-in replacement for Cassandra, with an efficient, optimized, close-to-the-hardware design. It provides up to 10X the performance of Cassandra. Scylla Enterprise is GA on POWER8/POWER9 systems.
Scylla benefits (Scale out and scale up) – One customer was running 120 Cassandra nodes for a write-heavy time-series workload of about 1.5 million operations/sec. By switching to Scylla, they could cut that to 24 nodes on similar HW, reducing the node count, and with it datacenter admin costs, by 80%. They were also able to replace the 24 boxes with 3 large boxes to further reduce admin costs.
Scylla benefits (Consistent performance) – Cassandra can't keep maintenance operations from interfering with the database's ability to process requests; it suffers from latency spikes during compaction and garbage collection. Scylla is much better at providing consistent performance.
Scylla benefits (Reduced complexity) – Cassandra requires JVM tuning to improve performance. Scylla has no JVM, and so no JVM tuning. When Scylla is installed, it benchmarks the HW it's on: cores, I/O, and network. Then it tunes itself to those parameters. 90% of the time you do not need to touch those parameters, even when the workload changes. If the HW changes, you re-run the benchmark and the parameters are updated.
Scylla benefits (Cassandra compatibility) – Scylla is Cassandra from an architectural standpoint, a use case standpoint, and an API standpoint. Whatever you're using to plug into Cassandra, you can take the current tool set and the current engineering, just change the IP address to point at a Scylla cluster, and it will work.
Scylla use cases – These include storing and retrieving sensor/Internet of Things data, eCommerce data, messaging, fraud detection, logs and log management, product catalogs/playlists, recommendation/personalization engines, binary large object stores, etc.
Scylla on POWER9 performance exceeds Cassandra on Intel Xeon SP – 100% write throughput: Scylla on the POWER9 LC922 delivers 5.83x that of Cassandra on Intel Xeon. 100% read throughput: Scylla on the POWER9 LC922 delivers 2.13x that of Cassandra on Intel Xeon. 80% read & 20% write throughput: Scylla on the POWER9 LC922 delivers 3.16x that of Cassandra on Intel Xeon.
Scylla Design Decisions (C++ instead of Java) – C++ code runs faster than code on the JVM because it is closer to the HW.
Scylla Design Decisions (Shard per core) – Scylla has a shard-per-core architecture. It takes all system resources and all your data and splits them based on the number of core threads. Each shard has its own RAM, network, I/O, and piece of the data, and is responsible only for itself. This gives a huge benefit in parallelizing the problem.
Scylla Design Decisions (Scylla has its own task scheduler) – Scylla recognizes that not all tasks are created equal. With Scylla, all those tasks belong to specific sets of data, so there's no chance that other tasks will be affected by that data. This is how we can get to 1,000,000 transactions per second in parallel on a single physical CPU.
Scylla Design Decisions (All things async) – We've built an async C++ framework called Seastar. Because modern HW operates asynchronously, the SW needs to as well. Scylla is designed to be completely asynchronous. There are no locks in Scylla. The data is all streamed to the CPU on a shared-nothing architecture.
Scylla Design Decisions (Unified cache) – Cassandra caching is complex. It has key caching, row caching, Linux page caching, on-heap and off-heap storage, and an enormous amount of tuning. If the data model changes, you often need to retune the caching layer. Cassandra also suffers from page faults and context switches, which inject latency into the data flow. Scylla simplifies the cache with onboard caching. Whatever RAM is given to Scylla, half of it is used for memtables and other back-end requirements, and Scylla uses the other half for caching. There's no tuning to it; it just uses the least recently used algorithm. Scylla keeps the cache entirely in user space, with no context switches, and no RAM is wasted on Linux pages. It's simply a far more functional caching mechanism. The Scylla caching system fetches information directly from the SSTables without intermediaries such as the Linux page cache.
Scylla Design Decisions (IO scheduler) – Latency increases when there are more requests than the disk can handle, because those requests become queued inside the disk itself. It is better for the DB not to send those requests to the disk in the first place and to keep them queued inside the DB. There, Scylla can tag those requests and, through prioritization, guarantee QoS among the various classes of I/O requests that the DB has to serve: commitlog writes, compaction reads and writes, CQL query reads, etc.
Scylla Design Decisions (Autonomous) – Scylla is an autonomous DB. On installation, it tunes itself for I/O, RAM, CPU, and network. There are 5 different kinds of requests (commitlog, memtable, compaction, query, repair) going to the network, disks, and CPU. Scylla schedules them through the scheduler of Seastar, its asynchronous C++ framework, and assigns them priorities. Scylla intelligently tunes itself in real time to deal with the critical requirements of the different events within the system.
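
To make the shard-per-core point more concrete, here is a minimal Python sketch of the idea: each shard owns a private slice of the data, and every key is routed to exactly one shard. This is an illustration only, not Scylla's implementation. The names `Shard`, `Node`, and `shard_for` are hypothetical, and the SHA-256 hash is a stand-in for whatever partitioning function the real database uses.

```python
import hashlib


class Shard:
    """One shard: owns a private slice of the data, shared with no other shard."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.rows = {}


def shard_for(key, num_shards):
    """Map a key to a shard index. The hash choice here is an assumption."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class Node:
    """A node with one shard per core thread (hypothetical model)."""

    def __init__(self, num_cores):
        self.shards = [Shard(i) for i in range(num_cores)]

    def write(self, key, value):
        # Only the owning shard ever touches this key, so no lock is needed.
        shard = self.shards[shard_for(key, len(self.shards))]
        shard.rows[key] = value
        return shard.shard_id

    def read(self, key):
        shard = self.shards[shard_for(key, len(self.shards))]
        return shard.rows.get(key)


node = Node(num_cores=4)
for sensor in ("sensor-1", "sensor-2", "sensor-3"):
    node.write(sensor, {"temp": 21.5})
```

Because a key always maps to the same shard, reads and writes for that key never need to coordinate with other shards, which is where the parallelism comes from.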
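
The unified cache point mentions that Scylla simply uses the least recently used algorithm. As a rough sketch of what LRU eviction means (not Scylla's actual cache code), the hypothetical `LRUCache` class below keeps a fixed number of entries and evicts the entry that has gone longest without being read or written.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._rows = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._rows:
            return None
        self._rows.move_to_end(key)  # mark as most recently used
        return self._rows[key]

    def put(self, key, value):
        self._rows[key] = value
        self._rows.move_to_end(key)
        if len(self._rows) > self.capacity:
            self._rows.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now more recently used than "b"
cache.put("c", 3)  # capacity exceeded, so "b" is evicted
```

The only knob in this sketch is the capacity, which matches the talk's point that the cache size follows from the RAM given to the database rather than from per-cache tuning.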
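
The IO scheduler point can be sketched in the same spirit: requests wait in a queue inside the database, tagged by class, and only as many as the disk can handle are sent at once. The class names (commitlog, query, compaction) come from the talk. The priority values, the `IOScheduler` class, and the `max_in_flight` parameter are assumptions made for illustration, and strict priority ordering is a simplification: a production scheduler would need some fairness mechanism so that low-priority classes are not starved.

```python
import heapq
import itertools

# Hypothetical priorities (lower number is served first); not Scylla's real values.
PRIORITY = {"commitlog": 0, "query": 1, "compaction": 2}


class IOScheduler:
    """Queues I/O requests in the DB and caps how many reach the disk at once."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight  # assumed disk concurrency limit
        self.in_flight = 0
        self._queue = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a class

    def submit(self, io_class, request):
        entry = (PRIORITY[io_class], next(self._seq), io_class, request)
        heapq.heappush(self._queue, entry)

    def dispatch(self):
        """Send queued requests to the disk until the in-flight limit is reached."""
        sent = []
        while self._queue and self.in_flight < self.max_in_flight:
            _, _, io_class, request = heapq.heappop(self._queue)
            self.in_flight += 1
            sent.append((io_class, request))
        return sent

    def complete(self):
        """Called when the disk finishes one request, freeing a slot."""
        self.in_flight -= 1


sched = IOScheduler(max_in_flight=2)
sched.submit("compaction", "merge sstable 7")
sched.submit("query", "read row 42")
sched.submit("commitlog", "append entry")
first = sched.dispatch()   # the two highest-priority requests go to the disk
sched.complete()           # one request finishes
second = sched.dispatch()  # the waiting compaction request is sent
```

Here the compaction request waits inside the scheduler rather than inside the disk, so the commitlog write and the query read are not delayed behind it.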

Dor Laor

6 years ago

Chuck, great job! Awesome to have such a great partner.

