March 2024 - chDB joins ClickHouse, Rill Data connector, 1 Trillion Row Challenge
ClickHouse
ClickHouse is an open-source, column-oriented OLAP database management system.
Welcome to the March ClickHouse newsletter, where we round up what’s been happening in real-time data warehouses over the last month.
This month, we have the 24.2 release with useful features for data ingestion, Rill dashboards for ClickHouse, and 10x faster materialized views using aggregation states.
Featured community member
This month's featured community member is Steve Flitcroft, VP of R&D at iVendi.
Steve is perhaps better known as redsquare on the ClickHouse Community Slack, where he has helped a lot of users solve problems that they’ve encountered when using ClickHouse.
Whether it’s questions about refreshable materialized views, how to speed up a query, or understanding ClickHouse’s table engines, Steve has got you covered!
24.2 release
The 24.2 release added some useful features for data ingestion. Adaptive asynchronous inserts make data batching smarter and more efficient. ClickHouse is also better at detecting file formats, even when the file extension is missing or wrong. We’ve also vectorized distance functions, speeding up vector search in RAG applications.
Rill dashboards for ClickHouse
Rill Data is a Business Intelligence tool that lets you build fast operational dashboards with sub-second performance. After bumping into Alexey Milovidov, ClickHouse’s Co-founder and CTO, at FOSDEM, the Rill team added a ClickHouse connector this month. In a blog post, Nishant B. explains how the connector works and gives step-by-step instructions to get your first Rill/ClickHouse dashboard up and running.
The One Trillion Row Challenge
At the start of February, Dask launched the 1 trillion row challenge, which requires entrants to query 1 trillion rows of data stored across 100,000 Parquet files in S3. Dale McDiarmid, our resident challenge expert, set to work and got the query running in under 3 minutes for $0.56 using AWS spot instances. In the blog post, Dale explains how he optimized query performance, including how he detected bottlenecks and worked out the best size of AWS machine to use.
10x Faster Materialized Views with Aggregation States
Sayed Alesawy has written a blog post that walks through various techniques for improving the performance of queries on observability data. An initial query on 26 million rows takes 693 seconds to run, which a materialized view reduces to 11 seconds. The use case needs sub-second response times, though, and these are achieved by storing aggregation states instead of scalar values.
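To give a feel for why aggregation states help, here is a minimal sketch in plain Python. It is not the code from Sayed's post and it does not use ClickHouse: the AvgState class, the sample rows, and the helper names are all made up for illustration. The idea it shows is that a pre-aggregated table can store a small mergeable state (here, a sum and a count) per group rather than a finished number, so later queries can combine the states at a coarser granularity and still get the exact answer. In ClickHouse, this role is played by the -State and -Merge aggregate function combinators, such as avgState and avgMerge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvgState:
    """Hypothetical partial state for an average: a running sum and a row count."""
    total: float = 0.0
    count: int = 0

    def add(self, value: float) -> "AvgState":
        # Fold one raw value into the state.
        return AvgState(self.total + value, self.count + 1)

    def merge(self, other: "AvgState") -> "AvgState":
        # Combine two states without revisiting the raw rows.
        return AvgState(self.total + other.total, self.count + other.count)

    def finalize(self) -> float:
        # Turn the state into the final scalar only at query time.
        return self.total / self.count if self.count else float("nan")


def build_states(rows):
    """Pre-aggregate (service, minute, latency_ms) rows into one state per (service, minute)."""
    states = {}
    for service, minute, latency in rows:
        key = (service, minute)
        states[key] = states.get(key, AvgState()).add(latency)
    return states


def average_per_service(states):
    """Answer a coarser query by merging the stored per-minute states."""
    merged = {}
    for (service, _minute), state in states.items():
        merged[service] = merged.get(service, AvgState()).merge(state)
    return {service: state.finalize() for service, state in merged.items()}


# Illustrative data only: (service, minute, latency in ms).
rows = [
    ("api", 0, 10.0),
    ("api", 0, 20.0),
    ("api", 0, 30.0),
    ("api", 1, 100.0),
    ("db", 0, 5.0),
]

states = build_states(rows)
result = average_per_service(states)
print(result)
```

Had the pre-aggregated table stored the per-minute averages for "api" as scalars (20 and 100), averaging them would give 60, whereas merging the states gives the true average of 40 over all four rows. That is the trade-off in a nutshell: states cost a little more storage per row, but they stay correct when rolled up further.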
chDB joins the ClickHouse family
The biggest news of the month is that chDB, an embedded SQL OLAP engine powered by ClickHouse, is now part of ClickHouse. chDB’s creator and main contributor, 王鹏程, is joining forces with us to focus on evolving chDB and integrating it even more closely with the ClickHouse ecosystem. We’d love to know what you’d like us to work on next, so please tell us via the chDB GitHub discussion board.
Post of the month
My favorite tweet this month was by Michael E. Driscoll (Founder of Rill Data) about chDB joining ClickHouse. See it here
Upcoming events