March 2024 - chDB joins ClickHouse, Rill Data connector, 1 Trillion Row Challenge

Welcome to the March ClickHouse newsletter where we round up what’s been happening in real-time data warehouses in the last month.

This month, we have the 24.2 release with useful features for data ingestion, Rill dashboards for ClickHouse, and 10x faster materialized views using aggregation states.

Featured community member

This month's featured community member is Steve Flitcroft, VP of R&D at iVendi.

Steve is perhaps better known as redsquare on the ClickHouse Community Slack, where he has helped a lot of users solve problems that they’ve encountered when using ClickHouse.

Whether it’s questions about refreshable materialized views, how to speed up a query, or understanding ClickHouse’s table engines, Steve has got you covered!

Follow Steve on LinkedIn

24.2 release

The 24.2 release added some useful features for data ingestion. Adaptive asynchronous inserts make data batching smarter and more efficient. Plus, ClickHouse is now better at detecting file formats, even if the file extension is missing or wrong. We’ve also vectorized distance functions, speeding up vector search in RAG applications.

Read the release post

Rill dashboards for ClickHouse

Rill Data is a Business Intelligence tool that lets you build fast operational dashboards with sub-second performance. After the Rill team bumped into Alexey Milovidov, ClickHouse’s Co-founder and CTO, at FOSDEM, they added a ClickHouse connector this month. In a blog post, Nishant B. explains how the connector works and gives step-by-step instructions to get your first Rill/ClickHouse dashboard up and running.

Read the blog post

The One Trillion Row Challenge

At the start of February, Dask launched the 1 trillion row challenge, which requires entrants to query 1 trillion rows of data stored across 100,000 Parquet files in S3. Dale McDiarmid, our resident challenge expert, set to work and got the query running in under 3 minutes for $0.56 on AWS spot instances. In the blog post, Dale explains how he optimized query performance, including bottleneck detection and working out the best size of AWS machine to use.

Read the blog post

10x Faster Materialized Views with Aggregation States

Sayed Alesawy has written a blog post in which he takes us through various techniques to improve the performance of queries on observability data. An initial query on 26 million rows takes 693 seconds to run, which a materialized view reduces to 11 seconds. The use case needs sub-second response times, though, and he gets there by storing aggregation states instead of scalar values.

Read the blog post
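
To see why an aggregation state is more useful than a scalar, here is a minimal Python sketch of the idea. It is not taken from the blog post: the `AvgState` class, the `state_of` helper, and the sample numbers are all hypothetical, and ClickHouse's own aggregation states are handled inside the database rather than in application code. The point is only that a partial state (here, a running sum and count) can be merged across blocks of rows and still give the correct average, whereas averages stored as scalars cannot be combined safely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvgState:
    """Hypothetical partial state for an average: enough to merge later."""
    total: float = 0.0
    count: int = 0

    def add(self, value: float) -> "AvgState":
        # Fold one more value into the state.
        return AvgState(self.total + value, self.count + 1)

    def merge(self, other: "AvgState") -> "AvgState":
        # Combine two partial states without rereading the raw rows.
        return AvgState(self.total + other.total, self.count + other.count)

    def finalize(self) -> float:
        # Turn the state into the final scalar only at query time.
        return self.total / self.count


def state_of(values: list[float]) -> AvgState:
    """Build a partial state from one block of rows (illustrative helper)."""
    state = AvgState()
    for value in values:
        state = state.add(value)
    return state


# Two blocks of made-up latency values, as if inserted at different times.
block_a = [100.0, 200.0, 300.0]
block_b = [1000.0]

# Merging states gives the average over all four rows.
merged = state_of(block_a).merge(state_of(block_b))
average_from_states = merged.finalize()

# Averaging the per-block averages weights each block equally, which is wrong
# when the blocks have different row counts.
average_of_averages = (
    state_of(block_a).finalize() + state_of(block_b).finalize()
) / 2

print(average_from_states, average_of_averages)
```

Because the merged state is tiny compared with the raw rows, finalizing it at query time is cheap, which is the general reason stored states can answer a query much faster than rescanning the source data.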

chDB joins the ClickHouse family

The biggest news of the month is that chDB, an embedded SQL OLAP engine powered by ClickHouse, is now part of ClickHouse. chDB’s creator and main contributor, 王鹏程, is joining forces with us to focus on evolving chDB and integrating it even more closely with the ClickHouse ecosystem. We’d love to hear what you’d like us to work on next; you can let us know via the chDB GitHub discussion board.

Read the announcement

Post of the month

My favorite tweet this month was by Michael E. Driscoll (Founder of Rill Data) about chDB joining ClickHouse. See it here

Upcoming events


