October 2024 - The Pancake SQL pattern, APPEND for Refreshable Materialized Views, first impressions from a new user
ClickHouse
ClickHouse is an open-source, column-oriented OLAP database management system.
Welcome to the October ClickHouse newsletter, which will round up what’s happened in real-time data warehouses over the last month.
This month, we have the impressions and challenges of ClickHouse from a first-time user, the APPEND clause for Refreshable Materialized Views, the pancake SQL pattern, and more!
Featured community member
This month's featured community member is Duc-Canh Le , a Software Engineer at Ahrefs.
Duc-Canh works on data infrastructure at Ahrefs and is responsible for developing and operating ClickHouse on over 600 machines that hold 100 PB of compressed data.
He is a regular contributor to the ClickHouse code base and has made 28 contributions in the calendar year. These include supporting OPTIMIZE on join tables to reduce their memory footprint, fixing a bug when using an empty tuple on the left-hand side of the IN function, and a fix for the FINAL clause when run on tables that don’t use adaptive granularity.?
ClickHouse for Embedded Analytics: First Impressions and Unexpected Challenges
Jorin Vogel recently started using ClickHouse for an embedded analytics project and shared his first thoughts. He also described things he wished he’d known before starting, including how materialized views work and working with duplicate data. This is a good read if you’re just starting your ClickHouse journey.
Using ClickHouse for High-Volume Data Pipeline Processing and Asynchronous Updates
Marais Kruger works at Evinced (a company focused on accessibility compliance for enterprise clients) and has written a blog post about this experience building a ClickHouse-based data pipeline.
Marais explains how they designed their pipeline to handle a large volume of incoming data while also handling infrequent updates to that data. He also describes how they made writes idempotent using ClickHouse’s duplicate block detection and a setting used to ensure similar behavior with dependent materialized views.
This one is a good read for the ClickHouse enthusiast or anyone curious about how to design data pipelines at scale.
24.9 release
The 24.9 release introduced the APPEND clause for working with refreshable materialized views. When configured, the materialized view’s query will append results to the end of the destination table rather than replacing everything. This is useful if you want to capture snapshots of data from other tables or poll data from an external API and store it in ClickHouse.
This release also made response headers available when using the url table function, automatic inference of the Variant data type, and aggregate functions to query the new JSON data type.
The pancake SQL pattern
Jacek Migda? had a tricky problem: One of the Quesma dashboards was sending up to 10 queries to populate a single panel, putting the ClickHouse database under pressure.?
Jacek was trying to solve this problem and had a lightbulb moment while feeding his toddler pancakes: Could the dashboard queries be redesigned to look more like pancakes?
Rather than spawning multiple queries, they put everything into one query. The aggregations would be stacked on each other, like a pancake, where each layer is a grouping with a limit, and between layers, they have metric aggregations—our pancake “fillings.”
It worked, and they’re seeing a 50x increase in performance.
ClickHouse Cloud Live Update: September 2024
We had a special guest, Dunith Danushka from Redpanda Data , join us for our latest ClickHouse Cloud update call. Dunith and Mark Needham showed how to use the combination of Redpanda Serverless, ClickHouse Cloud, and OpenAI to power a sports commentary Copilot application.
We also had updates on some upcoming features in ClickHouse Cloud, including Bring Your Own Cloud, Compute-Compute separation, and the JSON data type.
Quick reads
Post of the month
Our favorite post this month was by Carl Lindesv?rd about ClickHouse’s compression rate, a somewhat underrated feature!
Upcoming events
Global events
Free training
Events in AMER
Events in EMEA
Events in Asia Pacific
And how could we forget Jacek Migda? - we loved the blog about the pancake pattern. And Dunith Danushka - thanks for presenting at last month's ClickHouse Cloud Updates call!