July 2024 - Optimal table sorting, Optimizing CPU usage, Import patterns, Tracking vessels
ClickHouse
ClickHouse is an open-source, column-oriented OLAP database management system.
Welcome to the July ClickHouse newsletter, which will round up what’s happened in real-time data warehouses over the last month.
This month, we have optimal table sorting in the 24.6 release, tracking vessels with ClickHouse & Grafana, and tactics for optimizing CPU usage when running ClickHouse.
Featured community member
This month's featured community member is taiyang-li (李扬)
taiyang-li is a frequent contributor to the ClickHouse database, regularly contributing pull requests that improve ClickHouse’s performance and string processing capabilities. In just the last few months, he’s committed code that let the -UTF8 functions handle strings containing only ASCII characters, fixed concat to accept empty arguments, and improved the compatibility of the upper/lowerUTF8 functions. And if you’ve noticed that the splitByRegexp, coalesce, or ifNotNull functions are quicker, you can also thank taiyang-li for that!
24.6 release
The latest release of ClickHouse introduced optimal table sorting. When this setting is enabled at table creation, ClickHouse first sorts ingested data by the ORDER BY key and then automatically reorders the rows within that ordering to improve compression. We also had a beta release of chDB that lets you query Pandas DataFrames directly, and functions for Hilbert curves were added.
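To get a feel for why reordering rows can help compression, here is a small Python sketch. It is not ClickHouse's implementation. It assumes a simple heuristic: within each run of rows that share the same ORDER BY key, sort the remaining columns in ascending order of distinct values. The function names, the toy columns, and the use of zlib as a stand-in codec are all illustrative assumptions. We believe the corresponding MergeTree setting is called `optimize_row_order`, but please check the release notes before relying on that name.

```python
import random
import zlib
from itertools import groupby


def optimize_row_order(rows, key_len):
    """Sort by the first key_len columns, then reorder each equal-key range.

    Inside a range, the remaining columns are sorted with the lowest-cardinality
    column first, so repeated values end up next to each other (assumed heuristic).
    """
    rows = sorted(rows, key=lambda r: r[:key_len])
    out = []
    for _, group in groupby(rows, key=lambda r: r[:key_len]):
        group = list(group)
        rest = range(key_len, len(group[0]))
        # Fewer distinct values first: those columns form the longest runs.
        order = sorted(rest, key=lambda c: len({r[c] for r in group}))
        group.sort(key=lambda r: tuple(r[c] for c in order))
        out.extend(group)
    return out


def compressed_size(rows):
    """Compress each column separately, roughly as a column store would."""
    total = 0
    for column in zip(*rows):
        data = "\n".join(map(str, column)).encode()
        total += len(zlib.compress(data))
    return total


# Toy data (hypothetical): (day, user_id, status), with day as the ORDER BY key.
random.seed(0)
rows = [
    (random.randrange(3), random.randrange(10_000), random.choice(["ok", "error", "timeout"]))
    for _ in range(5000)
]

baseline = sorted(rows, key=lambda r: r[0])   # sorted by the key only
optimized = optimize_row_order(rows, key_len=1)

print("key-only sort:", compressed_size(baseline), "bytes")
print("reordered    :", compressed_size(optimized), "bytes")
```

On this toy data the reordered rows compress to fewer bytes, because the low-cardinality `status` column collapses into a handful of long runs. Real tables and real codecs will behave differently, so treat this as intuition rather than a benchmark.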
How to track vessels with Python, ClickHouse, and Grafana
Ignacio Van Droogenbroeck has written a cool blog post on tracking vessels in San Francisco and Buenos Aires. He shows how to get the data from AisStream’s WebSockets API into ClickHouse and then creates a series of visualizations using Grafana.
ClickHouse MergeTree Engine
Tôi là Duyệt has started writing blog posts about using ClickHouse in Kubernetes. A recent post explores the default MergeTree table engine. It explains what happens when data is ingested into a table using this engine, then goes through how to use it, including inserting data, supported data types, and column modifiers.
Optimizing ClickHouse: Tactics that worked for highlight.io
highlight.io is an open-source, full-stack monitoring platform. It ingests 100 TB of observability data per month, much of which goes into ClickHouse. CTO Vadim Korolik has written a blog post sharing their lessons on optimizing ClickHouse to reduce CPU load.
ClickHouse Cloud updates: July 2024
Did you know that we publish a ClickHouse Cloud Changelog every fortnight? In the latest version, we announced the availability of ClickHouse Cloud on Microsoft Azure and a new Query Logs Insights UI to make it easier to debug your queries. The Prometheus endpoint for metrics is also in Private Preview.
Video corner: Import patterns
Mark Needham has recorded several videos demonstrating import patterns with ClickHouse.
Post of the month
Our favorite post this month was by anhtho, who’s using ClickHouse to analyze billing data.
Upcoming events