Timeplus as a great embodiment of "Turning the database inside out"

Sarwar Bhuiyan

Technical Architect | Software Engineering, Data and Stream processing, Cloud architectures | Product Management | Technology Consulting | Field Advisory

Published: June 7, 2024

I rewatched Martin Kleppmann's "Turning the database inside out with Apache Samza" talk, and my thesis is that Timeplus is a great implementation of that paradigm, making it simpler and more accessible than the other approaches tried over the past decade. Thank you, Martin, for teaching us this way.

It's been 10 years since that talk, and people have tried to do this with all sorts of combinations of distributed streaming platforms and stream processors in conjunction with traditional databases or data warehouse systems. It's not trivial to learn how to do this, deploy it all, and still reason about it in a simple way. I still trip over a lot of it.

But let's start with how Timeplus does it and see whether you agree with me or not:

Streams Everywhere

In Timeplus, the fundamental abstraction used everywhere is a Stream. You create a stream (natively) and start inserting data into it. Like Kafka, it's a log of ordered or partially ordered facts. There are different types of streams (AppendOnly, Changelog KV, VersionedKV, and an upcoming "Mutable KV Stream"). Inside every stream is a Write-Ahead-Log (WAL) with configurable retention (sort of like a mini-Kafka) and a historical store that is kept up to date asynchronously. More on why later.
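To make the two-part layout concrete, here is a minimal Python sketch of the idea, not Timeplus internals. The `Stream` class, its `wal_retention` parameter, and the count-based retention are all assumptions I made up for illustration, and the historical store is updated synchronously here just to keep the toy short.

```python
from collections import deque


class Stream:
    """Toy append-only stream: a bounded recent log plus a full historical store."""

    def __init__(self, wal_retention):
        # Bounded log of recent events, standing in for the WAL with retention.
        self.wal = deque(maxlen=wal_retention)
        # Full history. In this sketch it is written synchronously for brevity.
        self.historical = []
        self.next_offset = 0

    def insert(self, event):
        # Every event gets a monotonically increasing offset, like a log position.
        offset = self.next_offset
        self.next_offset += 1
        self.wal.append((offset, event))
        self.historical.append((offset, event))
        return offset


clicks = Stream(wal_retention=2)
for page in ["/home", "/pricing", "/docs"]:
    clicks.insert({"page": page})

recent_offsets = [offset for offset, _ in clicks.wal]
all_offsets = [offset for offset, _ in clicks.historical]
```

The recent log only holds what retention allows, while the historical side keeps everything, which is what makes the point-in-time queries later in this article possible.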

Fully precomputed caches

Timeplus adds "Views" and "Materialized Views" on top. These are not tables as in a database; they are continuously derived streams built off the back of a "normal" stream, if you will. They probably should have been called "Derived Streams" and "Materialized Derived Streams". Naming is hard.

You can subscribe to and consume these streams in a streaming way, just the same, using the provided APIs (HTTP, WebSocket, native SDKs). While consuming these streams, you can run an ad-hoc SQL query too and get the stream of changes that meet the criteria.
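Here is a rough Python sketch of what "a materialized view is a derived stream" means in practice. `MaterializedCount` and its methods are hypothetical names of mine, not a Timeplus API; the point is only that the view's state is updated per event and every update is itself an event that subscribers receive.

```python
class MaterializedCount:
    """Toy materialized view: a per-key count kept up to date on every event."""

    def __init__(self):
        self.state = {}
        self.subscribers = []

    def subscribe(self, callback):
        # Subscribers receive each change to the view, not the raw input events.
        self.subscribers.append(callback)

    def on_event(self, event):
        key = event["page"]
        self.state[key] = self.state.get(key, 0) + 1
        change = (key, self.state[key])
        for callback in self.subscribers:
            callback(change)


view = MaterializedCount()
changes = []
view.subscribe(changes.append)
for page in ["/home", "/docs", "/home"]:
    view.on_event({"page": page})
```

After the three events, `changes` holds the stream of updates and `view.state` holds the current result, so the same object serves both a subscriber and a lookup.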

Clients Subscribe to MV Changes

We really do mean Streams Everywhere! You can stream directly from any of the above streams to your UI via client SDKs, from Java to Go to Python to HTTP/WebSocket.

Caches/Tables

So where are the caches and tables that you can query request-response style to get a single set of results?

Timeplus does not make you create another Table off the back of a stream. It gives you a table(stream_name) function, and if you run the same ad-hoc SQL through it, it will just give you the results as of that point in time. We call it a "historical query". This may be for ad-hoc exploration by an analyst, or for an application that just wants to retrieve some cached data from a Materialized View. It serves both humans and software! Remember that historical store paired with every WAL? Well, when you do a table() query, it'll just run on that and return results at lightning speed.
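The difference between the two query styles can be sketched in a few lines of Python. This is only an analogy: `historical_query` and `streaming_query` are hypothetical helpers I wrote for illustration, not the table() function itself. The same predicate is evaluated once over stored data in one case and continuously over arriving data in the other.

```python
def historical_query(history, predicate):
    """Request-response: evaluate once over what is stored now, then return."""
    return [event for event in history if predicate(event)]


def streaming_query(source, predicate):
    """Continuous: yield each matching event as it arrives."""
    for event in source:
        if predicate(event):
            yield event


def is_large(event):
    return event["amount"] > 100


stored = [{"amount": 50}, {"amount": 150}]
arriving = [{"amount": 300}, {"amount": 20}]

snapshot = historical_query(stored, is_large)
live = list(streaming_query(iter(arriving), is_large))
```

The filter logic is written once; only the way the data is fed to it changes, which is the spirit of running the same ad-hoc SQL in either mode.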

Better Data

I'm paraphrasing Martin here but you get all the following benefits:

Doing it this way decouples writing from reading. It's good for analytics, as you can run certain kinds of queries on a materialized view optimized for that use case while leaving the originally written stream in place. You can Write once, Read from Many different Views. Views are just computations, so they take up no space. Materialized Views actually realize the computation, so they take up space but can be more performant. You can do historical point-in-time queries (including some really advanced analytics like time travel with AS OF Joins).
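For readers who haven't met an AS OF join, here is a small Python sketch of the semantics as I understand them in general: each left-hand row is paired with the latest right-hand row whose timestamp is not later than its own. The `asof_join` function and the sample data are mine, and this says nothing about how Timeplus implements it.

```python
from bisect import bisect_right


def asof_join(left, right):
    """Pair each (ts, value) in left with the latest right value where right_ts <= ts.

    Assumes right is sorted by timestamp.
    """
    right_times = [ts for ts, _ in right]
    joined = []
    for ts, value in left:
        # Index of the first right row strictly after ts.
        i = bisect_right(right_times, ts)
        match = right[i - 1][1] if i > 0 else None
        joined.append((ts, value, match))
    return joined


prices = [(1, 100.0), (5, 101.5), (9, 99.0)]
orders = [(0, "o0"), (5, "o1"), (7, "o2")]
result = asof_join(orders, prices)
```

The order at time 0 has no price yet, so it gets None, while the orders at times 5 and 7 both see the price that was current at time 5.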

What about all those things Martin talked about that databases do that we'd like?

Replication

Timeplus can run as a single binary with all the above functionality built in (a file of less than 300MB). Timeplus Enterprise is a distributed system with the usual sharding and replication, using Multi-Raft as the underlying consensus mechanism. Because everything is a stream, it's easy to replicate streams and their derivatives.

Secondary Indexes

This is sort of already there if you create Materialized Views with a different primary key. You can query by ANY field, though it might require some scanning. It is generally still very fast.
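As a toy illustration of "a view with a different primary key", the Python below folds the same events into two lookups keyed by different fields. `build_view` is a hypothetical helper of mine with last-write-wins semantics, chosen only to keep the sketch short.

```python
def build_view(events, key_field):
    """Latest event per value of key_field (last write wins)."""
    view = {}
    for event in events:
        view[event[key_field]] = event
    return view


events = [
    {"order_id": 1, "customer": "ann", "total": 10},
    {"order_id": 2, "customer": "bob", "total": 25},
    {"order_id": 3, "customer": "ann", "total": 40},
]

by_order = build_view(events, "order_id")
by_customer = build_view(events, "customer")
```

Both views come from the same written stream; the second one simply answers "what is the latest order for this customer?" without scanning.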

But we went even further and will soon release a new kind of KV Stream with column families and more advanced secondary indices. You heard it here first. :)

Caching

Caching as implemented by the continuously updated Materialized Views above is a great advantage over application-run caches that have to be kept up to date and invalidated consistently. If you do a cold start by creating a materialized view, you can just tell it to read from the historical stream and be fully up to date. Or not, your choice. Either way, Timeplus takes care of keeping everything up to date so you can get on with your life.
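The cold-start choice can be sketched like this in Python. `running_totals` and its `backfill` flag are names I invented for the example; the only idea being shown is that a view can either replay stored history before following new events or start from new events alone.

```python
from itertools import chain


def running_totals(history, live, backfill=True):
    """Fold events into per-key totals, optionally replaying history first."""
    events = chain(history, live) if backfill else live
    totals = {}
    for key, amount in events:
        totals[key] = totals.get(key, 0) + amount
    return totals


history = [("ann", 10), ("bob", 5)]
live = [("ann", 3)]

with_backfill = running_totals(history, live, backfill=True)
without_backfill = running_totals(history, live, backfill=False)
```

With backfill the cache is complete from the moment it exists; without it, the cache only reflects what happened after it was created.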

Bonus thing here:

You'd like to run some arbitrary business logic on your data in your views and materialized views? Well, Timeplus has SQL functions and an embedded V8 engine to run User Defined Functions (UDFs) and User Defined Aggregation Functions (UDAFs) written in JavaScript. Native Python support is coming soon too, for all of you wanting to reuse your Python libraries.
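If you haven't written a user-defined aggregate before, the general shape is "keep some state, update it per row, produce a result on demand". Below is a Python sketch of that lifecycle. The class name and the `process`/`finalize` method names are my own assumptions for illustration and are not the Timeplus UDAF interface, which is written in JavaScript.

```python
class SecondMax:
    """Toy user-defined aggregate: second-largest distinct value seen so far."""

    def __init__(self):
        self.top = None
        self.second = None

    def process(self, value):
        # A new maximum pushes the old maximum down to second place.
        if self.top is None or value > self.top:
            self.top, self.second = value, self.top
        # Otherwise the value may still beat the current second place.
        elif value != self.top and (self.second is None or value > self.second):
            self.second = value

    def finalize(self):
        return self.second


agg = SecondMax()
for v in [3, 9, 4, 9, 7]:
    agg.process(v)
second = agg.finalize()
```

Because the state is tiny and updated incrementally, an aggregate like this fits naturally into a continuously updated view.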

Conclusion

So that was a lot to take in, but if you've been trying to solve these problems in a disaggregated way with #Kafka, #ksqlDB, #Flink, #Spark, and whatever other components, you'll realize how quickly the architecture gets "big". If you happen to have Kafka, we can treat Kafka topics as "External Streams": leave your data there but still query it and build views and materialized views just the same. For materialized views outside Timeplus, we support #ClickHouse and Kafka as sinks, so you can ship your data out into your existing pipelines if that is what you choose.

For everyone else, Timeplus lets you start small and build out to the bigger integrations from there. Start up a single binary and spin up Streams, Views, and Materialized Views with just SQL. I probably should have said that at the top of the article. Hook up your web apps or your BI tools just like that.

Simple is beautiful (at least for users). If you don't believe me, check it out. I'm here to answer questions.
