SQL for Kafka for the humble Platform Ops folk

Sarwar Bhuiyan

Technical Architect | Software Engineering, Data and Stream processing, Cloud architectures | Product Management | Technology Consulting | Field Advisory

Published: May 14, 2024

I have worked with a lot of ops teams managing Kafka (as well as other infra), and they kind of became the go-to people for all sorts of questions they had very poor tooling for. This gave birth to Shadow IT, cobbled together from a bunch of open source or free tools from GitHub or the wider internet. Useful tools, but you know... the kind that wouldn't go through procurement or security scanning or anything like that. At some point an audit would happen, and there'd be an org-wide announcement that XYZ tool could not be used anymore.

One such tool was KafkaTool (https://www.kafkatool.com/features.html). It's what the poor Ops person reaches for to debug a question from a dev team they support: search a topic, get some topic counts, download some data to a file for debugging, check offsets, etc.

So one day one of my customers asked me why our product didn't have topic search and topic count. I winced, because I knew that meant a tool that could scan a topic, deserialize the data, search it, and hand back the matches. Our UI console did not have that facility (it could only search, in JavaScript, the latest results already showing on screen). I suggested #ksqlDB, but that meant setting it up and creating streams/tables, each with backing topics in Kafka, which would add to the burden when they just wanted to run some ad-hoc queries.
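To make the wince concrete, here is a minimal sketch of what "topic search" and "topic count" involve: every record has to be read, deserialized, and tested. It uses a plain Python list of JSON-encoded bytes as a stand-in for a topic. The sample records, the field names, and the `search_topic` helper are all hypothetical, and a real tool would also need a Kafka consumer, offset handling, and support for other serialization formats.

```python
import json
from typing import Callable, Iterable, Iterator

# Stand-in for a Kafka topic: each message value arrives as raw bytes.
# These records and their field names are invented for illustration.
TOPIC = [
    b'{"user": "alice", "url": "/products/42", "status": 200}',
    b'{"user": "bob", "url": "/checkout", "status": 500}',
    b'not-json-at-all',
    b'{"user": "alice", "url": "/checkout", "status": 200}',
]


def search_topic(messages: Iterable[bytes],
                 predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Scan every message, deserialize it, and yield the ones that match."""
    for raw in messages:
        try:
            record = json.loads(raw)
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip messages that are not valid JSON
        if isinstance(record, dict) and predicate(record):
            yield record


# "Topic search": which records hit /checkout?
matches = list(search_topic(TOPIC, lambda r: r.get("url") == "/checkout"))

# "Topic count": how many readable records are there at all?
total = sum(1 for _ in search_topic(TOPIC, lambda r: True))

print(matches)
print(total)
```

Even in this toy form, both questions need a full pass over the data, which is why a console that only filters what is already on screen cannot answer them.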

"I just want to search a topic, why is that so hard?" or "I just want a count of a topic, is that so hard?"

Yeah, it is hard but it's our responsibility to make it easy.

#timeplus #proton can be that data explorer (via SQL) for Kafka. Just a low-footprint binary and a few queries, with no storage required, via an "External Stream" for Kafka.

Connect to a Kafka topic so you can query it:

Let's check what the raw data looks like by getting 1 item:

Let's do a fuzzy search, since I don't know the exact URL:
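A fuzzy search of this kind boils down to a partial match on one field. The Python sketch below illustrates the idea, reading "fuzzy" as a case-insensitive substring match in the spirit of a SQL wildcard pattern (that reading is an assumption). The records, the `url` field, and the `url_contains` helper are made up for illustration and do not come from a real topic.

```python
# Invented sample of already-deserialized records; not data from a real topic.
records = [
    {"url": "https://shop.example/Products/Blue-Widget?id=7", "status": 200},
    {"url": "https://shop.example/checkout", "status": 500},
    {"url": "https://shop.example/products/red-widget", "status": 404},
]


def url_contains(record: dict, fragment: str) -> bool:
    """Case-insensitive substring match on the record's url field."""
    return fragment.lower() in str(record.get("url", "")).lower()


# All I remember is that the URL had "widget" in it somewhere.
hits = [r["url"] for r in records if url_contains(r, "widget")]
print(hits)
```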

Can I count things, and maybe group by something to narrow it down?
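In plain terms, a grouped count buckets the records by one or more fields and tallies each bucket. Here is a small Python sketch of that idea, roughly what a `GROUP BY` with a count does in SQL. The records and the `url` and `status` fields are hypothetical sample data, not taken from a real topic.

```python
from collections import Counter

# Invented sample of already-deserialized records.
records = [
    {"url": "/checkout", "status": 500},
    {"url": "/products/42", "status": 200},
    {"url": "/checkout", "status": 200},
    {"url": "/checkout", "status": 500},
]

# Tally records per (url, status) pair.
counts = Counter((r["url"], r["status"]) for r in records)

# Largest groups first, so the interesting ones surface quickly.
for (url, status), n in counts.most_common():
    print(url, status, n)
```

Sorting by group size is a design choice for debugging: the bucket that stands out (here, repeated 500s on one URL) is usually the one worth a closer look.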

None of these queries survive the session, and none require heavy setup or teardown. They may, however, spark conversations with data engineers or software teams about building proper metrics or dashboards to monitor their data. They can even lead to new data products for exchanging with other teams or companies.

Interested in a tool like this? Would love to talk.
