SQL for Kafka for the humble Platform Ops folk
Sarwar Bhuiyan
Technical Architect | Software Engineering, Data and Stream processing, Cloud architectures | Product Management | Technology Consulting | Field Advisory
I've worked with a lot of ops teams managing Kafka (as well as other infra), and they tended to become the go-to people for all sorts of questions they had very poor tooling to answer. That gave birth to Shadow IT: a bunch of open source or free tools cobbled together from GitHub or the wider internet. Useful tools, but you know... the kind that wouldn't go through procurement, security scans or anything like that. At some point an audit would happen, and there'd be an org-wide announcement that XYZ tool cannot be used anymore.
One such tool was KafkaTool (https://www.kafkatool.com/features.html). It's what the poor Ops guy reaches for to debug a question from a dev team they support: search a topic, get some topic counts, download some data to a file for debugging, check offsets, etc.
So one day one of my customers asked me why our product didn't have topic search and topic count. I winced, because I knew that meant a tool able to scan a topic, deserialize the data, search it and hand back the results. Our UI console had no such facility (it could only search, in JavaScript, over the latest results already on screen). I suggested #ksqlDB, but that meant setting it up and creating streams/tables with their own backing topics in Kafka, which would add to the burden when all they wanted was to run some ad-hoc queries.
"I just want to search a topic, why is that so hard?" or "I just want to count a topic, is that so hard?"
Yeah, it is hard but it's our responsibility to make it easy.
#timeplus #proton can be that data explorer (via SQL) for Kafka: just a low-footprint binary and a few queries, with no storage required, thanks to its "External Stream" for Kafka.
Connect to a Kafka topic so you can query it:
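A minimal sketch of what that can look like in Proton SQL. The stream name `frontend_events`, the broker address and the topic name are all hypothetical placeholders; the single `raw` column simply holds each message as an unparsed string.

```sql
-- Hypothetical names: swap in your own broker and topic.
-- An external stream only points at the topic; no data is copied into Proton.
CREATE EXTERNAL STREAM frontend_events (
    raw string  -- the whole Kafka message value as one string
)
SETTINGS type = 'kafka',
         brokers = 'localhost:9092',
         topic = 'frontend-events';
```

Dropping the external stream afterwards (`DROP STREAM frontend_events;`) removes only the definition and leaves the topic untouched.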
Let's check what the raw data looks like by getting 1 item:
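For instance, assuming an external stream called `frontend_events` (a hypothetical name) with a single `raw` string column, something along these lines should return one message and then stop:

```sql
-- Read from the beginning of the topic and stop after the first message.
-- Without seek_to, the query would wait for the next new message instead.
SELECT raw
FROM frontend_events
LIMIT 1
SETTINGS seek_to = 'earliest';
```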
Let's do a fuzzy search, since I don't know the exact URL:
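A sketch of such a search, again assuming a hypothetical external stream `frontend_events` with a `raw` string column; the fragment `checkout` is only a made-up example of a partial URL:

```sql
-- Scan the topic from the start and keep messages whose payload
-- contains the fragment anywhere. This is a streaming query, so
-- cancel it once you have seen enough matches.
SELECT raw
FROM frontend_events
WHERE raw LIKE '%checkout%'
SETTINGS seek_to = 'earliest';
```

Because the match runs against the raw payload, it needs no knowledge of the message schema, which suits the "I just want to search a topic" case.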
Can I count things, and maybe group by something to narrow down what to look at?
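Roughly, yes. The sketch below assumes a hypothetical external stream `frontend_events` with a `raw` string column holding JSON, and a made-up JSON field called `method`. Whether `table()` works on a Kafka external stream depends on the Proton version, so treat the first query as an assumption to verify.

```sql
-- One-off count of the messages currently in the topic
-- (assumes a Proton version that supports table() on Kafka external streams).
SELECT count() FROM table(frontend_events);

-- Running count per value of a JSON field, read from the start of the topic.
-- raw:method is Proton's shortcut for pulling a field out of a JSON string;
-- the field name itself is hypothetical.
SELECT raw:method AS method, count() AS messages
FROM frontend_events
GROUP BY method
SETTINGS seek_to = 'earliest';
```

The second query is a streaming aggregation, so it keeps emitting updated counts until it is cancelled.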
None of these queries survive the session, and none require heavy setup or teardown. They may, however, spark conversations with data engineers or software teams about building proper metrics or dashboards to monitor their data. They could even lead to new data products for exchanging with other teams or companies.
Interested in a tool like this? Would love to talk.