Use External Schema Registry with MSK Connect – Part 1 Local Development

When we discussed a Change Data Capture (CDC) solution in one of the earlier posts, we used the JSON converter that comes with Kafka Connect. We optionally enabled the key and value schemas, so each topic message included those schemas together with its payload. This seemed convenient at first because the messages saved to S3 were self-contained. However, it became cumbersome when we tried to use the DeltaStreamer utility. Specifically, the utility requires the schema of the files, but unfortunately we cannot use the schema generated by the default JSON converter: it returns the struct type, which is not supported by the Hudi utility. To handle this issue, we created a schema of the record type using the Confluent Avro converter, saved it to S3 and referenced it from there. However, as we aimed to manage a long-running process, generating a schema manually was not an optimal solution because, for example, it cannot handle schema evolution effectively. In this post, we'll discuss an improved architecture that makes use of a schema registry that resides outside of the Kafka cluster and allows producers and consumers to reference schemas externally.
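In converter terms, the change amounts to swapping the connector's converter settings. Below is a minimal sketch of the relevant properties, assuming the Confluent Avro converter; the registry URL and port are placeholders for wherever the registry actually runs, not the exact values used in this series.

  # Previous setup: the built-in JSON converter embeds the key/value schemas in every message
  value.converter=org.apache.kafka.connect.json.JsonConverter
  value.converter.schemas.enable=true

  # This series: the Avro converter registers the schema in an external registry, and each
  # message carries only a compact schema ID alongside the Avro-serialized payload
  value.converter=io.confluent.connect.avro.AvroConverter
  value.converter.schema.registry.url=http://localhost:8080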

Architecture

The diagram below shows an updated CDC architecture with a schema registry. The Debezium connector talks to the schema registry first and checks whether the schema is available. If it doesn't exist, the schema is registered and cached in the registry. The producer then serializes the data with the schema and sends it to the topic together with the schema ID. When the sink connector consumes a message, it reads the schema by that ID and deserializes the payload. The schema registry uses a PostgreSQL database as an artifact store where multiple versions of schemas are kept. In this post, we'll build it locally using Docker Compose.

[Architecture diagram: CDC pipeline with an external schema registry backed by PostgreSQL]
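As a starting point, the local setup can be sketched in Docker Compose roughly as follows. This sketch assumes Apicurio Registry, a registry whose SQL storage variant keeps its artifacts in PostgreSQL; the image tag, database name and credentials are illustrative placeholders rather than the exact values used in this series.

  version: "3.5"
  services:
    postgres:
      image: postgres:13
      environment:
        # illustrative credentials; keep them in sync with the registry datasource below
        POSTGRES_USER: registry
        POSTGRES_PASSWORD: password
        POSTGRES_DB: registry
    registry:
      image: apicurio/apicurio-registry-sql:2.1.5.Final
      depends_on:
        - postgres
      ports:
        - "8080:8080"  # REST API that the connectors' converters will call
      environment:
        REGISTRY_DATASOURCE_URL: jdbc:postgresql://postgres:5432/registry
        REGISTRY_DATASOURCE_USERNAME: registry
        REGISTRY_DATASOURCE_PASSWORD: password

Keeping the artifact store in PostgreSQL means schema versions survive registry restarts, which is what makes schema evolution manageable for a long-running process.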

