Capture the Change, you want to see in the database.

Kaivalya Apte

The GeekNarrator Podcast | Staff Engineer | Follow me for #distributedsystems #databases #interviewing #softwareengineering

Published: March 11, 2023

Change Data Capture

As the name suggests, Change Data Capture (CDC) is about capturing changes to your data. By capturing, I mean reacting to a change and doing something else with it. Take an e-commerce application: the most common functionality is `createOrder`, but once an order is created in the database, we want to do more, such as sending a notification to the user (an email, a message, etc.) or publishing the order to a data warehouse for analytics.

How can we build this?

Solution 1

Well, this looks simple. We can implement this logic in the application, something like:

public Order createOrder(CreateOrderRequest orderRequest) {
    Order order = dao.createOrder(orderRequest);
    userNotificationService.notifyOrder(order);
    dataWarehouseClient.post(order);
    return order;
}

[Image: Create Order]

As you can see, your application logic handles the post-order-creation work as well, and it looks straightforward. Congratulations, we have just implemented MCDC (Manual Change Data Capture). Jk, this is just an acronym I came up with. But the point is, this approach has several problems:

If we want to capture changes in multiple systems, we need to change our application code.

public Order createOrder(CreateOrderRequest orderRequest) {
    Order order = dao.createOrder(orderRequest);
    userNotificationService.notifyOrder(order);
    dataWarehouseClient.post(order);

    // new use case: every additional downstream system means another change here
    newUsecaseClient.post(order);

    return order;
}

This is not great, because you need to make changes, write tests, and build and deploy the entire order processing system just to send changes to yet another downstream system.

Another problem is that we haven't really thought about failures. What happens if createOrder succeeds but the user notification fails? Or the user notification succeeds but the data warehouse is down? Do we roll back the entire operation? Do we commit partial state and asynchronously try to complete the remaining operations? Well yeah, we can do that, but it is yet more logic/code/infra/process to maintain, which can get super complicated.
This brings down the availability of the createOrder process (in case you roll back), because now you need multiple systems to be up and running just to create a simple order.
This solution isn't a general solution that can be used by other CDC use cases.

Solution 2

Another approach is to move the capture part into one component. The idea is to publish an event to a pub-sub topic; a consumer (the CDCWorker) reads the event and then does all the capturing work.

This works well, because createOrder now depends only on the pub-sub system being available, and you have decoupled the order service from the downstream systems. To publish changes to a new system, you don't need any change in the order service; you simply update the CDCWorker.

But this has some problems too: you still need to implement the publishing in your business logic. More importantly, you have to implement it in every place that needs CDC.
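Here is a minimal sketch of Solution 2, just to make the shape concrete. Everything in it is an assumption for illustration: the `Topic` class is an in-memory stand-in for a real pub-sub system, and `OrderCreated`, `OrderService` and `CdcWorker` are hypothetical names, not part of any framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

class PubSubCdcSketch {

    // Hypothetical event describing a newly created order.
    static final class OrderCreated {
        final String orderId;
        final String userId;

        OrderCreated(String orderId, String userId) {
            this.orderId = orderId;
            this.userId = userId;
        }
    }

    // In-memory stand-in for a pub-sub topic; a real setup would use a message broker.
    static final class Topic {
        private final ConcurrentLinkedQueue<OrderCreated> queue = new ConcurrentLinkedQueue<>();

        void publish(OrderCreated event) {
            queue.add(event);
        }

        // Returns the next event, or null when the topic is empty.
        OrderCreated poll() {
            return queue.poll();
        }
    }

    // The order service knows only about the topic, not about the downstream systems.
    static final class OrderService {
        private final Topic topic;

        OrderService(Topic topic) {
            this.topic = topic;
        }

        OrderCreated createOrder(String orderId, String userId) {
            // A real service would first persist the order, e.g. via dao.createOrder(...).
            OrderCreated event = new OrderCreated(orderId, userId);
            topic.publish(event);
            return event;
        }
    }

    // The CDCWorker passes every event to each registered downstream handler.
    static final class CdcWorker {
        private final List<Consumer<OrderCreated>> handlers = new ArrayList<>();

        void register(Consumer<OrderCreated> handler) {
            handlers.add(handler);
        }

        // Processes all pending events and returns how many were handled.
        int drain(Topic topic) {
            int processed = 0;
            OrderCreated event;
            while ((event = topic.poll()) != null) {
                for (Consumer<OrderCreated> handler : handlers) {
                    handler.accept(event);
                }
                processed++;
            }
            return processed;
        }
    }

    public static void main(String[] args) {
        Topic topic = new Topic();
        OrderService orders = new OrderService(topic);
        CdcWorker worker = new CdcWorker();

        // Adding a downstream system is one more handler here; OrderService stays untouched.
        worker.register(e -> System.out.println("notify user " + e.userId + " about order " + e.orderId));
        worker.register(e -> System.out.println("post order " + e.orderId + " to the warehouse"));

        orders.createOrder("order-1", "user-7");
        System.out.println("processed " + worker.drain(topic) + " event(s)");
    }
}
```

Notice that `createOrder` still contains the `topic.publish(...)` call, which is exactly the problem described above: the publishing lives in the business logic.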

Solution 3

CDC frameworks like Debezium come to the rescue: they let you implement CDC without even touching your business logic. Using Debezium, you can stream data change events into any other system and do whatever you want with them. But how is the change event stream captured?

To understand this, we need to understand how databases maintain a history of the changes happening to a piece of data. Typically, transactional databases maintain an append-only log data structure that records every change to the data. This is done mainly for two purposes:

Transaction recovery - If things go bad, this log helps the database to recover the state.
Replication - Using this log, state changes can be replicated to other nodes to keep a consistent view of the data.

Now, since databases already maintain this append-only log of transactions, can we use it to achieve CDC? Well, yes. CDC frameworks like Debezium do exactly that in a reliable way (for example, by reading the MySQL binlog or the PostgreSQL write-ahead log), letting application developers focus only on the business logic.
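To make the idea of reading the log concrete, here is a toy sketch. It is not how Debezium or any real database is implemented; `TransactionLog`, `ChangeRecord` and `LogReader` are hypothetical names. It only shows the core mechanic: writers append to the log, and a reader remembers its offset so that every poll returns just the changes it has not seen yet.

```java
import java.util.ArrayList;
import java.util.List;

class LogTailingSketch {

    // One entry in the log: which operation happened to which row.
    static final class ChangeRecord {
        final int offset;
        final String operation; // e.g. "INSERT", "UPDATE", "DELETE"
        final String table;
        final String key;

        ChangeRecord(int offset, String operation, String table, String key) {
            this.offset = offset;
            this.operation = operation;
            this.table = table;
            this.key = key;
        }
    }

    // Append-only log: entries are added at the end and never modified.
    static final class TransactionLog {
        private final List<ChangeRecord> entries = new ArrayList<>();

        // Appends a record and returns the offset it was written at.
        int append(String operation, String table, String key) {
            int offset = entries.size();
            entries.add(new ChangeRecord(offset, operation, table, key));
            return offset;
        }

        // Returns a copy of all records from the given offset to the end of the log.
        List<ChangeRecord> readFrom(int offset) {
            return new ArrayList<>(entries.subList(offset, entries.size()));
        }
    }

    // The CDC side: tails the log without the application knowing about it.
    static final class LogReader {
        private final TransactionLog log;
        private int nextOffset = 0;

        LogReader(TransactionLog log) {
            this.log = log;
        }

        // Returns only the records appended since the previous poll.
        List<ChangeRecord> poll() {
            List<ChangeRecord> batch = log.readFrom(nextOffset);
            nextOffset += batch.size();
            return batch;
        }
    }

    public static void main(String[] args) {
        TransactionLog log = new TransactionLog();
        LogReader reader = new LogReader(log);

        // The application just writes; it contains no CDC code at all.
        log.append("INSERT", "orders", "order-1");
        log.append("UPDATE", "orders", "order-1");

        for (ChangeRecord r : reader.poll()) {
            System.out.println(r.offset + ": " + r.operation + " " + r.table + "/" + r.key);
        }
        System.out.println("new records on second poll: " + reader.poll().size());
    }
}
```

The stored offset is what lets a reader pick up where it left off instead of re-reading the whole log.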

Benefits:

Application is decoupled from CDC use cases.
CDC frameworks are built to be resilient and offer connectors for many popular databases.
Data capture is low latency (typically in the millisecond range), so you usually don't have to worry about lag.
Source and sink connectors make the whole process pluggable and easy to configure.
You can configure what data (columns) you want to expose to the stream without making any change to the application.
You can mask sensitive data.
You can monitor connectors using JMX.
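To picture what choosing columns and masking sensitive data means for a change event, here is a small conceptual sketch. It is not Debezium's API (in Debezium this is done through connector configuration rather than code); the `project` helper and the column names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class ColumnFilterSketch {

    // Returns a copy of the row without the excluded columns
    // and with the values of the masked columns replaced by "****".
    static Map<String, String> project(Map<String, String> row,
                                       Set<String> excluded,
                                       Set<String> masked) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> column : row.entrySet()) {
            if (excluded.contains(column.getKey())) {
                continue; // this column never reaches the stream
            }
            String value = masked.contains(column.getKey()) ? "****" : column.getValue();
            out.put(column.getKey(), value);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical row as it exists in the database.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "42");
        row.put("email", "user@example.com");
        row.put("internal_note", "vip customer");

        // The stream exposes id, hides internal_note, and masks email.
        System.out.println(project(row, Set.of("internal_note"), Set.of("email")));
    }
}
```

The application that writes the row stays the same; only the view of the row that reaches the stream changes.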

There are several other benefits of using a standard CDC framework like Debezium. Most importantly, it opens up a new world of streaming use cases without needing any changes in your application.

To know more, watch my discussion with Gunnar Morling, who is currently working with Decodable and is a former project lead for the Debezium project.

If you like this edition, please subscribe to the newsletter and The GeekNarrator YouTube channel.

Also please give me a like on this post and share it with your network.

Keep learning! Keep rocking!

Cheers,

The GeekNarrator

Akash Agarwal

SMTS @ OCI

2 years ago

Good one, just missing on terminologies for newbies like whats a CDC exactly.

Kartik S.

2 years ago

Very insightful bhaiya Kaivalya Apte

Sachin Jindal

Senior Software Engineer at Intuit

2 years ago

Nice one

sukhad anand

Senior Software Engineer @Google | Techie007 | Google Summer of Code @2017 | Opinions and views I post are my own

2 years ago

Great content as always

Gunnar Morling

Technologist at Confluent

2 years ago

Nice one!
