Transactional Outbox Pattern - Distributed Design Patterns

Pratik Pandey

Senior Software Engineer at Booking.com | AWS Serverless Community Builder | pratikpandey.substack.com

Published: April 18, 2023

As we deal with more complex distributed systems, we’ll often come across use cases where we need to atomically perform operations on multiple data sources in a consistent manner.

So, let’s assume that we are persisting order data into an RDBMS, and the ML team wants to run analytics on this data. We have the following options:

Grant the ML team access to our DB. This tightly couples the Order Service to the ML team: any change to the Order schema has to be coordinated across both teams, so it isn’t the preferred approach.
Have the Order Service write to another DB owned by the ML team using the 2PC (two-phase commit) protocol. 2PC is a blocking protocol and is slower because it has to coordinate across multiple nodes, so it isn’t the preferred approach either.
Push the Order data onto a message broker like Kafka. The ML team can then run a consumer that reads the data off Kafka, persists it in their own DB, and performs analytics against that DB.

We’ve happily decoupled the Order Service from the Analytics Service and everyone is happy! (Or so you think!)

There are multiple failure scenarios here:

The Order Service successfully persisted the order in the database but crashed before it could send the message to the broker. The message is lost, which means the ML team will not have all the orders to run their analytics on (making the analytics wrong or skewed).
The Order Service successfully sent the message to the broker, but the transaction on the database failed. This leaves the ML team with orphaned/false records, again impacting their analytics.
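
The first failure scenario can be reproduced with a small in-memory simulation. This is only a sketch under stated assumptions: `fakeDB`, `fakeBroker` and `placeOrderDualWrite` are hypothetical names, and the crash is modeled as an early return between the two writes.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeDB and fakeBroker are hypothetical in-memory stand-ins for the
// orders database and the message broker.
type fakeDB struct{ orders []string }

type fakeBroker struct{ messages []string }

var errCrash = errors.New("simulated crash after DB commit")

// placeOrderDualWrite writes to the database and then to the broker as two
// independent steps. If crashAfterDB is true, the process "dies" between the
// two writes, so the order is stored but never published.
func placeOrderDualWrite(db *fakeDB, b *fakeBroker, orderID string, crashAfterDB bool) error {
	db.orders = append(db.orders, orderID) // step 1: commit to the database
	if crashAfterDB {
		return errCrash // the publish below never happens
	}
	b.messages = append(b.messages, orderID) // step 2: publish to the broker
	return nil
}

func main() {
	db, broker := &fakeDB{}, &fakeBroker{}
	_ = placeOrderDualWrite(db, broker, "order-1", false)
	_ = placeOrderDualWrite(db, broker, "order-2", true)
	// The database now holds two orders while the broker has seen only one.
	fmt.Printf("orders in DB: %d, messages on broker: %d\n", len(db.orders), len(broker.messages))
}
```

Swapping the two steps does not help: publishing first and crashing before the database write produces the second scenario instead.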

Outbox Pattern

The Outbox Pattern comes to the rescue here. We use an Outbox table to store a record of each operation we perform on the database. The Order Service writes to both the Order table and the Outbox table as part of the same transaction, so the two writes are always atomic (1).

Once the record is inserted into the Outbox table, an asynchronous process (2) reads it and publishes it to the message broker (3).
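
The asynchronous process can be as simple as a poller. Below is a minimal sketch of one polling pass, kept in memory so it runs without a database or a broker. `OutboxRecord`, its `Published` flag, the `Publisher` interface, `relayOnce` and `flakyBroker` are all hypothetical names for illustration, not part of any library.

```go
package main

import (
	"errors"
	"fmt"
)

// OutboxRecord mirrors one row of the Outbox table. Published is a
// hypothetical bookkeeping flag; deleting rows after publishing is an
// equally valid choice.
type OutboxRecord struct {
	ID        string
	EventType string
	Payload   []byte
	Published bool
}

// Publisher is a hypothetical abstraction over the message broker client.
type Publisher interface {
	Publish(eventType string, payload []byte) error
}

// relayOnce performs one polling pass: it publishes every unpublished record
// in insertion order and marks a record as published only after the broker
// accepted it. It stops at the first failure so ordering is preserved, and
// returns how many records were published in this pass.
func relayOnce(outbox []OutboxRecord, p Publisher) (int, error) {
	published := 0
	for i := range outbox {
		if outbox[i].Published {
			continue
		}
		if err := p.Publish(outbox[i].EventType, outbox[i].Payload); err != nil {
			return published, err // record stays unpublished and is retried next pass
		}
		outbox[i].Published = true
		published++
	}
	return published, nil
}

// flakyBroker fails its first `failures` Publish calls, then succeeds.
type flakyBroker struct {
	failures int
	received []string
}

func (b *flakyBroker) Publish(eventType string, payload []byte) error {
	if b.failures > 0 {
		b.failures--
		return errors.New("broker unavailable")
	}
	b.received = append(b.received, eventType+":"+string(payload))
	return nil
}

func main() {
	outbox := []OutboxRecord{
		{ID: "1", EventType: "order_created", Payload: []byte(`{"id":"o-1"}`)},
		{ID: "2", EventType: "order_created", Payload: []byte(`{"id":"o-2"}`)},
	}
	broker := &flakyBroker{failures: 1}

	n, err := relayOnce(outbox, broker) // broker is down: nothing is published
	fmt.Println(n, err)
	n, err = relayOnce(outbox, broker) // next pass retries and drains the outbox
	fmt.Println(n, err, len(broker.received))
}
```

Because a record is marked only after a successful publish, a crash between those two steps makes the next pass publish the same record again. The relay therefore gives at-least-once delivery, not exactly-once.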

QQ: What does the Outbox pattern remind you of? Hint: WAL

[Figure: Outbox Pattern]

Advantages of Outbox Pattern

The Outbox Pattern provides several benefits over other messaging patterns. Some of the major advantages of the Outbox Pattern are as follows:

Reliability: With the Outbox Pattern, messages are persisted in the database in the same transaction as the business change. This ensures that a message is never lost once the business transaction commits, even if there are system failures or network issues afterwards.
Scalability: The Outbox Pattern can handle high volumes of messages without overwhelming the message broker. Since messages are persisted in the database, they can be published to the message broker at a more controlled rate.
Performance: The Outbox Pattern can be faster than synchronous messaging patterns because it eliminates the need for synchronous communication between microservices. The microservice that produces the message can quickly complete the business transaction and return a response, while the message is sent asynchronously in the background.
Decoupling: The Outbox Pattern allows microservices to be loosely coupled. Each microservice can focus on its specific business logic and ignore the details of how messages are sent and received.
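
One practical consequence of the reliability guarantee is that a message may be published more than once, for example when the publishing process is retried after a failure. A common way to cope, sketched here as an assumption and not as something the pattern prescribes, is for the consumer to skip message IDs it has already handled. `dedupingConsumer` is a hypothetical name, and the in-memory map stands in for durable storage such as a unique key in the consumer's database.

```go
package main

import "fmt"

// Message carries a unique ID plus the event data.
type Message struct {
	ID        string
	EventType string
	Payload   []byte
}

// dedupingConsumer is a hypothetical consumer that applies each message ID
// at most once. The in-memory map stands in for durable storage.
type dedupingConsumer struct {
	seen    map[string]bool
	applied []string
}

func newDedupingConsumer() *dedupingConsumer {
	return &dedupingConsumer{seen: make(map[string]bool)}
}

// Handle reports whether the message was applied (true) or skipped as a
// duplicate (false).
func (c *dedupingConsumer) Handle(m Message) bool {
	if c.seen[m.ID] {
		return false
	}
	c.seen[m.ID] = true
	c.applied = append(c.applied, m.ID)
	return true
}

func main() {
	c := newDedupingConsumer()
	first := Message{ID: "m-1", EventType: "order_created"}
	fmt.Println(c.Handle(first)) // true
	fmt.Println(c.Handle(first)) // false: the redelivery is ignored
	fmt.Println(len(c.applied))  // 1
}
```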

Alternatives to Outbox Pattern

If the Outbox Pattern is not suitable for your use case, there are a few alternative messaging patterns you can consider:

Direct Messaging: This pattern involves a direct synchronous request between microservices. It can be a good option for low-latency, low-volume communication.
Database Trigger: Another option is to use a database trigger that detects changes in the database and writes the corresponding messages to the messaging infrastructure.
CDC: Similar to database triggers, we can use CDC (Change Data Capture) to read changes from the transaction log. You can rely on the CDC stream as your source of truth, since only committed transactions show up in it. The caveat is that you might not have direct access to the binlog, or might need a third-party system like Debezium to read data from the transaction log.
Publish-Subscribe Pattern: This pattern involves a message broker that allows multiple microservices to subscribe to specific message types. It can be a good option for high-volume, low-latency communication. So in the above case, there could be a common message broker that can be used by Order Service as well as the Analytics Service.

Sample Implementation

Here is a simple example of how you can implement the Outbox Pattern in Golang using a PostgreSQL database:

1. Create a message struct that contains the message data:

type Message struct {
    ID        string `json:"id"`
    EventType string `json:"event_type"`
    Payload   []byte `json:"payload"`
}

2. Create an Outbox table in the database:

CREATE TABLE outbox (
    id uuid PRIMARY KEY,
    event_type text NOT NULL,
    payload bytea NOT NULL, -- PostgreSQL's binary type is bytea
    created_at timestamp NOT NULL DEFAULT NOW()
);

3. Insert a message into the outbox table in a database transaction:

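// Needs import "database/sql" and a registered PostgreSQL driver.
func sendMessage(db *sql.DB, message *Message) error {
    tx, err := db.Begin()
    if err != nil {
        return err
    }
    // Rollback has no effect once the transaction is committed, so a single
    // deferred call covers every early return as well as a panic.
    defer tx.Rollback()

    // Business write. The order values are elided here, as in the original snippet.
    _, err = tx.Exec("INSERT INTO orders(id, order_value, order_qty) VALUES ($1, $2, $3)", ...)
    if err != nil {
        return err
    }

    // Outbox write in the same transaction: both rows commit, or neither does.
    _, err = tx.Exec("INSERT INTO outbox(id, event_type, payload) VALUES ($1, $2, $3)",
        message.ID, message.EventType, message.Payload)
    if err != nil {
        return err
    }

    // Return a commit failure to the caller instead of panicking.
    return tx.Commit()
}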
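
The snippet above takes a ready-made `Message`. As a rough sketch of where that message could come from, the helper below serializes an order to JSON and assigns a random UUID so the ID fits the `uuid` column of the outbox table. The `Order` struct, the `order_created` event type, the `float64` order value and the helper names are assumptions made for illustration; only the `Message` struct comes from step 1.

```go
package main

import (
	"crypto/rand"
	"encoding/json"
	"fmt"
)

// Message is the struct from step 1.
type Message struct {
	ID        string `json:"id"`
	EventType string `json:"event_type"`
	Payload   []byte `json:"payload"`
}

// Order is a hypothetical struct; its fields follow the columns named in the
// orders INSERT statement. The Go types are assumptions.
type Order struct {
	ID    string  `json:"id"`
	Value float64 `json:"order_value"`
	Qty   int     `json:"order_qty"`
}

// newUUID returns a random (version 4) UUID string built from crypto/rand.
func newUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set the version to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// newOrderCreatedMessage wraps an order in an outbox message. The event type
// name "order_created" is an assumption.
func newOrderCreatedMessage(o Order) (*Message, error) {
	payload, err := json.Marshal(o)
	if err != nil {
		return nil, err
	}
	id, err := newUUID()
	if err != nil {
		return nil, err
	}
	return &Message{ID: id, EventType: "order_created", Payload: payload}, nil
}

func main() {
	msg, err := newOrderCreatedMessage(Order{ID: "o-1", Value: 49.99, Qty: 2})
	if err != nil {
		panic(err)
	}
	fmt.Println(msg.ID, msg.EventType, string(msg.Payload))
}
```

Giving every message its own ID also lets downstream consumers recognize a message they have already processed.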

This brings us to the end of this article. We talked about the problem the outbox pattern solves, its advantages, and what the alternatives could be. We also saw a sample snippet showing how you could implement a transactional outbox pattern in Golang & Postgres. Please post a comment with any doubts you might have; I’ll be happy to discuss them!

Thank you for reading! I’ll be posting weekly content on distributed systems & patterns, so please like, share and subscribe to this newsletter for notifications of new posts.

Please comment on the post with your feedback, it will help me improve! :)

Until next time, Keep asking questions & Keep learning!

Distributed Systems Made Easy

7,968 followers

Pratik Pandey

Senior Software Engineer at Booking.com | AWS Serverless Community Builder | pratikpandey.substack.com

1 year ago

Subscribe to my LinkedIn newsletter to get updates on any new System design posts - https://www.dhirubhai.net/newsletters/system-design-patterns-6937319059256397824/ You can also follow me on Medium - https://distributedsystemsmadeeasy.medium.com/subscribe

Kaivalya Apte

The GeekNarrator Podcast | Staff Engineer | Follow me for #distributedsystems #databases #interviewing #softwareengineering

1 year ago

I like CDC (Change Data Capture) over outbox pattern because it loosely couples the application from data publishing. It is quite flexible and easy to configure. Also doesn’t need an additional table. Transaction logs already have the change data we need. Any specific use case where you think outbox pattern works better?
