Domain-Driven Design With Event Sourcing, Akka Cluster Sharding, Cassandra, Kafka, Scala. Distributed & Reactive Trading System. FinTech Use Case.

An application can be built with an amazing architecture, the latest technologies, and the best interface, but if it doesn't solve the business's needs, it won't be considered useful. That is where Domain-Driven Design (DDD) comes in. As its name says, the point is to focus on the domain of a specific business problem.

To design good software, it's important to know what that software is about. To create a banking software system, you need a good understanding of what banking is all about: you must understand the banking domain. DDD has strategic value; it is about mapping business domain concepts into software artifacts and organizing code in alignment with business problems, using the same common, Ubiquitous Language that technical teams, business analysts, and product owners can all understand.

DDD isn't a methodology; it is more about the software's architectural design, providing a structure of practices for making design decisions in software projects with complicated domains. The DDD approach was introduced by Eric Evans in his book Domain-Driven Design: Tackling Complexity in the Heart of Software.

In this article, we are going to demonstrate how to build a distributed, Reactive Stock Trading System using Domain-Driven Design, Event Sourcing, Akka Cluster Sharding, Akka Persistence, Cassandra, and Kafka. The solution is written in Scala with the Akka framework. The simple trading system has two main functions: buying and selling stock.

What is Domain-Driven Design

Domain-Driven Design (DDD) advocates modeling based on the reality of business as relevant to your use cases. In the context of building applications, DDD talks about problems as domains. It describes independent problem areas as Bounded Contexts (each Bounded Context correlates to a microservice) and emphasizes a common language to talk about these problems. It also suggests many technical concepts and patterns, like domain entities with rich models (no anemic domain model), value objects, aggregates, and aggregate-root (root entity) rules to support the internal implementation.

Bounded Context: A Bounded Context is an explicit boundary within which a domain model exists. The domain model expresses a Ubiquitous Language as a software model. When starting with software modeling, Bounded Contexts are conceptual and are part of the "problem space". In this phase, the task is to find the actual boundaries of specific contexts and then to visualize the relationships between them. As the model takes on deeper meaning and clarity, Bounded Contexts transition to the "solution space", with the software model being reflected in the project source code. Bounded Contexts are a strategic concept in Domain-Driven Design, and it is important to know how they are reflected in the architecture and the organizational/team structure.

Domain Event: A domain event is something that happened in the domain that you want other parts of the same domain (in-process) to be aware of. The notified parts usually react to the events in some way. We use domain events to explicitly implement the side effects of changes within the domain. Domain events help us achieve better scalability and reduce contention on database locks, and they allow eventual consistency between aggregates within the same domain. Domain events are append-only and immutable, with no updates, so we avoid the network I/O and database-lock performance bottlenecks of update-in-place. Examples: FundDepositedEvent, AccountCreatedEvent, AccountCredited, etc.
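
As a minimal Scala sketch (the event names follow the examples above; the field layout is an illustrative assumption, not the exact schema used in the solution), domain events can be modeled as an immutable, sealed hierarchy of past-tense facts:

```scala
import java.time.Instant

// Domain events: immutable, past-tense facts about something that happened in the domain.
sealed trait AccountEvent {
  def accountId: String
  def occurredAt: Instant
}

final case class AccountCreatedEvent(accountId: String, owner: String, occurredAt: Instant) extends AccountEvent

final case class FundDepositedEvent(accountId: String, amount: BigDecimal, occurredAt: Instant) extends AccountEvent

final case class AccountCredited(accountId: String, amount: BigDecimal, occurredAt: Instant) extends AccountEvent
```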

Ubiquitous Language: A language structured around the domain model and used by all team members to connect all the activities of the team with the software development cycle. This is the one language the technical team, domain experts, and product owners all speak in order to communicate effectively during the project life cycle.

Use Case

Our use case is based on a Spring Boot reference application called SpringBootTrader, written using the Spring Boot REST framework. The application is a simple financial trading system with four main microservices and an API gateway. I am going to refactor it using Domain-Driven Design, Event Sourcing, Akka Cluster Sharding, Cassandra, Kafka, and an event-driven, Reactive microservice pattern. The goal and requirement of this solution are to make it a distributed, Reactive solution that can serve millions of users concurrently, using the asynchronous, non-blocking style of systems development.

Below is a high-level Architectural diagram of the solution we intend to build.

[Figure: high-level architecture of the Reactive Stock Trading System]

Let us kick off our discussion of Domain-Driven Design with event storming.

Event Storming In Domain-Driven Design Phase

Event Storming: The goal of event storming and Domain-Driven Design (DDD) is to establish a technology-independent language and a detailed understanding of the business needs and processes. This allows the business domain experts, those most familiar with the stock trading domain and the role our business has in it, to communicate their domain knowledge to the rest of the team, including its technical members. Stakeholders involved in a modeling workshop may include technology experts such as developers, software architects, solution architects, project managers, user experience specialists, and quality assurance analysts, as well as anyone else involved in the execution of the project; however, the most important people to include are the business domain experts and product owners.


In this phase, we represent events with orange sticky notes. Stakeholders simply begin to think of and write down interesting business events on orange stickies and affix them to a modeling surface (typically paper on a wall that can easily be rolled up when finished). Within 10-15 minutes of beginning an event storming session, we will likely already have a bunch of interesting events identified. Once we have enough events, we need to create flows. Event storming is all about the free flow of communication. The next phase is Domain-Driven Design.

Domain-Driven Design Phase:

Domain-Driven Design gives us the blueprints to transform the output of an event storming session into models that are formal enough to use for architecting and building a real-world system. We start by understanding the stock trading business domain, using our business experts as the source of that knowledge. We then translate those domain concepts directly into domain logic in our code, using the appropriate domain language. This will enable our technology experts and our business experts to stay closely aligned throughout the entire design and implementation of Reactive Stock Trading. Both teams will speak a common language called the Ubiquitous Language. In DDD there are three main building blocks:

Commands: A command expresses our intent for something to happen in the future and is often the trigger for a sequence of one or more events, e.g. "Buy Stock". However, commands can be refused: a command to "Sell Stock" might be rejected if the client's portfolio doesn't hold the number of shares they wish to sell.

Event: An event is a factual statement of an occurrence, e.g. a stock was purchased. In other words, a set of events is a historical record of things that have happened in our system (or business). We will always express events in the past tense to reflect this. In theory, we can model an entire business, no matter how complex, as a flow of events. In practice, however, it would be virtually impossible to gather tens of thousands of stakeholders from a large company in the same room and make any meaningful progress. For this reason, we must limit an event storming session to a targeted area of the business, not a targeted area of the technology.

Aggregate Or Aggregate Root: An aggregate is a cluster of domain objects that can be treated as a single unit, e.g. the list of stocks in our portfolio; these are separate objects. One of the aggregate's component objects is the aggregate root. Any references from outside the aggregate should only go to the aggregate root. The root can thus ensure the integrity of the aggregate as a whole.
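
To make the distinction between commands, events, and the aggregate root concrete, here is a minimal Scala sketch of a Portfolio aggregate with a command that can be refused and the event it yields when accepted. The names and fields are illustrative assumptions rather than the exact model from the repository:

```scala
// Command: an intent that may still be refused.
final case class SellStock(symbol: String, quantity: Int)

// Event: a fact, recorded only after the command has been accepted.
final case class StockSold(symbol: String, quantity: Int)

// Aggregate root: outside code talks only to Portfolio, which guards its invariants.
final case class Portfolio(portfolioId: String, holdings: Map[String, Int]) {

  // Validate the command against the current holdings; reject it if invalid.
  def handle(cmd: SellStock): Either[String, StockSold] = {
    val owned = holdings.getOrElse(cmd.symbol, 0)
    if (cmd.quantity <= 0) Left("Quantity must be positive")
    else if (owned < cmd.quantity) Left(s"Only $owned shares of ${cmd.symbol} held")
    else Right(StockSold(cmd.symbol, cmd.quantity))
  }

  // Apply an accepted event to produce the next state of the aggregate.
  def applyEvent(event: StockSold): Portfolio =
    copy(holdings = holdings.updated(event.symbol, holdings(event.symbol) - event.quantity))
}
```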

Modeling Our Stock Trading Domain

In our event storming, our domain experts and product owners have identified three bounded contexts within the stock trading domain, shown in the context mapping diagram below.

[Figure: context map of the Portfolio, Trade Order, and Account bounded contexts]

Portfolio: This microservice is responsible for buying and selling stock and for reconstructing our portfolio holdings to determine investor profit and loss.

Trade Order: This microservice is responsible for stock quotes and executes trade orders via the exchange, e.g. NASDAQ.

Account: This microservice is responsible for managing customer accounts, including the account balance and customer profile.

Event-Driven Kafka As Message Bus

The event-driven pattern is a way of communicating among microservices or systems using asynchronous Pub/Sub patterns. This happens when a microservice sends event messages to notify another service of a change in its domain. A key element of event notification is that the source system doesn't really care whether the downstream system is up or not. The event is published to a message bus, in our case Kafka, and each downstream service subscribes to the events it is interested in. Event notification is attractive because it implies a low level of coupling and also gives us scalability; Kafka can process millions of messages per second.

As we can see, each microservice is a Bounded Context, but they still need to communicate with each other. We will use Kafka as our message bus in an event-driven, asynchronous mode of communication based on Pub/Sub patterns. Each microservice publishes events, and any interested microservice subscribes to those events and updates its own state.
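
As a hedged sketch of the publishing side, the snippet below uses the plain Kafka producer client for brevity (the actual solution would more likely go through Alpakka Kafka / Akka Streams); the broker address is an assumption, and the topic name matches the trade-order flow described later:

```scala
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object TradeOrderEventPublisher {
  // Broker address is an assumption; topic name matches the flow described later.
  private val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", classOf[StringSerializer].getName)
  props.put("value.serializer", classOf[StringSerializer].getName)

  private val producer = new KafkaProducer[String, String](props)

  // Publish a serialized domain event, keyed by portfolio id so that events
  // for the same portfolio keep their order within one partition.
  def publish(portfolioId: String, eventJson: String): Unit = {
    producer.send(new ProducerRecord("InboundTradeOrderTopic", portfolioId, eventJson))
    ()
  }
}
```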


The next phase in our design is to determine how to store our data. Should we use an RDBMS or NoSQL? What persistence pattern should we implement: traditional CRUD or Event Sourcing?

Cassandra With Akka Persistence And Event Sourcing

The traditional CRUD approach has served most enterprise use cases well for decades, but as part of the trade-off for achieving reactive characteristics, complexity arises when data needs to be shared among multiple microservices. Transactions can't span hosts without coordination, and distributed transactions incur high latency with an increased possibility of failures, the opposite of microservice goals. Also, operations that block all the way down to the database often do not take full advantage of multi-core architectures. A CRUD-based approach doesn't scale for a large system with many concurrent users: it performs slowly as the data grows, and it is typically implemented on an RDBMS, which is not designed to scale compared to its NoSQL counterparts. Achieving scalability and elasticity is a huge challenge for an RDBMS. To scale our data access layer in a large, distributed solution with concurrent users, we need a different design approach called Event Sourcing.


Event Sourcing and CQRS have gained a lot of popularity recently. Event Sourcing ensures that all changes to the application state are stored as a sequence of events. Not only can we query these events, we can also use the event log to reconstruct past states and as a foundation to automatically adjust the state to cope with retroactive changes. Instead of storing just the current state of the data in a domain, we use an append-only store to record the full series of actions taken on that data. The store acts as the system of record and can be used to materialize the domain objects. This can simplify tasks in complex domains by avoiding the need to synchronize the data model and the business domain, while improving performance, scalability, and responsiveness. It can also provide consistency for transactional data and maintain full audit trails and history that can enable compensating actions.

We are going to use Akka Persistence and the Akka actor model to implement our event sourcing.


The Akka Persistence framework has an excellent implementation of Event Sourcing. Akka Persistence enables stateful actors to persist their state so that it can be recovered when an actor is restarted (for example after a JVM crash, by a supervisor, or after a manual stop-start) or migrated within a cluster. The key concept behind Akka Persistence is that only the events received by the actor are persisted, not the actual state of the actor (though actor state snapshot support is also available). The events are persisted by appending to storage (nothing is ever mutated), which allows for very high transaction rates and efficient replication. Akka Persistence also provides point-to-point communication with at-least-once message delivery semantics, which is critical in a financial application. You can learn more about Akka Persistence in its documentation. We will use the Cassandra Akka Persistence plugin. There are also plugins for other stores like MongoDB, CouchDB, Postgres, etc.
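
Here is a minimal sketch of what a persistent entity can look like with the classic PersistentActor API, assuming the Cassandra journal plugin is configured in application.conf; the command, event, and state types are illustrative, not the exact ones from the repository:

```scala
import akka.persistence.PersistentActor

// Illustrative command, event, and state for a minimal account entity.
final case class DepositFund(amount: BigDecimal)
final case class FundDeposited(amount: BigDecimal)
final case class AccountState(balance: BigDecimal = BigDecimal(0)) {
  def updated(evt: FundDeposited): AccountState = copy(balance = balance + evt.amount)
}

class AccountEntityActor(accountId: String) extends PersistentActor {
  override def persistenceId: String = s"account-$accountId"

  private var state = AccountState()

  // On recovery, only the persisted events are replayed; the state is rebuilt from them.
  override def receiveRecover: Receive = {
    case evt: FundDeposited => state = state.updated(evt)
  }

  // Commands are validated first; only the resulting event is appended to the journal.
  override def receiveCommand: Receive = {
    case DepositFund(amount) if amount > 0 =>
      persist(FundDeposited(amount)) { evt =>
        state = state.updated(evt)
        sender() ! state // acknowledge with the new balance
      }
    case DepositFund(_) =>
      sender() ! "Deposit amount must be positive"
  }
}
```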

Concurrency, Parallelism With Akka

The second requirement in our solution is to build a Reactive System at scale using the asynchronous, non-blocking style of systems development. Systems built as Reactive Systems are more flexible, loosely coupled, and scalable. This makes them easier to develop and more amenable to change. They are significantly more tolerant of failure, and when failure does occur they meet it with elegance rather than disaster. Reactive Systems are highly responsive, giving users effective interactive feedback. The concepts of time and space are at the heart of a reactive system. Strategically leveraging concurrency, parallelism, distribution, and location transparency is the key to unlocking the full potential of our software. I have written an article on this topic; you can read more about it there.

To scale our system, we need to understand the concept of Concurrency and Parallelism.

Concurrency and Parallelism are related concepts. Concurrency is two or more tasks making progress while only one task executes at any given point in time on a single processing unit. Parallelism is two or more tasks executing simultaneously on two different processing units.


Threads And Akka Actors

Although Akka Actors and threads are related, they are not the same. Under the hood, the Akka engine runs sets of actors on sets of threads, so the number of actors is not equal to the number of threads. In general, many actors may share one thread, and subsequent invocations of the same actor may execute on a completely separate thread. Since each actor can run on a different thread for subsequent invocations, an actor's state is never shared with other actors. Actors are shielded from outside interference and therefore communicate with the rest of the system using message passing.

Threads, on the other hand, require synchronization and locking to access shared resources. The state of a thread is extremely critical for the proper functioning of multithreaded applications. A multithreaded application is not only hard to debug, but also hard to scale across nodes when scaling out an application.
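
A minimal sketch of that difference, assuming classic Akka actors: the counter below is mutated only from inside the actor, one message at a time, so no locks or synchronized blocks are needed:

```scala
import akka.actor.{Actor, ActorSystem, Props}

// The mutable counter lives inside the actor and is only ever touched while
// processing one message at a time, so no locks or synchronized blocks are needed.
class CounterActor extends Actor {
  private var count = 0

  def receive: Receive = {
    case "increment" => count += 1
    case "get"       => sender() ! count
  }
}

object CounterDemo extends App {
  val system  = ActorSystem("counter-demo")
  val counter = system.actorOf(Props(new CounterActor), "counter")

  // Many callers can fire messages concurrently; the actor processes them one by one,
  // possibly on different threads over time, without ever sharing its state.
  (1 to 1000).foreach(_ => counter ! "increment")
}
```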

Scalability Using Akka Cluster Sharding

The financial industry depends heavily on distributed technology to be able to scale. In this industry, 99.99% availability is a first-class SLA requirement; because of this, it needs a reactive toolkit such as Akka to be able to scale.

The primary design goal here is scale, more so than resilience against failure (even though that is a nice side-effect). To work fast with a lot of data, we need to hold the data in memory, and to scale, we need to distribute it across a cluster of nodes. This is achieved by employing an architecture that separates scale-agnostic code (the one written by application developers) from scale-aware code (the one written by library developers).

Building a distributed system is very hard, but Akka makes it easier for us. The Akka toolkit is for building distributed, scalable, elastic, and reactive systems.


Cluster Sharding is useful when you need to distribute actors across several nodes in the cluster and want to be able to interact with them using their logical identifier, without having to care about their physical location in the cluster, which might also change over time. Cluster Sharding relies on location transparency.

Akka Cluster + Sharding + Persistence creates an ultimate trio for building fast, resilient, scalable, distributed applications.

In our solution, each bounded context models its aggregate root as an actor entity.

These are AccountEntityActor, PortfolioEntityActor, and TradeOrderEntityActor. Each entity handles incoming messages. Some of these messages are commands, which are requests to change the state of the entity; others are queries used to retrieve the entity's state. Forwarding these messages to the right entities, which could be distributed across multiple JVMs running in a cluster, is handled by the Cluster Sharding shard region (or its proxy). To send a message to an entity, the sender simply sends the message to the shard region actor, which is responsible for forwarding it to the correct entity actor in the cluster.
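
Here is a hedged sketch of how the shard region for the portfolio entities could be started with classic Cluster Sharding; the envelope, the shard count, and the entity stub are assumptions, and the repository may wire this differently:

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}

// Envelope that carries the logical entity id; the shard region uses it for routing.
final case class PortfolioEnvelope(portfolioId: String, command: Any)

// Stand-in entity; in the real solution this would be the persistent aggregate root.
class PortfolioEntityActor extends Actor {
  def receive: Receive = {
    case _ => // handle buy/sell commands, persist events, reply to the sender, etc.
  }
}

object PortfolioSharding {
  private val numberOfShards = 100

  private val extractEntityId: ShardRegion.ExtractEntityId = {
    case PortfolioEnvelope(id, cmd) => (id, cmd)
  }

  private val extractShardId: ShardRegion.ExtractShardId = {
    case PortfolioEnvelope(id, _) => (math.abs(id.hashCode) % numberOfShards).toString
  }

  // Returns the shard region ActorRef; senders address entities by logical id only,
  // and the region forwards each message to the right entity wherever it lives.
  def start(system: ActorSystem): ActorRef =
    ClusterSharding(system).start(
      typeName = "PortfolioEntity",
      entityProps = Props(new PortfolioEntityActor),
      settings = ClusterShardingSettings(system),
      extractEntityId = extractEntityId,
      extractShardId = extractShardId
    )
}
```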

Translating the Domain Model To Service Implementation

Portfolio Service: The Portfolio model represents the stock and cash holdings that make up the Portfolio aggregate, together with a Pub/Sub Kafka service and an asynchronous Akka HTTP service API. During our event storming we identified the following command/event associations:

Let's look at an end-to-end process flow for a sell order (a simplified PortfolioActor sketch follows the steps below):

  1. The customer places a sell order via the mobile app or the API gateway (Akka HTTP).
  2. PortfolioActor confirms the availability of shares to sell.
  3. PortfolioActor creates OrderPlacedEvent and ShareDebitEvent.
  4. PortfolioActor publishes PlaceOrderCommand to the InboundTradeOrderTopic in Kafka.
  5. TradeOrderActor subscribes to the InboundTradeOrderTopic in Kafka, creates OrderReceivedEvent, and forwards SellOrderCommand to TradeActor, which executes the trade order on the exchange, i.e. the NASDAQ stock exchange.
  6. TradeOrderActor publishes a TradeOrderCompleted event to the OutboundTradeOrderTopic in Kafka.
  7. PortfolioActor subscribes to the OutboundTradeOrderTopic, creates a TradeOrderCompleted event, and increases the funds.
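
A minimal sketch of steps 1-4 on the Portfolio side; the message names follow the flow above, while the actor structure and the Kafka publishing hook are assumptions rather than the repository's exact implementation:

```scala
import akka.actor.Actor

// Message names follow the numbered flow above; field layouts are assumptions.
final case class PlaceSellOrder(symbol: String, quantity: Int)
final case class OrderPlacedEvent(symbol: String, quantity: Int)
final case class ShareDebitEvent(symbol: String, quantity: Int)
final case class PlaceOrderCommand(portfolioId: String, symbol: String, quantity: Int)

class PortfolioActor(portfolioId: String, publishToKafka: PlaceOrderCommand => Unit)
    extends Actor {

  private var holdings: Map[String, Int] = Map.empty

  def receive: Receive = {
    // 1. A sell order arrives from the API gateway (Akka HTTP).
    case PlaceSellOrder(symbol, qty) =>
      // 2. Confirm the availability of shares to sell.
      if (holdings.getOrElse(symbol, 0) >= qty) {
        // 3. Create OrderPlacedEvent and ShareDebitEvent (persisted via Akka
        //    Persistence in the full solution) and debit the shares.
        holdings = holdings.updated(symbol, holdings(symbol) - qty)
        val placed  = OrderPlacedEvent(symbol, qty)
        val debited = ShareDebitEvent(symbol, qty)
        // 4. Publish PlaceOrderCommand to the InboundTradeOrderTopic in Kafka.
        publishToKafka(PlaceOrderCommand(portfolioId, symbol, qty))
        sender() ! List(placed, debited)
      } else {
        sender() ! s"Not enough $symbol shares available to sell"
      }
  }
}
```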

Account Service: The Account model represents the customer's account and cash balance. It comprises the Account aggregate, a Pub/Sub Kafka service, and an asynchronous Akka HTTP service API. During our event storming we identified the following command/event associations:

Let's look at an end-to-end process flow for opening an account (a sketch of the asynchronous HTTP entry point follows the steps below):


Account Opening Workflow

  1. The customer makes an account-opening request from the front-end web or mobile app.
  2. An open-account response is returned asynchronously via the Akka HTTP API.
  3. An AccountOpeningCommand is generated and sent to AccountActor, and an AccountCreatedEvent is generated.
  4. AccountActor sends OpenPortfolioEvent to PortfolioActor to open a portfolio for the customer.
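
As a hedged sketch of the asynchronous entry point, an Akka HTTP route can ask the account actor (or shard region) and complete the request without blocking; the message protocol and route shape here are assumptions, not the repository's exact API:

```scala
import akka.actor.ActorRef
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import akka.pattern.ask
import akka.util.Timeout

import scala.concurrent.duration._

// Hypothetical account protocol; the repository's messages may be named differently.
final case class OpenAccount(customerId: String)
final case class AccountOpened(accountId: String)

class AccountRoutes(accountRegion: ActorRef) {
  implicit val timeout: Timeout = 3.seconds

  // POST /accounts/{customerId}: ask the (possibly sharded) AccountActor and
  // complete the HTTP request asynchronously without blocking a thread.
  val routes: Route =
    path("accounts" / Segment) { customerId =>
      post {
        onSuccess((accountRegion ? OpenAccount(customerId)).mapTo[AccountOpened]) { opened =>
          complete(s"Account ${opened.accountId} opened for customer $customerId")
        }
      }
    }
}
```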

Account Deposit Workflow

  1. The customer makes a deposit request from the front-end web or mobile app.
  2. A deposit response is returned asynchronously via the Akka HTTP API.
  3. AccountActor receives DepositFundCommand, raises DepositFundEvent, and sends it to PortfolioActor to increase the customer's funds in the portfolio state.

Account Withdraw Workflow

  1. The customer makes a withdrawal request from the front-end web or mobile app.
  2. A WithdrawFundCommand is received from the API gateway by AccountActor.
  3. AccountActor publishes WithdrawFundEvent and sends it to PortfolioActor to decrease the customer's funds in the portfolio state.

As we can see, all communication and interaction among the services is asynchronous and non-blocking; there is no synchronous request-response that leads to tight, point-to-point coupling. We achieve a higher degree of loose coupling between services. This pattern is called event-driven. With asynchronous, non-blocking interaction, we have a more scalable system.

The full source code of this solution can be downloaded from my GitHub repository below. Feel free to clone the repo, modify it, play around, and enhance the source code. The code is written in Scala. I promise to convert the Scala code to Java for my Java folks in the future. I love writing code in functional languages like Scala and Golang because I write faster and with less code than in my beloved Java.

Takeaways

Domain-Driven Design (DDD) advocates modeling based on the reality of business as relevant to our use cases. DDD has a steep learning curve. Developing a new way of thinking might seem painful at first, but it is worth a try. DDD is designed to simplify complexity.

Event sourcing can be considered one of the most critical milestones towards embracing a completely reactive style of systems architecture and development.

Akka makes building powerful concurrent & distributed applications simple. Akka strictly adheres to the Reactive Manifesto. Reactive applications aim at replacing traditional multithreaded applications with an architecture that satisfies one or more of the following requirements:

  • Event-driven. Using Actors, one can write code that handles requests asynchronously and employs non-blocking operations exclusively.
  • Scalable. In Akka, adding nodes without having to modify the code is possible, thanks to both message passing and location transparency.
  • Resilient By Design. Any application will encounter errors and fail at some point in time. Akka provides “supervision” (fault tolerance) strategies to facilitate a self-healing system.
  • Responsive. Many of today's high-performance, rapid-response applications need to give quick feedback to the user and therefore need to react to events in an extremely timely manner. Akka's non-blocking, message-based strategy helps achieve this.
  • High performance. Akka delivers up to 50 million messages per second on commodity hardware, with roughly 2.5 million actors per GB of RAM. We can model one actor per user: 10 million users means 10 million Akka actors, which we can distribute across the nodes in our cluster using Cluster Sharding. This is the kind of backend infrastructure that tech giants like Google, Amazon, PayPal, and Uber run. PayPal currently uses Akka to scale and processes 1 billion transactions per day using Akka.

Overall, Domain-Driven Design is worth learning because it simplifies one factor that is complex in every professional relationship: communication. DDD allows developers, solution architects, technical teams, product owners, domain experts, data engineers, DBAs, business owners, and (most importantly) clients to communicate effectively with each other to solve problems, with everyone on the same page.

Domain-Driven Design + Akka Cluster + EventSourcing creates an ultimate trio for building fast, resilient, scalable, distributed applications that can serve millions of concurrent users at scale.

Thank you for reading.

Oluwaseyi Otun is a software and data engineer and backend software engineer (Java, Golang, Scala, Akka). He is a big data and distributed systems enthusiast with a special interest in software architectural design patterns, microservices, large-scale distributed systems, in-memory and streaming computing, and big data analytics. He loves learning the internals of systems and exploring what happens under the covers. He lives in one of the Atlantic provinces in Canada.

Moses Ayankoya

iOS (SwiftUI, UIKit), Kotlin Multiplatform, Node.js: TypeScript, Swift & Kotlin

Well written and articulated. Question: for data integrity, how elegantly would this architecture mitigate transactional data-integrity issues? I believe the saga pattern should be able to handle transactional data. I'd like some enlightenment on this.
