Why Payments Engineers Should Avoid State Machines

There are two ways to represent movement.

Say you’re chasing someone in the dark, Marco Polo style. To find where they are, you shout “Marco”, and they scream “Polo” back. Like a radar, you move to where the voices are. And when you tag somebody, you stop.

Notice, though, that Google Maps doesn’t work that way.

Asking for directions is the other way to move. The path is ahead, and you just keep going straight until a new direction is given to you. Like a car driving at night, you can see only what’s right in front of you. And when you aren’t given directions anymore, you stop.

Both approaches work. But if, once you were at your destination, I asked you which path you took, which of the two approaches would be more useful?

The first approach is a state machine; the second is an event-driven system.

A state machine cannot reconstruct the past. It can only move forward.

Payments Engineers must avoid state machines.

I’m Alvaro Duran, and this is The Payments Engineer Playbook. Scroll for five minutes on YouTube and you’ll find tons of tutorials on passing software design interviews built around payment systems. But there’s not much that teaches you how to build this critical software for real users and real money.

I know this because I’ve built and maintained payment systems for almost ten years. Behind closed doors, I’ve witnessed all kinds of interesting conversations about what works and what doesn’t.

And I thought, “you know what? It’s time we have these conversations in public”.

In The Payments Engineer Playbook, we investigate the technology that transfers money. All to help you become a smarter, more skillful and more successful payments engineer. And we do that by cutting off one sliver of it and extracting tactics from it.


It makes sense to think in terms of state machines.

State machines are undeniably easy to design: you draw boxes with names, and arrows that make every possible transition explicit. That forces you to think about all of them.
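To see the appeal, here is a minimal sketch of a payment state machine as an explicit transition table. The state names and transitions are hypothetical, chosen for illustration only. Each arrow in the diagram becomes one entry in `ALLOWED_TRANSITIONS`, and any transition that wasn’t drawn is rejected. Notice that `transition` overwrites the previous state: once the payment moves, nothing records how it got there.

```python
# Hypothetical payment states; each key lists the states it may move to.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "created": {"authorized", "failed"},
    "authorized": {"captured", "voided"},
    "captured": {"refunded"},
    "failed": set(),
    "voided": set(),
    "refunded": set(),
}


class Payment:
    def __init__(self) -> None:
        self.state = "created"

    def transition(self, new_state: str) -> None:
        # Only the arrows drawn in the diagram are allowed.
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        # The previous state is overwritten; its history is lost.
        self.state = new_state


payment = Payment()
payment.transition("authorized")
payment.transition("captured")
print(payment.state)  # captured
```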

But to code that way? In payments? That’s probably a mistake.

The first reason is replayability.

Replayability means being able to reconstruct what happened using previously recorded data.

Replayability is not only useful when debugging payment systems; it is also required when a customer disputes a payment. Merchants win those disputes only when they can prove that all their obligations toward the payer were met.
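Replaying is mechanically simple once every change is recorded. The sketch below assumes a hypothetical event shape (a `kind` and a timestamp `at`) and a hypothetical mapping from event kinds to the state each one leads to. `state_as_of` answers the question a dispute actually asks: what was the state of this payment at a given moment? A system that only stores the current state cannot answer it.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical mapping from event kinds to the state each one leads to.
STATE_AFTER = {
    "PaymentAuthorized": "authorized",
    "PaymentCaptured": "captured",
    "PaymentRefunded": "refunded",
}


@dataclass(frozen=True)
class Event:
    kind: str     # what happened
    at: datetime  # when it was recorded


def state_as_of(events: list[Event], when: datetime) -> str:
    """Replay recorded events in time order, stopping at `when`."""
    state = "created"
    for event in sorted(events, key=lambda e: e.at):
        if event.at > when:
            break  # later events had not happened yet at `when`
        state = STATE_AFTER[event.kind]
    return state


log = [
    Event("PaymentAuthorized", datetime(2024, 3, 1, 10, 0)),
    Event("PaymentCaptured", datetime(2024, 3, 2, 9, 30)),
    Event("PaymentRefunded", datetime(2024, 3, 10, 16, 45)),
]
print(state_as_of(log, datetime(2024, 3, 5)))   # captured
print(state_as_of(log, datetime(2024, 3, 11)))  # refunded
```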

If finality doesn’t exist in payments, replayability is key.

The second reason is that state machines are a bottleneck for scalability.

In case you haven’t noticed, payment systems are used pretty heavily at most companies. Most of the problems of scaling money software come from the fact that it has to be both strongly consistent (everything has to be accounted for) and highly available (every second of downtime is a second when the company is not selling).

State machines are an obstacle to that because the work they do scales with the number of clients.

State machines are a push-based system for clients.

In a push-based system, clients send requests, and the work is “pushed” to the server that handles the state machine. That’s what REST APIs are all about. Push-based systems are common, and they’re force-fed to engineers as undergraduates.

But there’s another kind of system, the antithesis to push-based.

In a pull-based system, the server leaves breadcrumbs on an intermediary, ready to be reconstructed, much like we reconstruct our route on Google Maps by following its directions as we drive.

The app has already cached all the instructions on your phone, which is why you can use Google Maps in airplane mode.

The server is then responsible for pushing all breadcrumbs to the intermediary, usually a durable queue. The client, in turn, is responsible for pulling all that data and reconstructing the current state from it.
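Here is a minimal, in-memory sketch of that split. `EventLog` is a stand-in for a durable queue (an assumption to keep the example self-contained; a real deployment would use an actual broker), and the event and consumer names are hypothetical. The server appends each breadcrumb exactly once; every client keeps its own offset, pulls whatever it hasn’t seen yet, and rebuilds the state locally.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from event names to the state they lead to.
STATE_AFTER = {"PaymentAuthorized": "authorized", "PaymentCaptured": "captured"}


def apply_event(state: str, event: str) -> str:
    # Unknown events leave the state unchanged in this sketch.
    return STATE_AFTER.get(event, state)


@dataclass
class EventLog:
    """In-memory stand-in for a durable, append-only queue."""
    entries: list[str] = field(default_factory=list)

    def append(self, event: str) -> None:
        # The server writes each breadcrumb once, however many clients read it.
        self.entries.append(event)

    def read_from(self, offset: int) -> list[str]:
        return self.entries[offset:]


@dataclass
class Consumer:
    """A client that pulls at its own pace and reconstructs state locally."""
    log: EventLog
    offset: int = 0
    state: str = "created"

    def poll(self) -> None:
        for event in self.log.read_from(self.offset):
            self.state = apply_event(self.state, event)
            self.offset += 1


log = EventLog()
ledger, notifier = Consumer(log), Consumer(log)  # hypothetical downstream services
log.append("PaymentAuthorized")
ledger.poll()
log.append("PaymentCaptured")
ledger.poll()
notifier.poll()  # a consumer that fell behind catches up from its own offset
print(ledger.state, notifier.state)  # captured captured
```

In this sketch, adding another consumer adds no work on the writing side: the new consumer only reads the log.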

Pull-based systems are replayable. They also suit the way payment systems are used: many services, internal and external, request data from them frequently.

In order to have payment systems that are pull-based and replayable, we must describe changes in state in terms of events.

You keep using that word…

The truth is, I don’t like the word event. This is the best definition I could find:

An event is a statement that something interesting has occurred
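In code, that definition maps naturally onto an immutable record named in the past tense. A minimal sketch, with hypothetical event and field names: freezing the dataclass enforces that a statement about something that has already occurred cannot be edited after the fact.

```python
from dataclasses import FrozenInstanceError, dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class PaymentAuthorized:
    payment_id: str
    amount: Decimal
    currency: str
    occurred_at: datetime


@dataclass(frozen=True)
class PaymentCaptured:
    payment_id: str
    amount: Decimal
    currency: str
    occurred_at: datetime


captured = PaymentCaptured(
    payment_id="pay_123",
    amount=Decimal("49.90"),
    currency="EUR",
    occurred_at=datetime(2024, 3, 2, 9, 30, tzinfo=timezone.utc),
)

try:
    captured.amount = Decimal("0")  # rewriting history is not allowed
except FrozenInstanceError:
    print("events are immutable")
```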

Here’s the thing: this definition is at the core of what makes events tricky.

First, the fact that something “has occurred” means that there is an inherent synchronization problem when the client handles events.

Remember the Marco Polo game? For all its inconveniences, requesting the current state from the server is an idempotent operation by design. Shouting "Marco" over and over won't make the person answering "Polo" move.

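Compare that with Google Maps telling you to turn left at the first corner and then right, only to find out that it should have been the other way around. Once you’ve acted on an instruction, asking again won’t undo it: the order in which events arrive matters.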
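One common way to handle this on the client, sketched here under stated assumptions, is to have the server stamp every event with a per-payment sequence number (the `sequence` field is an assumption, not something the article prescribes) and have the client apply events strictly in that order. Duplicates are dropped, and events that arrive early wait until the gap before them is filled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    sequence: int  # assumed: per-payment counter assigned by the server
    kind: str


class PaymentView:
    """Client-side reconstruction that tolerates duplicates and reordering."""

    def __init__(self) -> None:
        self.applied_up_to = 0               # last sequence number applied
        self.pending: dict[int, Event] = {}  # events that arrived too early
        self.history: list[str] = []         # event kinds, in applied order

    def receive(self, event: Event) -> None:
        if event.sequence <= self.applied_up_to:
            return  # duplicate delivery: already applied, so ignore it
        self.pending[event.sequence] = event
        # Apply strictly in sequence order, stopping at the first gap.
        while self.applied_up_to + 1 in self.pending:
            next_event = self.pending.pop(self.applied_up_to + 1)
            self.history.append(next_event.kind)
            self.applied_up_to += 1


view = PaymentView()
view.receive(Event(2, "PaymentCaptured"))    # arrives early: buffered
view.receive(Event(1, "PaymentAuthorized"))  # fills the gap: both applied
view.receive(Event(1, "PaymentAuthorized"))  # redelivered: ignored
print(view.history)  # ['PaymentAuthorized', 'PaymentCaptured']
```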

With state machines, the server is responsible for reconstructing the current state (duh!). But event-driven servers push that responsibility to the client.

You’re getting scalability by forcing the client to accept more responsibility.

Second, because an event records that “something interesting” has occurred, the client needs a degree of domain knowledge to interpret it.

In other words: reconstructing state from a stream of events needs a modicum of domain knowledge.

That’s exactly what state machines shield clients from! If you have a payment system that only exposes the state of a payment, the client only needs to make sure that whenever the state changes to “finalized” or “paid” or “success”, it does what it is meant to do.

State machine payment systems hide as much information as possible from the client.

However, I don’t think they should.

I think what’s missing from state machine payment systems is a consistent definition of what it means for a payment to be in a certain state. What steps were taken to get it where it is, so to speak.

Reconstructing a payment’s state is the same thing as defining what it means to be in that state.

Rather than hiding that information from the client, payments engineers should build common libraries that make the process of state reconstruction consistent across the client-server divide.
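What such a shared library might contain, as a minimal sketch: one `apply_event` function that defines what each event means, and one `reconstruct` fold built on top of it. The event names, fields, and the `partially_refunded` status are hypothetical. The idea is that the server’s API and every client import this same module, so a status means the same thing on both sides of the divide.

```python
from dataclasses import dataclass, replace
from functools import reduce


@dataclass(frozen=True)
class PaymentState:
    status: str = "created"
    captured_cents: int = 0
    refunded_cents: int = 0


def apply_event(state: PaymentState, event: dict) -> PaymentState:
    """The single definition of what each event does to a payment."""
    kind = event["kind"]
    if kind == "PaymentAuthorized":
        return replace(state, status="authorized")
    if kind == "PaymentCaptured":
        return replace(
            state,
            status="captured",
            captured_cents=state.captured_cents + event["amount_cents"],
        )
    if kind == "PaymentRefunded":
        refunded = state.refunded_cents + event["amount_cents"]
        # What "refunded" means is defined here, once: everything captured was returned.
        status = "refunded" if refunded >= state.captured_cents else "partially_refunded"
        return replace(state, status=status, refunded_cents=refunded)
    raise ValueError(f"unknown event kind: {kind}")


def reconstruct(events: list[dict]) -> PaymentState:
    """Fold the event stream into the current state."""
    return reduce(apply_event, events, PaymentState())


events = [
    {"kind": "PaymentAuthorized"},
    {"kind": "PaymentCaptured", "amount_cents": 5000},
    {"kind": "PaymentRefunded", "amount_cents": 2000},
]
print(reconstruct(events))
# PaymentState(status='partially_refunded', captured_cents=5000, refunded_cents=2000)
```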

Copy pasting code works, yes. Until it doesn’t.

The Nick of Time

“‘Where did you go to, if I may ask?' said Thorin to Gandalf as they rode along.

‘To look ahead,' said he.

‘And what brought you back in the nick of time?'

‘Looking behind,' said he.”

― J.R.R. Tolkien, The Hobbit

Integrating with payment APIs is difficult and error-prone. And I believe that’s because most of them are state-machine based.

It doesn’t matter if Stripe’s goal is to “abstract away the complexity of payments”. In the end, you either have 7 lines of code that are obvious, but too simple, or a PaymentIntent API that’s adequate, but no longer friendly.

What I find most useful about events is that they are individually obvious, and collectively powerful.

  • Events are linked to a specific action by one of the services involved: when something specific happens, a specific event gets created.
  • But they also stack into a story, one that can be debugged, understood, and reported.
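The second bullet can be sketched in a few lines: sort the events and render each one as a line of a timeline. The `actor` field (which service recorded the event) and the service names are hypothetical; the output is the kind of story you could read while debugging or paste into a report.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    at: datetime
    actor: str  # hypothetical: the service that recorded the event
    kind: str


def tell_story(events: list[Event]) -> list[str]:
    """Stack individually obvious events into a readable timeline."""
    return [
        f"{e.at:%Y-%m-%d %H:%M} {e.actor}: {e.kind}"
        for e in sorted(events, key=lambda e: e.at)
    ]


story = tell_story([
    Event(datetime(2024, 3, 2, 9, 30), "capture-service", "PaymentCaptured"),
    Event(datetime(2024, 3, 1, 10, 0), "checkout", "PaymentAuthorized"),
])
print("\n".join(story))
# 2024-03-01 10:00 checkout: PaymentAuthorized
# 2024-03-02 09:30 capture-service: PaymentCaptured
```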

Plus, state can be reconstructed from events, but not the other way around.

You can have events on a pull-based system, but you can also have a push-based API that exposes the state, reconstructed from all the collected events. And if that reconstruction is lengthy and resource-intensive, you can cache it until you collect a new event.

Not only does caching work, it is also obvious when the cache is no longer valid: as soon as a new event arrives.
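A minimal sketch of that cache, assuming an append-only event list and hypothetical event names. Because events are only ever appended, the number of events seen is enough to tell whether the cached state is stale: the cache stays valid exactly until a new event arrives. The `reconstructions` counter exists only to make the caching visible.

```python
# Hypothetical mapping from event names to the state they lead to.
STATE_AFTER = {"PaymentAuthorized": "authorized", "PaymentCaptured": "captured"}


class CachedPaymentView:
    """Serves reconstructed state through a push-based API, with caching."""

    def __init__(self) -> None:
        self.events: list[str] = []
        self._cached_state = "created"
        self._cached_at_count = -1  # -1 means nothing has been cached yet
        self.reconstructions = 0    # how many times the expensive replay ran

    def append(self, event: str) -> None:
        # Appending changes len(self.events), which invalidates the cache.
        self.events.append(event)

    def current_state(self) -> str:
        if self._cached_at_count != len(self.events):
            self._cached_state = self._reconstruct()
            self._cached_at_count = len(self.events)
        return self._cached_state

    def _reconstruct(self) -> str:
        # Stand-in for a lengthy, resource-intensive replay of the full log.
        self.reconstructions += 1
        state = "created"
        for event in self.events:
            state = STATE_AFTER.get(event, state)
        return state


view = CachedPaymentView()
view.append("PaymentAuthorized")
view.current_state()            # replays the log
view.current_state()            # served from the cache
view.append("PaymentCaptured")  # a new event: the cache is now stale
print(view.current_state(), view.reconstructions)  # captured 2
```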

Events force clients to be smarter because they have to reconstruct the state of the payment. But payments engineers can accept that responsibility with a common library!

I don’t think there are many good reasons to keep using state machines in payments. Events are scalable and replayable, and their problems can be mitigated.

It’s a matter of giving clients what they need, rather than making sure they stay inside your walled gardens.

But that’s not a software problem anymore.


PS: Subscribe for free to The Payments Engineer Playbook to receive new articles in your inbox every Wednesday.
