AWS Step Functions — Integration Patterns

In a distributed architecture, we can decompose a business process or workflow into several functions and microservices. The workflow can involve complex communications, decisions and choice of outcomes depending on the application state. In the choreography model:

  • The order of business steps gets embedded in each microservice, and the workflow logic leaks into specific business domains.
  • It is a challenge to handle failures between dependent components.

AWS Step Functions addresses these problems by orchestrating the process centrally. An executable, visual and understandable workflow also benefits business stakeholders. Developers and operations teams gain too, because a state-driven engine provides a lot of context for understanding and resolving problems or failures.

Use Cases/Applications

AWS Step Functions can be employed at different scales. At a large scale, it can model how an organization fulfills a major objective. At a smaller scale, it suits any use case where a series of ordered steps must occur to perform a function.

Some potential use cases include:

  1. Any workflow that can be described as a work order, or as a set of sequential or parallel steps spanning various services and systems.
  2. Accessing data from various data stores (including serverless ones): Step Functions can act as a data aggregator and processor, collecting data from various sources and processing it.

Alternative Technologies

  • AWS SWF (Simple Workflow Service) also provides workflow capabilities, but it increases the complexity of developing applications and doesn’t provide a visual interface or easy integration with other services.
  • jBPM is an open-source workflow engine written in Java that can execute business processes. However, it has limitations when it comes to executing serverless/cloud-native applications.

To get up and running with Step Functions, see the AWS Developer Guide, which is quite extensive: https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html. When tackling use cases that involve several integrations across a mix of cloud-native, hybrid and on-prem solutions, the following integration patterns can help.

Step Functions Integration Patterns:

AWS Step Functions has built-in support for various AWS services, as listed at https://docs.aws.amazon.com/step-functions/latest/dg/connectors-supported-services.html

Integration — Synchronous

The services commonly used for synchronous communication are

Integration — Asynchronous

Steps that need to integrate with long-running systems or manual processes should be implemented as asynchronous actions. There are a few patterns with which this can be achieved:

Job Poller Pattern

Reference: https://docs.aws.amazon.com/step-functions/latest/dg/sample-project-job-poller.html

This is the traditional solution: poll for the job status. It implements a busy wait for long-running jobs. One way to do this is to dispatch the message or work item to a queue or message broker along with a token or identifier. Another system picks up the message, executes the work and then posts a message to a response queue. Meanwhile, the state machine loops through a wait state, polling until the response message is received. Example state machine below:

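Advantages -

  • Works well for simple use cases.
  • Easy to implement, since it keeps a synchronous outlook.

Disadvantages -

  • Every loop iteration adds events to the execution history and counts against the per-execution history quota (25,000 events for Standard workflows).
  • It is not easy to pick a good wait time.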

Figure: Job Poller Pattern
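
The polling loop described above could be sketched as an Amazon States Language definition, built here as a Python dictionary so it can be printed as JSON. This is only a minimal sketch: the two Lambda ARNs, the state names and the `$.jobId` / `$.status` paths are placeholders invented for illustration, not taken from the AWS sample project.

```python
import json

# Placeholder ARNs (hypothetical): substitute your own functions.
SUBMIT_JOB_ARN = "arn:aws:lambda:us-east-1:123456789012:function:SubmitJob"
CHECK_JOB_ARN = "arn:aws:lambda:us-east-1:123456789012:function:CheckJobStatus"


def job_poller_definition(wait_seconds=30):
    """Return a state machine that submits a job, then loops Wait -> check -> Choice."""
    return {
        "Comment": "Job poller sketch: busy-wait on an external job",
        "StartAt": "Submit Job",
        "States": {
            "Submit Job": {
                "Type": "Task",
                "Resource": SUBMIT_JOB_ARN,
                "ResultPath": "$.jobId",
                "Next": "Wait",
            },
            # Fixed delay between polls; choosing this value is the hard part.
            "Wait": {"Type": "Wait", "Seconds": wait_seconds, "Next": "Get Job Status"},
            "Get Job Status": {
                "Type": "Task",
                "Resource": CHECK_JOB_ARN,
                "InputPath": "$.jobId",
                "ResultPath": "$.status",
                "Next": "Job Complete?",
            },
            "Job Complete?": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.status", "StringEquals": "SUCCEEDED", "Next": "Done"},
                    {"Variable": "$.status", "StringEquals": "FAILED", "Next": "Job Failed"},
                ],
                # Any other status loops back, adding more history events.
                "Default": "Wait",
            },
            "Job Failed": {
                "Type": "Fail",
                "Error": "JobFailed",
                "Cause": "The external job reported FAILED",
            },
            "Done": {"Type": "Succeed"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(job_poller_definition(), indent=2))
```

The `Default` branch of the Choice state is what closes the loop, and it is also where the history cost comes from: every pass through Wait, Get Job Status and Job Complete? records further events.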


Activity/Worker pattern

Reference: https://docs.aws.amazon.com/step-functions/latest/dg/concepts-activities.html

Activities are a way to have a specific task in a state machine executed by workers running elsewhere. An activity step waits until a worker picks up the task using the activity ARN. The worker polls continuously for work by calling GetActivityTask, which returns the input along with a taskToken for the task to be executed. After processing, the worker reports success or failure, with any output and the taskToken, back to Step Functions. Once the status is received, the state machine can proceed.

Advantages -

  • Offloads work to an external system with a built-in wait, which conserves execution history.
  • Activities have built-in retries and error handling.

Disadvantages -

  • Long-polling workers running on servers consume compute and time.

Figure: Activity/Worker pattern
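
One iteration of such a worker might look like the sketch below. It assumes `sfn` is a boto3 Step Functions client (or any object exposing the same `get_activity_task`, `send_task_success` and `send_task_failure` methods); the function name `poll_once`, the `handler` callback and the worker name are hypothetical.

```python
import json


def poll_once(sfn, activity_arn, handler, worker_name="example-worker"):
    """Fetch at most one activity task, run `handler` on it and report the result.

    Returns True if a task was processed, False if the poll came back empty.
    """
    # With a real client this is a long poll that may block for up to 60 seconds.
    task = sfn.get_activity_task(activityArn=activity_arn, workerName=worker_name)
    token = task.get("taskToken")
    if not token:
        return False  # no work was scheduled during this poll

    try:
        result = handler(json.loads(task["input"]))
    except Exception as exc:  # report any handler error as a task failure
        sfn.send_task_failure(
            taskToken=token,
            error=type(exc).__name__,
            cause=str(exc)[:1024],
        )
    else:
        sfn.send_task_success(taskToken=token, output=json.dumps(result))
    return True
```

A long-running worker would simply call `poll_once` in a `while True` loop, which is exactly the always-on compute cost listed under the disadvantages.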

Pull model using Activities with Serverless workers

Reference: https://aws.amazon.com/blogs/compute/implementing-serverless-manual-approval-steps-in-aws-step-functions-and-amazon-api-gateway/

This pattern can be used for workflows involving manual intervention steps. Here, the activity worker is a Lambda function triggered on a schedule by CloudWatch Events. On receiving work, the Lambda function delegates it to a processor, or drops a message onto a queue or into an email, and the task then waits for user action or processing. Once the action is taken, the SendTaskSuccess or SendTaskFailure API is invoked with the taskToken to signal that the step is complete.

Advantages -

  • Well suited to manual steps.
  • Completely serverless and loosely coupled, with components that are easy to maintain and tweak.

Disadvantages -

  • The scheduled Lambda function needs to be timed correctly, and polling often consumes resources and adds cost even when there is no work.

Figure: Pull model using Activities with Serverless workers
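
The two halves of this pattern could be sketched as follows: one function for the body of the scheduled Lambda, and one for the endpoint that records the human decision. As before, `sfn` is assumed to be a boto3 Step Functions client or a compatible stand-in; `notify`, `scheduled_poll`, `record_decision` and the payload shape are hypothetical names chosen for illustration.

```python
import json


def scheduled_poll(sfn, notify, activity_arn):
    """Body of the scheduled Lambda: fetch one pending task and hand it to a human.

    `notify` is any callable that delivers the request, e.g. by email or queue.
    Returns the task token that was forwarded, or None if nothing was pending.
    """
    # The long poll can take up to 60 seconds, so the Lambda timeout (and the
    # client's read timeout) should be set somewhat higher than that.
    task = sfn.get_activity_task(activityArn=activity_arn, workerName="scheduled-poller")
    token = task.get("taskToken")
    if not token:
        return None
    # The token must travel with the request so the approver can answer later.
    notify({"taskToken": token, "request": json.loads(task["input"])})
    return token


def record_decision(sfn, task_token, approved, comment=""):
    """Called once the user acts, e.g. from an API endpoint behind an email link."""
    if approved:
        sfn.send_task_success(
            taskToken=task_token,
            output=json.dumps({"approved": True, "comment": comment}),
        )
    else:
        sfn.send_task_failure(
            taskToken=task_token,
            error="Rejected",
            cause=comment or "Rejected by approver",
        )
```

Nothing in this sketch stays running between the poll and the decision; the taskToken is the only state that links them.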

Push model using Activities with Serverless workers (using Activity as a wait condition)

Reference: https://medium.com/semantive/part-1-asynchronous-actions-within-aws-step-functions-without-servers-f58e030a0e8b and https://medium.com/semantive/part-2-asynchronous-actions-within-aws-step-functions-without-servers-e2ef26aa75d9

This method does not require a long-polling worker to process the activity. It starts by splitting the asynchronous task into two parallel branches. One branch starts the asynchronous execution against the Activity: it obtains the taskToken and then kicks off processing, either by invoking a nested state machine or a Lambda function, or by pushing a message to a queue. The processor, nested state machine or consumer does the work and then calls the success or failure API with the taskToken. Meanwhile, the second parallel branch simply waits on the Activity for the asynchronous action to complete. Once it sees the activity response, the workflow can move on to the next step.

Advantages -

  • Completely serverless and loosely coupled; a good way to run nested state machines, with no need to implement workers.

Disadvantages -

  • Task tokens have to be passed around and maintained between state machines.

Figure: Push model using Activities with Serverless workers
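
One possible reading of this pattern is sketched below: a Parallel state whose first branch waits on the Activity while the second branch runs a short-lived dispatcher Lambda that claims the taskToken and pushes the work out. The state names, ARNs, timeout, retry settings and the `start_processing` callback are all assumptions made for illustration; the referenced posts may structure the branches differently.

```python
import json


def push_model_state(activity_arn, dispatcher_lambda_arn, next_state):
    """Return a Parallel state: one branch waits on the Activity, one dispatches work."""
    return {
        "Type": "Parallel",
        "Branches": [
            {   # Branch 1: blocks until someone reports back with the task token.
                "StartAt": "Wait For Completion",
                "States": {
                    "Wait For Completion": {
                        "Type": "Task",
                        "Resource": activity_arn,
                        "TimeoutSeconds": 3600,
                        "End": True,
                    }
                },
            },
            {   # Branch 2: short-lived Lambda that claims the token and pushes work out.
                "StartAt": "Dispatch Work",
                "States": {
                    "Dispatch Work": {
                        "Type": "Task",
                        "Resource": dispatcher_lambda_arn,
                        # Retry in case the activity task was not scheduled yet.
                        "Retry": [{"ErrorEquals": ["States.TaskFailed"],
                                   "IntervalSeconds": 2, "MaxAttempts": 3}],
                        "End": True,
                    }
                },
            },
        ],
        "Next": next_state,
    }


def dispatch_work(sfn, activity_arn, start_processing):
    """Body of the dispatcher Lambda: claim the token, then push the work onward."""
    task = sfn.get_activity_task(activityArn=activity_arn, workerName="dispatcher")
    token = task.get("taskToken")
    if not token:
        raise RuntimeError("Activity task not scheduled yet")
    # start_processing might start a nested state machine, invoke a Lambda
    # or enqueue a message; it must carry the token along.
    start_processing(token, json.loads(task["input"]))
    return {"dispatched": True}
```

Whoever finishes the work calls SendTaskSuccess or SendTaskFailure with the token, which releases the first branch and lets the Parallel state complete.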

Callback pattern (using SQS)

AWS recently released this pattern, and it fits very well with an event-driven, loosely coupled style based on asynchronous message passing. A task that needs to invoke and wait for an external system or a manual process can now post the payload to a queue along with the taskToken. It then waits, optionally expecting heartbeats, and times out according to its configuration. Meanwhile, an external system (an AWS Lambda function, an ECS microservice, etc.) can pick up the message from the queue, finish processing and then call the SendTaskSuccess or SendTaskFailure API with the taskToken. The waiting task receives the result and the workflow resumes based on the response.

Advantages -

  • Loosely coupled and event-driven using queues, with no need for activities or workers.

Disadvantages -

  • Only AWS-provided resources can be used for queueing/notifying with taskTokens.

Figure: Callback pattern (using SQS)
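
A minimal sketch of both sides of the callback follows: the Task state that sends the message and pauses, and a consumer that answers with the token. The queue URL, the timeout values, the `Input`/`TaskToken` field names in the message body and the `process` callback are assumptions made for illustration; `sfn` is again assumed to be a boto3 Step Functions client or a compatible object.

```python
import json

# Placeholder queue URL (hypothetical).
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/work-queue"


def callback_task_state(next_state="Next Step"):
    """Task state that sends an SQS message and pauses until the token is returned."""
    return {
        "Type": "Task",
        # The .waitForTaskToken suffix is what makes the task pause.
        "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
        "Parameters": {
            "QueueUrl": QUEUE_URL,
            "MessageBody": {
                "Input.$": "$",                  # the state's input
                "TaskToken.$": "$$.Task.Token",  # token from the context object
            },
        },
        "HeartbeatSeconds": 300,   # fail if no heartbeat arrives within 5 minutes
        "TimeoutSeconds": 3600,    # fail if no result arrives within 1 hour
        "Next": next_state,
    }


def handle_message(sfn, body, process):
    """Consumer side: process one SQS message body and report back with its token."""
    message = json.loads(body)
    token = message["TaskToken"]
    try:
        result = process(message["Input"])
    except Exception as exc:  # surface processing errors to the state machine
        sfn.send_task_failure(taskToken=token, error=type(exc).__name__, cause=str(exc)[:1024])
        return False
    sfn.send_task_success(taskToken=token, output=json.dumps(result))
    return True
```

Compared with the Activity-based variants, no polling call against Step Functions is needed here: the token arrives inside the message, and the consumer only has to return it.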

Please like this article if you found it helpful. Also, please leave your feedback in the comments below, along with anything specific about AWS Step Functions you would like to know.

Note: The reference articles that helped me put together this piece are noted inline, thanks to each of them.


Lyju Edwinson

Technical Architect at Foxtel Group

3 years ago

Now that Step fns supports integration to Eventbridge, it could be a better choice for the Callback pattern.

Juan Soto

Data-Driven Solution Engineer

5 years ago

Well explained Andy!
