Azure Durable Functions
Alperen Belgic
Founder @ Rapido Flow - Software Engineer helping streamline team operations
A quick description
Azure Durable Functions are pieces of code deployed as Azure Functions whose runtime can be completely shut down in the middle of an execution while its state is preserved, so it can wake up later and continue from where it stopped.
Shut down completely? How does that help?
It helps reduce resource consumption. But more importantly, it makes developing process-based functionality much easier.
When can we stop processing, and how is it different from async/await?
Even though it relies on async/await and the description may make it sound similar, Azure Durable Functions are quite different. Most async/await usages, such as HTTP calls or DB operations, release CPU resources until the operation completes, but they still keep the runtime alive. Azure Durable Functions can completely turn off the server/VM they run on.
How much pause are we talking about?
Anything from seconds to months (actually, indefinitely).
Why would my program wait for months to complete an execution?
Well, it doesn't have to. And we already have plenty of tools for persistence, such as databases, queues, and blob storage; we have been using them for decades. What Azure Durable Functions provide is a programming model that lets you implement your processes in a much simpler way. If you need a timer to trigger something, you can just code it in your function; you don't have to use another resource to trigger your service later. If you want to wait for an approval before continuing, you don't have to keep the state in a database and provide an API to interact with it. If you want to do post-processing on a generated report, you don't have to deal with queues.
What does it look like?
(The rest of the article requires some Azure Functions knowledge)
Azure Durable Functions are Azure Functions, but with an important distinction: durable functions have their own constructs, APIs, and restrictions. Internally they use a framework called the Durable Task Framework. The trigger used to run a durable function is OrchestrationTrigger, and it provides an IDurableOrchestrationContext reference. Almost everything (except the control flow) is processed via this context, which interacts with the Durable Task Framework under the hood. This abstraction lets us write our stateful code in a procedural (or rather functional) way. Check out a sample orchestrator function with no intended business functionality below; explanations follow.
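The original listing was embedded as an image; the following is a minimal sketch reconstructed from the walkthrough below, not the author's exact code. Type names such as MyInput, ApprovalMessage, and the function and event names are placeholders.

```csharp
// Sketch of an orchestrator function (in-process Durable Functions model,
// Microsoft.Azure.WebJobs.Extensions.DurableTask). Names are illustrative.
[FunctionName("SampleOrchestrator")]
public static async Task<string> RunOrchestrator(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    // Deserialize the input the orchestration was started with.
    var input = context.GetInput<MyInput>();

    // Durable HTTP call: the runtime can shut down until the response arrives.
    DurableHttpResponse response = await context.CallHttpAsync(
        HttpMethod.Get, new Uri("https://example.com/api/resource"));

    // Schedule a timer for the next day, using the replay-safe clock.
    DateTime nextDay = context.CurrentUtcDateTime.AddDays(1);
    await context.CreateTimer(nextDay, CancellationToken.None);

    // Wait (possibly for a long time) for an external event with a payload.
    var approval = await context.WaitForExternalEvent<ApprovalMessage>("Approval");

    // Call an activity function to do the actual work.
    var result = await context.CallActivityAsync<string>("DoWork", approval);

    return result;
}
```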
You can trigger an orchestrator function via an HTTP call or from another Azure Function by using the orchestration client binding. I am skipping the initiation part in this example, but notice we have the OrchestrationTrigger attribute on the parameter, and the type of the parameter is IDurableOrchestrationContext.
First, we access the input of the function, and we can deserialize it to a type we define. When the orchestrator function is initiated by an HTTP call, the request body must be valid JSON.
Next, we make an HTTP call, and we do it through the context. When this is called, an HTTP request is initiated and then our function's runtime is stopped. When the response is received, the function's state is recovered and it effectively continues in a replicated execution context. Using the provided context for the call is crucial, because it manages both the life cycle of the runtime and the state recovery. This is an example of how a possibly long-running remote request doesn't require our system to keep running: Durable Functions guarantee that no compute cost is incurred for the runtime while it is turned off waiting for the response.
Then we calculate a time for the next day, for use with the timer created on the following line. Notice we used the helper on the context (context.CurrentUtcDateTime) rather than DateTime.UtcNow. This is again related to the state recovery mechanism; I'll give more details about how state recovery works later.
Next, we await the CreateTimer method of the context. Similar to the HTTP call, our function execution stops at this line until it is reinitiated by the internal timer the next day. Our function execution is completely stopped at this point.
Once execution resumes after the timer fires, the function awaits context.WaitForExternalEvent. Similarly, this immediately stops the function execution. This time, a call carrying the expected event name must be sent to our orchestration function instance to make it proceed. An important note: each durable function instance has an identity. You may set a unique value yourself; otherwise one is assigned automatically. There is an HTTP API and an Azure Function binding that other functions can use to send events to a durable function instance. The event sender can optionally append a (JSON) payload to the event; the generic parameter used in the call enables that payload to be deserialized into an object of that type. Once the external event is received, our function's context is recovered once again, receives the appended value, and continues running.
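Sending such an event from another function could look like the following sketch; the function name, route, and payload type are assumptions for illustration, while RaiseEventAsync and the DurableClient binding are the real Durable Functions 2.x APIs.

```csharp
// Sketch: raising an external event to a running orchestration instance,
// using the orchestration client binding (in-process model).
[FunctionName("SendApproval")]
public static async Task Run(
    [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req,
    [DurableClient] IDurableOrchestrationClient client)
{
    string instanceId = req.Query["instanceId"];

    // The event name must match the one the orchestrator is waiting for.
    await client.RaiseEventAsync(instanceId, "Approval",
        new ApprovalMessage { Approved = true });
}
```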
Next, we call context.CallActivityAsync. At this stage, I need to introduce a new type of function: activity functions. We can consider them regular Azure Functions with a specific trigger called ActivityTrigger. Activity functions exist to be called by orchestrator functions, and they don't have the context-related restrictions that orchestrator functions have. This is again an abstraction related to state recovery. The restriction on an activity is that it can receive only one parameter as input and return a single value. And again, during the call, the orchestrator function stops until it receives a response. (But this time, the activity function does incur some cost.)
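A matching activity function might be sketched like this; the name and types are placeholders, but the ActivityTrigger shape is the standard one.

```csharp
// Sketch of an activity function. Unlike orchestrators, activities may use
// non-deterministic APIs freely (DateTime.UtcNow, random values, direct I/O).
[FunctionName("DoWork")]
public static async Task<string> DoWork(
    [ActivityTrigger] ApprovalMessage message, ILogger log)
{
    log.LogInformation("Processing approval...");

    // Any regular work goes here: DB access, HTTP calls, report generation.
    await Task.Delay(100); // stand-in for real work

    return $"Processed (approved: {message.Approved})";
}
```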
Finally, it returns a value. From start to finish, the orchestration may take milliseconds or months. Another concept worth mentioning at this stage is that each instance keeps a status that can be queried via the HTTP API or the client binding. Alongside the built-in status, we can set a custom status as a serializable object during orchestrator execution, and it too can be queried by orchestration instance ID. We can record progress information at checkpoints and expose a status that can be queried. Here is what it looks like:
context.SetCustomStatus(new MyCustomStatusObject());
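Reading that status back from another function could look like the sketch below; the function name and response shape are assumptions, while GetStatusAsync, RuntimeStatus, and CustomStatus are real members of the 2.x client API.

```csharp
// Sketch: querying an orchestration's status, including the custom status.
[FunctionName("GetProgress")]
public static async Task<IActionResult> Run(
    [HttpTrigger(AuthorizationLevel.Function, "get")] HttpRequest req,
    [DurableClient] IDurableOrchestrationClient client)
{
    string instanceId = req.Query["instanceId"];

    DurableOrchestrationStatus status = await client.GetStatusAsync(instanceId);

    // RuntimeStatus is the built-in status (Running, Completed, Failed, ...);
    // CustomStatus holds the object set via SetCustomStatus, as JSON.
    return new OkObjectResult(new
    {
        runtime = status.RuntimeStatus.ToString(),
        custom = status.CustomStatus
    });
}
```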
How does the state recovery work?
When a durable function runs, it keeps a history of each interaction with the context, in order, and it records the value each operation returns. When execution is recovered, it doesn't actually continue from where it left off. It starts from the beginning and replays what happened. Because it has the history of events, it knows the context helpers will be called in the same order; when one is called, it returns the result from the history list instead of actually making the call, until all records in the history are consumed. From that point on, any interaction with the context causes actual calls and brings fresh results. And because the same return values were provided up to that point, the function flow runs exactly the same, which is why the code is expected to be deterministic.
For this reason, we cannot use DateTime.Now to get the current time: it would return the actual current time on every replay. We need the value to be recorded so the same value is provided during the next replay. Similarly, any external call must be made via the context, which records the result and provides it in the next run. In this programming model, as long as we guarantee determinism, either by using the context's helpers or by making conscious decisions, we can assume that our program really continues from where it stopped.
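As a minimal illustration of these determinism rules, here is a sketch of wrong and replay-safe pairs inside an orchestrator body; CurrentUtcDateTime, NewGuid, and CallHttpAsync are real IDurableOrchestrationContext members, and the URL is a placeholder.

```csharp
// Inside an orchestrator function body:

// WRONG: produces a different value on every replay.
var now = DateTime.UtcNow;
var id  = Guid.NewGuid();

// RIGHT: replay-safe equivalents, recorded in the orchestration history.
var safeNow = context.CurrentUtcDateTime;
var safeId  = context.NewGuid();

// WRONG: direct I/O would be re-executed during every replay.
// var data = await new HttpClient().GetStringAsync("https://example.com");

// RIGHT: route I/O through the context (or an activity function),
// so the result is recorded once and served from history on replay.
var response = await context.CallHttpAsync(
    HttpMethod.Get, new Uri("https://example.com"));
```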
A little bit less boring example
There are a few scenarios this model fits nicely. For one of them, think about a two-factor authentication scenario: you log in to a website, and it asks you to enter the temporary code you receive as a text message, within 60 seconds. (Explanations follow.)
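The original listing was embedded as an image; the following is a sketch reconstructed from the walkthrough below, not the author's exact code. The activity name SendSmsCode is a placeholder; cancelling the timer via a CancellationTokenSource is the documented pattern for timeouts.

```csharp
// Sketch of the two-factor authentication orchestrator described below.
[FunctionName("TwoFactorOrchestrator")]
public static async Task<bool> Run(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    // Ask an activity to send the SMS; it returns the code it sent.
    string expectedCode =
        await context.CallActivityAsync<string>("SendSmsCode", null);

    // 60-second timeout timer; keep the task instead of awaiting it.
    using var cts = new CancellationTokenSource();
    DateTime deadline = context.CurrentUtcDateTime.AddSeconds(60);
    Task timeoutTask = context.CreateTimer(deadline, cts.Token);

    while (true)
    {
        // Wait for the user's code, but don't await it on its own.
        Task<string> codeTask =
            context.WaitForExternalEvent<string>("CodeSentByUser");

        Task winner = await Task.WhenAny(codeTask, timeoutTask);
        if (winner == timeoutTask)
        {
            return false; // user ran out of time
        }
        if (codeTask.Result == expectedCode)
        {
            cts.Cancel(); // cancel the pending timer
            return true;  // code matched
        }
        // Wrong code: loop, wait for another attempt until the deadline.
    }
}
```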
First, we call an activity to trigger SMS delivery, and it returns the code included in the text message. No big deal here.
Next, we set up a timeout timer, but this time we don't await it; we grab the task it returns.
Then we call the context.WaitForExternalEvent method to handle a "CodeSentByUser" event. Again, we don't await it immediately.
Here is where things get interesting. We await Task.WhenAny, providing both tasks as parameters. This line completes once one of the tasks completes; obviously, one of them will finish earlier than the other, and WhenAny returns the completed task, which we assign to the variable winner.
We know the timer's task will complete in 60 seconds. If the user submits a code in less than 60 seconds, winner will be the external event task. If it returns the matching value, we are done. If it doesn't, we need a new, uncompleted external event task to await alongside the timeout task, which is why we repeat this in a loop. If we don't receive the matching code within 60 seconds, we break out of the loop. In a real implementation, we would also add some way to communicate the result to the user; an activity call making SignalR calls, for instance, would work.
In this example, either one of the events is sufficient to proceed. In other cases, we may create multiple parallel calls and wait for them all before proceeding. Combining and multiplying these events becomes very powerful and flexible.
Final words
So far, I have conceptually explained orchestrator functions (the core portion of a durable function) and briefly mentioned activity functions.
Next
There is an additional type of function (or rather, a type of function trigger) included in the Durable Functions extension: Durable Entities. This is an even more interesting addition to Durable Functions. I'd like to define these as "object-oriented runtime entities that can be persisted". This may not be the most conventional definition. I am playing with an example project in which I implement the entire API of the basket and order process of an e-commerce system using only Azure Durable Functions. I'd like to publish another article covering the usage of Durable Entities, explaining how this example works and why I define them this way.
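As a small teaser, a class-based durable entity can be sketched like this; the Basket entity here is a hypothetical example for illustration, not code from the project mentioned above, though the EntityTrigger/DispatchAsync shape is the standard class-based syntax.

```csharp
// Sketch of a class-based Durable Entity; its state is persisted
// automatically between operations by the Durable Functions runtime.
[JsonObject(MemberSerialization.OptIn)]
public class Basket
{
    [JsonProperty]
    public List<string> Items { get; set; } = new();

    // Operations callable by orchestrators or entity clients.
    public void AddItem(string item) => Items.Add(item);
    public void Clear() => Items.Clear();

    [FunctionName(nameof(Basket))]
    public static Task Run([EntityTrigger] IDurableEntityContext ctx)
        => ctx.DispatchAsync<Basket>();
}
```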
I hope you enjoyed this article and that I've been able to explain Azure Durable Functions. Let me know if you have any comments or questions!