Active and passive API callout strategies
API failure management is critical and must be designed so that API calls are made successfully and retried when they fail. External platforms, frameworks and tools such as MuleSoft and Spring Boot have a structured way of doing this. Inspired by Spring Boot's RestTemplate, the following is a reference design for a robust active and passive API callout strategy within Apex.
Active strategy
This strategy is used when a callout fails for the first time and needs to be retried immediately. The code needs to be decomposed as shown above, or similarly, for better separation of concerns. The actual callout is made via a Queueable inside a Base callout class. The Base class ensures that the retry logic is applied to all callouts within the application.
The following JSON object is defined as the 'Retry metadata'.
{ "allowRetries" : "true", "noOfretries" : "3", "exponential-backoff" : "true", "retryInterval" : "1", "failover" : "RetryBatch" }
The Base class reads the above configuration for a specific callout. If the callout fails, the Base class retries it as specified by these control flags. For example, if noOfretries = 3, the callout is retried 3 times, with an interval of 1s between attempts, via a recursive function. If the exponential-backoff flag is true, the interval grows with the retry number x as f(x) = x^2 (a quadratic rather than strictly exponential schedule). If the API callout still fails after all retries are exhausted, the whole request payload is written to a custom object, 'CalloutFailureTracker'. Similarly, if the Queueable that made the original invocation fails for some reason, the request payload is written to the tracker object by a 'Transaction Finalizer'.
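The retry flow described above can be sketched as follows. This is a minimal illustration in Java (chosen because its syntax is close to Apex), not the actual Base class. The names ActiveRetrySketch, RetryConfig, send and delaySeconds are hypothetical, the tracker is a plain list standing in for the 'CalloutFailureTracker' object, and the sketch assumes the backoff delay is retryInterval multiplied by x^2. Delays are recorded rather than slept, since how the wait is actually scheduled depends on the platform.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of the active strategy; none of these names come from the original design. */
class ActiveRetrySketch {

    /** Typed view of the 'Retry metadata' flags. */
    static final class RetryConfig {
        final boolean allowRetries;
        final int noOfRetries;
        final boolean exponentialBackoff;
        final int retryIntervalSeconds;

        RetryConfig(boolean allowRetries, int noOfRetries,
                    boolean exponentialBackoff, int retryIntervalSeconds) {
            this.allowRetries = allowRetries;
            this.noOfRetries = noOfRetries;
            this.exponentialBackoff = exponentialBackoff;
            this.retryIntervalSeconds = retryIntervalSeconds;
        }
    }

    /** Wait before retry number x (1-based): interval * x^2 with backoff (assumed scaling), else the fixed interval. */
    static long delaySeconds(RetryConfig cfg, int x) {
        return cfg.exponentialBackoff
                ? (long) cfg.retryIntervalSeconds * x * x
                : cfg.retryIntervalSeconds;
    }

    /**
     * Makes the callout and recurses while retries remain.
     * 'callout' returns true on success; 'waits' collects the delays that would be applied;
     * 'tracker' receives the payload once all retries are exhausted (the failover step).
     */
    static boolean send(RetryConfig cfg, String payload, Predicate<String> callout,
                        int retriesDone, List<Long> waits, List<String> tracker) {
        if (callout.test(payload)) {
            return true;
        }
        boolean canRetry = cfg.allowRetries && retriesDone < cfg.noOfRetries;
        if (!canRetry) {
            tracker.add(payload); // hand over to the passive strategy
            return false;
        }
        int next = retriesDone + 1;
        waits.add(delaySeconds(cfg, next)); // recorded, not slept
        return send(cfg, payload, callout, next, waits, tracker);
    }
}
```

With noOfretries = 3, retryInterval = 1 and backoff enabled, a callout that always fails is attempted four times in total (the original call plus three retries), with recorded waits of 1s, 4s and 9s, and its payload then lands in the tracker.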
Passive strategy
In the passive strategy, the callouts are retried via a batch job. It is passive because the retry does not happen immediately. The retry batch reads all the failed requests written to the tracker object and makes the callouts again. Since the batch invokes the same Base class, the active strategy is applied to each retried request as well.
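As a rough illustration of the batch sweep, the sketch below re-sends each tracked payload through the same callout entry point and keeps only the ones that still fail. It is written in Java for readability; PassiveRetrySketch and sweep are hypothetical names, the list stands in for records of the tracker object, and the predicate stands in for the Base class callout (including its active retries).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of the passive retry batch; names are illustrative only. */
class PassiveRetrySketch {

    /**
     * Re-sends every tracked payload through the base callout.
     * Returns the payloads that still fail and therefore stay in the tracker.
     */
    static List<String> sweep(List<String> tracked, Predicate<String> baseCallout) {
        List<String> stillFailing = new ArrayList<>();
        for (String payload : tracked) {
            if (!baseCallout.test(payload)) {
                stillFailing.add(payload); // keep for the next batch run
            }
        }
        return stillFailing;
    }
}
```

In a real batch job, the sweep would run per scope of tracker records rather than over the whole list at once, so the per-transaction limits apply to each scope separately.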
There are a few platform limits that need to be taken into account when fine-tuning the values of the metadata. These include:
- 120s limit for the total duration of all callouts within a transaction
- 100 callout limit within a single transaction
- Callouts not allowed after DML in the same transaction while that work is uncommitted
- Only one Queueable can be enqueued from a batch transaction
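To make the tuning concrete, the sketch below checks whether a given retry count fits within the first two limits, assuming all attempts for one request run in the same transaction and every attempt takes the full per-callout timeout. CalloutBudgetSketch and its methods are hypothetical names, and the timeout value is an input you would take from your own callout settings.

```java
/** Hypothetical worst-case budget check for the retry metadata; names are illustrative only. */
class CalloutBudgetSketch {
    static final int MAX_CALLOUTS_PER_TXN = 100;        // callout count limit per transaction
    static final int MAX_CALLOUT_SECONDS_PER_TXN = 120; // cumulative callout time limit per transaction

    /** The original attempt plus every retry. */
    static int worstCaseCallouts(int noOfRetries) {
        return 1 + noOfRetries;
    }

    /** Assumes every attempt runs until it times out. */
    static int worstCaseCalloutSeconds(int noOfRetries, int timeoutSeconds) {
        return worstCaseCallouts(noOfRetries) * timeoutSeconds;
    }

    /** True when the worst case stays within both limits. */
    static boolean fitsInOneTransaction(int noOfRetries, int timeoutSeconds) {
        return worstCaseCallouts(noOfRetries) <= MAX_CALLOUTS_PER_TXN
                && worstCaseCalloutSeconds(noOfRetries, timeoutSeconds) <= MAX_CALLOUT_SECONDS_PER_TXN;
    }
}
```

For example, 3 retries with a 10s timeout means at most 4 callouts and 40s of callout time, which fits. The same 3 retries with a 40s timeout could need 160s and would not.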