10 Things they DON'T tell you about REST API (Part I)
Saurav Saha
Software Engineering Professional - BPCL | Gold Medalist - NIT Mizoram | DAAD-WISE Scholar - Universität Bremen, Germany | Ex - ML/NLP Research {JUNLP Lab, Jadavpur Univ | ACS Lab, IIT Mandi | NLP Lab, NITMz}
In the world of microservices, where distributed systems (with Docker containers and Kubernetes screaming behind the scenes) are the hottest topic of the season, REST APIs play a vital role.
REST APIs provide a flexible and lightweight way to integrate applications and are emerging as the most popular way to connect components in microservices architectures.
REST is an acronym for REpresentational State Transfer and an architectural style for distributed hypermedia systems, which Roy Fielding first presented in 2000 in his famous dissertation. Since then it has become one of the most widely used approaches for building web-based APIs (Application Programming Interfaces).
REST also has its guiding principles and constraints. These principles must be satisfied if a service interface is to be referred to as RESTful.
A Web API (or Web Service) conforming to the REST architectural style is called a REST API (or RESTful API).
Typically, when REST APIs are described, only these salient features are covered. They are useful guidelines, but they are not concrete enough for designing real-world, production-ready APIs for B2B integration.
REST APIs communicate through HTTP (Hyper Text Transfer Protocol) to carry out database functions or operations like creating, reading, updating, and deleting information (also known as CRUD) within a resource.
In HTTP, five methods are commonly used for communication: GET, POST, PUT, PATCH, and DELETE.
Important REST-specific HTTP status codes that are widely used with the above HTTP verbs:
200 (OK)
It indicates that the REST API successfully carried out whatever action the client requested and that no more specific code in the 2xx series is appropriate.
Unlike the 204 status code, a 200 response should include a response body.
201 (Created)
A REST API responds with the 201 status code whenever a resource is created inside a collection. There may also be times when a new resource is created as a result of some controller action, in which case 201 would also be an appropriate response.
202 (Accepted)
A 202 response is typically used for actions that take a long while to process. It indicates that the request has been accepted for processing, but the processing has not been completed. The request might or might not eventually be acted upon, and may even be disallowed when processing occurs.
204 (No Content)
The 204 status code is usually sent out in response to a PUT, POST, or DELETE request when the REST API declines to send back any status message or representation in the response message’s body.
304 (Not Modified)
This status code is similar to 204 (“No Content”) in that the response body must be empty. The critical distinction is that 204 is used when there is nothing to send in the body, whereas 304 is used when the resource has not been modified since the version specified by the request headers If-Modified-Since or If-None-Match.
400 (Bad Request)
400 is the generic client-side error status, used when no other 4xx error code is appropriate. Typical causes include malformed request syntax, invalid request message parameters, or deceptive request routing.
401 (Unauthorized)
A 401 error response indicates that the client tried to operate on a protected resource without providing the proper authorization. It may have provided the wrong credentials or none at all. The response must include a WWW-Authenticate header field containing a challenge applicable to the requested resource.
403 (Forbidden)
A 403 error response indicates that the client’s request is formed correctly, but the REST API refuses to honor it, i.e., the user does not have the necessary permissions for the resource. A 403 response is not a case of insufficient client credentials; that would be 401 (“Unauthorized”).
404 (Not Found)
The 404 error status code indicates that the REST API can't map the client's URI to a resource, although the resource may become available in the future. Subsequent requests by the client are permissible.
429 (Too Many Requests)
The user has sent too many requests in a given amount of time (“rate limiting”).
500 (Internal Server Error)
500 is the generic REST API error response. Most web frameworks automatically respond with this response status code whenever they execute some request handler code that raises an exception.
502 (Bad Gateway)
The server got an invalid response while working as a gateway to get the response needed to handle the request.
503 (Service Unavailable)
The server is not ready to handle the request.
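To make the 200 versus 304 distinction above concrete, here is a minimal sketch of a conditional GET decision. The function names `make_etag` and `conditional_get` are hypothetical, the ETag is simply derived from a hash of the body (one possible choice), and the sketch handles only a single ETag in If-None-Match, whereas the real header may carry a list or `*`.

```python
import hashlib
from typing import Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body (an assumed strategy)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, if_none_match: Optional[str]) -> Tuple[int, dict, bytes]:
    """Return (status, headers, payload) for a GET on a resource.

    If the client already holds the current version (its If-None-Match
    value equals the current ETag), answer 304 with an empty body.
    Otherwise answer 200 with the full representation.
    """
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match == etag:
        return 304, headers, b""
    return 200, headers, body
```

A client would store the ETag from the first 200 response and send it back in If-None-Match on later requests, so that unchanged resources cost only headers.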
Now that we are well versed in the basics of REST APIs, which are taught everywhere, let's dive deep into the critical parts of APIs in B2B (Business to Business) or BFF (Back-end for Front-end) scenarios.
External-facing microservices (whose APIs are consumed by downstream services) need certain additional features beyond what the REST guidelines recommend. Here are some of the most crucial points to consider while developing secure APIs for scale.
Let's take a use case of designing a set of APIs for eKYC Microservice which will be consumed by multiple B2B Partners as BFF APIs (Client to Server or Server to Server Call) or as B2B Scenarios (Server to Server Call).
This is a practical scenario: we have designed similar APIs, where the scale is 10-15 million API calls per day, with 5 million API calls during peak hours and spikes of 10,000 TPS.
This scale makes it harder for the APIs to perform optimally and correctly. In addition, because distributed systems are needed to handle the scale requirement, tracing errors across those systems and troubleshooting effectively become challenges of their own.
In the eKYC APIs developed, say we have 2 endpoints: one for creating an eKYC Id and another for updating the eKYC status with respect to the eKYC Id that was generated.
So now we can create resource driven endpoints like this:
Endpoint 1 for generating eKYC Id will look like this:
HTTP POST /ekycs?uniqueApiRequestNo={uniqueApiRequestNo}
Endpoint 2 for updating eKYC Status against that eKYC Id will look like this:
HTTP PUT/PATCH /ekycs/{ekyc-transaction-id}/statuses?uniqueApiRequestNo={uniqueApiRequestNo}
Now that our use case based APIs are ready, let us try to understand how the 5 important points which we've learned over time after developing APIs for Population Scale impact the REST API design.
HMAC (Checksum), Partner Code and Partner Transaction ID : A Hash-based Message Authentication Code (checksum) is used as a data integrity check and establishes a common understanding of the data types and their values between client and server (B2B partners). Since our APIs are multi-tenant, the Partner Code is a unique identifier for the particular partner consuming the API. The Partner Transaction ID is the unique transaction ID generated at the partner's end before the API call is made; it maps the server-side transaction ID to the client-side transaction ID for end-to-end transaction analysis.
Because the combination of Partner Code and Partner Transaction ID is unique, idempotent transactions and retry scenarios can be identified from this metadata. It also enables partner-wise business logic integration, rate limiting, custom authorization and much more.
The HMAC is generally computed using SHA256 over all the input parameter values appended together without any spaces, followed by a hash secret that is specific to the partner. The computed value is then added to the JSON request body in a field named "hashValue". Because the API validates the same hash at the server end, this ensures that the data types and data format match the contract between client and server (B2B).
The API server generates the hash by the same method from the request payload data and compares the result with the hashValue provided by the client in the request body. This might feel like overkill, but it is essential for establishing trust and data security for API calls between two systems.
Assume we have the following request payload for an API and the hash secret is h*dK9@L%. Using the SHA256 hashing algorithm, we get the following value, and the resulting final payload looks like this:
108de50b69254e4638ebe6ff6af5fe6e7ac5735a602abb8ad1add98472474031
{
"partnerCode": "EO",
"partnerTransId": "EKYC123",
"uniqueApiReqNo": "7e6f0252-35a1-4c53-9cd3-0181fc59b8ef",
"eKycTransactionId": "11212144",
"hashValue": "108de50b69254e4638ebe6ff6af5fe6e7ac5735a602abb8ad1add98472474031"
}
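The hashing scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact production logic: the field order in `HASH_FIELDS`, the UTF-8 encoding and the helper names are all assumed, since the real order and encoding are whatever the partner contract specifies, so the digest produced here is not claimed to match the value shown above.

```python
import hashlib
import hmac

# Assumed concatenation order; the real order comes from the partner contract.
HASH_FIELDS = ("partnerCode", "partnerTransId", "uniqueApiReqNo", "eKycTransactionId")


def compute_hash_value(payload: dict, hash_secret: str) -> str:
    """SHA256 over the field values joined without spaces, followed by the secret."""
    joined = "".join(str(payload[name]) for name in HASH_FIELDS)
    return hashlib.sha256((joined + hash_secret).encode("utf-8")).hexdigest()


def sign(payload: dict, hash_secret: str) -> dict:
    """Client side: return a copy of the payload with hashValue added."""
    signed = dict(payload)
    signed["hashValue"] = compute_hash_value(payload, hash_secret)
    return signed


def is_hash_valid(payload: dict, hash_secret: str) -> bool:
    """Server side: recompute the hash and compare in constant time."""
    supplied = str(payload.get("hashValue", "")).encode("utf-8")
    expected = compute_hash_value(payload, hash_secret).encode("utf-8")
    return hmac.compare_digest(supplied, expected)
```

The server would reject the request (for example with 400) when `is_hash_valid` returns False. Note that Python's `hmac.new(key, msg, hashlib.sha256)` implements the standardized HMAC construction, which is a different computation from appending the secret to the message; which of the two a partner contract uses has to be agreed on both sides.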
HTTP Status Codes / API Status Codes (Optional) : HTTP status codes, as described above, are useful for describing the state of a resource based on the HTTP method. However, there are scenarios, such as BFF APIs, where the HTTP status code alone is not enough, because the client has to translate the returned message into a proper client-specific message rather than display the actual error message returned by the API server.
Usually, for B2C APIs or internal service-to-service APIs/microservices, where the business logic is not very complex, the HTTP status code itself is sufficient. However, business transactional systems, where the business logic is complex and the volume of transactions is huge, rely heavily on B2B APIs with partners or BFF APIs.
In that case, API status codes are mapped to the messages to be displayed to end users. This keeps the user experience good and prevents sensitive internal information from being revealed.
For example, suppose some issue with the API causes HTTP status code 500 to be returned. End users are not concerned with the X-Correlation-Id and won't know whom to contact. It is the responsibility of the B2B partner to display a generic error message based on the API status code, so that the user is informed that the service is currently unavailable and can try again later.
This is particularly helpful when a downstream service is down.
Another use case is idempotency (Success-Retry). End users are not concerned with whether the outcome is a Success or a Success-Retry, as long as the operation ended in functional success. So API status codes 200 and 201 should both be mapped to "Successful" for the end user, while internally those API status codes, along with the HTTP status codes, help the B2B partner identify what actually happened for that transaction.
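On the partner side, the mapping described above could look something like this small sketch. The table contents, the fallback text and the function name `user_message` are illustrative assumptions; only the idea that 200 and 201 both map to "Successful" and that 500 maps to a generic "try later" message comes from the discussion above.

```python
# Hypothetical mapping from API status code to end-user message.
USER_MESSAGES = {
    200: "Successful",  # Success-Retry: shown to the user as a plain success
    201: "Successful",  # first-time success
    500: "Service is not available right now. Please try again later.",
}

# Assumed fallback so that unknown codes never leak internal details.
DEFAULT_MESSAGE = "Something went wrong. Please try again later."


def user_message(api_status_code: int) -> str:
    """Translate an API status code into text that is safe to show to end users."""
    return USER_MESSAGES.get(api_status_code, DEFAULT_MESSAGE)
```

The partner can still log the raw API status code and HTTP status code internally while showing only the mapped text in the UI.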
Another thing that is crucial in distributed systems is tracing each transaction. Therefore, a custom HTTP header 'X-Correlation-Id' is returned with the response of every API call, so that the logs for that request-response pair can be traced easily in the API layer.
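A framework-agnostic sketch of this idea is shown below. The wrapper `handle_request`, the logger name and the shape of the handler are assumptions made for illustration; in a real service this would typically live in middleware.

```python
import logging
import uuid
from typing import Callable, Tuple

logger = logging.getLogger("ekyc.api")  # hypothetical logger name


def handle_request(path: str, handler: Callable[[], Tuple[int, dict]]) -> Tuple[int, dict, dict]:
    """Run a handler and attach an X-Correlation-Id header to the response.

    The same id is written to the logs, so a partner can quote it and the
    matching log lines can be found in the API layer.
    """
    correlation_id = str(uuid.uuid4())
    try:
        status, body = handler()
    except Exception:
        # Log the full error internally, return only a generic body.
        logger.exception("correlation_id=%s path=%s unhandled error", correlation_id, path)
        status, body = 500, {"message": "Internal error"}
    logger.info("correlation_id=%s path=%s status=%s", correlation_id, path, status)
    return status, {"X-Correlation-Id": correlation_id}, body
```

Even when the handler fails, the caller still receives the header, which is exactly the case where tracing matters most.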
Idempotency : This essentially means that the effect of a successfully performed request on a server resource is independent of the number of times it is executed.
These duplicate requests may be unintentional as well as intentional sometimes (e.g. due to timeout or network issues). We have to make our APIs fault-tolerant so that duplicate requests do not leave the system unstable.
In Endpoint 1, for generation of the eKYC Id, we need to ensure that our API doesn't create multiple eKYC Ids at the backend if the request parameters are the same. This scenario arises because, as we know from the CAP theorem, network partitions are inevitable in distributed systems, so we need to choose between consistency and availability. We generally go with AP so that our service is always up, and the business logic of the API then needs to ensure that duplicates caused by retries or network issues don't occur.
As a practical example, say the client called Endpoint 1 and the API processed the request successfully, but due to an issue in an intermediate device or an intermittent network problem, the client never received the HTTP response. If the client retries the same API call with the same transaction ID, we need to recognize the previously successful transaction and return the same details as for the first call, with a different API status code and HTTP status code to distinguish the first call from the retry.
Refer below matrix of HTTP Status Code & API Status Code:
Had there been no idempotency logic in the API, we would have got HTTP status code 201 for the first API call, and for the retry/subsequent API call with the same transaction ID and details we would have received HTTP status code 400 (Bad Request) with the message "eKYC Id already created". That is not correct, because the client doesn't have the already generated eKYC Id.
In a retry call, the HTTP request payload carries the same transaction details without any change; only uniqueApiReqNo differs.
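The retry behaviour can be illustrated with a small in-memory sketch. Everything here is an assumption for illustration: the class name `EkycService`, the use of (partnerCode, partnerTransId) as the idempotency key, the generated id format, and the choice of HTTP 200 for a retry versus 201 for the first call. In a real distributed deployment the check-and-insert would have to be atomic, for example through a unique constraint in the datastore.

```python
import itertools
from typing import Dict, Tuple


class EkycService:
    """In-memory stand-in for Endpoint 1 with idempotent creation."""

    def __init__(self) -> None:
        self._ids_by_key: Dict[Tuple[str, str], str] = {}
        self._counter = itertools.count(1)

    def create_ekyc(self, partner_code: str, partner_trans_id: str,
                    unique_api_req_no: str) -> Tuple[int, dict]:
        # uniqueApiReqNo changes on every call, so it is kept out of the key.
        key = (partner_code, partner_trans_id)
        existing = self._ids_by_key.get(key)
        if existing is not None:
            # Retry of an already successful transaction: same details,
            # different status code so the partner can tell it apart.
            return 200, {"eKycTransactionId": existing, "uniqueApiReqNo": unique_api_req_no}
        new_id = str(next(self._counter))
        self._ids_by_key[key] = new_id
        return 201, {"eKycTransactionId": new_id, "uniqueApiReqNo": unique_api_req_no}
```

With this logic, a client that lost the first response gets the eKYC Id back on retry instead of a 400 it cannot act on.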
Distributed Transactions Tracing : It's a method of tracking application requests as they flow from frontend devices to backend services and databases. This process starts with one request (considering each request as a trace) that receives a unique ID (trace ID) that identifies this specific transaction.
Each and every API call (HTTP request) should be uniquely identifiable and traceable so that we don't end up with ghost transactions (transactions that exist at the client end but have no trace at the server end). There are multiple approaches to achieve this; the best way we found is to use a GUID/UUID (Globally/Universally Unique Identifier), which can be sent with any API call either in the URL as a query string or as a custom HTTP header. We recommend sending it in the URL as a query string because URLs get logged in API server access logs and by proxies, firewalls, API gateways, load balancers and other intermediate devices. This helps trace how far a request has travelled through the infrastructure layers and pinpoint where an issue is actually arising.
Based on years of experience developing APIs for population-scale systems, we recommend the following query string in any API endpoint, as shown in Endpoints 1 and 2. The benefit of this approach is that no changes are needed at the API level for explicit logging; it comes pre-baked in API servers like IIS, Apache Tomcat, Kestrel, JBoss etc.
HTTP METHOD API_ENDPOINT?uniqueApiRequestNo={uniqueApiRequestNo}
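On the client side, attaching the identifier is a one-liner. The helper name `with_request_no` is hypothetical, and the sketch assumes a version 4 (random) UUID, which the text above does not specify.

```python
import uuid
from typing import Tuple
from urllib.parse import urlencode


def with_request_no(endpoint: str) -> Tuple[str, str]:
    """Append a fresh uniqueApiRequestNo to an endpoint.

    Returns the full URL and the generated id, so that the caller can log
    the id on its own side as well.
    """
    request_no = str(uuid.uuid4())
    separator = "&" if "?" in endpoint else "?"
    url = endpoint + separator + urlencode({"uniqueApiRequestNo": request_no})
    return url, request_no
```

For example, `with_request_no("/ekycs")` yields a URL of the form `/ekycs?uniqueApiRequestNo=<uuid>`, which then shows up as-is in access logs along the request path.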
Tolerance Limit of Time / Retry / Asynchronous Processing for Failed Transactions : The tolerance limit is a configurable hard limit, set as per business logic, beyond which a transaction should not be considered for processing. It is essential from a security point of view for tackling replay attacks. Retrying failed transactions is another design paradigm: a failed transaction is retried N times, either in real time or after an interval, in an attempt to close the open transaction. However, if something abruptly goes wrong in the network or at the client end and retrying the API isn't possible, a WebHook (API endpoint), generally with bulk request functionality, is provided for updating the status of failed transactions asynchronously.
Let's take the scenario where Endpoint 1 was called successfully and the eKYC Id was generated. However, due to a network issue or client behaviour, the call to Endpoint 2 for updating the eKYC status to Success or Failure could not be completed synchronously. We need to keep a tolerance value, say a maximum of N minutes after the creation of the eKYC Id, within which we allow its status to be updated. Otherwise, this might result in fraudulent transactions.
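The tolerance check itself is simple; a sketch is given below. The 15-minute value standing in for N, the constant name and the function name are all assumptions, since the actual limit is a business decision.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical value for N; in practice this is configurable per business rule.
STATUS_UPDATE_TOLERANCE = timedelta(minutes=15)


def can_update_status(created_at: datetime, now: datetime,
                      tolerance: timedelta = STATUS_UPDATE_TOLERANCE) -> bool:
    """Allow a status update only within the tolerance window after creation.

    Requests stamped before creation (negative elapsed time) are rejected
    too, since they cannot be legitimate.
    """
    elapsed = now - created_at
    return timedelta(0) <= elapsed <= tolerance
```

Endpoint 2 would evaluate this against the stored creation time of the eKYC Id and refuse the update once the window has passed.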
One approach to improving the end user's UX is to perform N retries in the backend and show the updated status of the transaction when the user returns to the app after a certain period of time.
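A backend retry loop of this kind might look like the sketch below. The exponential backoff between attempts is an assumption added for illustration (the text only says retries happen in real time or after an interval), and the function name and parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation up to `attempts` times, doubling the delay after each failure.

    Re-raises the last error if every attempt fails, so the caller can hand
    the transaction over to an asynchronous path.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists only so that the waiting can be replaced in tests; production code would use the default.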
However, the issue is that if the service behind Endpoint 2 has a problem, multiple users will face it at once, and retrying one by one will take a considerable amount of time to bring every transaction up to date.
This is where a bulk API (aka WebHook) comes to the rescue. In our experience, almost 20% of transactions fail in synchronous systems that depend heavily on upstream or downstream services, so an alternative mechanism is a must for handling these scenarios.
The WebHook endpoint accepts bulk input of such cases, performs a one-shot update and shares the result. This has the benefit of very low latency and fewer network calls.
Because webhooks follow the Observer design pattern and work with a queuing mechanism, requests are queued even when the underlying service is down and processed once the service is back up. No data is lost, which is otherwise a challenge with normal synchronous APIs.
The client doesn't have to keep polling, so API resources and network bandwidth are saved, which in turn saves cost and time.
Once processing is complete at Server end, the client will get notified.
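The queue-then-notify behaviour can be sketched with an in-memory queue. This is a toy model under loud assumptions: the class `BulkStatusUpdater`, its method names and the `notify` callback are hypothetical, and a real system would use a durable message queue rather than a Python deque.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


class BulkStatusUpdater:
    """Queues bulk status updates and applies them once the service is up."""

    def __init__(self) -> None:
        self._pending: deque = deque()
        self.statuses: Dict[str, str] = {}

    def enqueue(self, items: Iterable[dict]) -> int:
        """Accept a bulk request; returns how many items are now waiting."""
        self._pending.extend(items)
        return len(self._pending)

    def drain(self, service_up: bool, notify: Callable[[List[dict]], None]) -> int:
        """Apply all queued updates in one pass, then notify the client once."""
        if not service_up:
            return 0  # keep everything queued; nothing is lost
        results = []
        while self._pending:
            item = self._pending.popleft()
            self.statuses[item["eKycTransactionId"]] = item["status"]
            results.append({"eKycTransactionId": item["eKycTransactionId"], "updated": True})
        if results:
            notify(results)
        return len(results)
```

The client submits the failed transactions once through `enqueue` and is called back through `notify` when processing finishes, instead of polling for each transaction.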
Since the post has become really lengthy, the remaining 5 will be covered in Part II of the article.
#restapi #apis #apidevelopment #apidesign #practitioner #systemdesign #lld #technology #restfulapi #realisticexpectations #practicalexperience #softwareengineering #softwarearchitecture #scale #roadmap #guidance #mentorship #practicalguidance
What would you like to read next? Let me know in the comments below.
Also I write a weekly newsletter to teach Realistic Systems design here: https://www.dhirubhai.net/newsletters/realistic-systems-7146486009650749440/
Remaining 5 topics: API Versioning, Rate Limiting, Authentication / Authorization (Basic Auth, OAuth2), Health Check / Monitoring / Metrics, Logging / Handling Downstream Services.