Graph Based API - GraphQL

One important milestone in a transformation journey is establishing an enterprise data API strategy and design. Today I want to share a few thoughts on APIs, mostly from a design point of view. I am not going to cover the general design approach for end-to-end API lifecycle management, go into the detailed specifics of GraphQL, or compare design patterns such as REST, gRPC and GraphQL. What I want to share is my thinking on a graph-based approach to API design over the REST-based APIs that have been so popular for the last decade or so. I will provide specific examples of use cases from the asset management and investment management industry, and show how well the approach fits into a domain-based or Data Mesh architecture.

I hope you find it interesting and useful in your Architectural Journey.

Problem Statement:

Let’s think, simplistically, about how you would create APIs. Say we have Accounts, Securities and Holdings. We might have use cases where a user or application needs Accounts only, Securities only, Holdings only, or any combination of these, such as Accounts plus Holdings, Securities plus Holdings, or Securities plus Accounts. So basically, we need to build for all these combinations.

So, we have two clear choices here:

1. Build everything in one API and expose all elements for these domains. With this approach the consumer must do additional work to pick out the fields it needs. This can be costly, especially for web applications, which is where APIs are typically used.

2. Build custom APIs for each use case. For example, in MuleSoft you would create an endpoint with 10 fields from the Security domain and 5 fields from the Holdings domain. So, every time you need to add a new field, there is deployment and change management involved.

Why GraphQL:

The single key benefit I see is that GraphQL lets you define your own consumption. Let’s say we expose a GraphQL API on top of a Securities database and the caller requests 3 fields: it will return precisely those 3 fields. If you need additional fields, you modify the request.
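To make the idea concrete, here is a minimal sketch in plain Python that does not use any GraphQL library. The record, the field names and the `select_fields` helper are all made up for illustration; the helper only mimics the way a GraphQL response mirrors the fields the caller selected.

```python
# Hypothetical security record; the field names are illustrative assumptions.
SECURITY = {
    "master_security_id": "SEC-001",
    "cusip": "123456AB7",
    "security_name": "Example Corp 5% 2030",
    "currency": "USD",
    "security_type": "Corporate Bond",
}


def select_fields(record, requested):
    """Return only the requested fields, in the order they were asked for."""
    unknown = [name for name in requested if name not in record]
    if unknown:
        # A GraphQL server likewise rejects fields that are not in its schema.
        raise KeyError(f"Unknown fields: {unknown}")
    return {name: record[name] for name in requested}


# The caller asks for 3 fields and gets exactly those 3 back.
three_fields = select_fields(SECURITY, ["cusip", "security_name", "currency"])
print(three_fields)
```

Asking for a fourth field means changing only the list passed by the caller; nothing on the serving side is redeployed.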

Implementation Approach:

There are two ways in which you can implement GraphQL:

1. Federated Model. Individual models can be owned by different teams, so this model is well suited to a Data Mesh style of architecture, where each domain's design is driven independently, including functional design, data design and technology design.

For instance, owners of the Holdings domain may not have full visibility (documentation and definitions) into the fields of the Security domain, and vice versa. So, in this scenario each domain team defines its own Domain API and exposes it through a single Gateway API, or One Graph. Any client connecting to the API can then define the data points it needs from the schema(s). The Gateway API pulls the schema from each individual GraphQL Domain Service and combines them into one single schema. Once the interconnections between the schemas are defined, it can also pull all the domain results in parallel and assemble them according to that single schema.

For instance, consider a Portfolio API. A portfolio contains multiple Accounts/Entities, which in turn contain multiple Holdings and their Securities. These Securities can have multiple industry classifications and ratings. Together that makes a whole graph. But from the Security domain's point of view, the graph is just Security (fundamental attributes, both common and asset-class specific), Security Classifications and Security Ratings. From the Holdings perspective, the graph is just Holdings data, and from the Accounts/Entity perspective, the graph is Account reference data. So, the Gateway API takes the results from these 3 individual services, combines them, and returns them as per the user's requirement. Here each domain has its own schema defined, and that schema can be considered its contract. The contract definition is entirely up to the domain. For example, it is their decision whether to define it normalized, flattened, or something in between. In the case of Security, we can have a Security Schema, Issuer Schema, Classification Schema and Ratings Schema.
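The stitching a gateway performs can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the three "services" are in-memory functions, the data and field names are invented, and a real gateway such as Apollo composes schemas and plans queries rather than calling hand-written functions like `gateway_portfolio`.

```python
# Hypothetical in-memory stand-ins for the three domain services.
ACCOUNTS = {"ACC1": {"account_code": "ACC1", "account_name": "Growth Fund"}}
HOLDINGS = [
    {"account_code": "ACC1", "security_id": "SEC-001", "quantity": 100},
    {"account_code": "ACC1", "security_id": "SEC-002", "quantity": 50},
]
SECURITIES = {
    "SEC-001": {"security_id": "SEC-001", "security_name": "Example Corp 5% 2030"},
    "SEC-002": {"security_id": "SEC-002", "security_name": "Sample Inc 3% 2028"},
}


def account_service(account_code):
    return ACCOUNTS[account_code]


def holdings_service(account_code):
    return [h for h in HOLDINGS if h["account_code"] == account_code]


def security_service(security_id):
    return SECURITIES[security_id]


def gateway_portfolio(account_code):
    """Combine the three domain results into one nested graph."""
    account = dict(account_service(account_code))
    account["holdings"] = []
    for holding in holdings_service(account_code):
        stitched = {k: v for k, v in holding.items() if k != "account_code"}
        # The shared key (security_id) is the link between the two domains.
        stitched["security"] = security_service(holding["security_id"])
        account["holdings"].append(stitched)
    return account


portfolio = gateway_portfolio("ACC1")
print(portfolio["holdings"][0]["security"]["security_name"])
```

Each service knows only its own domain; the links between domains (account code, security ID) are the only thing the gateway needs to understand.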

The idea is that whether consumers query the database directly or come through the API, they should get consistent results. Consumers can always take the Data Domain APIs and create their own APIs on top of them, and these can be either REST or GraphQL APIs. Consumers can also use them to build their own multi-tier APIs, including an Experience Layer, Process Layer and System Layer.

Please note that we can always add caching to this, but it will increase complexity because of update and refresh requirements. So, if the data connections are optimized for performance, we may not even need to consider caching. In general, as a best practice, caching should be closer to the client.

The schema being referred to here is not necessarily a database schema, but the GraphQL schema, or representation, defined by each individual domain.

2. Traditional GraphQL Model. This is a monolithic application where the entire database is exposed through one service, and you define how you want to pull the data. There are use cases for this approach, but we will focus on the federated model, as it is the obvious choice for Domain Driven Design.

Authorization

One key point to note here is that the schemas and fields a client can access are also driven by the client's own entitlements: Role Based Access Control (RBAC) and Row Level Access Control. Both types of authorization can come from the underlying data platform, such as Snowflake, Redshift or Databricks. With RBAC, the response is all or nothing: if a client is not authorized to access the Security domain and requests any part of the Security schema, the whole request should fail with an error. With Row Level Access Control, you will still get a response, but only with the rows you are authorized to see.
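The difference between the two behaviours can be shown with a small sketch. In the approach described above the data platform enforces these rules; the Python below only imitates the outcome, and the roles, domains, rows and function names are all hypothetical.

```python
class NotAuthorized(Exception):
    """Raised when any requested domain is outside the caller's role."""


# Hypothetical entitlements: which domains a role may query,
# and which accounts (rows) it may see.
ROLE_DOMAINS = {"analyst": {"holdings", "security"}, "ops": {"holdings"}}
ROW_ACCESS = {"analyst": {"ACC1", "ACC2"}, "ops": {"ACC1"}}

HOLDINGS = [
    {"account_code": "ACC1", "quantity": 100},
    {"account_code": "ACC2", "quantity": 50},
]


def check_rbac(role, requested_domains):
    """All or nothing: one unauthorized domain fails the whole request."""
    denied = set(requested_domains) - ROLE_DOMAINS.get(role, set())
    if denied:
        raise NotAuthorized(f"Role {role!r} cannot access: {sorted(denied)}")


def visible_rows(role, rows):
    """Row-level control: the request succeeds, but rows are filtered."""
    allowed = ROW_ACCESS.get(role, set())
    return [row for row in rows if row["account_code"] in allowed]


check_rbac("ops", ["holdings"])              # passes silently
ops_rows = visible_rows("ops", HOLDINGS)     # only the ACC1 row survives

try:
    check_rbac("ops", ["holdings", "security"])
    rbac_error = None
except NotAuthorized as exc:
    rbac_error = str(exc)
print(ops_rows, rbac_error)
```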

Design Aspects

Let’s say we need a Portfolio/Fund/Account level response.

The request needs:

· 3 columns from Holdings (Market Value, Quantity and Price)

· 5 columns from Security (Master Security ID, CUSIP, Security Name, Currency and Security Type)

Let’s say it is an HTTP POST. The body of the POST carries the request, and the response returns only the columns requested above. So, this request combines data from the Holdings and Security schemas.

But if I alter the same request to have only Security fields, the response will get data only from the Security schema. So a single request can return data from a single schema or from multiple schemas, in a structured way.
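As a sketch of what such a POST body might look like: GraphQL over HTTP conventionally sends a JSON object with a `query` string and optional `variables`. The query, argument and field names below (`holdings`, `accountCode`, `marketValue` and so on) are assumptions made up for this example, not a real schema, and the snippet only builds the body without sending it anywhere.

```python
import json

# Hypothetical query: 3 Holdings fields plus 5 Security fields.
QUERY = """
query PortfolioHoldings($accountCode: String!, $asOfDate: String!) {
  holdings(accountCode: $accountCode, asOfDate: $asOfDate) {
    marketValue
    quantity
    price
    security {
      masterSecurityId
      cusip
      securityName
      currency
      securityType
    }
  }
}
"""


def build_request_body(account_code, as_of_date):
    """Serialize the query and its variables into a JSON POST body."""
    return json.dumps(
        {
            "query": QUERY,
            "variables": {"accountCode": account_code, "asOfDate": as_of_date},
        }
    )


body = build_request_body("ACC1", "2024-01-31")
print(body[:60])
```

Dropping the three Holdings fields from the query string would turn this into a Security-only request, without any change on the serving side.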

So now let's say we bring another domain into the mix, called Security Level Analytics, and modify the request as follows:

· 3 columns from Holdings (Market Value, Quantity and Price)

· 2 columns from Analytics (Duration, Yield)

In this request we are not requesting Security data, but the graph representation is such that Analytics is an inner object of Security. So to return Analytics data, we need to return an empty shell of the outer object (in this case Security), because the outer object's own fields are not part of the request. So, the graph definition and what you request are the key here.
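The "empty shell" is easiest to see in the shape of the response. The sketch below uses a hypothetical nested record and a hypothetical `shape` helper that walks a selection the way a GraphQL response follows the requested fields; none of the names come from a real schema.

```python
# Hypothetical holding with Security nested inside it and Analytics
# nested inside Security, mirroring the graph described above.
HOLDING = {
    "market_value": 10250.0,
    "quantity": 100,
    "price": 102.5,
    "security": {
        "cusip": "123456AB7",
        "security_name": "Example Corp 5% 2030",
        "analytics": {"duration": 4.2, "yield": 5.1, "convexity": 0.3},
    },
}

# None marks a leaf field; a dict marks a nested object to descend into.
SELECTION = {
    "market_value": None,
    "quantity": None,
    "price": None,
    "security": {"analytics": {"duration": None, "yield": None}},
}


def shape(data, selection):
    """Copy only the selected fields, recursing into nested objects."""
    result = {}
    for field, sub_selection in selection.items():
        if sub_selection is None:
            result[field] = data[field]
        else:
            result[field] = shape(data[field], sub_selection)
    return result


shaped = shape(HOLDING, SELECTION)
print(shaped["security"])
```

The `security` key appears in the result only as a container for `analytics`; none of its own fields, such as `cusip`, are returned.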

So, there are 3 key aspects from a design point of view:

· Query: This is the request, and it is how consumers are going to use the API. It shows the available attributes. Each domain defines its own, and it determines how users interact with each Domain Service.

Example: Security can only be queried by Security ID.

         Holdings can be queried using Security ID, Account Code and Date.

         Account can be queried using Account Code and Date.

· Resolver: It takes in the request and is responsible for resolving the data from the sources.

· Schema: How the data is defined. Examples are the Security Schema, Account Schema and Holdings Schema. The Account Schema can be a composite schema, which calls the Holdings Schema.

In this context, if you compare REST and GraphQL, REST is a single-resolver API: it parses the query once, gets the data and returns it. In GraphQL there are multiple resolvers. Each field, including every field in the examples above, can have its own resolver.
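A minimal sketch of the multi-resolver idea, without any GraphQL library: each field maps to its own function, and only the resolvers for requested fields run. The resolver names, the data and the `CALLS` log are hypothetical and exist only to make the behaviour visible.

```python
CALLS = []  # records which resolvers actually ran


def resolve_security_name(security_id):
    CALLS.append("security_name")
    return {"SEC-001": "Example Corp 5% 2030"}[security_id]


def resolve_rating(security_id):
    CALLS.append("rating")
    return {"SEC-001": "BBB+"}[security_id]


def resolve_duration(security_id):
    CALLS.append("duration")
    return {"SEC-001": 4.2}[security_id]


# One resolver per field; each could read from a different source.
RESOLVERS = {
    "security_name": resolve_security_name,
    "rating": resolve_rating,
    "duration": resolve_duration,
}


def execute(security_id, requested_fields):
    """Run only the resolvers for the fields the caller asked for."""
    return {field: RESOLVERS[field](security_id) for field in requested_fields}


result = execute("SEC-001", ["security_name", "duration"])
print(result, CALLS)
```

Because `rating` was not requested, its resolver never runs, which is the practical difference from an endpoint that always fetches everything.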

The compute involved will be data compute such as Snowflake/Redshift/Databricks and application compute such as Kubernetes. So multiple application servers and multiple Virtual Warehouses (in Snowflake, or the equivalent elsewhere) will be in play, because the API calls for multiple domains run in parallel.

Irrespective of variations in implementation, a typical flow will include:

1. The client creates the request payload.

2. The client needs to be authorized (possibly using a JWT token).

3. The client goes to the Authenticator and gets the token.

4. The GraphQL API validates the token and ensures that the client is an authorized user. The token also provides the client ID, and the payload carries the client ID.

5. The payload is broken into multiple pieces, one for each schema, and each piece (with the client ID and token) is passed to the corresponding service.

6. Each service takes the client ID and looks up the client's credentials in Secrets Manager (or wherever credentials are stored). For instance, in the case of Snowflake these will be User ID, Password, Virtual Warehouse and Role.

7. Each service then queries the database and returns the results to the GraphQL API.

8. All the results are combined and returned to the client.
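Steps 4 to 8 can be sketched end to end as below. Everything here is a stand-in: the token table replaces real JWT validation, the `SECRETS` dictionary replaces Secrets Manager, and `domain_service` returns placeholder strings instead of querying a database. All names are hypothetical.

```python
TOKENS = {"token-abc": "client-42"}  # stand-in for real JWT validation
SECRETS = {  # stand-in for a secrets store, keyed by client ID
    "client-42": {"user": "svc_client_42", "warehouse": "WH_SMALL", "role": "ANALYST"}
}


def validate_token(token):
    """Step 4: return the client ID for a valid token, otherwise fail."""
    if token not in TOKENS:
        raise PermissionError("Invalid or expired token")
    return TOKENS[token]


def split_payload(payload):
    """Step 5: one sub-request per schema."""
    return [{"domain": d, "fields": f} for d, f in payload.items()]


def domain_service(sub_request, client_id):
    """Steps 6 and 7: look up credentials, then 'query' the domain."""
    creds = SECRETS[client_id]
    # A real service would connect with creds and run a database query.
    domain = sub_request["domain"]
    data = {f: f"<{domain}.{f} via {creds['role']}>" for f in sub_request["fields"]}
    return domain, data


def handle_request(token, payload):
    client_id = validate_token(token)
    parts = split_payload(payload)
    # Step 8: combine the per-domain results into one response.
    return dict(domain_service(part, client_id) for part in parts)


response = handle_request("token-abc", {"holdings": ["quantity"], "security": ["cusip"]})
print(response)
```

In this sketch the sub-requests run one after another; a real gateway would dispatch them in parallel.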

Popular Technology Stack:

· Apollo GraphQL Gateway. It is a Node application, no-code or low-code, where the only customization is the service URLs. It has enough code to validate the token and pass the client ID to the individual services (steps #4 and #5 above), and it knows how to combine the data together.

· Each GraphQL Service can be built using Python with GraphQL libraries. There are two types of GraphQL application: schema-first ones, and ones where you must define the schema programmatically.

· Because a GraphQL API is multi-resolver and needs to run a lot of things in parallel, it will need an Asynchronous Server Gateway Interface (ASGI) stack, that is, an ASGI web server and HTTP framework.

· Python ORM.

· AWS Lambda for JWT validation.

· AWS Route 53.

· AWS Secrets Manager.

· AWS EKS.

· AWS ELB, one per domain.
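The reason an asynchronous stack matters can be illustrated with the standard library alone. In this sketch the two `fetch_*` coroutines are hypothetical stand-ins for domain calls, with `asyncio.sleep` simulating a database round trip; `asyncio.gather` lets both waits overlap instead of running back to back.

```python
import asyncio


async def fetch_account(account_code):
    await asyncio.sleep(0.05)  # simulated database round trip
    return {"account_code": account_code, "account_name": "Growth Fund"}


async def fetch_holdings(account_code):
    await asyncio.sleep(0.05)  # simulated database round trip
    return [{"security_id": "SEC-001", "quantity": 100}]


async def gateway(account_code):
    # Both domain calls are in flight at the same time.
    account, holdings = await asyncio.gather(
        fetch_account(account_code),
        fetch_holdings(account_code),
    )
    return {"account": account, "holdings": holdings}


combined = asyncio.run(gateway("ACC1"))
print(combined)
```

An ASGI server and framework give resolvers this same kind of event loop, so slow calls to different domains do not block one another.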

For Unit Testing we can use PyTest or any other similar framework.
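As an illustration, a pytest-style unit test for a resolver could look like the sketch below. The `resolve_holdings` function and its data are hypothetical. pytest discovers functions whose names start with `test_` and reports any failing `assert`.

```python
# Hypothetical data and resolver under test.
HOLDINGS = [
    {"account_code": "ACC1", "security_id": "SEC-001", "quantity": 100},
    {"account_code": "ACC2", "security_id": "SEC-002", "quantity": 50},
]


def resolve_holdings(account_code, holdings=HOLDINGS):
    """Return the holdings rows that belong to one account."""
    return [row for row in holdings if row["account_code"] == account_code]


def test_resolver_filters_by_account():
    rows = resolve_holdings("ACC1")
    assert [row["security_id"] for row in rows] == ["SEC-001"]


def test_unknown_account_returns_no_rows():
    assert resolve_holdings("NOPE") == []
```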

For local development, running multiple servers separately is not practical. Instead, we can have a single project with a Docker image that includes all the services (Security, Holdings, Accounts and the Gateway), deployed as a single Kubernetes environment.

Documentation

When we deploy the GraphQL API, it comes with a UI (like Swagger) for trying out the APIs before you start using them. The Gateway API connects to all the Domain APIs and retrieves data. This UI lets you submit a request and see the results. You can also look at each schema and its documentation as you navigate.

Other Example Use cases.

1. Distribution of ESG data. When you want to give out ESG data to downstream consumers, they typically need Security and Issuer data along with it.

2. Building a holistic ESG capability on a UI with Security, Issuer, Holdings and Accounts data.

I hope you will find these thoughts useful in your architecture or design work. Please feel free to comment and leave your feedback on this article. It will only help me come up with better content. I am more than happy to engage in constructive discussion, and I am open to being part of communities or networks where I can be a positive influence.

#graphql #datamesh #aws #snowflake #restapi
