How to Share Data Between Microservices on High Scale!

Thanks to: https://medium.com/@siddharth.ram85/how-to-share-data-between-microservices-on-high-scale-4db831fcac53


Several approaches suitable for a scalable system, considering the trade-offs between availability and consistency


Prologue

Fiverr is an online marketplace for freelance services. Sellers offer their services, or Gigs (e.g., graphic design, translations, web programming), for buyers to order.


When buyers search for Gigs (services) on Fiverr, they are attracted to sellers that appear trustworthy based on their seller level, reviews, and the portfolio displayed on their Gig pages. Buyers reported that a seller’s past experience is the most important factor in their decision. As a result, we’ve recently launched the Top Clients feature, which allows sellers to display the high-quality brands they’ve worked with.

Top clients on the seller’s gig page

Motivation

The Top Clients feature MVP (minimum viable product) began by displaying the seller’s top clients at just one touchpoint on the site: their gig page (buyer view).

For the ad hoc implementation, we created a dedicated service (the “source of truth”) to handle the top clients data.

When a seller adds a new top client, the data is saved in the Top Clients service database. To render the seller’s gig page, the Gig Page service sends an HTTP request to the Top Clients service to fetch the top clients data.

MVP architecture (image by the author)

The MVP showed promising signs, yay! So we decided to increase exposure by displaying the sellers’ top clients on Fiverr’s top-of-funnel pages: the most popular pages on the user’s journey, with tens of millions of visits every day (e.g., user page, search results page). Displaying the data on multiple pages brings further complexity and challenges.

Because the MVP focused on a single page, we could rely on the Top Clients service to handle everything. The expansion, however, covered a growing number of pages at varying scales, including Fiverr’s most popular page, all of which require high availability.

We use a microservices architecture in which an application is structured as a collection of loosely coupled services.

In this blog post, we’ll talk about services that serve the top-of-funnel pages on our website (e.g., user page, gig page). Each service is responsible for a specific view (page) on Fiverr. Their databases store a read-optimized data view (tailored to their page’s needs) and hold all (or most) of the data needed to render the page.
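To make the idea of a read-optimized data view more concrete, here is a minimal Python sketch of the kind of denormalized record a user page service might keep. The class and field names (`UserPageView`, `seller_level`, `review_count`, `top_clients`) are hypothetical; the post does not describe Fiverr’s actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TopClient:
    client_id: str
    name: str
    logo_url: str


@dataclass
class UserPageView:
    """Hypothetical read model: one record holds everything the page needs."""

    seller_id: str
    display_name: str
    seller_level: int
    review_count: int
    # Copied from the Top Clients service; empty for sellers without clients.
    top_clients: List[TopClient] = field(default_factory=list)


def render_summary(view: UserPageView) -> str:
    # A single lookup by seller_id is enough; no call to another service.
    clients = ", ".join(c.name for c in view.top_clients) or "none listed"
    return (
        f"{view.display_name} (level {view.seller_level}, "
        f"{view.review_count} reviews). Top clients: {clients}"
    )
```

Because the record already embeds the top clients, rendering needs only one read from the page service’s own database.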

Constraints

We had three main constraints:


  1. Self-view mode should be highly consistent: self-view mode (where sellers can view and edit their profile, including their top clients) must always be up to date; discrepancies are not an option. This data will be read from the Top Clients service DB, where all reads and writes occur.
  2. Other pages can be eventually consistent: aside from self-view mode, other pages such as the gig page or the user page may display data with a delay. For example, if a seller has just added a new client, that client may not appear on other pages right away.
  3. The Top Clients service should not impact availability: the seller’s top clients will usually be presented on Fiverr’s most popular pages. Therefore, if the data cannot be fetched, the page should still render successfully without it.
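The three constraints can be read as a routing rule for where each request gets its top clients. The sketch below assumes two hypothetical data-source callables, `read_source_of_truth` (the Top Clients service DB) and `read_local_view` (the page service’s own copy). Letting self-view errors propagate instead of hiding them is one possible design choice, not something the post prescribes.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)

# Hypothetical signature shared by both data sources: seller_id -> client names.
ClientSource = Callable[[str], List[str]]


def top_clients_for_request(
    seller_id: str,
    viewer_id: Optional[str],
    read_source_of_truth: ClientSource,
    read_local_view: ClientSource,
) -> List[str]:
    """Choose where to read top clients from, following the three constraints."""
    if viewer_id == seller_id:
        # Constraint 1: self-view must be up to date, so read the Top Clients
        # service DB directly and let any error surface to the caller.
        return read_source_of_truth(seller_id)
    try:
        # Constraint 2: other pages may show slightly stale data.
        return read_local_view(seller_id)
    except Exception:
        # Constraint 3: missing top clients must never fail the page.
        logger.warning("top clients unavailable for seller %s", seller_id)
        return []
```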


Alternatives, alternatives everywhere

Synchronous

This is the same as the ad hoc solution: each service sends an HTTP request to get a seller’s top clients and waits for the response. Reads and writes go to the same DB.

Synchronous Approach (image by the author)
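A minimal sketch of the synchronous read path, assuming a hypothetical `TopClientsApi` interface in front of the Top Clients service’s HTTP endpoint. The in-memory fake counts calls, which makes the redundant-call drawback easy to see: a seller with no top clients still costs one round trip per render.

```python
from typing import Dict, List, Protocol


class TopClientsApi(Protocol):
    """Hypothetical client for the Top Clients service's HTTP endpoint."""

    def get_top_clients(self, seller_id: str) -> List[str]: ...


class GigPageService:
    """Synchronous approach: every render calls the Top Clients service."""

    def __init__(self, top_clients_api: TopClientsApi) -> None:
        self.top_clients_api = top_clients_api

    def render(self, seller_id: str) -> Dict[str, object]:
        # The render blocks until the Top Clients service answers, so that
        # service's latency and availability add directly to the page's.
        clients = self.top_clients_api.get_top_clients(seller_id)
        return {"seller_id": seller_id, "top_clients": clients}


class FakeTopClientsApi:
    """In-memory stand-in for the HTTP endpoint that counts calls."""

    def __init__(self, data: Dict[str, List[str]]) -> None:
        self.data = data
        self.calls = 0

    def get_top_clients(self, seller_id: str) -> List[str]:
        self.calls += 1
        return self.data.get(seller_id, [])
```

For example, rendering pages for one seller with clients and one without makes two calls, even though the second returns an empty list.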

Benefits

  1. Simplicity at low scale: each service makes one API call to the Top Clients service.
  2. Data consistency: all pages display the exact same data, so there is no consistency to worry about.
  3. Different services (consumers) don’t need to save data in their databases.

Drawbacks

  1. Lower availability: when scaling up (as on our top-of-funnel pages), the Top Clients service has to handle all the load and becomes a bottleneck.
  2. Redundant calls: not all sellers have provided data about their top clients, yet we still call the Top Clients service for every seller, and it returns an empty response.
  3. Higher latency: adding an API call (to the Top Clients service) to every page render can increase the page’s response time.

Optimization

To reduce response time, we can call the User Page service and the Top Clients service in parallel. Note that this is possible only if we already have the information needed to call the Top Clients service and do not need to wait for a response from the User Page service.
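The parallel-call optimization can be sketched with `asyncio.gather`. The two fetch functions are hypothetical stand-ins that only sleep, and the timeout value is an arbitrary example. Wrapping the optional call in `asyncio.wait_for` and passing `return_exceptions=True` also lets the page render without top clients when the Top Clients service is slow or down, in line with the availability constraint.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def fetch_user_page_data(seller_id: str) -> Dict[str, object]:
    # Hypothetical stand-in for the User Page service's own data lookup.
    await asyncio.sleep(0.05)
    return {"seller_id": seller_id, "display_name": "Dana"}


async def fetch_top_clients(seller_id: str) -> List[str]:
    # Hypothetical stand-in for the HTTP call to the Top Clients service.
    await asyncio.sleep(0.05)
    return ["Acme", "Globex"]


async def render_user_page(
    seller_id: str,
    top_clients_fetcher: Callable[[str], Awaitable[List[str]]] = fetch_top_clients,
    timeout_s: float = 0.2,
) -> Dict[str, object]:
    # Both calls need only seller_id, so they run concurrently and the total
    # wait is roughly the slower of the two rather than their sum.
    page, clients = await asyncio.gather(
        fetch_user_page_data(seller_id),
        asyncio.wait_for(top_clients_fetcher(seller_id), timeout=timeout_s),
        return_exceptions=True,
    )
    if isinstance(page, BaseException):
        raise page  # the page's own data is mandatory
    # Top clients are optional: on timeout or error, render without them.
    page["top_clients"] = [] if isinstance(clients, BaseException) else clients
    return page
```

Usage: `asyncio.run(render_user_page("s1"))`.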

Asynchronous

Our assumption is that the Top Clients service will be read-heavy. Therefore, we want to decouple the business logic (adding, modifying, and deleting clients) from the user-facing pages.


Command Query Responsibility Segregation (CQRS) is a pattern that separates the operations that read data (queries) from the operations that update data (commands) by using different interfaces.

The Fiverr architecture uses CQRS alongside event sourcing. In our case, when a seller adds or deletes a top client, the Top Clients service updates its DB and then publishes a message to Kafka (a messaging system) describing the change. All the other services consume that message from Kafka and update their data views (denormalized databases) accordingly.

Asynchronous Approach (image by the author)
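Below is a minimal, single-process sketch of this flow: the command side (the Top Clients service) writes to its own store and then appends an event, and each page service consumes the log at its own pace to update its denormalized view. `InMemoryLog` is only a stand-in for a Kafka topic, not the Kafka client API, and the event schema is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TopClientEvent:
    # Hypothetical message schema; the post does not specify one.
    event_type: str  # "added" or "removed"
    seller_id: str
    client_id: str
    client_name: str = ""


class InMemoryLog:
    """Stand-in for a Kafka topic: an append-only log read by offset."""

    def __init__(self) -> None:
        self.messages: List[TopClientEvent] = []

    def append(self, event: TopClientEvent) -> None:
        self.messages.append(event)


class TopClientsService:
    """Command side and source of truth: update own DB, then publish."""

    def __init__(self, log: InMemoryLog) -> None:
        self.db: Dict[str, Dict[str, str]] = {}
        self.log = log

    def add_client(self, seller_id: str, client_id: str, name: str) -> None:
        self.db.setdefault(seller_id, {})[client_id] = name
        self.log.append(TopClientEvent("added", seller_id, client_id, name))

    def remove_client(self, seller_id: str, client_id: str) -> None:
        self.db.get(seller_id, {}).pop(client_id, None)
        self.log.append(TopClientEvent("removed", seller_id, client_id))


class PageServiceConsumer:
    """Query side: a page service keeping its own denormalized copy."""

    def __init__(self, log: InMemoryLog) -> None:
        self.log = log
        self.offset = 0  # index of the next unread message
        self.view: Dict[str, Dict[str, str]] = {}

    def poll(self) -> None:
        # Apply, in order, every message published since the last poll.
        new_messages = self.log.messages[self.offset:]
        for event in new_messages:
            clients = self.view.setdefault(event.seller_id, {})
            if event.event_type == "added":
                clients[event.client_id] = event.client_name
            elif event.event_type == "removed":
                clients.pop(event.client_id, None)
        self.offset += len(new_messages)

    def top_clients(self, seller_id: str) -> List[str]:
        # Served entirely from the local view; no call to the Top Clients service.
        return sorted(self.view.get(seller_id, {}).values())
```

Until a consumer calls `poll()`, its view lags behind the source of truth, which is the eventual consistency the post accepts for pages other than self-view. The sketch also writes to the DB and publishes as two separate steps, as described in the text; a crash between them would drop the event, a gap that real systems commonly close with a pattern such as a transactional outbox.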

Benefits

  1. High availability: each service stores the top clients data, eliminating the need for a second API call.
  2. The data in the various services will be eventually consistent with the source-of-truth data, thanks to Kafka’s high availability.
  3. Each service adjusts its own resources to manage its scale, for example by fine-tuning CPU and memory, using a highly available database, and keeping query complexity low.

Drawbacks

  1. Possible inconsistencies between the services’ databases. Different pages might display different top clients of the same seller.
  2. Data duplication over multiple databases.


Hybrid solution

The hybrid solution is an improvement on the synchronous alternative. The key observation is that not all sellers have provided data about their top clients, so there is no need to call the Top Clients service on every page render.


Once a client is created, the Top Clients service sends the client ID to its consumers. Each service saves the client identifier in its own DB.

Hybrid solution: write data (image by the author)

To display the top clients on the gig page, the Gig Page service checks whether the seller has any client IDs in its DB. If so, it calls the Top Clients service to fetch the full data for those clients.

Hybrid solution: read data (image by the author)
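A minimal sketch of the hybrid approach’s write and read paths, assuming a hypothetical `enrich` callable that stands in for the Top Clients service call returning full client records for a list of IDs. Only client-creation messages are handled, as in the description above; deletions would need a matching handler.

```python
from typing import Callable, Dict, List, Set


class HybridGigPageService:
    """Hybrid approach: keep only client IDs locally and enrich on demand."""

    def __init__(self, enrich: Callable[[List[str]], List[dict]]) -> None:
        # Hypothetical call to the Top Clients service that returns full
        # client records (name, logo, ...) for the given client IDs.
        self.enrich = enrich
        self.client_ids: Dict[str, Set[str]] = {}

    def on_client_created(self, seller_id: str, client_id: str) -> None:
        # Write path: the consumed message carries only the client ID.
        self.client_ids.setdefault(seller_id, set()).add(client_id)

    def top_clients(self, seller_id: str) -> List[dict]:
        # Read path: skip the remote call for sellers with no stored IDs.
        ids = self.client_ids.get(seller_id)
        if not ids:
            return []
        return self.enrich(sorted(ids))
```

Sellers without clients never reach `enrich`, which is where the savings over the plain synchronous approach come from; sellers with clients still pay for one remote call per render.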

Benefits

  1. The Top Clients service is called only for sellers with clients, avoiding redundant calls.
  2. Only the client ID is saved, resulting in a smaller database.

Drawbacks

  1. The Top Clients service still affects the pages’ response time.
  2. Data is still duplicated in the services.


And the winner is…

Our goal was to display sellers’ top clients on multiple pages across Fiverr. The most crucial factor was that the page be highly available, even at the price of not displaying the seller’s top clients in case of failure. Therefore, we chose the Asynchronous alternative.


Summary

We implemented the Top Clients feature MVP, which served a single page on Fiverr, using a synchronous approach. As the product expanded, we wanted to display the data on different pages across Fiverr.


We realized that the Top Clients service (our source of truth) would not be able to handle the heavy load from all the other services, so we decided to use Kafka to share the data from the Top Clients service with all the other services. In this solution, each service holds all the data required to render its page, including the seller’s top clients.
