Caching - In Code or External?

Shrey Batra

Founder @ Cosmocloud, Ex-LinkedIn, Angel Investor, MongoDB Champion, Book Author, Patent Holder (Distributed Algorithms)

Published: December 14, 2021

When writing applications, we often think about caching in all sorts of places: API services, databases, consumer apps, Android apps and so on. Caching is a beautiful technique for having your data available almost instantly with minimal overhead and maintenance.

Is Caching always cheap?

One thing to consider when building scalable systems is how costly a caching layer is going to be. This question has so many scenarios and variables that we can't cover them all here. Let's look at a few major factors; you can also work through this system design problem hands-on on my coding platform, Eazy Develop.

Amount of Data

Obviously, the first cost factor is the amount of data stored in the cache layer. As simple as that seems, it poses a few design issues.

Let's say we are building an application that caches user data (name, date of birth, other basic info). Say we have 100 users in the system and we deploy 2 instances of our application. Each user's cached data averages about 1KB.

We can easily store this cache in memory using an in-code approach: a library or some custom code (hash maps, lists and whatnot) keeps the data in the application process's own memory. With 2 instances of the application, each instance uses 100KB of extra memory (RAM) to store this cache. Total memory consumption = 200KB.
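As a rough illustration of the in-code approach, here is a minimal sketch in Python. The names `InCodeUserCache` and `load_user_from_db` are hypothetical and only stand in for whatever library or custom code an application would really use; the point is that each instance keeps its own private copy of the data.

```python
from typing import Callable, Dict


def load_user_from_db(user_id: int) -> dict:
    """Hypothetical stand-in for a slow database query."""
    return {"id": user_id, "name": f"user-{user_id}", "dob": "1990-01-01"}


class InCodeUserCache:
    """A per-process cache backed by a plain dict (hash map)."""

    def __init__(self, loader: Callable[[int], dict]) -> None:
        self._loader = loader
        self._store: Dict[int, dict] = {}  # lives in this process's RAM

    def get(self, user_id: int) -> dict:
        # On a miss, load from the source of truth and remember the result.
        if user_id not in self._store:
            self._store[user_id] = self._loader(user_id)
        return self._store[user_id]

    def __len__(self) -> int:
        return len(self._store)


# Two application instances, each with its own private cache.
instance_a = InCodeUserCache(load_user_from_db)
instance_b = InCodeUserCache(load_user_from_db)
for uid in range(100):
    instance_a.get(uid)
    instance_b.get(uid)

total_entries = len(instance_a) + len(instance_b)
print(total_entries)  # 200: every user is stored once per instance
```

With roughly 1KB per entry, those 200 entries correspond to the 200KB total mentioned above.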

Scaling to 10k Users

Now let's say the application has grown to 10k users and we have scaled to, say, 5 instances to support more concurrent users. With our previous approach, we use 10k users * 1KB * 5 instances = 10^4 * 1000B * 5 = 5 x 10^7 B = 50MB (assuming 1000KB = 1MB for simplicity).

The total extra cache memory used by our application is 50MB for 10k users and 5 instances. Even this looks fine and might suffice. Let's increase the scale further.

Scaling to 10M Users

Now our application is widely known and has 10M users, with approximately 100 instances running at the backend for high availability and failover.

Using our previous in-code library approach, we are going to use 10M users * 1KB * 100 instances = 10^7 * 1000B * 100 = 10^12 B = 1TB of memory (RAM), which works out to 10GB of memory per instance.
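The arithmetic for the three scenarios can be checked with a short script. This is only a sketch of the calculation in this article: the helper names are made up for illustration, and it uses the same simplification of 1000 bytes per KB.

```python
from typing import Tuple

BYTES_PER_USER = 1000  # about 1KB of cached data per user, with 1KB = 1000B


def in_code_cache_bytes(users: int, instances: int) -> Tuple[int, int]:
    """Return (bytes per instance, total bytes) when every instance caches every user."""
    per_instance = users * BYTES_PER_USER
    return per_instance, per_instance * instances


def human(size_bytes: float) -> str:
    """Format a byte count with decimal units (1000B = 1KB)."""
    for unit in ("B", "KB", "MB", "GB"):
        if size_bytes < 1000:
            return f"{size_bytes:g}{unit}"
        size_bytes /= 1000
    return f"{size_bytes:g}TB"


for users, instances in [(100, 2), (10_000, 5), (10_000_000, 100)]:
    per_instance, total = in_code_cache_bytes(users, instances)
    print(f"{users} users x {instances} instances: "
          f"{human(per_instance)} per instance, {human(total)} total")
```

The last scenario comes out at 10GB per instance and 1TB in total, because the same 10GB of user data is repeated in all 100 instances.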

Now you might ask: how can I get virtual machines with more than 10GB of RAM? Even if some cloud platform offers them, how costly would that be?

The Problem of In Code Caching

One thing you might have noticed is that we duplicate the data 100x, because we store every user's information in every instance. You might argue: why not add stickiness, so that each instance holds only the data of the users it actually serves? Theoretically, yes, you can do it. Should you? Definitely not. Maintaining this stickiness, keeping a replica of each user's data on another instance for high availability, and distributing the data fairly and evenly is too much overhead. It'll be a nightmare.

External Caching Layer

Let's revisit the problem and the purpose of caching before delving into the solution. We cache data (mostly the results of database calls or API calls) that is not static but stays the same for some timeframe. So technically there is one source of truth (SOT), your database, which as a persistent layer becomes slow under heavy read usage (it uses disk instead of memory/RAM).

So why not have an external layer or application that stores the data in its own memory but makes it available to every instance of your application? It's like outsourcing the caching logic and management to an external system. In the example above, it decreases the cache memory you need by a factor of 100 (ignoring cache replication) while keeping roughly the same performance (there is network call overhead, but let's ignore it here as it is not very prominent).
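The idea can be sketched as follows. `ExternalCacheStub` is a hypothetical in-process stand-in for a real external cache server, and `FakeDatabase`, `AppInstance` and the 300-second expiry are likewise assumptions made for illustration; in a real deployment the `get` and `set` calls would go over the network to the cache service.

```python
import time
from typing import Dict, Optional, Tuple


class ExternalCacheStub:
    """Stand-in for one external cache shared by all application instances."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:  # entry is stale, drop it
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: dict, ttl_seconds: float) -> None:
        self._data[key] = (time.monotonic() + ttl_seconds, value)

    def __len__(self) -> int:
        return len(self._data)


class FakeDatabase:
    """Hypothetical source of truth that counts how often it is read."""

    def __init__(self) -> None:
        self.reads = 0

    def load_user(self, user_id: int) -> dict:
        self.reads += 1
        return {"id": user_id, "name": f"user-{user_id}"}


class AppInstance:
    """One application instance; it holds no user data of its own."""

    def __init__(self, cache: ExternalCacheStub, db: FakeDatabase) -> None:
        self._cache = cache
        self._db = db

    def get_user(self, user_id: int) -> dict:
        key = f"user:{user_id}"
        user = self._cache.get(key)
        if user is None:  # cache miss: read the database, then fill the cache
            user = self._db.load_user(user_id)
            self._cache.set(key, user, ttl_seconds=300)
        return user


shared_cache = ExternalCacheStub()
db = FakeDatabase()
instance_a = AppInstance(shared_cache, db)
instance_b = AppInstance(shared_cache, db)

instance_a.get_user(42)  # miss: reads the database and fills the shared cache
instance_b.get_user(42)  # hit: served from the shared cache
print(db.reads, len(shared_cache))  # 1 1
```

Both instances see the same entry, so the user is stored once and the database is read once, no matter how many instances are running.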

So now with 10M users, you have a single cache layer of about 10GB (10M users * 1KB, stored once instead of 100 times), and each instance of your application uses this cache layer to fetch the data. Later, you can use a distributed cache (a topic for a separate article) to spread this cache horizontally across lower-spec machines (say, 10 machines with 1GB of memory each).
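To give a feel for horizontal distribution, here is a minimal sketch that assigns each cache key to one of several cache machines by hashing the key. The function name `node_for_key` and the node count of 10 are arbitrary choices for illustration, and simple modulo hashing is used only because it is easy to read; production systems typically rely on schemes such as consistent hashing or fixed hash slots, so that changing the number of machines does not remap almost every key.

```python
import hashlib
from collections import Counter

NODE_COUNT = 10  # assumed number of cache machines, for illustration only


def node_for_key(key: str, node_count: int) -> int:
    """Map a cache key to a node index using a stable hash of the key."""
    # hashlib is used instead of the built-in hash(), which is randomized
    # per process for strings and would send the same key to different nodes.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


# Spread 100,000 sample user keys and count how many land on each node.
counts = Counter(node_for_key(f"user:{uid}", NODE_COUNT) for uid in range(100_000))
for node in sorted(counts):
    print(f"node {node}: {counts[node]} keys")
```

Each application instance can compute the same mapping on its own, so any instance knows which cache machine to ask for a given user.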

Some very well-known, battle-tested distributed caching layers are Redis and Memcached.

Conclusion

So as you saw, which type of caching system will benefit you depends on your application's use case and scenario. To learn this hands-on, sign up on my coding platform, Eazy Develop, which will have many more in-depth problems and challenges for you to code and solve.

Ankur Jat

Senior Engineer at Booking.com, Ex Microsoft | Paytm, Java | Python | NodeJS

2 years ago

An in-memory cache in API instances is never a good idea, except for configuration-related data that is very minimal. Also, a cache is a kind of data, so as per the 12-factor application principles you have to create separate instances for it.

Arvind Kumar

Strategy Planning Quality at Samsung India research institute noida

2 years ago

I wonder

Tarini Tanaya Mohapatra

2 years ago

A good read. If there are few users and the business logic doesn't involve DB calls, an in-code cache is usually considered; but to support a distributed system and expensive DB calls, an external cache layer seems better. Cache consistency and fault tolerance are a different headache if it is not a managed solution.

Shaurya Uppal

2 years ago

Bookmarked for this week's read.
