Case Study: How Stack Overflow's monolith beats microservice performance
Navjot Bansal
Building Computer Vision Systems @Oracle | Software Architecture | System Design | ICPC Regionalist
Stack Overflow, every software engineer's savior, operates immaculately, serving around 260,000,000 (260 million) requests per month with an average latency of 18 ms. I used to believe that serving millions of requests required hundreds of instances spread across multiple availability domains.
Have a look at this overview diagram from: https://blog.bytebytego.com/i/76744008/how-will-you-design-the-stack-overflow-website
The whole service is hosted on the components shown above. A detailed overview of these components follows.
Scalable Monoliths: Critical Components
How does Stack Overflow utilize these components so efficiently? Several practices help them achieve this performance and scale, and some of them even run counter to standard industry practice.
"An important aspect to note is that, the service being a monolith, the components won't scale out individually; the servers have to handle all the load with the resources they have."
Ensuring Availability
Multiple Servers hosted over distributed data-centers
Availability is the primary goal for any application or web service. Stack Overflow achieves this by distributing its nine multi-tenant servers evenly across three data centers. Hosting on multiple data centers is crucial because it provides fault tolerance: if one data center goes down, the others keep serving traffic.
SQL Database with Hot Standby
Monoliths typically have a single database for the whole service. Accordingly, Stack Overflow adopted a SQL database with a read-only standby. The standby updates itself from the live DB server asynchronously, catching up when read/write load is low.
SQL Server cache. The SQL server keeps the whole database in memory, "THE WHOLE OF IT". This saves query time because it requires minimal disk operations.
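The primary/standby split described above can be sketched in a few lines. This is a toy illustration of the pattern, not Stack Overflow's actual code: writes go to the live database, reads are served from the read-only standby, and replication happens asynchronously, so a read may briefly lag a write.

```python
# Illustrative sketch of primary + read-only standby replication.
# The Database and ReplicatedStore classes are hypothetical.

class Database:
    def __init__(self, name):
        self.name = name
        self.rows = {}

    def execute(self, key, value):
        self.rows[key] = value          # write path (primary only)

    def query(self, key):
        return self.rows.get(key)       # read path


class ReplicatedStore:
    """Route writes to the primary; serve reads from the standby."""

    def __init__(self):
        self.primary = Database("live")
        self.standby = Database("standby")

    def write(self, key, value):
        self.primary.execute(key, value)

    def replicate(self):
        # Real replication is asynchronous and continuous;
        # here we copy on demand to make the lag visible.
        self.standby.rows = dict(self.primary.rows)

    def read(self, key):
        # The standby may lag behind the primary (eventual consistency).
        return self.standby.query(key)


store = ReplicatedStore()
store.write("q:1", "How do I exit vim?")
print(store.read("q:1"))   # None: replication hasn't caught up yet
store.replicate()
print(store.read("q:1"))   # "How do I exit vim?"
```

The lag between `write` and `replicate` is why the standby is kept read-only: serving writes from it would fork the data.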
Ensuring Performance
Load Balancing through HAProxies
It's not sufficient to just install multiple servers across different availability domains. A reverse proxy is needed to distribute the load among the servers. Stack Overflow uses HAProxy for the job.
In simpler terms, HAProxy receives the traffic and then balances the load across your servers. HAProxy can also deal with any one of your servers failing since the load balancer can detect if a server becomes unresponsive and automatically stop sending traffic to it.
In the figure above, the failover proxy is a standby that replaces the live proxy if it encounters any issues, ensuring availability.
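Conceptually, what HAProxy does here is round-robin balancing combined with health checks. A minimal sketch of that behavior (real HAProxy is configured declaratively, and these class names are made up for illustration):

```python
# Toy round-robin load balancer that skips unhealthy backends,
# mimicking HAProxy's health-check behavior. Names are illustrative.
from itertools import cycle

class Server:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy          # result of the last health check

class RoundRobinProxy:
    def __init__(self, servers):
        self.servers = servers
        self._ring = cycle(servers)

    def pick(self):
        # Try each server at most once per request.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server.healthy:
                return server
        raise RuntimeError("no healthy backends")

proxy = RoundRobinProxy([Server("web-1"), Server("web-2"), Server("web-3")])
proxy.servers[1].healthy = False        # web-2 fails its health check
picks = [proxy.pick().name for _ in range(4)]
print(picks)   # ['web-1', 'web-3', 'web-1', 'web-3']
```

Once `web-2`'s health check fails, traffic silently flows to the remaining servers; when the check passes again it rejoins the rotation.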
Caching with Redis
Redis is the shared cache level that sits behind the servers' own in-memory caches. It is used to store questions, answers, related questions (mapped by the tag engine), and similar data.
If a server misses its local cache, it goes to Redis for the data; if Redis also misses, the request falls through to the database server. Any data in the Redis server is shared across all the web servers. I am not sure whether the DB populates Redis on cache misses or not.
Redis is crucial as it saves both CPU and latency for the user: hot data is served from memory instead of waiting on DB read operations.
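The lookup chain above (local cache, then Redis, then database) is the classic cache-aside pattern. A hedged sketch, with dictionaries standing in for the real stores and illustrative key names:

```python
# Cache-aside lookup: per-server local cache -> shared Redis -> database.
# Plain dicts stand in for the real stores; keys are illustrative.

local_cache = {}        # this server's in-process cache
redis_cache = {}        # stands in for the shared Redis instance
database = {"q:42": "Answer: use a monolith"}

def get(key):
    if key in local_cache:                  # server-local hit: cheapest
        return local_cache[key]
    if key in redis_cache:                  # shared Redis hit
        local_cache[key] = redis_cache[key]
        return local_cache[key]
    value = database.get(key)               # full miss: hit the DB
    if value is not None:
        redis_cache[key] = value            # now visible to all servers
        local_cache[key] = value
    return value

print(get("q:42"))   # first call falls through to the database
print(get("q:42"))   # subsequent calls are served from cache
```

Note that because Redis is shared, one server's database read warms the cache for every other server in the fleet.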
There are also non-cache uses for Redis: its pub/sub mechanism powers the websockets that provide real-time updates on scores, reputation, and other dynamic values.
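The pub/sub idea is simple enough to show in miniature. This tiny in-memory version illustrates the pattern only; the real system would use Redis's pub/sub commands, and the channel name and payload here are invented:

```python
# Minimal in-memory pub/sub, mimicking the pattern Redis provides:
# publishers push updates to a channel, every subscriber receives them.
from collections import defaultdict

class PubSub:
    def __init__(self):
        self.channels = defaultdict(list)

    def subscribe(self, channel, callback):
        self.channels[channel].append(callback)

    def publish(self, channel, message):
        # Fan the message out to every subscriber on this channel.
        for callback in self.channels[channel]:
            callback(message)

bus = PubSub()
received = []
bus.subscribe("rep-updates", received.append)   # e.g. a websocket handler
bus.publish("rep-updates", {"user": 7, "rep": 10})
print(received)   # [{'user': 7, 'rep': 10}]
```

The decoupling is the point: the code that awards reputation doesn't know which websocket connections exist; it just publishes, and Redis fans the message out.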
Quick Search with ElasticSearch Engine
Being a Q&A service, Stack Overflow has to be ready for the huge volume of queries hitting its search bar. This is a critical component to optimize.
To support millions of queries, three Elasticsearch instances are deployed behind a load balancer. Elasticsearch is almost everyone's first choice for full-text queries these days.
Stack Overflow maintains a table called Posts that holds both questions and answers, one per row, indexed incrementally over time.
According to them, each row is a small entry and thus requires little to no time for indexing and updates. Kudos to people for not adding a lot of images to their answers :)
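Under the hood, full-text search engines like Elasticsearch build an inverted index: each term maps to the set of documents containing it. A toy version over a Posts-like table (the schema and sample rows are invented for illustration):

```python
# Toy inverted index over a Posts-like table, showing the core idea
# behind full-text search. Schema and rows are illustrative only.
from collections import defaultdict

posts = [
    {"id": 1, "body": "how to exit vim"},
    {"id": 2, "body": "how to center a div"},
    {"id": 3, "body": "vim keybindings in vscode"},
]

# Build the index: term -> set of post ids containing that term.
index = defaultdict(set)
for post in posts:
    for term in post["body"].lower().split():
        index[term].add(post["id"])

def search(query):
    # AND semantics: a post must contain every query term.
    terms = query.lower().split()
    results = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(results)

print(search("vim"))        # [1, 3]
print(search("how vim"))    # [1]
```

Because each small post row touches only a handful of terms, updating the index on edit is cheap, which matches the article's point about fast indexing.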
More on how Elastic Search fetches relevant docs from indices: https://www.elastic.co/guide/en/enterprise-search/current/engines.html#engines-index
Conclusion
This case study helped me break my bias that "microservices are essential to serving millions of requests". With proper practices and planning, almost any service can be scaled to serve a large audience.
Being a monolith, Stack Overflow has to be careful to allocate enough resources to absorb bursts of traffic.
They are better off over-allocating and under-utilizing. Across all the attached images there's a common pattern: the systems are provisioned with a lot of CPU and memory yet generally run at ~10% of capacity.
You might ask: aren't they wasting 90% of their infra costs? Someone asked a similar question here.