Understanding Single-Threaded Application Scalability

Rocket.Chat is built on Node.js and, by default, runs in a single-threaded environment. This means it primarily uses a single core, limiting its ability to leverage multiple CPU cores out of the box.

Scaling Node.js (and Meteor-based applications like Rocket.Chat) often requires a different approach than simply adding CPUs, due to this inherent single-threaded nature.

CPU Limitations in Single-Threaded Node.js Applications

Single-Core Limitation

Since the core execution thread of Node.js runs on a single core, adding more CPU resources (cores) without adapting the architecture will not inherently increase the application's throughput. You may observe marginal improvements due to non-blocking async operations (like I/O), but CPU-bound tasks won’t see a substantial gain.

Event Loop & Blocking Operations

CPU-bound operations or blocking the event loop can be particularly problematic in single-threaded environments. Additional CPU cores don’t alleviate these bottlenecks unless you employ worker threads, clustering, or parallel processing.
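
For CPU-bound work, Node's built-in worker_threads module is one way to keep the event loop responsive. The following is a generic Node.js sketch of that pattern (not Rocket.Chat code); the computation is just a stand-in for any blocking task:

```javascript
// worker-sketch.js - offloading a CPU-bound task to a worker thread
// so the main event loop stays free to serve requests.
// Generic Node.js illustration, not Rocket.Chat code.
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

function heavyComputation(n) {
  // Stand-in for any CPU-bound task that would otherwise block the event loop.
  let sum = 0;
  for (let i = 0; i < n; i++) sum += Math.sqrt(i);
  return sum;
}

if (isMainThread) {
  // Spawn a worker that runs this same file with isMainThread === false.
  const worker = new Worker(__filename, { workerData: 50_000_000 });
  worker.on('message', (result) => console.log('result from worker:', result));
  worker.on('error', (err) => console.error('worker failed:', err));
  console.log('event loop is still free while the worker computes');
} else {
  parentPort.postMessage(heavyComputation(workerData));
}
```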

Effective Scaling Strategies for Rocket.Chat

Clustering

Using Node's cluster module, you can fork multiple child processes, each on a separate core, enabling Rocket.Chat to handle more concurrent requests by distributing them across all available cores. Rocket.Chat generally supports this via PM2 or Docker, both of which can spawn multiple instances of the app to maximize CPU usage.
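
As a minimal sketch of what the cluster module does (generic Node.js, not Rocket.Chat's actual bootstrap code), the primary process forks one worker per core and the workers share a single listening port:

```javascript
// cluster-sketch.js - one worker per CPU core, all sharing port 3000.
// Generic illustration; Rocket.Chat deployments usually rely on PM2 or
// Docker to spawn instances instead of calling cluster.fork() directly.
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isPrimary) { // cluster.isMaster on Node versions before 16
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} died, restarting`);
    cluster.fork(); // keep the worker pool at full size
  });
} else {
  // Each worker has its own event loop; incoming connections are
  // distributed across workers by the primary process.
  http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(3000);
}
```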

In cluster mode, PM2 allows networked Node.js applications to be scaled across all available CPUs without any code modifications. For example, on a server with 8 CPUs you can run 4 instances of Rocket.Chat and increase the performance and reliability of your application.
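
A PM2 ecosystem file for that scenario might look roughly like the sketch below; the script path, instance count, and environment values are assumptions you would adapt to your own deployment:

```javascript
// ecosystem.config.js - hypothetical PM2 configuration for 4 instances.
// Paths and values are placeholders, not an official Rocket.Chat config.
module.exports = {
  apps: [
    {
      name: 'rocketchat',
      script: 'main.js',    // entry point of the built Rocket.Chat bundle (assumed path)
      instances: 4,         // e.g. 4 instances on an 8-CPU server
      exec_mode: 'cluster', // PM2 cluster mode: the instances share one port
      env: {
        PORT: 3000,
        ROOT_URL: 'http://localhost:3000',
        MONGO_URL: 'mongodb://localhost:27017/rocketchat',
      },
    },
  ],
};
```

Starting it is then a single command: pm2 start ecosystem.config.js.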

Horizontal Scaling

Rather than scaling vertically (by increasing CPU on a single instance), horizontal scaling (adding more instances of Rocket.Chat behind a load balancer) is usually more effective for a Node.js app. This approach lets you scale out to handle more traffic and user load across multiple server instances.
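
To make the idea concrete, here is a deliberately simplified round-robin proxy in plain Node.js; it only illustrates the concept, and in practice you would put nginx, HAProxy, or a cloud load balancer in front of the instances:

```javascript
// lb-sketch.js - toy round-robin load balancer, for illustration only.
// Backend addresses are assumptions; adjust to your own instances.
const http = require('http');

const backends = [
  { host: '127.0.0.1', port: 3001 },
  { host: '127.0.0.1', port: 3002 },
];
let next = 0;

http.createServer((clientReq, clientRes) => {
  const target = backends[next];
  next = (next + 1) % backends.length; // simple round-robin selection

  const proxyReq = http.request(
    {
      host: target.host,
      port: target.port,
      path: clientReq.url,
      method: clientReq.method,
      headers: clientReq.headers,
    },
    (proxyRes) => {
      clientRes.writeHead(proxyRes.statusCode, proxyRes.headers);
      proxyRes.pipe(clientRes); // stream the backend response to the client
    }
  );
  proxyReq.on('error', () => {
    clientRes.writeHead(502);
    clientRes.end('backend unavailable');
  });
  clientReq.pipe(proxyReq); // forward the request body
}).listen(8080);
```

Note that this toy proxy does not handle WebSocket upgrades, which Rocket.Chat's real-time traffic depends on; a production load balancer for multiple Rocket.Chat instances also typically needs sticky sessions.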

Optimizing with Microservices

For specific, high-demand functionalities (e.g., message processing, real-time notifications) in deployments beyond roughly 400 concurrent users, breaking out certain services (such as the Rocket.Chat Stream-Hub microservice) into separate, dedicated instances allows these components to scale independently and distribute load.

This is the ultimate scaling approach in Rocket.Chat. Here's a breakdown of the microservices.

In a Rocket.Chat microservices deployment, each service plays a specific role to handle different aspects of functionality and scaling, especially under heavy concurrency.

Accounts Service

This service manages user authentication, registration, and login processes. It handles tasks such as user creation, password validation, and session management.

Accounts Service - Impact under heavy concurrency

Since user authentication is a critical entry point for any application, heavy concurrency will increase the load on this service. Efficient caching mechanisms and distributed session management are important to ensure quick user authentication and minimize bottlenecks. This service often integrates with identity providers like LDAP, OAuth, and SAML for single sign-on (SSO), which might add extra load in scenarios with high user traffic.

Authorization Service

The Authorization Service handles permission checks for users and bots, ensuring that each user can only access specific resources and functionalities according to their roles and permissions.

Authorization Service - Impact under heavy concurrency

High levels of concurrent requests will demand quick, scalable access to permission-related data. If permissions are complex or involve checking multiple roles across different channels, this service can become a bottleneck. Proper scaling or caching of authorization data is essential to handle spikes in permission checks efficiently.
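
As an illustration of that idea only (not Rocket.Chat's internal code), a permission check wrapped in a short-lived in-memory cache could look like this; fetchFromDb is a hypothetical loader:

```javascript
// permission-cache-sketch.js - illustrative only, not Rocket.Chat internals.
// Caches permission lookups for a few seconds so repeated checks for the
// same user/permission pair do not each hit the database.
const cache = new Map();
const TTL_MS = 5000;

async function hasPermission(userId, permission, fetchFromDb) {
  const key = `${userId}:${permission}`;
  const hit = cache.get(key);
  if (hit && Date.now() - hit.at < TTL_MS) {
    return hit.value; // served from cache, no database round trip
  }
  const value = await fetchFromDb(userId, permission); // hypothetical async lookup
  cache.set(key, { value, at: Date.now() });
  return value;
}

module.exports = { hasPermission };
```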

Presence Service

This service monitors the presence status (online, offline, away) of users and keeps the system updated with users’ availability in real time. It sends updates when users go offline, idle, or return.

Presence Service - Impact under heavy concurrency

Presence updates are frequent and can cause significant load when thousands of users are online simultaneously. The Presence Service must scale horizontally to efficiently process these frequent status updates and reduce latency in notifying other users about the availability of their contacts. Network overhead and real-time communication make this service one of the critical points under heavy load.

Rocket.Chat Core Service

This is the main application service handling the core logic of Rocket.Chat, including message handling, channel management, and serving the UI. The Rocket.Chat service is the central node that interacts with most other services, so it must scale horizontally under high concurrency to handle chat messages, user interactions, and real-time updates across channels.

Rocket.Chat Core Service - Impact under heavy concurrency

Message processing and database interaction can lead to higher latency without proper optimization. Load balancing across multiple instances is often required to manage increased traffic. MongoDB plays a crucial role in keeping this service's instances healthy, so fast disks and a proper MongoDB infrastructure are a must-have in complex, heavy-concurrency scenarios.

NATS (Messaging System)

NATS is a high-performance messaging system used for communication between different microservices. It handles event-driven communication and broadcasts messages between services like presence, accounts, and authorization.

NATS - Impact under heavy concurrency

NATS enables decoupled and efficient communication across services. Under heavy concurrency, the speed and reliability of message delivery are critical, as delays or message losses can cause inconsistent states in services like presence or authorization.

NATS needs to scale effectively with the message throughput to avoid becoming a bottleneck.
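
To give a feel for the pattern, here is a minimal publish/subscribe example using the nats JavaScript client; the subject name and server address are made up for illustration and are not Rocket.Chat's internal subjects:

```javascript
// nats-sketch.js - generic NATS publish/subscribe in JavaScript.
// Requires the 'nats' package (npm install nats). Subject and server
// address are illustrative, not Rocket.Chat internals.
const { connect, StringCodec } = require('nats');

async function main() {
  const nc = await connect({ servers: 'nats://localhost:4222' });
  const sc = StringCodec();

  // Subscriber side: e.g. a service reacting to presence changes.
  const sub = nc.subscribe('demo.presence.updated');
  const consumer = (async () => {
    for await (const msg of sub) {
      console.log('received:', sc.decode(msg.data));
      break; // stop after one message so this sketch can exit cleanly
    }
  })();

  // Publisher side: fire-and-forget event, fully decoupled from the subscriber.
  nc.publish(
    'demo.presence.updated',
    sc.encode(JSON.stringify({ userId: 'u1', status: 'online' }))
  );

  await consumer;   // wait until the event has been handled
  await nc.drain(); // flush pending messages and close the connection
}

main().catch(console.error);
```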

StreamHub

The StreamHub handles the real-time streams of data, including chat messages and user presence updates. It plays a key role in keeping clients (such as browsers or mobile apps) synchronized in real time.

StreamHub - Impact under heavy concurrency

StreamHub is crucial for delivering real-time updates to users. Under heavy concurrency, efficient streaming of data and quick updates are essential. If not optimized, the service could suffer from latency or dropped messages, leading to poor real-time chat experiences. Efficient message queuing and delivery mechanisms are necessary to handle large numbers of concurrent users and messages.

As we can see, in many cases, just adding CPU cores to a single-threaded Node.js instance won’t yield significant performance improvements. Instead, clustering, horizontal scaling, and service decomposition are more effective strategies for scaling Rocket.Chat.

DDP Streamer

DDP (Distributed Data Protocol) is used to synchronize data between clients and the server in real-time. The DDP Streamer facilitates the push of updates to connected clients when relevant data changes, such as new messages or status updates.

DDP Streamer - Impact under heavy concurrency

The DDP Streamer is critical for ensuring that updates are propagated to clients in real-time, especially when multiple clients are connected and active simultaneously. As user counts increase, this service must handle a large volume of updates without significant delays. Bottlenecks here can result in clients not receiving timely updates or experiencing inconsistent data. Optimized data streaming and efficient use of WebSocket connections are necessary to maintain real-time communication quality.
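
As a sketch of what a DDP session looks like (based on Rocket.Chat's documented realtime API, with authentication omitted, so treat it as an illustration of the protocol rather than a working client), a subscription to new messages in a room can be opened over a plain WebSocket:

```javascript
// ddp-sketch.js - minimal DDP-over-WebSocket session.
// Requires the 'ws' package (npm install ws). The server URL and room id
// are placeholders, and the login step is omitted for brevity.
const WebSocket = require('ws');

const ws = new WebSocket('wss://chat.example.com/websocket'); // assumed server URL

ws.on('open', () => {
  // 1. DDP handshake
  ws.send(JSON.stringify({ msg: 'connect', version: '1', support: ['1'] }));
  // 2. Subscribe to new messages in a room (a real client must first
  //    authenticate with a 'method: login' call).
  ws.send(JSON.stringify({
    msg: 'sub',
    id: 'sub-1',
    name: 'stream-room-messages',
    params: ['GENERAL', false], // room id placeholder
  }));
});

ws.on('message', (raw) => {
  const frame = JSON.parse(raw);
  if (frame.msg === 'ping') {
    ws.send(JSON.stringify({ msg: 'pong' })); // keep the connection alive
  } else if (frame.msg === 'changed') {
    console.log('realtime update:', JSON.stringify(frame.fields));
  }
});
```

Under heavy concurrency, the DDP Streamer holds thousands of sessions like this open at the same time, which is why it is split out and scaled independently in the microservices model.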
