How Big Techs Manage High Volume of Real-Time Data Streaming with Node.js
João Victor França Dias
Senior Fullstack Software Engineer | Typescript | Node | React | Nextjs | Python | Golang | AWS
In the current technology landscape, where data is generated and consumed at unprecedented volumes and speeds, big tech companies face significant scalability challenges. An effective solution to this problem is the combination of message queues with the efficient architecture of Node.js. This article explores how technologies like RabbitMQ and Apache Kafka, integrated with Node.js, are used to efficiently manage large volumes of real-time data.
RabbitMQ and Apache Kafka are message queue systems that enable the processing of continuous and high-volume data streams, essential for applications requiring real-time analysis and response. RabbitMQ is known for its low latency and ability to send thousands of messages per second, making it ideal for tasks that require quick point-to-point message delivery. On the other hand, Kafka is designed to handle extremely high throughput, transmitting millions of messages per second, making it suitable for processing large volumes of real-time data.
The architecture of Node.js, with its non-blocking I/O operations and event-driven model, complements these queue systems perfectly. This allows Node.js applications to efficiently consume data from Kafka or RabbitMQ while maintaining the ability to scale and respond quickly to real-time data demands. The integration of Node.js with these queues facilitates the construction of resilient and scalable systems that can handle the demands of modern data streaming applications, such as those used by big tech companies to monitor and react to events in real-time.
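As a tiny, stdlib-only illustration of that event-driven model (the names here are invented for the example), a consumer registers a handler and the runtime invokes it once per message instead of blocking on a read call:

```typescript
import { EventEmitter } from 'events';

// A stand-in for a queue consumer: handlers run per message via the event
// loop, so the process is never blocked waiting on a read.
const bus = new EventEmitter();
const received: string[] = [];

bus.on('message', (msg: string) => received.push(msg));

// Simulate three messages arriving; over a real broker connection these
// would be delivered asynchronously by the socket.
['1', '2', '3'].forEach((n) => bus.emit('message', n));

console.log(received); // ['1', '2', '3']
```

This is the same emitter mechanism the `Queue` class later in the article builds on.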
Building a Real-Time Data Consumption Architecture with Node.js and RabbitMQ
In this practical example, we will build a simple application that serves as a proof of concept for how big tech companies handle high volumes of real-time data using Node.js and RabbitMQ. The application consumes random numbers that another application pushes into a queue and displays them in real time on the frontend of users connected via WebSocket, as shown in the diagram below:
The architecture consists of a backend in Node.js that connects to RabbitMQ to consume data from the queue and transmit it to connected clients via WebSockets. This approach allows data to be processed and distributed efficiently and in real-time, ensuring that users receive the information without delays, even in high-demand scenarios. We will explore how this architecture works and how it can be implemented to solve scalability and real-time data processing issues.
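The producer side (the "other application" that feeds the queue) is not shown in the article; a minimal sketch using amqplib might look like the following. The connection URL, queue name, and interval are assumptions for illustration, and running it requires a live RabbitMQ broker:

```typescript
import amqp from 'amqplib';

// Hypothetical producer. URL and queue name are assumptions — match them
// to your own RabbitMQ setup and the consumer's constants.
const QUEUE_URL = 'amqp://guest:guest@localhost:5672';
const QUEUE_NAME = 'numbers';

async function produce() {
  const connection = await amqp.connect(QUEUE_URL);
  const channel = await connection.createChannel();
  await channel.assertQueue(QUEUE_NAME, { durable: true });

  // Push a random number into the queue every second.
  setInterval(() => {
    const n = Math.floor(Math.random() * 100);
    channel.sendToQueue(QUEUE_NAME, Buffer.from(n.toString()));
  }, 1000);
}

produce().catch(console.error);
```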
Connecting Node.js to a RabbitMQ Queue: The First Step
To start integrating Node.js with RabbitMQ, we will use a class called Queue, which encapsulates all the necessary logic to connect to a RabbitMQ queue, consume messages, and emit events to other parts of the application. This class is built on top of the amqplib library, which is one of the most popular libraries for working with RabbitMQ in Node.js.
Code:
import amqp from 'amqplib'
import { EventEmitter } from 'events' // EventEmitter lives in 'events', not 'stream'
import { QUEUE_HOST, QUEUE_NAME, QUEUE_PASS, QUEUE_PORT, QUEUE_USER, QUEUE_VHOST } from './constants'

export class Queue extends EventEmitter {
  private url: string
  private channel: amqp.Channel | null
  private static instance: Queue

  constructor() {
    super()
    this.url = `amqp://${QUEUE_USER}:${QUEUE_PASS}@${QUEUE_HOST}:${QUEUE_PORT}${QUEUE_VHOST}`
    this.channel = null
  }

  // Singleton accessor: every caller shares the same broker connection
  static getInstance() {
    if (!this.instance) {
      this.instance = new Queue()
    }
    return this.instance
  }

  addEvent(event: string, cb: (...args: any[]) => void): void {
    this.on(event, cb)
  }

  // Consume messages from `queueName`, re-emitting each one as `event`
  async consumeQueue(queueName: string, event: string) {
    if (!this.channel) {
      throw new Error('Channel not initialized')
    }
    await this.channel.consume(queueName, (msg) => {
      if (!msg) {
        return
      }
      const content = msg.content.toString()
      this.emit(event, content)
      this.channel!.ack(msg) // acknowledge only after the event has been emitted
    })
  }

  async connect() {
    try {
      const connection = await amqp.connect(this.url)
      const channel = await connection.createChannel()
      this.channel = channel
      await this.assertQueue(channel, QUEUE_NAME)
      console.log('Connected to RabbitMQ')
    } catch (err) {
      console.error(err)
    }
  }

  // Declare the queue idempotently; durable so messages survive broker restarts
  private async assertQueue(channel: amqp.Channel, queue: string, option?: amqp.Options.AssertQueue) {
    const defaultOption: amqp.Options.AssertQueue = {
      exclusive: false,
      durable: true,
      autoDelete: false,
      arguments: null
    }
    await channel.assertQueue(queue, { ...defaultOption, ...option })
  }

  async sendToQueue(data: string | number) {
    if (!this.channel) {
      throw new Error('Channel not initialized')
    }
    this.channel.sendToQueue(QUEUE_NAME, Buffer.from(data.toString()))
  }
}
Code Explanation:
The Queue class is designed to be a singleton, ensuring that only one instance of the connection to RabbitMQ is created during the application's lifecycle. This is important to avoid multiple unnecessary connections, which could overload the message server.
With this Queue class, we establish a solid foundation for connecting our Node.js application to a RabbitMQ queue. This allows us to consume and process real-time data efficiently and at scale. In the next step, we will integrate this queue with the rest of the application, enabling random numbers to be consumed and displayed in real-time on the frontend of users connected via WebSocket.
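The singleton mechanics can be seen in isolation with a minimal stdlib-only sketch. The `Registry` class here is invented for the example (it uses a private constructor, a stricter variant than `Queue`, whose constructor stays public), but `Queue.getInstance()` behaves the same way:

```typescript
// Minimal illustration of the singleton pattern used by Queue:
// getInstance() always hands back the same object, so every part of
// the app shares one broker connection.
class Registry {
  private static instance: Registry;

  // Private constructor makes the pattern airtight; the article's Queue
  // keeps its constructor public for simplicity.
  private constructor() {}

  static getInstance(): Registry {
    if (!Registry.instance) {
      Registry.instance = new Registry();
    }
    return Registry.instance;
  }
}

const a = Registry.getInstance();
const b = Registry.getInstance();
console.log(a === b); // true — one shared instance
```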
Step two: Implementing the Real-Time Data Streaming Service
Code:
import { PassThrough, Readable } from "stream"
import { Queue } from "./Queue"
import { QUEUE_NAME } from "./constants"

export class StreamService {
  private dataStream: Readable

  constructor(private queueService: Queue) {
    // Push-based source stream: data is pushed in from queue events,
    // so the _read implementation is a no-op
    this.dataStream = new Readable({
      read() { }
    })
    const event = 'dataStream'
    this.queueService.consumeQueue(QUEUE_NAME, event).catch(console.error)
    this.queueService.on(event, (data) => {
      console.log('data', data)
      this.dataStream.push(data)
    })
  }

  // Give each client its own PassThrough and pipe the source into it
  public getData() {
    const clientStream = this.createClientStream()
    this.dataStream.pipe(clientStream)
    return {
      clientStream
    }
  }

  private createClientStream() {
    return new PassThrough()
  }
}
Code Explanation:
The StreamService class is a central component in the real-time data streaming architecture, responsible for consuming messages from the RabbitMQ queue and transforming them into a continuous data stream that can be transmitted to connected clients. This class utilizes Node.js's Streams API, which is highly efficient for handling real-time data.
The StreamService class is essential for transforming the messages consumed from the RabbitMQ queue into a continuous data stream that can be easily transmitted to clients. By utilizing Node.js's Streams API, StreamService ensures that data is processed efficiently and in real-time, allowing the application to handle large volumes of data without compromising performance. In the next step, we will see how to integrate this class with the controller that manages WebSocket connections, enabling the data to be displayed in real-time on the users' frontend.
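The fan-out mechanics can be reproduced with nothing but Node's stdlib. Below is a sketch (all names invented for the example) of one push-based source `Readable` feeding two per-client `PassThrough` streams, mirroring what `StreamService` does:

```typescript
import { PassThrough, Readable } from 'stream';

// One push-based source, as in StreamService: _read is a no-op because
// data is pushed in from the outside (queue events in the real service).
const source = new Readable({ read() {} });

// Each "client" gets its own PassThrough, as in createClientStream()
const clientA = new PassThrough();
const clientB = new PassThrough();
source.pipe(clientA);
source.pipe(clientB);

const seenByA: string[] = [];
const seenByB: string[] = [];
clientA.on('data', (chunk) => seenByA.push(chunk.toString()));
clientB.on('data', (chunk) => seenByB.push(chunk.toString()));

// A message consumed from the queue would be pushed here; both client
// streams receive it asynchronously.
source.push('42');
```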
Step Three: Controlling Data Flow with the StreamController
Code:
import { Writable } from "stream"
import { Socket } from "socket.io"
import { StreamService } from "./service"

export class StreamController {
  constructor(private streamService: StreamService) { }

  // Pipe the per-client stream into a Writable that forwards each chunk
  // to the client over its WebSocket channel
  public getData(socket: Socket, channel: string) {
    const { clientStream } = this.streamService.getData()
    const writeStream = this.writeStreamOnObj((chunk) => {
      socket.emit(channel, chunk)
    })
    clientStream.pipe(writeStream)
    return {
      onClose: () => {
        console.info(`closing connection of ${socket.id}`)
      }
    }
  }

  // Wrap a callback in an object-mode Writable so it can be a pipe target
  private writeStreamOnObj(func: (chunk: any) => void) {
    return new Writable({
      objectMode: true,
      write(chunk, encoding, callback) {
        func(chunk)
        callback()
      }
    })
  }
}
The StreamController is the final piece of the architecture that connects the data streaming service (StreamService) to clients that connect via WebSocket. It is responsible for managing client connections, ensuring that real-time data is transmitted efficiently and continuously to each connected client.
The StreamController plays a crucial role in the real-time data streaming architecture, acting as the intermediary between the StreamService and the clients connected via WebSocket. It ensures that data is transmitted continuously and efficiently, managing client connections so that each one receives real-time data without interruptions. With this structure, the application can handle multiple simultaneous clients while maintaining the scalability and efficiency required for modern data streaming applications.
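The bridge from stream to socket can also be exercised without a real WebSocket. In the sketch below, `fakeSocket` (invented for the example) stands in for a socket.io socket, and the Writable has the same shape as `writeStreamOnObj`:

```typescript
import { PassThrough, Writable } from 'stream';

// Stand-in for a socket.io socket: records what would be sent to the client
const sent: Array<[string, string]> = [];
const fakeSocket = {
  emit: (channel: string, chunk: Buffer) => sent.push([channel, chunk.toString()]),
};

// Same pattern as writeStreamOnObj: an object-mode Writable that forwards
// each chunk to the socket and signals completion via the callback.
const writeStream = new Writable({
  objectMode: true,
  write(chunk, _encoding, callback) {
    fakeSocket.emit('stream_data', chunk);
    callback();
  },
});

const clientStream = new PassThrough();
clientStream.pipe(writeStream);
clientStream.write('7'); // each chunk becomes one 'stream_data' socket event
```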
Step Four: Managing WebSocket Connections for Real-Time Data Streaming
Code:
import { Server } from "socket.io"
import { StreamController } from "./controller"
import { StreamService } from "./service"
import { Queue } from "./Queue"

export const mountRouter = async (io: Server) => {
  // Reuse the shared Queue instance so the app holds a single broker connection
  const queue = Queue.getInstance()
  await queue.connect()
  const streamService = new StreamService(queue)
  const streamController = new StreamController(streamService)

  io.on("connection", async (socket) => {
    console.log(`user connected: ${socket.id}`)
    const { onClose } = streamController.getData(socket, 'stream_data')

    socket.on("disconnect", () => {
      console.log(`user disconnected: ${socket.id}`)
      onClose()
    })
  })
}
The provided code is responsible for managing WebSocket connections and integrating the real-time data stream with connected clients. It begins by establishing a connection to RabbitMQ through the Queue class, followed by the creation of instances of StreamService and StreamController, which are responsible for consuming and transmitting the data. When a client connects via WebSocket, the StreamController initiates the real-time data transmission to the client through the stream_data channel. The code also handles client disconnections, ensuring that resources are properly released by calling the onClose function when the client disconnects. This ensures that the system continues to operate efficiently, even with multiple connections and disconnections.
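For completeness, a possible entry point tying the pieces together is sketched below. The module name, port, CORS setting, and the name of the mount function are assumptions about how the project is laid out, and running it requires socket.io and a reachable RabbitMQ broker:

```typescript
import { createServer } from 'http';
import { Server } from 'socket.io';
import { mountRouter } from './router'; // assumed module name for the code above

const httpServer = createServer();
const io = new Server(httpServer, {
  cors: { origin: '*' }, // relaxed for the demo; tighten in production
});

mountRouter(io)
  .then(() => {
    httpServer.listen(3000, () => console.log('listening on :3000'));
  })
  .catch(console.error);
```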
Step Five: Demonstrating the Real-Time Results
Now that we have explored the architecture and implementation of the real-time data streaming system, it's time to see it all in action. In the following video, available at this LINK, you will see a practical demonstration of how the application consumes random numbers from a RabbitMQ queue and transmits them in real-time to clients connected via WebSocket. The demonstration illustrates the efficiency and smoothness of the system, showing how the data is processed and instantly displayed on the frontend, providing a responsive and dynamic user experience.
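On the client side, a minimal listener might look like the sketch below. The endpoint and element id are assumptions; only the `stream_data` channel name comes from the backend code above, and running it requires socket.io-client in a browser bundle:

```typescript
import { io } from 'socket.io-client';

// Connect to the backend and render each number as it arrives.
const socket = io('http://localhost:3000'); // assumed backend address

socket.on('stream_data', (chunk: { toString(): string }) => {
  const el = document.getElementById('numbers'); // assumed element id
  if (el) el.textContent += ` ${chunk.toString()}`;
});
```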
Conclusion
In this article, we explored how big tech companies manage high volumes of real-time data using Node.js and RabbitMQ. Through a proof of concept, we demonstrated how a message queue-based architecture can be implemented to efficiently and scalably consume, process, and distribute data. From setting up the queue to transmitting data to clients via WebSocket, each component plays a crucial role in building a robust and responsive system. With this approach, it is possible to ensure that data is processed in real-time, providing a smooth and reliable user experience, even in high-demand scenarios. For those interested in accessing the code and exploring the implementation further, the repository for this project can be found at https://github.com/joao99sb/nodejs-queue.