The Potential of SONiC, an Open-Source Network Operating System, and the Power of Active-Active Dual ToR

In the evolving landscape of networking technology, traditional routing and switching devices were once confined to tightly integrated hardware and software systems. Vendors offered closed-source proprietary stacks, restricting network operators from implementing tailored features and stalling innovation. This conventional model proved costly, time-consuming, and inherently unscalable, necessitating vendor intervention for any device modifications.

Recognizing the limitations of this approach, the industry pivoted towards a more flexible paradigm: white-box switches and versatile Network Operating Systems (NOSs) designed to accommodate multiple vendors and diverse Application-Specific Integrated Circuits (ASICs). This model, known as "disaggregated networking," represents a fundamental shift. Software and hardware are decoupled, allowing switching ASICs, such as those from Broadcom, to work with various NOS platforms, including SONiC (Software for Open Networking in the Cloud).

This shift opens the way to customization and innovation in networking and lets network operators adapt quickly to changing demands without waiting on a vendor. In this article, we explore disaggregated networking with the SONiC architecture, its benefits, and its industry implications. We then look at the Active-Active dual Top of Rack (ToR) setup and discuss how it provides load balancing and continuous operation.

SONiC (Software for Open Networking in the Cloud):

Disaggregating hardware from software in white-box switches has driven the ongoing development and maintenance of open-source Network Operating Systems (NOSs). SONiC is a Linux-based open-source NOS originally developed by Microsoft, contributed to the Open Compute Project (OCP), and currently hosted by the Linux Foundation. SONiC offers a comprehensive set of network functionality, including BGP and Remote Direct Memory Access (RDMA), and runs on switches from various vendors with different ASICs.

SONiC comprises several modules, each running either in a Docker container or on the Linux host itself. Docker containers are lightweight, self-contained packages that hold everything an application needs to run: code, runtime, system tools, libraries, and settings. In SONiC's high-level architecture, these modules run in user space. Each module has a specific role, such as handling DHCP requests, running the Link Layer Discovery Protocol (LLDP), providing the Command Line Interface (CLI) and system configuration, or running a routing stack such as FRR or Quagga.

SONiC System Architecture

Main Components:

  • SONiC comprises various network modules with distinct functions, such as BGP, LLDP, SNMP, and more.
  • The Redis database engine (the database container) serves as the central repository through which SONiC applications exchange state. It hosts several key databases:
      • APPL_DB stores application-generated state such as routes and neighbors, and is the main entry point for applications that interact with other SONiC subsystems.
      • CONFIG_DB stores configuration data, including port settings and VLAN configurations.
      • STATE_DB holds operational state used to resolve dependencies between subsystems, so that configuration referring to entities that may not yet exist in the system (for example, a port channel whose member ports are absent) is handled gracefully.
      • ASIC_DB contains the state needed to configure and operate the ASIC, in a format that is easy to map onto ASIC SDKs.
      • COUNTERS_DB tracks per-port counters and statistics, used for both local CLI requests and remote telemetry.
  • The Switch State Service (SwSS) plays a vital role in facilitating communication across various SONiC modules. It acts as a coordinator between network applications, the database, and the system kernel. One of its key components is the Orchestration Agent (Orchagent), which stands out as a critical element within the SwSS subsystem. The Orchestration Agent is responsible for interpreting control configurations and configuring the dataplane correspondingly.
  • The syncd container synchronizes the switch's network state with its hardware/ASIC. It consists of three key components: syncd, which executes the synchronization logic and links against the vendor's ASIC SDK library; the SAI API, which offers a vendor-independent interface for controlling forwarding elements; and the ASIC SDK, provided by the hardware vendor to drive its ASIC. Syncd subscribes to ASIC_DB to receive state updates from SwSS and also acts as a publisher, pushing state that originates in the hardware back to the database, so that network state and hardware stay consistent.
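
The flow described above (application writes to APPL_DB, Orchagent translates into ASIC_DB, syncd programs the hardware) can be sketched with a toy in-memory model. This is only an illustration: `ToyDB`, the callbacks, and the key names are hypothetical stand-ins, and real SONiC uses Redis with its own table schemas and the swss-common library rather than anything shown here.

```python
from typing import Callable, Dict, List, Tuple

Fields = Dict[str, str]


class ToyDB:
    """In-memory stand-in for one Redis database; NOT the real SONiC API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: Dict[str, Fields] = {}
        self._subscribers: List[Callable[[str, Fields], None]] = []

    def subscribe(self, callback: Callable[[str, Fields], None]) -> None:
        self._subscribers.append(callback)

    def set(self, key: str, fields: Fields) -> None:
        # Store the entry, then notify every subscriber (a crude pub/sub).
        self.entries[key] = fields
        for callback in self._subscribers:
            callback(key, fields)


appl_db = ToyDB("APPL_DB")
asic_db = ToyDB("ASIC_DB")

# Stands in for calls into the vendor ASIC SDK made through SAI.
programmed: List[Tuple[str, Fields]] = []


def orchagent(key: str, fields: Fields) -> None:
    """Translate an application-level route into an ASIC-level entry."""
    prefix = key.split(":", 1)[1]
    asic_db.set("ASIC_STATE:ROUTE_ENTRY:" + prefix,
                {"next_hop": fields["nexthop"]})


def syncd(key: str, fields: Fields) -> None:
    """Pretend to program the hardware with the ASIC_DB entry."""
    programmed.append((key, fields))


appl_db.subscribe(orchagent)
asic_db.subscribe(syncd)

# A routing application publishes a route (illustrative key and fields).
appl_db.set("ROUTE_TABLE:10.0.0.0/24",
            {"nexthop": "192.168.0.1", "ifname": "Ethernet0"})
```

After the final `set` call, `programmed` holds the single ASIC-level route entry, which shows how no module calls another directly: each one reacts only to changes in the database it subscribes to.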

Switch Abstraction Interface (SAI):

The Switch Abstraction Interface (SAI) serves as the backbone of the SONiC dataplane configuration. This open-source toolkit provides a vital bridge between SONiC's generic functionalities and the intricate hardware-specific details of various networking platforms. Acting as an isolation layer, SAI ensures seamless communication between SONiC and diverse switch components, enabling SONiC to function effectively across a wide array of hardware.

Switch Abstraction Interface (SAI)

SAI's significance lies in its establishment of a standardized interface dedicated to controlling switching Application-Specific Integrated Circuits (ASICs). This meticulously crafted interface offers a vendor-independent approach, allowing the management of diverse switching entities, including hardware ASICs, Network Processing Units (NPUs), and software switches, in a uniform manner. By providing a consistent method of interaction, SAI simplifies the complexity associated with managing different types of switches.
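
The idea of one vendor-independent interface with several back ends can be sketched as follows. The real SAI is a C API with its own object model; the class and method names here (`SwitchApi`, `create_route`, `VendorAAsic`, `SoftwareSwitch`) are hypothetical and chosen only to illustrate the pattern.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class SwitchApi(ABC):
    """Hypothetical vendor-independent interface (a stand-in for SAI)."""

    @abstractmethod
    def create_route(self, prefix: str, next_hop: str) -> str:
        """Install a route and return an opaque object identifier."""


class VendorAAsic(SwitchApi):
    """Pretend hardware ASIC back end that keeps routes in a table."""

    def __init__(self) -> None:
        self.table: Dict[str, str] = {}

    def create_route(self, prefix: str, next_hop: str) -> str:
        self.table[prefix] = next_hop
        return "vendor-a:" + prefix


class SoftwareSwitch(SwitchApi):
    """Pretend software switch back end that keeps routes in a list."""

    def __init__(self) -> None:
        self.routes: List[Tuple[str, str]] = []

    def create_route(self, prefix: str, next_hop: str) -> str:
        self.routes.append((prefix, next_hop))
        return "soft:" + prefix


def program_default_route(switch: SwitchApi, gateway: str) -> str:
    # The caller (the NOS) never needs to know which back end it is driving.
    return switch.create_route("0.0.0.0/0", gateway)
```

Calling `program_default_route` with either back end runs the same NOS-side code, which is the property that lets SONiC treat ASICs, NPUs, and software switches uniformly.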

SAI in System Architecture

Furthermore, SAI's architecture goes beyond standardization. It empowers network engineers by enabling the exposure of vendor-specific functionalities and extensions. This flexibility permits the customization of networking solutions according to specific requirements. As a result, SAI not only ensures interoperability and operational efficiency but also promotes innovation by allowing tailored integrations and optimizations.

SAI acts as a pivotal component, enabling SONiC to seamlessly integrate with diverse switch platforms. Its standardized approach, vendor independence, and customizable nature not only enhance SONiC's functionality but also foster innovation within the networking ecosystem.

High Availability with Active-Active Dual ToR:

A Top of Rack (ToR) switch is a single point of failure for an entire rack of servers. SONiC introduces dual ToR support to remove that single point of failure: if one ToR switch or its associated link fails, the second ToR switch keeps the servers connected.

Active-Active dual ToR support with a single NIC provides load balancing, better bandwidth utilization, and continuous operation even if one ToR switch fails. Each server has one NIC connected to the two ToRs by two 100 Gbps DAC cables, one per ToR.

Dual-ToR Cluster Topology

  1. Both the upper ToR (UT0) and the lower ToR (LT0) advertise the same VLAN IP prefix to the upstream T1s, so each T1 sees two available next hops for the VLAN.
  2. Under normal conditions, both UT0 and LT0 carry traffic.
  3. The server host is equipped with a 200 Gbps Network Interface Card (NIC), that is, two 100 Gbps ports, one towards each ToR.
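
A T1's view of this topology can be sketched as a route with two next hops. This is a minimal illustration, assuming a made-up VLAN prefix (192.168.0.0/24) and assuming that a failed ToR's advertisement is simply removed from the next-hop list; the helper names are hypothetical.

```python
import ipaddress
from typing import Dict, List

Network = ipaddress.IPv4Network


def next_hops_for(dst: str, routes: Dict[Network, List[str]]) -> List[str]:
    """Return the next hops of the longest matching prefix for dst."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routes if addr in net]
    if not matches:
        return []
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]


def withdraw(routes: Dict[Network, List[str]], prefix: Network, tor: str) -> None:
    """Remove one ToR from a prefix's next hops (assumed failure handling)."""
    routes[prefix] = [hop for hop in routes[prefix] if hop != tor]


# Hypothetical VLAN prefix advertised by both ToRs to a T1.
vlan = ipaddress.ip_network("192.168.0.0/24")
t1_routes = {vlan: ["UT0", "LT0"]}
```

With both ToRs healthy, `next_hops_for("192.168.0.10", t1_routes)` yields both UT0 and LT0; after `withdraw(t1_routes, vlan, "LT0")` only UT0 remains, so traffic for the VLAN keeps flowing through the surviving ToR.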

Active-Active IO Model

Server Functionalities:

The server NIC plays a critical role in managing traffic flow ensuring both high availability and efficient communication between ToR devices and applications running on the server host.

  1. Southbound Traffic (ToRs to Server) Handling: The server NIC is responsible for delivering southbound traffic from Tier 0 devices to the server. In this scenario, the ToRs present identical IP addresses and MAC addresses to the server on both links.
  2. Northbound Traffic (Server to ToRs) Handling: For northbound traffic originating from applications on the server and destined for Tier 0 devices, the server NIC distributes the traffic between the two active links at the IO stream level. Each stream, identified by its 5-tuple, is dispatched to one of the two uplinks. This load balancing makes use of the bandwidth of both links. The server NIC also continuously monitors link state: when a link goes down, it redirects that link's traffic to the remaining link to maintain connectivity.
  3. gRPC Control: gRPC (a high-performance, open-source RPC framework) is used to control traffic forwarding. Each ToR is assigned a well-known IP address, and the server NIC dispatches gRPC replies destined for these addresses to the corresponding uplink. This keeps control messages on the link to the intended ToR and enables dynamic traffic management based on network conditions.
  4. Traffic Replication to Both ToRs: In addition to managing regular data traffic, the server NIC replicates specific types of northbound traffic to both ToRs. This includes ICMP replies used for probing link health status, ARP propagation for address resolution, and IPv6 router solicitation, neighbor solicitation, and neighbor advertisements. By replicating these essential control messages, the server NIC enhances network reliability, enabling accurate monitoring of link health and efficient resolution of network address-related queries.
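
The northbound behavior in the list above (per-stream 5-tuple dispatch, failover on link state change, and replication of selected control traffic) can be sketched as one selection function. This is a sketch under assumptions: the SHA-256 hash, the traffic-kind labels, and replicating only to links that are currently up are illustrative choices, not the NIC's actual implementation.

```python
import hashlib
from typing import List, NamedTuple


class Flow(NamedTuple):
    """The 5-tuple that identifies one IO stream."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


# Hypothetical labels for the traffic types the article says are replicated.
REPLICATED_KINDS = {"icmp-reply", "arp", "ipv6-rs", "ipv6-ns", "ipv6-na"}


def pick_uplinks(flow: Flow, kind: str, link_up: List[bool]) -> List[int]:
    """Return the indexes of the uplinks this packet should be sent on."""
    healthy = [index for index, up in enumerate(link_up) if up]
    if not healthy:
        return []  # no usable uplink
    if kind in REPLICATED_KINDS:
        return healthy  # replicate to every ToR that is reachable
    # Hash the 5-tuple so all packets of one stream use the same uplink.
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    choice = int.from_bytes(digest[:4], "big") % len(healthy)
    return [healthy[choice]]
```

Because the hash depends only on the 5-tuple, a given stream stays on one uplink while both links are up, and when one link fails the list of healthy links shrinks to one, so every stream moves to the surviving link.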

In summary, the server NIC acts as a pivotal component in this network architecture, ensuring robust southbound and northbound traffic handling, dynamic control through gRPC communication, and intelligent replication of critical control messages to both ToRs.

ToR Functionalities:

  1. Active-Active Mode: SONiC on the ToRs adds an active-active mode to the MUX state machine.
  2. Link Health Monitoring: SONiC continuously probes link health using ICMP so that link failures are detected promptly.
  3. NIC Signaling: SONiC informs the NIC of ToR state changes (for example, active -> standby) so that the NIC knows which links can carry traffic.
  4. Peer ToR Failover: SONiC switches traffic to the healthy ToR if its peer ToR fails.
  5. Unblocking Traffic: SONiC restores traffic flow if the cable control channel becomes unreachable.
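
The link health monitoring and state change described above can be illustrated with a very small state machine. This is a simplified sketch, not SONiC's actual MUX state machine: the `LinkProber` class, the threshold of three missed probes, and the immediate return to active on the first reply are all assumptions made for illustration.

```python
class LinkProber:
    """Toy per-port health tracker driven by ICMP probe results."""

    def __init__(self, miss_threshold: int = 3) -> None:
        # Assumed policy: this many consecutive misses mark the link unhealthy.
        self.miss_threshold = miss_threshold
        self.missed = 0
        self.state = "active"

    def on_probe(self, reply_received: bool) -> str:
        """Record one probe result and return the resulting port state."""
        if reply_received:
            self.missed = 0
            self.state = "active"
        else:
            self.missed += 1
            if self.missed >= self.miss_threshold:
                self.state = "standby"
        return self.state
```

In this sketch a port stays active through isolated probe losses and moves to standby only after three consecutive misses, at which point the ToR would signal the state change to the NIC.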

Networking technology has shifted from traditional closed-source systems to disaggregated networking. This paradigm, marked by the decoupling of software and hardware, has opened the door to customization and innovation across the industry. At the heart of this transformation stands SONiC (Software for Open Networking in the Cloud), a Linux-based open-source Network Operating System originally developed by Microsoft and contributed to the Open Compute Project (OCP). In the dual ToR design, the server NIC takes center stage: it manages traffic flow, enables dynamic control through gRPC communication, and replicates essential control messages to both ToRs. Together, these pieces let network operators meet changing demands without vendor constraints while gaining flexibility, scalability, and reliability.
