The Sweet Spots of Disaggregation in Networking

Networking, whether container, intra-data center, or wide area, plays a vital role in distributed cloud computing. Much-touted disaggregation efforts have often failed to catch on because of the cost and complexity of integration. The following is a high-level recap of how the industry is stealthily finding “sweet spots” of disaggregation in networking to unlock value, including some key tech trends to watch.

Fig. 1: Vertical cross-section of networking at play

1. Container Networking

The scalability, agility, and resilience of modern applications rest on microservices (routable deployment units) that run in containers (isolated running instances), disaggregated from the underlying physical infrastructure. The meteoric success of Kubernetes container orchestration was largely due to its open, multi-vendor-friendly, disaggregated approach, which made it easy for developers to build code, even though key tenets such as auto-scaling, self-healing, and extensibility preceded Kubernetes. The webscalers were quick to embrace open-source Kubernetes to deliver a variety of hardened offerings with auto-scaling (e.g., GKE, EKS, and AKS, alongside Amazon's own ECS orchestrator) and further enhanced their offerings with serverless and language flexibility (e.g., Cloud Run, Fargate).

Service mesh (SM) further complemented container networking by leveraging open-source developments such as Istio to deliver simplified observability, traffic management, and policy-driven security. As more demanding, high-performance workloads emerged, network service mesh (NSM) was added to deliver assured Layer 2/3 pod networking.

Disaggregated functions implemented as software constructs, coupled with abstractions that orchestrate their placement on the right compute hardware at scale, have helped extract enormous value and shorten time to market (TTM).
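
The auto-scaling tenet mentioned above can be illustrated with the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas x current metric / target metric). The following is only a minimal sketch of that rule, not Kubernetes code. The function name, the replica bounds, and the sample metric values are assumptions made for illustration, and the 10% tolerance band is an assumed setting.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count suggested by a proportional autoscaling rule.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    ratio = current_metric / target_metric
    # Inside the tolerance band, leave the deployment alone to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    proposed = math.ceil(current_replicas * ratio)
    # Clamp the proposal to the operator-configured bounds.
    return max(min_replicas, min(max_replicas, proposed))


if __name__ == "__main__":
    # Assumed sample: 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

The point of the sketch is that the scaling decision depends only on an observed metric and a target, not on which physical hosts the pods land on, which is what lets the orchestration layer stay disaggregated from the hardware.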

2. Inside Data Center Networking

Over 70% of data traffic today is intra-data center (aka east-west traffic). This is due to the explosive growth in parallel batch processing, driven by AI/ML and data center (DC) data replication. To support this east-west traffic, cloud operators have built extensive spine-leaf architectures inside DCs that are now beginning to present scaling, resource stranding, and power consumption challenges.

  • Infra workload off-load: Back in the late 1980s, CPUs were ~20 times faster than networking. Fast-forward to today, and networks are ~30 times faster than CPUs, which can no longer keep up with networking speeds. The industry had to find a way to free up the revenue-generating CPUs from transport and infra tasks. Enter DPUs: data processing units specialized for high-speed infra processing. DPUs are now delivering composable hyper-disaggregation, enabling compute and storage to grow separately and unlocking immense value (e.g., NVIDIA BlueField, Fungible F1, AWS Nitro, Marvell, Intel IPU).
  • Photonic Processing and Interconnect: Most recent advances in deep learning have been enabled by GPUs and AI accelerators such as tensor processing units (TPUs). The biggest challenge in running the matrix multiplications of a pre-trained neural network (aka inference) is excessive power consumption. In fact, 80% of this power budget is spent on electrical connections and data transfers within the ASIC. This is where neural network processing based on silicon photonic integrated circuits is beginning to gain traction, given its high speeds and ultra-low (1/10) power consumption (e.g., Lightmatter).
  • Hardware Disaggregation: The DC industry was the first to disaggregate bare-metal switch fabrics in order to ride the merchant silicon cost curve and use pluggable optics. As switch ASICs evolve from 12.8T to 25.6T, 51T, and 102T, pluggable optics are trying to keep pace with 200G/400G/800G transport rates. For inter-campus applications, coherent pluggables at 1.6T and 3.2T are also being explored. In parallel, the industry continues to invest in linear direct drive and co-packaged optics (CPO), where optical transceivers are integrated next to the switch ASIC to reduce power consumption. Another noteworthy recent development inside DC networking is the coming of age of software-defined, MEMS-based all-optical circuit switches (OCS) that substitute for the electrical spine switch layer, enabling seamless multi-speed connectivity and reduced power consumption.
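
Why pluggable rates have to keep pace with switch ASICs becomes clearer with some simple arithmetic: at a fixed port rate, every doubling of ASIC capacity doubles the number of ports that must fit on the faceplate. The sketch below assumes nominal capacities of 51.2T and 102.4T for the "51T" and "102T" generations and ignores breakout, oversubscription, and management ports. All names are hypothetical.

```python
def ports_per_asic(asic_gbps, port_gbps):
    """Number of full-rate ports one switch ASIC can feed (integer Gb/s inputs)."""
    if asic_gbps <= 0 or port_gbps <= 0:
        raise ValueError("capacities must be positive")
    return asic_gbps // port_gbps


# Assumed nominal capacities in Gb/s: 12.8T, 25.6T, 51.2T, 102.4T.
ASIC_GENERATIONS_GBPS = [12_800, 25_600, 51_200, 102_400]
# Pluggable port rates named in the text, in Gb/s.
PORT_RATES_GBPS = [200, 400, 800]


def radix_table():
    """Map each (ASIC capacity, port rate) pair to its port count."""
    return {
        (asic, port): ports_per_asic(asic, port)
        for asic in ASIC_GENERATIONS_GBPS
        for port in PORT_RATES_GBPS
    }


if __name__ == "__main__":
    for (asic, port), count in radix_table().items():
        print(f"{asic / 1000:.1f}T ASIC at {port}G per port: {count} ports")
```

Under these assumptions a 12.8T ASIC needs 32 ports at 400G, while a 51.2T ASIC needs 64 ports even at 800G, which is one way to see the pressure on optics power and faceplate density that motivates co-packaged approaches.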


3. Wide Area Networking

Telecom (the foundational fabric of the modern digital economy) has been relatively slow in unlocking the sweet spots, hindered at times by internal organizational silos. Whether the workloads are consumer, enterprise, or even telco network functions, chances are that they are all hosted over distributed hybrid multi-cloud environments. The WAN underlays need to adapt to support these evolving overlays.

  • Protection and Restoration – Moving Up: Today, nearly all WAN traffic is IP packets encapsulated in Ethernet frames (vs. myriad protocols just a few years back). Layer 3 protection and restoration have matured significantly in recent years in terms of scale, granularity, and reliability. This is enabling operators to simplify their WANs and improve efficiency by letting the router control plane take responsibility for protection/restoration while leaving the optical layer to provide flexible capacity scaling (vs. having to deal with complex multi-layer control plane coordination). That said, the much-touted IP-optical convergence is not just about putting plugs inside routers and running all traffic hop by hop. For flexibility, TCO savings, and "blast radius" management, many networks will continue to require ROADMs, some level of optical protection, careful traffic planning, and platform isolation. It will be a journey in which transponders in compact modular chassis (aka stackable disaggregated transponder units) continue to support the latest high-performance coherent engines, while the miniaturized, power-optimized plugs trail by a couple of years. Subsea and long-haul will be dominated by transponders. Metro and DCI will see increased adoption of disaggregation with the arrival of OpenZR+ standard-based 400G ZR+ plugs. The most coherent-plug activity is anticipated in the access and aggregation segment, with a variety of options such as 100G/200G/400G, XR optics, and PON.
  • Reaching the Shannon Limit: The coherent optical industry is currently at Gen90 (also known as the fifth generation), with baud rates of ~90-107 Gbaud. The next couple of generations of coherent development (140-200+ Gbaud) are expected to get increasingly difficult as each iteration comes extremely close to Shannon’s theoretical limit. It is therefore likely that only a few vendors, those with in-house DSP capabilities, vertically integrated optical front ends, and holistic co-design capabilities, are positioned to lead the coherent race. This may further catalyze the disaggregation of host platforms from coherent engines.
  • Open Optical Line Systems: The industry is still reeling from supply chain shocks. Operators are actively diversifying their supply chains by qualifying multiple transponder/plug vendors to run on their optical line systems. Operators are also trying to ensure that their vendors have sufficient upstream component supply and fabrication foundry diversity. This will also enable operators to free themselves from closed systems and make the most of coherent innovations for improved TCO. As per the 2022 Heavy Reading survey, nearly half of the operators surveyed intend to deploy open line systems by the end of 2023, and 47% strongly favor multi-vendor transponder/plug deployments across a single-vendor-supplied standalone open line system (OLS). Operators and vendors recognize that there is still more work to be done in the areas of multi-vendor wavelength management and troubleshooting.
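
The Shannon limit referred to in the bullets above can be made concrete with the Shannon-Hartley bound for an idealized additive-white-Gaussian-noise channel, C = B log2(1 + SNR). The sketch below is a rough illustration under stated assumptions: it treats the usable bandwidth in GHz as equal to the symbol rate in Gbaud, assumes two polarizations, and ignores fiber nonlinearity, FEC overhead, and implementation penalties. The function names and sample values are hypothetical.

```python
import math


def shannon_capacity_gbps(bandwidth_ghz, snr_db, polarizations=2):
    """Upper bound on throughput (Gb/s) for an idealized AWGN channel."""
    snr_linear = 10 ** (snr_db / 10)
    return polarizations * bandwidth_ghz * math.log2(1 + snr_linear)


def snr_db_needed(bits_per_hz_per_pol):
    """SNR (dB) required to reach a spectral efficiency per polarization."""
    return 10 * math.log10(2 ** bits_per_hz_per_pol - 1)


if __name__ == "__main__":
    # Assumed sample: bandwidth equal to 100 and 200 Gbaud at the same SNR.
    for gbaud in (100, 200):
        print(gbaud, "Gbaud:", round(shannon_capacity_gbps(gbaud, 15.0)), "Gb/s bound")
    # Extra SNR needed to go from 5 to 6 bit/s/Hz per polarization.
    print(round(snr_db_needed(6) - snr_db_needed(5), 2), "dB")
```

In this idealized model each additional bit/s/Hz costs roughly 3 dB of SNR, whereas capacity grows linearly with bandwidth. That is one way to read why successive generations lean on higher baud rates, and why each step demands tighter co-design of the DSP and the optical front end.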

Network disaggregation should not be approached as a solution in search of a problem. As can be seen, the three areas of networking (container, DC, and WAN) are adopting their respective sweet spots of disaggregation at their own pace, as long as they unlock value.

As content consumption increases in the current application-driven networking paradigm, a holistic approach to these three areas of networking will help achieve better system balance and improved efficiencies to unlock further value.


Ryan Perera

Opinions expressed in this article are the author's own and do not necessarily reflect the views of the employer.
