KLAT2’s Flat Neighborhood Network
Michael Orr
Computer Networking Architect/Product Manager/Customer Liaison, Due-Diligence, Tech Scout/Ambassador/evangelist, Inventor (40+ Patents), Dad, Martial Arts practitioner (Karate, Aikido), Stock & Option Trader
Original Article here: Citeseerx or https://aggregate.org/FNN/
Who: U. of Kentucky
TL;DR for the busy Genius:
Old (circa 2000) article about implementing a low-latency network to interconnect a DIY HPC cluster built from 64 servers at the University of Kentucky. Since they could not afford high-radix switches, but still needed the lowest-latency network they could build, they invented Flat Neighborhood Networks (FNNs), which are still cited by Google’s “Jupiter Rising” paper in SIGCOMM 2015 (which is what led me to read and summarize this article in the first place).
This article is an interesting read, and shows ingenuity in action.
To build a network within their limited budget, the authors had to develop a new network topology that delivers single-switch latency and up to 25.6 Gb/s of bisection bandwidth (using 100 Mb/s NICs!)
Key idea: it is “enough” if any two nodes (any two servers) share at least one common switch, so that there is a 1-hop path between them.
For example, 6 servers, each with 2 NIC ports, can be interconnected with four 5-port switches, uplinks included, as shown.
For larger cases it is hard to come up with an FNN by hand, so they implemented a Genetic Algorithm that searches for solutions. These solutions are wiring maps: which server NIC port goes to which switch port. If the traffic pattern is known (which servers talk to each other more), the FNN can be customized to support it by giving the algorithm suitable parameters. Below are two examples of FNNs for a 64-server cluster, each server with 4 NIC ports, using 32-port switches.
(Kudos!)
KLAT2 FNN Summary
“KLAT2” = Kentucky Linux Athlon Testbed 2, a DIY, circa-2000 supercomputer built from 64 PCs (+ 2 hot-standby spares). To save money, they had to develop a new network topology that delivers single-switch latency and up to 25.6 Gb/s of bisection bandwidth (using 100 Mb/s NICs!)
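That headline number is presumably just the aggregate NIC bandwidth: 64 servers × 4 NICs × 100 Mb/s = 25.6 Gb/s, i.e. the network lets every server drive all four of its links at full rate at once.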
Goal: the lowest latency possible, i.e. a minimum of 1 switch hop. For 64 servers (+ 2 spares) that would mean a single 66-port switch, which they could not afford.
As the next-best solution, they went DIY and built the equivalent from smaller, cheaper switches.
Key Idea: It is enough for any two servers to share (at least) one common Switch. (They don’t ALL have to share a switch)
Each switch defines a local “neighborhood” (a subnet), hence the name “Flat Neighborhood Network” (FNN). They used 4 NICs per PC and low-radix (i.e. cheap) switches.
As a small example, consider a cluster made from six 2-NIC servers and four 5-port switches. A suitable FNN looks like the drawing on the right:
Any 2 nodes have at least one switch in common, and the cluster as a whole has 2 outgoing uplinks.
Note that some X->Y streams have more than one option, e.g. A<->B can send over switch 0, switch 1, or both.
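To make the flat-neighborhood property concrete, here is a minimal sketch (mine, not the paper’s): it encodes one plausible wiring for the six-server example, with three 5-port switches acting as neighborhoods and one port per switch left free for an uplink toward a fourth switch, and then checks that every pair of servers shares a switch. The exact port assignments are an assumption, since the drawing is not reproduced here.

```python
from itertools import combinations

# Server -> the set of switches its two NICs plug into.
# HYPOTHETICAL layout, consistent with the description above: S0-S2 are the
# neighborhood switches (4 server ports + 1 uplink port each), and a fourth
# switch aggregates the uplinks.
wiring = {
    "A": {"S0", "S1"}, "B": {"S0", "S1"},
    "C": {"S1", "S2"}, "D": {"S1", "S2"},
    "E": {"S0", "S2"}, "F": {"S0", "S2"},
}

# The flat-neighborhood property: every pair of servers shares at least one
# switch, i.e. there is always a single-switch (1-hop) path between them.
for a, b in combinations(sorted(wiring), 2):
    shared = wiring[a] & wiring[b]
    assert shared, f"{a} and {b} have no single-switch path"
    print(f"{a}<->{b}: 1 hop via {sorted(shared)}")
```

Running it also shows the redundancy mentioned above: A<->B comes out with both S0 and S1 as options.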
- Only small FNNs can be designed by hand. They built a Genetic Algorithm to find FNNs for larger sets of nodes. It generates the wiring pattern while also optimizing secondary properties for specific types of communication traffic.
“The complexity of the design problem does explode when a larger system is being designed with additional, secondary, optimization criteria (such as maximizing the number of switches shared by PCs that communicate in various patterns). Because this search space is very large and the optimization criteria are very general (often requiring simulation of each potential network design), use of a genetic search algorithm is much more effective than other means”
- FNNs do not (necessarily) have good wiring locality, so stringing the cables can be a challenge
- Routing between nodes: FNNs can have some unusual properties, e.g. a network-address query for a given PC will return different answers depending on which PC you ask from (see the toy sketch after this list)
- While an FNN guarantees BW and latency along SOME paths (the ones via the single common switch hop), it also has additional paths without that guarantee. Taking advantage of these extra paths is not simple; you need a sort of LAG
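Here is a toy illustration (my own sketch, not code or addressing from the paper) of why the same PC can resolve to different addresses depending on who asks: each switch is its own subnet, so the useful answer is the destination’s NIC on a switch the asker also sits on. Subnet prefixes, host numbering, and the wiring are all hypothetical, reusing the six-server layout from the sketch above.

```python
# Each switch (neighborhood) is its own subnet -- prefixes made up here.
SUBNET = {"S0": "10.0.0.", "S1": "10.0.1.", "S2": "10.0.2."}
WIRING = {  # server -> switches its NICs plug into (same toy layout as above)
    "A": {"S0", "S1"}, "B": {"S0", "S1"},
    "C": {"S1", "S2"}, "D": {"S1", "S2"},
    "E": {"S0", "S2"}, "F": {"S0", "S2"},
}
HOST_ID = {name: i + 1 for i, name in enumerate(sorted(WIRING))}

def address_of(dst: str, asker: str) -> str:
    """dst's IP as seen from asker: use a switch (subnet) they share."""
    shared = WIRING[dst] & WIRING[asker]
    if not shared:
        raise ValueError("not a flat neighborhood: no common switch")
    switch = min(shared)  # any shared switch gives the 1-hop guarantee
    return SUBNET[switch] + str(HOST_ID[dst])

print(address_of("C", asker="A"))  # C via the S1 subnet -> 10.0.1.3
print(address_of("C", asker="E"))  # the same C via the S2 subnet -> 10.0.2.3
```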
Main Inputs to their Genetic Algorithm
- # of nodes, and MAX # of NICs per node (nodes do not all have to have the same # of NICs)
- List of available switches by Radix
- Designer-supplied evaluation function that judges each possible solution
(The algorithm creates dummy NICs and/or switches to allow uneven use of the real NICs/switches; a toy sketch of this kind of search is shown below.)
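For flavor, here is a heavily simplified sketch of this kind of genetic search. It is not the authors’ GA: the problem sizes, the port-budget penalty, the “hot pairs” secondary criterion, and the mutation-only loop are all illustrative assumptions; the real algorithm also used crossover and much richer, sometimes simulation-based, evaluation functions.

```python
import random
from itertools import combinations

NODES, NICS_PER_NODE = 16, 2          # toy sizes, not KLAT2's 64 x 4
SWITCHES, PORTS_PER_SWITCH = 6, 8
HOT_PAIRS = {(0, 1), (2, 3)}          # pairs we'd like to favor (assumed)

def random_genome():
    # genome[node] = tuple of switches that node's NICs plug into
    return [tuple(random.randrange(SWITCHES) for _ in range(NICS_PER_NODE))
            for _ in range(NODES)]

def fitness(genome):
    # Penalize exceeding the switch port budgets.
    load = [0] * SWITCHES
    for nics in genome:
        for s in nics:
            load[s] += 1
    over = sum(max(0, l - PORTS_PER_SWITCH) for l in load)
    score = -10.0 * over
    # Primary objective: every pair of nodes shares at least one switch.
    # Secondary objective: hot pairs share as many switches as possible.
    for a, b in combinations(range(NODES), 2):
        shared = len(set(genome[a]) & set(genome[b]))
        score += 1.0 if shared else 0.0
        if (a, b) in HOT_PAIRS:
            score += 0.5 * shared
    return score

def mutate(genome):
    child = [list(n) for n in genome]
    node = random.randrange(NODES)
    child[node][random.randrange(NICS_PER_NODE)] = random.randrange(SWITCHES)
    return [tuple(n) for n in child]

# Very plain truncation-selection loop with mutation only.
population = [random_genome() for _ in range(50)]
for generation in range(200):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    population = parents + [mutate(random.choice(parents)) for _ in range(40)]

best = max(population, key=fitness)
print("best fitness:", fitness(best))
print("node 0's NICs go to switches:", best[0])
```

The point is just the shape of the search: a genome is a wiring map, and the designer-supplied fitness function is where the primary “every pair shares a switch” requirement and any secondary traffic-pattern preferences are encoded.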
As an example of a bigger FNN, the article shows two separate solutions for the 64-server cluster they started with. Each “Kxx” in the drawing is a 4-NIC server, with a graphic icon marking the 32-port switch each of its 4 NICs connects to. Close inspection will show that any two servers do indeed have at least one switch in common. From the drawing it looks like at most 31 ports of any switch are in use, and for some switches far fewer; presumably the rest are used as uplinks, perhaps through a separate “uplink switch” as in the small example above.
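A quick port count also gives a feel for the scale: 64 servers × 4 NICs = 256 switch ports just for the NICs, so with 32-port switches at least ⌈256 / 32⌉ = 8 neighborhood switches are needed before counting any uplink ports.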
Spending some time trying to come up with an FNN for 64 servers by hand will quickly convince you that the authors were right to automate this as a programmed search.
The article notes that the 2nd solution was optimized for a specific workload with a known communication pattern (a fluid-dynamics calculation).
Note 1: I’d assume the converse is also possible: since you know the wiring pattern actually used, you can structure your workload on the compute cluster to take advantage of the implemented topology.
Note 2: It seems this will be a bitch to wire up, and while re-wiring for a given application can be beneficial for performance, it will be a lot of work to change and re-change. A possible solution is some form of programmatic patch panel (though that adds cost, which is what the original authors were trying to avoid), assuming the patch panel does not add too much latency itself.