KLAT2’s Flat Neighborhood Network
Michael Orr
Computer Networking Architect/Product Manager/Customer Liaison, Due-Diligence, Tech Scout/Ambassador/evangelist, Inventor (40+ Patents), Dad, Martial Arts practitioner (Karate, Aikido), Stock & Option Trader
Original Article here: Citeseerx or https://aggregate.org/FNN/
Who: U. of Kentucky
TL;DR for the busy Genius:
Old (circa 2000) article about implementing a low-latency network to interconnect a DIY HPC cluster built from 64 servers at the University of Kentucky. Since they could not afford high-radix switches, but still needed the lowest-latency network they could build, they invented Flat Neighborhood Networks (FNNs), which are still cited by Google’s “Jupiter Rising” paper in SIGCOMM 2015 (which is what led me to read and summarize this article in the first place).
This article is an interesting read, and shows ingenuity in action.
To build a network within their limited budget, the authors had to develop a new network topology that delivers single-switch latency and up to 25.6 Gb/s of bisection bandwidth (using 100 Mb/s NICs!)
Key idea: it is “enough” if any two nodes (any two servers) share at least one common switch, so that there is a 1-hop path between them.
For example, 6 servers, each with 2 NIC ports, can be interconnected with four 5-port switches, uplinks included, as shown.
For larger cases it is hard to come up with an FNN by hand, so they implemented a Genetic Algorithm that searches for solutions. These solutions are wiring maps: which server NIC port goes to which switch port. If the traffic pattern is known (which servers talk to each other more), the FNN can be customized to support it by giving the algorithm suitable parameters. Below are two examples of FNNs for a 64-server cluster, each server with 4 NIC ports, using 32-port switches.
(Kudos!)
KLAT2 FNN Summary
“KLAT2” = Kentucky Linux Athlon Testbed 2, a DIY, circa-2000 supercomputer built from 64 PCs (+ 2 hot-standby spares). To save money, they had to develop a new network topology that delivers single-switch latency and up to 25.6 Gb/s of bisection bandwidth (using 100 Mb/s NICs!)
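That headline number is presumably just the aggregate NIC bandwidth: 64 servers × 4 NICs × 100 Mb/s = 25.6 Gb/s, i.e. the network lets every server drive all four of its links at full rate at once.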
Goal: the lowest latency possible, i.e. a minimum of 1 switch hop. For 64 servers (+ 2 spares) that would mean a single 66-port switch, which they could not afford.
As the next-best solution, they went DIY and built the equivalent from smaller, cheaper switches.
Key Idea: It is enough for any two servers to share (at least) one common Switch. (They don’t ALL have to share a switch)
Each switch defines a local “neighborhood” (a subnet), hence the name “Flat Neighborhood Network” (FNN). They used 4 NICs per PC and low-radix (i.e. cheap) switches.
As a small example, consider a cluster made from six 2-NIC servers and four 5-port switches. A suitable FNN looks like the drawing on the right:
Any 2 nodes have at least one switch in common, and the cluster as a whole has 2 outgoing uplinks.
Note that some X->Y streams have more than one option, e.g. A<->B can send over switch 0, switch 1, or both.
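To make the flat-neighborhood property concrete, here is a minimal sketch (mine, not the paper’s): it encodes one plausible wiring for the six-server example, with three 5-port switches acting as neighborhoods and one port per switch left free for an uplink toward a fourth switch, and then checks that every pair of servers shares a switch. The exact port assignments are an assumption, since the drawing is not reproduced here.

```python
from itertools import combinations

# Server -> the set of switches its two NICs plug into.
# HYPOTHETICAL layout, consistent with the description above: S0-S2 are the
# neighborhood switches (4 server ports + 1 uplink port each), and a fourth
# switch aggregates the uplinks.
wiring = {
    "A": {"S0", "S1"}, "B": {"S0", "S1"},
    "C": {"S1", "S2"}, "D": {"S1", "S2"},
    "E": {"S0", "S2"}, "F": {"S0", "S2"},
}

# The flat-neighborhood property: every pair of servers shares at least one
# switch, i.e. there is always a single-switch (1-hop) path between them.
for a, b in combinations(sorted(wiring), 2):
    shared = wiring[a] & wiring[b]
    assert shared, f"{a} and {b} have no single-switch path"
    print(f"{a}<->{b}: 1 hop via {sorted(shared)}")
```

Running it also shows the redundancy mentioned above: A<->B comes out with both S0 and S1 as options.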
- Only small FNNs can be designed by hand. They built a Genetic Algorithm to find FNNs for larger sets of nodes. It generates the wiring pattern while also optimizing secondary properties for specific types of communication traffic.
“The complexity of the design problem does explode when a larger system is being designed with additional, secondary, optimization criteria (such as maximizing the number of switches shared by PCs that communicate in various patterns). Because this search space is very large and the optimization criteria are very general (often requiring simulation of each potential network design), use of a genetic search algorithm is much more effective than other means”
- FNNs do not (necessarily) have good wiring locality, so stringing the cables can be a challenge
- Routing between nodes: FNNs can have some unusual properties, e.g. a network-address query for a given PC will return different answers depending on which PC you ask from (see the toy sketch after this list)
- While an FNN guarantees BW and latency along SOME paths (the ones via the single common switch hop), it also has additional paths without that guarantee. Taking advantage of these extra paths is not simple; you need a sort of LAG
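Here is a toy illustration (my own sketch, not code or addressing from the paper) of why the same PC can resolve to different addresses depending on who asks: each switch is its own subnet, so the useful answer is the destination’s NIC on a switch the asker also sits on. Subnet prefixes, host numbering, and the wiring are all hypothetical, reusing the six-server layout from the sketch above.

```python
# Each switch (neighborhood) is its own subnet -- prefixes made up here.
SUBNET = {"S0": "10.0.0.", "S1": "10.0.1.", "S2": "10.0.2."}
WIRING = {  # server -> switches its NICs plug into (same toy layout as above)
    "A": {"S0", "S1"}, "B": {"S0", "S1"},
    "C": {"S1", "S2"}, "D": {"S1", "S2"},
    "E": {"S0", "S2"}, "F": {"S0", "S2"},
}
HOST_ID = {name: i + 1 for i, name in enumerate(sorted(WIRING))}

def address_of(dst: str, asker: str) -> str:
    """dst's IP as seen from asker: use a switch (subnet) they share."""
    shared = WIRING[dst] & WIRING[asker]
    if not shared:
        raise ValueError("not a flat neighborhood: no common switch")
    switch = min(shared)  # any shared switch gives the 1-hop guarantee
    return SUBNET[switch] + str(HOST_ID[dst])

print(address_of("C", asker="A"))  # C via the S1 subnet -> 10.0.1.3
print(address_of("C", asker="E"))  # the same C via the S2 subnet -> 10.0.2.3
```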
Main Inputs to their Genetic Algorithm
- # of nodes, and MAX # of NICs per node (nodes do not all have to have the same # of NICs)
- List of available switches by Radix
- Designer-supplied evaluation function that judges each possible solution
(The algorithm creates dummy NICs and/or switches to allow uneven use of the real NICs/switches; a toy sketch of this kind of search is shown below.)
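For flavor, here is a heavily simplified sketch of this kind of genetic search. It is not the authors’ GA: the problem sizes, the port-budget penalty, the “hot pairs” secondary criterion, and the mutation-only loop are all illustrative assumptions; the real algorithm also used crossover and much richer, sometimes simulation-based, evaluation functions.

```python
import random
from itertools import combinations

NODES, NICS_PER_NODE = 16, 2          # toy sizes, not KLAT2's 64 x 4
SWITCHES, PORTS_PER_SWITCH = 6, 8
HOT_PAIRS = {(0, 1), (2, 3)}          # pairs we'd like to favor (assumed)

def random_genome():
    # genome[node] = tuple of switches that node's NICs plug into
    return [tuple(random.randrange(SWITCHES) for _ in range(NICS_PER_NODE))
            for _ in range(NODES)]

def fitness(genome):
    # Penalize exceeding the switch port budgets.
    load = [0] * SWITCHES
    for nics in genome:
        for s in nics:
            load[s] += 1
    over = sum(max(0, l - PORTS_PER_SWITCH) for l in load)
    score = -10.0 * over
    # Primary objective: every pair of nodes shares at least one switch.
    # Secondary objective: hot pairs share as many switches as possible.
    for a, b in combinations(range(NODES), 2):
        shared = len(set(genome[a]) & set(genome[b]))
        score += 1.0 if shared else 0.0
        if (a, b) in HOT_PAIRS:
            score += 0.5 * shared
    return score

def mutate(genome):
    child = [list(n) for n in genome]
    node = random.randrange(NODES)
    child[node][random.randrange(NICS_PER_NODE)] = random.randrange(SWITCHES)
    return [tuple(n) for n in child]

# Very plain truncation-selection loop with mutation only.
population = [random_genome() for _ in range(50)]
for generation in range(200):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    population = parents + [mutate(random.choice(parents)) for _ in range(40)]

best = max(population, key=fitness)
print("best fitness:", fitness(best))
print("node 0's NICs go to switches:", best[0])
```

The point is just the shape of the search: a genome is a wiring map, and the designer-supplied fitness function is where the primary “every pair shares a switch” requirement and any secondary traffic-pattern preferences are encoded.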
As an example of a bigger FNN, the article shows two separate solutions for the 64-server cluster they started with. Each “Kxx” in the drawing is a 4-NIC server, with a graphic icon marking the 32-port switch each of its 4 NICs connects to. Close inspection will show that any two servers do indeed have at least one switch in common. From the drawing it looks like at most 31 ports of any switch are in use, and for some switches far fewer; presumably the rest are used as uplinks, perhaps through a separate “uplink switch” as in the small example above.
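A quick port count also gives a feel for the scale: 64 servers × 4 NICs = 256 switch ports just for the NICs, so with 32-port switches at least ⌈256 / 32⌉ = 8 neighborhood switches are needed before counting any uplink ports.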
Spending some time trying to come up with an FNN for 64 servers by hand will quickly convince you that the authors were right to automate this as a programmed search.
The article notes that the 2nd solution was optimized for a specific workload with a known communication pattern (a fluid-dynamics calculation).
Note 1: I’d assume the converse is also possible: since you know the wiring pattern actually used, you can structure your workload on the compute cluster to take advantage of the implemented topology.
Note 2: It seems this will be a bitch to wire up, and while re-wiring for a given application can be beneficial for performance, it will be a lot of work to change and re-change. A possible solution is some form of programmatic patch panel (though that adds cost, which is what the original authors were trying to avoid), assuming the patch panel does not add too much latency itself.