Too Much Redundancy

[A light topic for a Friday posting ...]

Networks evolve over time.

We’re all familiar with technical debt. One form of it is that old WAN links don’t go away overnight, although if they cost a lot, removal should be expedited. But sometimes that’s hard to do. With datacenters, the problem is often old equipment that just can’t be replaced or removed for (reasons). So you end up with an old server or two as the only thing holding up the phase-out of the ancient datacenter. In some cases, for years.

This is a common problem when migrating to a new datacenter. Old facilities and costly mid-city real estate are two reasons for datacenter migrations. Improving reliability by moving to a CoLo with better power, cooling, security, etc. is another. And getting your racks out of a fugly large closet with ducts and pipes etc. is a darn good reason.

In the meantime, however, you can end up with unnecessary WAN links. Site connections tend to be homed to the datacenter. With two datacenters, network folks often home sites to both. With migration to a new datacenter interruptus (change of plans), you end up with even more connections, often of three vintages (to the oldest datacenter, to the new datacenter, and to the second new datacenter).

Here’s an adapted real-world diagram involving datacenters.

[Image: network diagram of Sites A, B, C, and D and the links between them]

Site A is the original site. When Site B was added, they connected it as shown in black. The top L3 switches may have been added later as WAN routers.

When Site C was added, the black connections were added as shown.

For (reasons), Site D was added more recently, and the eventual phase-out of Sites A and C was planned, subject to change. I put the names of B and D in red to emphasize that they are the “new core”.

There’s actually more stuff going on, like gradually phasing in new core switches at B and D, but the above seems adequate for the points I wish to make.

Challenge: Predict the routing under various failure scenarios.
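One way to approach the challenge is to let a small script enumerate single-link failures for you. What follows is only a sketch: the link list and the equal costs of 10 are made-up stand-ins (the real diagram's links and metrics are not given here), and real routers may load-share across equal-cost paths where this code just picks one of them deterministically.

```python
import heapq

# Hypothetical topology: a full mesh of Sites A-D with equal link costs.
# These links and costs are assumptions for illustration, not the real network.
LINKS = {
    ("A", "B"): 10,
    ("A", "C"): 10,
    ("A", "D"): 10,
    ("B", "C"): 10,
    ("B", "D"): 10,
    ("C", "D"): 10,
}


def build_graph(links, failed=frozenset()):
    """Build an undirected adjacency map, skipping any failed links."""
    graph = {}
    for (u, v), cost in links.items():
        graph.setdefault(u, {})
        graph.setdefault(v, {})
        if (u, v) in failed or (v, u) in failed:
            continue
        graph[u][v] = cost
        graph[v][u] = cost
    return graph


def shortest_path(graph, src, dst):
    """Dijkstra's algorithm; returns (cost, path) or (None, []) if unreachable."""
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, weight in graph[node].items():
            if neighbor not in seen:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None, []


def failure_report(links, src, dst):
    """Map each failure scenario (a tuple of failed links) to (cost, path)."""
    report = {(): shortest_path(build_graph(links), src, dst)}
    for link in links:
        report[(link,)] = shortest_path(build_graph(links, {link}), src, dst)
    return report


if __name__ == "__main__":
    for scenario, (cost, path) in failure_report(LINKS, "A", "C").items():
        label = "no failure" if not scenario else f"fail {scenario[0]}"
        print(f"{label}: cost={cost} path={' -> '.join(path)}")
```

If the output for your real topology surprises you, or takes a while to explain to a colleague, that is a hint the design has more paths than anyone can keep in their head.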

If you look closely, there are some links that might reasonably be phased out. Which would you remove?

Perhaps some red coloring will help.

[Image: the same diagram, with the links that could be phased out shown in red]

See them now?

Why those? Well, my thought process is that Sites A and C should be dual-connected to B and D, the new cores.

Also, eliminating the red links makes traffic flows more predictable and troubleshooting easier. Yes, that is probably not clear from the above diagram.

Re-drawing the diagram makes that clearer:

[Image: the diagram redrawn without the red links]


There’s still a lot of redundancy, but the structure is clearer. Sites 1 and 3 have dual links to each of the other two. And the two hub sites also have direct links to each other.

I’ve seen something like this at another organization. Two sites in California, one in the Midwest, and one on the East Coast. Which two were the “main” sites was subject to discussion. It turned out there was one clear main site in California, and the other nearby site might be folded into it. The other “main” site was somewhat of a toss-up, but it turned out that vacating the Midwest site was on the radar, for substantial cost savings.

The remaining question now is whether to treat the East Coast site as a backup datacenter, or to shift to a cloud-centric approach. Geography / latency was a consideration.

In yet another organization, the core is 6 routers, 2 at each of three sites, one of which is a CoLo. A fair number of other sites dual-connect to the core via diverse providers. That’s workable. Moving servers, apps, etc. is a major consideration that will likely prevent reducing the core to 4 devices at 2 sites.

The Design Principle

That section title is perhaps a bit overly grandiose, but …

One common strategy is to pick two “hub” sites, and connect other sites to those two. Eliminate other connections unless you have major traffic flows or other reasons that justify the costs.
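The two-hub rule is easy to turn into a quick audit of a link inventory. Here is a minimal sketch, assuming a hypothetical link list in which B and D are the hubs; the `audit` function and its return shape are my own illustration, not any tool's API.

```python
# Hypothetical inventory: each tuple is one WAN link between two sites.
# The hubs and links are assumptions chosen to resemble the A/B/C/D example.
HUBS = {"B", "D"}
LINKS = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]


def audit(links, hubs):
    """Split links into hub links and removal candidates, and list
    non-hub sites that are not connected to every hub."""
    keep, candidates = [], []
    for u, v in links:
        # A link is part of the hub design if at least one end is a hub.
        if u in hubs or v in hubs:
            keep.append((u, v))
        else:
            candidates.append((u, v))

    sites = {site for link in links for site in link}
    missing = {}
    for site in sorted(sites - hubs):
        # Hubs this site reaches directly.
        connected = {v if u == site else u for u, v in links if site in (u, v)} & hubs
        if connected != hubs:
            missing[site] = sorted(hubs - connected)
    return keep, candidates, missing


if __name__ == "__main__":
    keep, candidates, missing = audit(LINKS, HUBS)
    print("hub links:", keep)
    print("candidates for removal:", candidates)
    print("sites missing a hub link:", missing)
```

A link showing up under the removal candidates is not automatically wrong. It just means somebody should be able to name the traffic flow or other reason that justifies its cost.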

If your network is geographically widespread, then perhaps do that per continent, i.e. two hubs in the U.S., etc. Or more, depending on the number of devices in various regions.

This aligns with CoLo-centric or Cloud-centric networking, as I’ve discussed elsewhere. You can use VPN to connect sites to, say, two cloud hubs (or more for a larger geographic presence). Alternatively, connect at least the major sites to nearby CoLos for agile NaaS, etc. and cloud connectivity.

Recycling Design Ideas

You may be thinking “WAN, that’s so old-fashioned”. Well, yes. However, I’ll note that the above design issues recurred in the context of dual CoLo facilities. And that remains an active design approach if you’re using dual (or more) CoLos as WAN and/or SD-WAN hubs, in part because CoLo to Cloud NaaS provides agility that may not be available for the last mile.

The even newer variant is using cloud provider locations as hub sites.

Think Hierarchically!

If you look at this a bit differently, it’s just hierarchical networking.

Have a core, connect “everything” dually to the core.

If your network is global, have a global core, with dual core members on each continent, say. And dual-connect (if possible) sites within a region (continent or whatever) to that region’s core. That loosely describes a couple of WAN or SD-WAN topologies I’ve seen. Internationally, putting core switches into CoLos helps with availability and cost of the long-haul fiber connections.
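A nice side effect of a strict hierarchy is that the expected path can be written down as a rule instead of being computed. The sketch below assumes made-up region and site names, a directly interconnected global core, and a single "preferred" core member per region; all of those are illustrative assumptions, not a description of any real network.

```python
# Hypothetical hierarchy: two core members per region, sites homed to a region.
REGION_CORES = {
    "NA": ["na-core1", "na-core2"],
    "EU": ["eu-core1", "eu-core2"],
}
SITE_REGION = {
    "site-nyc": "NA",
    "site-sfo": "NA",
    "site-lon": "EU",
}


def expected_path(src, dst, preferred=0):
    """Return the path the hierarchy says traffic should take.

    Rule: site -> its region's preferred core -> (remote region's core) -> site.
    Assumes core members in different regions are directly connected.
    """
    if src == dst:
        return [src]
    src_core = REGION_CORES[SITE_REGION[src]][preferred]
    dst_core = REGION_CORES[SITE_REGION[dst]][preferred]
    if src_core == dst_core:
        # Both sites hang off the same regional core.
        return [src, src_core, dst]
    return [src, src_core, dst_core, dst]


if __name__ == "__main__":
    print(expected_path("site-nyc", "site-sfo"))
    print(expected_path("site-nyc", "site-lon", preferred=1))
```

If a rule this short cannot describe where your traffic goes, that is worth knowing before the next outage rather than during it.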

Conclusions

I’m not convinced the diagrams above told the story perfectly, but that’s the real world.

There are two main conclusions:

  • Impose hierarchy on network designs, avoiding too much redundancy.
  • If you can’t look at the diagram and describe the routing in a simple way, re-design it! Random WAN meshes mostly went away 20-30 years ago! With a simple design, traceroute can still be useful, but you can also predict what the path should be before you run it!

Comments

Comments are welcome, whether in agreement or constructive disagreement with the above. I enjoy hearing from readers and carrying on deeper discussion via comments. Thanks in advance!

Hashtags: #NetCraftsmen #CiscoChampion

Disclosure statement

Twitter: @pjwelcher

LinkedIn: Peter Welcher


Joël François

Senior IP Engineer | CCIEx2 (RS,SP) #55635 | CCDE in preparation

3 years ago

Eventually also consider transport type in the design (e.g., dark fiber, shared risk link groups with xWDM, etc.)

Palash Barua

SDN/IP/MPLS/Cloud-Native/Solution-Architect/Automation/Linux/DB (CCIE Enterprise # 60345)

3 years ago

Nice one sir
