Pete’s Take: Pain Points in Networking and IT

It’s a new year, so time to look at how Networking and IT have been evolving. Ignoring the AI elephant in the room.

Some of my recent blogs have touched on other aspects of this. If I repeat myself some, it probably confirms I’m getting old.

One big change over the last year or so seems to be that experienced people are getting harder to find, and cost more. Some networks have cleaned up their act and can operate with fewer staff. Others are struggling.

This blog will discuss some of the interplay between budget, staff levels, skills, tools, and outsourcing. My hope is that discussing this may help folks take a step back from a hectic and frustrating day-to-day experience and consider what changes might help improve things.

If your team is adequately funded and staffed, has solid skills, runs a modern and solidly designed network, and has great tools that are well deployed and that staff know how to use, then this blog is not for you!

Otherwise, the short version of this blog is: if your staff and network are struggling, perhaps it is time to consider making some bigger changes.

“If you don’t like the game, change it!”

[Image: networking, dark clouds, chaos, and a person. DeepAI-generated.]

Some Symptoms

My impression from years of consulting is that networking and IT teams tend to run very lean, both on head-count and sometimes on skills. This is likely due to management not wanting to spend more than they have to. There’s a fine line between that and spending less than is needed, one which erodes over time.

It can also be due to lack of good upwards communication. No offense, but networking teams often seem to be a bit weak on social/communication skills, especially with management. Or for some reason, they just get short-changed in upwards communication with management (and in getting funding). Possibly because management doesn’t realize how poorly designed, poorly implemented, fragile (etc.) the network is.

Communicating upwards is something I perhaps could have done better on myself, both with my employers and in moving further up the management chain when consulting.

Ultimately, all that’s visible to management is whether the network works, and the times when it doesn’t work or doesn’t work well.

But if it were a car, would it be emitting clouds of smoke, noisy, and the engine sputtering once in a while?

Some related symptoms: gradual erosion of skills, burnout, staff leaving, poor quality of network buildout and maintenance, and a pretty much total lack of documentation. Tools go unused due to lack of skills, or because there is no budget to replace them with better tools, stand up those tools, and train staff to use them. Or changes/upgrades only come when things stay broken for a while, staff are fired, and a refresh of staffing (etc.) actually gets budgeted.

And yes, some of the consulting I’ve done has been when someone senior in networking, or a new manager, brought me/us in to help them determine the problem(s), prioritize, and report that to management.


Budget Constraints

Budget constraints happen.

Even if network and other IT teams have done a good job of communicating present issues, likely future issues, and other considerations upwards, sometimes the budget just isn’t there.

Clearly, if this situation persists and gets painful enough for staff, those with skills and a sufficient resume will probably move on.

Recently, the IT job market seems to be fairly tight. Normally good skills help, but compensation levels for experienced or highly skilled staff may be a budgetary constraint, from management’s perspective.

So you may need to stay put and reduce the pain level!


Solving That

When things get bad, it’s time for something to change. Sometimes a “cleanup” initiative or whatever isn’t going to be enough; it’s time for bigger, more radical change.

Call me optimistic, but this may be where having a frank conversation upwards might help.

Things you can do to reduce pressure on yourself/staff:

  • Get the network to a robust state, where fewer things break. Thereby reducing your headaches and late nights. Justifying one-time cost to outsource this cleanup might be easier than ongoing costs for more staff. Cleanup should probably include consistency and accuracy of device configurations.
  • Get the network documented. Outsourcing is one way to get this done. Documentation might include naming conventions, labelling devices and cables, etc.
  • Get the Ops tooling to where it helps you resolve outages and performance problems faster. Sometimes (often?) this means replacing legacy tools with more modern, more capable, easier-to-use tools. Which is mostly a one-time cost. Outsourcing deployment could also be a reasonable sell to management, and a win.
  • That might apply to a whole automation tool chain, or to network (etc.) management tool(s).
  • Get protocol, design, architecture advice. Networks built in ad hoc fashion or using out-dated techniques are costly to maintain. Example: extensive use of spanning tree. Yes, it’s simple in some ways. Building a hierarchical routed network and capturing configurations can also be fairly simple, and do a better job of localizing problems, which makes them easier to solve. Random use of routing protocols and lack of trustworthy redundancy are two related symptoms.
  • If you need to identify problem sources and prioritize them, bringing in a consultant for an overall assessment can help. It also redirects any management hostility to the consultant.
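
To make the “consistency and accuracy of device configurations” item above a bit more concrete, here is a minimal Python sketch of one way to check saved configurations against a baseline. Everything in it is an assumption for illustration: the baseline lines, the device names, and the sample configurations are made up, and a real cleanup would more likely lean on an existing tool than on a hand-rolled script.

```python
from typing import Dict, List

# Hypothetical baseline: lines every device config is expected to contain.
# The addresses and commands here are placeholders, not recommendations.
REQUIRED_LINES = [
    "service timestamps log datetime msec",
    "ntp server 10.0.0.1",
    "logging host 10.0.0.2",
]


def missing_lines(config_text: str, required: List[str]) -> List[str]:
    """Return the required lines that do not appear in a device config."""
    present = {line.strip() for line in config_text.splitlines()}
    return [line for line in required if line not in present]


def audit(configs: Dict[str, str], required: List[str]) -> Dict[str, List[str]]:
    """Map each non-compliant device name to the baseline lines it lacks."""
    report = {}
    for device, text in configs.items():
        gaps = missing_lines(text, required)
        if gaps:
            report[device] = gaps
    return report


if __name__ == "__main__":
    # Made-up sample configs, for illustration only.
    sample = {
        "sw-a": (
            "hostname sw-a\n"
            "service timestamps log datetime msec\n"
            "ntp server 10.0.0.1\n"
            "logging host 10.0.0.2\n"
        ),
        "sw-b": "hostname sw-b\nntp server 10.0.0.1\n",
    }
    for device, gaps in audit(sample, REQUIRED_LINES).items():
        print(device, "is missing:", gaps)
```

With the sample data, only the hypothetical sw-b gets flagged, since it lacks the timestamp and logging lines. The point is less the script than the habit: a written-down baseline turns “consistent” into something you can check.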


If staff comp and/or retention challenges lead to staff and/or skills shortages, goal-focused outsourcing to consultants can help with headcount or skills.

  • Contract for 1-2 days/week of senior consultant (with real skills at that level, not just a “senior” or “architect” title a costly consulting firm has given them). This can be especially helpful for tasks such as robust design/design corrections, cleaning up fragile parts of the network, planning, and improving tool usage by staff. [Plug: BlueAlly has been doing a good bit of this. It seems to work well for customers.]
  • Contract additional staffing to do basic tasks (cabling, cabling cleanup, as-built documentation, labelling).
  • If you’re not sure what the problem(s) are, bring in experienced consultants for an “assessment”. Leaving the scope somewhat open may lead to finding problems that are blind spots for the organization. I say that because almost every site I consulted at had major problem sources they just took for granted.


Change Your Paradigm

Something to consider as well is more radical change. Shift your paradigm!

How can you cut out some part of your costs or how your staff’s time is consumed?

For example, have you considered doing Wi-Fi-only for (parts of) the edge?

  • Some college campuses have been moving to Wi-Fi in dorms and common spaces for user network access. Reasons: lower infrastructure costs, Wi-Fi is good enough that wired ports are no longer needed, and it’s generally all that students and staff use anyway. You can still provide far fewer wired switch ports in closets if needed, e.g. for HVAC, security or other local systems.
  • Will that practice accelerate? Will businesses start doing that, especially with staff hotelling and in-building mobility for meetings?
  • A contrary point of view: Wi-Fi bandwidth is a somewhat limited resource. So reserve Wi-Fi for mobile and selected sensor use cases, but get as much other traffic onto wired networking? It’s a choice. It’s one I suspect few are considering right now.
  • To generalize: is there something you could change about how you’re doing networking (or some other aspect of IT), to simplify things and/or reduce costs?


Cloud is another example of a paradigm shift. It perhaps addresses agility more than cost savings. Bear in mind the cost includes maintenance and support tasks, shifting those responsibilities off in-house staff. Cloud offloads purchase, deployment, and ongoing maintenance, which could be a win: make them someone else’s problem.

Having a standard approach (architecture) to standing up apps and micro-services in the cloud (or onsite) seems to me to be the key to longer-term scalability. Each system being a one-off seems like a potential security, troubleshooting, and maintenance nightmare. If app development or re-development is outsourced, creating many one-offs will lead to serious future technical debt. Longer outages, harder to change, harder to update.

To allow for change, maybe the architecture can be updated, say every 3 or 5 years?

NaaS (Network As A Service) is somewhat similar. Generally “NaaS” is used in a WAN context. Shift costs and staffing to a provider, or better, to two. Given some of SD-WAN’s complexity, I’m dubious about a provider doing SD-WAN or IPsec/VPN tunneling for you.

Graphiant’s approach seems powerful but much simpler from the end-customer perspective and their support perspective. Tunnels galore is the opposite of that.

Outsourcing the whole network or a significant portion of it is of course a possibility. I suspect it is one that works best for fairly simple networks or parts of networks.

Meter and Nile are two companies that recently presented at TechFieldDay events. Should they be called Campus-Plus NaaS? The sweet spot for them may be e.g. large retailers or store/restaurant chains with a large geographic footprint, where outsourcing site connectivity and small on-site networks might be a major win. That might leave HQ and the data center and external-facing cloud in internal hands, or those might also be outsourced.


TANSTAAFL

There Ain’t No Such Thing As A Free Lunch. (I learned this 1950’s? acronym from my SF reading: Robert Heinlein.)

In English: changing how you do Networking, NetOps, or IT in general isn’t going to be free. And doing it cheaply or poorly will likely have later costs.

But the win can be shifting some fairly routine workload to outside, allowing more internal focus on what matters the most.

Some of what I’m preaching here applies in personal life. I’m an ongoing project in better documenting and filing paper or electronic copies, so I don’t have to expend large effort digging through files to reconstruct something. Like when I had the hot water heater last maintained, or repaired or replaced.


Coming Attractions

Some network management products are now doing a good job of mapping networks, with useful multi-layered diagrams. BlueAlly’s Dan Wade spoke at AutoCon2 about testing implementation functionality and verifying correct configurations.
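
As a rough illustration of the general idea of verifying that what is deployed matches what was intended, here is a small Python sketch. It is not taken from the talk or from any product: the data model (device name mapped to a set of expected neighbor names), the device names, and the function name are all assumptions made up for this example. In practice the observed side would come from whatever tool you use to collect state.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Finding:
    """One mismatch between intended and observed state."""
    device: str
    problem: str


def verify_neighbors(
    intended: Dict[str, Set[str]],
    observed: Dict[str, Set[str]],
) -> List[Finding]:
    """Compare intended neighbor sets with observed ones, device by device."""
    findings: List[Finding] = []
    for device, expected in sorted(intended.items()):
        actual = observed.get(device)
        if actual is None:
            # Nothing was collected for this device at all.
            findings.append(Finding(device, "no observed data"))
            continue
        for peer in sorted(expected - actual):
            findings.append(Finding(device, f"missing neighbor {peer}"))
        for peer in sorted(actual - expected):
            findings.append(Finding(device, f"unexpected neighbor {peer}"))
    return findings


if __name__ == "__main__":
    # Hypothetical intent and observations, for illustration only.
    intent = {"core1": {"dist1", "dist2"}, "dist1": {"core1"}}
    seen = {"core1": {"dist1", "edge9"}}
    for finding in verify_neighbors(intent, seen):
        print(f"{finding.device}: {finding.problem}")
```

An empty result means the observed state matches the stated intent, at least for this one aspect. The same compare-two-dictionaries pattern could be applied to VLANs, routes, or interface states, given a trustworthy source for each side.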

AI agents calling on an AI back-end system may increase the need for robust reliable low-latency networking.

Sensors and IOT may greatly increase use of the network, as in 10 times the number of endpoints, or more.

Location sensing, e.g. staff phones, ditto.
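
To put a number on what “10 times the number of endpoints” can mean for addressing, here is a small Python sketch of the subnet arithmetic. The starting figure of 200 endpoints is purely hypothetical, and the helper names are mine. It assumes conventional IPv4 subnets where the network and broadcast addresses are not usable.

```python
def prefix_for_hosts(host_count: int) -> int:
    """Longest IPv4 prefix (smallest subnet) with at least host_count usable addresses."""
    if host_count < 1:
        raise ValueError("host_count must be positive")
    # Need 2**bits >= host_count + 2 (network and broadcast are reserved),
    # which is the same as bits == (host_count + 1).bit_length().
    bits = (host_count + 1).bit_length()
    if bits > 32:
        raise ValueError("host_count does not fit in IPv4")
    return 32 - bits


def usable_hosts(prefix_len: int) -> int:
    """Usable addresses in a conventional IPv4 subnet of this prefix length."""
    return 2 ** (32 - prefix_len) - 2


if __name__ == "__main__":
    today = 200  # hypothetical endpoint count for one site
    for label, count in (("today", today), ("10x", today * 10)):
        prefix = prefix_for_hosts(count)
        print(f"{label}: {count} endpoints -> /{prefix} "
              f"({usable_hosts(prefix)} usable addresses)")
```

With these made-up numbers, 200 endpoints fit in a /24, while 2,000 need a /21. That is before considering DHCP scopes, broadcast domain size, or segmentation, any of which may argue for several smaller subnets instead of one big one.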

Security can greatly increase network complexity, depending on how it is implemented. Micro-segmentation is one example. Adding lots of VLANs could be painful. In-switch security enforcement, less so. While VLANs might provide somewhat stronger security in principle, the difficulty of getting them right and their ongoing costs might make them actually less secure. And that’s all I intend to say here on the topic of network security!

If you haven’t noticed: the network is never called upon to do LESS!


Conclusion

If your job is constant chaos, long hours, or constant re-verifying how some part of the network works, consider ways to address that. Perhaps outsourcing something(s) that will help increase stability or make troubleshooting easier. Or appropriate network re-design can make configurations simpler, easier to maintain.


Links

I don’t have any links on the “how to dig yourself out of a hole” theme above. Sorry!

Here are some links on the automation and tools side of things. All good stuff, IF you can get your head “above water” to where you have time to work on network accuracy (implementation matching intent). I strongly buy into the automated deployment and consistency verification perspective: test beforehand, automate, verify correctness, and in general have far fewer problems.

https://packetpushers.net/blog/autocon2-talk-summary-step-0-test-the-network-danny-wade/

Video: Step 0: Test the Network

I really like the work Jeremy Schulman has been doing with verification testing what gets deployed, etc.

https://packetpushers.net/blog/autocon2-talk-summary-ai-driven-advanced-network-observability-jeremy-schulman/

AutoCon2 Talk Summary: AI Driven Advanced Network Observability – Jeremy Schulman

The other Autocon presentations were all interesting.

In general, Dinesh Dutt’s SuzieQ can be a great way to pull in network-wide data about aspects of the network. The open-source version is, of course, free: https://suzieq.readthedocs.io/en/latest/.

On the more-skilled end of things: Modern NetOps Needs More From Network Engineers. That seems like a LOT of new skills for most people in networking to acquire! Does outsourced installation and training plus tools plus consulting service address that?


Miscellany

Reminder: you may want to check back on my articles on LinkedIn to review any comments or comment threads. They can be a quick way to have a discussion, correct me, or share your perspectives on technology.

Hashtags: #PeterWelcher #CCIE1773 #NetOps #NetStaffing #NetBacklog #ParadigmChanges

FTC disclosure statement: https://www.dhirubhai.net/pulse/ftc-disclosure-statement-peter-welcher-y8wle/

Twitter: @pjwelcher

LinkedIn: Peter Welcher, https://www.dhirubhai.net/in/pjwelcher/

Mastodon: @[email protected]

BlueSky: https://bsky.app/profile/pjwelcher.bsky.social


Three logos: Cisco Champion 2024, CCIE Lifetime Emeritus, Networking Field Day
