AI and Networking

We’ve all been hearing about how hot AI is. I suspect most of us are intrigued, and a few (especially those doing some software development already) may be considering shifting career paths. For the rest of us, there’s intellectual curiosity and then there’s “how does this affect $day_job?”

I’ve been trying to track Networking-related aspects of Artificial Intelligence (AI). This blog contains some topics and links I found interesting, and some of my musings (which may or may not be non-artificially dumb).

Some Overall AI Comments

Is it me or is AI hype a bit bigger than previous hype cycles?

One suspicion I have is that the investment and spending for AI is huge, driving press attention. However, investment is not necessarily equal to broad market size.

The AI vendors and some cloud vendors are investing heavily in AI gear. The cost of building AI computing farms is huge. There also are, or will be, some large firms seeking to deliver future AI products (big wins), plus government and other organizations that have data security/privacy concerns and so run in-house LLMs.

My belief is that the cloud-AI vendors see a huge market for organizations consuming their expertise and resources via LLMs-as-a-Service. Given the costs, experimenting with some internal AI projects using commercial LLM and AI services to build internal skills likely makes MUCH better financial sense than building your own AI infrastructure.

Aside from some potential breakthrough areas, the early data I’ve seen suggests that IF DONE RIGHT there may be 25-30% ROI on things like AI for customer service, but there is also significant risk of failure (for various reasons).

One question is how much customer demand there will be, and how fast. Will the AI providers over-build? How many organizations will do substantial AI datacenter buildouts, given the costs (equipment, possibly megawatts of power, cooling, etc.)?

I’ve interacted with enough pathetic customer service call-trees and chat assistants to be strongly impressed by how bad they are at anything more than the most basic call triage. That carries over to AI for customer service. The problem may be due to development by non-programmer personnel or badly-understood internal processes. Or economics: cost to add functionality versus costs saved.

On the other hand, I’m a believer in AI for image recognition, and perhaps better speech transcription. I’ve always preferred transcripts to video recording just because they’re searchable. (Coupling the two provides the benefits of both: search, find the clip, then view.) So I think there’s strong potential for AI to access, index, process, etc. images and videos. The same for audio recordings.

This has some scary potential. I’m getting text-message hacking probes that spoof the name of the CEO, presumably obtained via email and corporate web page lookup. AI or even just good hacker code might be able to glean a LOT of information about a person from various websites. On the fly. So location tracking based on your face is now becoming possible, as well as real-time data lookup.

See also:

https://www.macrumors.com/2024/10/02/meta-smart-glasses-facial-recognition/

Imagine an AI video and correlation hacker walking up to you, greeting you by name, maybe saying they’re someone you knew 10 years ago, and asking for assistance, money, whatever…

Networking/IT Use Cases for AI

So … what are the current Networking and IT Use Cases for AI? This is a sort of verbal Venn diagram: overlap of networking (IT) and AI. Here are the broad networking use cases I’m aware of:

  • Network (etc.) management via AIOps
  • Building datacenter and other networks to support AI. (Sensor nets!)
  • Tools for network (etc.) data analysis. Probably home-brew at first.

There may be more. Later, I appeal for you to tell me what I missed!

The following sections discuss each briefly.

AIOps

AIOps refers to using AI/ML (Machine Learning) to filter log and telemetry data, detect failures or performance issues, and identify the root cause(s).

The prerequisite for AIOps is devices that send useful log data and telemetry to a large back-end database, possibly supplemented by tools that probe to monitor performance statistics, such as ThousandEyes or Catchpoint, and by app monitoring (e.g. Cisco AppDynamics).

Over the years, I’ve been amazed at how many massive and pointless log messages some systems send. So some filtering is very much needed. And most tools now provide functionality like standardizing database fields from log messages and pre-processing steps. Also needed. I experimented a while ago with an ELK stack and log data from a large hotel chain. Normalizing the log data was a nightmare! Even just the Cisco data. The feeble results are up on GitHub, posted in a blog a while back. Didn’t want to invest a lot of time to take it further.
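To make the normalizing and filtering step concrete, here is a minimal sketch in Python. It is not the ELK work mentioned above. It assumes Cisco IOS-style messages that carry the usual %FACILITY-SEVERITY-MNEMONIC tag, and the function name, the severity cutoff, and the sample lines are made up for illustration.

```python
import re

# Cisco IOS-style messages carry a %FACILITY-SEVERITY-MNEMONIC tag.
CISCO_TAG = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):"
    r"\s*(?P<text>.*)"
)

def normalize(line, min_severity=5):
    """Return a dict of fields, or None if the line has no tag or is too chatty.

    Lower severity numbers are more urgent (0 = emergency, 7 = debug).
    The cutoff of 5 is an arbitrary choice for this sketch.
    """
    match = CISCO_TAG.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = int(fields["severity"])
    if fields["severity"] > min_severity:
        return None
    return fields

# Hypothetical sample lines, not real log data.
sample = [
    "*Mar  1 00:00:10.123: %LINK-3-UPDOWN: Interface Gi0/1, changed state to down",
    "*Mar  1 00:00:11.456: %SYS-6-LOGGINGHOST_STARTSTOP: Logging to host 10.0.0.5 started",
    "random text without a tag",
]
records = [r for r in (normalize(line) for line in sample) if r is not None]
for record in records:
    print(record)
```

Only the LINK message survives here: the SYS message is informational (severity 6) and the last line has no tag. Real log data is far messier, which is where the nightmare comes in.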

Some of what I’ve seen from early attempts at telemetry seems similar. Heck, SNMP had issues with MIB entries for all sorts of things that I consider useless, and a NM tool I’ve used exposed graphs for many of them. I explored some of the early telemetry APIs and data, which provided great access to way too much information only the vendor could love.

I’ll spare you the rest of that rant. Summary: hard finding the flowers for all the weeds.

So where is AIOps coming from? Seems like we’re likely to buy it with our One Big Network Management Tool (“OBNMT”), not DIY. I’d think. If our employer can afford the OBNMT. Having a complete OBNMT seems a bit far off right now, but may be where we end up in a few years.

That is, NM tools right now seem to cover parts of the OBNMT space, but I’m not aware of an overall OBNMT. I thought about trying to do a Venn diagram of NM tool capabilities — maybe a future blog, maybe not at all. Checking off which vendors are in which Gartner capabilities lists might be a better starting point (if one has access).

Come to think of it, a degree of NM tool modularity seems like a good thing. Big expensive tools have an owner. This leads to reports focused on the owner’s needs and not others’. E.g. ServiceNow with management reports but no way for the technical side to get reports built, and no access to DIY. (And sometimes stale automation of tasks because of ownership, approval, budget, and other barriers.)

So some firms use other tools and forward only, say, outage info and perhaps two-sigma telemetry events (excessively high or low) to ServiceNow. Another thing I’ve heard of sites doing is reducing the costly log volume sent to ServiceNow, which is part of the sales pitch for some log management tools. (Feed logs to the tool, filter, pass key items to ServiceNow.)
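For anyone who has not met the term, a two-sigma event is a sample more than two standard deviations above or below the mean. A minimal Python sketch follows, with a made-up latency series and a hypothetical function name. A real tool would more likely compare against a baseline built from history, not the same window it is testing.

```python
from statistics import mean, stdev

def two_sigma_events(samples, k=2.0):
    """Return (index, value) pairs more than k standard deviations from the mean."""
    if len(samples) < 2:
        return []
    mu = mean(samples)
    sigma = stdev(samples)
    if sigma == 0:
        return []  # perfectly flat series, nothing stands out
    return [(i, v) for i, v in enumerate(samples) if abs(v - mu) > k * sigma]

# Hypothetical latency samples in milliseconds.
latency_ms = [20, 21, 19, 22, 20, 21, 20, 19, 21, 95]
events = two_sigma_events(latency_ms)
print(events)  # [(9, 95)]
```

Only the flagged samples would be forwarded. The other nine never leave the collection tool.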

What I hope for from AIOps is not just spotting abnormal telemetry or probe values. That’s just the starting point. But also correlating them, perhaps starting with temporal correlation (close occurrence in time). With some human assistance/coding, maybe not only correlating but identifying which of the events is the cause. Perhaps with some implicit or explicit domain-aware rules about which events may or may not be correlated. Having an internal model of the topology (including routing peerings) seems key for that.
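Here is a rough Python sketch of that idea: temporal correlation first, then a crude topology rule. Everything in it is an assumption for illustration, including the Event fields, the 5-second window, the neighbor map, and the naive guess that the earliest event in a group is the candidate cause. No vendor's product is being described.

```python
from dataclasses import dataclass

@dataclass
class Event:
    time: float   # seconds since some reference point
    device: str
    what: str

def correlate(events, window=5.0):
    """Group events whose gap to the previous event is at most `window` seconds."""
    groups = []
    for ev in sorted(events, key=lambda e: e.time):
        if groups and ev.time - groups[-1][-1].time <= window:
            groups[-1].append(ev)
        else:
            groups.append([ev])
    return groups

def topology_related(group, neighbors):
    """Keep events on the earliest event's device or on its direct neighbors."""
    root = group[0].device
    allowed = {root} | neighbors.get(root, set())
    return [ev for ev in group if ev.device in allowed]

# Hypothetical topology and events.
neighbors = {"core1": {"dist1", "dist2"}}
events = [
    Event(101.5, "dist1", "BGP peer down"),
    Event(100.0, "core1", "link down"),
    Event(102.0, "branch9", "fan alarm"),
    Event(300.0, "dist2", "OSPF adjacency lost"),
]
groups = correlate(events)
related = topology_related(groups[0], neighbors)
print([ev.device for ev in related])  # ['core1', 'dist1']
```

The fan alarm on branch9 happened at about the same time but is dropped by the topology rule, and the dist2 event is too late to land in the same group. That is the kind of domain-aware filtering that needs a topology model behind it.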

AIOps Comparisons

One issue I see coming, after checking out the marketing documents (etc.) for Juniper’s Mist and Cisco’s Catalyst Center: they each do some thresholding and perhaps AI for troubleshooting. Maybe some correlation. Where is the list of what each of them detects and reports on? How can a buyer compare the two?

I doubt we’ll ever see such lists. Vendors won’t want to give the competition a list of things to develop.

So how does a buyer compare capabilities? Or are vendor-specific hooks enough that other platforms aren’t contenders?

Note: Selector.AI is an interesting player in this space. See my prior blogs about them, and prior TechFieldDay presentations by them. E.g. https://techfieldday.com/companies/selector-ai/

They do custom deep domain-specific correlation etc. logic for Service Provider networks. They now have the amazing John Capobianco (ex-Cisco) spawning ideas and tech marketing for them too!

In-House AI Datacenter

Great market for those with the huge budget needed.

Buying the latest AI GPU chips, servers, and a mesh of multiple 800 Gbps (or faster) links is going to be extraordinarily expensive. Updating every 3 years to stay competitive/bleeding edge, per something seen recently, just adds to the cost.

And lately I’ve been seeing discussions of AI cloud provider challenges, like potentially powering and cooling racks, each of which draws 40-60 kilowatts of power. Which explains why the vendors are buying up old nuclear power plants, etc.

How many companies are up for doing this internally? Worth it for major competitive edge perhaps, but is that edge really there to be found? A strong vision and plausible path to success seem likely required to get approval to proceed! I gather a number of companies are in fact going down this path.

Home-Brew AI Uses

Just as we’ve dabbled in coding and network automation to various degrees, we can experiment with AI. Tools using AI to make data analysis and reporting easier seem like the big potential win right now. I was underwhelmed until recently. Not having to read timestamps and do math with WireShark output does look rather useful! And thanks to John Capobianco for sharing his experiments with us!
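For comparison, this is the sort of timestamp math in question, done by hand in Python with no AI involved. The sketch assumes the capture timestamps were already exported as epoch seconds, one per line (for example with tshark's `-T fields -e frame.time_epoch`). The sample values and the function name are made up.

```python
from decimal import Decimal

def largest_gap(epoch_times):
    """Return (gap_seconds, index_of_later_packet) for the largest inter-packet gap."""
    if len(epoch_times) < 2:
        raise ValueError("need at least two timestamps")
    gaps = [
        (later - earlier, i + 1)
        for i, (earlier, later) in enumerate(zip(epoch_times, epoch_times[1:]))
    ]
    return max(gaps)

# Hypothetical exported timestamps, one packet per line.
text = """1730000000.000100
1730000000.020300
1730000001.520300
1730000001.530900"""

# Decimal keeps microsecond precision that floats can lose at this magnitude.
times = [Decimal(line) for line in text.splitlines()]
gap, index = largest_gap(times)
print(f"largest gap: {gap} s before packet index {index}")
```

Here the 1.5 second stall sits just before the third packet (index 2, counting from zero). Asking an LLM-based tool the same question in plain English is the convenience being described.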

Great blog by Phil Gervasi:

https://networkphil.com/2024/10/29/beginners-guide-for-using-large-language-models-in-network-operations/

Community Feedback PLEASE!

I’m only one person, and I’m interacting with fewer companies and people lately. Kind of goes with retirement. I throw my ideas & what I think is happening out there in blogs, but <gasp!> I could be wrong.

It would be great to have and share more opinions and information.

What is your take on some of the topics I bring up?

So I’d like to encourage feedback via any of the media I post the blog links on: LinkedIn, Twitter, Mastodon, BlueSky. (I’m also on Tribel and Spoutible, but they don’t seem to have many tech people on them, and so I don’t check them often. And Twitter is close to exceeding my “too annoying” threshold. I use a separate ID there to try to keep the politics separate from the tech.)

Don’t violate any NDA’s, and please be somewhat polite. I’d just as soon not have to delete a lot of four letter words and #!@$$ characters :-)

And what I can offer in return is to try to summarize the feedback and even post the comments in a summary follow-up blog. Assuming there are three or more comments, say.

If you don’t want your name mentioned, please put “<noname>” in your question/comment. And that’s as far as I’ve thought this idea through.

Crowd-sourcing FTW?

Plug

I’ll be a delegate at Network Field Day 36 in San Jose next week, along with several old and new friends. Tune in for some presentations and technical discussion.

More info: #NFD36 https://techfieldday.com/event/nfd36/

Miscellaneous

Hashtags: #PeterWelcher #CCIE1773 #AIOps #AIiNetworking

FTC disclosure statement: https://www.dhirubhai.net/pulse/ftc-disclosure-statement-peter-welcher-y8wle/

Twitter: @pjwelcher

LinkedIn: Peter Welcher

Mastodon: @[email protected]





Joel W. K.

[former] Distinguished Engineer | CCIE # 1846 (ret.) | Journalist, Tech Writer | Photographer

3 weeks ago

Pete, given the need for network architects to communicate to a broad audience, Grammarly is the most useful AI tool I've used. I view it as an educational tool, teaching the author how to write more effectively as part of the process.
