Networking Field Day 35 Pt2
Last week I wrote an article on three of the presenters at NFD35: Arrcus, Inc., Hedgehog, and Intel. The focus was on whitebox networking and Data Processing Units (DPUs), or SmartNICs. You can find that article here -> Networking Field Day 35 Newbie Pt.1.
The Promise of AI/ML Ops
For almost 40 years, ever since the Simple Network Management Protocol (SNMP) was standardized in 1988 (RFC 1065 and its companion RFCs), we drank from the well of 5-minute polling intervals. We missed events and microbursts, and the polling was CPU intensive.
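To put a number on why 5-minute polling hides microbursts, here is a minimal sketch. The link speed, the burst length, and the traffic pattern are all made-up values for illustration, and the helper names are my own.

```python
# Hypothetical scenario: a 10 Gbit/s link that is idle for 5 minutes
# except for a 2-second burst at full line rate.
LINK_BPS = 10_000_000_000
POLL_INTERVAL_S = 300  # one 5-minute polling interval


def average_utilization(per_second_bits, link_bps):
    """Utilization as a poller sees it: total bits over the whole interval."""
    return sum(per_second_bits) / (len(per_second_bits) * link_bps)


def peak_utilization(per_second_bits, link_bps):
    """Utilization of the busiest single second in the interval."""
    return max(per_second_bits) / link_bps


# One entry per second; seconds 100 and 101 carry the burst.
traffic = [0] * POLL_INTERVAL_S
traffic[100] = LINK_BPS
traffic[101] = LINK_BPS

avg = average_utilization(traffic, LINK_BPS)
peak = peak_utilization(traffic, LINK_BPS)

print(f"5-minute average utilization: {avg:.2%}")
print(f"Busiest second utilization:   {peak:.0%}")
```

In this made-up case the poller reports well under 1% utilization while the link was saturated for two full seconds. Real microbursts are usually far shorter than that, which makes the averaging effect even worse.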
Deep Packet Inspection (DPI) and network flow data (NetFlow, IPFIX, J-Flow, and proprietary formats) gave us more detail, with the infrastructure performing flow export to a flow collector. Flow export could be CPU intensive, and again we missed critical events due to flow export limits.
We begged, pleaded, and cajoled our vendors: we need more data, and we need near-real-time events from the infrastructure. They delivered through Streaming Telemetry, and we went from the water fountain to the waterfall.
The problem was that the systems that could make sense of the waterfall were all reactive. Correlating the information from all this streaming telemetry was mostly a manual effort of spreadsheet exports and custom SQL queries. We could build alerts and events from the data, but we had to know what we were looking for, and we never knew until it was already too late.
AI/ML Operations is the ability to build a data lake of network events and use LLMs to develop correlations that let network operators identify events, or pre-events, proactively instead of reactively.
In English please? Imagine if you could ask your monitoring system a question like 'Do I have any WAN interfaces with 50% packet loss? Are all those links from the same carrier? Are they all in the same region?' or 'Did anyone upload confidential data to Dropbox?'. No searching logs, no reading packet captures, no research. As a network operator, try to imagine all the data sources you need to build the answers to those questions and the time it would take to get an answer. Then imagine just asking the machine for the answer and, in response to the machine's answer, saying 'Let me know when these conditions happen'. How much could you reduce Mean Time to Resolution (MTTR)?
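For a sense of what sits behind the WAN question once the data is already joined in one place, here is a small sketch. Every record, field name, carrier, and region below is hypothetical, and this is hand-written filtering, not how any vendor's platform works. The hard part in real life is getting loss measurements, circuit inventory, and site data into one table in the first place.

```python
# All records are made up for illustration; field names are hypothetical.
wan_links = [
    {"interface": "nyc-edge1:wan0", "carrier": "CarrierA", "region": "us-east", "loss_pct": 52.0},
    {"interface": "bos-edge1:wan0", "carrier": "CarrierA", "region": "us-east", "loss_pct": 61.5},
    {"interface": "chi-edge1:wan0", "carrier": "CarrierB", "region": "us-central", "loss_pct": 0.1},
    {"interface": "atl-edge1:wan1", "carrier": "CarrierA", "region": "us-south", "loss_pct": 0.0},
]


def lossy_links(links, threshold_pct=50.0):
    """Return the links whose packet loss is at or above the threshold."""
    return [link for link in links if link["loss_pct"] >= threshold_pct]


def common_value(links, field):
    """Return the shared value of `field` if every link has the same one, else None."""
    values = {link[field] for link in links}
    return next(iter(values)) if len(values) == 1 else None


bad = lossy_links(wan_links)
print("Links with 50% or more loss:", [link["interface"] for link in bad])
print("Same carrier?", common_value(bad, "carrier"))
print("Same region?", common_value(bad, "region"))
```

With this sample data, both lossy links turn out to share one carrier and one region, which is exactly the kind of pattern you would want surfaced without digging through logs.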
cPacket demonstrated their system proactively producing a 'card', or alert, on infrastructure state based on predefined conditions, with the platform correlating data from multiple sources. They wrote a pretty good article explaining AI Observability, located here -> The 3 P’s of AI Network Observability and Security Monitoring: Why High-Quality Data Matters.
cPacket also demonstrated an innovative way to capture data in the public cloud so that you can achieve Total Network Observability. They have a free demo you can trial on their website.
Selector demonstrated a platform that takes in data from almost any source about the current state of the network infrastructure and, using advanced LLM capabilities, develops a data model of correlated events that you can query using natural language queries or a standard SQL select. Imagine taking all the 'white noise' that your infrastructure creates and putting it into a system that can learn what you really need to pay attention to. How much time would that save? How many customers could you keep happier through incident avoidance or reductions in resolution time?
The ability to take in Syslog, SNMP, network CMDB, telemetry, and almost any other event source, and then deduce the 'real' events, opens up opportunities I am excited to watch develop.
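To make the idea of reducing noise to 'real' events a bit more concrete, here is a deliberately simple sketch. It uses plain time-window grouping per device, which is my own stand-in technique and not the LLM-based correlation the vendors demonstrated. The Event fields, device names, messages, and the 60-second window are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Event:
    ts: float      # seconds since some reference time
    source: str    # e.g. "syslog", "snmp", "telemetry"
    device: str
    message: str


def correlate(events, window_s=60.0):
    """Group events per device into incidents.

    An event joins the device's open incident when it arrives within
    window_s of that incident's latest event; otherwise it starts a new one.
    """
    incidents = []
    open_incident = {}  # device name -> the incident currently open for it
    for ev in sorted(events, key=lambda e: e.ts):
        current = open_incident.get(ev.device)
        if current and ev.ts - current[-1].ts <= window_s:
            current.append(ev)
        else:
            current = [ev]
            open_incident[ev.device] = current
            incidents.append(current)
    return incidents


# Made-up events: three sources reporting the same link failure on edge1,
# plus two unrelated events.
events = [
    Event(0.0, "syslog", "edge1", "interface wan0 changed state to down"),
    Event(5.0, "snmp", "edge1", "linkDown trap for wan0"),
    Event(12.0, "telemetry", "edge1", "wan0 oper-status DOWN"),
    Event(400.0, "syslog", "core2", "fan speed warning"),
    Event(900.0, "syslog", "edge1", "configuration saved"),
]

incidents = correlate(events)
print(f"{len(events)} raw events -> {len(incidents)} incidents")
for incident in incidents:
    sources = sorted({ev.source for ev in incident})
    print(incident[0].device, len(incident), "event(s) from", sources)
```

Even this crude grouping folds the three reports of the same link failure into one incident. The promise of the platforms above is doing that across far messier data without anyone having to hand-pick the window or the grouping key.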
Why Should I Care