Network Automation Notes from a Wizard We Can Learn From

Lots to take in here. We are putting each of these areas/tools into our internal training program.

Lately a lot of our time has gone into importing things into NetBox so we have a source of truth. For network devices, we take JSON output of the config/chassis and parse it to create matching resources in NetBox. For servers, we use Ansible facts to grab data, then parse that to create things in NetBox. We have a lot of automation/processes, so this is really just the start of it. Our toolset is mostly Python, Ansible, NetBox, and NAPALM at the moment. We intend to move to CircleCI, GitLab CI, and GitHub Actions as our CI/CD tools this year.
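As a rough sketch of the "parse JSON, then create matching resources" step, the function below turns device facts into dictionaries shaped like NetBox create requests. The input keys (hostname, serial_number, model, interface_list) are assumed to resemble NAPALM's get_facts output, and the payload fields are illustrative only; a real NetBox instance expects IDs or nested objects for things like device type and site, and the actual API calls are left out.

```python
import json


def facts_to_netbox_payloads(raw_json, site):
    """Turn device facts (JSON text) into dicts shaped like NetBox create requests.

    Assumption: the input keys follow NAPALM's get_facts naming.
    The payload fields are illustrative, not an exact NetBox schema.
    """
    facts = json.loads(raw_json)
    device = {
        "name": facts["hostname"],
        "serial": facts["serial_number"],
        "device_type": facts["model"],
        "site": site,
    }
    # One interface payload per interface name, sorted for stable output.
    interfaces = [
        {"device": facts["hostname"], "name": name, "type": "other"}
        for name in sorted(facts.get("interface_list", []))
    ]
    return device, interfaces
```

The returned dictionaries could then be handed to whatever NetBox client is in use; that step is deliberately not shown here.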

For some of our clients we built homegrown applications similar to NetBox that served as their inventory system and had zero-touch provisioning functionality. We plug Cisco, Juniper, and Arista devices into the network; they come up, get DHCP, and get instructions to retrieve firmware and config. We use Ansible to pull data from that inventory system, combine it with data in variables, render device templates, push config to devices, get a diff, and commit it. We also have automation to upgrade devices: the clients' devices were so far behind on firmware that we literally spent a year doing scheduled upgrades of all of them. The application looks at inventory to determine each device's model and the desired firmware version for that model, then interacts with the device to install the firmware and reboot it. If there were special things we needed to do beforehand, it would do those too, e.g. shut down BGP sessions.
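The upgrade decision described above (look up the model, compare against the desired version, add any pre-steps) might look roughly like this. The model names, version strings, and the bgp_enabled field are hypothetical placeholders, and the steps are returned as plain text rather than executed against a device.

```python
# Hypothetical mapping of model -> desired firmware version.
DESIRED_FIRMWARE = {"QFX5100": "18.4R3", "DCS-7050SX": "4.24.2F"}


def plan_upgrade(device, desired=DESIRED_FIRMWARE):
    """Return the ordered list of upgrade steps for one inventory record.

    An empty list means nothing to do: the model is unknown to the
    mapping, or the device already runs the desired version.
    """
    target = desired.get(device["model"])
    if target is None or device["os_version"] == target:
        return []
    steps = []
    # Special handling before the upgrade, e.g. draining BGP first.
    if device.get("bgp_enabled"):
        steps.append("shutdown BGP sessions")
    steps += [f"install {target}", "reboot", f"verify version is {target}"]
    return steps
```

Keeping the plan as data makes it easy to show to a human for approval before anything touches the device.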

For the maintenances, we added a field in inventory to record when each device's upgrade was scheduled, and we had a scheduled job look for devices with upgrades happening within 7 days, query the server inventory for all servers in those racks, and email the relevant teams about the maintenance. We intended to add functionality that would kick off the upgrade automatically, but we never got to the point where we were comfortable letting it do that, so a human still facilitated the upgrade.
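A minimal sketch of that scheduled lookup, assuming inventory records are plain dictionaries. The field names (upgrade_date, rack, team_email) are made up for illustration, and sending the email itself is left out.

```python
from collections import defaultdict
from datetime import date, timedelta


def upcoming_notifications(devices, servers, today, window_days=7):
    """Group affected servers by team for upgrades within the window.

    Returns {team_email: [(server_name, device_name, upgrade_date), ...]}.
    """
    cutoff = today + timedelta(days=window_days)
    notices = defaultdict(list)
    for dev in devices:
        when = dev.get("upgrade_date")
        # Skip devices with no scheduled upgrade or one outside the window.
        if when is None or not (today <= when <= cutoff):
            continue
        # Every server in the same rack as the device is affected.
        for srv in servers:
            if srv["rack"] == dev["rack"]:
                notices[srv["team_email"]].append((srv["name"], dev["name"], when))
    return dict(notices)
```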

There were a lot of "quick wins" that we got from the inventory system. You can generate all your monitoring configuration by pulling data from inventory and rendering it out, and you can add webhooks to call other systems when an event happens (e.g. a device is created, an interface state changes). You can also generate reverse DNS entries for interfaces based on a naming scheme.
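For the reverse DNS quick win, one possible naming scheme is interface.hostname.domain. The scheme, the example.net domain, and the zone-file style output below are assumptions for illustration; the source does not say which scheme was used.

```python
import ipaddress
import re


def ptr_record(hostname, interface, address, domain="example.net"):
    """Build a PTR record line for one interface address.

    Assumed naming scheme: <interface>.<hostname>.<domain>, with any
    character that is not a letter or digit in the interface name
    replaced by "-".
    """
    label = re.sub(r"[^a-z0-9]+", "-", interface.lower()).strip("-")
    # reverse_pointer yields e.g. "1.2.0.192.in-addr.arpa" for 192.0.2.1.
    ptr_name = ipaddress.ip_address(address).reverse_pointer
    return f"{ptr_name}. IN PTR {label}.{hostname}.{domain}."
```

Looping this over every interface address in inventory gives a full reverse zone that stays in sync with the source of truth.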

On a personal level, I've been working on a tool based on some of that, written in Python using the Flask framework. It's specifically for "ZTPing" devices, which all have their own methods. It works with Arista devices, although I almost exclusively work with Juniper nowadays, and I've spent a bunch of time adding PXE functionality. My next step is Junos devices, now that I have a spare device to use. The idea is that I plug a device into the network, it gets an address from DHCP along with instructions on where to get its configuration based on its serial number, and it retrieves that config and applies it.
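The "fetch config by serial number" idea can be sketched as a tiny HTTP endpoint. The author's tool uses Flask; to keep this example free of dependencies it is written as a plain WSGI callable instead, and the /config path, the serial query parameter, and the sample serial and config are all hypothetical.

```python
from urllib.parse import parse_qs

# Hypothetical serial -> config mapping; a real tool would query inventory.
CONFIGS = {"JN1234ABCD": "set system host-name edge1\n"}


def ztp_app(environ, start_response):
    """Minimal WSGI app: GET /config?serial=<serial> returns that device's config."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    serial = query.get("serial", [""])[0]
    config = CONFIGS.get(serial)
    # Anything other than a known serial on the /config path is a 404.
    if environ.get("PATH_INFO") != "/config" or config is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown device\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [config.encode()]
```

For local experiments this can be served with the standard library's wsgiref.simple_server.make_server; DHCP would then point booting devices at that URL.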

There's a lot of value in automation around the basic provisioning/configuration of devices. Having an inventory/source of truth is huge for documentation/discovery purposes, as well as a place where data is stored that is then used for other purposes (such as generating monitoring configurations). You can ensure consistent configuration for basic stuff like DNS/NTP servers, SNMP configuration, etc.

If you work in a company that has a lot of PoPs and/or lots of transit/peering, then having tooling to manage all those BGP sessions would be amazing. There are tools like Peering Manager that solve some of that.

Even having more automated processes for troubleshooting can be super helpful. I've written little tools for myself sometimes just to do that. I remember one case where optics would die fairly frequently, so the tool pulled interface error information and light levels from both devices on a link and displayed them so I could determine which optic to replace. You just provided a hostname and interface; it would connect, grab info, look at LLDP to find the other device, connect to it, grab its info, and display both.
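The original tool only displayed both ends and left the judgment to a human. As a hedged extension of that idea, the sketch below compares light levels from both ends and suggests where to look. The -10 dBm threshold and the field names are hypothetical; real thresholds depend on the optic type, and collecting the data from the devices is not shown.

```python
def diagnose_link(a, b, min_dbm=-10.0):
    """Compare optical levels on both ends of a link and suggest a suspect.

    Each side is a dict with host, interface, tx_dbm, and rx_dbm.
    The default threshold is a placeholder, not a vendor value.
    """
    findings = []
    for rx_side, tx_side in ((a, b), (b, a)):
        if rx_side["rx_dbm"] >= min_dbm:
            continue  # Receive level is fine in this direction.
        if tx_side["tx_dbm"] < min_dbm:
            # Far end is transmitting weakly, so its optic is the suspect.
            findings.append(
                f"replace optic on {tx_side['host']} {tx_side['interface']} (low TX)"
            )
        else:
            # Far end transmits fine, so suspect the fiber or the near-end optic.
            findings.append(
                f"check fiber or optic on {rx_side['host']} {rx_side['interface']} "
                "(low RX, far-end TX ok)"
            )
    return findings
```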

I haven't had the opportunity to do it yet, but there is a lot of cool event-driven automation you can do. I believe I have seen people use StackStorm for it. You have automation that runs in response to an event. You could have it auto-remediate the issue, or take some proactive measure to avoid an issue, in response to whatever event you define.
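To make the event-driven pattern concrete, here is a toy dispatcher that runs registered handlers when an event of a given type arrives. This is not StackStorm's API, just a plain-Python illustration of the idea; the interface_down event type and the handler are made up.

```python
from collections import defaultdict

# Registry of event type -> list of handler functions.
_handlers = defaultdict(list)


def on(event_type):
    """Decorator that registers a function as a handler for an event type."""
    def register(func):
        _handlers[event_type].append(func)
        return func
    return register


def dispatch(event):
    """Run every handler registered for the event's type and collect results."""
    return [handler(event) for handler in _handlers.get(event["type"], [])]


@on("interface_down")
def open_ticket(event):
    # Hypothetical remediation: a real handler might bounce the port
    # or page someone instead of returning a string.
    return f"ticket: {event['device']} {event['interface']} is down"
```

Calling dispatch({"type": "interface_down", "device": "sw1", "interface": "xe-0/0/0"}) would run open_ticket; events with no registered handler simply produce an empty list.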

These are some good communities around networking/automation:

https://netdev.chat/

https://www.networktocode.com/community/

Being able to create web applications is very helpful, especially APIs. I honestly do not like making GUIs so I avoid that stuff as much as possible. It's also super valuable to know how to consume APIs.

The tool I'm working on is using Flask but there is no UI. It is just an application that things talk to via HTTP to get information back. I just render things like iPXE configuration, cloud-init, preseed, ignition, etc. The things meant to talk to it are network devices, servers, and VMs, not humans.
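Rendering something like an iPXE script for a machine can be as simple as filling in a template. The sketch below uses the standard library's string.Template rather than whatever templating the author's tool uses, and the URLs and field names are placeholders.

```python
from string import Template

# A small iPXE script; the ${...} fields are filled in per host.
IPXE_TEMPLATE = Template(
    "#!ipxe\n"
    "kernel ${kernel_url} hostname=${hostname}\n"
    "initrd ${initrd_url}\n"
    "boot\n"
)


def render_ipxe(host):
    """Render the iPXE script for one host record (a dict).

    Raises KeyError if a required field is missing, which is safer
    than serving a half-filled boot script.
    """
    return IPXE_TEMPLATE.substitute(host)
```

The same approach extends to cloud-init, preseed, or ignition output: one template per format, rendered from the same host record.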

Having a web application that is an interface to multiple other systems can be useful. Your application can be written to send/retrieve data from the other systems and display it to a user. The user can do things in your app and end up affecting the other systems without needing access to or knowledge of them.
