Balancing Act: Network Automation Using Pure Python vs. Established Libraries - A Comprehensive Analysis of Pros and Cons
Cisco DevNet

In the constantly evolving landscape of network automation, powerful libraries and tools such as Netmiko, NAPALM, Nornir, pyATS, PyEZ, Ansible, and many others have significantly transformed how network operations are handled. They have gained immense popularity and become essential for network engineers because of their comprehensive functionality and their ability to simplify complex network tasks.

However, despite the robustness and popularity of these libraries, there are several specific situations where writing pure Python code instead can offer distinct advantages. One of the main benefits of writing your own Python code to accomplish network automation is its flexibility. Network engineers can leverage the full capabilities of the Python programming language to tailor their code specifically to their unique needs and tackle complex challenges with precision. In other words, writing pure Python code empowers network engineers with greater control and customization over the automation process. By directly interacting with network devices and manipulating data structures, engineers can fine-tune their scripts to achieve optimal performance and efficiency, ensuring that the automation process is perfectly aligned with their network infrastructure.

Beyond customizing code to meet specific needs, writing your own Python code for every function your automation requires allows seamless integration with other tools and technologies, because you have complete control over what the code delivers and how. The Python ecosystem offers a wide variety of libraries and modules that can enhance the automation process, and I do not mean only the ones targeted explicitly at network automation; I mean libraries that accomplish many different, specific tasks in our code. Engineers can use these libraries to integrate monitoring, logging, and analysis systems, gaining deeper insight into network performance and making more informed decisions.

Additionally, writing Python code without relying on well-known network automation libraries promotes continuous learning and skill development, which takes a lot of time but is deeply rewarding. Network engineers who invest time and effort in mastering Python enhance their programming skills over time, which not only benefits their automation efforts but also opens up opportunities to explore other areas of software development and data analysis.

Trust me on this: there are many organizations where libraries like Netmiko, NAPALM, and Nornir cannot be used to automate the network infrastructure for a variety of reasons (scale, performance, systems integration, missing features, technical constraints, etc.), yet you still need to accomplish many of the things these libraries would do for you, only without them. That is why I decided to write this article.

While libraries like Netmiko, NAPALM, and Nornir have undoubtedly revolutionized and simplified network automation, allowing many network engineers to start coding and automating their networks without in-depth knowledge of Python or of programming in general, there are scenarios where writing pure Python code can offer unique advantages. By combining the power of these libraries with the flexibility and customization of Python, network engineers can unlock the full potential of network automation, optimize their workflows, and effectively tackle the ever-evolving challenges of managing modern large-scale computer networks.

It ain't easy, we know that. It takes time, effort, commitment. But networks must be dealt with entirely through infrastructure as code (IaC) nowadays, so there is nowhere to run: you must dedicate your time and improve your skillset, and you will get really good at it!

By no means am I saying you should not use Netmiko, NAPALM, Nornir, Ansible, or whatever, for this matter; I am definitely not saying it! This article discusses the pros and cons of working with your own code to achieve things without depending on these libraries, what you get in return, what the benefits and challenges are, and what sorts of issues you'll run into. It's about knowing what you're doing and why, that's all. These libraries do their job well in most cases, and you will eventually find yourself working with them to achieve your network automation goals in a much more straightforward and manageable way than having to write and customize your complex Python code to do the same thing. But, again, there are pros and cons to each approach, and this is what I discuss here today.

Direct control and customization

Advantage: With pure Python, you have the flexibility to fully control and customize every aspect of the implementation. This means that you can adjust the code to optimize performance, meet specific use cases, and align it precisely with the requirements of your network infrastructure. This level of customization proves highly advantageous, especially when working with distinct network devices or non-standard protocols. Additionally, you can leverage Python's extensive libraries and frameworks to enhance and extend the functionality of your code, giving you even more power to achieve desired results.

Challenge: To successfully implement this approach, it is necessary to have a comprehensive understanding of the underlying protocols, such as SSH, SNMP, or HTTP/HTTPS, as well as the data formats your network automation functions will consume (e.g., JSON, YAML, XML). This may require writing additional code to ensure proper system functioning and to handle complications that arise during implementation. It is also why engineers who want this level of control often work directly with Paramiko, the lower-level SSH library that Netmiko itself builds on, rather than with Netmiko.
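To give a feel for the extra code this implies, here is a minimal sketch that turns raw CLI text into Python data structures using only the standard library. The sample text is illustrative data modeled on "show ip interface brief" output, and parse_interfaces is a hypothetical helper name, not something taken from any library.

```python
import re

# Illustrative sample modeled on "show ip interface brief" output (not real device data).
SAMPLE_OUTPUT = """\
Interface              IP-Address      OK? Method Status                Protocol
GigabitEthernet0/0     10.2.0.1        YES NVRAM  up                    up
GigabitEthernet0/1     unassigned      YES NVRAM  administratively down down
Loopback0              192.0.2.1       YES NVRAM  up                    up
"""

# One row: name, address, OK?, method, status (may contain a space), protocol.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<ip>\S+)\s+\S+\s+\S+\s+"
    r"(?P<status>administratively down|up|down)\s+(?P<protocol>up|down)\s*$"
)


def parse_interfaces(text):
    """Return one dict per interface row; lines that do not match are skipped."""
    interfaces = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            interfaces.append(match.groupdict())
    return interfaces


interfaces = parse_interfaces(SAMPLE_OUTPUT)
up_interfaces = [i["name"] for i in interfaces if i["status"] == "up"]
print(up_interfaces)
```

The header line is skipped simply because it does not match the pattern. A real parser would need one pattern per command and per platform, which is exactly the maintenance cost this challenge describes.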

Advantages of a lean dependency footprint

Advantage: One of the main advantages of using pure Python without specific third-party libraries aimed at network automation is having a lean dependency footprint. This means you can reduce the number of external dependencies, which can benefit specific environments with resource constraints. It is also useful when deploying scripts with minimal external dependencies for security or compliance reasons.

Challenge: While having a lean dependency footprint can be advantageous, it is essential to note that there are some challenges associated with this approach. By not using dedicated libraries, you may sacrifice some of the time savings provided by high-level abstractions. These dedicated libraries often provide convenient functions and features that simplify development and improve efficiency. Therefore, it is crucial to carefully consider the trade-offs between a lean dependency footprint and the benefits provided by using dedicated libraries.
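As a rough illustration of what a lean footprint can look like, the sketch below prepares a RESTCONF GET request with nothing but the standard library. It assumes the device exposes RESTCONF under the common /restconf root with HTTP basic authentication; build_restconf_request, the address, and the credentials are hypothetical placeholders, and the request is only built, not sent.

```python
import base64
import urllib.request


def build_restconf_request(host, resource, username, password):
    """Build (but do not send) a RESTCONF GET request using only the standard library."""
    # Assumption: the RESTCONF root is /restconf, which is common but not guaranteed.
    url = f"https://{host}/restconf/data/{resource}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/yang-data+json",
            "Authorization": f"Basic {token}",
        },
        method="GET",
    )


request = build_restconf_request(
    "10.2.0.1", "ietf-interfaces:interfaces", "admin", "example"
)
# Sending it would be: urllib.request.urlopen(request, timeout=10)
print(request.get_method(), request.full_url)
```

The trade-off is visible even here: certificate handling, retries, and error decoding are all left for you to write, whereas a dedicated client library would typically bundle them.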

Educational Value

Advantage: Writing network automation scripts from scratch can be a highly beneficial educational exercise and a pleasant one, to be honest with you, but there is a steep price. It provides an opportunity to delve into the intricacies of the Python programming language and gain a comprehensive understanding of the network protocols you are dealing with. By engaging in this hands-on approach, you can enhance your knowledge and skills in both programming and networking.

Challenge: While the advantages of this educational exercise are significant, it is crucial to recognize the potential difficulties that may arise. The learning curve for network automation through scripts can be quite steep, requiring tons of dedication, patience, and perseverance. Additionally, the initial development phase may require way more time and effort. However, the valuable knowledge and experience gained through this process make it a worthwhile endeavor with long-lasting results for your career and skillset.

Performance Optimization

Advantage: In some cases, it is possible to achieve better performance by writing Python code specifically tailored to handle specific tasks, thereby avoiding the potential overhead associated with more generic libraries.

Challenge: Performance analysis and code optimization require deep understanding and expertise, which can take considerable time and effort. Still, the potential performance gains can make this a valuable endeavor.
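Before optimizing anything, it helps to measure. The sketch below compares a sequential run with a thread pool, using a fake_collect function that merely sleeps as a stand-in for a device session. All names and the 50 ms delay are assumptions for illustration; real numbers depend entirely on your devices and network.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_collect(ip_address):
    """Stand-in for a device session: sleeps to mimic network latency."""
    time.sleep(0.05)
    return ip_address, "ok"


def run_sequential(addresses):
    return [fake_collect(ip) for ip in addresses]


def run_threaded(addresses, max_workers=5):
    # executor.map preserves input order, so results line up with addresses.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fake_collect, addresses))


def timed(func, *args):
    """Return the function result together with the elapsed wall-clock time."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


addresses = [f"10.2.0.{n}" for n in range(1, 6)]
sequential_result, sequential_seconds = timed(run_sequential, addresses)
threaded_result, threaded_seconds = timed(run_threaded, addresses)
print(f"sequential: {sequential_seconds:.2f}s, threaded: {threaded_seconds:.2f}s")
```

Because the work is I/O-bound waiting, threads usually help here; CPU-heavy parsing would call for a different approach, which is part of the expertise this challenge refers to.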

Benefits of Simplified Debugging and Troubleshooting

Advantage: A significant advantage of debugging pure Python code is that it can often be more straightforward. With fewer layers of abstraction to navigate, when an issue arises, you don't have to deal with the additional complexity of using third-party libraries. This simplified approach allows you to identify and resolve problems more efficiently.

Challenge: However, it is important to note that there is a trade-off. By relying solely on pure Python code, you may miss the collective knowledge and variety of debugging tools developed and integrated into existing libraries over time. These libraries have been refined and optimized through years of experience, making them valuable resources for troubleshooting and problem-solving.

Therefore, while simplified debugging and troubleshooting may offer their advantages, it is essential to consider the potential limitations and additional support that established libraries can provide in the long run.

Flexibility in data handling

Advantage: One of the main advantages of having flexibility in data handling is the ability to design custom data structures and handling mechanisms tailored to your specific data sources and destinations. This means you can create solutions ideally suited for non-standard data formats or when you need to integrate with custom APIs. This level of flexibility can be crucial in certain situations.

Challenge: However, it is essential to note that leveraging this flexibility may require additional effort to ensure that your data handling is as robust and error-tolerant as what has already been developed in community-driven libraries. While you have the freedom to create your own solutions, it is essential to carefully consider and address the potential challenges and pitfalls that may arise along the way.
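A small sketch of what "robust and error-tolerant" can mean in practice: a custom record type plus a loader that collects bad records instead of crashing on the first one. The Device class and load_devices function are hypothetical names; the field names mirror the kind of inventory a backup script would use.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    hostname: str
    ip_address: ipaddress.IPv4Address
    device_type: str


def load_devices(records):
    """Split raw records into valid Device objects and (record, error) pairs."""
    devices, errors = [], []
    for record in records:
        try:
            devices.append(Device(
                hostname=record["hostname"],
                ip_address=ipaddress.IPv4Address(record["ip_address"]),
                device_type=record["device_type"],
            ))
        except KeyError as exc:
            errors.append((record, f"missing field: {exc.args[0]}"))
        except ipaddress.AddressValueError as exc:
            errors.append((record, f"bad address: {exc}"))
    return devices, errors


raw_records = [
    {"hostname": "router_01", "ip_address": "10.2.0.1", "device_type": "cisco_xe"},
    {"ip_address": "10.2.0.2", "device_type": "cisco_xe"},
    {"hostname": "router_03", "ip_address": "10.2.0.999", "device_type": "cisco_xe"},
]
devices, errors = load_devices(raw_records)
for record, reason in errors:
    print(f"skipped {record}: {reason}")
```

Community libraries have usually accumulated many such checks over the years; in your own code, each one has to be thought of and written explicitly.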

No risks of external maintenance

Advantage: One advantage of not relying on external libraries is that it reduces the risk of facing maintenance issues in the future. By using only pure Python code, you have complete control over the maintenance process, ensuring long-term stability and reliability.

Challenge: However, it is essential to note that by relying solely on pure Python code, you also take on the responsibility of handling maintenance, updates, and security fixes on your own. This means you need to invest time and effort to keep your code up-to-date and secure, which can be a challenging task. However, by taking on this burden, you can tailor the maintenance process according to your specific needs and requirements, leading to a more robust and customized solution.

Intellectual Property and Security

Advantage: For organizations that prioritize the protection of their intellectual property and security, pure Python code can provide a valuable solution. Using Python, these organizations can develop proprietary software that reduces the risk of exposing internal workings to potential attackers or competitors. This helps maintain the confidentiality and integrity of their intellectual property.

Challenge: However, it is essential to note that building a secure and robust system from the start can be a challenging task. This requires a deep understanding of security principles and practices. Organizations need to invest time and resources in acquiring the necessary security knowledge to ensure that their Python code effectively protects their valuable assets.

Challenges of managing large-scale networks with well-known network automation libraries

Overhead, performance, and scalability

  • Scalability: These libraries can introduce significant overhead due to the abstraction layers they bring. Additionally, it is important to mention that while these layers provide flexibility and ease of use, they can also increase processing time. This overhead can become problematic in large-scale deployments as scripts may take longer to execute. Especially when operations are performed sequentially on thousands of devices, the processing time can be considerably higher. However, it is crucial to highlight that scalability is a fundamental aspect of these tools, allowing them to effectively handle complex network environments and efficiently manage many devices.
  • Efficiency: The performance of certain operations may only be optimized for some types of devices or network scenarios. This can lead to inconsistent performance between different vendors or in various parts of a large network. However, the efficiency of these tools can be improved by adjusting and customizing settings to meet specific device or network requirements. By optimizing these settings, organizations can achieve better overall performance and maximize the benefits of using these tools.

Limitations of multivendor support

  • Inconsistency: Although these tools aim to provide a uniform way of dealing with devices from multiple vendors, the reality can differ. Each vendor may implement network protocols and features differently, which can lead to inconsistent behavior in automation tools. For example, a device from one vendor may respond differently to an automation command compared to another device from a different vendor. This can result in unexpected behaviors and difficulties implementing efficient and consistent automation in a multivendor environment.
  • Lack of support: Not all vendors or devices are fully supported by the available automation tools. This means that there may be situations where certain devices are left out of automation workflows, requiring manual intervention or the development of custom scripts to handle these specific cases. This lack of support may be due to a lack of resources or incompatibility between the automation tool and the device in question. It is important to consider these limitations when planning and implementing automation strategies in a multivendor environment.
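One common way to cope with vendor differences in your own code is an explicit dispatch table, so that an unsupported platform fails loudly instead of receiving the wrong command. The sketch below is an assumption-laden illustration: the keys follow Netmiko-style device type names, and BACKUP_COMMANDS, UnsupportedPlatform, and backup_command are hypothetical names.

```python
# Per-platform command used to display the active configuration.
BACKUP_COMMANDS = {
    "cisco_xe": "show running-config",
    "cisco_xr": "show running-config",
    "juniper_junos": "show configuration",
}


class UnsupportedPlatform(Exception):
    """Raised when no command is known for a device type."""


def backup_command(device_type):
    """Look up the backup command, failing explicitly for unknown platforms."""
    try:
        return BACKUP_COMMANDS[device_type]
    except KeyError:
        raise UnsupportedPlatform(
            f"No backup command defined for {device_type!r}"
        ) from None


print(backup_command("juniper_junos"))
```

Devices that fall outside the table can then be routed to manual handling or to a dedicated custom script, as described above.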

Challenges with abstraction layers

  • Increased complexity: One of the main challenges with abstraction layers is that they can add complexity to the debugging process. When a problem occurs, you not only need to debug your own code but also understand how the library code and the device interact, which can be time-consuming and difficult.
  • Limitation in error identification: Another challenge is that these libraries may hide or obscure errors, providing generic messages with little information about the underlying problem. This can be especially problematic in a multivendor environment where different devices and libraries are used.
  • Testing dependencies: Additionally, using abstraction layers often introduces dependencies on external libraries or frameworks, which can complicate the testing process. It may be necessary to set up additional test environments or simulate certain components to effectively test the code.
  • Documentation and support: Finally, working with abstraction layers may require consulting extensive documentation and seeking help from library developers or community forums. This additional overhead can slow down the development process and increase the learning curve for developers.
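When you own the code, you decide how much context an error carries. The following sketch wraps each step so that a failure names the device and the stage while keeping the original exception attached. DeviceError, run_stage, and flaky_connect are hypothetical names used only for illustration.

```python
class DeviceError(Exception):
    """Carries the device name and stage so reports stay specific."""

    def __init__(self, hostname, stage, message):
        super().__init__(f"{hostname} [{stage}]: {message}")
        self.hostname = hostname
        self.stage = stage


def run_stage(hostname, stage, action):
    """Run one step; re-raise failures with context, keeping the original as __cause__."""
    try:
        return action()
    except Exception as exc:
        raise DeviceError(hostname, stage, f"{type(exc).__name__}: {exc}") from exc


def flaky_connect():
    # Simulated failure standing in for a real connection attempt.
    raise TimeoutError("no response on port 22")


try:
    run_stage("router_01", "connect", flaky_connect)
except DeviceError as err:
    report = str(err)
    original = err.__cause__
    print(report)
```

The "raise ... from exc" form preserves the full traceback of the underlying problem, which is the information a generic library message tends to lose.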

Dependency Management and Conflicts

Dependency conflicts and versioning issues are important considerations in a network automation infrastructure:

  • Library conflicts: In large-scale network automation infrastructures, different libraries may have conflicting dependencies. This can create complexities when ensuring smooth operation and compatibility throughout the system.
  • Versioning issues: Another challenge arises from the frequent updates released by tools like Ansible. Not all plugins and modules are updated at the same pace, which can result in compatibility issues. It is crucial to address these versioning discrepancies to maintain a stable and efficient network automation environment.
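One lightweight guard against version drift is to check installed package versions at startup and stop early if something is missing or too old. This is a minimal sketch under stated assumptions: parse_version is a deliberately simplistic helper (it is not a full PEP 440 parser), and the package name in the example is made up so that the "not installed" path is shown.

```python
import re
from importlib import metadata


def parse_version(text):
    """Simplistic parser: leading numeric parts only, padded to three components."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple((parts + [0, 0, 0])[:3])


def check_minimum_versions(requirements):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for package, minimum in requirements.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if parse_version(installed) < parse_version(minimum):
            problems.append(f"{package} {installed} is older than required {minimum}")
    return problems


# Hypothetical package name, chosen so the example reports a missing dependency.
problems = check_minimum_versions({"surely-not-a-real-package-xyz": "1.0"})
print(problems)
```

Pinning versions in a requirements file remains the usual first line of defense; a runtime check like this only makes a mismatch visible sooner.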

Steeper Learning Curve

  • Complexity: Each of these tools has its own syntax, conventions, and operational paradigms. Teams must invest time to learn these systems, which can be a significant investment in large-scale operations. Additionally, understanding the nuances of these tools and their interactions with various network technologies is crucial for successfully implementing them in complex scenarios.
  • Expertise Requirement: To effectively use these tools in complex scenarios, deep expertise is often required, not only in the tool itself but also in the network technologies they integrate with. This expertise goes beyond mere familiarity with the tool and encompasses a comprehensive understanding of the underlying network architecture and protocols.

Limited Customization

  • Rigidity: A potential challenge with pre-built modules and plugins is their rigidity. These solutions do not always align perfectly with the specific requirements of your network or particular use case, resulting in a lack of flexibility. This limitation can make it difficult to fully customize and adapt automation to your needs.
  • Workarounds: Additionally, when faced with such limitations, you may need to resort to complex alternative solutions to achieve the desired functionality. These workarounds are often intricate, which can negatively impact the clarity and maintainability of your automation scripts. It is important to consider these potential trade-offs when working with limited customization options.

Update, Maintenance, and Dependencies

  • Update Cycles: When using these tools, it is important to consider their update cycles, as you will rely on them to get bug fixes and new features. If a tool does not receive proper maintenance, it can become a problem in the future.
  • Breaking Changes: It is essential to be aware that updates can introduce breaking changes, which means that existing automation workflows may need to be modified to fit the new versions.

Security Concerns

  • Exposure to vulnerabilities: Using third-party tools can increase the risk of exposure to vulnerabilities within those tools, which can be a significant concern if these vulnerabilities are not promptly addressed.
  • Compliance: Especially in regulated industries, third-party software may raise compliance issues that must be thoroughly managed to avoid potential legal problems or penalties.
  • Software quality: It is crucial to consider the quality of third-party software, as poor quality can affect the overall security and effectiveness of the system.
  • Support and maintenance: When using third-party tools, it is essential to assess the support and maintenance offered and to have an adequate plan for dealing with technical issues or necessary updates.

Conclusion

Although third-party libraries such as Netmiko, NAPALM, and Nornir have undoubtedly accelerated the field of network automation, it is worth considering not relying on them in certain circumstances. Python and libraries that are not specific to network automation can provide the necessary flexibility for customization, particularly when operating in restricted environments. Additionally, "pure" Python code allows for a deeper understanding and control of the automation process.

When choosing between your own Python functions and specialized libraries, it is crucial to carefully evaluate the specific project requirements. The skill set of the development team should also be taken into consideration. Furthermore, the long-term maintenance plan for the automation solution should guide the decision-making process.

It is worth noting that each approach has its own merits. While specialized libraries offer efficiency and convenience, pure Python can provide precision and flexibility. In some cases, a hybrid approach that combines the advantages of both approaches may be the best solution. This allows for leveraging the efficiencies of existing libraries while using pure Python for specific tasks that require more control and customization.

Brown Bag extras

As a token of appreciation, I am sharing a simple Python script that you can use to learn and practice network automation with Python and some of its well-known libraries. In addition, I go over each section, including imports, functions, variables, and execution. I hope you enjoy it.

The script below exemplifies a real-world - yet simple - network automation task where Python's standard libraries and third-party libraries like Netmiko are used to handle network connections, parallel execution, and file operations, making it a practical tool for network engineers who want to learn and embark on network automation and infrastructure as code with Python. The objective here is to connect to multiple network devices in parallel, obtain their running configurations - just like a backup - and store them in files inside a folder. As simple as that.

import argparse
import yaml
import logging
import time
import os
import re
import getpass
from concurrent.futures import ThreadPoolExecutor, as_completed
from netmiko import ConnectHandler, NetmikoTimeoutException, NetmikoAuthenticationException
from datetime import datetime

# Define log configurations
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Lists to store report data
successful_devices = []
failed_devices = []

# Function to read YAML file
def read_yaml(file):
    try:
        with open(file, 'r') as stream:
            return yaml.safe_load(stream)
    except FileNotFoundError:
        logging.error(f"The file {file} was not found.")
        raise
    except yaml.YAMLError as exc:
        logging.error(f"Error parsing YAML file: {exc}")
        raise

# Function to retrieve configuration
def retrieve_config(device_details, username, password):
    # Use the hostname in messages, falling back to the IP address if it is missing
    hostname = device_details.get('hostname', None)
    label = hostname or device_details['ip_address']

    try:
        connection = ConnectHandler(
            device_type=device_details['device_type'],
            ip=device_details['ip_address'],
            username=username,
            password=password,
            secret=device_details.get('secret', '')
        )
    except (NetmikoTimeoutException, NetmikoAuthenticationException) as e:
        logging.error(f"Failed to connect to device {label}: {e}")
        failed_devices.append((label, str(e)))
        return

    # Get running-config, closing the session whether or not the command succeeds
    try:
        running_config = connection.send_command("show running-config")
    except Exception as e:
        logging.error(f"Error retrieving running-config from device {label}: {e}")
        failed_devices.append((label, str(e)))
        return
    finally:
        connection.disconnect()

    # Save running-config to a file
    if hostname:
        date_today = datetime.today().strftime('%m-%d-%Y-%H-%M')
        filename = f"{hostname}-{date_today}.cfg"
        output_directory = f"/Users/lfurtado/Documents/code/get-runningconfigs/configs/{hostname}/"
        os.makedirs(output_directory, exist_ok=True)
        with open(os.path.join(output_directory, filename), 'w') as f:
            f.write(running_config)
        logging.info(f"Configuration obtained from device {hostname}")
        successful_devices.append(hostname)
    else:
        logging.error(f"Unable to determine the hostname of device {device_details['ip_address']}")
        failed_devices.append((label, "hostname missing from the YAML file"))

# Main function
def main(file_name, max_workers, ip_address=None):
    username = input("Enter your username: ")
    password = getpass.getpass("Enter your password: ")

    data = read_yaml(file_name)
    device_details_list = data['devices']

    if ip_address:
        device_details_list = [d for d in device_details_list if d['ip_address'] == ip_address]

    start_time = time.time()

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(retrieve_config, device_details, username, password) for device_details in device_details_list]
        for future in as_completed(futures):
            try:
                future.result()
            except Exception as e:
                logging.error(f"Operation failed for a device: {e}")

    end_time = time.time()

    # Print report
    print("\n--- Execution Report ---")
    print(f"\nExecution Time: {end_time - start_time} seconds")
    print("\nSuccessful Devices:")
    for device in successful_devices:
        print(f"- {device}")
    print("\nFailed Devices:")
    for device, reason in failed_devices:
        print(f"- {device}, Reason: {reason}")

# Set up command-line arguments
parser = argparse.ArgumentParser(description="""
    - Leonardo Furtado -
    This script is used to retrieve the running-config from our network devices and store it in files.
    The devices and their details are specified in the provided YAML files.
    The script connects to the devices concurrently, improving the efficiency of the process.
    It uses the Python library 'netmiko' to handle SSH connections and commands on the devices.
""")
parser.add_argument('file', help='The YAML file containing the list of devices.')
parser.add_argument('--max_workers', type=int, default=5, help='The maximum number of concurrent workers.')
parser.add_argument('--ip_address', default=None, help='The IP address of a specific device for this operation.')
args = parser.parse_args()

if __name__ == "__main__":
    main(args.file, args.max_workers, args.ip_address)

The YAML file passed as the file argument looks like this:

devices:
  - ip_address: 10.2.0.1
    device_type: 'cisco_xe'
    hostname: 'router_01'
  - ip_address: 10.2.0.2
    device_type: 'cisco_xe'
    hostname: 'router_02'
  - ip_address: 10.2.0.3
    device_type: 'cisco_xe'
    hostname: 'router_03'
  - ip_address: 10.2.0.4
    device_type: 'cisco_xr'
    hostname: 'router_04'
  - ip_address: 10.2.0.5
    device_type: 'cisco_xr'
    hostname: 'router_05'

Let's explain this code in more detail:

  • argparse: This module is used to parse command-line arguments, making it easier to handle user input.
  • yaml: The yaml module is responsible for handling YAML file formats. It allows for easy configuration and data file management.
  • logging: The logging module facilitates the logging of messages with different severity levels. This is useful for debugging and tracking the execution of a program.
  • time: The time module provides various functions related to time. It can be used for tasks such as measuring execution time or adding delays in a program.
  • os: The os module provides a way of using operating system-dependent functionality. It allows for tasks such as file and directory manipulation.
  • re: The re module provides regular expression matching operations. It can be used for tasks such as pattern matching and text manipulation.
  • getpass: The getpass module allows the program to safely prompt the user for a password without echoing it. This ensures the security of sensitive information.
  • concurrent.futures (ThreadPoolExecutor, as_completed): The concurrent.futures module provides a high-level interface for asynchronously executing multiple tasks. This enables the launching of parallel tasks, improving the efficiency of the program.
  • netmiko (ConnectHandler, NetmikoTimeoutException, NetmikoAuthenticationException): The netmiko library is a powerful tool for handling network connections. It provides a simple and consistent interface for interacting with network devices.
  • datetime: The datetime module supplies classes for manipulating dates and times. It can be used for tasks such as calculating time differences or formatting dates.

Functions:

  1. read_yaml(file): Reads a YAML file and returns the contents. It handles errors like file not found or YAML parsing errors with exception handling.
  2. retrieve_config(device_details, username, password): Uses Netmiko's ConnectHandler to connect to a network device, retrieve its running configuration, and save that configuration to a file. It also logs any errors encountered during the process.
  3. main(file_name, max_workers, ip_address=None): The main function of the script. It prompts the user for their username and password, reads device details from a YAML file, and then uses a ThreadPoolExecutor to retrieve configurations for multiple devices concurrently. It takes command-line arguments to specify the YAML file and how many parallel workers to use. The function also measures and prints the execution time and results.
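A variation worth knowing for the concurrent part: keeping a mapping from each future back to its device makes it possible to say which device failed, rather than only that some operation failed. The sketch below uses a fake_retrieve stand-in instead of a real connection; fake_retrieve and run_all are hypothetical names, and the failure for router_02 is simulated.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_retrieve(device):
    """Stand-in for retrieve_config: fails for one device to show the error path."""
    if device["hostname"] == "router_02":
        raise ConnectionError("simulated timeout")
    return f"config of {device['hostname']}"


def run_all(devices, max_workers=5):
    """Run fake_retrieve for every device and return (results, errors) keyed by hostname."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its device so failures can be attributed.
        future_to_device = {executor.submit(fake_retrieve, d): d for d in devices}
        for future in as_completed(future_to_device):
            hostname = future_to_device[future]["hostname"]
            try:
                results[hostname] = future.result()
            except Exception as exc:
                errors[hostname] = str(exc)
    return results, errors


sample_devices = [{"hostname": f"router_0{n}"} for n in range(1, 4)]
results, errors = run_all(sample_devices)
print(sorted(results), errors)
```

Returning the results from the function, instead of appending to module-level lists, also makes the logic easier to test in isolation.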

Variables:

  • logging: Configured to display informational messages, including timestamps and the severity level of the message.
  • successful_devices: A list that will store the hostnames of devices for which the configuration was successfully retrieved.
  • failed_devices: A list that will store tuples of device hostnames and the error messages for devices where the retrieval failed.

Command-line Argument Configuration:

  • The script uses the argparse module to define how it should interpret command-line inputs. It specifies that the script requires a YAML file (file) and optionally takes the number of workers (--max_workers) and an IP address (--ip_address) for a specific device.
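One detail about argparse is easy to miss: values given on the command line arrive as strings unless a type is declared, and ThreadPoolExecutor expects an integer for max_workers. The sketch below shows the same three arguments with type=int on the worker count; build_parser is a hypothetical wrapper, and the argument list passed to parse_args simulates a command line.

```python
import argparse


def build_parser():
    """Create a parser with the same three arguments the script accepts."""
    parser = argparse.ArgumentParser(description="Back up running configurations.")
    parser.add_argument("file", help="The YAML file containing the list of devices.")
    parser.add_argument("--max_workers", type=int, default=5,
                        help="The maximum number of concurrent workers.")
    parser.add_argument("--ip_address", default=None,
                        help="The IP address of a specific device for this operation.")
    return parser


# Passing a list simulates: python script.py devices.yaml --max_workers 10
args = build_parser().parse_args(["devices.yaml", "--max_workers", "10"])
print(args.file, args.max_workers, args.ip_address)
```

Wrapping the parser in a function also means the arguments are only parsed when the script runs, not when the module is imported.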

Execution:

  • The if __name__ == "__main__": block checks if the script is being run directly (not imported) and then calls the main function with the parsed arguments.
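Since the YAML inventory drives everything the script does, a small sanity check before connecting anywhere can save a confusing run. The sketch below validates the dictionary that yaml.safe_load would return; validate_inventory and REQUIRED_KEYS are hypothetical names, and the sample data is made up to trigger both kinds of problem.

```python
from collections import Counter

REQUIRED_KEYS = ("ip_address", "device_type", "hostname")


def validate_inventory(data):
    """Return a list of problems found in the parsed inventory (empty if none)."""
    problems = []
    devices = data.get("devices") or []
    for index, device in enumerate(devices):
        for key in REQUIRED_KEYS:
            if not device.get(key):
                problems.append(f"device #{index}: missing '{key}'")
    # Duplicate hostnames would make two devices write to the same output file.
    counts = Counter(d.get("hostname") for d in devices if d.get("hostname"))
    for hostname, count in counts.items():
        if count > 1:
            problems.append(f"hostname '{hostname}' is used {count} times")
    return problems


inventory = {
    "devices": [
        {"ip_address": "10.2.0.4", "device_type": "cisco_xr", "hostname": "router_04"},
        {"ip_address": "10.2.0.5", "device_type": "cisco_xr", "hostname": "router_04"},
        {"ip_address": "10.2.0.6", "device_type": "cisco_xr"},
    ]
}
problems = validate_inventory(inventory)
for problem in problems:
    print(problem)
```

Calling a check like this right after reading the file lets the script stop with a clear message instead of failing halfway through a run.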

I hope you enjoyed reading through this article!

Cheers,

Leonardo Furtado


Cisco's operating system is usually easier to handle with Python combined with Netmiko, but when we move to other operating systems, such as F5 Big IP, Checkpoint, or Fortinet, it can become more complex.

Great article, Leonardo. I agree when you say that Netmiko does not cover everything, because I have tried to use it with Python to get data from a specific operating system, for example F5 Big IP, and it has been hard.

Bruno Lima

CEO UPISP | Network Consultant | Business Advisor | Speaker

1 year ago

Really awesome!

Alexandre Silva Nano

IT Network Analyst and Consultant / Consultor e Analista de Redes e TIC

1 year ago

I agree! The existing toolkits help a lot, but with "pure" Python we have a good deal of freedom. I tried building a data collection over NETCONF to feed a Node.js frontend, and it was "infinitely" easier to do that with my own script than with the ready-made solutions. I was even able to process the data properly before sending it. Another great piece of content from the master! Best of luck!

Miguel Zambrano

Network Development Engineer II at Amazon Web Services (AWS)

1 year ago

Glad that you are back
