Preparing VIOS Update. NIB

Network Interface Backup (NIB) on VIO client LPARs is the most flexible networking configuration in a PowerVM environment, but at the same time the most difficult one. Flexibility in this case also means many variants in how you can implement it to achieve your performance and availability goals.

Usually we can summarize it in two cases:

  • Network Interface Backup with one virtual switch
  • Network Interface Backup with two virtual switches

Of course other types are possible too, especially if you have a complex network infrastructure with different security zones. All these configurations have something in common:

  • they use Shared Ethernet Adapter on Virtual I/O Server side
  • they don't use high-availability features on Virtual I/O Server side
  • they use Network Interface Backup configuration on Virtual I/O client side

Let's build on these common features. First we look at the configuration with one virtual switch.

[Figure: NIB configuration with one virtual switch]

The key point in this configuration is that you should use two different VLAN IDs on the Virtual I/O Server side. This is OK if you have just one VLAN to virtualize and normal access ports on the switches. The switches will strip your VLAN IDs and set their own anyway. But it also means that you use one physical port for each VLAN you have.

If you have more VLANs than ports you can place in the server, and your network guys are ready to provide trunk ports, then you should switch to the configuration with two (or more) virtual switches.

[Figure: NIB configuration with two virtual switches]

In this case you can use real VLAN IDs on your virtual switches (even though I wrote VLAN ID 1), and you can use the same VLAN IDs on both Virtual I/O Servers because they use different virtual switches.

If we want to automate our IT infrastructure, we should reduce the number of different variants we use in it. That's why you should consider this article just a guideline for automating your infrastructure, not a complete automation solution that helps in 100% of cases.

I look into just one case. I have an IBM Power server with two Virtual I/O Servers and two virtual switches on it. As clients I have AIX, IBM i and Linux LPARs, but only some AIX LPARs are using NIB configuration.

There are several prerequisites if we want to work with NIB on an AIX LPAR and automate the switching between interfaces in NIB.

First of all, AIX will switch only if your interface is online. If your interface is offline, it doesn't work anyway and AIX will not do anything with it. Usually this is not a problem, but if someone tries to bring the interface online during the VIOS update, you might get one.

Second, you must set the auto_recovery attribute of your NIB device to no. You want to control your NIB manually. OK, not really manually, but using Ansible. If you set it to yes (the default value), AIX will try to control the device itself. As soon as AIX sees that the primary VIO is up and running again, it switches the device back to it, even if you still want to do something more on the VIO side. That's why you set auto_recovery on the NIB device to no.

[Figure: Check auto_recovery attribute on NIB device]
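The original screenshot is lost, so here is a minimal Python sketch of the same check. It assumes the usual column layout of `lsattr -El <device>` output (attribute, value, description, user-settable flag). The NIB device name ent2 and the sample text are hypothetical, not taken from a real system.

```python
def parse_lsattr(text):
    """Map attribute name -> value from `lsattr -El <device>` style output.

    Assumption: every line starts with the attribute name followed by its
    value. Attributes with an empty value would be misread by this sketch.
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            attrs[fields[0]] = fields[1]
    return attrs


# Hypothetical sample; on a real LPAR it would come from `lsattr -El ent2`.
SAMPLE_LSATTR = """\
adapter_names   ent1  EtherChannel Adapters                     True
auto_recovery   no    Enable automatic recovery after failover  True
backup_adapter  ent0  Adapter used when whole channel fails     True
"""

nib_attrs = parse_lsattr(SAMPLE_LSATTR)
if nib_attrs.get("auto_recovery") != "no":
    print("WARNING: auto_recovery is not set to no")
else:
    print("auto_recovery is set to no")
```

The same dictionary also gives the primary (adapter_names) and backup (backup_adapter) devices, which we need later.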

After you have set up everything, you can check which Ethernet adapter NIB is running through by using the entstat command:

[Figure: Search for active channel using entstat]

As you can see my NIB is working now through the backup channel. Which one is my backup channel? Use lsattr!

[Figure: Show primary and backup adapters for NIB]

My backup adapter is ent0. The last piece of information is the name of the active virtual switch. Again we can find it using entstat and a little bit of awk:

[Figure: Searching for virtual switch names and corresponding Ethernet devices]
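The awk script itself was only shown in the screenshot. As a rough equivalent, here is a small Python sketch that pulls the active channel and the virtual switch of each adapter out of `entstat -d` style text. The line formats it matches ("ETHERNET STATISTICS (entX)", "Active channel:", "Switch ID:") and the sample text are my assumptions about the output, so check them against your own systems.

```python
import re


def parse_entstat(text):
    """Return (active_channel, {adapter: switch}) from entstat-like text."""
    active = None
    switches = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"\s*ETHERNET STATISTICS \((ent\d+)\)", line)
        if header:
            current = header.group(1)  # remember which adapter section we are in
            continue
        channel = re.match(r"\s*Active channel:\s*(\w+)", line)
        if channel:
            active = channel.group(1)  # "primary" or "backup"
            continue
        switch = re.match(r"\s*Switch ID:\s*(\S+)", line)
        if switch and current:
            switches[current] = switch.group(1)
    return active, switches


# Hypothetical, heavily shortened sample of `entstat -d ent2` output.
SAMPLE_ENTSTAT = """\
ETHERNET STATISTICS (ent2) :
Device Type: EtherChannel
Active channel: backup adapter
ETHERNET STATISTICS (ent1) :
Switch ID: vswitch1
ETHERNET STATISTICS (ent0) :
Switch ID: vswitch2
"""

active_channel, switch_by_adapter = parse_entstat(SAMPLE_ENTSTAT)
print(active_channel, switch_by_adapter)
```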

If we want to switch from our backup adapter ent0 (vswitch2) to the primary adapter ent1 which is working through vswitch1, we use the command ethchan_config:

[Figure: Switching channels in AIX NIB]

We are ready to automate it!

Earlier, our Ansible playbook worked only with the VIO servers we have to update. Now we want to work with VIO client LPARs too.

We can manually write all LPAR names into our inventory and mark which of them are VIO, AIX, IBM i, or Linux LPARs. I hope you remember how important a good naming convention is.

In this case I suggest using the ibm.power_hmc collection from IBM. It provides an Ansible inventory plugin to automate IBM PowerVM inventories. I use it to get all my AIX boxes.

To use it, we have to install the collection and create a file whose name ends with '.power_hmc.yml'. Because I'm too lazy to write huge file names, I named it exactly that, without any prefix.

[Figure: IBM PowerVM inventory sample configuration]

Line 1 says that this is a configuration file for the ibm.power_hmc.powervm_inventory plugin.

Line 2 makes some checks less strict. I don't know why I need it, but without this line my complex "ifs" in lines 13 and 14 didn't work.

Lines 3 to 6 define the HMC to use and the credentials for it.

Lines 7 and 8 define which IBM Power managed system I'd like to use. You can use other filters or drop the lines, and you get all of the managed systems available to the HMC.

Lines 9 and 10 define that I want to work with running LPARs only. In this case I don't care about LPARs which are not running. I can't change anything there.

The next block "compose" (lines 11-14) defines some important variables. There is some magic in the definition because we have different Python interpreters and different connection users for different operating systems. Just a small side note: unlike me, you don't use root for Ansible connections, do you?

The last lines 15 to 19 define groups according to the operating systems of the LPARs.

You can always check your inventory configuration using ansible-inventory command.

[Figure: Checking inventory with ansible-inventory]

Let's start with the playbook development and write our short prologue:

[Figure: Playbook general information]

In this case I use only one collection - community.general. We already used it when we automated SEA failover. We will use it in this playbook in a similar way and for similar reasons.

I defined one additional variable - changeto. The variable contains the name of the virtual switch we should switch our NIBs to. If you have a good naming convention, you can write some tasks to figure out the name. I hard-code it in the playbook for simplicity. We will have enough complexity in the playbook.

As you can see (line 2) I don't collect information from all hosts, because I don't need it. I collect only information I need and do it in the following lines:

[Figure: Collecting information from AIX hosts]

I could achieve the same by changing line 1 to "hosts: AIX" (execute the playbook on the AIX group only) and line 2 to "gather_facts: yes". This short playbook is only about NIB failover. If your playbook contains more steps, for example a VIO update or Linux specifics, your line 1 will be the same as mine and include all the hosts from the inventory.

[Figure: Getting information about Etherchannel devices on AIX]

The next step is to extract information about Etherchannel devices. We can do it only if we collected the information (line 23) and the host is an AIX host (lines 21 and 22).

Just as in the SEA failover playbook, we search for devices with the type "EtherChannel" and get some information about them. Please note - the filter will find all Etherchannel and 802.3ad LACP devices and doesn't distinguish between them. If you have VIO client LPARs with real LACP - let's say you have virtual ports for the management network and additionally two physical ports in a LACP for your backup network - the filter will find all of them.

Using the filter we create a new variable called ethchan, which will contain Etherchannel information.
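To make the caveat concrete, here is a small Python sketch of such a filter. The shape of the device facts and the type strings are assumptions for illustration; the real playbook works on the facts Ansible collected from the AIX hosts.

```python
def find_etherchannels(devices):
    """Keep only devices whose type mentions EtherChannel.

    Note: NIB and real 802.3ad LACP devices share the same type string
    in this sample, so both pass the filter.
    """
    return {
        name: info
        for name, info in devices.items()
        if "EtherChannel" in info.get("type", "")
    }


# Hypothetical device facts for one AIX LPAR.
SAMPLE_DEVICES = {
    "ent0": {"type": "Virtual I/O Ethernet Adapter (l-lan)"},
    "ent1": {"type": "Virtual I/O Ethernet Adapter (l-lan)"},
    "ent2": {"type": "EtherChannel / IEEE 802.3ad Link Aggregation"},
}

ethchan_devices = find_etherchannels(SAMPLE_DEVICES)
print(sorted(ethchan_devices))
```

If an LPAR has both a NIB device and a real LACP device, both land in the result, so a production playbook would need an extra check before switching anything.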

[Figure: Getting additional etherchannel information from AIX]

Unfortunately, there is some information about the Etherchannel that Ansible doesn't collect. To get it, we use the small awk script I used before. You know it already - there is a shell script under the hood of each Ansible playbook.

We loop through the information we've collected from our AIX hosts (line 28), and only if we have an Etherchannel device (line 32) do we execute the script (line 26) and save the output in the variable ec_output (line 27).

[Figure: Parsing output of the script]

In the next step we parse the output of the script and save the information in the variable ec_info (line 35), but only if the script produced some output (line 39).

[Figure: Adding the parsed information to ethchan variable]

After we have parsed the information from entstat, we add it to the variable ethchan, which we defined earlier.

[Figure: Debug output of the collected information]

I just print out the information about Etherchannels to make debugging easier. These lines have no place in production-ready playbooks. But we just play, don't we?

[Figure: Warning about auto_recovery]

I wouldn't use the warning message in a production-ready playbook, mostly because nobody looks at the output. Either the playbook ran or it did not. If it didn't run, you search for failures. But if it ran, why should you care about warnings?

[Figure: Getting the name of the active virtual switch]

To decide if we must switch our NIB device to another virtual switch, we first should find the name of the active virtual switch.

[Figure: Our ethchan structure (the real reason I did the debug output in the playbook)]

If we look at our ethchan structure, we see that the field "active" tells us whether we work through the primary or through the backup channel. Accordingly, the fields "primary" and "backup" contain the names of the corresponding ethernet devices. The fields named after the ethernet devices (in this case - "ent0" and "ent1") contain the names of the corresponding virtual switches. The whole picture results in line 69 - get the value of the ethchan.active field, get the value of the field named like the value of the ethchan.active field, and then get the value of the field of the field of the field... Sorry, I can't count further. It works, and it saves the name of the virtual switch in the variable active_switch.
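If the field-of-the-field chain makes your head spin, here it is spelled out step by step in Python. The dictionary is a hypothetical ethchan structure with the field layout described above; the values match my example with ent0 as backup adapter on vswitch2.

```python
# Hypothetical ethchan structure, following the field layout described in the text.
ethchan = {
    "active": "backup",   # which channel is currently in use
    "primary": "ent1",    # primary ethernet device
    "backup": "ent0",     # backup ethernet device
    "ent1": "vswitch1",   # virtual switch behind the primary device
    "ent0": "vswitch2",   # virtual switch behind the backup device
}

channel = ethchan["active"]      # step 1: "primary" or "backup"
device = ethchan[channel]        # step 2: the ethernet device of that channel
active_switch = ethchan[device]  # step 3: the virtual switch of that device

# The same lookup as one nested expression:
assert active_switch == ethchan[ethchan[ethchan["active"]]]
print(active_switch)
```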

[Figure: Switching NIB device]

The last and most important step is to switch the NIB device. We execute the ethchan_config command (line 77), but only if our current virtual switch is not the target virtual switch defined in the variable "changeto" (line 83).
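The decision logic of this last step fits in a few lines of Python. This is only a sketch: the command it builds, /usr/lib/methods/ethchan_config with the -f flag to force a failover, is my assumption about the call, because the exact invocation was only visible in the screenshot. The device name ent2 is hypothetical too.

```python
def failover_command(active_switch, changeto, nib_device):
    """Return the command to run, or None if no switch is needed."""
    if active_switch == changeto:
        return None  # already on the target virtual switch, nothing to do
    # Assumed invocation: force a failover of the EtherChannel/NIB device.
    return ["/usr/lib/methods/ethchan_config", "-f", nib_device]


# NIB currently runs through vswitch2, we want it on vswitch1.
cmd = failover_command("vswitch2", "vswitch1", "ent2")
print(cmd)

# NIB is already on the target switch: nothing happens.
print(failover_command("vswitch1", "vswitch1", "ent2"))
```

Returning None instead of a command keeps the step idempotent: running the playbook a second time changes nothing.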


Now you know the reason I like simple configurations. This was very complex even for me. I hope you can use it as a model for your own Ansible playbooks.


Have fun with Ansible!

Andrey

Igor Novotny

Dominus pascit me, et nihil mihi deerit.

1 year ago

Hi Andrey, is there any way to aggregate or load-balance (802.3ad) an LPAR's traffic through dual virtual switches? We have used active/backup NIB for years, but I'm looking for something more advanced now...
