Preparing VIOS Update. VNIC.

Last time I wrote about switching over a Shared Ethernet Adapter with Ansible. Today it is vNIC's turn. vNIC is the "virtual" part of SR-IOV adapters, which enables LPM and failover for you.

If you configured vNICs, you'll find vnicserver devices on your VIOS side:

[Figure: lsmap output from VIOS]

But there is nothing you can change there. It is not like the SEA case, where you set an attribute and the failover happens, or you enable largesend, or do something else.

In the vNIC case your first tool is the HMC. You create vNICs from the HMC, you change them from the HMC and you switch them from the HMC.

If you look at the HMC command line, you will see a cryptic line like the one in the screenshot below.

[Figure: Example of vNIC configuration data]

An AIX LPAR with an unknown (greyed out) name and ID 189 has a vNIC adapter in virtual slot number 4, which is backed by two SR-IOV Virtual Functions (VFs). Each SR-IOV VF is "Operational". If you have good eyes, you can work out which backing device is primary and which is secondary.

But it is easier to go to the AIX LPAR and issue the 'entstat -d entX' command, where entX is your vNIC as it is defined on the AIX LPAR. The last 6 lines of the output will show you the active backing device for your vNIC:

[Figure: entstat output with the information about active backing device]

If you'd like to update your vio1 and temporarily switch the vNIC to vio2, you have to go to the HMC again and find another parameter - the logical port ID.

The cryptic line we've seen above contains all the information we need. As in the example, we have two backing devices. The first one is "sriov/vio2/2/2/0/27008001/2.0/2.0/60/100.0/100.0".

  • sriov means it is an SR-IOV adapter.
  • vio2 means it is connected to the vio2 VIOS LPAR.
  • first 2 is the vio2 VIOS LPAR ID.
  • second 2 is the SR-IOV adapter ID.
  • 0 is the SR-IOV adapter's physical port ID.
  • 27008001 is the logical port ID we are looking for.
  • first 2.0 is the current capacity (2%) of the SR-IOV port we use.
  • second 2.0 is the desired capacity (2%) of the SR-IOV port.
  • 60 is the failover priority. The backing device with the lower number will be primary.
  • first 100.0 is the current maximum capacity (100%) of the SR-IOV port we use.
  • second 100.0 is the desired maximum capacity (100%) of the SR-IOV port.
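The field list above can be sketched as a small parser. This is only an illustration in Python, not part of the playbook: the class and function names are made up here, and the second backing device string in the example is hypothetical (the article shows only the vio2 one).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BackingDevice:
    """One vNIC backing device, field order as described in the list above."""
    dev_type: str
    vios_name: str
    vios_id: int
    adapter_id: int
    phys_port_id: int
    logical_port_id: str   # kept as text, we only pass it back to the HMC
    curr_capacity: float
    desired_capacity: float
    failover_priority: int
    curr_max_capacity: float
    desired_max_capacity: float


def parse_backing_device(text: str) -> BackingDevice:
    """Split an 'sriov/vio2/2/2/0/...' string into named fields."""
    parts = text.strip().split("/")
    if len(parts) != 11:
        raise ValueError(f"expected 11 fields, got {len(parts)}: {text!r}")
    return BackingDevice(
        dev_type=parts[0],
        vios_name=parts[1],
        vios_id=int(parts[2]),
        adapter_id=int(parts[3]),
        phys_port_id=int(parts[4]),
        logical_port_id=parts[5],
        curr_capacity=float(parts[6]),
        desired_capacity=float(parts[7]),
        failover_priority=int(parts[8]),
        curr_max_capacity=float(parts[9]),
        desired_max_capacity=float(parts[10]),
    )


def primary_device(devices: List[BackingDevice]) -> BackingDevice:
    """The backing device with the lowest priority number is primary."""
    return min(devices, key=lambda d: d.failover_priority)


# First string is from the article; the second one is a made-up example.
devices = [
    parse_backing_device("sriov/vio2/2/2/0/27008001/2.0/2.0/60/100.0/100.0"),
    parse_backing_device("sriov/vio1/1/1/0/2700c001/2.0/2.0/50/100.0/100.0"),
]
```

With these two strings, primary_device(devices) picks the vio1 device, because 50 is lower than 60.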

For us the most important field is the logical port ID - 27008001. We want to free up our vio1 and switch the vNIC over to vio2. That's why we use the logical port ID of vio2's backing device. With the following command we switch the vNIC in slot 4 on LPAR $p to the logical port 27008001.

[Figure: chhwres command, manual failover of vNIC]

After we have switched the vNIC over to vio2, we can see it on the AIX LPAR using entstat:

[Figure: entstat output after vNIC failover]

If we tried the same command on the HMC again, we'd get the following message:

[Figure: Repeated failover is impossible]

The logical port is already active and there is nothing to do. But we can still fail back to the original VIOS once we have finished our update. We just need to select the other logical port ID:

[Figure: Returning to the original backing device]

Now it's time to automate it! I don't like the idea of looking for port IDs in the output of HMC commands every time. I just want to switch all my vNICs to vio2, do my update and then switch them back.

Because all vNIC actions are performed on the HMC, we need to introduce one more Ansible collection to our VIOS update playbook - ibm.power_hmc. The collection has a module called hmc_command, which lets us execute HMC commands directly from an Ansible playbook. Like the standard Ansible modules command and shell, it is not idempotent. It just executes commands on the HMC. It is up to you to decide whether to execute them or not.

Our prologue in the playbook gets longer:

[Figure: New collection ibm.power_hmc and new variables]

We add the new ibm.power_hmc collection to the list of collections (line 7) and define several variables. Of course we could hard-code them in the commands in our tasks later, but because we will need them several times, it is better to define them in the prologue as variables. Because we will issue HMC commands, we must know which HMC we will use (line 12) and which credentials we can use to connect to it (lines 13-14). In line 11 we define the managed system where our VIOS resides.

First we must get all vNICs defined on the managed system.

[Figure: Getting vNIC information from HMC]

We execute the hmc_command module from the ibm.power_hmc collection (line 18) on our Ansible controller host (line 25). Of course our Ansible controller host must be able to connect to our HMC. In lines 19 to 22 we define the HMC we want to connect to and the credentials we use to connect to it.

The command itself is in line 23. Because it is an lshwres (list hardware resources) command, we don't expect any failures and don't need any special handling of the command output. We register the whole command output in the variable vnics (line 24).

As the next step we define a variable where we will save the information about vNICs we've got from HMC.

[Figure: Initializing our future vNIC information store]

Why do we do it? We can have a lot of LPARs with vNICs, and we need an array to which we append the information after we have parsed it from the HMC output. If the array is not defined, Ansible will throw an error, because it doesn't know where to append the information.

Now we parse the output of the HMC command lshwres and save it to the variable vnics_data:

[Figure: Parsing lshwres output]

We go through each line of the lshwres output (line 32) and parse it (line 31, regex_findall). We append the information extracted from the output to the vnics_data array (line 31, vnics_data + ) and save it back to vnics_data.

The regular expression in regex_findall matches the output of lshwres if you have two backing devices. If you have more than two, all others will be ignored. If you have fewer than two... OK, you have another problem and maybe you should reconsider your availability concept first.

After parsing, we get one array for each line of the lshwres output, which means one for each vNIC found. We have the following data in the array:

  • VIO client LPAR name with vNIC
  • Slot number for vNIC
  • VIO server name for the first backing device
  • Logical port ID for the first backing device
  • VIO server name for the second backing device
  • Logical port ID for the second backing device
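To make the parsing step more concrete, here is a rough Python sketch of the same idea. It is not the regex_findall expression from the playbook. The input format is an assumption: one line per vNIC with the LPAR name, the slot number and the quoted list of backing devices. The LPAR name and the second backing device in the sample are made up.

```python
import re
from typing import List

# Assumed line layout (hypothetical): lpar_name,slot_num,"dev1,dev2[,...]"
# Captures: LPAR name, slot, then VIOS name and logical port ID of the
# first two backing devices. Any further backing devices are ignored.
VNIC_LINE = re.compile(
    r'^([^,]+),(\d+),"?'
    r'sriov/([^/]+)/\d+/\d+/\d+/([0-9a-fA-F]+)/[^,]*,'
    r'sriov/([^/]+)/\d+/\d+/\d+/([0-9a-fA-F]+)/'
)


def parse_vnic_lines(output: str) -> List[List[str]]:
    """Return one six-element list per line that has two backing devices."""
    vnics_data: List[List[str]] = []      # start with an empty store
    for line in output.splitlines():
        match = VNIC_LINE.match(line.strip())
        if match is None:
            continue                      # fewer than two devices, or noise
        vnics_data = vnics_data + [list(match.groups())]
    return vnics_data


sample_output = (
    'aixlpar1,4,"sriov/vio2/2/2/0/27008001/2.0/2.0/60/100.0/100.0,'
    'sriov/vio1/1/1/0/2700c001/2.0/2.0/50/100.0/100.0"\n'
    'aixlpar2,5,"sriov/vio1/1/1/0/2700c002/2.0/2.0/50/100.0/100.0"\n'
)
vnics_data = parse_vnic_lines(sample_output)
```

The second sample line has only one backing device, so it is skipped and vnics_data ends up with a single entry for aixlpar1.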

Now we are ready to fail over our vNICs.

[Figure: Failover is hidden in another file]

Oops. Why do we need another file? Why can't we pack everything into one file?

Ansible uses YAML (originally "Yet Another Markup Language", now officially "YAML Ain't Markup Language") to structure playbooks. YAML is not a programming language and it lacks many useful constructs that are usual in "normal" programming languages.

Ansible can loop over a single task. But vNIC failover consists of more than one task, and here we run into a limitation: Ansible can't loop over several tasks at once. That's why we move our block of tasks to another file and include it here using include_tasks (line 34).

We also define several additional variables for the failover:

  • lpar is VIO client LPAR name from lshwres output
  • slot is vNIC slot from lshwres output
  • vio is our VIO server, which we want to update

Now let's take a look into vnic-fo.yml.

[Figure: vNIC Failover in Ansible]

The first 8 lines select the appropriate logical port from the vNIC data we've collected. The variable vio contains the VIOS we want to update. If our VIOS backs the second device, we need to switch to the first device (lines 1-4). If our VIOS backs the first device, we need to switch to the second device (lines 5-8). This is the whole logic in these 8 lines.

Of course it works only if you have two backing devices. If you have more or fewer, you have some homework to do.
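The selection logic can be written down in a few lines of Python. This is only a sketch of the decision, not the Ansible tasks themselves. The function name is made up, and the record uses the six-element layout described earlier with a hypothetical LPAR name and second port ID.

```python
from typing import List, Optional


def select_target_port(record: List[str], vio: str) -> Optional[str]:
    """Pick the logical port on the VIOS that is NOT being updated.

    record: [lpar, slot, vios1, port1, vios2, port2]
    vio:    name of the VIOS we want to update
    """
    _lpar, _slot, vios1, port1, vios2, port2 = record
    if vios2 == vio:
        return port1    # our VIOS backs the second device, use the first
    if vios1 == vio:
        return port2    # our VIOS backs the first device, use the second
    return None         # this vNIC is not backed by our VIOS at all


record = ["aixlpar1", "4", "vio2", "27008001", "vio1", "2700c001"]
target = select_target_port(record, "vio1")
```

Here we want to update vio1, so target is 27008001, the logical port on vio2.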

All other lines - from 9 to 22 - are a single task that switches the vNIC to the logical port we found in lines 1-8.

We use the hmc_command module from the ibm.power_hmc collection again (line 10) to perform the failover.

Lines 11-14 are the information how to connect to the HMC.

Line 15 is the command we want to execute on the HMC.

We execute the command on our Ansible controller node (line 16), not on the target VIOS.

We save the result of the command execution in the variable fo_cmd (line 17). We need the result to understand if the command failed or not.

Usually a command is considered failed if its return code is not 0 (zero). But in our case that is not enough. We have two cases in which the command succeeds:

  • The command performs failover and the active port is switched
  • The command doesn't perform failover because the port is already active

In the first case we get return code 0 and everything is OK. In the second case we get return code 1 and the error message HSCLAB3F we've seen above. We need to treat both cases as legitimate (OK) situations and everything else as a failure.

We define it in lines 18-20. If the HMC command returns 0 (everything is OK), the result does not have the field "msg". If it has the field msg, the return code was not 0.

If the return code was not 0, we check the message in "msg". If it contains HSCLAB3F, the vNIC is already switched to the correct logical port and we don't have to do anything. If it doesn't contain HSCLAB3F, then we have a problem and the command failed.

That's why our failed_when is:

  • if the field msg is defined in the variable fo_cmd
  • AND if the field msg does not contain HSCLAB3F.

Lines 21-22 are just cosmetics to make the output of the playbook a little bit smoother. If the variable fo_cmd has the field msg, the vNIC was not switched over and nothing changed in the configuration of the system. Only if the field is missing was the port really changed.
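The failed/changed rules above fit into one small function. The following Python sketch mirrors the conditions only. The function name is made up, the result is modelled as a plain dictionary, and the message texts in the examples are invented (only the HSCLAB3F code comes from the article).

```python
from typing import Dict, Tuple


def classify_failover(fo_cmd: Dict[str, str]) -> Tuple[bool, bool]:
    """Return (failed, changed) for a failover command result.

    failed:  msg is defined AND does not contain HSCLAB3F
    changed: msg is not defined, so the port was really switched
    """
    has_msg = "msg" in fo_cmd
    failed = has_msg and "HSCLAB3F" not in fo_cmd["msg"]
    changed = not has_msg
    return failed, changed


# Hypothetical results, one per situation described above.
switched = classify_failover({})
already_active = classify_failover({"msg": "HSCLAB3F port is already active"})
real_error = classify_failover({"msg": "some other HMC error"})
```

The three examples give (False, True) for a real switch, (False, False) when the port is already active, and (True, False) for any other error.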

Now let's try to execute our big playbook.

[Figure: The first half of the playbook's output]

In the first half of the output we got the information about vNICs from the HMC and parsed it. We included our failover tasks as many times as we found vNICs.

[Figure: The second half of the playbook's output]

In the second half we tried to switch our vNICs over to another VIOS, but they were already there, so nothing happened. As you can see, even though the HMC command is executed, the task "Perform switch over to the next VIO" is marked as OK, because:

  • the configuration did not change
  • lines 21-22 of vnic-fo.yml report a change only when the configuration really changes.

[Figure: Switching vNICs back]

If we switch our vNICs back, we see that the tasks are marked as "changed".

I think that is enough about vNIC failover, and you can now automate it in your environment.

Stay tuned! We still have some more topics to discuss regarding VIOS updates and automation.


Have fun with PowerVM!

Andrey