Preparing VIOS Update. VNIC.

Andrey Klyachkin

#1 in Automating IBM Power and IBM AIX Infrastructure | POWER DevOps | IT Infrastructure Automation | Speaker | Trainer | Author | IBM Champion | IBM AIX Community Advocate

Published: March 16, 2023

Last time I wrote about switching over a Shared Ethernet Adapter with Ansible. Today it is time for vNIC. A vNIC is the "virtual" layer on top of SR-IOV adapters, and it is what gives you LPM and failover.

If you configured vNICs, you'll find vnicserver devices on your VIOS side:

[Screenshot: lsmap output from VIOS]

But there is nothing you can change there. It is not like the SEA case, where you set an attribute and the failover happens, or you enable largesend, or do something else.

In the vNIC case your first tool is the HMC. You create vNICs from the HMC, you change them from the HMC, and you switch them from the HMC.

If you look at the HMC command line, you will see a cryptic line like the one in the screenshot below.

An AIX LPAR with unknown (greyed out) name and ID 189 has a vNIC adapter in virtual slot number 4, which is backed by two SR-IOV Virtual Functions (VF). Each SR-IOV VF is "Operational". If you have good eyes, you can work out which backing device is primary and which is secondary.

But it is easier to go to the AIX LPAR and issue the 'entstat -d entX' command, where entX is your vNIC as it is defined on the AIX LPAR. The last 6 lines of the output show the active backing device for your vNIC:

If you'd like to update your vio1 and temporarily switch the vNIC to vio2, you have to go to the HMC again and find another parameter: the logical port ID.

The cryptic line we've seen above contains all the information we need. As in the example, we have two backing devices. The first one is "sriov/vio2/2/2/0/27008001/2.0/2.0/60/100.0/100.0".

sriov means it is an SR-IOV adapter.
vio2 means it is connected to the vio2 VIOS LPAR.
first 2 is the vio2 VIOS LPAR ID.
second 2 is the SR-IOV adapter ID.
0 is the SR-IOV adapter's physical port ID.
27008001 is the logical port ID we are looking for.
first 2.0 is the current capacity (2%) of the SR-IOV port we use.
second 2.0 is the desired capacity (2%) of the SR-IOV port.
60 is the failover priority. The backing device with the lower number will be primary.
first 100.0 is the current maximum capacity (100%) of the SR-IOV port we use.
second 100.0 is the desired maximum capacity (100%) of the SR-IOV port.
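
To make the field list above concrete, here is a minimal Python sketch that splits such a backing device string into named fields. The class and function names are my own, chosen for illustration only; the field order is the one described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackingDevice:
    dev_type: str              # "sriov"
    vios_name: str             # VIOS LPAR name, e.g. "vio2"
    vios_id: int               # VIOS LPAR ID
    adapter_id: int            # SR-IOV adapter ID
    phys_port_id: int          # physical port ID on the adapter
    logical_port_id: str       # kept as text, it is passed back to the HMC as is
    capacity: float            # current capacity in percent
    desired_capacity: float    # desired capacity in percent
    failover_priority: int     # lower number means primary
    max_capacity: float        # current maximum capacity in percent
    desired_max_capacity: float


def parse_backing_device(text: str) -> BackingDevice:
    """Split one slash-separated backing device string into its 11 fields."""
    parts = text.split("/")
    if len(parts) != 11:
        raise ValueError(f"expected 11 fields, got {len(parts)}: {text!r}")
    return BackingDevice(
        parts[0], parts[1], int(parts[2]), int(parts[3]), int(parts[4]),
        parts[5], float(parts[6]), float(parts[7]), int(parts[8]),
        float(parts[9]), float(parts[10]),
    )


def primary(devices):
    """Return the backing device with the lowest failover priority number."""
    return min(devices, key=lambda d: d.failover_priority)


dev = parse_backing_device("sriov/vio2/2/2/0/27008001/2.0/2.0/60/100.0/100.0")
print(dev.vios_name, dev.logical_port_id, dev.failover_priority)
# prints: vio2 27008001 60
```

With both backing devices parsed, primary() saves you the "good eyes" exercise of comparing the priorities by hand.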

For us the most important field is the logical port ID, 27008001. We want to free up our vio1 and switch the vNIC over to vio2. That's why we use the logical port ID of vio2's backing device. The following command switches the vNIC in slot 4 on LPAR $p to logical port 27008001.
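
If you want to assemble such a command in a script, a small Python helper could look like the sketch below. It assumes the HMC's chhwres form for activating a vNIC backing device (-r virtualio --rsubtype vnicbkdev -o act with --logport); please verify the flags against the chhwres man page on your HMC before relying on it. The function name and the sample values "sys1" and "aix1" are hypothetical.

```python
import shlex


def build_failover_command(managed_system, lpar, slot, logical_port_id):
    """Build the HMC command line that activates one vNIC backing device.

    ASSUMPTION: the chhwres flags below are taken from memory of the HMC CLI,
    not from the original screenshot. Check them on your HMC.
    """
    args = [
        "chhwres", "-m", managed_system, "-r", "virtualio",
        "--rsubtype", "vnicbkdev", "-o", "act",
        "-p", lpar, "-s", str(slot), "--logport", str(logical_port_id),
    ]
    # shlex.join quotes any argument that needs it (e.g. names with spaces)
    return shlex.join(args)


# "sys1" and "aix1" are placeholder names, not values from the article
print(build_failover_command("sys1", "aix1", 4, "27008001"))
```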

After we have switched the vNIC over to vio2, we can see it on the AIX LPAR using entstat:

If we tried the same command on the HMC again, we'd get the following message:

The logical port is already active and there is nothing to do. But we can still fail back to the original VIOS once we have finished our update. We just need to select the other logical port ID:

Now it's time to automate it! I don't like the idea of looking up port IDs in the output of HMC commands every time. I just want to switch all my vNICs to vio2, do my update and then switch them back.

Because all the actions on vNICs are done on the HMC, we need to add one more Ansible collection to our VIOS update playbook: ibm.power_hmc. The collection has a module called hmc_command, which lets us execute HMC commands directly from an Ansible playbook. Like the standard Ansible modules command and shell, it is not idempotent. It just executes commands on the HMC. It is up to you to decide whether you want to execute them or not.

Our prologue in the playbook gets longer:

We add the new ibm.power_hmc collection to the list of collections (line 7), and we need to define several variables. Of course we could hard-code them in the commands in our tasks later, but because we will need them several times, it is better to define them as variables in the prologue. Because we will issue HMC commands, we must know which HMC we will use (line 12) and which credentials we can use to connect to it (lines 13-14). In line 11 we define the managed system where our VIOS resides.

First we must get all vNICs defined on the managed system.

We execute the hmc_command module from the ibm.power_hmc collection (line 18) on our Ansible controller host (line 25). Of course our Ansible controller host must be able to connect to our HMC. In lines 19 to 22 we define the HMC we want to connect to and the credentials we use to connect to it.

The command itself is in line 23. Because it is an lshwres (list hardware resources) command, we don't expect any failures and don't need any special handling of the command output. We register the whole command output in the variable vnics (line 24).

As the next step we define a variable where we will save the information about the vNICs we got from the HMC.

Why do we do it? We can have a lot of LPARs with vNICs, and we need an array to which we append the information after parsing it from the HMC output. If the array doesn't exist, Ansible will throw an error, because it doesn't know where to append the information.

Now we parse the output of the HMC command lshwres and save it to the variable vnics_data:

We go through each line of the lshwres output (line 32) and parse it (line 31, regex_findall). We append the information we got from the output to the vnics_data array (line 31, vnics_data + ) and save it back to vnics_data.

The regular expression in regex_findall matches the output of lshwres if you have two backing devices. If you have more than two, all the others will be ignored. If you have fewer than two... OK, you have another problem, and maybe you should reconsider your availability concept first.

After we have parsed the output, we get one array for each line of the lshwres output, which means one for each LPAR with vNICs. Each array contains the following data:

VIO client LPAR name with vNIC
Slot number for vNIC
VIO server name for the first backing device
Logical port ID for the first backing device
VIO server name for the second backing device
Logical port ID for the second backing device
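
The same parsing step can be sketched in Python. This is not the regular expression from the playbook. It assumes that lshwres was asked for exactly three fields (LPAR name, slot number, backing devices) and prints them comma-separated, with the backing device list in double quotes. The sample line, the LPAR name "aix1" and the second logical port ID are made up for illustration.

```python
import re

# ASSUMED line shape: <lpar>,<slot>,"<backing dev 1>,<backing dev 2>[,...]"
VNIC_LINE = re.compile(
    r'^(?P<lpar>[^,]+),(?P<slot>\d+),"?'
    r'sriov/(?P<vio_a>[^/]+)/\d+/\d+/\d+/(?P<port_a>[0-9A-Fa-f]+)/[^,"]*,'
    r'sriov/(?P<vio_b>[^/]+)/\d+/\d+/\d+/(?P<port_b>[0-9A-Fa-f]+)/'
)


def parse_vnics(lines):
    """Return one six-element list per vNIC that has at least two backing devices."""
    vnics_data = []  # must exist before we can append to it
    for line in lines:
        m = VNIC_LINE.match(line.strip())
        if m is None:
            # fewer than two backing devices, or an unexpected line: skip it
            continue
        vnics_data.append([
            m["lpar"], m["slot"],
            m["vio_a"], m["port_a"],
            m["vio_b"], m["port_b"],
        ])
    return vnics_data


# Hypothetical sample line; the second logical port ID is invented.
sample = ('aix1,4,"sriov/vio2/2/2/0/27008001/2.0/2.0/60/100.0/100.0,'
          'sriov/vio1/1/1/0/27004001/2.0/2.0/50/100.0/100.0"')
print(parse_vnics([sample]))
# prints: [['aix1', '4', 'vio2', '27008001', 'vio1', '27004001']]
```

As in the playbook, a third or fourth backing device is silently ignored, because the pattern stops after the second logical port ID.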

Now we are ready to fail over our vNICs.

Oops. Why do we need another file? Why can't we pack everything into one file?

Ansible uses YAML to structure the playbooks. (YAML originally stood for "Yet Another Markup Language"; today it officially means "YAML Ain't Markup Language".) YAML is not a programming language, and it is missing many useful constructs that are usual in "normal" programming languages.

Ansible can loop over a simple single task. But vNIC failover consists of more than one task, and here we have to fight the limitations of YAML. Ansible can't loop over several tasks. That's why we move the block of tasks to another file and include it here using include_tasks (line 34).

We also define several additional variables for the failover:

lpar is VIO client LPAR name from lshwres output
slot is vNIC slot from lshwres output
vio is our VIO server, which we want to update

Now let's take a look into vnic-fo.yml.

The first 8 lines select the appropriate logical port from the vNIC data we've collected. The variable vio contains the VIOS we want to update. If our VIOS backs the second device, we need to switch to the first device (lines 1-4). If our VIOS backs the first device, we need to switch to the second device (lines 5-8). This is the whole logic in these 8 lines.

Of course it works only if you have two backing devices. If you have more or fewer, you have some homework to do.

All the other lines, from 9 to 22, are one task that switches the vNIC to the logical port we found in lines 1-8.

We use the hmc_command module from the ibm.power_hmc collection again (line 10) to perform the failover.

Lines 11-14 contain the information needed to connect to the HMC.
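
The selection logic is small enough to write down as a Python function. This is only a sketch of the rule described above, not the tasks from vnic-fo.yml. The function name is my own, and returning None when the VIOS backs neither device is my assumption about what should happen in that case.

```python
def select_target_port(record, vio):
    """Pick the logical port of the backing device NOT served by `vio`.

    `record` is one six-element entry:
    [lpar, slot, first VIOS, first logical port, second VIOS, second logical port]
    """
    lpar, slot, vio_a, port_a, vio_b, port_b = record
    if vio == vio_b:
        return port_a  # our VIOS backs the second device: switch to the first
    if vio == vio_a:
        return port_b  # our VIOS backs the first device: switch to the second
    return None        # ASSUMPTION: this vNIC does not use our VIOS, nothing to do


# Hypothetical record; the LPAR name and second port ID are invented.
record = ["aix1", "4", "vio2", "27008001", "vio1", "27004001"]
print(select_target_port(record, "vio1"))
# prints: 27008001
```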

Line 15 is the command we want to execute on the HMC.

We execute the command on our Ansible controller node (line 16), not on the target VIOS.

We save the result of the command execution in the variable fo_cmd (line 17). We need the result to understand whether the command failed or not.

Usually a command has failed if its return code is not 0 (zero). But that is not so in our case. There are two cases in which the command succeeds:

The command performs failover and the active port is switched
The command doesn't perform failover because the port is already active

In the first case we get return code 0 and everything is OK. In the second case we get return code 1 and the error message HSCLAB3F that we've seen above. We need to treat both cases as legitimate (OK) situations; everything else is a failure.

We define it in lines 18-20. If the HMC command returns 0 (everything is OK), it does not have the field "msg" in the result. If it has the field msg, it means that the return code was not 0.

If the return code was not 0, we check for the message in "msg". If it contains HSCLAB3F, it means that the vNIC is already switched to the correct logical port and we don't have to do anything. If it doesn't have HSCLAB3F, then we have some problem and the command failed.

That's why our failed_when is:

if the field msg is defined in the variable fo_cmd
AND if the field msg does not contain HSCLAB3F.

Lines 21-22 are just cosmetics to make the output of the playbook a little smoother. If the field msg is present in the variable fo_cmd, the vNIC was not switched over and nothing changed in the configuration of the system. Only if the field is absent was the port really changed.
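
Both conditions, failed and changed, can be expressed in a few lines of Python. The sketch models the registered result fo_cmd as a plain dictionary and assumes that msg, when present, is a string. The function name and the sample messages are hypothetical.

```python
def classify(fo_cmd):
    """Apply the failed/changed rules described above to one command result."""
    msg = fo_cmd.get("msg")
    # failed: msg is defined AND it does not contain HSCLAB3F
    failed = msg is not None and "HSCLAB3F" not in msg
    # changed: only when there is no msg at all, i.e. the port was really switched
    changed = msg is None
    return {"failed": failed, "changed": changed}


print(classify({}))
# prints: {'failed': False, 'changed': True}
print(classify({"msg": "HSCLAB3F (sample text)"}))
# prints: {'failed': False, 'changed': False}
print(classify({"msg": "some other error"}))
# prints: {'failed': True, 'changed': False}
```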

Now let's try to execute our big playbook.

In the first half of the output we got the information about vNICs from the HMC and parsed it. We included our failover tasks as many times as we found vNICs.

In the second half we switched our vNICs over to the other VIOS, but they were already there and nothing happened. As you can see, even though the HMC command is executed, the task "Perform switch over to the next VIO" is marked as OK, because:

the configuration is not changed
we defined in lines 21-22 of vnic-fo.yml when the configuration counts as changed.

If we switch our vNICs back, we see that the tasks are marked as "changed".

I think that is enough about vNIC failover, and you can now automate it in your environment.

Stay tuned! We still have some more topics to discuss regarding VIOS updates and automation.

Have fun with PowerVM!

Andrey
