zCX Foundation for OpenShift -- automation using Ansible
Setting up a provisioning workflow

z/OS 2.4 introduced z/OS Container Extensions (zCX), a subsystem that enables running containerised workloads under z/OS. Customers have been using zCX to bring new types of services to z/OS, where they can be incorporated into z/OS management practices and benefit from running under the umbrella of z/OS.

Some customers have been asking for more, though -- they want to have Kubernetes management of these containerised workloads running in z/OS. So, last month (March 2022) IBM released zCX Foundation for Red Hat OpenShift. This new deliverable provides the ability to run the Red Hat OpenShift Container Platform inside z/OS Container Extensions! The offering provides a set of z/OS Management Facility (z/OSMF) workflows that simplify the job of provisioning the zCX instances for an OpenShift cluster.

I had an opportunity to work with the zCX Foundation for OpenShift workflows, and they certainly make a complex task very easy. While I've been using z/OS systems for many years, I'm somewhat new to Workflows in z/OSMF... but that lack of experience didn't hold me back at all! There are a number of different workflow definitions provided, for the different actions required during the lifecycle of zCX instances running OpenShift: provisioning, making changes, and decommissioning.

Provisioning

To provision an OCP cluster on z/OS you will need to run a number of provisioning workflows: one for each node of the OpenShift cluster. The workflows are very straightforward and prompt for all the details required -- best of all, once a workflow has run it saves its output in the form of a "properties file", which you can duplicate and edit to provide the settings for the remaining workflows.

As stated in the documentation, there are some setup tasks required in z/OS before the provisioning workflows can be run. Some setup is needed in z/OS Communications Server (provisioning of IP addresses for the zCX instances), along with some storage management configuration. As you might expect of z/OS product documentation, it is very thorough in explaining what is needed. There are also external dependencies required for any OpenShift installation, such as DNS and a load balancer. I have pre-configured Linux virtual machines running under z/VM that provide these functions for other OpenShift clusters I build, so I cloned one of these virtual machines on my z/VM system to support the cluster I built on z/OS.

Automation using Ansible

My pre-prepared Linux system has a number of scripts and Ansible automation flows that completely automate the deployment of an OpenShift cluster on a z/VM system. After I had built an OpenShift cluster on z/OS manually, I decided to try my automation against z/OS! I needed to make very few changes to the automation to have it successfully automate building an OpenShift cluster on z/OS.

For those unfamiliar with Ansible, one of its strong capabilities is templating. You can deploy a configuration file to a large number of servers using a template; by placing the right keywords in the template, any server-specific values (such as host names or IP addresses) are automatically populated into the copy of the file deployed to each server. To modify my automation scripting for z/OS, I started with my existing template for the creation of the OpenShift install-config.yaml file. This file contains the definition of the OpenShift cluster and is read by the installation program to create a set of Ignition files that define the cluster nodes. I found that the template for install-config.yaml needed no changes for deploying a cluster to z/OS instead of z/VM. So I was off to a good start!
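
To give a sense of what that template contains, here is a minimal sketch of an install-config.yaml Jinja2 template for a "platform: none" (user-provisioned) installation. The variable names here are illustrative placeholders, not necessarily the ones in my inventory:

# install-config.yaml.j2 -- illustrative sketch; variable names are placeholders
apiVersion: v1
baseDomain: {{ cluster_base_domain }}
metadata:
  name: {{ cluster_name }}
compute:
- name: worker
  replicas: 0
controlPlane:
  name: master
  replicas: 3
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '{{ ocp_pull_secret }}'
sshKey: '{{ ocp_ssh_public_key }}'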

Once Ignition and boot files are prepared, the next thing to do is create the node instances and boot them to perform the installation. On z/VM I use the System Management APIs (SMAPI) to define virtual machines, add disk space, and boot. On z/OS this is replaced by the z/OSMF workflows provided by the zCX Foundation for OpenShift. At first I wasn't sure how I would invoke a workflow from Ansible, but I found a module called zmf_workflow which is part of the Red Hat Ansible Certified Content for IBM Z. I used Ansible Galaxy to install the collection containing the required modules to my Linux system:

$ ansible-galaxy collection install ibm.ibm_zosmf
Process install dependency map
Starting collection install process
Installing 'ibm.ibm_zosmf:1.1.0' to '/home/support/.ansible/collections/ansible_collections/ibm/ibm_zosmf'        
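
If you prefer to pin the collection version, or to install several collections in one go, you can list them in a requirements.yml file instead (the version shown is simply the one installed above):

# requirements.yml -- illustrative
collections:
  - name: ibm.ibm_zosmf
    version: "1.1.0"

...and then install with "ansible-galaxy collection install -r requirements.yml".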

The next piece of the puzzle was how to create the workflow. I didn't want to create the workflow in z/OSMF in advance, and then run it via Ansible -- I wanted to create the workflow as part of the Ansible playbook, using settings in the Ansible inventory (the same settings used in other parts of the playbook). To do this, I obtained one of the workflow definition files created from one of my manual workflow runs, and turned it into a template by adding the required Jinja2 syntax. Here's a small snippet:

# Specify a unique instance name
ZCX_INSTNAME : {{ cluster_nodes[coreos_role][item].guest_name }}

# Specify the type of OpenShift node
ZCX_OPENSHIFT_NODE : {{ cluster_nodes[coreos_role][item].node_type }}

# Provide the URL for the Ignition file
ZCX_RHCOS_IGNITION_URL : https://{{ isn_public_ip_address }}:8080/ignition/{{ cluster_nodes[coreos_role][item].ign_profile }}        

The {{ }} brackets indicate variable substitution fields, and are replaced by the value of the given variable at that point in the execution of the playbook. The isn_public_ip_address variable is a simple defined IP address, but the others are defined as part of a dictionary -- a YAML data structure that allows more complex sets of data to be stored. To help illustrate, here is a snippet from the section of the inventory that defines the cluster_nodes dictionary:

guest_pfx: "OCP1"
cluster_nodes:
  bootstrap:
    bootstrap-0:
      guest_name: "{{ guest_pfx }}BOOT"
      ip: "10.40.23.200"
      disk: "dasda"
      node_type: bootstrap
      ign_profile: bootstrap.ign
  control:
    control-0:
      guest_name: "{{ guest_pfx }}CTL0"
      ip: "10.40.23.201"
      disk: "dasda"
      node_type: "control-node"
      ign_profile: master.ign
    control-1:
      . . .
    control-2:
      . . .
  compute:
    compute-0:
      guest_name: "{{ guest_pfx }}CMP0"
      ip: "10.40.23.204"
      disk: "dasda"
      node_type: "compute-node"
      ign_profile: worker.ign
    compute-1:
      . . .

You can think of dictionaries as being a bit like arrays, in that they hold a number of individual data items -- the difference is that the items are looked up by name (key) rather than by numeric position. The cluster_nodes dictionary contains three items, bootstrap, control, and compute, each of which happens to be a dictionary in its own right. The bootstrap dictionary has only one item, while control has three items and compute has two. Each of those items is again a dictionary, containing the specific values for a given node in the OpenShift cluster being built (in the snippet above I've shown only the first example of each type of node).
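
For example, a throwaway debug task (not part of my playbook) could pull a single value out of the nested structure by following the keys down; with the inventory above it would print OCP1CTL0:

# illustration only -- look up one value in the nested dictionary
- name: show the guest name of the first control node
  debug:
    msg: "{{ cluster_nodes['control']['control-0'].guest_name }}"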

Finally, to round out the process, here is the section of the playbook task definition that uses the dictionary defined above:

. . .

- name: boot the bootstrap node
  include_tasks: zcx-workflow-create-instance.yml
  vars:
    coreos_role: bootstrap
  with_items: "{{ cluster_nodes[coreos_role] }}"

- name: boot the control nodes
  include_tasks: zcx-workflow-create-instance.yml
  vars:
    coreos_role: "control"
  with_items: "{{ cluster_nodes[coreos_role] }}"

- name: boot the compute nodes
  include_tasks: zcx-workflow-create-instance.yml
  vars:
    coreos_role: "compute"
  with_items: "{{ cluster_nodes[coreos_role] }}"
  when: cluster_nodes.compute is defined

. . .        

...and a portion of the task file that does the work:

# zcx-workflow-create-instance.yml
#

- name: template the workflow definition file
  template:
    src: workflow_variables.openshift.properties.j2
    dest: "{{ workdir }}/{{ cluster_nodes[coreos_role][item].guest_name }}-user.properties"
    mode: 0644

. . .

- name: Authenticate with z/OSMF server
  ibm.ibm_zosmf.zmf_authenticate:
    zmf_host: "sysa.zosnet.example.com"
    zmf_port: "2443"
    zmf_user: "{{ zmf_user }}"
    zmf_password: "{{ zmf_password }}"
  register: result_auth

- name: start a zCX workflow to create an instance
  ibm.ibm_zosmf.zmf_workflow:
    state: "started"
    zmf_credential: "{{ result_auth }}"
    workflow_owner: "{{ zmf_user }}"
    workflow_name: "ansible_workflow_{{ cluster_nodes[coreos_role][item].guest_name }}_create"
    workflow_file: "/usr/lpp/zcx_ocp/workflows/ocp_provision.xml"
    workflow_file_system: "SYSA"
    workflow_vars_file: "/global/zcx_zos/ocp_properties/{{ cluster_nodes[coreos_role][item].guest_name }}-user.properties"
    workflow_host: "SYSA"

(Watch out for line wrap in these snippets.)

How Ansible processes the playbook

When the playbook runs, it does the tasks that create the cluster Ignition files and other setup work before reaching the task named "boot the bootstrap node". This task instructs Ansible to look for the tasks defined in the file named zcx-workflow-create-instance.yml and execute those tasks with two parameters set:

  1. The vars parameter sets the variable coreos_role to "bootstrap".
  2. The with_items parameter says that the tasks in the included file will be repeated once for each entry in the dictionary cluster_nodes[coreos_role], with the variable item set to the key of that entry. This time, since coreos_role is set to bootstrap, the tasks are run for each entry in the dictionary cluster_nodes["bootstrap"]. Since there is only one entry in that dictionary, the tasks are run once and item is set to the key "bootstrap-0".

When Ansible expands zcx-workflow-create-instance.yml, it first encounters the template module, which tells it to create a copy of the workflow properties template file in a new file called OCP1BOOT-user.properties in a preset working directory. Take a moment to work out how the filename was generated (I'll wait!). If it's not clear, maybe the hint is that in this first iteration, the item variable holds the key bootstrap-0... Got it? Looking in the inventory, the value of guest_name in the bootstrap-0 dictionary is "{{ guest_pfx }}BOOT", which tells Ansible to take the value of guest_pfx -- hard-coded in this example to OCP1 -- and append BOOT. So our zCX instance will be called OCP1BOOT! This kind of variable substitution is done for all of the Jinja2 variable texts in the template file, and the output is stored in a temporary file. The file is then transferred to z/OS (I did not include that task in the snippet), ready for the next tasks.
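
For the curious, the transfer task could look something like the sketch below. It uses the zos_copy module from the ibm.ibm_zos_core collection, which assumes the z/OS system is set up as an Ansible managed node (with Python and Z Open Automation Utilities installed); this is just one way to do it, not necessarily the task in my playbook:

# illustrative sketch only -- one possible way to move the file to z/OS UNIX
- name: copy the workflow properties file to z/OS
  ibm.ibm_zos_core.zos_copy:
    src: "{{ workdir }}/{{ cluster_nodes[coreos_role][item].guest_name }}-user.properties"
    dest: "/global/zcx_zos/ocp_properties/{{ cluster_nodes[coreos_role][item].guest_name }}-user.properties"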

The last two tasks are where the magic happens... we log on to z/OSMF using the zmf_authenticate module. The variables zmf_user and zmf_password were prompted from the user running the playbook at the very beginning of the playbook execution, and are not coded in the inventory. Ansible has a facility called Ansible Vault, however, which can provide a way for credentials to be stored alongside the inventory in an encrypted form. The zmf_authenticate module returns an authentication token which we "register" in an Ansible variable. Next, the zmf_workflow module creates the workflow using the workflow definition file (hard-coded here, but could be parameterised) and the workflow properties file (workflow_vars_file) we created and transferred earlier. The workflow is defined so that the zCX instance is automatically booted when it is provisioned.
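
The prompting itself is just the standard vars_prompt feature at the top of the play; a sketch (not my exact play header) looks like this:

# sketch of the play header -- not the exact playbook
- hosts: localhost
  gather_facts: false
  vars_prompt:
    - name: zmf_user
      prompt: "z/OSMF user ID"
      private: false
    - name: zmf_password
      prompt: "z/OSMF password"
      private: true
  tasks:
    . . .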

With that, the first part of the cluster build is complete -- at this point the zCX workflow will be executing all the required z/OS steps needed to create the zCX instance for the OCP bootstrap node. There is no direct feedback to the Ansible playbook in this process, but you can expect the workflow to run successfully as long as all the z/OS dependencies have been met (including having enough disk space available).

Wash, rinse, repeat...

The Ansible playbook then proceeds to the next task in the main task file, "boot the control nodes". The process is exactly the same as just described for the bootstrap node, except that this time there are three dictionaries under cluster_nodes["control"]. So the with_items parameter will run the included tasks three times: once for control-0, once for control-1, and again for control-2.

Finally, the main playbook reaches "boot the compute nodes". This time there is an extra parameter that allows this part of the playbook to be skipped if there are no compute nodes defined in the inventory (that is, if you want a minimal three-node cluster). This way, you can use the same playbook for deploying any supported configuration of nodes just by changing the inventory appropriately.

Once all the nodes are started, the playbook pauses to ensure the nodes become accessible via SSH. This is indirect feedback of the success of the provisioning process -- if the SSH test times out for any of the nodes it means there was a problem with the provisioning.
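
The check is nothing fancy: a wait_for task against port 22 on each node does the job. Here is a sketch for the control nodes, with timeout values that are purely illustrative:

# illustrative sketch -- delay and timeout values are examples only
- name: wait for the control nodes to answer on SSH
  wait_for:
    host: "{{ cluster_nodes['control'][item].ip }}"
    port: 22
    delay: 30
    timeout: 1800
  with_items: "{{ cluster_nodes['control'] }}"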

Finishing the cluster build

Once the nodes pass the SSH test, the playbook runs the rest of the tasks needed to build the cluster. The command "openshift-install" is used to check the progress of the build, and once the successful build of the cluster is confirmed the playbook runs some final configuration actions (replacing the default ingress certificate, configuring LDAP authentication, and adding user IDs with "cluster-admin" authority). The cluster is now ready for "Day 2" configuration of persistent storage and installation of applications.
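
The progress checks are plain command tasks wrapping openshift-install, roughly like the sketch below (the binary location and directory path are illustrative):

# sketch -- the --dir value must point at the installation assets directory
- name: wait for the bootstrap process to finish
  command: ./openshift-install wait-for bootstrap-complete --dir {{ workdir }}

- name: wait for the cluster installation to complete
  command: ./openshift-install wait-for install-complete --dir {{ workdir }}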

More to do

The surprising thing I found about this exercise is how little I had to change between the z/VM and z/OS versions of the automation. It came down to the one task file (using the z/OSMF workflow module instead of z/VM SMAPI) and a couple of extra variables defined in the inventory. All the rest of the playbook worked the same!

Having said that, there are some areas that I'll continue to develop. I'd like to see if some of the prerequisite tasks on z/OS could be automated as well, using other z/OSMF interfaces or perhaps even other subsystems such as z/OS Connect. I'd also like the process to detect an issue (such as a provisioning failure on just one of the nodes) and work out a way to resolve it without having an operator scratch everything and start again. Finally, I'd like to incorporate building OpenShift clusters on z/OS into my existing user interface for deploying on z/VM.

Your turn!

This was not meant to be a z/OSMF Workflow tutorial, nor an Ansible tutorial, but I hope there is enough material here to show how using Ansible can streamline activities involving z/OSMF workflows and make the build of an OpenShift Container Platform cluster even easier. I look forward to hearing how you use these new z/OS technologies to solve business issues in your organisation!

Links:

IBM zCX Foundation for Red Hat OpenShift 1.1 Resources: US Announcement Letter, Product page, Documentation

Red Hat OpenShift Product page

Red Hat Ansible Certified Content for IBM Z: https://www.ibm.com/support/z-content-solutions/ansible/

