Configure an HA Cluster on top of AWS using Terraform and Ansible
Kritik Sachdeva
Technical Support Professional at IBM | RHCA-XII | Openshift | Ceph | Satellite | 3Scale | Gluster | Ansible | Red Hatter
In this post, I will cover only the setting up of the HA cluster, not the resource creation part. So let's start with the definition. HA stands for High Availability; an HA cluster is an infrastructure-level tool designed to keep a service highly available, such as NFS, Apache HTTPd, an OpenShift master node, or a Kubernetes master node.
Such clusters are usually deployed on physical infrastructure rather than on AWS, but for this demonstration I am using AWS to create the resources.
High Availability is the concept; the actual technologies behind it are Pacemaker and Corosync. Pacemaker is the cluster resource manager responsible for managing the resources we want to keep highly available. Corosync is the program that maintains cluster membership and provides messaging between the nodes.
Let's take an example of how high availability works with Pacemaker and Corosync:
Suppose an application, say the Apache web server, is running on node-1 and is managed as a resource by Pacemaker. If node-1 fails or gets corrupted for any reason, Corosync sends a message that the node has failed, and the resources located on that failed node are moved over to a different node.
But there is one prerequisite for this to happen smoothly: for whatever service we want to make highly available, its software, firewall rules, and other related elements must be set up on all the cluster nodes, and the configuration must be kept synchronized. For the Apache web server example above, the following elements should be set up on all the nodes (a rough sketch of these steps as Ansible tasks follows the list):
- The HTTPd software package must be installed
- The firewall must allow connections to HTTPd
- All the nodes must be able to access the web content through shared centralized or distributed storage
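Since the cluster in this post is configured with Ansible, here is an illustrative sketch of those prerequisites as Ansible tasks. The package name, firewall service, and NFS export are assumptions for this example only, not part of the actual setup below:

# Illustrative only: prerequisites for a highly available Apache web server
- name: Install the Apache web server package
  package:
    name: httpd
    state: present

- name: Allow HTTP traffic through the firewall
  firewalld:
    service: http
    permanent: true
    immediate: true
    state: enabled

- name: Mount the shared storage that holds the web content
  mount:
    path: /var/www/html
    src: storage.example.com:/web-content   # assumed shared NFS export
    fstype: nfs
    state: mounted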
Setup on AWS:
For the demonstration, the setup consists of a 4-node HA cluster installed on EC2 instances spread across 2 different subnets to ensure high availability of the cluster nodes. For configuring the cluster itself, I have used Ansible.
Note: Resource availability on AWS can differ between regions. For example, the Mumbai region has 3 availability zones, but some EC2 instance types cannot be launched in the third AZ.
Terraform steps to create the setup on AWS:
1. Create a public-private key pair using Terraform
// This profile is configured for the Mumbai region
provider "aws" {
  region  = "ap-south-1"
  profile = "kritik"
}

// Generating a private key
resource "tls_private_key" "tls_key" {
  algorithm = "RSA"
  rsa_bits  = 2048
}

// Creating a private key file from the generated key in the current location
resource "local_file" "key_file" {
  content         = tls_private_key.tls_key.private_key_pem
  filename        = "ha-cluster.pem"
  file_permission = "0400"

  depends_on = [ tls_private_key.tls_key ]
}

// Creating the AWS key pair using the generated key
resource "aws_key_pair" "ha_key" {
  key_name   = "ha-key"
  public_key = tls_private_key.tls_key.public_key_openssh

  depends_on = [ tls_private_key.tls_key ]
}
2. Create a VPC and 2 subnets, as described in the setup above
// Create a VPC
resource "aws_vpc" "hacluster_vpc" {
  cidr_block           = "192.168.0.0/16"
  instance_tenancy     = "default"
  enable_dns_hostnames = true

  tags = {
    "Name" = "ha"
  }
}

// Create a subnet inside the VPC for each availability zone
resource "aws_subnet" "hapublic-1" {
  depends_on = [ aws_vpc.hacluster_vpc ]

  vpc_id                  = aws_vpc.hacluster_vpc.id
  cidr_block              = "192.168.0.0/24"
  availability_zone       = "ap-south-1a"   // The zone in which this subnet will be created
  map_public_ip_on_launch = true            // Assign a public IP to instances launched into this subnet

  tags = {
    "Name" = "public_Subnet-1"
  }
}

resource "aws_subnet" "hapublic-2" {
  depends_on = [ aws_vpc.hacluster_vpc ]

  vpc_id                  = aws_vpc.hacluster_vpc.id
  cidr_block              = "192.168.1.0/24"
  availability_zone       = "ap-south-1b"   // The zone in which this subnet will be created
  map_public_ip_on_launch = true            // Assign a public IP to instances launched into this subnet

  tags = {
    "Name" = "public_Subnet-2"
  }
}
3. Create an IGW so that the instances are reachable from the public world, and add its entry in the route table
// Now create an IGW and attach it to the VPC so that all the subnets are reachable from the public world
resource "aws_internet_gateway" "gw" {
  // This should only run after the VPC and the subnets are created
  depends_on = [ aws_vpc.hacluster_vpc, aws_subnet.hapublic-1, aws_subnet.hapublic-2 ]  //, aws_subnet.hapublic-3

  vpc_id = aws_vpc.hacluster_vpc.id

  tags = {
    "Name" = "ha_gw"
  }
}
Note:
- Each route must contain either a gateway_id, an instance_id, a nat_gateway_id, a vpc_peering_connection_id, or a network_interface_id.
- The default route, mapping the VPC's CIDR block to "local", is created implicitly and cannot be specified.
resource "aws_route_table" "haroute" { depends_on = [ aws_internet_gateway.gw ] vpc_id = aws_vpc.hacluster_vpc.id route { # This rule is for going or connecting to the public world cidr_block = "0.0.0.0/0" // Represents the destination or where we wants to gp gateway_id = aws_internet_gateway.gw.id // This is the target from where we can go to the respective destination } tags = { "Name" = "public-rule" } }
4. Associate the route table with the 2 subnets and set it as the main route table
Note: When we create a VPC on AWS using Terraform, AWS also creates a default (main) route table for it, so after creating our own route table the VPC ends up with two route tables. Make sure you set the custom route table as the main route table for the VPC.
resource "aws_route_table_association" "subnetAssociation1" { depends_on = [ aws_route_table.haroute ] subnet_id = aws_subnet.hapublic-1.id route_table_id = aws_route_table.haroute.id } resource "aws_route_table_association" "subnetAssociation2" { depends_on = [ aws_route_table.haroute ] subnet_id = aws_subnet.hapublic-2.id route_table_id = aws_route_table.haroute.id }
Change the main route table for the current VPC:
# Main route table association
resource "aws_main_route_table_association" "a" {
  vpc_id         = aws_vpc.hacluster_vpc.id
  route_table_id = aws_route_table.haroute.id
}
5. Create security group rules to open the firewall ports needed for HA cluster intercommunication
(ports: 22/TCP, 2224/TCP, 3121/TCP, 5403/TCP, 5404/UDP, 5405/UDP, 21064/TCP, 9929/TCP, 9929/UDP)
// Add the security group rules
resource "aws_security_group" "allowed_rules" {
  depends_on = [ aws_vpc.hacluster_vpc ]

  name        = "hacluster"
  description = "Security Group rules for the HAcluster"
  vpc_id      = aws_vpc.hacluster_vpc.id

  # Ingress rules for the HA cluster
  # cidr_blocks defines where the client can enter from, i.e. the client origin
  ingress {
    cidr_blocks = [ "0.0.0.0/0" ]
    description = "Allowing ssh connectivity"
    from_port   = 22
    protocol    = "tcp"
    to_port     = 22
  }

  ingress {
    cidr_blocks = [ "0.0.0.0/0" ]
    description = "Allowing pcsd connectivity"
    from_port   = 2224
    protocol    = "tcp"
    to_port     = 2224
  }

  ingress {
    cidr_blocks = [ "0.0.0.0/0" ]
    description = "Allowing crmd connectivity"
    from_port   = 3121
    protocol    = "tcp"
    to_port     = 3121
  }

  ingress {
    cidr_blocks = [ "0.0.0.0/0" ]
    description = "Allowing corosync-qnetd connectivity"
    from_port   = 5403
    protocol    = "tcp"
    to_port     = 5403
  }

  ingress {
    cidr_blocks = [ "0.0.0.0/0" ]
    description = "Allowing corosync multicast-udp connectivity"
    from_port   = 5404
    protocol    = "udp"
    to_port     = 5404
  }

  ingress {
    cidr_blocks = [ "0.0.0.0/0" ]
    description = "Allowing corosync connectivity"
    from_port   = 5405
    protocol    = "udp"
    to_port     = 5405
  }

  ingress {
    cidr_blocks = [ "0.0.0.0/0" ]
    description = "Allowing CLVM connectivity"
    from_port   = 21064
    protocol    = "tcp"
    to_port     = 21064
  }

  ingress {
    cidr_blocks = [ "0.0.0.0/0" ]
    description = "Allowing booth-ticket manager connectivity"
    from_port   = 9929
    protocol    = "tcp"
    to_port     = 9929
  }

  ingress {
    cidr_blocks = [ "0.0.0.0/0" ]
    description = "Allowing booth-ticket manager connectivity"
    from_port   = 9929
    protocol    = "udp"
    to_port     = 9929
  }

  egress {
    cidr_blocks = [ "0.0.0.0/0" ]
    from_port   = 0
    protocol    = -1
    to_port     = 0
  }

  tags = {
    "Name" = "HA-firewall-rules"
  }
}
6. Create EC2 instances in the two subnets, with 2 instances in each
# Now launch the 4 instances in the 2 public subnets
resource "aws_instance" "ha-nodes-1" {
  count = var.instances_per_subnet

  ami                    = var.aws_ami_id
  instance_type          = var.instance_type
  key_name               = "ha-key"
  subnet_id              = aws_subnet.hapublic-1.id
  vpc_security_group_ids = [ aws_security_group.allowed_rules.id ]
}

resource "aws_instance" "ha-nodes-2" {
  count = var.instances_per_subnet

  ami                    = var.aws_ami_id
  instance_type          = var.instance_type
  key_name               = "ha-key"
  subnet_id              = aws_subnet.hapublic-2.id
  vpc_security_group_ids = [ aws_security_group.allowed_rules.id ]
}
(If your region supports EBS Multi-Attach, you can add this snippet.)
# Create an EBS volume
resource "aws_ebs_volume" "terra-vol-1" {
  depends_on = [ aws_instance.ha-nodes-1 ]

  availability_zone    = aws_instance.ha-nodes-1[0].availability_zone
  size                 = var.ebs_size
  multi_attach_enabled = true

  tags = {
    Name = "ebs-vol"
  }
}

# Now attach this volume to the EC2 instances
resource "aws_volume_attachment" "ebs_att-1" {
  count = var.instances_per_subnet

  device_name  = var.ebs_device_name
  volume_id    = aws_ebs_volume.terra-vol-1.id
  instance_id  = element(aws_instance.ha-nodes-1.*.id, count.index)
  force_detach = true

  depends_on = [ aws_instance.ha-nodes-1, aws_ebs_volume.terra-vol-1 ]
}

# Create an EBS volume
resource "aws_ebs_volume" "terra-vol-2" {
  depends_on = [ aws_instance.ha-nodes-2 ]

  availability_zone = aws_instance.ha-nodes-2[0].availability_zone
  size              = var.ebs_size
  # multi_attach_enabled = true

  tags = {
    Name = "ebs-vol-2"
  }
}

# Now attach this volume to the EC2 instances
resource "aws_volume_attachment" "ebs_att-2" {
  count = var.instances_per_subnet

  device_name  = var.ebs_device_name
  volume_id    = aws_ebs_volume.terra-vol-2.id
  instance_id  = element(aws_instance.ha-nodes-2.*.id, count.index)
  force_detach = true

  depends_on = [ aws_instance.ha-nodes-2, aws_ebs_volume.terra-vol-2 ]
}
7. Now we have to prepare the instances so that ansible-playbook can run successfully, since a Python interpreter must be installed on all the instances.
For this, I have used the remote-exec provisioner under the null_resource resource.
# Prepare the HA cluster nodes to be managed by Ansible
resource "null_resource" "setupRemoteNodes-1" {
  count = var.instances_per_subnet

  depends_on = [ aws_instance.ha-nodes-1 ]

  # Ansible requires that the remote system already has Python installed
  provisioner "remote-exec" {
    inline = ["sudo yum install python3 -y"]
  }

  connection {
    type        = "ssh"
    host        = element(aws_instance.ha-nodes-1.*.public_ip, count.index)
    private_key = file(var.private_key)
    user        = var.ansible_user
  }
}

resource "null_resource" "setupRemoteNodes-2" {
  count = var.instances_per_subnet

  depends_on = [ aws_instance.ha-nodes-2 ]

  # Ansible requires that the remote system already has Python installed
  provisioner "remote-exec" {
    inline = ["sudo yum install python3 -y"]
  }

  connection {
    type        = "ssh"
    host        = element(aws_instance.ha-nodes-2.*.public_ip, count.index)
    private_key = file(var.private_key)
    user        = var.ansible_user
  }
}
8. Finally, for ansible-playbook to run, we need an inventory storing the login details and addresses of the instances. For this, I have used Linux commands executed locally:
resource "null_resource" "setupAnsible" { depends_on = [ aws_instance.ha-nodes-2 ] provisioner "local-exec" { command = <<EOT sleep 20; >./playbooks/inventory.ini; echo "[hanodes_public]" | tee -a ../playbooks/inventory.ini; echo "${aws_instance.ha-nodes-1[0].public_dns} private_ip=${aws_instance.ha-nodes-1[0].private_dns} ansible_user=${var.ansible_user} ansible_ssh_private_key_file=${var.private_key}" | tee -a ./playbooks/inventory.ini; echo "${aws_instance.ha-nodes-1[1].public_dns} private_ip=${aws_instance.ha-nodes-1[1].private_dns} ansible_user=${var.ansible_user} ansible_ssh_private_key_file=${var.private_key}" | tee -a ./playbooks/inventory.ini; echo "${aws_instance.ha-nodes-2[0].public_dns} private_ip=${aws_instance.ha-nodes-2[0].private_dns} ansible_user=${var.ansible_user} ansible_ssh_private_key_file=${var.private_key}" | tee -a ./playbooks/inventory.ini; echo "${aws_instance.ha-nodes-2[1].public_dns} private_ip=${aws_instance.ha-nodes-2[1].private_dns} ansible_user=${var.ansible_user} ansible_ssh_private_key_file=${var.private_key}" | tee -a ./playbooks/inventory.ini; export ANSIBLE_HOST_KEY_CHECKING=False; cd ./playbooks; ansible-playbook -i inventory.ini ha-cluster.yaml --vault-password-file .passwd; EOT } }
The generated inventory file looks like this:
[hanodes_public]
ec2-65-0-80-53.ap-south-1.compute.amazonaws.com private_ip=ip-192-168-0-206.ap-south-1.compute.internal ansible_user=ec2-user ansible_ssh_private_key_file=../terraform/ha-cluster.pem
ec2-13-127-111-30.ap-south-1.compute.amazonaws.com private_ip=ip-192-168-0-159.ap-south-1.compute.internal ansible_user=ec2-user ansible_ssh_private_key_file=../terraform/ha-cluster.pem
ec2-13-233-157-253.ap-south-1.compute.amazonaws.com private_ip=ip-192-168-1-152.ap-south-1.compute.internal ansible_user=ec2-user ansible_ssh_private_key_file=../terraform/ha-cluster.pem
ec2-13-126-12-90.ap-south-1.compute.amazonaws.com private_ip=ip-192-168-1-181.ap-south-1.compute.internal ansible_user=ec2-user ansible_ssh_private_key_file=../terraform/ha-cluster.pem
Next come the Ansible files:
To configure the cluster, I have created two files: credentials.yaml and ha-cluster.yaml.
The credentials.yaml file stores the login details required by the HA cluster during setup, and the ha-cluster.yaml playbook sets up, starts, and enables the cluster.
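The credentials file itself was shared as a screenshot in the original post; as a minimal sketch, and assuming a single vaulted variable (the name hacluster_password is my assumption), it could look like this before encrypting it with ansible-vault:

# credentials.yaml -- encrypt it with: ansible-vault encrypt credentials.yaml
# The variable name is an assumption for this sketch
hacluster_password: "ChangeMe-StrongPassword"

The --vault-password-file .passwd option in the local-exec command above supplies the vault password when the playbook runs.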
In setting up the cluster, some tasks need to be configured or executed on all the nodes, while others run on a single node only.
The following tasks need to run on all the nodes:
# In this step, we install the software packages, enable the High Availability repository, start the cluster service, and set the password for the cluster user
(The HA cluster by default uses the hacluster user, and we only need to set a password for it.)
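These tasks appeared as a screenshot in the original post; the following is a minimal sketch of what they can look like, assumed to sit under a play that targets the hanodes_public group with become: true and loads credentials.yaml via vars_files. The repository id, module choice, and package list are assumptions (on AWS images the High Availability repo may be delivered through RHUI under a different id):

# Sketch: runs on all cluster nodes
- name: Enable the High Availability repository   # repo id is an assumption
  rhsm_repository:
    name: rhel-8-for-x86_64-highavailability-rpms
    state: enabled

- name: Install the HA cluster packages
  package:
    name:
      - pcs
      - pacemaker
      - fence-agents-all
    state: present

- name: Start and enable the pcsd service
  service:
    name: pcsd
    state: started
    enabled: true

- name: Set the password for the hacluster user
  user:
    name: hacluster
    password: "{{ hacluster_password | password_hash('sha512') }}"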
# In this step, we authenticate the cluster nodes with the specified user and password
Note: Here I have manually put an entry for all the hosts in the snippet below; we could also generate the host list with a Jinja template and a small script.
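That snippet was also an image in the original post; a sketch of the authentication task could look like this. The pcs host auth syntax shown is the RHEL 8 form (RHEL 7 uses pcs cluster auth instead), and the hostnames are the private DNS names from the sample inventory above:

# Sketch: authenticate all nodes once, using the vaulted hacluster password
- name: Authenticate the cluster nodes
  command: >
    pcs host auth
    ip-192-168-0-206.ap-south-1.compute.internal
    ip-192-168-0-159.ap-south-1.compute.internal
    ip-192-168-1-152.ap-south-1.compute.internal
    ip-192-168-1-181.ap-south-1.compute.internal
    -u hacluster -p {{ hacluster_password }}
  run_once: true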
- name: Gather cluster status
  command: "pcs cluster status"
  register: cluster_status
  ignore_errors: true
This task is there just to add idempotency to the cluster setup: the cluster is only created if it is not already running.
# In this step, we set up the cluster with the name cluster0, run the setup on only one of the cluster nodes, and print its output using the debug module.
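Since this snippet was also shared as an image, here is a sketch under the same assumptions (RHEL 8 pcs syntax, hostnames from the sample inventory, and the cluster_status variable registered above):

# Sketch: create the cluster only if "pcs cluster status" failed above
- name: Set up the cluster cluster0
  command: >
    pcs cluster setup cluster0
    ip-192-168-0-206.ap-south-1.compute.internal
    ip-192-168-0-159.ap-south-1.compute.internal
    ip-192-168-1-152.ap-south-1.compute.internal
    ip-192-168-1-181.ap-south-1.compute.internal
  when: cluster_status.rc != 0
  run_once: true
  register: setup_output

- name: Start and enable the cluster on all nodes
  command: "pcs cluster {{ item }} --all"
  loop:
    - start
    - enable
  when: cluster_status.rc != 0
  run_once: true

- name: Print the cluster setup output
  debug:
    var: setup_output
  run_once: true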
Finally, to check whether the cluster is set up or not:
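The check was shown as a screenshot as well; a sketch of it (task names are assumptions) is:

# Sketch: report the final cluster state
- name: Gather the final cluster status
  command: pcs status
  register: final_status
  run_once: true

- name: Show the cluster status
  debug:
    var: final_status.stdout_lines
  run_once: true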
The output shows the resulting cluster status, with all four nodes as members of cluster0.
Tomorrow I will publish my pending post on iSCSI integration with OpenShift. If you have any doubts regarding this topic, feel free to drop me a message.
Thank you.