Configure HA Cluster on top of AWS using terraform and ansible
#redhat #aws #terraform #anisble #automation #highavailability #cluster #ec2


In this post, I will cover only the setup of the HA cluster, not the resource creation part. So let's start with the definition. HA stands for High Availability; an HA cluster is an infrastructure-level facility designed to keep a service highly available, such as NFS, Apache HTTPD, an OpenShift master node, or a Kubernetes master node.

This kind of cluster is usually built on physical infrastructure rather than on top of AWS, but here, for demonstration, I am using AWS to create the resources.

High Availability is the concept; the actual technologies behind it are Pacemaker and Corosync. Pacemaker is the cluster resource manager responsible for managing the resources we want to keep highly available. Corosync is the program that maintains cluster membership and provides the messaging layer between the nodes.

Let's take an example to explain how high availability works through Pacemaker and Corosync:


On node-1, an application is running, i.e. an Apache web server, which Pacemaker manages as a resource. If for any reason node-1 fails or gets corrupted, Corosync notifies the cluster that the node has failed, and the resources located on that failed node are moved over to a different node.

But there is one prerequisite for this to happen smoothly: for whatever service we want to make highly available, its software, firewall rules, and other related elements must be set up on all the cluster nodes, and the configuration must be synchronized. For example, in the Apache web server case above, the following elements should be set up on all the nodes:

  1. The httpd software package must be installed
  2. The firewall must allow connections to the httpd service
  3. All the nodes must be able to access the web content through shared centralized or distributed storage

Set up on AWS:

For the demonstration, the setup I am going to use is a 4-node HA cluster installed on EC2 instances spread across 2 different subnets to ensure the high availability of the cluster nodes. For configuring the cluster I have used Ansible.

Note: Resource availability on AWS can differ between regions; for example, in the Mumbai region we have 3 availability zones, but one of the EC2 instance types cannot be launched in the 3rd AZ.

Terraform steps to create the set up on AWS:

  1. Create my own public-private key pair using Terraform
//This profile is configured for the Mumbai Region
provider "aws" {
  region = "ap-south-1"
  profile = "kritik"  
}


// Generating a private key
resource "tls_private_key" "tls_key" {
  algorithm = "RSA"
  rsa_bits = 2048
}


// Creating a private key file in the current directory from the key generated above

resource "local_file" "key_file" {
  content = tls_private_key.tls_key.private_key_pem
  filename = "ha-cluster.pem"
  file_permission = "0400"


  depends_on = [ tls_private_key.tls_key ]
}


// Creating the AWS Key pair using the private key

resource "aws_key_pair" "ha_key" {
  key_name = "ha-key"
  public_key = tls_private_key.tls_key.public_key_openssh


  depends_on = [ tls_private_key.tls_key ]
}

2. Create a VPC and 2 subnets, as described in the setup above

// Create a VPC

resource "aws_vpc" "hacluster_vpc" {
  cidr_block = "192.168.0.0/16"
  instance_tenancy = "default"
  enable_dns_hostnames = true
  tags = {
    "Name" = "ha"
  }
}


// Create a subnet inside the VPC for each availability zone

resource "aws_subnet" "hapublic-1" {
  depends_on = [ aws_vpc.hacluster_vpc ]


  vpc_id = aws_vpc.hacluster_vpc.id
  cidr_block = "192.168.0.0/24"
  availability_zone = "ap-south-1a" // The zone in which this subnet will be created
  map_public_ip_on_launch = true // Assign public ip to the instances launched into this subnet


  tags = {
    "Name" = "public_Subnet-1"
  }
}


resource "aws_subnet" "hapublic-2" {
  depends_on = [ aws_vpc.hacluster_vpc ]

  vpc_id = aws_vpc.hacluster_vpc.id
  cidr_block = "192.168.1.0/24"
  availability_zone = "ap-south-1b" // The zone in which this subnet will be created
  map_public_ip_on_launch = true // Assign public ip to the instances launched into this subnet


  tags = {
    "Name" = "public_Subnet-2"
  }
}

3. Create an IGW so that the public world can reach the instances, and add its entry to the routing table

// Now create an IGW and attach it to the VPC so that the subnets become reachable from the public world


resource "aws_internet_gateway" "gw" {
  depends_on = [ aws_vpc.hacluster_vpc, aws_subnet.hapublic-1, aws_subnet.hapublic-2 ] // This should only run after the VPC and the subnets
  vpc_id = aws_vpc.hacluster_vpc.id
  tags = {
    "Name" = "ha_gw"
  }
}

Note:

#Each route must contain either a gateway_id, an instance_id, a nat_gateway_id, a vpc_peering_connection_id or a network_interface_id.

# Note that the default route, mapping the VPC’s CIDR block to “local”, is created implicitly and cannot be specified

resource "aws_route_table" "haroute" {
  depends_on = [ aws_internet_gateway.gw ]
  vpc_id = aws_vpc.hacluster_vpc.id
  route {
    # This rule is for going or connecting to the public world
    cidr_block = "0.0.0.0/0" // Represents the destination or where we wants to gp
    gateway_id = aws_internet_gateway.gw.id // This is the target from where we can go to the respective destination
  }


  tags = {
   "Name" = "public-rule"
  }
}

4. Associate the routing table with the 2 subnets and set it as the main routing table

Note: When we create a VPC on AWS using Terraform, AWS also creates a default (main) routing table for it, so after adding our own routing table the VPC has two routing tables. Make sure you set the one we created as the main routing table for the VPC.

resource "aws_route_table_association" "subnetAssociation1" {
  depends_on = [ aws_route_table.haroute ]


  subnet_id = aws_subnet.hapublic-1.id
  route_table_id = aws_route_table.haroute.id
}


resource "aws_route_table_association" "subnetAssociation2" {
  depends_on = [ aws_route_table.haroute ]


  subnet_id = aws_subnet.hapublic-2.id
  route_table_id = aws_route_table.haroute.id
}

Change the main routing table for the current vpc:

# Main route table association
resource "aws_main_route_table_association" "a" {
  vpc_id         = aws_vpc.hacluster_vpc.id
  route_table_id = aws_route_table.haroute.id
}

5. Create security group rules to open the firewall ports needed for HA cluster intercommunication

(port number: 22/TCP, 2224/TCP, 3121/TCP, 5403/TCP, 5404/UDP, 5405/UDP, 21064/TCP, 9929/TCP, 9929/UDP )

// Add the security group rules

resource "aws_security_group" "allowed_rules" {
  depends_on = [ aws_vpc.hacluster_vpc ]


  name = "hacluster"
  description = "Security Group rules for the HAcluster"
  vpc_id = aws_vpc.hacluster_vpc.id

  # Ingress rules for HA-Cluster

  ingress {
    cidr_blocks = [ "0.0.0.0/0" ] # Here it mean from where the client can enter i.e client origin
    description = "Allowing ssh connectivity"
    from_port = 22
    protocol = "tcp"
    to_port = 22
  }

  ingress {
    cidr_blocks = [ "0.0.0.0/0" ] # Here it mean from where the client can enter i.e client origin
    description = "Allowing pcsd connectivity"
    from_port = 2224
    protocol = "tcp"
    to_port = 2224
  }

  ingress {
    cidr_blocks = [ "0.0.0.0/0" ] # Here it mean from where the client can enter i.e client origin
    description = "Allowing crmd connectivity"
    from_port = 3121
    protocol = "tcp"
    to_port = 3121
  }

  ingress {
    cidr_blocks = [ "0.0.0.0/0" ] # Here it mean from where the client can enter i.e client origin
    description = "Allowing corosync-qnetd connectivity"
    from_port = 5403
    protocol = "tcp"
    to_port = 5403
  }

  ingress {
    cidr_blocks = [ "0.0.0.0/0" ] # Here it mean from where the client can enter i.e client origin
    description = "Allowing corosync multicast-udp connectivity"
    from_port = 5404
    protocol = "udp"
    to_port = 5404
  }

  ingress {
    cidr_blocks = [ "0.0.0.0/0" ] # Here it mean from where the client can enter i.e client origin
    description = "Allowing corosync connectivity"
    from_port = 5405
    protocol = "udp"
    to_port = 5405
  }

  ingress {
    cidr_blocks = [ "0.0.0.0/0" ] # Here it mean from where the client can enter i.e client origin
    description = "Allowing CLVM connectivity"
    from_port = 21064
    protocol = "tcp"
    to_port = 21064
  }

  ingress {
    cidr_blocks = [ "0.0.0.0/0" ] # Here it mean from where the client can enter i.e client origin
    description = "Allowing booth-ticket manager connectivity"
    from_port = 9929
    protocol = "tcp"
    to_port = 9929
  }

  ingress {
    cidr_blocks = [ "0.0.0.0/0" ] # Here it mean from where the client can enter i.e client origin
    description = "Allowing booth-ticket manager connectivity"
    from_port = 9929
    protocol = "udp"
    to_port = 9929
  }

  egress {
    cidr_blocks = [ "0.0.0.0/0" ]
    from_port = 0
    protocol = "-1"
    to_port = 0
  }


  tags = {
    "Name" = "HA-firewall-rules"
  }
}

6. Create EC2 instances in the two different subnets, each containing 2 instances

# Now launch the 4 instances in the 2 different public subnets


resource "aws_instance" "ha-nodes-1" {
   
  count = var.instances_per_subnet
  ami = var.aws_ami_id
  instance_type = var.instance_type
  key_name = "ha-key"
  subnet_id = aws_subnet.hapublic-1.id
 
  vpc_security_group_ids = [ aws_security_group.allowed_rules.id ]
}


resource "aws_instance" "ha-nodes-2" {

  count = var.instances_per_subnet
  ami = var.aws_ami_id
  instance_type = var.instance_type
  key_name = "ha-key"
  subnet_id = aws_subnet.hapublic-2.id

  vpc_security_group_ids = [ aws_security_group.allowed_rules.id ]
}

(If your region and volume type support EBS Multi-Attach, which requires io1/io2 Provisioned IOPS volumes, you can enable it as in this snippet.)

# Create an EBS Volume
resource "aws_ebs_volume" "terra-vol-1" {
  depends_on = [ aws_instance.ha-nodes-1 ]
  availability_zone = aws_instance.ha-nodes-1[0].availability_zone
  size              = var.ebs_size
  # Multi-Attach works only with io1/io2 volumes, so the matching type and iops must also be set
  multi_attach_enabled = true
  
  tags = {
    Name = "ebs-vol"
  }
}


# Now attach this volume to the ec2 instances
resource "aws_volume_attachment" "ebs_att-1" {
  count =  var.instances_per_subnet
  device_name  = var.ebs_device_name
  volume_id    = aws_ebs_volume.terra-vol-1.id
  instance_id  = element(aws_instance.ha-nodes-1.*.id, count.index)
  force_detach = true
depends_on = [
  aws_instance.ha-nodes-1,
  aws_ebs_volume.terra-vol-1
  ]
}


# Create an EBS Volume
resource "aws_ebs_volume" "terra-vol-2" {
  depends_on = [ aws_instance.ha-nodes-2 ]
  availability_zone = aws_instance.ha-nodes-2[0].availability_zone
  size              = var.ebs_size
  # multi_attach_enabled = true
  
  tags = {
    Name = "ebs-vol-2"
  }
}


# Now attach this volume to the ec2 instances
resource "aws_volume_attachment" "ebs_att-2" {
  count =  var.instances_per_subnet
  device_name  = var.ebs_device_name
  volume_id    = aws_ebs_volume.terra-vol-2.id
  instance_id  = element(aws_instance.ha-nodes-2.*.id, count.index)
  force_detach = true
depends_on = [
  aws_instance.ha-nodes-2,
  aws_ebs_volume.terra-vol-2
  ]
}
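
The instance and volume resources above reference several input variables (var.instances_per_subnet, var.aws_ami_id, and so on) that are declared separately. The following is a sketch of a matching variables.tf; the defaults shown here are illustrative assumptions (in particular the AMI id and device name), not the author's actual values:

variable "instances_per_subnet" {
  description = "Number of cluster nodes to launch in each subnet"
  type        = number
  default     = 2
}

variable "aws_ami_id" {
  description = "AMI id for the cluster nodes (a RHEL/CentOS image available in ap-south-1)"
  type        = string
}

variable "instance_type" {
  description = "EC2 instance type for the cluster nodes"
  type        = string
  default     = "t2.micro"
}

variable "ebs_size" {
  description = "Size of the shared EBS volume in GiB"
  type        = number
  default     = 1
}

variable "ebs_device_name" {
  description = "Device name used when attaching the EBS volume"
  type        = string
  default     = "/dev/sdh"
}

variable "private_key" {
  description = "Path to the SSH private key file (ha-cluster.pem)"
  type        = string
  default     = "../terraform/ha-cluster.pem"
}

variable "ansible_user" {
  description = "Remote user that Ansible logs in as"
  type        = string
  default     = "ec2-user"
}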

7. Now we have to prepare the instances so that the ansible-playbook can run successfully, since a Python interpreter must be installed on all the instances.

For this, I have used the remote-exec provisioner under the null_resource resource.

# Prepare the ha cluster nodes to be served through ansible


resource "null_resource" "setupRemoteNodes-1" {
  count = var.instances_per_subnet


  depends_on = [ aws_instance.ha-nodes-1 ]
    # Ansible requires that the remote system already has Python installed on it
  provisioner "remote-exec" {
  inline = ["sudo yum install python3 -y"]
  }
  connection {
    type = "ssh"
    host = element(aws_instance.ha-nodes-1.*.public_ip, count.index)
    private_key = file(var.private_key)
    user = var.ansible_user
  }


}


resource "null_resource" "setupRemoteNodes-2" {
  count = var.instances_per_subnet


  depends_on = [ aws_instance.ha-nodes-2 ]
    # Ansible requires that the remote system already has Python installed on it
  provisioner "remote-exec" {
  inline = ["sudo yum install python3 -y"]
  }
  connection {
    type = "ssh"
    host = element(aws_instance.ha-nodes-2.*.public_ip, count.index)
    private_key = file(var.private_key)
    user = var.ansible_user
  }


}

8. Finally, for the ansible-playbook to execute, we need an inventory storing the login details and addresses of the instances. For this, I have run Linux commands locally through a local-exec provisioner:

resource "null_resource" "setupAnsible" {
  depends_on = [ aws_instance.ha-nodes-2 ]
  provisioner "local-exec" {
    command = <<EOT
      sleep 20;

      >./playbooks/inventory.ini;

  echo "[hanodes_public]" | tee -a ../playbooks/inventory.ini;

  echo "${aws_instance.ha-nodes-1[0].public_dns} private_ip=${aws_instance.ha-nodes-1[0].private_dns} ansible_user=${var.ansible_user} ansible_ssh_private_key_file=${var.private_key}" | tee -a ./playbooks/inventory.ini;

  echo "${aws_instance.ha-nodes-1[1].public_dns} private_ip=${aws_instance.ha-nodes-1[1].private_dns} ansible_user=${var.ansible_user} ansible_ssh_private_key_file=${var.private_key}" | tee -a ./playbooks/inventory.ini;

  echo "${aws_instance.ha-nodes-2[0].public_dns} private_ip=${aws_instance.ha-nodes-2[0].private_dns} ansible_user=${var.ansible_user} ansible_ssh_private_key_file=${var.private_key}" | tee -a ./playbooks/inventory.ini;

  echo "${aws_instance.ha-nodes-2[1].public_dns} private_ip=${aws_instance.ha-nodes-2[1].private_dns} ansible_user=${var.ansible_user} ansible_ssh_private_key_file=${var.private_key}" | tee -a ./playbooks/inventory.ini;

        export ANSIBLE_HOST_KEY_CHECKING=False;

         cd ./playbooks;

        ansible-playbook -i inventory.ini ha-cluster.yaml --vault-password-file .passwd;

      EOT
  }
}

The generated inventory file looks like this:

[hanodes_public]
ec2-65-0-80-53.ap-south-1.compute.amazonaws.com private_ip=ip-192-168-0-206.ap-south-1.compute.internal ansible_user=ec2-user ansible_ssh_private_key_file=../terraform/ha-cluster.pem

ec2-13-127-111-30.ap-south-1.compute.amazonaws.com private_ip=ip-192-168-0-159.ap-south-1.compute.internal ansible_user=ec2-user ansible_ssh_private_key_file=../terraform/ha-cluster.pem

ec2-13-233-157-253.ap-south-1.compute.amazonaws.com private_ip=ip-192-168-1-152.ap-south-1.compute.internal ansible_user=ec2-user ansible_ssh_private_key_file=../terraform/ha-cluster.pem

ec2-13-126-12-90.ap-south-1.compute.amazonaws.com private_ip=ip-192-168-1-181.ap-south-1.compute.internal ansible_user=ec2-user ansible_ssh_private_key_file=../terraform/ha-cluster.pem


Next comes the Ansible part:

To configure the cluster, I have created two files: credentials.yaml and ha-cluster.yaml.

In the credentials.yaml file I have stored the login details required by the HA cluster during setup, and the ha-cluster.yaml playbook sets up, starts, and enables the cluster.
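
Since the playbook is invoked with --vault-password-file .passwd, credentials.yaml is meant to be kept encrypted with ansible-vault. Here is a minimal sketch of what it could contain; the variable name cluster_password and its value are illustrative assumptions, not the author's actual file:

---
# credentials.yaml -- encrypt it with: ansible-vault encrypt credentials.yaml
# cluster_password is an assumed variable name; it is the password given to the
# hacluster user and reused when authenticating the nodes.
cluster_password: "ChangeMe_123"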

In setting up the cluster, some tasks need to be configured or executed on all the nodes, while others run on only a single node.

The following tasks are required to run on all the nodes. They install the software packages, enable the High Availability repository, start the cluster service, and set the password for the cluster user.

(The HA cluster by default uses the hacluster user, so we only need to set a password for it.)
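
A minimal sketch of those tasks, standing in for the original screenshot; the repository id, the package names, and the cluster_password variable are assumptions based on a CentOS/RHEL 8 AMI:

---
# ha-cluster.yaml (excerpt) -- tasks that run on every cluster node
- hosts: hanodes_public
  become: true
  vars_files:
    - credentials.yaml
  tasks:
    # The HA packages live in a separate repository on CentOS/RHEL 8;
    # the repo id "HighAvailability" is an assumption for a CentOS 8 AMI.
    - name: Enable the High Availability repository
      command: dnf config-manager --set-enabled HighAvailability

    - name: Install the cluster software packages
      package:
        name:
          - pcs
          - pacemaker
          - fence-agents-all
        state: present

    - name: Start and enable the cluster service
      service:
        name: pcsd
        state: started
        enabled: true

    - name: Set the password for the hacluster user
      user:
        name: hacluster
        password: "{{ cluster_password | password_hash('sha512') }}"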

Next, we authenticate the cluster nodes with the specified user and password.

Note: Here I have manually put an entry for all the hosts; we could also build the host list from the inventory with a Jinja2 template or a small script.
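
A sketch of that authentication step, built from the private_ip host variable written into the inventory instead of hard-coded entries. The pcs host auth syntax is for pcs 0.10 (RHEL/CentOS 8) and is an assumption here; on older releases the equivalent command is pcs cluster auth:

    # Authenticate all nodes with the hacluster user; this only needs to run on
    # the node from which the cluster will be created, hence run_once.
    - name: Authenticate the cluster nodes
      command: >
        pcs host auth
        {{ groups['hanodes_public'] | map('extract', hostvars, 'private_ip') | join(' ') }}
        -u hacluster -p {{ cluster_password }}
      run_once: true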

    - name: Gather cluster status
      command: "pcs cluster status"
      register: cluster_status
      ignore_errors: true

This task is there only to make the cluster setup idempotent: if pcs cluster status already succeeds, a cluster exists and the setup step below is skipped.

Here we set up the cluster with the name cluster0, run the task on only one of the cluster nodes, and print its output using the debug module.
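
A sketch of that step; the when condition reuses the cluster_status result registered above, and the pcs 0.10 syntax plus the use of private_ip are assumptions:

    # Create the cluster only if the earlier status check failed, i.e. no
    # cluster has been configured on these nodes yet.
    - name: Set up the cluster cluster0
      command: >
        pcs cluster setup cluster0
        {{ groups['hanodes_public'] | map('extract', hostvars, 'private_ip') | join(' ') }}
      run_once: true
      when: cluster_status.rc != 0
      register: cluster_setup

    - name: Start and enable the cluster on all nodes
      command: "{{ item }}"
      loop:
        - pcs cluster start --all
        - pcs cluster enable --all
      run_once: true

    - name: Print the cluster setup output
      debug:
        var: cluster_setup.stdout_lines
      run_once: true
      when: cluster_setup is not skipped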

Finally, to check whether the cluster is set up, query the cluster status on any node; the output shows the cluster name, the daemon status, and the nodes that are online.
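
A small sketch of that final check, again using the command and debug modules:

    - name: Gather the final cluster state
      command: pcs status
      register: final_status
      run_once: true

    - name: Show the cluster state
      debug:
        var: final_status.stdout_lines
      run_once: true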

Github link

Tomorrow I will publish my pending post on iSCSI integration with OpenShift. If you have any doubts regarding the topic, feel free to drop me a message.

Thank you.

