AWS RDS to GCP AlloyDB (Postgres) Migration using GCP DMS
Jisto Jose
Data Architect | DE | DBE | DB Consultant | DB Migration Expert | 5x Multi-cloud Certificate
Content credits: Raj Kumar
Source and Destination:
Source - AWS RDS PostgreSQL
Destination - GCP AlloyDB PostgreSQL
Source PostgreSQL Version - 13.9. Modify the instance to use the custom parameter group if not already done.
Destination PostgreSQL Version - 14
Let’s validate that the source PostgreSQL instance is up and running.
dvdrental=> select pg_postmaster_start_time();
   pg_postmaster_start_time
-------------------------------
 2023-02-12 00:40:09.588471+00
(1 row)

dvdrental=> select now();
              now
-------------------------------
 2023-02-12 01:07:11.121295+00
(1 row)

dvdrental=>
We have created the source database with about 15 MB of data. The largest table is 'rental' with 16,044 rows. The source database also contains functions, triggers, views, and sequences.
dvdrental=# select count(*) from rental;
 count
-------
 16044
(1 row)

dvdrental=# select count(*) from payment;
 count
-------
 14596
(1 row)

dvdrental=# select * from rental limit 5;
 rental_id |     rental_date     | inventory_id | customer_id |     return_date     | staff_id |     last_update
-----------+---------------------+--------------+-------------+---------------------+----------+---------------------
         2 | 2005-05-24 22:54:33 |         1525 |         459 | 2005-05-28 19:40:33 |        1 | 2006-02-16 02:30:53
         3 | 2005-05-24 23:03:39 |         1711 |         408 | 2005-06-01 22:12:39 |        1 | 2006-02-16 02:30:53
         4 | 2005-05-24 23:04:41 |         2452 |         333 | 2005-06-03 01:43:41 |        2 | 2006-02-16 02:30:53
         5 | 2005-05-24 23:05:21 |         2079 |         222 | 2005-06-02 04:33:21 |        1 | 2006-02-16 02:30:53
         6 | 2005-05-24 23:08:07 |         2792 |         549 | 2005-05-27 01:32:07 |        1 | 2006-02-16 02:30:53
(5 rows)

dvdrental=# select * from payment limit 5;
 payment_id | customer_id | staff_id | rental_id | amount |        payment_date
------------+-------------+----------+-----------+--------+----------------------------
      17503 |         341 |        2 |      1520 |   7.99 | 2007-02-15 22:25:46.996577
      17504 |         341 |        1 |      1778 |   1.99 | 2007-02-16 17:23:14.996577
      17505 |         341 |        1 |      1849 |   7.99 | 2007-02-16 22:41:45.996577
      17506 |         341 |        2 |      2829 |   2.99 | 2007-02-19 19:39:56.996577
      17507 |         341 |        2 |      3130 |   7.99 | 2007-02-20 17:31:48.996577
(5 rows)

dvdrental=#
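The functions, triggers, views, and sequences can also be inventoried before migration with a catalog query like the following (a minimal sketch; the counts will vary with your copy of the dvdrental sample):

-- Count user-defined functions, triggers, views, and sequences in "public"
SELECT 'functions' AS object_type, count(*) AS total
FROM pg_proc p JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'public'
UNION ALL
SELECT 'triggers', count(*) FROM pg_trigger WHERE NOT tgisinternal
UNION ALL
SELECT 'views', count(*) FROM pg_views WHERE schemaname = 'public'
UNION ALL
SELECT 'sequences', count(*) FROM pg_sequences WHERE schemaname = 'public';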
dvdrental=# \d rental
                                             Table "public.rental"
    Column    |            Type             | Collation | Nullable |                  Default
--------------+-----------------------------+-----------+----------+-------------------------------------------
 rental_id    | integer                     |           | not null | nextval('rental_rental_id_seq'::regclass)
 rental_date  | timestamp without time zone |           | not null |
 inventory_id | integer                     |           | not null |
 customer_id  | smallint                    |           | not null |
 return_date  | timestamp without time zone |           |          |
 staff_id     | smallint                    |           | not null |
 last_update  | timestamp without time zone |           | not null | now()
Indexes:
    "rental_pkey" PRIMARY KEY, btree (rental_id)
    "idx_fk_inventory_id" btree (inventory_id)
    "idx_unq_rental_rental_date_inventory_id_customer_id" UNIQUE, btree (rental_date, inventory_id, customer_id)
Foreign-key constraints:
    "rental_customer_id_fkey" FOREIGN KEY (customer_id) REFERENCES customer(customer_id) ON UPDATE CASCADE ON DELETE RESTRICT
    "rental_inventory_id_fkey" FOREIGN KEY (inventory_id) REFERENCES inventory(inventory_id) ON UPDATE CASCADE ON DELETE RESTRICT
    "rental_staff_id_key" FOREIGN KEY (staff_id) REFERENCES staff(staff_id)
Referenced by:
    TABLE "payment" CONSTRAINT "payment_rental_id_fkey" FOREIGN KEY (rental_id) REFERENCES rental(rental_id) ON UPDATE CASCADE ON DELETE SET NULL
Triggers:
    last_updated BEFORE UPDATE ON rental FOR EACH ROW EXECUTE FUNCTION last_updated()

dvdrental=#
Migration Strategy:
Using GCP DMS - Full Load and CDC
Configuring the Source AWS RDS PostgreSQL:
Connect to the instance and set the following parameters, as needed:
Create the pglogical extension at the source:
[root@ip-172-31-13-25 restore_dir]# psql -h cbaserdsinstance.capyyxpjdvn7.ap-south-1.rds.amazonaws.com -p 5432 -U postgres -d dvdrental
Password for user postgres:
psql (13.9)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.
dvdrental=> show shared_preload_libraries;
       shared_preload_libraries
---------------------------------------
 rdsutils,pg_stat_statements,pglogical
(1 row)

dvdrental=> create extension pglogical;
CREATE EXTENSION

dvdrental=> \dx
                   List of installed extensions
   Name    | Version |   Schema   |          Description
-----------+---------+------------+--------------------------------
 pglogical | 2.4.1   | pglogical  | PostgreSQL Logical Replication
 plpgsql   | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

dvdrental=> select * FROM pg_catalog.pg_extension;
  oid  |  extname  | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
-------+-----------+----------+--------------+----------------+------------+-----------+--------------
 14281 | plpgsql   |       10 |           11 | f              | 1.0        |           |
 25305 | pglogical |       10 |        25304 | f              | 2.4.1      |           |
(2 rows)

dvdrental=>
postgres=> create extension pglogical;
CREATE EXTENSION

postgres=> select * FROM pg_catalog.pg_extension;
  oid  |  extname  | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
-------+-----------+----------+--------------+----------------+------------+-----------+--------------
 14281 | plpgsql   |       10 |           11 | f              | 1.0        |           |
 16931 | pglogical |       10 |        16930 | f              | 2.4.1      |           |
(2 rows)

postgres=>
Set the rds.logical_replication parameter in your DB parameter group to 1 (RDS → Databases → select the RDS instance → Configuration tab → Parameter group → Edit parameters). When you set rds.logical_replication to 1, AWS RDS configures the wal_level, max_wal_senders, max_replication_slots, and max_connections parameters automatically.
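The same change can be made from the AWS CLI if you prefer (a sketch; 'my-custom-pg13' is a placeholder for your custom parameter group name, and rds.logical_replication is a static parameter, so a reboot is required):

# Enable logical replication in the custom parameter group (placeholder group name)
aws rds modify-db-parameter-group \
  --db-parameter-group-name my-custom-pg13 \
  --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot"

# Reboot so the static parameter takes effect
aws rds reboot-db-instance --db-instance-identifier cbaserdsinstance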
dvdrental=> show wal_level;
 wal_level
-----------
 logical
(1 row)

dvdrental=> show max_wal_senders;
 max_wal_senders
-----------------
 25
(1 row)

dvdrental=> show max_replication_slots;
 max_replication_slots
-----------------------
 20
(1 row)
On a self-managed server you would set wal_sender_timeout in ‘postgresql.conf’ in the data directory; on RDS, set it through the custom parameter group. Here 0 disables the timeout mechanism that’s used to terminate inactive replication connections.
dvdrental=> show wal_sender_timeout;
 wal_sender_timeout
--------------------
 0
(1 row)
For example, if there are 5 databases on the source instance and 2 migration jobs will be created for the source, then the number of replication slots must be at least 5 * 2 = 10, plus any replication slots already in use.
To set this parameter, modify the Custom Parameter Group.
dvdrental=> show max_replication_slots;
 max_replication_slots
-----------------------
 20
(1 row)
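To see how many slots are already in use when sizing this parameter, query pg_replication_slots on the source:

-- Replication slots currently defined on the source; "active" shows whether each is in use
SELECT slot_name, plugin, slot_type, active
FROM pg_replication_slots;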
For example, if the max_replication_slots parameter is set to 10 and 2 WAL senders are already in use, set max_wal_senders to at least 10 + 2 = 12 so that enough sender processes can run at the same time.
To set this parameter, modify the Custom Parameter Group.
dvdrental=> show max_wal_senders;
 max_wal_senders
-----------------
 25
(1 row)
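The WAL senders currently in use can be counted from pg_stat_replication (one row per active sender process):

-- Active WAL sender processes on the source
SELECT count(*) AS active_wal_senders
FROM pg_stat_replication;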
Configuring the GCP DMS:
Create a migration job
Go to the Migration jobs page in the Google Cloud Console.
Define a source
As we have already created a connection profile, select it from the list of existing connection profiles.
Click SAVE & CONTINUE.
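For reference, a connection profile can also be created with gcloud rather than in the console (a sketch; the profile ID, region, and password below are placeholders):

# Create a DMS connection profile for the source RDS PostgreSQL (placeholder values)
gcloud database-migration connection-profiles create postgresql src-rds-profile \
  --region=asia-southeast1 \
  --host=cbaserdsinstance.capyyxpjdvn7.ap-south-1.rds.amazonaws.com \
  --port=5432 \
  --username=postgres \
  --password=YOUR_PASSWORD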
Create a Destination AlloyDB Cluster & Instance
1. Provide a destination instance ID (this will be your destination AlloyDB instance name).
2. Provide an alphanumeric root password for the destination AlloyDB instance. This will be the password for the administrator account in the instance.
You can either enter the password manually or click GENERATE to have Database Migration Service for PostgreSQL create one for you automatically.
Tip: To see the password in clear text, click the Reveal icon. This icon looks like an eye.
3. Choose the database version for the destination instance from the list of supported Database Migration Service versions for the specified database engine.
4. Review the cluster type and then configure your cluster.
5. Configure your primary instance. (The equivalent gcloud commands are sketched after this list.)
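A minimal gcloud sketch of the same cluster and primary-instance creation (the names, region, network, and sizing below are placeholders):

# Create the destination AlloyDB cluster (placeholder names and network)
gcloud alloydb clusters create alloydb-target-cluster \
  --region=asia-southeast1 \
  --password=YOUR_ROOT_PASSWORD \
  --network=alloydb

# Create the primary instance inside the cluster
gcloud alloydb instances create alloydb-target-primary \
  --cluster=alloydb-target-cluster \
  --region=asia-southeast1 \
  --instance-type=PRIMARY \
  --cpu-count=2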
Set up an SSH tunnel between your machine (which has access to the source) and the newly created VM.
Install the gcloud CLI on the AWS EC2 machine:
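One common way to install it on the EC2 host (the interactive installer prompts for the install directory and PATH updates):

# Install the Google Cloud SDK via the interactive installer, then authenticate
curl https://sdk.cloud.google.com | bash
exec -l $SHELL   # reload the shell so gcloud is on PATH
gcloud init      # log in and select the target project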
Then run the script below on the source AWS EC2 machine:
[root@ip-172-31-13-25 ~]# cat reverse_ssh_tunnel.sh
#!/bin/bash
set -e

export VM_NAME=cbasetoalloydbpginstance
export PROJECT_ID=cc-data-sandbox
export VM_ZONE=asia-southeast1-b
export VM_MACHINE_TYPE=n1-standard-1
export SUBNET_NAME=alloydb
export VM_PORT=5432
export SOURCE_DB_LOCAL_IP=cbaserdsinstance.capyyxpjdvn7.ap-south-1.rds.amazonaws.com
export SOURCE_DB_LOCAL_PORT=5432
export VERIFY_CONNECTIVITY=true

function verify_connectivity_to_source_db {
  if [[ "${VERIFY_CONNECTIVITY}" = true ]]; then
    if [[ $- =~ e ]]; then
      USE_E=true
      set +e
    fi
    NETCAT=$(which nc)
    if [[ -n "${NETCAT}" ]]; then
      echo "Verifying connectivity via netcat command to source DB: IP ${SOURCE_DB_LOCAL_IP} port ${SOURCE_DB_LOCAL_PORT}"
      nc -zv "${SOURCE_DB_LOCAL_IP}" "${SOURCE_DB_LOCAL_PORT}" &> /dev/null
      NC_RES=$?
      if (( NC_RES == 0 )); then
        echo "Connection to source DB verified"
      else
        echo "Connection refused, please verify that the machine you are using to run the script can connect to the source database at ${SOURCE_DB_LOCAL_IP}:${SOURCE_DB_LOCAL_PORT}"
        exit $NC_RES
      fi
    else
      echo "Verifying connectivity via /dev/tcp command to source DB: IP ${SOURCE_DB_LOCAL_IP} port ${SOURCE_DB_LOCAL_PORT}"
      DEV_TCP_CMD="cat < /dev/null > /dev/tcp/${SOURCE_DB_LOCAL_IP}/${SOURCE_DB_LOCAL_PORT}"
      timeout 5 bash -c "${DEV_TCP_CMD}" &> /dev/null
      DEV_TCP_RES=$?
      if (( DEV_TCP_RES == 0 )); then
        echo "Connection to source DB verified"
      else
        echo "Connection refused, please verify that the machine you are using to run the script can connect to the source database at ${SOURCE_DB_LOCAL_IP}:${SOURCE_DB_LOCAL_PORT}"
        exit $DEV_TCP_RES
      fi
    fi
    if [[ "$USE_E" = true ]]; then
      set -e
    fi
  fi
}

function create_instance {
  echo "Creating a virtual machine (VM) instance named '${VM_NAME}' in zone '${VM_ZONE}'"
  gcloud compute instances create "${VM_NAME}" --machine-type "${VM_MACHINE_TYPE}" --zone "${VM_ZONE}" --project="${PROJECT_ID}" --subnet "${SUBNET_NAME}"
  if [[ $- =~ e ]]; then
    USE_E=true
    set +e
  fi
  CURRENT_NUM_OF_SSH_RETRIES=0
  MAX_NUM_OF_SSH_RETRIES=10
  SSH_RETRY_INTERVAL=5
  echo "Validating ssh connectivity to ${VM_NAME}"
  until (( CURRENT_NUM_OF_SSH_RETRIES > MAX_NUM_OF_SSH_RETRIES )); do
    gcloud compute ssh "${VM_NAME}" --zone="${VM_ZONE}" --project="${PROJECT_ID}" -- 'true' && break
    (( CURRENT_NUM_OF_SSH_RETRIES+=1 ))
    if (( CURRENT_NUM_OF_SSH_RETRIES <= MAX_NUM_OF_SSH_RETRIES )); then
      echo "Retry ${CURRENT_NUM_OF_SSH_RETRIES}/${MAX_NUM_OF_SSH_RETRIES} - ssh to \"${VM_NAME}\" --zone=\"${VM_ZONE}\" --project=\"${PROJECT_ID}\" Failed. Retrying in $SSH_RETRY_INTERVAL seconds"
      sleep $SSH_RETRY_INTERVAL
    fi
  done
  if (( CURRENT_NUM_OF_SSH_RETRIES > MAX_NUM_OF_SSH_RETRIES )); then
    echo "Failed to establish ssh connection with ${VM_NAME} in project ${PROJECT_ID} and zone ${VM_ZONE}. Learn more: https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-ssh"
    exit 1
  fi
  if [[ "$USE_E" = true ]]; then
    set -e
  fi
  echo "${VM_NAME} is accessible through ssh"
  gcloud compute ssh "${VM_NAME}" --zone="${VM_ZONE}" --project="${PROJECT_ID}" -- 'echo "GatewayPorts yes" | sudo tee -a /etc/ssh/sshd_config && sudo service ssh restart'
  private_ip=$(gcloud compute instances describe "${VM_NAME}" --zone="${VM_ZONE}" --project="${PROJECT_ID}" --format='get(networkInterfaces[0].networkIP)')
  echo "VM instance '${VM_NAME}' created with private ip ${private_ip}"
}

function create_reverse_ssh_tunnel {
  echo "Setting up SSH tunnel between the source and the VM on port '${VM_PORT}'"
  gcloud compute ssh "${VM_NAME}" --zone="${VM_ZONE}" --project="${PROJECT_ID}" -- -f -N -R "${VM_PORT}:${SOURCE_DB_LOCAL_IP}:${SOURCE_DB_LOCAL_PORT}"
  if [[ "$?" -eq 0 ]]; then
    echo "SSH tunnel is ready on port ${VM_PORT}"
  fi
}

verify_connectivity_to_source_db
create_instance
create_reverse_ssh_tunnel
[root@ip-172-31-13-25 ~]#
Run the above script on the source EC2 machine that has connectivity to the source AWS RDS PostgreSQL instance. It creates a Compute Engine VM in the GCP environment and sets up the SSH tunnel between your machine (with access to the source) and the newly created VM.
[root@ip-172-31-13-25 ~]# sh raj_reverse.sh
which: no nc in (/root/google-cloud-sdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
Verifying connectivity via /dev/tcp command to source DB: IP cbaserdsinstance.cehyey333vti.ap-southeast-1.rds.amazonaws.com port 5432
Connection to source DB verified
Creating a virtual machine (VM) instance named 'rajmiginstancecompute' in zone 'asia-southeast1-b'
Created [https://www.googleapis.com/compute/v1/projects/cc-data-sandbox/zones/asia-southeast1-b/instances/rajmiginstancecompute].
NAME                   ZONE               MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
rajmiginstancecompute  asia-southeast1-b  n1-standard-1               100.64.0.9   34.124.185.34  RUNNING
Updates are available for some Google Cloud CLI components.? To install them,
please run:
? $ gcloud components update
Validating ssh connectivity to rajmiginstancecompute
Warning: Permanently added 'compute.5437436962214291788' (ECDSA) to the list of known hosts.
Connection to 34.124.185.34 closed.
rajmiginstancecompute is accessible through ssh
Existing host keys found in /root/.ssh/google_compute_known_hosts
GatewayPorts yes
Connection to 34.124.185.34 closed.
VM instance 'rajmiginstancecompute' created with private ip 100.64.0.9
Setting up SSH tunnel between the source and the VM on port '5432'
Existing host keys found in /root/.ssh/google_compute_known_hosts
SSH tunnel is ready on port 5432
[root@ip-172-31-13-25 ~]#
After you successfully run the script, enter the GCP bastion VM's IP address in the migration job's connectivity settings to allow connections to your destination.
Monitoring:
Once the job has started, its status changes after some time to show that the initial dump is in progress.
Once the dump and restore complete, the job status changes again to show that CDC is running.
Keep monitoring the job; its page shows the replication lag between the two servers.
Also keep monitoring the destination instance's storage usage, which gives an idea of how much data has been migrated (in this snapshot, about 30 GB had been migrated so far).
Once you see the replication lag stay at 0 bytes for a long time, all the data has been migrated to the destination instance.
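The per-slot lag can also be checked directly on the source (pg_wal_lsn_diff returns the backlog in bytes):

-- WAL not yet consumed by each replication slot, as a human-readable size
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS replication_lag
FROM pg_replication_slots;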
We can view errors and log information from the ‘Logs Explorer’ page.
Data Validation:
Confirm that Database Migration Service actually moved the data from your source database instance to the destination AlloyDB instance. Comparing row counts for the key tables on both sides is a simple way to do this.
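For example, run the same counts on both sides and compare the results (they must match once the replication lag is zero):

-- Run on the source (RDS) and on the destination (AlloyDB); the outputs must match
SELECT 'rental' AS table_name, count(*) AS row_count FROM rental
UNION ALL
SELECT 'payment', count(*) FROM payment;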
Promoting a migration:
Stop all writes, running scripts, and client connections to the source database; the downtime period begins here. Then promote the destination by clicking PROMOTE. Promotion means that the destination AlloyDB instance is disconnected from the source and is promoted from a replica instance to a primary instance.
Promote disconnects the source from the destination and promotes the destination to be a writable instance. It's recommended to stop writes to the source and to wait until the replication delay is at zero before initiating the promote.
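The promotion can also be triggered from the CLI (a sketch; the migration job name and region are placeholders):

# Promote the migration job once replication lag is zero (placeholder job name)
gcloud database-migration migration-jobs promote rds-to-alloydb-job \
  --region=asia-southeast1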
Thus, we have completed the migration from AWS RDS PostgreSQL to AlloyDB for PostgreSQL successfully.
To establish a successful connection to AlloyDB, follow the steps to configure private services access for your VPC. (In our environment we didn't have permission to do this.)
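For reference, the usual private services access setup looks roughly like this (a sketch; the reserved range name and network below are placeholders):

# Reserve an IP range for Google-managed services and peer it with your VPC
gcloud compute addresses create google-managed-services-range \
  --global \
  --purpose=VPC_PEERING \
  --prefix-length=16 \
  --network=alloydb

gcloud services vpc-peerings connect \
  --service=servicenetworking.googleapis.com \
  --ranges=google-managed-services-range \
  --network=alloydb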
Now that the migration and cutover to AlloyDB are complete, the database will function normally in AlloyDB. This is the point at which the application must be repointed to GCP AlloyDB.
Reference Links
Known limitations | Database Migration Service | Google Cloud
Using a PostgreSQL database as a source (AWS DMS): https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html#CHAP_Source.PostgreSQL.Security