Managing PostgreSQL Query Disconnections during AWS Batch Job Termination

Recently, one of our project teams reached out to me about an issue they encountered: terminating an AWS Batch job wasn’t stopping the SQL query running on the PostgreSQL database. This article covers how we addressed and resolved the issue.

Challenge

The team had developed multiple AWS Batch jobs using a Docker image that runs a psql command to execute an SQL query on PostgreSQL and export the result to an S3 bucket. The query looked like this:

SELECT * from aws_s3.query_export_to_s3( select col1,col2…. from TABLE …        

Most of the jobs take about 10 to 15 minutes to complete the SQL execution. However, if a job was terminated, either automatically or manually, the SQL query it had initiated kept running on the database instead of stopping. With long-running jobs this becomes a bottleneck, because it can leave multiple queries stuck in a running state on the database.
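
For reference, one way to confirm these orphaned queries is to look at pg_stat_activity on the database. Below is a minimal sketch that lists active queries running longer than ten minutes; the connection settings and the ten-minute threshold are illustrative, and it assumes the psycopg2 driver is installed.

import psycopg2

# Illustrative connection settings; replace with your own environment values
conn = psycopg2.connect(host="your-db-host", port=5432, user="your-user",
                        password="your-password", dbname="your-database")

with conn, conn.cursor() as cur:
    # Find active queries that have been running longer than 10 minutes
    cur.execute("""
        SELECT pid, now() - query_start AS runtime, query
        FROM pg_stat_activity
        WHERE state = 'active'
          AND now() - query_start > interval '10 minutes'
        ORDER BY runtime DESC
    """)
    for pid, runtime, query in cur.fetchall():
        print(pid, runtime, query[:80])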

Solution

AWS Batch sends a SIGTERM signal to the Docker container before it terminates the job (a SIGKILL follows after a grace period). Inside the container, we can trap the SIGTERM signal and control what action needs to be performed before the process is killed.

In our case, since the job runs psql from a shell command, I decided to add a trap that catches the SIGTERM signal and gracefully kills the running psql process.

Batch job definition before the fix

{
 "jobRoleArn": "${batch_job_role_arn}",
 "image": "${docker_image_tag}",
 "memory": ${memory},
 "vcpus": ${vcpus},
 "command": ["sh", "-c", "psql -h $PGHOST -p $PGPORT -U $PGUSER -d $PGDATABASE -f /script.sql"],
 "environment": [{
   ..... 
  } 
 ]
}        

Batch job definition after the fix

We added "trap 'kill -TERM $(pgrep psql)' TERM" at the start of the shell command, so that when the container receives SIGTERM the trap kills the running psql process before the job terminates. psql is launched in the background and the shell waits on it with "wait $!"; if the shell simply exec'd psql, the trap would be discarded along with the shell and could never fire.

{
 "jobRoleArn": "${batch_job_role_arn}",
 "image": "${docker_image_tag}",
 "memory": ${memory},
 "vcpus": ${vcpus},
 "command": ["sh", "-c", "trap 'kill -TERM $(pgrep psql)' SIGTERM; exec  psql -h $PGHOST -p $PGPORT -U $PGUSER -d $PGDATABASE -f /script.sql"],
 "environment": [{
   ..... 
  } 
 ]
}        

Another Solution

We can improve this further by wrapping the psql call in a Python script that handles the SIGTERM signal with a handle_sigterm function:

import os
import subprocess
import signal
import sys
 
pg_username = os.getenv("PG_USERNAME")
pg_password = os.getenv("PG_PASSWORD")
pg_host = os.getenv("PGHOST")
pg_port = os.getenv("PGPORT")
pg_database = os.getenv("PGDATABASE")

# Holds the running psql subprocess; None until it has been launched
process = None

def handle_sigterm(signum, frame):
    print("Received SIGTERM, terminating PostgreSQL command...")
    if process:
        process.terminate()   
        process.wait()        
    sys.exit(0)

# Register the SIGTERM handler
signal.signal(signal.SIGTERM, handle_sigterm)

# PostgreSQL command to run with credentials
command = [
    "psql",
    "-h", pg_host,
    "-p", pg_port,
    "-U", pg_username,
    "-d", pg_database,
    "-f", "./script.sql"
]
 
env = os.environ.copy()
env["PGPASSWORD"] = pg_password
 
try:
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env)
    stdout, stderr = process.communicate()   
    print(stdout.decode())  
    if stderr:
        print(stderr.decode())   
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Safety net: make sure psql is not left running if anything above failed
    if process and process.poll() is None:
        process.terminate()
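
To sanity-check the handler outside of AWS Batch, you can launch the script and send it a SIGTERM from a small driver, which roughly simulates what AWS Batch does on job termination. The file name run_query.py below is just a placeholder for wherever you saved the wrapper script.

import signal
import subprocess
import time

# Launch the wrapper script (placeholder file name) and give the query time to start
proc = subprocess.Popen(["python3", "run_query.py"])
time.sleep(5)

# Simulate AWS Batch job termination by sending SIGTERM to the wrapper
proc.send_signal(signal.SIGTERM)
proc.wait()
print("Wrapper exited with code", proc.returncode)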
