Managing PostgreSQL Query Disconnections during AWS Batch Job Termination
Nikhil Surendran
Cloud & DevOps Specialist | Platform Engineering | AWS | GCP , Kubernetes, GitHub Certified | Software Engineer at Heart | Building Scalable, Resilient Infrastructure | Driving Innovation in Tech
Recently, one of our project teams reached out to me about an issue they encountered: terminating an AWS Batch job wasn’t stopping the SQL query running on the PostgreSQL database. This article covers how we addressed this issue and resolve this issue.
Challenge
The team has developed multiple AWS batch jobs using a Docker image that executes an SQL query on PostgreSQL and exports the data to an S3 bucket via a psql command. The query syntax was like
SELECT * from aws_s3.query_export_to_s3( select col1,col2…. from TABLE …
Most of the jobs take about 10 to 15 minutes to complete the SQL execution. However, we encountered an issue where, if a job was terminated automatically or manually, the SQL query initiated by the batch job would continue running instead of stopping. This can be a bottleneck whenever there are long running jobs, this can dump multiple SQL’s in running state in the database.
Solution
AWS Batch job sends the SIGTERM signals to docker before it terminates the job. In the container, we can capture the SIGTERM signal and control what action need to be performed.
In our case since we were using the bash PSQL command I decided to inject the trap and capture the SIGTERM signal and gracefully kill the running process.
领英推荐
BatchJob Definition at beginning
{
"jobRoleArn": "${batch_job_role_arn}",
"image": "${docker_image_tag}",
"memory": ${memory},
"vcpus": ${vcpus},
"command": ["sh", "-c", "psql -h $PGHOST -p $PGPORT -U $PGUSER -d $PGDATABASE -f /script.sql"],
"environment": [{
.....
}
]
}
BatchJob Definition after fix
Added “trap ‘kill -TERM $(pgrep psql)’ SIGTERM; exec” this will capture the SIGTERM signal and kills the SQL before the job terminates.
{
"jobRoleArn": "${batch_job_role_arn}",
"image": "${docker_image_tag}",
"memory": ${memory},
"vcpus": ${vcpus},
"command": ["sh", "-c", "trap 'kill -TERM $(pgrep psql)' SIGTERM; exec psql -h $PGHOST -p $PGPORT -U $PGUSER -d $PGDATABASE -f /script.sql"],
"environment": [{
..... # Removed
}
]
}
Another Solution
We can improve this further by using Python script and handle it with handle_sigterm.
import os
import subprocess
import signal
import sys
pg_username = os.getenv("PG_USERNAME")
pg_password = os.getenv("PG_PASSWORD")
pg_host = os.getenv("PGHOST")
pg_port = os.getenv("PGPORT")
pg_database = os.getenv("PGDATABASE")
def handle_sigterm(signum, frame):
print("Received SIGTERM, terminating PostgreSQL command...")
if process:
process.terminate()
process.wait()
sys.exit(0)
# Register the SIGTERM handler
signal.signal(signal.SIGTERM, handle_sigterm)
# PostgreSQL command to run with credentials
command = [
"psql",
"-h", pg_host,
"-p", pg_port,
"-U", pg_username,
"-d", pg_database,
"-f", "./script.sql"
]
env = os.environ.copy()
env["PGPASSWORD"] = pg_password
try:
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env)
stdout, stderr = process.communicate()
print(stdout.decode())
if stderr:
print(stderr.decode())
except Exception as e:
print(f"An error occurred: {e}")
finally:
if process and process.poll() is None:
process.terminate()