Top 6 Techniques for Identifying and Resolving Database Differences (Postgres)
Kshitij Wagle
Data Engineer | Data Warehousing | Automation | Data Analyst @WesTrac CAT
Read on my blog: https://dataengjourney.com/top-6-techniques-for-identifying-and-resolving-database-differences-postgres/
In the complex world of database management, ensuring consistency between databases, schemas, and tables is crucial for seamless data operations. This article explores six techniques for identifying and resolving differences within Postgres databases, addressing schema variations, field inconsistencies, and other discrepancies.
1. Checking Schema differences using?Schema diff tool
You can use PgAdmin's in built tool to check schema difference.
Navigation: PgAdmin>Tools>Schema diff
But in SQL SERVER, you will have SQL Server Data Tools (SSDT)
Note: To compare using this method, both source and destination SERVER should have same version of Postgres installed.
Example:
Let's check the schema difference between two modules; i. e. Production and public, which is given below:
2. Comparing fields between two tables in any schema:
Underlying Problems that might exist in your workspace:
Benefits:
How to use it for other tables?
Just rename table schemas and table names you want to compare and run the query from following example, and check the raised output. You can also visualize and compare fields in output.
Example and Query:
Checking whether the fields are consistent or not among following two tables:
The structure of these tables are defined as follows:
Query to compare field differences:
-- Compare field names in Production schema
DO $$
DECLARE
feed_facts_fields text[];
output_facts_fields text[];
BEGIN
-- Get field names for expenserecords
SELECT array_agg(column_name)
INTO feed_facts_fields
FROM information_schema.columns
WHERE table_schema = 'Production'
AND table_name = 'expenserecords';
-- Get field names for paymentrecords
SELECT array_agg(column_name)
INTO output_facts_fields
FROM information_schema.columns
WHERE table_schema = 'Production'
AND table_name = 'paymentrecords';
-- Compare field names
RAISE NOTICE 'Field names in expenserecords: %', feed_facts_fields;
RAISE NOTICE 'Field names in paymentrecords: %', output_facts_fields;
IF feed_facts_fields = output_facts_fields THEN
RAISE NOTICE 'Field names are consistent between the two tables.';
ELSE
RAISE NOTICE 'Field names are inconsistent between the two tables.';
END IF;
END $$;
Query Output in PgAdmin:
Here, we noticed that fields are inconsistent where first table has vendor_name and second table has payee_name. If they are same and you want to make them consistent across the system, you can then modify your ETL job accordingly.
3. Comparing Table Row Counts between tables
SELECT 'your_table_name' AS table_name, COUNT(*) AS row_count
FROM your_schema.your_table
UNION ALL
SELECT 'other_table_name' AS table_name, COUNT(*) AS row_count
FROM other_schema.other_table;
//add UNION ALL and other tables if necessary
领英推荐
4. Compare Views
This query compares views between two databases.
SELECT table_name
FROM information_schema.views
WHERE table_schema = 'your_schema'
EXCEPT
SELECT table_name
FROM information_schema.views
WHERE table_schema = 'other_schema';
5. Checking field difference using SQL joins or using EXCEPT/INTERSECT:
a) Using Traditional Technique (i.e. IN/NOT IN):
Find New Records:
SELECT *
FROM "Database2"."YourSchema"."YourTable"
WHERE "PrimaryKeyColumn" NOT IN (
SELECT "PrimaryKeyColumn"
FROM "Database1"."YourSchema"."YourTable"
);
-- SELECT *
-- FROM "postgres"."Production"."employees"
-- WHERE "employee_id" NOT IN (
-- SELECT "employee_id"
-- FROM "postgres"."public"."employees"
-- );
Find Deleted Records:
-- Find Deleted Records:
SELECT *
FROM "Database1"."YourSchema"."YourTable"
WHERE "PrimaryKeyColumn" NOT IN (
SELECT "PrimaryKeyColumn"
FROM "Database2"."YourSchema"."YourTable"
);
b) Using EXCEPT and INTERSECT
In PostgreSQL, you can leverage the EXCEPT and INTERSECT operators to compare two sets of data effectively. These operators are valuable tools for identifying differences and commonalities between two result sets.
Find New Records:
-- Find New Records:
SELECT * FROM "Database2"."YourSchema"."YourTable"
EXCEPT
SELECT * FROM "Database1"."YourSchema"."YourTable";
Find Deleted Records:
-- Find Deleted Records:
SELECT * FROM "Database1"."YourSchema"."YourTable"
EXCEPT
SELECT * FROM "Database2"."YourSchema"."YourTable";
6. Validating documented Release Note/Data Dictionary with current status of Data Warehouse using ChatGPT in Turbo Mode
Simply paste your text from release note (which might be stored in .doc format) and ask ChatGPT to produce a single query whether those fields exists or not in the data warehouse based on the schema and table name, then you will get long query results, which you can run in PgAdmin and see the surprise.
ChatGPT Prompt Sample:
Conclusion:
In conclusion, comparing two databases in Postgres is a routine yet crucial task for database administrators and developers. This article has covered fundamental techniques for identifying changes between databases, allowing users to pinpoint newly added entries, removed records, and other variations. It's essential to choose the method that best suits your specific needs and the complexity of your data comparison assignment.