Federated Queries with Trino: Joining Data Across Multiple MySQL , PostgreSQL(Vice Versa) Hands on labs for Begineers
In the world of modern data architectures, it is not uncommon to find multiple databases in use across an organization. These databases often serve different purposes — from transactional databases (like MySQL) to analytical ones (like PostgreSQL). As data engineers and analysts, the ability to query and join data across these databases in a seamless manner is a powerful skill that can save time and resources. This is where Trino, a distributed SQL query engine, shines by enabling federated queries across heterogeneous data sources.
In this blog post, we’ll walk through a practical lab that demonstrates how you can use Trino to query and join data across MySQL and PostgreSQL databases. Whether you're working with different versions of MySQL or a combination of MySQL and PostgreSQL, Trino provides a unified layer to query and join data from multiple sources, all in real time.
Video Labs
<TBD>
Use Cases for Federated Queries with Trino
Before we dive into the lab, let’s discuss some real-world use cases where federated queries can be highly beneficial:
Setting Up the Lab: Docker Compose to Spin Up the Stack
To demonstrate the power of Trino in querying and joining data across different databases, we’ll use Docker Compose to spin up a simple stack. The stack consists of:
Docker Compose File
First, we need a docker-compose.yml file to spin up the services. This file will define the MySQL and PostgreSQL databases along with Trino, which will act as our query engine.
docker compose up --build -d
Use docker Desktop exec into container mysql1
Lets do same process for MYSQL2
Trino Configuration
We need to configure Trino to connect to the MySQL and PostgreSQL databases. For this, create the following property files inside the ./trino/etc directory:
MySQL1 Connection (mysql1.properties):
领英推荐
MySQL2 Connection (mysql2.properties):
PostgreSQL Connection (postgres.properties):
With this setup, Trino is now configured to access data from MySQL1, MySQL2, and PostgreSQL.
Running Queries in Trino
Now that we have the stack set up, let’s move on to running some queries using Trino. You can execute these queries in Trino’s SQL interface or using a notebook interface (such as Jupyter with a %sql magic command).
1. Querying MySQL1: Employees Data
First, let’s look at the employees data stored in MySQL1.
2. Querying MySQL2: Departments Data
3. Querying PostgreSQL: Projects Data
Joining Data Across MySQL and PostgreSQL
Joining Two MySQL Tables
We’ll join the employees table from MySQL1 with the departments table from MySQL2 based on the dept_id.
Joining PostgreSQL and MySQL2
Make sure to try these labs that the best way to learn
Conclusion
In this blog, we’ve walked through how to leverage Trino to perform federated queries across multiple databases, including MySQL and PostgreSQL. Trino’s ability to seamlessly join data from different sources allows you to get real-time insights without the need for complex ETL processes or data migration. By spinning up a Docker stack with MySQL and PostgreSQL, we demonstrated practical use cases such as combining employee and department data, as well as project and department data, using SQL queries across these heterogeneous databases.
Trino truly shines when it comes to cross-database analytics, giving data engineers and analysts the flexibility to query and combine data from multiple sources with minimal effort.
Data Engineer | Data Analyst | Data Scientist | AWS
3 个月Very informative
AI & Data Engineering | GenAI | Spark | Airflow | DBT | Snowflake | ETL/ELT
3 个月Can we do the same with semi structured files in some object store? Is there a way to run ddl/dml commands on these type of files and make an endpoint using which we can interact with it. Is that possible? What technologies would you use for that?
Big Data Developer | Data Engineering | Python | SQL | Spark | Scala | Hadoop | Sqoop | Hive | AWS | Software Testing
3 个月Good ????