Federated Queries with Trino: Joining Data Across Multiple MySQL , PostgreSQL(Vice Versa) Hands on labs for Begineers

Federated Queries with Trino: Joining Data Across Multiple MySQL , PostgreSQL(Vice Versa) Hands on labs for Begineers

In the world of modern data architectures, it is not uncommon to find multiple databases in use across an organization. These databases often serve different purposes — from transactional databases (like MySQL) to analytical ones (like PostgreSQL). As data engineers and analysts, the ability to query and join data across these databases in a seamless manner is a powerful skill that can save time and resources. This is where Trino, a distributed SQL query engine, shines by enabling federated queries across heterogeneous data sources.

In this blog post, we’ll walk through a practical lab that demonstrates how you can use Trino to query and join data across MySQL and PostgreSQL databases. Whether you're working with different versions of MySQL or a combination of MySQL and PostgreSQL, Trino provides a unified layer to query and join data from multiple sources, all in real time.



Video Labs

<TBD>


Use Cases for Federated Queries with Trino

Before we dive into the lab, let’s discuss some real-world use cases where federated queries can be highly beneficial:

  1. Data Integration: Combine transactional data from MySQL with analytical data from PostgreSQL without the need for ETL jobs or data migration. This can provide real-time access to data stored in multiple databases without replication.
  2. Reporting and Dashboards: Generate reports that require data from various sources, such as employee data from MySQL and project data from PostgreSQL. With Trino, you can create a unified view across databases.
  3. Cross-Database Analytics: If your organization uses different databases for various applications (e.g., MySQL for e-commerce and PostgreSQL for analytics), Trino allows you to analyze data across them without requiring manual data exports or complex joins.
  4. Real-Time Querying: Ad-hoc querying across databases for real-time insights, allowing teams to run flexible and quick queries without the overhead of data preparation or movement.


Setting Up the Lab: Docker Compose to Spin Up the Stack

To demonstrate the power of Trino in querying and joining data across different databases, we’ll use Docker Compose to spin up a simple stack. The stack consists of:

  • MySQL 1: For storing employee data.
  • MySQL 2: For storing department data.
  • PostgreSQL: For storing project data.
  • Trino: As the federated query engine to run SQL queries across these databases.

Docker Compose File

First, we need a docker-compose.yml file to spin up the services. This file will define the MySQL and PostgreSQL databases along with Trino, which will act as our query engine.


docker compose up --build -d

Use docker Desktop exec into container mysql1

Lets do same process for MYSQL2

Trino Configuration

We need to configure Trino to connect to the MySQL and PostgreSQL databases. For this, create the following property files inside the ./trino/etc directory:


MySQL1 Connection (mysql1.properties):

MySQL2 Connection (mysql2.properties):

PostgreSQL Connection (postgres.properties):


With this setup, Trino is now configured to access data from MySQL1, MySQL2, and PostgreSQL.

Running Queries in Trino

Now that we have the stack set up, let’s move on to running some queries using Trino. You can execute these queries in Trino’s SQL interface or using a notebook interface (such as Jupyter with a %sql magic command).

1. Querying MySQL1: Employees Data

First, let’s look at the employees data stored in MySQL1.


2. Querying MySQL2: Departments Data

3. Querying PostgreSQL: Projects Data

Joining Data Across MySQL and PostgreSQL

Joining Two MySQL Tables

We’ll join the employees table from MySQL1 with the departments table from MySQL2 based on the dept_id.

Joining PostgreSQL and MySQL2

Make sure to try these labs that the best way to learn

https://github.com/soumilshah1995/trino-federated-lab


Conclusion

In this blog, we’ve walked through how to leverage Trino to perform federated queries across multiple databases, including MySQL and PostgreSQL. Trino’s ability to seamlessly join data from different sources allows you to get real-time insights without the need for complex ETL processes or data migration. By spinning up a Docker stack with MySQL and PostgreSQL, we demonstrated practical use cases such as combining employee and department data, as well as project and department data, using SQL queries across these heterogeneous databases.

Trino truly shines when it comes to cross-database analytics, giving data engineers and analysts the flexibility to query and combine data from multiple sources with minimal effort.

Mohamed Fazan

Data Engineer | Data Analyst | Data Scientist | AWS

3 个月

Very informative

Aakash Kumar Dhal

AI & Data Engineering | GenAI | Spark | Airflow | DBT | Snowflake | ETL/ELT

3 个月

Can we do the same with semi structured files in some object store? Is there a way to run ddl/dml commands on these type of files and make an endpoint using which we can interact with it. Is that possible? What technologies would you use for that?

Swamy Naidu Lenka

Big Data Developer | Data Engineering | Python | SQL | Spark | Scala | Hadoop | Sqoop | Hive | AWS | Software Testing

3 个月

Good ????

回复

要查看或添加评论,请登录

Soumil S.的更多文章

社区洞察

其他会员也浏览了