Automating Data Import from MySQL to HDFS Using Sqoop
Moving data efficiently from external sources into Hadoop's distributed storage is a core part of any data pipeline. Sqoop (SQL-to-Hadoop) is a tool designed for transferring structured data between relational databases like MySQL and the Hadoop Distributed File System (HDFS). In this post, we will explore how to automate this import using a simple shell script.
Prerequisites
Before diving into the script, make sure you have the following set up:
- A running Hadoop cluster with HDFS access (this post uses the Cloudera QuickStart VM)
- Sqoop installed and configured
- A MySQL instance containing the source database (here, retail_db)
- The MySQL JDBC driver available on Sqoop's classpath
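A quick sanity check that the tooling is in place (output will vary by distribution):

sqoop version
hadoop version
mysql --version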
Step 1: Understanding Sqoop and Hadoop Commands
We’ll use two main commands in this script:
- sqoop import: pulls a table from the relational database into a target directory on HDFS
- hadoop fs: interacts with HDFS; we use -ls to list the imported files and -cat to print their contents
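In general form, the two commands look like this (everything in angle brackets is a placeholder):

sqoop import \
  --connect jdbc:mysql://<host>:<port>/<database> \
  --username <user> \
  --password <password> \
  --table <table> \
  --target-dir <hdfs-directory>

hadoop fs -ls <hdfs-directory>
hadoop fs -cat <hdfs-directory>/*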
Step 2: Writing the Shell Script
Here is a simple shell script that imports a table from a MySQL database to HDFS using Sqoop, lists the contents of the HDFS directory, and displays the data:
Script: import_data.sh
#!/bin/bash
# Step 1: Import data from MySQL into HDFS using Sqoop
echo "Starting data import from MySQL to HDFS..."
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username root \
  --password cloudera \
  --table customers \
  --target-dir /user/cloudera/oct07op
# Step 2: List the files in the target HDFS directory
echo "Listing the contents of the HDFS directory..."
hadoop fs -ls /user/cloudera/oct07op
# Step 3: Display the contents of the imported files
echo "Displaying the contents of the imported data..."
hadoop fs -cat /user/cloudera/oct07op/*
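Two practical refinements are worth considering: the script keeps going even if the import fails, and --password exposes the password in the process list. Here is a hardened sketch of the same script, assuming you first create a password file with restrictive permissions (e.g. echo -n "cloudera" > /home/cloudera/.sqoop_password; the path is an assumed location):

#!/bin/bash
set -e  # abort the script on the first failing command

TARGET_DIR=/user/cloudera/oct07op

echo "Starting data import from MySQL to HDFS..."
# --password-file keeps the password out of the process list;
# the local file below is an assumed location you must create first.
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username root \
  --password-file file:///home/cloudera/.sqoop_password \
  --table customers \
  --target-dir "$TARGET_DIR"

echo "Listing the contents of the HDFS directory..."
hadoop fs -ls "$TARGET_DIR"

echo "Displaying the contents of the imported data..."
hadoop fs -cat "$TARGET_DIR"/*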
Step 3: Save and Execute the Script
Save the script as import_data.sh and make it executable:
chmod +x import_data.sh
Then run it:
./import_data.sh
The script will:
- Import the customers table from the retail_db MySQL database into /user/cloudera/oct07op on HDFS
- List the files Sqoop created in that directory
- Print the imported records to the console
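To make the process fully hands-off, the script can be scheduled with cron (the paths and schedule below are illustrative). One caveat: Sqoop fails if the fixed --target-dir already exists, so a recurring job would need a dated directory or a cleanup step before each run.

# Example crontab entry: run the import daily at 2 AM
0 2 * * * /home/cloudera/import_data.sh >> /home/cloudera/import_data.log 2>&1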