Automating Data Import from MySQL to HDFS Using Sqoop

Automating Data Import from MySQL to HDFS Using Sqoop

It's essential to efficiently transfer data from various sources into Hadoop's distributed storage. Sqoop (SQL-to-Hadoop) is an indispensable tool that helps in transferring structured data from relational databases like MySQL into Hadoop Distributed File System (HDFS). In this post, we will explore how to automate this process using a simple shell script.


Prerequisites

Before diving into the script, make sure you have the following set up:

  • Sqoop is installed and configured on your system.
  • MySQL is installed, and you have access to the database.
  • HDFS is up and running.

Step 1: Understanding Sqoop and Hadoop Commands

We’ll use two main commands in this script:

  1. Sqoop Import: This command transfers data from MySQL to HDFS.
  2. Hadoop fs commands: These commands are used to interact with HDFS (list files and view file contents).

Step 2: Writing the Shell Script

Here is a simple shell script that imports a table from a MySQL database to HDFS using Sqoop, lists the contents of the HDFS directory, and displays the data:

Script: import_data.sh

#!/bin/bash

# Step 1: Import data from MySQL into HDFS using Sqoop
echo "Starting data import from MySQL to HDFS..."
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username root \
  --password cloudera \
  --table customers \
  --target-dir /user/cloudera/oct07op

# Step 2: List the files in the target HDFS directory
echo "Listing the contents of the HDFS directory..."
hadoop fs -ls /user/cloudera/oct07op

# Step 3: Display the contents of the imported files
echo "Displaying the contents of the imported data..."
hadoop fs -cat /user/cloudera/oct07op/*
        

Step 3: Save and Execute the Script

  1. Save the script: Save the above code in a file named import_data.sh.
  2. Make the script executable: Open a terminal and navigate to the directory where the script is saved. Run the following command to make it executable:

chmod +x import_data.sh        

Execute the script: Run the script using the command:

./import_data.sh        

The script will:

  • Import data from the customers table in the MySQL database retail_db.
  • Store the data in the HDFS directory /user/cloudera/oct07op.
  • List and display the contents of the imported data.



要查看或添加评论,请登录

Jaswanth Kumar的更多文章

社区洞察

其他会员也浏览了