Moving Data to the Cloud - A Practical Guide

Moving data to the cloud is a cornerstone of any cloud migration. Having worked with both on-premises and cloud-first data centers, I have seen both sides of the coin. The challenge I most commonly saw when moving data was learning how to use the cloud correctly on a tight timeline: an engineering team would often be tasked with learning dozens of new technologies and implementing them immediately.

This guide presents one of the easiest ways to move large volumes of data from on-premises storage to the cloud. The tool referenced in this guide is Apache NiFi, an open source program. For a more comprehensive review of NiFi, check out the NiFi for Dummies eBook that a few of Calculated Systems' founders wrote.

Cloud Storage Options

There are many ways to store data on the cloud, but the easiest, and the ones covered by this guide, are typically the object stores. All three major cloud providers have one:

  • Amazon - S3 (Simple Storage Service)
  • Azure - Blob Storage
  • Google - GCS (Google Cloud Storage)

These are an ideal starting point for files, as you can typically land files without much forethought or capacity planning. Additionally, these object stores offer exceptional durability, often in excess of ten 9s, along with four 9s of uptime. This lets you use them without fear of losing your data in all but the most demanding production uses.

For the purposes of this tutorial we will start with the most common object store, AWS’s S3 service.

AWS S3 Terminology

Before we get started moving data let us establish some basic terminology:

  • Bucket - A grouping of similar files that must have a globally unique name. Buckets can be made publicly accessible and are often used to host static objects
  • Folder - Much like an operating system folder, these exist within a bucket to enable organization
  • IAM - Identity and Access Management, the controls for defining who and what can interact with your AWS account
  • Access Keys - Your programmatic credentials for AWS. These are not your typical username/password; they are generated through IAM
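To make the bucket and folder terminology concrete, here is a quick sketch (pure Python, no AWS calls) of how a bucket and a key combine into the s3:// URIs you will see in the console. A "folder" in S3 is really just a prefix on the object key.

```python
# Illustrative only: split an s3:// URI into its bucket and key parts.
# "Folders" in S3 are just key prefixes, not real directories.
def parse_s3_uri(uri: str):
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key

bucket, key = parse_s3_uri("s3://nifi-demo-bucket/conf/nifi.properties")
print(bucket)  # nifi-demo-bucket
print(key)     # conf/nifi.properties
```

Here the "folder" conf/ is simply the first segment of the key conf/nifi.properties; the console renders it as a directory for convenience.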

Creating an Access Key

For NiFi to have permission to write to S3, we must set it up with an access key pair. There are many ways to do this, but best practice is to create a new IAM user. You can navigate to the IAM Users screen in the console or go directly to https://console.aws.amazon.com/iam/home#/users

  1. Hit “Add user” and check “Programmatic access”
  2. Enter a new name such as “Nifi_demo”
  3. Click “Next: Permissions”
  4. Click “Create group” and you will be presented with a list of permissions you can add to this new user
  5. Enter a group name such as “Nifi_Demo_Group”
  6. Next to “Filter policies”, search for S3, check “AmazonS3FullAccess”, and click “Create group”
  7. At the bottom right press “Next: Tags” and click through to “Next: Review”
  8. Click “Create user” to finish making the IAM user

The Access Key ID and Secret Access Key are very important for setting up your data transfer. Download them as a CSV and save them somewhere safe. Be sure to record your Secret Access Key now, as this is the only time it can be viewed.
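If you downloaded the credentials CSV, you can pull the two values out programmatically. This is a minimal sketch using only the standard library; the sample values are fake, and the exact column headers may vary slightly between console versions.

```python
import csv
import io

# Hypothetical sample mirroring the credentials.csv the IAM console
# typically produces; the key values here are fake placeholders.
sample = (
    "Access key ID,Secret access key\n"
    "AKIAEXAMPLE,wJalrEXAMPLESECRET\n"
)

# csv.DictReader maps each row to the header names, so we can pick the
# two credential fields by column name.
row = next(csv.DictReader(io.StringIO(sample)))
access_key_id = row["Access key ID"]
secret_access_key = row["Secret access key"]
print(access_key_id)  # AKIAEXAMPLE
```

Keep the parsed secret out of logs and source control; it grants the same access as the IAM user it belongs to.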

Creating your S3 Bucket

Now that we have credentials for AWS, we need a place to land the data. Simply put, we need to create a new S3 bucket if you do not already have one. Go to https://s3.console.aws.amazon.com/s3/

  1. Press “+ Create Bucket”
  2. Enter a unique bucket name and note down the Region you are creating it in
  3. Click next and click through until the bucket is created; the default options are fine
  4. Click on your new bucket and you should be able to see its contents, which are currently empty
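When picking a bucket name in step 2, remember that names must be globally unique and follow S3's naming rules. Here is a simplified check of the main rules (3-63 characters; lowercase letters, digits, hyphens, and dots; starting and ending with a letter or digit). AWS enforces a few extra restrictions beyond this sketch, and uniqueness can only be confirmed by AWS itself.

```python
import re

# Simplified S3 bucket-name check: 3-63 chars, lowercase letters, digits,
# hyphens, and dots, starting and ending with a letter or digit.
# (AWS adds further rules, e.g. no IP-address-style names.)
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def looks_like_valid_bucket_name(name: str) -> bool:
    return bool(BUCKET_RE.match(name))

print(looks_like_valid_bucket_name("nifi-demo-bucket"))  # True
print(looks_like_valid_bucket_name("Bad_Bucket"))        # False
```

A quick local check like this saves a round trip to the console when scripting bucket creation later.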

For the remainder of this article you can follow along using our AWS Certified NiFi image. We developed this image as part of the AWS Partner program to run on the cloud without needing any setup or configuration.

Setting up your NiFi + AWS Credential Service or Processor Controls

NiFi has many ways to provide access to AWS, either through an overarching credential service or parameters set on a specific processor.

The credential service is ideal when you have multiple processors all relying on the same keys. It is out of scope for this tutorial, but it is the right choice when moving into a production setting. For a breakdown of credential services, check out our blog post on the subject.

  • To get started, click and drag in a new “PutS3Object” processor, then right-click > Configure
  • Under the Settings tab, check the boxes next to “failure” and “success”, as this is going to be the last processor in the flow

Under the properties tab configure the following properties:

A processor to drop data into AWS S3
  • Access Key ID - From the user you created earlier and noted down
  • Secret Access Key - From the user you created earlier and noted down
  • Bucket - The name of the bucket you created
  • Region - The region your bucket is in; often US East (N. Virginia)

Press Apply to finish configuring the processor.
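The four properties above are the same pieces of information any S3 client needs. As a sketch, here they are collected into a single config dict (all values are placeholders, and NiFi manages this internally; with an SDK such as boto3 installed, a similar mapping would feed the client constructor):

```python
# Sketch: the four PutS3Object properties gathered into one config,
# mirroring what an AWS SDK client would need. Values are placeholders.
s3_config = {
    "aws_access_key_id": "AKIA...",       # Access Key ID from the IAM user
    "aws_secret_access_key": "REDACTED",  # Secret Access Key (keep private)
    "bucket": "nifi-demo-bucket",         # the bucket created earlier
    "region_name": "us-east-1",           # US East (N. Virginia)
}
print(sorted(s3_config))
```

Seeing the properties side by side like this makes it clear why a shared credential service pays off once several processors need the same first two values.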

Setting Up your Flow

For the purposes of this sample flow, let's replicate NiFi's own documentation directory to S3. To accomplish this we need two additional processors, ListFile and FetchFile. Connect them as shown below and configure them as follows:

Nifi flow for moving data from a local machine to AWS S3

ListFile

  • Properties Tab - Set “Input Directory” to /nifi/docs/html
  • Drag a connection from ListFile to FetchFile for relationship Success

FetchFile

  • Settings Tab - Check the boxes next to “failure”, “not.found”, and “permission.denied”
  • Drag a connection from FetchFile to PutS3Object For relationship Success
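Conceptually, the three-processor flow is list, read, upload. The sketch below mimics it in plain Python against a temporary directory; upload() is a stand-in that just records what it was given, where a real flow would write to S3 instead.

```python
import pathlib
import tempfile

uploaded = {}

def upload(key: str, data: bytes) -> None:
    # Stand-in for the S3 put; a real flow would call the S3 API here.
    uploaded[key] = data

with tempfile.TemporaryDirectory() as d:
    src = pathlib.Path(d)
    (src / "a.html").write_text("<html>demo</html>")
    for path in sorted(src.iterdir()):  # ListFile: enumerate the directory
        data = path.read_bytes()        # FetchFile: read each file's bytes
        upload(path.name, data)         # PutS3Object: hand bytes to storage

print(sorted(uploaded))  # ['a.html']
```

Splitting listing from fetching, as NiFi does, means the (cheap) directory scan can run on a schedule while the (expensive) reads and uploads are distributed across the flow.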

Running your Flow

  • Right click each of the processors and press “Start”
  • Let this run for a few seconds. If you want to track the progress, right-click any blank space of your NiFi canvas and press “Refresh”; you should see each processor reporting flowfiles in and out
  • For the purposes of this demo, right-click > Stop on ListFile when you are done. In production you can leave this task long running, but it is always best to stop demos when finished so they do not keep producing sample files after you stop using the program

Viewing the Objects in S3

If you return to your bucket and look for files you should see them listed. Note: you may have to press the refresh button on the top right depending on your browser/settings.

[Optional] Security Cleanup

As an optional step you may wish to revoke the access keys you gave to this NiFi Demo. It is general best practice to remove unused keys when done.

To revoke the keys go here: https://console.aws.amazon.com/iam/home#/users

  • Left-click the user you created earlier in the tutorial
  • Go to the “Security credentials” tab and look for the “Access keys” subsection. Here you can deactivate, delete, or even make new keys
  • For best practice, make the key inactive or delete it

Next Steps:

Try Migrating Cloud Data for Yourself - Learn how here

Learn how to Stream Log Data to MySQL on AWS
