How to transfer data with AWS DataSync the easy way
AWS DataSync is a secure, managed AWS service for moving data between on-premises storage and AWS storage services. With DataSync, you can copy data between Network File System (NFS) or Server Message Block (SMB) file servers, Amazon Simple Storage Service (S3), Amazon Elastic File System (EFS), and Amazon FSx for Windows File Server.
To get a basic understanding of DataSync, please refer to the AWS documentation.
Recently, I was working to migrate gigabytes of data from on-premises to the cloud, and to transfer a specific set of data from one S3 bucket to a bucket in an external account. I evaluated the available AWS data transfer services in terms of time and cost, and decided to use DataSync for this requirement. Below I highlight some pros and cons of DataSync based on my experience.
Pros:
- Built-in data integrity checks via checksum verification
- Include and exclude filter patterns to transfer only specific data
- Works for both on-premises-to-cloud and cross-account S3 transfers
Cons:
- Requires deploying and activating an agent for on-premises transfers
- Per-gigabyte pricing can add up compared with simpler options such as S3 copy/sync
DataSync can be used for data transfer in two scenarios: on-premises to Cloud and Cloud to Cloud. Both scenarios are discussed below, along with the code to automate some of the steps.
Scenario 1: On-premises to Cloud Data Migration
We used AWS DataSync to transfer on-premises data to Amazon S3. After analyzing multiple available services, we chose DataSync because, along with transferring the data, it manages data integrity via checksum verification and provides include and exclude patterns to transfer only specific data.
Steps to set up the DataSync agent on the on-premises side
To transfer data between on-premises storage systems and AWS storage services, deploy a DataSync agent and associate it with your AWS account via the Management Console or API. The agent is used to read data from, or write data to, your NFS server or SMB file share. Please follow the steps in the AWS blog for agent setup on the on-premises side.
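If you prefer to register (activate) the agent with the AWS SDK instead of the console's "Create Agent" flow, a minimal boto3 sketch looks like the following. It assumes you already have the agent's activation key from the deployed appliance; the region and agent name are placeholders of my own, not values from the blog.

import boto3

# Minimal sketch: register an already-deployed on-premises agent with your AWS account.
# The activation key comes from the agent appliance; replace the placeholders below.
datasync = boto3.client("datasync", region_name="<REGION>")

response = datasync.create_agent(
    ActivationKey="<YOUR-AGENT-ACTIVATION-KEY>",
    AgentName="onprem-datasync-agent",  # illustrative name
)
print("Agent ARN:", response["AgentArn"])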
Steps to follow on the destination account
1. Go to the DataSync service in the AWS Management Console in the destination account and select "Create Agent".
2. Create a destination S3 bucket in the destination account/region with default attributes.
3. Create an SNS topic "datasync-notify" to notify users in case of any DataSync transfer failure.
4. Create a DataSync role [s3_data_sync_access] in IAM and allow DataSync to read from and write to your Amazon S3 bucket. The following example policy grants DataSync the minimum permissions to read and write data to your S3 bucket (replace <YourS3BucketArn> with the ARN of your bucket); a scripted version of this step is sketched right after the policy.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads"
            ],
            "Effect": "Allow",
            "Resource": "<YourS3BucketArn>"
        },
        {
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:DeleteObject",
                "s3:GetObject",
                "s3:ListMultipartUploadParts",
                "s3:GetObjectTagging",
                "s3:PutObjectTagging",
                "s3:PutObject"
            ],
            "Effect": "Allow",
            "Resource": "<YourS3BucketArn>/*"
        }
    ]
}
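If you would rather script step 4, here is a minimal boto3 sketch under the following assumptions: the role name matches the one above, the trust policy simply allows the DataSync service to assume the role, and the permissions policy shown above has been saved locally as s3_data_sync_access_policy.json (a file name I made up for this example).

import json
import boto3

iam = boto3.client("iam")

# Trust policy that lets the DataSync service assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "datasync.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Create the role referenced in step 4.
iam.create_role(
    RoleName="s3_data_sync_access",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the bucket-access policy shown above (saved locally as a JSON file).
with open("s3_data_sync_access_policy.json") as f:
    bucket_policy = f.read()

iam.put_role_policy(
    RoleName="s3_data_sync_access",
    PolicyName="s3-data-sync-access",
    PolicyDocument=bucket_policy,
)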
5. Set up a DataSync source location (NFS server) on the destination account.
6. Set up a DataSync destination location (S3) on the destination account.
7. Create a DataSync task to initiate the data transfer, with specified parameters for the source location, destination location, settings, and task logging.
8. Start execution of the DataSync task.
You can also use the lambda code in my git repo to automate steps 5 through 8; a simplified sketch of what the Lambda does is shown after the sample input and output below.
Lambda Input:
{
"sourceLocation": “NAME OF SOURCE Directory”,
“destinationLocation": “NAME OF DESTINATION BUCKET”,
“AgnetARN”: "arn:aws:datasync:<REGION>:<ACCNT ID>:agent/<AGENT ID>",
“NFSServer”: “ServerHostname”
}
Note: NFSServer is the name of the NFS server. This value is the IP address or Domain Name Service (DNS) name of the NFS server. An agent that is installed on-premises uses this host name to mount the NFS server in a network.
Lambda Output:
{
"status": “TRANSFERRING”,
“taskid": “arn:aws:datasync:region:account-id:task/task-id”
}
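The full Lambda lives in the repo, but the heart of steps 5 through 8 looks roughly like the sketch below. The event keys match the sample input above; the task name, subdirectories, and the role ARN (the role from step 4) are illustrative values, not the exact ones from my code.

import boto3

datasync = boto3.client("datasync")

def handler(event, context):
    # Step 5: source location pointing at the on-premises NFS export.
    source = datasync.create_location_nfs(
        ServerHostname=event["NFSServer"],
        Subdirectory=event["sourceLocation"],
        OnPremConfig={"AgentArns": [event["AgentARN"]]},
    )

    # Step 6: destination location pointing at the S3 bucket.
    destination = datasync.create_location_s3(
        S3BucketArn=f"arn:aws:s3:::{event['destinationLocation']}",
        Subdirectory="/",
        # Role created in step 4 (replace the account ID placeholder).
        S3Config={"BucketAccessRoleArn": "arn:aws:iam::<ACCNT ID>:role/s3_data_sync_access"},
    )

    # Step 7: task with checksum verification enabled.
    task = datasync.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
        Name="onprem-to-s3-task",
        Options={"VerifyMode": "POINT_IN_TIME_CONSISTENT"},
    )

    # Step 8: start the task execution and report its current status.
    execution = datasync.start_task_execution(TaskArn=task["TaskArn"])
    status = datasync.describe_task_execution(
        TaskExecutionArn=execution["TaskExecutionArn"]
    )["Status"]

    return {"status": status, "taskid": task["TaskArn"]}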
Scenario 2: S3 to S3 Cross Account Data Migration
To transfer specific S3 data to an S3 bucket in an external account, we can again rely on DataSync to copy the data across AWS accounts. In my experience, it works better than S3 copy or S3 sync for cross-account copies.
You need to follow certain steps on source as well as destination accounts in order to start seamless data transfer.
Steps to set up DataSync on the destination account
1. On the destination AWS account, create the S3 bucket where the output should be copied.
2. Make sure the source and destination buckets are in the same AWS Region.
3. Apply the following S3 bucket policy to the destination bucket (replace <sourcebucketaccount> and <destinationbucket> with appropriate values); a scripted alternative is sketched right after the policy.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BucketPolicyForDataSync",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::<sourcebucketaccount>:role/datasync-role-source",
                    "arn:aws:iam::<sourcebucketaccount>:root"
                ]
            },
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:AbortMultipartUpload",
                "s3:DeleteObject",
                "s3:GetObject",
                "s3:ListMultipartUploadParts",
                "s3:PutObject",
                "s3:GetObjectTagging",
                "s3:PutObjectTagging"
            ],
            "Resource": [
                "arn:aws:s3:::<destinationbucket>",
                "arn:aws:s3:::<destinationbucket>/*"
            ]
        }
    ]
}
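If you want to script this step rather than paste the policy in the console, a minimal boto3 sketch (run with credentials for the destination account) could look like this; destination_bucket_policy.json is a file name I made up to hold the JSON above.

import boto3

s3 = boto3.client("s3")

# Apply the cross-account bucket policy shown above to the destination bucket.
with open("destination_bucket_policy.json") as f:
    policy = f.read()

s3.put_bucket_policy(
    Bucket="<destinationbucket>",  # replace with the destination bucket name
    Policy=policy,
)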
Steps to set up DataSync on the source account
1. Attach the following AWS managed policies to the IAM user or role that will create and run the DataSync task in the source account:
- AWSDataSyncFullAccess
- AWSDataSyncReadOnlyAccess
2. Create an IAM role named datasync-role-source (the role referenced in the destination bucket policy above), with a trust relationship that allows datasync.amazonaws.com to assume it, and attach the following policy so DataSync can read the source bucket and write to the destination bucket (replace <sourcebucket> and <destinationbucket> with appropriate values):
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:ListBucketMultipartUploads"
],
"Effect": "Allow",
"Resource":
"arn:aws:s3:::<sourcebucket>",
"arn:aws:s3:::<destinationbucket>"
},
{
"Action": [
"s3:AbortMultipartUpload",
"s3:DeleteObject",
"s3:GetObject",
"s3:ListMultipartUploadParts",
"s3:PutObjectTagging",
"s3:GetObjectTagging",
"s3:PutObject"
],
"Effect": "Allow",
"Resource":
"arn:aws:s3:::<sourcebucket>/*",
"arn:aws:s3:::<destinationbucket>/*"
}
]
}
You can also use the lambda code in my git repo to automate the location setup, task creation, and task execution steps; a simplified sketch is shown after the sample input and output below.
Lambda Input:
{
"SourceBucketName": “NAME OF SOURCE BUCKET”,
“external_bucket": “NAME OF DESTINATION BUCKET”
}
Lambda Output:
{
"status": “TRANSFERRING”,
“taskid": “arn:aws:datasync:region:account-id:task/task-id”
}
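As with scenario 1, the complete code is in the repo, but the core of the cross-account Lambda looks roughly like this sketch. It runs in the source account; the datasync-role-source ARN must match the role referenced in the destination bucket policy, and the task name and verification mode are illustrative choices. The destination location for a bucket owned by another account is created here with the SDK, which is the usual approach since the console only lists buckets in your own account.

import boto3

datasync = boto3.client("datasync")

# Role created in the source account and allowed in the destination bucket policy.
DATASYNC_ROLE_ARN = "arn:aws:iam::<sourcebucketaccount>:role/datasync-role-source"

def handler(event, context):
    # Source location: bucket in the source account.
    source = datasync.create_location_s3(
        S3BucketArn=f"arn:aws:s3:::{event['SourceBucketName']}",
        Subdirectory="/",
        S3Config={"BucketAccessRoleArn": DATASYNC_ROLE_ARN},
    )

    # Destination location: bucket owned by the external account,
    # accessed through the same source-account role.
    destination = datasync.create_location_s3(
        S3BucketArn=f"arn:aws:s3:::{event['external_bucket']}",
        Subdirectory="/",
        S3Config={"BucketAccessRoleArn": DATASYNC_ROLE_ARN},
    )

    # Create the task with checksum verification, then start the transfer.
    task = datasync.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
        Name="s3-to-s3-cross-account-task",
        Options={"VerifyMode": "ONLY_FILES_TRANSFERRED"},
    )
    execution = datasync.start_task_execution(TaskArn=task["TaskArn"])

    status = datasync.describe_task_execution(
        TaskExecutionArn=execution["TaskExecutionArn"]
    )["Status"]
    return {"status": status, "taskid": task["TaskArn"]}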
Very soon, I will be publishing a CloudFormation template to automate all of the above steps on the source account.