An honest review of AWS DataSync
To start off with, it's a great service if you want to continuously sync data or perform a one-time copy from one location to another. The following location types are supported.
1 - Amazon EFS file system
2 - Amazon FSx
3 - Amazon S3
4 - Hadoop Distributed File System (HDFS)
5 - Network File System (NFS)
6 - Object storage
7 - Server Message Block (SMB)
In my scenario, I had to copy data from an on-premises SMB share to an S3 bucket residing in a different account from the one where the DataSync service was commissioned.
I had a DataSync appliance deployed as an EC2 instance in a VPC with no direct internet access. Deploying it in a VPC meant I could use our Direct Connect connection to reach on-premises storage. I also had a VPC endpoint configured for the DataSync service to avoid sending data over the public internet, even though in this case it would only have traversed AWS's own network anyway.
The first decision to be made was how to activate the agent on the EC2 instance. The choice was either manual activation, by logging into the instance, generating an activation key and registering the agent, or creating the agent via the DataSync console and pointing it at the instance, which activates it automatically. The latter option was the way to go; manually generating an activation key didn't seem to work that smoothly.
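For reference, the console-driven activation can also be approximated with the AWS CLI. This is only a sketch: every ARN, ID and key below is a placeholder, not a value from my environment.

```shell
# Create (and thereby activate) a DataSync agent. With a VPC endpoint for
# DataSync, pass its details so activation traffic stays off the public
# internet. The activation key, endpoint ID and ARNs are placeholders.
aws datasync create-agent \
  --agent-name on-prem-smb-agent \
  --activation-key AAAAA-BBBBB-CCCCC-DDDDD-EEEEE \
  --vpc-endpoint-id vpce-0123456789abcdef0 \
  --subnet-arns arn:aws:ec2:ap-southeast-2:111122223333:subnet/subnet-0123456789abcdef0 \
  --security-group-arns arn:aws:ec2:ap-southeast-2:111122223333:security-group/sg-0123456789abcdef0
```

This call only registers an agent that has already obtained an activation key from the appliance; it is the CLI equivalent of the console flow, not a replacement for deploying the appliance itself.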
The second step was to configure the locations, both source and destination. Since I was planning to copy the data to an S3 bucket residing in a different account, I needed to configure it via the CLI, as the console only lets you choose buckets in the local account. However, this didn't work either. Apparently the only scenario it works for is when you are not using an appliance and are configuring DataSync to copy data directly from one S3 bucket to another in a different account. I gave up on that and decided to use a local S3 bucket, then use the S3 sync feature to copy the data to the destination bucket in the other account.
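To make the two approaches concrete, here is roughly what each looked like. Bucket names, role ARNs and account IDs are made up for illustration.

```shell
# What I tried first: register the cross-account bucket as a DataSync
# location via the CLI (the console only lists buckets in the local account).
# BucketAccessRoleArn is an IAM role DataSync assumes to write to the bucket.
aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::destination-bucket-in-other-account \
  --s3-config BucketAccessRoleArn=arn:aws:iam::111122223333:role/DataSyncS3Access

# What I fell back to: let DataSync land the data in a bucket in the local
# account, then push it across accounts with s3 sync. The destination
# bucket's policy must grant the calling identity write access.
aws s3 sync s3://local-staging-bucket s3://destination-bucket-in-other-account
```

The fallback costs an extra copy of the data in the staging bucket, so a lifecycle rule to expire staged objects is worth considering.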
Now came the time to configure the on-premises SMB share as a location. The location for the SMB share was created smoothly, but the share is only mounted when you go to create a task for the data copy, and if mounting fails you are greeted with an error message. After some googling, I found this was either because of the share name, mine had a $ sign in it, or because of the SMB version. I tried different option combinations, but none of them worked, including using an escape character ("\") for the $. While going through these trials, I realised how great it is to have AWS CloudShell, where you can use the AWS CLI to keep retrying these different options without developing a cramp in your hand from repeated ClickOps. Plus, there is no need to assume roles or specify AWS profiles as you would when using the AWS CLI from your local machine.
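The command I kept retrying from CloudShell looked roughly like this. The hostname, credentials and agent ARN are placeholders; the `$` in the subdirectory is the character that caused the grief.

```shell
# Register the on-premises SMB share as a DataSync location. Note that this
# call succeeds even if the share can't be mounted; the mount is only
# attempted later, when a task using this location runs.
aws datasync create-location-smb \
  --server-hostname fileserver.corp.example.com \
  --subdirectory '/data$' \
  --user svc-datasync \
  --domain CORP \
  --password 'REDACTED' \
  --mount-options Version=SMB3 \
  --agent-arns arn:aws:datasync:ap-southeast-2:111122223333:agent/agent-0123456789abcdef0
```

Varying `--subdirectory` (with and without escaping the `$`) and `--mount-options Version=` (AUTOMATIC, SMB2, SMB3) were the combinations I cycled through; none of them got the `$`-suffixed share to mount.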
In the end, what worked was reconfiguring the on-premises share without the $; after that, everything was smooth sailing. However, the following enhancements/fixes would be very welcome.
1 - Support for $ in share names, or at least allow escape characters when creating locations
2 - Copying data to S3 buckets in other accounts when using an appliance
3 - A bit more logging, and the ability to test-mount shares from within the appliance. I had to test mine by launching a separate instance in the same VPC
4 - Mount SMB shares as soon as the location is created, not when a task is. Users then wouldn't have to keep re-creating tasks just to see whether the share mounts successfully