AWS DataSync: Transfer Data to AWS Efficiently

AWS DataSync

AWS DataSync is a managed service that automates the transfer of data between on-premises storage and AWS storage, as well as between different AWS storage services. Some example use cases include backing up on-prem data to the cloud, moving data to AWS for processing, and moving data from one S3 bucket to another in a different region. 

DataSync is available through the AWS Management Console or the AWS CLI (Command Line Interface) and can be used to move data between any of the following:

  • NFS (Network File System)
  • Self-managed object store
  • Amazon S3 & Glacier
  • AWS Snowcone
  • Amazon EFS (Elastic File System)
  • Amazon FSx for Windows File Server file systems

How does it work?

Diagram: AWS DataSync transferring data (source: https://aws.amazon.com/datasync/)

The diagram above shows a high-level overview of how AWS DataSync transfers data from on-premises storage to AWS storage.

Transfer Data from On-Premises to AWS

  1. Deploy the DataSync agent as a VM and connect it to your on-premises storage system. This also works with AWS Outposts and AWS Snowcone, which are AWS-provided infrastructure that runs on your premises.
  2. The agent connects to AWS DataSync either over the internet or through AWS Direct Connect.
  3. Create a task in AWS DataSync to transfer data to an AWS storage service.
  4. Data is transferred securely and validated for you. Status and progress can be checked in Amazon CloudWatch.
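
Steps 3 and 4 above can also be scripted. The following is a minimal sketch, assuming the agent from step 1 is already deployed and activated, that the source is an NFS share, and that the destination is an S3 bucket. The function name, the task name, and all ARNs and hostnames are hypothetical. The `client` argument is expected to behave like the boto3 DataSync client, and the four method names used here are the boto3 calls as I understand them, so check them against the current API reference before relying on this.

```python
from typing import Any


def create_nfs_to_s3_task(
    client: Any,
    agent_arn: str,
    nfs_hostname: str,
    nfs_path: str,
    bucket_arn: str,
    bucket_role_arn: str,
) -> str:
    """Register both locations, create a task, start it, and return the execution ARN."""
    # Source: the on-premises NFS share, reached through the deployed agent.
    source = client.create_location_nfs(
        ServerHostname=nfs_hostname,
        Subdirectory=nfs_path,
        OnPremConfig={"AgentArns": [agent_arn]},
    )
    # Destination: an S3 bucket that DataSync accesses through an IAM role.
    destination = client.create_location_s3(
        S3BucketArn=bucket_arn,
        S3Config={"BucketAccessRoleArn": bucket_role_arn},
    )
    # The task ties the source and destination locations together.
    task = client.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
        Name="onprem-nfs-to-s3",  # hypothetical task name
    )
    # Starting an execution kicks off the actual transfer.
    execution = client.start_task_execution(TaskArn=task["TaskArn"])
    return execution["TaskExecutionArn"]
```

With boto3 installed and credentials configured, `client` would typically be `boto3.client("datasync")`. Passing the client in as an argument keeps the function easy to exercise with a stand-in object before pointing it at a real account.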


AWS DataSync can also be used to move data between AWS storage services themselves, for example, from Amazon EFS to S3 or from one S3 bucket to another. 

AWS DataSync Features and Benefits

  • Fully Managed and Automated: DataSync is a fully managed service, so there is no need to provision special infrastructure or write custom scripts to copy data. In addition, by combining it with Amazon CloudWatch Events, you can trigger follow-up actions when a transfer completes, thereby automating your workflows.
  • Preserve Metadata: DataSync keeps file system metadata and file permissions intact when copying between storage systems, which eases the transition.
  • Fast and Efficient: DataSync was engineered to transfer data as efficiently as possible by performing network optimizations, including in-line compression and incremental transfers. It also uses a multi-threaded architecture to sustain high transfer speeds between on-premises storage and AWS.
  • Secure and Robust: All data is encrypted in transit and validated to ensure consistency between source and destination data. Encryption at rest can also be enabled for AWS storage services.
  • Task Scheduling: You can create tasks within DataSync that will periodically check and copy changes from a data source based on a specified schedule. 
  • Monitoring and Auditing: You can use CloudWatch to view the status of DataSync transfers in progress as well as data about past transfers. CloudWatch Logs can be used to check the results of the integrity checks that DataSync performs on individual files. For auditing, CloudTrail can be used to review past DataSync API actions.
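
To illustrate the automation point in the list above, here is a small sketch of an event pattern that a CloudWatch Events (now Amazon EventBridge) rule could use to react when a transfer finishes. The `source`, `detail-type`, and `State` values reflect my understanding of the events DataSync emits and should be treated as assumptions to verify against the DataSync documentation. The function name is hypothetical.

```python
import json
from typing import Iterable


def execution_state_pattern(states: Iterable[str]) -> dict:
    """Build an event pattern matching DataSync task executions in the given states."""
    return {
        "source": ["aws.datasync"],  # assumed event source name
        "detail-type": ["DataSync Task Execution State Change"],  # assumed
        "detail": {"State": list(states)},
    }


# Match executions that ended either successfully or with an error.
pattern = execution_state_pattern(["SUCCESS", "ERROR"])
print(json.dumps(pattern, indent=2))
```

The printed JSON could be pasted into a rule's event pattern, with the rule's target (for example a Lambda function or an SNS topic) performing the next step of the workflow.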

How much does AWS DataSync cost?

AWS DataSync costs $0.0125 per gigabyte (GB) of data copied. 

Note that this is a flat per-GB fee for the use of DataSync only. It does not include any additional charges incurred for reading from or writing to AWS services, such as storage requests or data transfer. See the AWS DataSync pricing page for full details.
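
As a quick worked example of the flat fee, the sketch below multiplies a data volume by the $0.0125 per GB rate quoted above. It covers the DataSync fee only, not the additional storage or transfer charges, and the rate is simply the figure from this article, so check current pricing for your region. The function and constant names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# Flat DataSync rate quoted in this article, in USD per GB copied.
DATASYNC_FEE_PER_GB = Decimal("0.0125")


def estimate_datasync_fee(gigabytes) -> Decimal:
    """Return the estimated DataSync fee in USD, rounded to the nearest cent."""
    amount = Decimal(str(gigabytes))
    if amount < 0:
        raise ValueError("gigabytes must not be negative")
    fee = amount * DATASYNC_FEE_PER_GB
    return fee.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_datasync_fee(1000))   # 1,000 GB  -> 12.50
print(estimate_datasync_fee(10000))  # 10,000 GB -> 125.00
```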

When to use AWS DataSync? Use Cases and Examples

Data Migration

When migrating your on-premises workloads to the cloud, there are often large amounts of active data that need to be moved over. DataSync can be used to copy such data quickly and efficiently into AWS storage services such as Amazon S3, EFS and FSx for Windows File Server. The migration can be done in stages: once the bulk of the data is copied over, scheduled tasks check for changes and transfer only the changed data until the switchover is complete.
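
The staged approach relies on a task schedule. As a rough sketch, the helper below builds a schedule setting for a daily run, assuming DataSync accepts an AWS-style six-field cron expression in a `ScheduleExpression` field (this is my understanding of the task schedule format, so confirm it in the API reference). The function name is hypothetical.

```python
def daily_schedule(hour_utc: int, minute: int = 0) -> dict:
    """Build a task schedule that runs once a day at the given UTC time."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    # Field order: minute, hour, day-of-month, month, day-of-week, year.
    return {"ScheduleExpression": f"cron({minute} {hour_utc} * * ? *)"}


# Run the incremental sync every night at 02:00 UTC.
print(daily_schedule(2))
```

The resulting dictionary is meant to be passed as the `Schedule` argument when creating or updating a task, so that each nightly run picks up only what changed since the previous one.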

Data Backups & Replication

On-Premises to AWS Storage: As with all critical data, it is important to keep backups and ensure data redundancy for disaster recovery. DataSync can be used to replicate your data and store a copy in AWS storage services to ensure that it is protected. These services all have built-in encryption settings that can be turned on to ensure that your data is stored securely at rest.

Between AWS Storage Services: Although AWS builds many measures into its storage services to ensure availability and redundancy of data, it is still a good idea to maintain several copies of crucial data. DataSync can be used to back up data from one AWS storage service to another. Services such as Amazon S3 already have built-in tools for automatic data replication from one bucket to another. However, live replication only applies to objects added to the bucket after the replication rule was turned on. If there is prior data that needs to be backed up to a new bucket in a different region, DataSync can be used to copy those files over. Another advantage of DataSync is that you can specify include and exclude filters to dictate which files/paths within a bucket are copied, whereas S3 replication rules can only be scoped by prefix or object tag.
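
To make the file/path rules concrete, here is a minimal sketch of how such a filter could be built. It assumes DataSync filter rules take the form of a `SIMPLE_PATTERN` entry whose value is a pipe-delimited list of patterns, which is my understanding of the API. The function name and the example paths are hypothetical.

```python
from typing import List, Sequence


def build_filter(patterns: Sequence[str]) -> List[dict]:
    """Combine several path patterns into one DataSync-style filter rule."""
    if not patterns:
        raise ValueError("at least one pattern is required")
    for pattern in patterns:
        # The pipe character is the delimiter, so it cannot appear in a pattern.
        if "|" in pattern:
            raise ValueError(f"pattern must not contain '|': {pattern!r}")
    return [{"FilterType": "SIMPLE_PATTERN", "Value": "|".join(patterns)}]


# Copy only two folders from the source bucket.
print(build_filter(["/reports", "/logs/2021"]))
```

A list like this would be supplied as the `Includes` (or `Excludes`) argument of a task or task execution, so the same pair of buckets can be synced with different scopes.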

Data Archives

AWS DataSync can be used to move cold data (data that will rarely or never need to be accessed) to the cloud rather than keeping it on on-premises storage systems. Not only does this free up on-premises storage, it is often more cost-effective as well. AWS has storage tiers such as Amazon S3 Glacier and S3 Glacier Deep Archive, which are specifically geared towards storing cold data and are priced very competitively.
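
As an illustration, an S3 destination can be configured so that objects are written straight into an archival storage class. The sketch below only assembles the parameters. It assumes the S3 location API accepts an `S3StorageClass` field with values such as `GLACIER` and `DEEP_ARCHIVE`, and the function name, bucket ARN, and role ARN are hypothetical.

```python
# Archival storage classes this sketch allows (assumed API values).
ARCHIVE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def archive_location_params(
    bucket_arn: str,
    role_arn: str,
    storage_class: str = "DEEP_ARCHIVE",
    prefix: str = "/",
) -> dict:
    """Assemble keyword arguments for an S3 location that archives on write."""
    if storage_class not in ARCHIVE_CLASSES:
        raise ValueError(f"unsupported archive class: {storage_class}")
    return {
        "S3BucketArn": bucket_arn,
        "Subdirectory": prefix,
        "S3StorageClass": storage_class,
        "S3Config": {"BucketAccessRoleArn": role_arn},
    }


params = archive_location_params(
    "arn:aws:s3:::example-archive-bucket",  # hypothetical bucket
    "arn:aws:iam::123456789012:role/example-datasync-role",  # hypothetical role
)
print(params["S3StorageClass"])
```

These parameters are intended to be unpacked into the call that creates the S3 location. Archival classes carry retrieval delays and minimum storage durations, so they suit data that really is cold.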

Data for Hybrid Environments

Many organizations use a hybrid approach where some workloads are handled in AWS and some workloads remain on-premises. In these setups, large volumes of data might be generated on one system and need to be transferred to AWS for processing (e.g., for data cleansing, data analysis, or machine learning). DataSync can be used to move data between these environments to simplify and streamline the process.

AWS DataSync Demo & Tutorial

Transferring Data from On-Premises to AWS

Below is a quick walkthrough and tutorial of how to create a DataSync agent to transfer data to and from AWS.

Transferring Data between AWS Storage Services

Below is a tutorial on how to create a task and select the source/destination AWS storage service to transfer data within AWS with DataSync.
