Scenario

Businesses with a self-built Hadoop cluster, deployed on premises or on a cloud-based virtual machine, often run their system over HDFS storage. To leverage the numerous benefits provided by a managed service, these businesses can migrate their clusters to Alibaba Cloud E-MapReduce (EMR).

This solution allows you to set up a VPN tunnel to connect your existing network and Alibaba Cloud. Then you can use DistCp, a built-in data replication tool of Hadoop, to migrate your data to Alibaba Cloud EMR based on a variety of storage types. This solution provides an affordable yet secure network for data migration between your on-premises environment and Alibaba Cloud over a VPN tunnel; supports your EMR cluster with a wider range of storage choices to achieve significant cost savings; and allows you to directly perform analytics on the data using supported services, such as Hadoop, Hive, Spark, and Flink.