The response to our previous post, How to Repatriate From AWS S3 to MinIO, was extraordinary - we’ve fielded dozens of calls from enterprises asking us for repatriation advice. We have aggregated those responses into this new post, where we dig a little deeper into the costs and savings associated with repatriation to make it easier for you to put together your own analysis. Migrating data is a daunting task for many enterprises. In practice, they direct new data to MinIO and take their time migrating old data from the cloud, or leave it in place and stop adding to it.

Repatriation Overview

To repatriate data from AWS S3, you will follow these general guidelines:

1. Review Data Requirements: Determine the specific buckets and objects that need to be repatriated from AWS S3. Make sure you understand business needs and compliance requirements on a bucket-by-bucket basis.
2. Identify Repatriation Destination: You’ve already decided to repatriate to MinIO; now you can choose to run MinIO in an on-premises data center or at another cloud provider or colocation facility. Using the requirements from #1, you will select hardware or instances for forecasted storage, transfer and availability needs.
3. Data Transfer: Plan and execute the transfer of data from AWS S3 to MinIO. Simply use MinIO's built-in Batch Replication or mirror using the MinIO Client (see How to Repatriate From AWS S3 to MinIO for details). There are several additional methods you can use for data transfer, such as using AWS DataSync, AWS Snowball or TD SYNNEX data migration, or directly using AWS APIs.
4. Data Access and Permissions: Ensure that appropriate access controls and permissions are set up for the repatriated data on a per-bucket basis. This includes IAM and bucket policies for managing user access, authentication, and authorization to ensure the security of the data.
5. Object Locks: It is critical to preserve the object lock retention and legal hold policies after the migration. The target object store has to interpret the rules in the same way as Amazon S3. If you are unsure, ask for the Cohasset Associates Compliance Assessment on the target object store implementation.
6. Data Lifecycle Management: Define and implement a data lifecycle management strategy for the repatriated data. This includes defining retention policies, backup and recovery procedures, and data archiving practices on a per-bucket basis.
7. Data Validation: Validate the transferred data to ensure its integrity and completeness. Perform necessary checks and tests to ensure that the data has been successfully transferred without any corruption or loss. After the transfer, the object name, ETag and metadata, checksum and the number of objects should all match between the source and destination.
8. Update Applications and Workflows: The good news is that if you follow cloud-native principles to build your applications, then all you will have to do is reconfigure them for the new MinIO endpoint. However, if your applications and workflows were designed to work with the AWS ecosystem, make the necessary updates to accommodate the repatriated data. This may involve updating configurations, reconfiguring integrations or in some cases modifying code.
9. Monitor and Optimize: Continuously monitor and optimize the repatriated data environment to ensure optimal performance, cost-efficiency, and adherence to data management best practices.

Repatriation Steps

There are many factors to consider when budgeting and planning for cloud repatriation. Fortunately, our engineers have done this with many customers and we’ve developed a detailed plan for you. We have customers that have repatriated everything from a handful of workloads to hundreds of petabytes.
The biggest planning task is to think through choices around networking, leased bandwidth, server hardware, archiving costs for the data not selected to be repatriated, and the human cost of managing and maintaining your own cloud infrastructure. Estimate these costs and plan for them. Cloud repatriation costs will include data egress fees for moving the data from the cloud back to the data center. These fees are intentionally high enough to compel cloud lock-in. Take note of these high egress fees - they substantiate the economic argument to leave the public cloud because, as the amount of data you manage grows, the egress fees increase. Therefore, if you’re going to repatriate, it pays to take action sooner rather than later.

We’re going to focus on data and metadata that must be moved - this is eighty percent of the work required to repatriate. Metadata includes bucket properties and policies (access management based on access/secret key, lifecycle management, encryption, anonymous public access, object locking and versioning).

Let’s focus on data (objects) for now. For each namespace you want to migrate, take inventory of the buckets and objects you want to move. It is likely that your DevOps team already knows which buckets hold important current data. You can also use Amazon S3 Inventory. At a high level, this will look something like:

| Namespace | Total Buckets | Total Object Count | Total Object Size (GB) | Daily Total Upload (TB) | Daily Total Download (TB) |
| --- | --- | --- | --- | --- | --- |
| ns-001 | 166 | 47,751,258 | 980,014.48 | 50.04 | 14.80 |
| ns-002 | 44 | 24,320,810 | 615,033.35 | 23.84 | 675.81 |
| ns-002 | 648 | 88,207,041 | 601,298.91 | 328.25 | 620.93 |
| ns-001 | 240 | 68,394,231 | 128,042.16 | 62.48 | 12.45 |

The next step is to list, by namespace, each bucket and its properties for every bucket you’re going to migrate. Note the application(s) that store and read data in that bucket. Based on usage, classify each bucket as hot, warm or cold tier data.
In an abridged version, this will look something like:

| Bucket Name | Properties | App(s) | Hot/Warm/Cold Tier |
| --- | --- | --- | --- |
| A | Copy and paste JSON here | Spark, Iceberg, Dremio | Hot |
| B | Copy and paste JSON here | Elastic | Warm |
| C | Copy and paste JSON here | Elastic (snapshots) | Cold |

You have some decisions to make about data lifecycle management at this point. Pay close attention, because this is a great way to save money on AWS fees. Categorize objects in each bucket as hot, warm or cold based on how frequently they are accessed. A great place to save money is to migrate cold tier buckets directly to S3 Glacier - there’s no reason to incur egress fees to download just to upload again.

Depending on the amount of data you’re repatriating, you have a few options to choose how to migrate. We recommend that you load and work with new data on the new MinIO cluster while copying hot and warm data to the new cluster over time. The amount of time and bandwidth needed to copy objects will, of course, depend on the number and size of the objects you’re copying.

Here’s where it will be very helpful to calculate the total data that you’re going to repatriate from AWS S3. Look at your inventory and total the size of all the buckets that are classified as hot and warm.

Total Hot and Warm Tier Data = 1,534,096.7 GB
Available bandwidth = 10 Gbps
Minimum Transfer Time required (total object size / available bandwidth) = 14.2 days

Calculate data egress fees based on the above total. I’m using list price, but your organization may qualify for a discount from AWS. I’m also using 10 Gbps as the connection bandwidth, but you may have more or less at your disposal. Finally, I’m working from the assumption that one-third of S3 data will merely be shifted to S3 Glacier Deep Archive.

Total Data Tiered to S3 Glacier = 767,048.337 GB
S3 to S3 Glacier transfer fees ($0.05/1000 objects) = $3,773.11
S3 Glacier Deep Archive monthly storage fee = $760

Don’t forget to budget for S3 Glacier Deep Archive usage moving forward.
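The transfer-time and Glacier figures above can be reproduced with a few lines of arithmetic, which makes it easy to plug in your own inventory. This is a minimal sketch, not an official calculator. The function names are hypothetical, the 33% share of objects tiered to Glacier is inferred from the figures rather than stated in the post, and the Deep Archive rate of $0.00099 per GB-month is an assumed list price that you should check against current AWS pricing. With that rate the monthly storage fee comes to roughly $759, which the post rounds to $760.

```python
# Inventory rows from the namespace table: (object count, total size in GB).
INVENTORY = [
    (47_751_258, 980_014.48),
    (24_320_810, 615_033.35),
    (88_207_041, 601_298.91),
    (68_394_231, 128_042.16),
]


def min_transfer_days(total_gb: float, link_gbps: float) -> float:
    """Best-case days to move total_gb over a fully utilized link (1 GB = 8 Gb)."""
    seconds = total_gb * 8 / link_gbps
    return seconds / 86_400


def glacier_transition_fee(objects: float, usd_per_1000: float = 0.05) -> float:
    """Fee for transitioning objects to Glacier, billed per 1,000 requests."""
    return objects / 1000 * usd_per_1000


def deep_archive_monthly(total_gb: float, usd_per_gb_month: float = 0.00099) -> float:
    """Monthly storage fee; the default rate is an assumption, not from the post."""
    return total_gb * usd_per_gb_month


if __name__ == "__main__":
    total_objects = sum(count for count, _ in INVENTORY)
    tiered_objects = 0.33 * total_objects  # assumed share moved to Glacier

    print(f"Transfer time: {min_transfer_days(1_534_096.7, 10):.1f} days")
    print(f"Glacier transition fee: ${glacier_transition_fee(tiered_objects):,.2f}")
    print(f"Deep Archive per month: ${deep_archive_monthly(767_048.337):,.2f}")
```

The transfer time is a lower bound: it assumes the link is saturated for the whole period and ignores per-request overhead, so real migrations with many small objects will take longer.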
Total Data to be Transferred = 1,534,096.7 GB
First 10 TB at $0.09/GB = $900
Next 40 TB at $0.085/GB = $3,400
Next 100 TB at $0.07/GB = $7,000
Additional over 150 TB at $0.05/GB = $69,205
Total Egress Fees = $80,505

For the sake of simplicity, the above calculation includes neither the fee for per-object operations ($0.40 per million) nor the cost of LISTing ($5 per million). For very large repatriation projects, we can also compress objects before sending them across the network, saving you some of the cost of egress fees.

Another option is to use AWS Snowball to transfer objects. Snowball devices are each 80 TB, so we know up front that we need 20 of them for our repatriation effort. The per-device fee includes 10 days of use, plus 2 days for shipping. Additional days are available for $30/device.

20 Snowball Devices Service Fee ($300 ea) = $6,000
R/T shipping (3-5 days at $400/device) = $8,000
S3 data out ($0.02/GB) = $30,681.93
Total Snowball Fees = $44,681.93

AWS will charge you standard request, storage, and data transfer rates to read from and write to AWS services including Amazon S3 and AWS Key Management Service (KMS). There are further considerations when working with Amazon S3 storage classes. For S3 export jobs, data transferred to your Snow Family device from S3 is billed at standard S3 charges for operations such as LIST, GET, and others. You are also charged standard rates for Amazon CloudWatch Logs, Amazon CloudWatch Metrics, and Amazon CloudWatch Events.

Now we know how long it will take to migrate this massive amount of data and the cost. Make a business decision as to which method meets your needs based on the combination of timing and fees.

At this point, we also know the requirements for the hardware needed to run MinIO on-prem or at a colocation facility. Take the requirement above for 1.5 PB of storage, estimate data growth, and consult our Recommended Hardware & Configuration page and Selecting the Best Hardware for Your MinIO Deployment.
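When comparing network egress against Snowball, it helps to recompute both estimates from the rates quoted earlier in this section so you can swap in your own data volume. The following is a rough sketch under stated assumptions: the function names are hypothetical, 1 TB is treated as 1,000 GB throughout, and per-request fees, extra Snowball days and any negotiated discounts are ignored.

```python
import math

# (tier size in GB, USD per GB); None marks the open-ended final tier.
EGRESS_TIERS = [
    (10_000, 0.09),    # first 10 TB
    (40_000, 0.085),   # next 40 TB
    (100_000, 0.07),   # next 100 TB
    (None, 0.05),      # everything over 150 TB
]


def egress_fee(total_gb: float, tiers=EGRESS_TIERS) -> float:
    """Sum the tiered data-transfer-out charge for total_gb."""
    remaining = total_gb
    fee = 0.0
    for size_gb, usd_per_gb in tiers:
        chunk = remaining if size_gb is None else min(remaining, size_gb)
        fee += chunk * usd_per_gb
        remaining -= chunk
        if remaining <= 0:
            break
    return fee


def snowball_fee(total_gb: float, device_tb: int = 80, service_usd: float = 300,
                 shipping_usd: float = 400, data_out_usd_per_gb: float = 0.02):
    """Return (device count, total fee) for a Snowball export of total_gb."""
    devices = math.ceil(total_gb / (device_tb * 1000))
    fee = devices * (service_usd + shipping_usd) + total_gb * data_out_usd_per_gb
    return devices, fee


if __name__ == "__main__":
    total_gb = 1_534_096.7
    devices, snow = snowball_fee(total_gb)
    print(f"Network egress: ${egress_fee(total_gb):,.2f}")
    print(f"Snowball ({devices} devices): ${snow:,.2f}")
```

Keeping the tiers in a table rather than hard-coding the arithmetic makes it straightforward to update the rates when AWS pricing changes or when a discount applies.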
The first step is to recreate your S3 buckets in MinIO. You’re going to have to do this regardless of how you choose to migrate objects. While both S3 and MinIO store objects using server-side encryption, you don’t have to worry about migrating encryption keys. You can connect to your KMS of choice using MinIO KES to manage encryption keys. This way, new keys will be automatically generated for you as encrypted tenants and buckets are created in MinIO.

You have multiple options to copy objects: Batch Replication and mc mirror. My previous blog post, How to Repatriate From AWS S3 to MinIO, included detailed instructions for both methods. You can copy objects directly from S3 to on-prem MinIO, or use a temporary MinIO cluster running on EC2 to query S3 and then mirror to on-prem MinIO.

Typically, customers use tools we wrote combined with AWS Snowball or TD SYNNEX’s data migration hardware and services to move larger amounts of data (over 1 PB).

MinIO recently partnered with Western Digital and TD SYNNEX to field a Snowball alternative. Customers can schedule windows to take delivery of the Western Digital hardware and pay for what they need during the rental period. More importantly, the service is not tied to a specific cloud - meaning the business can use the service to move data into, out of, and across clouds - all using the ubiquitous S3 protocol. Additional details on the service can be found on the Data Migration Service page on the TD SYNNEX site.

Bucket metadata, including policies and bucket properties, can be read using get-bucket S3 API calls and then set up in MinIO. When you sign up for MinIO SUBNET, our engineers will work with you to migrate these settings from AWS S3: access management based on access key/secret key, lifecycle management policies, encryption, anonymous public access, immutability and versioning.
One note about versioning: the AWS version ID isn’t usually preserved when data is migrated because each version ID is an internal UUID. This is largely not a problem for customers because objects are typically called by name. However, if the AWS version ID is required, then we have an extension that will preserve it in MinIO and we’ll help you enable it.

Pay particular attention to IAM and bucket policies. S3 isn’t going to be the only part of AWS’s infrastructure that you leave behind. You will have a lot of service accounts for applications to use when accessing S3 buckets. This would be a good time to list and audit all of your service accounts. Then you can decide whether or not to recreate them in your identity provider. If you choose to automate, then use Amazon Cognito to share IAM information with external OpenID Connect IDPs and AD/LDAP.

Pay particular attention to Data Lifecycle Management, such as object retention, object locking and archive/tiering. Run get-bucket-lifecycle-configuration on each bucket to obtain a human-readable JSON list of lifecycle rules. You can easily recreate AWS S3 settings using MinIO Console or MinIO Client (mc). Use commands such as get-object-legal-hold and get-object-lock-configuration to pinpoint objects that require special security and governance treatment.

While we’re on the subject of lifecycle, let’s talk about backup and disaster recovery for a moment. Do you want an additional MinIO cluster to replicate to, for backup and disaster recovery?

After objects are copied from AWS S3 to MinIO, it’s important to validate data integrity. The easiest way to do this is to use the MinIO Client to run mc diff against old buckets in S3 and new buckets on MinIO. This will compute the difference between the buckets and return a list of only those objects that are missing or different. This command takes the arguments of the source and target buckets.
For your convenience, you may want to create aliases for S3 and MinIO so you don’t have to keep typing out full addresses and credentials. For example:

    mc diff s3/bucket1 minio/bucket1

The great news is that all you have to do is point existing apps at the new MinIO endpoint. Configurations can be rewritten app by app over a period of time. Migrating data in object storage is less disruptive than migrating a filesystem: just change the URL to read from and write to the new cluster. Note that if you previously relied on AWS services to support your applications, those won’t be present in your data center, so you’ll have to replace them with their open-source equivalents and rewrite some code. For example, Athena can be replaced with Spark SQL, Apache Hive or Presto; Kinesis with Apache Kafka; and AWS Glue with Apache Airflow.

If your S3 migration is part of a larger effort to move an entire application on-prem, then chances are you used S3 event notifications to call downstream services when new data arrived. If this is the case, then do not fear - MinIO supports event notification as well. The most straightforward migration here would be to implement a custom webhook to receive the notification. However, if you need a destination that is more durable and resilient, then use messaging services such as Kafka or RabbitMQ. We also support sending events to databases such as PostgreSQL and MySQL.

Now that you’ve completed repatriating, it’s time to turn your attention to storage operation, monitoring and optimization. The good news is that no optimization is needed for MinIO - we’ve built optimization right into the software so you know you’re getting the best performance for your hardware. You’ll want to start monitoring your new MinIO cluster to assess resource utilization and performance on an ongoing basis. MinIO exposes metrics via a Prometheus endpoint that you can consume in your monitoring and alerting platform of choice. For more on monitoring, please see Multi-Cloud Monitoring and Alerting with Prometheus and Grafana and Metrics with MinIO using OpenTelemetry, Flask, and Prometheus.
With SUBNET, we have your back when it comes to Day 2 operations with MinIO. Subscribers gain access to built-in automated troubleshooting tools to keep their clusters running smoothly. They also get unlimited, direct-to-engineer support in real time via our support portal. We also help you future-proof your object storage investment with an annual architecture review.

Migrate and Save

It’s far from a secret that the days of writing blank checks to cloud providers are gone. Many businesses are currently evaluating their cloud spend to find potential savings. Now you have everything you need to start your migration from AWS S3 to MinIO, including concrete technical steps and a financial framework.

If you are excited about the prospect of repatriation cost savings, then please reach out to us at hello@min.io.