Channel: IBM Data Science in Practice - Medium

Data Resilience Unleashed: A Comprehensive Guide to Multi-Site Active-Active Bucket Replication with MinIO — watsonx.data

Photo by Joshua Sortino on Unsplash

Data replication is a core part of managing data in modern systems. It means keeping copies of data in more than one location so that the data stays consistent, the system tolerates failures, and users can always reach the data they need.

Common approaches to data replication include:

  • Full Replication: Copies the entire data set to every location so that all copies are identical. It maximizes consistency but can consume a lot of storage and bandwidth.
  • Partial Replication: Copies only selected parts of the data, saving resources while keeping the important information available.
  • Snapshot Replication: Captures the state of the data at specific points in time, which makes it possible to look back and retrieve data as of those points.
  • Transactional Replication: Propagates changes in near real time. This matters when updates must reach the copies as soon as they happen.
  • Merge Replication: Combines data written at different locations into one complete data set. It suits systems where data arrives from several places.

The advantages of data replication are:

  • High Availability: Copies of the data exist in several places, so the data is still reachable if one location fails.
  • Load Balancing: Requests can be spread across the copies so that no single part of the system is overloaded.
  • Disaster Recovery: After a major outage, copies held in other locations make it possible to restore service.
  • Fault Tolerance: Data stays safe even when some parts of the system have issues.

However, replication also brings challenges, such as keeping all copies consistent and resolving conflicts when the same data changes in more than one place. The right replication approach depends on what your system needs and what constraints it has.

This guide implements data replication for MinIO buckets using the watsonx.data developer edition.

Bucket Replication

MinIO supports server-side and client-side replication of objects between source and destination buckets.

  • Server-Side Bucket Replication

Configure per-bucket rules for automatically synchronizing objects between MinIO deployments. The deployment where you configure the bucket replication rule acts as the “source” while the configured remote deployment acts as the “target”. MinIO applies rules as part of object write operations (e.g. PUT) and automatically synchronizes new objects and object mutations, such as new object versions or changes to object metadata.

MinIO server-side bucket replication requires the remote replication target to be a MinIO cluster running the same release as the source.

  • Client-side Bucket Replication

Use the mc mirror command to synchronize objects between buckets within the same S3-compatible cluster or between two independent S3-compatible clusters. Client-side replication using mc mirror supports MinIO-to-S3 and similar replication configurations.
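As a minimal sketch of client-side replication, the wrapper below runs a one-shot mc mirror between two buckets. The function name mirror_bucket and the aliases in the example are hypothetical; both aliases are assumed to be configured already with mc alias set.

```shell
# One-shot client-side sync from one bucket to another using mc mirror.
# Both arguments are ALIAS/BUCKET paths (hypothetical names in the example).
mirror_bucket() {
  local src="$1" dst="$2"
  if [ -z "$src" ] || [ -z "$dst" ]; then
    echo "usage: mirror_bucket SRC_ALIAS/BUCKET DST_ALIAS/BUCKET" >&2
    return 2
  fi
  # mc mirror copies objects that are missing or different at the destination
  mc mirror "$src" "$dst"
}

# Example (hypothetical aliases and bucket names):
# mirror_bucket siteA/analytics siteB/analytics
```

Unlike server-side replication, this runs only when invoked, so it has to be scheduled or rerun to pick up later changes.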

Bucket Replication Setup

Bucket replication uses rules to synchronize the contents of a bucket on one MinIO deployment to a bucket on a remote MinIO deployment.

Replication can be done in any of the following ways:

  • Active-Passive: Eligible objects replicate from the source bucket to the remote bucket. Any changes on the remote bucket do not replicate back.
  • Active-Active: Changes to eligible objects of either bucket replicate to the other bucket in a two-way direction.
  • Multi-Site Active-Active: Changes to eligible objects on any bucket set up for bucket replication replicate to all of the other buckets.

Implementation of Multi-Site Active-Active bucket replication

Multi-site active-active replication is a data replication strategy that involves maintaining identical copies of data across multiple geographically distributed sites. In an active-active replication setup, all sites are actively serving read and write requests simultaneously. This means that changes made at one site are quickly propagated to other sites, ensuring that all sites have up-to-date and consistent data.

The following information is required to follow the steps below:

  • Source watsonx.data URL and port
  • Source MinIO console URL and port
  • Source MinIO endpoint URL and port
  • Target watsonx.data URL and port
  • Target MinIO console URL and port
  • Target MinIO endpoint URL and port

The steps below were carried out on the watsonx.data developer edition.

Section 1. Steps to be performed on Source watsonx.data / MinIO instance

In this section, the target MinIO bucket is configured from the Source watsonx.data instance, together with a MinIO object storage account that has the relevant permissions.

1.1 Log on to the Source watsonx.data instance through the CLI.

ssh -p <port_number> <username>@<region>.<domain>
  • ssh: command used to initiate the secure shell connection.
  • -p <port_number>: This option is used to specify the port number on which the SSH server is running. Replace <port_number> with the actual port number you want to use.
  • <username>: This is the username you use to log in to the remote server. Replace <username> with your actual username.
  • @<region>.<domain>: This part specifies the hostname or IP address of the remote server. <region> could refer to a geographical region, and <domain> could be a domain name or an IP address. Replace <region> and <domain> with the specific values for the server you are connecting to.

1.2 Navigate to MinIO binaries directory

cd <path>/minio-binaries/

1.3 Create an alias for target MinIO object storage

mc alias set <target_alias_name> http://<region>.<domain>:<Port> <accesskey> <secretkey>
  • mc alias set: Configures an alias for a MinIO server, making it easier to reference and interact with that server in subsequent commands.
  • <target_alias_name>: The alias or nickname you want to give to the target MinIO server configuration. You will use this alias to refer to the server in subsequent mc commands.
  • <region>.<domain>:<Port>: The target endpoint URL of the MinIO object storage. It specifies the location where the MinIO object storage can be accessed.
  • <accesskey>: The access key associated with the target MinIO object storage. This key is used for authentication.
  • <secretkey>: The secret key associated with the target MinIO object storage. This key, along with the access key, is used for authentication.

1.4 Create target bucket in target MinIO object storage

mc mb <target_alias_name>/<target_bucket_name>
  • mc mb: Creates a new bucket on a MinIO server.
  • <target_alias_name>: Alias of the target MinIO object storage account, as given in step 1.3.
  • <target_bucket_name>: Name of the target bucket. A new bucket with this name is created in the target MinIO object storage account.

1.5 Verify that the target alias has been added on the Source watsonx.data instance

mc alias list
  • mc alias list: Displays the aliases configured for the MinIO Client (mc).
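Beyond listing aliases, it can help to confirm that the new alias actually reaches the server. The sketch below is one way to do that; check_alias is a hypothetical helper, and it assumes that mc alias list ALIAS exits non-zero for an unknown alias and that mc ls ALIAS fails when the endpoint or credentials are wrong.

```shell
# Check that an alias exists and that the server behind it answers.
# Returns 1 if the alias is not configured, 2 if it is configured but unreachable.
check_alias() {
  local alias_name="$1"
  mc alias list "$alias_name" >/dev/null 2>&1 || {
    echo "alias '$alias_name' is not configured" >&2
    return 1
  }
  # Listing buckets exercises both the endpoint URL and the credentials
  mc ls "$alias_name" >/dev/null 2>&1 || {
    echo "alias '$alias_name' is configured but not reachable" >&2
    return 2
  }
  echo "alias '$alias_name' OK"
}

# Example: check_alias <target_alias_name>
```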

1.6 Enable versioning for target bucket

mc version enable <target_alias_name>/<target_bucket_name>
  • mc version enable: Enables versioning for a bucket on a MinIO server.
  • <target_alias_name>: Alias of the target MinIO object storage account, as specified in step 1.3.
  • <target_bucket_name>: Name of the target bucket created in step 1.4.
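MinIO bucket replication requires versioning on both buckets, so it is worth confirming the setting before adding rules. The sketch below uses mc version info; require_versioning is a hypothetical helper, and matching on the word "enabled" is an assumption about the output text that may vary between mc releases.

```shell
# Confirm that versioning is enabled on ALIAS/BUCKET before configuring replication.
require_versioning() {
  local target="$1"   # ALIAS/BUCKET
  # Assumption: the status line printed by mc version info contains "enabled"
  if mc version info "$target" 2>/dev/null | grep -qi "enabled"; then
    echo "versioning enabled on $target"
  else
    echo "versioning NOT enabled on $target; run: mc version enable $target" >&2
    return 1
  fi
}

# Example: require_versioning <target_alias_name>/<target_bucket_name>
```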

1.7 Execute the following code to create a MinIO-managed user with the necessary policies.

wget -O - https://min.io/docs/minio/linux/examples/ReplicationAdminPolicy.json | \
mc admin policy create <target_alias_name> ReplicationAdminPolicy /dev/stdin
mc admin user add <target_alias_name> ReplicationAdmin LongRandomSecretKey
mc admin policy attach <target_alias_name> ReplicationAdminPolicy --user=ReplicationAdmin
wget -O - https://min.io/docs/minio/linux/examples/ReplicationRemoteUserPolicy.json | \
mc admin policy create <target_alias_name> ReplicationRemoteUserPolicy /dev/stdin
mc admin user add <target_alias_name> ReplicationRemoteUser LongRandomSecretKey
mc admin policy attach <target_alias_name> ReplicationRemoteUserPolicy --user=ReplicationRemoteUser
  • <target_alias_name>: Provide the name of target alias created in step 1.3
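LongRandomSecretKey in the commands above is a placeholder and should be replaced with a distinct random value for each user. One possible way to generate such a value is sketched below; gen_secret is a hypothetical helper, and the 32-character hexadecimal format is an arbitrary choice rather than a MinIO requirement.

```shell
# Generate a random hexadecimal secret of the requested length (default 32, max 64).
gen_secret() {
  local len="${1:-32}"
  # 32 random bytes -> 64 hex characters, trimmed to the requested length
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n' | cut -c1-"$len"
}

# Example (store the value securely; it is needed again for --remote-bucket):
# REPL_REMOTE_SECRET="$(gen_secret)"
# mc admin user add <target_alias_name> ReplicationRemoteUser "$REPL_REMOTE_SECRET"
```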

1.8 Verify that the policies have been added on the target MinIO deployment

mc admin policy list <target_alias_name>

The target MinIO bucket is now configured in the Source watsonx.data and MinIO object storage account. Follow the same steps, as described in Section 2, to configure the source MinIO bucket in the target watsonx.data and MinIO object storage account.

Section 2. Steps to be performed on Target watsonx.data / MinIO instance

2.1 Log on to the Target watsonx.data instance through the CLI.

ssh -p <port_number> <username>@<region>.<domain>

For more information on this command, refer to step 1.1.

2.2 Navigate to MinIO binaries directory

cd <path>/minio-binaries/

2.3 Create an alias for source MinIO bucket

mc alias set <source_alias_name> http://<region>.<domain>:<Port> <accesskey> <secretkey>
  • <source_alias_name>: The alias or nickname you want to give to the source MinIO server configuration. You will use this alias to refer to the server in subsequent mc commands.
  • <region>.<domain>:<Port>: The source endpoint URL of the MinIO server. It specifies the location where the MinIO server can be accessed.
  • <accesskey>: The access key associated with the source MinIO server. This key is used for authentication.
  • <secretkey>: The secret key associated with the source MinIO server. This key, along with the access key, is used for authentication.

2.4 Create source bucket in source MinIO account

mc mb <source_alias_name>/<source_bucket_name>
  • <source_alias_name>: Provide the same alias name as given in step 2.3.
  • <source_bucket_name>: Name of the source bucket. A new bucket with this name is created in the source MinIO object storage account.

2.5 Verify that the source alias has been added on the Target watsonx.data instance

mc alias list

2.6 Enable versioning for source bucket

mc version enable <source_alias_name>/<source_bucket_name>

2.7 Execute the following code to create a MinIO-managed user with the necessary policy for source MinIO object storage.

wget -O - https://min.io/docs/minio/linux/examples/ReplicationAdminPolicy.json | \
mc admin policy create <source_alias_name> ReplicationAdminPolicy /dev/stdin
mc admin user add <source_alias_name> ReplicationAdmin LongRandomSecretKey
mc admin policy attach <source_alias_name> ReplicationAdminPolicy --user=ReplicationAdmin
wget -O - https://min.io/docs/minio/linux/examples/ReplicationRemoteUserPolicy.json | \
mc admin policy create <source_alias_name> ReplicationRemoteUserPolicy /dev/stdin
mc admin user add <source_alias_name> ReplicationRemoteUser LongRandomSecretKey
mc admin policy attach <source_alias_name> ReplicationRemoteUserPolicy --user=ReplicationRemoteUser
  • <source_alias_name>: Provide the name of the source alias created in step 2.3

2.8 Verify that the policies have been added on the source MinIO deployment

mc admin policy list <source_alias_name>

The source MinIO bucket is now configured in the target watsonx.data and MinIO object storage account.

The next steps configure data replication by providing the source and target information along with the relevant access.

2.9 Add access key and secret key to MinIO host file for the target MinIO object storage

mc config host add <alias_of_watsonxdata> http://<region>.<domain>:<port> <accesskey> <secretkey>
  • <alias_of_watsonxdata>: Alias name of the MinIO object account. To find it, search for the “Target MinIO endpoint URL and port” in the output of the following command:
mc alias list

The result is shown in the format below. In the actual output you will see the access key and secret key; they are hidden here for security.

<alias_of_watsonxdata>
URL : http://<region>.<domain>:<port>
AccessKey : ******************
SecretKey : ******************
API : s3v4
Path : auto
  • <region>.<domain>:<port>: The endpoint URL of the MinIO server shown in the result above. It specifies the location where the MinIO server can be accessed.
  • <accesskey>: The access key associated with the MinIO server. This key is used for authentication.
  • <secretkey>: The secret key associated with the MinIO server. This key, along with the access key, is used for authentication.

Recent mc releases replace mc config host add with mc alias set, which takes the same arguments.

For example, to add a MinIO server with the alias “myminio”, located at “http://example.com:9000”, with access key “myaccesskey” and secret key “mysecretkey”, you would use the following command:

mc config host add myminio http://example.com:9000 myaccesskey mysecretkey

2.10 Configure replication rule from target bucket to source bucket

mc replicate add <alias_of_watsonxdata>/<target-bucket-name> \
--remote-bucket https://<accesskey>:<secretkey>@<region>.<domain>:<port>/<source-bucket-name> \
--replicate "delete,delete-marker,existing-objects" \
--priority 1
  • mc replicate add: Adds a new replication rule to the bucket.
  • <alias_of_watsonxdata>: Retrieved in step 2.9.
  • <target-bucket-name>: Specified in step 1.4.
  • --remote-bucket: Specifies the remote bucket to which you want to replicate data. Use the scheme (http or https) that the remote MinIO endpoint actually serves.
  • <region>.<domain>:<port>: The endpoint URL of the remote MinIO server, which in this step is the source MinIO server. It is listed under <source_alias_name> in the mc alias list output.
  • <accesskey>: The access key associated with the remote MinIO server. This key is used for authentication.
  • <secretkey>: The secret key associated with the remote MinIO server. This key, along with the access key, is used for authentication.
  • <source-bucket-name>: Specified in step 2.4.
  • --replicate: Indicates which types of operations to replicate. The argument "delete,delete-marker,existing-objects" means that replication includes deletions (delete), delete markers (delete-marker), and existing objects (existing-objects).
  • --priority 1: Specifies the priority of the replication rule. Lower values indicate higher priority, so the value "1" gives this rule a high priority.
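Because the --remote-bucket value packs several fields into one URL, it is easy to mistype. The sketch below assembles the URL from its parts and prints the resulting command for review without running it. Both function names, and the host, key, and bucket values in the example, are hypothetical.

```shell
# Build the --remote-bucket URL from its parts.
# Arguments: scheme access_key secret_key host port bucket
remote_bucket_url() {
  printf '%s://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Print (not execute) the mc replicate add command for review.
# Arguments: local ALIAS/BUCKET, remote URL, optional priority (default 1)
print_replicate_cmd() {
  local local_path="$1" remote_url="$2" priority="${3:-1}"
  printf 'mc replicate add %s --remote-bucket %s --replicate "delete,delete-marker,existing-objects" --priority %s\n' \
    "$local_path" "$remote_url" "$priority"
}

# Example with hypothetical values:
# url="$(remote_bucket_url https RemoteKey RemoteSecret minio.example.net 9000 source-bucket)"
# print_replicate_cmd targetalias/target-bucket "$url"
```

Printing the command first also keeps the secret out of an accidental half-typed invocation; the printed line can be run once it looks right.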

Section 3. Steps to be performed on Source watsonx.data / MinIO instance

Now that the target and source buckets exist and a replication rule replicates data from the target bucket to the source bucket, a matching rule must be set up on the Source watsonx.data instance to replicate data from the source bucket to the target bucket.

3.1 Log on to the Source watsonx.data instance

ssh -p <port_number> <username>@<region>.<domain>

3.2 Navigate to MinIO binaries directory

cd <path>/minio-binaries/

3.3.1 Add access key and secret key to MinIO host file for the source MinIO object storage

mc config host add <alias_of_watsonxdata> http://<region>.<domain>:<port> <accesskey> <secretkey>
  • <alias_of_watsonxdata>: Alias name of the MinIO object account. To find it, search for the “Source MinIO endpoint URL and port” in the output of the following command:
mc alias list

The result is shown in the format below. In the actual output you will see the access key and secret key; they are hidden here for security.

<alias_of_watsonxdata>
URL : http://<region>.<domain>:<port>
AccessKey : ******************
SecretKey : ******************
API : s3v4
Path : auto
  • <region>.<domain>:<port>: The endpoint URL of the MinIO server shown in the result above. It specifies the location where the MinIO server can be accessed.
  • <accesskey>: The access key associated with the MinIO server. This key is used for authentication.
  • <secretkey>: The secret key associated with the MinIO server. This key, along with the access key, is used for authentication.

3.3.2 Configure data replication from source bucket (source watsonx.data instance) to target bucket (target watsonx.data instance)

mc replicate add <alias_of_watsonxdata>/<source-bucket-name> \
--remote-bucket https://<accesskey>:<secretkey>@<region>.<domain>:<port>/<target-bucket-name> \
--replicate "delete,delete-marker,existing-objects" \
--priority 1

The command is similar to the one executed in step 2.10, except for the following values:

  • <alias_of_watsonxdata>: Retrieved in step 3.3.1
  • <source-bucket-name>: Specified in step 2.4
  • <target-bucket-name>: Specified in step 1.4

Refer to step 2.10 for more information.

Section 4. Verify data replication configuration

4.1 Log on to the MinIO console using the “Source MinIO console URL and port” for the source bucket (source watsonx.data instance)

4.2 Navigate to <source_bucket_name> and upload a sample file

  • <source_bucket_name>: Bucket name specified in step 2.4

4.3 Log on to the target MinIO object storage using the “Target MinIO console URL and port” and navigate to <target_bucket_name>

  • <target_bucket_name>: Bucket name specified in step 1.4

4.4 Verify the file uploaded in <source_bucket_name> of source watsonx.data instance is replicated to <target_bucket_name> of target watsonx.data instance based on the replication setup completed in section 1,2 and 3.
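The same check can be done from the command line instead of the console. The sketch below uploads a file to the source bucket and polls the target bucket until the object appears. verify_replication is a hypothetical helper; it assumes both aliases are configured on the machine where it runs, and it polls because replication is asynchronous.

```shell
# Upload a file to the source bucket and wait for it to appear in the target bucket.
# Arguments: SRC_ALIAS/BUCKET  DST_ALIAS/BUCKET  local_file  [tries, default 10]
verify_replication() {
  local src="$1" dst="$2" file="$3" tries="${4:-10}"
  local name
  name="$(basename "$file")"
  mc cp "$file" "$src/$name" >/dev/null || return 1
  while [ "$tries" -gt 0 ]; do
    # mc stat succeeds only once the object exists on the target side
    if mc stat "$dst/$name" >/dev/null 2>&1; then
      echo "replicated: $dst/$name"
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "not replicated within the wait window: $dst/$name" >&2
  return 1
}

# Example:
# verify_replication <source_alias_name>/<source_bucket_name> <target_alias_name>/<target_bucket_name> ./sample.txt
```

Swapping the first two arguments tests the opposite direction, which is what distinguishes an active-active setup from an active-passive one.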

4.5 Verify the “Replication” status is enabled for both <source_bucket_name> and <target_bucket_name>. Below is the example of <source_bucket_name> where replication status is “Enabled”.
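The configured rules can also be inspected from the CLI with mc replicate ls. The sketch below lists the rules for each bucket passed to it; show_replication is a hypothetical helper, and the exact output format depends on the mc release.

```shell
# List the replication rules configured on each ALIAS/BUCKET argument.
show_replication() {
  local path
  for path in "$@"; do
    echo "== $path =="
    mc replicate ls "$path" || echo "could not list replication rules for $path" >&2
  done
}

# Example:
# show_replication <source_alias_name>/<source_bucket_name> <target_alias_name>/<target_bucket_name>
```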

Only two watsonx.data instances have been considered for demonstration purposes. You can set up data replication rules for multiple watsonx.data instances by following a similar process.
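With more than two sites, every bucket needs a rule pointing at every other bucket, so N sites need N × (N − 1) rules in total. The sketch below enumerates those directed pairs as a checklist; list_replication_pairs and the site names in the example are hypothetical.

```shell
# Print every directed source -> destination pair among the given ALIAS/BUCKET paths.
# Each printed pair corresponds to one mc replicate add rule on the left-hand bucket.
list_replication_pairs() {
  local src dst
  for src in "$@"; do
    for dst in "$@"; do
      if [ "$src" != "$dst" ]; then
        echo "$src -> $dst"
      fi
    done
  done
}

# Example with three hypothetical sites (prints 6 pairs):
# list_replication_pairs siteA/data siteB/data siteC/data
```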

Conclusion

Implementation of Multi-Site Active-Active bucket replication in MinIO involves configuring source and target instances, creating aliases, and setting up access keys. The blog provides a detailed guide on establishing replication rules, specifying data flow between buckets, and ensuring priority for replication sequencing. Emphasis is placed on verification at each step to guarantee a successful setup. This comprehensive guide equips users to enhance data resilience, achieve geographical redundancy, and ensure seamless access to consistent data across distributed MinIO deployments.


Data Resilience Unleashed: A Comprehensive Guide to Multi-Site Active-Active Bucket Replication… was originally published in IBM Data Science in Practice on Medium, where people are continuing the conversation by highlighting and responding to this story.

