User Guide
This User Guide outlines the process for successfully performing an end-to-end migration. The solution offered in this repository caters to several specific scenarios:
- Migrating existing or historical data from one cluster to another.
- Transferring ongoing or live traffic between clusters.
- Conducting a comprehensive migration involving both existing and live data.
- Upgrading an existing cluster.
- Comparing an existing cluster with a prospective new one.
In this guide, we focus on scenarios 1 and 2, guiding you through the migration of historical data from a source cluster while concurrently handling live production traffic, which will be captured and redirected to a target cluster. It's crucial to note that migration strategies are not universally applicable. This guide provides a detailed methodology, predicated on certain assumptions detailed throughout, emphasizing the importance of robust engineering practices and a systematic approach to ensure a successful migration.
**Assumption:** The starting point is an Elasticsearch 7.x/OpenSearch 1.x cluster managed on AWS EC2.
Your source cluster in this solution operates on Elasticsearch or OpenSearch, hosted on EC2 instances or similar computing environments. A proxy is set up to interact with this source cluster, either positioned in front of or directly on the coordinating nodes of the cluster.
The Capture Proxy handles HTTP RESTful traffic and plays a dual role: it forwards traffic to the source cluster while also splitting and channeling that traffic to a stream-processing service for later playback.
Acting as a traffic simulation tool, the Traffic Replayer replays recorded request traffic to a target cluster, mirroring real-world workload patterns. It links original requests and their responses to those directed at the target cluster, facilitating comparative analysis.
The backfill container performs a one-time operation to transfer index metadata and historical data from the source to the target cluster. It compares indices between clusters to identify those requiring migration and employs the open-source Data Prepper.
Running on Amazon Elastic Container Service (ECS) with AWS Fargate, the Migration Console is a containerized platform. It orchestrates the deployment of the Migration Assistant for Amazon OpenSearch Service, alongside a variety of tools to streamline the migration process.
The solution architecture, adaptable for cloud deployment, unfolds as follows:
- Incoming traffic reaches the existing cluster, targeting each coordinator node.
- A Capture Proxy is placed before each coordinator node for traffic capture, storing data in an event stream.
- With the continuous capture setup, historical data backfill is initiated.
- Post-backfill, the captured traffic is replayed using the Traffic Replayer.
- The results from directing traffic to both the original and new clusters are then evaluated.
**Assumption:** This architecture is based on AWS cloud infrastructure, but most tools are designed to be cloud-independent. A local containerized version of this solution is also available.
Deploying to AWS (covered later in this guide) sets up the following workflow in your AWS account:
- Traffic is directed to the existing cluster, reaching each coordinator node.
- A Capture Proxy is added before each coordinator node in the cluster, allowing for traffic capture and storage in Amazon MSK.
- Once continuous traffic capture is in place, the user initiates a historical backfill.
- Following the backfill, the user replays the captured traffic using a Traffic Replayer.
- The user evaluates the outcomes from routing traffic to both the original and the new cluster.
- After confirming that the new cluster's functionality meets expectations, the user dismantles the solution's stacks and retires the old cluster's legacy infrastructure, keeping only the new cluster.
Follow the documentation to deploy the solution. Then, on a cluster with at least two coordinator nodes, you can attach a Capture Proxy to a node by following these steps. Note that this is only one method of installing the Capture Proxy, and the steps may vary depending on your environment.
These are the prerequisites for attaching the Capture Proxy:

- Ensure that your MSK client is accessible from the coordinator nodes in the cluster.
- Add the following IAM policy to the node/EC2 instance so that it can store the captured traffic in Kafka. From the AWS Console, go to the EC2 instance page, click the IAM Role, click Add permissions, choose Create inline policy, select the JSON view, then add the following policy (replace `<region>` and `<account-id>`):
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "kafka-cluster:Connect",
      "Resource": "arn:aws:kafka:<region>:<account-id>:cluster/migration-msk-cluster-<stage>/*",
      "Effect": "Allow"
    },
    {
      "Action": [
        "kafka-cluster:CreateTopic",
        "kafka-cluster:DescribeTopic",
        "kafka-cluster:WriteData"
      ],
      "Resource": "arn:aws:kafka:<region>:<account-id>:topic/migration-msk-cluster-<stage>/*",
      "Effect": "Allow"
    }
  ]
}
```
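If you prefer the CLI to the console, the same inline policy can be attached with the AWS CLI. This is a sketch: the role and policy names below are placeholders, and the ARN placeholders still need your region, account ID, and stage.

```shell
# Write the inline policy to a file (replace the <region>/<account-id>/<stage> placeholders).
cat > msk-capture-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow",
      "Action": "kafka-cluster:Connect",
      "Resource": "arn:aws:kafka:<region>:<account-id>:cluster/migration-msk-cluster-<stage>/*" },
    { "Effect": "Allow",
      "Action": ["kafka-cluster:CreateTopic", "kafka-cluster:DescribeTopic", "kafka-cluster:WriteData"],
      "Resource": "arn:aws:kafka:<region>:<account-id>:topic/migration-msk-cluster-<stage>/*" }
  ]
}
EOF

# Attach it to the node's instance role (role and policy names are hypothetical):
# aws iam put-role-policy --role-name <node-instance-role> \
#   --policy-name msk-capture-access \
#   --policy-document file://msk-capture-policy.json
```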
- Verify that a Java installation is accessible.
  - From the Linux command line of that EC2 instance, check that the JAVA_HOME environment variable is set properly (`echo $JAVA_HOME`). If it is not, the following command may set it correctly: `JAVA_HOME=$(dirname "$(dirname "$(type -p java)")")`
  - If that doesn't work, find the Java directory on your node and set it as `$JAVA_HOME`.
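The check and fallback above can be combined into one snippet. This is a sketch assuming a POSIX shell on the node; it also resolves symlinks (e.g. `/usr/bin/java` pointing at `/etc/alternatives/java`), which the bare one-liner does not.

```shell
# Check JAVA_HOME; if unset, derive it from the java binary on PATH.
if [ -z "$JAVA_HOME" ] && command -v java >/dev/null 2>&1; then
  # Resolve symlinks before stripping the trailing /bin component.
  java_bin=$(readlink -f "$(command -v java)")
  JAVA_HOME=$(dirname "$(dirname "$java_bin")")
  export JAVA_HOME
fi

if [ -n "$JAVA_HOME" ]; then
  echo "JAVA_HOME=$JAVA_HOME"
else
  echo "java not found; install a JDK/JRE or set JAVA_HOME manually" >&2
fi
```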
- Log in to one of the coordinator nodes for command-line access.
- Update the node's port setting by adding this line to its config file (elasticsearch.yml/opensearch.yml): `http.port: 19200`
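Adding the line can be scripted so it is not duplicated on repeated runs. The sketch below demonstrates on a scratch file; on a real node, point `CONFIG` at the actual elasticsearch.yml/opensearch.yml (typically under `/etc/`) and run with sudo.

```shell
# Demo on a scratch file; point CONFIG at your real elasticsearch.yml/opensearch.yml.
CONFIG=$(mktemp)
printf 'cluster.name: demo-cluster\n' > "$CONFIG"

# Append http.port only if it is not already configured (keeps the edit idempotent).
grep -q '^http\.port:' "$CONFIG" || echo 'http.port: 19200' >> "$CONFIG"

cat "$CONFIG"
```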
- Restart the Elasticsearch/OpenSearch process so that it binds to the newly configured port. For example, if systemctl is available on your Linux distribution, you can run the following (note: depending on your installation of Elasticsearch, this method may not work for you):
```
sudo systemctl restart elasticsearch.service
```
- Verify that the process is bound to the new port. Run `netstat -tapn` to see whether the new port is being listened on. If it is not, Elasticsearch/OpenSearch may not be running; in that case, start the process again. (Depending on your setup, restarting or starting the process may differ.)
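The port check can be scripted. Since netstat is deprecated on some distributions, this sketch tries `ss` first and falls back to `netstat`; the function name is our own.

```shell
# port_listening PORT: succeeds if a TCP listener is bound to PORT.
port_listening() {
  { ss -tln 2>/dev/null || netstat -tln 2>/dev/null; } | grep -Eq ":$1([^0-9]|$)"
}

if port_listening 19200; then
  echo "a process is listening on 19200"
else
  echo "nothing on 19200; start Elasticsearch/OpenSearch and re-check"
fi
```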
- Test the new port by sending any kind of request, e.g. `curl https://localhost:19200` (or `http://localhost:19200` if the node is not using TLS).
- Download the Capture Proxy:
  - Go to the OpenSearch Migrations latest releases page: https://github.com/opensearch-project/opensearch-migrations/releases/latest
  - Copy the link for the Capture Proxy tar file that matches your instance's architecture, then download it:
```
curl -L <capture-proxy-tar-file-link> --output CaptureProxyX64.tar.gz
```
  - Unpack the tarball and enter its bin directory:
```
tar -xvf CaptureProxyX64.tar.gz
cd CaptureProxyX64/bin
```
- Run the Capture Proxy:
```
nohup ./CaptureProxyX64 --kafkaConnection <msk-endpoint> --destinationUri http://localhost:19200 --listenPort 9200 --enableMSKAuth --insecureDestination &
```
Explanation of parameters in the command above:
- --kafkaConnection: your MSK client endpoint.
- --destinationUri: URI of the server that the Capture Proxy is capturing traffic for.
- --listenPort: Exposed port for clients to connect to this proxy. (The original port that the node was listening to)
- --enableMSKAuth: Enables SASL Kafka properties required for connecting to MSK with IAM auth.
- --insecureDestination: Do not check the destination server’s certificate.
- Test the port that the Capture Proxy is now listening on, e.g. `curl https://localhost:9200` (or `http://localhost:9200` if the node is not using TLS).
  - You should expect the same response from either port (9200 or 19200), except that traffic sent to the Capture Proxy's port is captured and sent to your MSK client, in addition to being forwarded to the new Elasticsearch/OpenSearch port.
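A quick way to sanity-check the pair is to fetch both ports and compare. This is a sketch assuming both services run on localhost without TLS; adjust the scheme if your node uses HTTPS.

```shell
# Fetch the root endpoint directly from the node and through the Capture Proxy.
direct=$(curl -s --max-time 5 http://localhost:19200 || true)
proxied=$(curl -s --max-time 5 http://localhost:9200 || true)

if [ -z "$direct" ]; then
  echo "no response from localhost:19200; is Elasticsearch/OpenSearch running?"
elif [ "$direct" = "$proxied" ]; then
  echo "direct and proxied responses match"
else
  echo "responses differ; check the proxy's --destinationUri setting"
fi
```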
- Verify that requests are sent to Kafka by confirming that a new topic has been created:
  - Log in to the Migration Console container.
  - Go to the Kafka tools directory: `cd kafka-tools/kafka/bin`
  - Run the following command to list the Kafka topics and confirm that a new topic was created:
```
./kafka-topics.sh --bootstrap-server "$MIGRATION_KAFKA_BROKER_ENDPOINTS" --list --command-config ../../aws/msk-iam-auth.properties
```
Encountering a compatibility issue or missing feature?
- Search existing issues to see if it’s already reported. If it is, feel free to upvote and comment.
- Can’t find it? Create a new issue to let us know.