This repository has been archived by the owner on Jun 23, 2022. It is now read-only.

feat(disk_balance): add do_disk_migrate_replica to support migrate origin data #664

Merged on Nov 25, 2020 (81 commits)

@foreverneverer foreverneverer commented Nov 11, 2020

  1. Support the disk migration RPC request (RPC_REPLICA_DISK_MIGRATE) and check whether the request is valid: feat(disk_balance): support and validate disk migration rpc #660.
  2. Add do_disk_migrate_replica to support migrating the origin data (this PR).
  3. Close the origin replica and update the replica dir: feat(disk_balance): close origin replica and update replica dir #668.

replica migration status

   client(shell)----------->replicaServer--------->metaServer
         |                      |                     |
         |------- start ------->|----> IDLE           |
         |                      |      | (validate)   |
         |                      |     MOVING          |
         |                      |      | (copy data)  |
         |                      |     MOVED           |     
         |                      |      | (rename dir) |
         |                      |     CLOSED          |
         |                      |      |              |
         |          ------------|<- Learning <--------|
         |         |            |                     |
         |  LearningSuccess     |                     |
         |         |            |                     |
         |          ----------->|                     |

#660 completed the migration request argument check. This PR adds do_disk_migrate_replica to migrate the origin data, which moves the migration status from MOVING to MOVED.
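The status flow in the diagram can be read as a small state machine. The following is a minimal sketch of that reading, not the code in this PR: the enum here only mirrors the status names from the diagram, `is_valid_transition` is a hypothetical helper, and allowing a fallback to IDLE on failure is an assumption.

```cpp
// Hypothetical mirror of the statuses in the diagram; the project's real
// definition may differ.
enum class disk_migration_status { IDLE, MOVING, MOVED, CLOSED };

// Returns true only for the forward steps shown in the diagram, plus a
// reset to IDLE, which is assumed here to be what happens after a failure.
inline bool is_valid_transition(disk_migration_status from, disk_migration_status to)
{
    if (to == disk_migration_status::IDLE) {
        return true; // assumed reset after a failed step
    }
    switch (from) {
    case disk_migration_status::IDLE:
        return to == disk_migration_status::MOVING; // request validated
    case disk_migration_status::MOVING:
        return to == disk_migration_status::MOVED; // data copied
    case disk_migration_status::MOVED:
        return to == disk_migration_status::CLOSED; // dir renamed
    case disk_migration_status::CLOSED:
        return false; // terminal in this sketch
    }
    return false;
}
```

Under this sketch, skipping a step (for example IDLE directly to MOVED) is rejected, which matches the order validate, copy data, rename dir.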

@foreverneverer foreverneverer marked this pull request as ready for review November 16, 2020 09:17
src/replica/replica_disk_migrator.cpp
return;
}

if (init_target_dir(req) && migrate_replica_checkpoint(req) && migrate_replica_app_info(req)) {
Contributor

This entire execution takes too long; it will probably block other tasks on the REPLICATION_LONG thread pool.

Contributor Author

Right, but the process inherently takes a long time. If it is not allowed to block REPLICATION_LONG, should it be put on DEFAULT?
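The blocking concern in this thread can be illustrated with a toy model. This is only a sketch under simplifying assumptions (one worker draining a FIFO queue, made-up task costs); `finish_times` is a hypothetical helper and has nothing to do with the project's real scheduler.

```cpp
#include <vector>

// Toy model: a single worker drains a FIFO queue. Given the cost of each
// queued task, return the time at which each task finishes.
inline std::vector<int> finish_times(const std::vector<int> &costs)
{
    std::vector<int> done;
    done.reserve(costs.size());
    int clock = 0;
    for (int cost : costs) {
        clock += cost; // the worker is busy until this task completes
        done.push_back(clock);
    }
    return done;
}
```

With costs `{100, 1, 1}` (a long migration queued ahead of two short tasks) the short tasks finish at 101 and 102. With the long task moved to another queue, `{1, 1}` finishes at 1 and 2.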

if (utils::filesystem::directory_exists(replica_dir)) {
derror_replica("migration target replica dir({}) has existed", replica_dir);
reset_status();
return false;
Contributor

So how does the user find out that a failure happened?

Contributor Author

@foreverneverer foreverneverer Nov 18, 2020

Copying the data is designed to be asynchronous and does not return a status in the response, so for now the result has to be checked in the log.

Contributor Author

Or I can add a query_status RPC in a later PR?

Contributor

I agree with @neverchanje: there should be a way to let the user notice that a failure happened. A query_status RPC may be helpful.

Contributor Author

@foreverneverer foreverneverer Nov 23, 2020

I can add it in a later PR.

Contributor

So what is your plan? You certainly cannot actively notify the user. One possible way is to save the error, and every time the user queries the status, if a failure happened, you respond with that error.

Contributor Author

@foreverneverer foreverneverer Nov 24, 2020

Right. For now, one way is to run disk-migrate -g gpid -n node (without any other arguments). If a task is already running, it shows:

 Internal server error [DiskMigrate to session 127.0.0.1:45801(replica)] failed: ErrorCode({Errno:ERR_BUSY}):Existed migrate task(replication::disk_migration_status::MOVED) is running]

This lets us see the task's progress without query_status. I think we can use this first; if a query is needed, I can add it in a later PR.
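The two ideas in this thread, rejecting a new request while a task is running and saving the error so a later status query can report it, could be combined roughly as follows. This is a sketch only: `migration_tracker` and its methods are hypothetical names, and resetting to IDLE on failure is an assumption rather than the behaviour confirmed by this PR.

```cpp
#include <mutex>
#include <string>

enum class disk_migration_status { IDLE, MOVING, MOVED, CLOSED };

// Hypothetical tracker; the names are illustrative, not the project's API.
class migration_tracker
{
public:
    // A new request is rejected while an earlier task is still in progress,
    // similar in spirit to the ERR_BUSY response quoted above.
    bool try_start()
    {
        std::lock_guard<std::mutex> guard(_lock);
        if (_status != disk_migration_status::IDLE) {
            return false;
        }
        _status = disk_migration_status::MOVING;
        _last_error.clear();
        return true;
    }

    // Called by the background task when a step fails: remember the reason,
    // then go back to IDLE (assumed) so a later request can retry.
    void fail(const std::string &reason)
    {
        std::lock_guard<std::mutex> guard(_lock);
        _last_error = reason;
        _status = disk_migration_status::IDLE;
    }

    // What a status query could return; an empty string means no failure
    // has been recorded since the last start.
    std::string last_error() const
    {
        std::lock_guard<std::mutex> guard(_lock);
        return _last_error;
    }

    disk_migration_status status() const
    {
        std::lock_guard<std::mutex> guard(_lock);
        return _status;
    }

private:
    mutable std::mutex _lock; // the copy runs asynchronously, so guard state
    disk_migration_status _status = disk_migration_status::IDLE;
    std::string _last_error;
};
```

The mutex is there because the copy is described as asynchronous, so the status may be read by a request handler while the background task updates it.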

src/replica/replica_disk_migrator.cpp

// _target_replica_data_dir = /root/gpid.app_type.disk.balance.tmp/data/rdb, it will update to
// /root/target/gpid.app_type/data/rdb in replica_disk_migrator::update_replica_dir finally
_target_data_dir = utils::filesystem::path_combine(_target_replica_dir, kDataDirFolder);
Contributor

Are _target_replica_dir = fmt::format("{}{}", replica_dir, kReplicaDirTempSuffix); and _target_data_dir = utils::filesystem::path_combine(_target_replica_dir, kDataDirFolder); the same?

Contributor Author

@foreverneverer foreverneverer Nov 19, 2020

No: filesystem::path_combine("/abc", "file") => /abc/file, but fmt::format("{}{}", "/abc", "file") => /abcfile
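The difference between the two calls can be sketched with standard C++ only. Here `path_combine` is a simplified stand-in for utils::filesystem::path_combine, `concat` stands in for fmt::format("{}{}", a, b), and the two constant values are assumptions inferred from the path in the code comment above, not taken from the source.

```cpp
#include <string>

// Assumed values, inferred from the path in the code comment in this thread.
const std::string kReplicaDirTempSuffix = ".disk.balance.tmp";
const std::string kDataDirFolder = "data/rdb";

// Simplified stand-in for utils::filesystem::path_combine: joins two parts
// with exactly one '/' between them.
inline std::string path_combine(const std::string &dir, const std::string &name)
{
    if (dir.empty()) {
        return name;
    }
    if (dir.back() == '/') {
        return dir + name;
    }
    return dir + "/" + name;
}

// Plain concatenation, the same result as fmt::format("{}{}", a, b).
inline std::string concat(const std::string &a, const std::string &b)
{
    return a + b;
}
```

So the temp suffix is appended to the directory name with plain concatenation, while the data folder is added as a child path with a separator.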

src/replica/replica_stub.cpp
src/replica/replica_disk_migrator.cpp
void replica_disk_migrator::migrate_replica(const replica_disk_migrate_request &req) {}
void replica_disk_migrator::migrate_replica(const replica_disk_migrate_request &req)
{
if (status() != disk_migration_status::MOVING) {
Contributor

When will this condition happen? Could you please give me an example?

Contributor Author

This condition cannot happen at the moment; the check is only there to make sure the process runs under MOVING.

Contributor

OK, I suggest you consider this condition in a future pull request.


4 participants