Commit log bootstrapper (placed before peers bootstrapper) is not reliable post topology change UNTIL a snapshot has taken place #900

Closed
richardartoul opened this issue Sep 12, 2018 · 3 comments


@richardartoul
Contributor

richardartoul commented Sep 12, 2018

Imagine an M3DB cluster running with the bootstrapper configuration (filesystem,commitlog,peers,uninitialized) and the following topology (RF=1 for simplicity):

Host 1: Shard 1
Host 2: Shard 2
Host 3: Shard 3
Host 4: Shard 4

Topology change removes host 4, making the new topology:

Host 1: Shard 1, Shard 4
Host 2: Shard 2
Host 3: Shard 3

For non-active blocks, Host 1 will have incrementally flushed all the data for Shard 4. However, the data for the active block of Shard 4 will still be in memory and not yet flushed to disk, and it will not be in any snapshot files or commit log files either. If the node goes down at this point, the commit log bootstrapper will report a successful bootstrap, but that data will have been lost.
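To make the failure mode concrete, here is a minimal Python model under simplifying assumptions. It is not M3DB's bootstrapper code: `ShardState`, `peer_bootstrap`, `crash_and_bootstrap`, and the integer block layout are all hypothetical. It assumes Host 1 received Shard 4 by peer streaming, so non-active blocks end up in flushed filesets while the active block lives only in memory, and that a restart rebuilds only from on-disk sources (filesets, snapshots, commit log). The restart "succeeds" even though the active block is gone.

```python
from dataclasses import dataclass, field


@dataclass
class ShardState:
    # Hypothetical, simplified model: block start (int) -> set of datapoints.
    in_memory: dict = field(default_factory=dict)   # lost on crash
    flushed: dict = field(default_factory=dict)     # immutable filesets on disk
    snapshots: dict = field(default_factory=dict)   # snapshot files on disk
    commit_log: list = field(default_factory=list)  # (block, point) entries on disk


def peer_bootstrap(state, peer_data, active_block):
    """Model streaming a shard from a peer: non-active blocks are flushed
    incrementally, while the active block is only loaded into memory."""
    for block, points in peer_data.items():
        if block == active_block:
            state.in_memory.setdefault(block, set()).update(points)
        else:
            state.flushed[block] = set(points)
    # In this model, peer-streamed data is never written to snapshots or the commit log.


def crash_and_bootstrap(state):
    """Model a restart: rebuild from filesets (filesystem bootstrapper), then
    snapshots and commit log (commitlog bootstrapper). Always 'succeeds'."""
    recovered = {block: set(points) for block, points in state.flushed.items()}
    for block, points in state.snapshots.items():
        recovered.setdefault(block, set()).update(points)
    for block, point in state.commit_log:
        recovered.setdefault(block, set()).add(point)
    return recovered


shard4 = ShardState()
peer_data = {0: {"a", "b"}, 1: {"c"}, 2: {"d", "e"}}  # block 2 is the active block
peer_bootstrap(shard4, peer_data, active_block=2)
recovered = crash_and_bootstrap(shard4)
print(sorted(recovered))  # [0, 1]: the active block's points were lost
```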

Potential solutions:

  1. Require a successful snapshot post-bootstrap during topology changes before marking shards as "Available" on the new host
  2. Incrementally flush the active block to the snapshots directory during peer bootstrapping
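Option 2 can be sketched with a similarly simplified, hypothetical model (again not M3DB code; `ShardState`, `peer_bootstrap`, `snapshot_active`, and `crash_and_bootstrap` are illustrative names). The only change is that, while streaming a shard from a peer, the active block is also written to the snapshots directory, so a restart can recover it from disk.

```python
from dataclasses import dataclass, field


@dataclass
class ShardState:
    # Hypothetical, simplified model: block start (int) -> set of datapoints.
    in_memory: dict = field(default_factory=dict)  # lost on crash
    flushed: dict = field(default_factory=dict)    # immutable filesets on disk
    snapshots: dict = field(default_factory=dict)  # snapshot files on disk


def peer_bootstrap(state, peer_data, active_block, snapshot_active=True):
    """Stream a shard from a peer. Non-active blocks are flushed as filesets;
    with snapshot_active=True the active block is also written to the
    snapshots directory (option 2) instead of living only in memory."""
    for block, points in peer_data.items():
        if block == active_block:
            state.in_memory.setdefault(block, set()).update(points)
            if snapshot_active:
                state.snapshots[block] = set(points)
        else:
            state.flushed[block] = set(points)


def crash_and_bootstrap(state):
    """Rebuild from on-disk sources only (filesets, then snapshots)."""
    recovered = {block: set(points) for block, points in state.flushed.items()}
    for block, points in state.snapshots.items():
        recovered.setdefault(block, set()).update(points)
    return recovered


peer_data = {0: {"a", "b"}, 1: {"c"}, 2: {"d", "e"}}  # block 2 is the active block

with_fix = ShardState()
peer_bootstrap(with_fix, peer_data, active_block=2)
without_fix = ShardState()
peer_bootstrap(without_fix, peer_data, active_block=2, snapshot_active=False)

print(sorted(crash_and_bootstrap(with_fix)))     # [0, 1, 2]
print(sorted(crash_and_bootstrap(without_fix)))  # [0, 1]
```

In this model, writing the snapshot as part of peer bootstrapping makes the shard's data durable before the bootstrap completes, rather than depending on a later background snapshot as option 1 would.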
richardartoul changed the title Commit log bootstrapper is not reliable post topology change UNTIL a snapshot has taken place Commit log bootstrapper (placed after peers bootstrapper) is not reliable post topology change UNTIL a snapshot has taken place Sep 12, 2018
richardartoul changed the title Commit log bootstrapper (placed after peers bootstrapper) is not reliable post topology change UNTIL a snapshot has taken place Commit log bootstrapper (placed before peers bootstrapper) is not reliable post topology change UNTIL a snapshot has taken place Sep 12, 2018
@richardartoul
Contributor Author

Discussed offline with @prateek; we settled on option 2 as probably the best solution.

@robskillington
Collaborator

robskillington commented Sep 14, 2018

Agreed, second approach sounds the most pragmatic.

@richardartoul
Contributor Author

Resolved by #903
