Backport #8009 into 3.0? #8253
Comments
Unlikely to happen. Since the fix doesn't sync down the free list, booting into the patched etcd 3.0 then booting into an older version of 3.0 will leak free pages at best / corrupt the db at worst.
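To make the leak/corruption concern concrete, here is a toy model (not bbolt code; the function name and page numbers are made up for illustration). It assumes an older binary trusts a free list that was last written to disk before the patched binary stopped syncing it, and compares that stale list with the pages that are actually free.

```go
package main

import (
	"fmt"
	"sort"
)

// staleFreelistEffects is a hypothetical helper: it compares a stale on-disk
// free list with the set of pages that are really free, and reports what a
// reader that trusts the stale list would get wrong.
func staleFreelistEffects(staleFree, actualFree []int) (leaked, clobbered []int) {
	stale := make(map[int]bool, len(staleFree))
	for _, p := range staleFree {
		stale[p] = true
	}
	actual := make(map[int]bool, len(actualFree))
	for _, p := range actualFree {
		actual[p] = true
	}
	for p := range actual {
		if !stale[p] {
			// Really free, but missing from the stale list: never reused (leak).
			leaked = append(leaked, p)
		}
	}
	for p := range stale {
		if !actual[p] {
			// Listed as free, but holds live data: reuse would overwrite it.
			clobbered = append(clobbered, p)
		}
	}
	sort.Ints(leaked)
	sort.Ints(clobbered)
	return leaked, clobbered
}

func main() {
	// Made-up page ids: the stale list predates the patched binary's writes.
	staleFree := []int{4, 5, 6}
	actualFree := []int{6, 7, 8}
	leaked, clobbered := staleFreelistEffects(staleFree, actualFree)
	fmt.Println("leaked pages:", leaked)
	fmt.Println("pages at risk of being overwritten:", clobbered)
}
```

In this sketch the "leak" case corresponds to pages freed after the last sync, and the "corrupt" case to pages reallocated after it.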
What does this imply for our ability to roll back from a fixed 3.1.x to a broken 3.0.x?
@mml 3.3.0 will have the patch and proper rollback support. Issuing a rollback would restore the backend to the slow mode with free lists before reverting the cluster version to 3.2.
@heyitsanthony If we only backported "Garbage collect pages allocated after minimum txid" (etcd-io/bbolt#3) to etcd 3.0.x and 3.1.x, we might be able to resolve the most pressing issues with #8009 without introducing any rollback or backward compatibility issues in the way freelists are persisted. We would either need to get boltdb/bolt#694 merged or build a version of bbolt that contains etcd-io/bbolt#3 but not etcd-io/bbolt#1. After that, etcd would just pick up the new version for the minor releases of 3.0.x and 3.1.x. Does this sound reasonable? I'll be available to contribute.
We tried it, and it will not solve what #8009 hit. A sudden release of free pages due to compaction (and a previous spike in page usage) will still trigger the problem unless we stop syncing the free pages. etcd-io/bbolt#3 helps more with reducing page usage in the concurrent read/write txn case.
@xiang90 We are also considering backporting the PRs to 3.1.9 to fix the "database space exceeded" issue. Is there any way to do this?
The backport policy is documented here: https://github.com/coreos/etcd/blob/master/Documentation/branch_management.md We could backport patches to more than one minor release in theory, but given the people we have today, it is not feasible. I am closing this.
It would be super nice to have the fix for #8009 backported into 3.0.x.
@wojtek-t @mml @jpbetz