Propose project roadmap #15499
Looks good to me.
Two things I think would be nice to add:
- A section on feature tracking, with a reference to the feature tracking board if we are still going to use it for future releases: https://github.com/etcd-io/etcd/projects/1
- Once known, we should add a link to each item so that it leads directly to the issue or milestone.
## Milestones

* [P0] Etcd releases are qualified by rigorous robustness testing
* [P0] Etcd can reliably detect data corruption
How do the three P1 action items above fit into the Milestones section here?
- etcd apply code should be easy to understand and validate correctness
- etcd can reliably detect data corruption (hash is linearizable) (This P1 action item is mentioned as P0)
- etcd recovery from data inconsistency procedures are documented and tested
How does the stalled write due to slow disk fit into the Milestone? @ahrtr
What about lease redesign? #14094
etcd apply code should be easy to understand and validate correctness
I would want to avoid touching random parts of the apply code without better testing and a clear goal. Removal of the v2 API and the cleanup that follows should already improve the situation and take priority.
etcd can reliably detect data corruption (hash is linearizable)
This should be covered by "Etcd can reliably detect data corruption", though it may require a rewrite/clarification. I might downgrade it back to P1, as it doesn't help with the v3.4 release.
etcd recovery from data inconsistency procedures are documented and tested.
Documentation is done; we just need to add a test. I think we should file an issue marked as important, but the milestone goal itself should be tracked as part of the improvements to testing.
How does the stalled write due to slow disk fit into the Milestone? @ahrtr
This is a somewhat new effort that is still not well defined. For me it comes under reliability, which is important, but as it relates to hardware failures it's not something etcd has tackled yet. However, with the recently reported #15498, I would want to propose "etcd is resilient to hardware failures" soon.
What about lease redesign?
Correctness should be our top priority; however, leases have been broken for a long time and no one cared (K8s also doesn't). As such, I would treat it as second priority to the KV API.
At some point we could consider a larger effort, "etcd APIs are high quality and have consistent behavior", that would encompass leases.
Is the scope for this item clear? IIRC there was a discussion on corruption detection per key/value, then there were some discussions around merkle trees and partitioning the keyspace.
Is the scope for this item clear? IIRC there was a discussion on corruption detection per key/value, then there were some discussions around merkle trees and partitioning the keyspace.
If you mean the corruption detection scope, then no. I haven't had time to define it. It's a pretty large issue to tackle and there are multiple ways to approach it. The main challenge is balancing breaking changes and short-term vs. long-term improvements. I have a couple of ideas that I discussed with @ptabor, but haven't had time to write them down, as I want to focus on finishing robustness tests first (not too long).
Happy to make the scope clearer if someone is interested in working on it. Still, I would like to encourage people to work on robustness tests first, as they also help the v3.4 and v3.5 releases.
Is there any tracking issue for changes to catch corruption per key/value?
This document defines high level goals for project.

## Milestones
We should be careful about redefining the meaning of words.
So far in etcd we have been using milestone = future minor release: v3.5, v3.6.
Here we are naming as a milestone a focus area we want to invest in.
Let's use consistent terms. My proposal is to think about this as a 3-level hierarchy:
- Milestones -> are milestones as defined in https://github.com/etcd-io/etcd/milestones. Let's keep them as publicly visible releases (might be patch).
- Efforts
  - are bugs that track progress on multiple issues that need to be addressed with a common objective
  - [or] (for bigger efforts) projects: https://github.com/etcd-io/etcd/projects.

  Still, I would represent a project as an umbrella issue, as it seems that a project cannot be assigned to a milestone.
- Issues - for individual work items.
Now the question remains:
If we have a tool to dynamically track the milestones with attached efforts / items, do we need to redundantly track them in a markdown doc?
And I would say - we don't. I assume that the purpose of the doc is different. It's a statement of intent about what we want to focus on in the following releases. And thanks to being submitted by maintainers and reviewed, it forces them to be on the same page (as opposed to an individual maintainer assigning an issue to a milestone). But if that's the goal, let's call it out explicitly in the preamble to this doc.
Then let's have:
Milestones:
release-v3.6
The main focus of v3.6 is the reduction of technical debt. The explicit goal is to avoid new features.
The focus will be on:
- deprecate/decommission experimental / legacy code:
  - decommission storage v2 (link to a tracking bug)
  - experimental features are graduated or removed (link to a tracking bug)
  - ...
release-v3.5.x
The same as release-v3.4.x.
release-v3.4.x
The release focuses on stability. Etcd maintainers are going to backport:
- critical/important vulnerabilities
- backportable test coverage (including robustness tests)
- critical correctness or robustness issue fixes
- non-invasive flags/warnings that enable an easier transition to the next release.
I think this PR mixed up three concepts/things:
What should be included in future releases? This should be open to discussion. See my proposal (on top of @ptabor's). Note that adding any kind of testing is a continuous effort and is always welcome.
3.6.0
3.7.0 (or 4.0)
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
For now I separated postmortem update #15552
Format based on https://github.com/etcd-io/etcd/blob/release-3.5/ROADMAP.md
cc @ahrtr @ptabor