Propose project roadmap #15499

Closed
1 change: 1 addition & 0 deletions README.md
@@ -136,6 +136,7 @@ Now it's time to dig into the full etcd API and other guides.

- Email: [etcd-dev](https://groups.google.com/forum/?hl=en#!forum/etcd-dev)
- Slack: [#etcd](https://kubernetes.slack.com/messages/C3HD8ARJ5/details/) channel on Kubernetes ([get an invite](http://slack.kubernetes.io/))
- [Roadmap](./ROADMAP.md)
- [Community meetings](#Community-meetings)

### Community meetings
24 changes: 24 additions & 0 deletions ROADMAP.md
@@ -0,0 +1,24 @@
# etcd roadmap

This document defines high-level goals for the project.

## Milestones
Contributor

We should be careful about redefining the meaning of words.

So far in etcd, we have used milestone = future minor release: v3.5, v3.6.
Here we are calling a focus area we want to invest in a milestone.

Let's use consistent terms. My proposal is to think about this as a three-level hierarchy:

  1. Milestones -> are milestones as defined in https://github.com/etcd-io/etcd/milestones. Let's keep them as publicly visible releases (might be patch).

  2. Efforts

    • are bugs that track progress on multiple issues that need to be addressed with a common objective
    • [or] (for bigger efforts) projects: https://github.com/etcd-io/etcd/projects.
      Still, I would represent a project as an umbrella issue, as it seems that a project cannot be assigned to a milestone.
  3. Issues - for individual work items.

Now the question remains:
If we have a tool to dynamically track the milestones with attached efforts/items, do we need to redundantly track them in a markdown doc?

And I would say we don't. I assume that the purpose of the doc is different. It's a statement of intent about what we want to focus on in the following releases. And thanks to being submitted by maintainers and reviewed, it forces them to be on the same page (as opposed to an individual maintainer assigning an issue to a milestone). But if that's the goal, let's call it out explicitly in the preamble to this doc.

Then let's have:

Milestones:

release-v3.6

The main focus of v3.6 is the reduction of technical debt. The explicit goal is to avoid new features.
The focus will be on:

  1. deprecate/decommission experimental/legacy code:
    • decommission storage v2 (link to a tracking bug)
    • experimental features are graduated or removed (link to a tracking bug)
  2. ...

release-v3.5.x

The same as release-v3.4.x.

release-v3.4.x

The release focuses on stability. Etcd maintainers are going to backport:

  • critical/important vulnerabilities
  • backportable test coverage (including robustness tests)
  • critical correctness or robustness issue fixes
  • non-invasive flags/warnings that enable an easier transition to the next release.


* [P0] Etcd releases are qualified by rigorous robustness testing
* [P0] Etcd can reliably detect data corruption
Member

How do the above three P1 action items fit into the Milestones section here?

etcd apply code should be easy to understand and validate correctness
etcd can reliably detect data corruption (hash is linearizable)
(This P1 action item is mentioned as P0)

etcd recovery from data inconsistency procedures are documented and tested

How does the stalled write due to slow disk fit into the Milestone? @ahrtr

What about lease redesign? #14094

Member Author

> etcd apply code should be easy to understand and validate correctness

I would want to avoid touching random parts of the apply code without better testing and a clear goal. Removal of the v2 API and the following cleanup should already improve the situation and take priority.

> etcd can reliably detect data corruption (hash is linearizable)

This should be covered by "Etcd can reliably detect data corruption", though it may require a rewrite/clarification. I might downgrade it back to P1, as it doesn't help with the v3.4 release.

> etcd recovery from data inconsistency procedures are documented and tested.

Documentation is done; we just need to add a test. I think we should file an issue marked as important, but the milestone goal itself should be tracked as part of improvements to testing.

> How does the stalled write due to slow disk fit into the Milestone? @ahrtr

This is a somewhat new effort that is still not well defined. For me it comes under reliability, which is important, but as it relates to hardware failures it's not something etcd has tackled yet. However, with the recently reported #15498, I would want to propose "etcd is resilient to hardware failures" soon.

> What about lease redesign?

Correctness should be our top priority; however, leases have been broken for a long time and no one cared (K8s also doesn't). As such, I would treat it as second priority to the KV API.
At some point we could consider a larger effort, "etcd APIs are high quality and have consistent behavior", that would encompass leases.

Contributor

Is the scope for this item clear? IIRC there was a discussion on corruption detection per key/value, then there were some discussions around merkle trees and partitioning the keyspace.

Member Author

> Is the scope for this item clear? IIRC there was a discussion on corruption detection per key/value, then there were some discussions around merkle trees and partitioning the keyspace.

If you mean the corruption detection scope, then no. I didn't have time to define it. It's a pretty large issue to tackle and there are multiple ways to approach it. The main challenge is balancing breaking changes and short-term vs. long-term improvements. I have a couple of ideas that I discussed with @ptabor, but didn't have time to write them down as I want to focus on finishing robustness tests first (not too long).

Happy to make the scope clearer if someone is interested in working on it. Still would like to encourage people to work on robustness tests first, as they also help v3.4 and v3.5 releases.

Is there any tracking issue for changes to catch corruption per key/value?

* [P1] Experimental features are graduated or removed
* [P1] Etcd testing is high quality, easy to maintain and expand
* [P1] Etcd v2 API and storage is removed and code cleaned up
* [P1] Etcd supports zero-downtime downgrade
* [P2] Etcd can automatically recover from data corruption

Each listed milestone should have a corresponding
[issue](https://github.com/etcd-io/etcd/issues) or
[milestone](https://github.com/etcd-io/etcd/milestones) on GitHub.
If it doesn't, please [let us know](https://github.com/etcd-io/etcd#contact).

### Priorities

* P0 - Critical for reliability of the v3.5 and v3.4 releases. Should be prioritized over all other work and back-ported.
* P1 - Important for long term success of the project. Blocks v3.6 release.
* P2 - Stretch goals that would be nice to have for v3.6, however should not be blocking.