Document design philosophies and reasonings for commonly challenged technical decisions #1038

Open
juliusv opened this issue Jun 1, 2018 · 8 comments

Comments

@juliusv
Member

juliusv commented Jun 1, 2018

We frequently receive questions, issues, and pull requests that contradict already established decisions within the Prometheus team. When responding to them (or even to avoid some of them being filed in the first place), it would be great to have documented reasonings to link to. This would give both clarity and legitimacy to rejections.

We should document for each decision:

  • Existing reasonings for it.
  • Links to relevant discussions.
  • Whether this decision is a permanent design decision or temporary due to lack of maintainability (like a moratorium on new SD mechanisms or not adding auth).
  • Explanation of what the current consensus status is (formal vote vs. lazy consensus in team).
  • Mechanisms of triggering an escalation / formal vote.
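
The per-decision fields listed above could be pictured as a small record type. The following Python sketch is purely illustrative: the names `DecisionRecord`, `Permanence`, and `Consensus`, the field names, and the placeholder strings are assumptions made for this example, not an agreed format. The sample entry reuses the SD moratorium mentioned above as an example of a temporary decision.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Permanence(Enum):
    PERMANENT = "permanent design decision"
    TEMPORARY = "temporary decision"


class Consensus(Enum):
    FORMAL_VOTE = "formal vote"
    LAZY_CONSENSUS = "lazy consensus in team"
    UNRECORDED = "consensus status not yet recorded"


@dataclass
class DecisionRecord:
    """Hypothetical container for one documented decision."""
    title: str
    reasonings: List[str]                                        # existing reasonings
    discussion_links: List[str] = field(default_factory=list)   # relevant discussions
    permanence: Permanence = Permanence.PERMANENT
    consensus: Consensus = Consensus.UNRECORDED
    escalation: str = ""                                         # how to trigger a formal vote

    def summary(self) -> str:
        # One-line overview suitable for an index page.
        return f"{self.title}: {self.permanence.value}; {self.consensus.value}"


# Sample entry; the reasoning text is a placeholder, not a quote.
record = DecisionRecord(
    title="Moratorium on new SD mechanisms",
    reasonings=["(placeholder) reasoning to be filled in by the team"],
    permanence=Permanence.TEMPORARY,
)
print(record.summary())
```

Keeping the consensus status as an explicit field with an "unrecorded" default makes it visible which entries still need their status clarified.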

This has overlap with existing pages or ideas for pages:

  • Existing FAQ page: this is similar to the FAQ page, but more about contentious issues and general design questions rather than regular usage questions.
  • Planned non-goals documentation (Add non-goals #149): This could perhaps be part of the overall reasonings documentation.

To be determined:

  • How should these overlapping areas be reconciled into a coherent whole?
  • Should this documentation be centralized or should some specialized parts be kept distributed across repositories?
  • How fine-grained should this documentation become? Probably not every minor decision that has been discussed once somewhere in a GitHub issue should go in there. We should probably start with major points like overarching non-goals and then see how it goes.
@RichiH
Member

RichiH commented Jun 2, 2018

I like the proposed structure very much. It should also carry relevant timestamps, if any.

What about having this as a multi-stage structure like

  1. philosophy
  2. goals & non-goals
  3. design decisions
  4. patterns & anti-patterns

and linking to it from the top of the FAQ?

For anything specific to certain exporters, the docs/ should probably live in the respective repos, and we should create a mechanism to pull them into our documentation at build time.

As to the granularity, starting somewhere is enough; for the longer term, anything which comes up repeatedly and/or creates a longer-running or controversial discussion is a good candidate for inclusion.

@jamtur01
Contributor

jamtur01 commented Jun 3, 2018

I like @RichiH's approach here. I think that makes logical sense as a progression.

@juliusv
Member Author

juliusv commented Jun 3, 2018

Yeah, sounds good for a start. It'll sometimes be challenging to decide which of those 4 sections something belongs in, but I'd say we should just start and see how well that structure works out.

@ant31

ant31 commented Jun 4, 2018

Some random thoughts:

  • A strong focus should be put on including and welcoming more (core) maintainers in general. The default big 'no' to every new proposal, and the argumentation that is then required, isn't welcoming at all.
  • Many users come with technical solutions without clearly explaining the problem. Make sure the problem is clearly stated before replying that a solution doesn't 'make sense' and closing it.
  • Acknowledge that the community is growing and usage evolves. Allow decisions discussed months or years ago to be reviewed and questioned again.

In general, the Prometheus projects don't have a high PR/Issue rate:
In the last 30 days: Alertmanager: 6 new issues, Prometheus: 16 new issues.
With ~20 maintainers, that's about 1 reply per maintainer per month, so maintainers can still afford to spend some more time digging into the issues and proposals from new users.

@brian-brazil
Contributor

> In general, the Prometheus projects don't have a high PR/Issue rate:

I think you're off by a few orders of magnitude. There are ~25 issues requiring attention per day across the project, and only ~2-3 full-time-equivalent maintainers working on them. That's roughly 10 replies per maintainer per day, on top of all the other project work they're doing. In addition there's probably another 20+ user questions per day between the lists and IRC.
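
The two estimates in this exchange can be put side by side with some quick arithmetic. The sketch below uses only the figures quoted in the comments (22 new issues in 30 days across ~20 maintainers, versus ~25 issues per day across ~2-3 maintainers); the function and variable names are made up for illustration, and an even split of the load is an assumption.

```python
import math


def per_maintainer(items: float, maintainers: float) -> float:
    """Items each maintainer would handle if the load were spread evenly."""
    return items / maintainers


# First estimate: 6 (Alertmanager) + 16 (Prometheus) new issues in 30 days, ~20 maintainers.
monthly_first = per_maintainer(6 + 16, 20)

# Second estimate: ~25 issues needing attention per day, ~2-3 effective maintainers.
daily_low = per_maintainer(25, 3)
daily_high = per_maintainer(25, 2)

# Convert the second estimate to a 30-day month so both use the same unit.
monthly_second_low = daily_low * 30
monthly_second_high = daily_high * 30

# Gap between the two per-maintainer monthly figures, in orders of magnitude.
gap_low = math.log10(monthly_second_low / monthly_first)
gap_high = math.log10(monthly_second_high / monthly_first)

print(f"first estimate:  {monthly_first:.1f} per maintainer per month")
print(f"second estimate: {daily_low:.1f} to {daily_high:.1f} per maintainer per day")
print(f"gap: {gap_low:.1f} to {gap_high:.1f} orders of magnitude")
```

Under these assumptions the per-maintainer load differs by a bit more than two orders of magnitude between the two estimates, mostly because one counts nominal team members and new issues in two repositories while the other counts effective full-time maintainers and all items needing attention.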

@juliusv
Member Author

juliusv commented Jun 4, 2018

Yeah, to understand this better: while we have 18 voting team members, only some of them get to spend any significant amount of time on Prometheus (as part of their job or in their free time). We have >30 repositories and a lot of action going on everywhere... so it's definitely hard to stay on top of things. I think around 2-3 combined full-time people working on things across the project is a good estimate. So this is what we start out with at least, as a basis to work on. The challenge will be to bring in new contributors who are not only capable and willing enough, but also welcomed well enough into the project so that they will in turn become maintainers and help share that load. It's a bit of a vicious circle, but if e.g. companies spent more money just sponsoring maintainership, that could also help a lot. Google has e.g. hundreds of paid people working on Kubernetes in comparison.

@ant31

ant31 commented Jun 4, 2018

> Google has e.g. hundreds of paid people working on Kubernetes in comparison.

Google's upstream team is significantly smaller than that, though the cross-company one is not. The important point is: what did Kubernetes do to involve many different companies? In my opinion and personal experience, on every proposal, issue, PR, and mailing list discussion, users are welcome to propose changes and even to reopen topics that were already settled, and/or they get routed to a detailed explanation and meeting notes. Plus, proposals that are incomplete but have a real use case behind them are rephrased or taken over by maintainers, and rarely simply closed or dismissed.

Why would companies invest time and dedicate employees to a project that is most likely going to refuse any change without a hard fight?

> It's a bit of a vicious circle

Yes, it is. Also, contributions aren't only made through code: when you 'educate' someone by answering a question or explaining in detail why a proposal is refused, they are then able to reply to the next users with similar questions/PRs.

Speaking of Kubernetes, how many answers come from core maintainers vs. users? Even on sub-projects (helm/kubespray/kops/kubeadm/charts), users help each other now! That frees maintainers to work on features. Of course, initially it took a massive amount of time to reply in detail to most of the questions, but it worked out.

To break the circle, I insist on welcoming any answer/question/change/suggestion positively. To be clear, I'm not saying to increase the acceptance rate, but to have a positive rather than negative bias towards user/maintainer input.

> In addition there's probably another 20+ user questions per day between the lists and IRC.

And the user base is growing faster than the maintainer base. What's the plan to manage that?

@juliusv
Member Author

juliusv commented Jun 4, 2018

I think we are in agreement on the major points, but I wanted to illustrate that Prometheus has always been in a different position than some other major OSS projects. It never started out with the explicit backing of a major tech company and still doesn't have much in terms of maintenance resources flowing into it. It was created as a passion project to solve a job at hand, then set free, and now the combined unpaid and paid time that is going into it is still very limited. For example, I'm also only commenting here in my free time, and to the extent that I can bring myself to care, because almost none of my paid work involves upstream Prometheus. So I am mostly inactive upstream nowadays. Money and corporate backing (or lack thereof) is a big factor.

aylei pushed a commit to aylei/docs that referenced this issue Oct 28, 2019
Improved migration overview document

5 participants