docs/philosophy: Initial commit #1054
Signed-off-by: Richard Hartmann <richih@richih.org>
At a high level, this seems a bit all over the place. I'm not getting a clear sense of what we do/don't do; it feels more like reverse engineering things from certain design decisions while missing other decisions.
content/docs/philosophy/goals.md
they need to take action in order to prevent undesired system state.

Thus, its most important function is to keep the pipeline of ingestion, rule
evaluation, and alert hand-off working.
Use "notification sending"; we don't use the term "alert hand-off".
Addressed.
content/docs/philosophy/goals.md
operational patterns for different parts of our ecosystem:

For Prometheus itself, this means running every instance as an island of data
completely detached from every other instance, except for optional federation.
This exception is weird, and incomplete. I'd rather go with how any dependencies have an understandable impact.
I removed the half sentence completely.
completely detached from every other instance, except for optional federation.

For Alertmanager on the other hand, it means the exact opposite: meshing all
instances closely together, sharing knowledge about alerts and their
This is too vague. The important point is that it's AP rather than CP.
Please expand those two.
That didn't click in this context, but yes, we can simply refer to Brewer. It might lead to quite a bit more explanation around why we chose which. I would then be tempted to pull this into the design decisions and simply link the two to each other, if linking at all.
content/docs/philosophy/goals.md
## Simple operation

Operation of Prometheus should be as easy and failure-tolerant as possible. We
This is waaaay too open to interpretation. "easy" means I turn it on and it magically works with no configuration, which of course is far from easy in practice.
I'd talk about well-defined and limited interfaces that allow integration.
s/easy/simple/ and added a section on dependencies.
try to put required complexity into earlier phases, going through them less
often and ideally still while under the control of a smaller subset of people.

One example of this would be the preference of statically linked binaries over
This is just an artifact of how Go works, it's not a project philosophy.
content/docs/philosophy/goals.md
# Push-type system

Prometheus is, and always will be, a pull-type system. We strongly believe that
this makes operational sense in all but the very largest of scales.
This is incorrect, we believe it works at all scales.
Also both ways work at scale. The more salient point is that a lot of things we do only really work with push.
I would argue that Monarch-style mixed systems allow for even more scale. This is beyond this document's scope, though.
Also, I think you meant pull. It makes sense to expand on those things, though. I will put a TODO into patterns and link back to it.
I disagree, both can be scaled indefinitely. Also Monarch is more pull than push.
Yeah, there are no limits to scale with either approach, but there are the usual known pros and cons for both approaches, and we're quite married to the pull approach, as Prometheus is designed so much around being in control of pulling and processing data when it wants.
I'd just say the first sentence and then link to a blog post (Julius' from a while back, perhaps) or the FAQ entry. To the vast majority of users this is a non-issue.
Yet many users continue to try and make it do push (and tend to be quite unhappy when told their approach won't work and/or be supported), so I think it's worth a few sentences.
Changed.
still doing one thing: ingest metric data, do computations on it, and expose
it to other systems.
2. Expect the output of every program to become the input to another, as yet unknown, program.
Today's lingua franca is HTTP endpoints, which are used by Prometheus
HTTP endpoints with JSON
But this isn't what we use everywhere. We have our own text format, and file_sd and YAML for config files.
That's why I didn't put JSON.
But we're also doing many things that aren't HTTP.
Is there something you would propose instead?
I'd remove it.
We ensure that master always builds, called Continuous Integration these days,
and we not afraid to replace whole sections of our codebase, e.g. our storage
engine.
4. Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you've finished using them.
I don't understand what you're saying here
It's a literal quote. I can shorten it to stop at the first comma, though. Would that make more sense to you?
The "unskilled" part is an artefact; back then it was common to assign many tasks to what were basically filing clerks. They had the IT knowledge you would expect them to have.
That's a bit out of the scope of the project, as we don't run an operations team. It certainly has nothing to do with the Unix philosophy.
Hm yeah, I think some of these points sound a bit too forced in the context of Prometheus. I do agree about the "do one thing well" (which I would rather phrase as "keep pieces as simple as possible") and "open interfaces" points and would focus on those, while abandoning the others. I'm not sure we need to explicitly try to tie things so ceremoniously to the Unix philosophy.
Changed.
While this is an outdated way of stating the goal, automation where possible
is still one of the core characteristics any modern philosophy.

## Embrace cloud-native technologies
This is not our philosophy, in fact I'd say this explicitly isn't our philosophy.
We're a technology that happens to work well for things that fall into the "cloud native" marketing term - but we also work just as well outside of that.
With that structure, I wanted to say that this works well with good operations, but re-reading, I can see how that got lost over time.
Would you think it better to remove this or to expand on how cloud-native is one of the many facets of proper operations?
I'd have a section saying that we work across many paradigms, but aren't going to do things just for the sake of one system that is weird/missing a feature.
+1. Maybe some (more?) ideal use cases? Although that might drift over time and become stale.
Rewritten completely & moved down.
Prometheus is a project of convicted and passionate individuals. As we do not
have a profit motive, nor quarterly projections, or any other requirement to
meet arbitrary business requirements, we can foxus on getting things right. This
focus
Vim stopped enabling spell checking for my markdown files... I need to fix that.
I agree that it still lacks a consistent thread, but I figured I'd rather get partial feedback at this point to align.
Are there any major sections & content missing from philosophy and goals? What I am missing, on purpose, are any cross-links; those will come last. Anything we can reasonably push to a lower tier, we should push there.
# Be open

We will always put as much of our code, discussions, presentations, and other
I think you're missing an "as possible" here grammatically
done.
# Play well with others

Prometheus is a project of convicted and passionate individuals. As we do not
I don't think convicted was the word you meant
🔒
...are you sure...?
also means that we are free to suggest other implementations and projects if
they are a better fit for a particular use-case.

# Be inclusive
It seems weird to mix this in with the other technical stuff
Is there another place where this is mentioned? I feel like this needs to be somewhere?
Only indirectly in our governance: https://prometheus.io/governance/#values
Maybe if/once we have a general section for developers and getting into Prometheus contributions, it would be a better fit there?
This is very vision statementy. It has little to no bearing on the design decisions we make.
Should we keep it in there while we don't have a better place, yet?
Put in a TODO for now.
## Keep dependencies clear and limited

Any non-trivial system needs to integrate with other systems. To keep the
resulting complexity low, we will always try to have the fewest interfaces
Maybe mention to make debugging/understandability easier?
done.
I'm not seeing this change.
Have you intentionally omitted most of the points from #149 so far?
label rewriting, and alert generation.

# Non-Goals
I think this is where we put in stuff about assuming you already have CM/DNS/service database/machine database etc., tying back to layering/doing one thing well.
You mean to state that we consider CMDB etc. explicitly outside our scope? Isn't that obvious?
We get requests in this area fairly regularly.
content as possible into a form and place which is accessible in the long term,
free of charge.

# Be opionated
I'm not sure which section it goes in here, but there should be something about not adding attractive nuisances: features that might be useful to experts for niche use cases, but would be misunderstood by and cause problems for the average user. Put another way, we don't leave chainsaws lying about for users to hurt themselves with.
try to put required complexity into earlier phases, going through them less
often and ideally still while under the control of a smaller subset of people.

One example of this would be the preference of statically linked binaries over
This is still here, this is not a goal of the project.
## Keep dependencies clear and limited

Any non-trivial system needs to integrate with other systems. To keep the
resulting complexity low, we will always try to have the fewest interfaces
I'm not seeing this change.
evaluation, and alert notifications working.

The second most important function is to give humans context about these alerts
by allowing access to the most recent data Prometheus ingested
fullstop
I'd add something around data that's good enough to make useful engineering decisions, rather than 100% perfect data.
* fix inconsistencies in file names and heading
* use 2nd level heading for all flags
* fix typo in meta
* revert file name change until later
* revert file name
* escape * character
@RichiH if you still want this I have made some extra comments.
title: Goals and Non-Goals
sort_rank: 3
---
I think we miss an introduction here
For Alertmanager on the other hand, it means the exact opposite: meshing all
instances closely together, sharing knowledge about alerts and their
notifications.
I would not say that they are closely meshed, just that they are meshed, as they can lose their mesh by design. I would also then specify what happens when they can't mesh.
# Event handling

Prometheus is dealing with metrics. As such, it will never process and store
And we get into exemplars
We will always put as much of our code, discussions, presentations, and other
content as possible into a form and place which is accessible in the long term,
free of charge.
Should we add "in the open", too?
# Play well with others

Prometheus is a project of convinced and passionate individuals. As we do not
Maybe we should review this in light of the recent discussions on TSDB code reuse etc.
@roidelapluie I do, it's on my bucket list, thanks!
@RichiH, this PR seems to be getting some love on the mailing list. To revive it, I guess it first needs an update from your side before it makes sense for reviewers to comment on details.
@RichiH Any updates here? I think it would be great to get this in in some form.
This was surprisingly hard; I went through several iterations, tossing things away, re-adding some, and refactoring into different files. On the plus side, I think what I have is already relatively cleanly separated. design & patterns is a collection of half-finished thoughts and sentences with tons of duplicates. No use pushing those atm.
I would be especially interested in large conceptual comments and missing sections, less so in specific wording as that is bound to change, yet again, once there's more content.