docs/philosophy: Initial commit #1054

Open
wants to merge 5 commits into base: main

@RichiH RichiH commented Jun 10, 2018

This was surprisingly hard; I went through several iterations, tossing things away, re-adding some, and refactoring into different files. On the plus side, I think what I have is already relatively cleanly separated. design & patterns is a collection of half-finished thoughts and sentences with tons of duplicates. No use pushing those atm.

I would be especially interested in large conceptual comments and missing sections, less so in specific wording as that is bound to change, yet again, once there's more content.

Signed-off-by: Richard Hartmann <richih@richih.org>
@RichiH RichiH self-assigned this Jun 10, 2018
Contributor

@brian-brazil brian-brazil left a comment

At a high level, this seems a bit all over the place. I'm not getting a clear sense of what we do/don't do, it feels more like reverse engineering things from certain design decisions but missing other decisions.

they need to take action in order to prevent undesired system state.

Thus, its most important function is to keep the pipeline of ingestion, rule
evaluation, and alert hand-off working.
Contributor

notification sending, we don't use the term "alert hand-off"

Member Author

Addressed.

operational patterns for different parts of our ecosystem:

For Prometheus itself, this means running every instance as an island of data
completely detached from every other instance, except for optional federation.
Contributor

This exception is weird, and incomplete. I'd go more with how any dependencies have understandable impact.

Member Author

I removed the half sentence completely.

completely detached from every other instance, except for optional federation.

For Alertmanager on the other hand, it means the exact opposite: meshing all
instances closely together, sharing knowledge about alerts and their
Contributor

This is too vague. The important point is that it's AP rather than CP

Member Author

Please expand those two.

Contributor

Member Author

That didn't click in this context, but yes, we can simply refer to Brewer. It might lead to quite a bit more explanation around why we chose which. I would then be tempted to pull this into the design decisions and simply link the two to each other, if linking at all.


## Simple operation

Operation of Prometheus should be as easy and failure-tolerant as possible. We
Contributor

This is waaaay too open for interpretation. "easy" means I turn it on and it magically works with no configuration - which of course is far from easy in practice.

I'd talk about well defined and limited interfaces that allow integration

Member Author

s/easy/simple/ and added a section on dependencies.

try to put required complexity into earlier phases, going through them less
often and ideally still while under the control of a smaller subset of people.

One example of this would be the preference of statically linked binaries over
Contributor

This is just an artifact of how Go works, it's not a project philosophy.

# Push-type system

Prometheus is, and always will be, a pull-type system. We strongly believe that
this makes operational sense in all but the very largest of scales.
Contributor

This is incorrect, we believe it works at all scales.

Also both ways work at scale. The more salient point is that a lot of things we do only really work with push.

Member Author

I would argue that Monarch-style mixed systems allow for even more scale. This is beyond this document's scope, though.

Also, I think you meant pull. It makes sense to expand on those things, though. I will put a TODO into patterns and link back to there.

Contributor

I disagree, both can be scaled indefinitely. Also Monarch is more pull than push.

Member

Yeah, there's no limits to scale with either approach, but there's the usual known pros+cons for both approaches, and we're quite married to the pull approach, as Prometheus is so designed around being in control of pulling and processing data when it wants.

Contributor

I'd just say the first sentence and then link to a blog post (Julius' from a while back, perhaps) or the FAQ entry. To the vast majority of users this is a non-issue.

Contributor

Yet many users continue to try and make it do push (and tend to be quite unhappy when told their approach won't work and/or be supported), so I think it's worth a few sentences.

Member Author

Changed.

still doing one thing: ingest metric data, do computations on it, and expose
it to other systems.
2. Expect the output of every program to become the input to another, as yet unknown, program.
Today's lingua franca is HTTP endpoints, which are used by Prometheus
Contributor

HTTP endpoints with JSON

But this isn't what we use everywhere. We've our own text format, and file_sd and yaml for config files.

Member Author

That's why I didn't put JSON.

Contributor

But we're also doing many things that aren't HTTP.

Member Author

Is there something you would propose instead?

Contributor

I'd remove it.

We ensure that master always builds, called Continuous Integration these days,
and we not afraid to replace whole sections of our codebase, e.g. our storage
engine.
4. Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you've finished using them.
Contributor

I don't understand what you're saying here

Member Author

It's a literal quote. I can shorten it to stop at the first comma, though. Would that make more sense to you?

The "unskilled" part is an artefact; back then it was common to assign many tasks to what were basically filing clerks. They had the IT knowledge you would expect them to have.

Contributor

That's a bit out of the scope of the project, as we don't run an operations team. It certainly has nothing to do with the Unix philosophy.

Member

@juliusv juliusv Jun 10, 2018

Hm yeah, I think some of these points sound a bit too forced in the context of Prometheus. I do agree about the "do one thing well" (which I would rather phrase as "keep pieces as simple as possible") and "open interfaces" points and would focus on those, while abandoning the others. I'm not sure we need to explicitly try to tie things so ceremoniously to the Unix philosophy.

Member Author

Changed.

While this is an outdated way of stating the goal, automation where possible
is still one of the core characteristics any modern philosophy.

## Embrace cloud-native technologies
Contributor

This is not our philosophy, in fact I'd say this explicitly isn't our philosophy.

We're a technology that happens to work well for things that fall into the "cloud native" marketing term - but we also work just as well outside of that.

Member Author

With that structure, I wanted to say that this works well with good operations, but re-reading, I can see how that got lost over time.

Would you think it better to remove this or to expand on how cloud-native is one of the many facets of proper operations?

Contributor

I'd have a section saying that we work across many paradigms, but aren't going to do things just for the sake of one system that is weird/missing a feature.

Contributor

@jamtur01 jamtur01 Jun 10, 2018

+1. Maybe some (more?) ideal use cases? Although that might drift over time and become stale.

Member Author

Rewritten completely & moved down.


Prometheus is a project of convicted and passionate individuals. As we do not
have a profit motive, nor quarterly projections, or any other requirement to
meet arbitrary business requirements, we can foxus on getting things right. This
Contributor

focus

Member Author

Vim stopped setting spell for my markdown files... I need to fix that.

Member Author

RichiH commented Jun 10, 2018

I agree that it still lacks a consistent thread, but I figured I'd rather get partial feedback at this point to align.

Signed-off-by: Richard Hartmann <richih@richih.org>
Member Author

RichiH commented Jun 10, 2018

Are there any major sections & content missing from philosophy and goals?

What I am missing, on purpose, are any cross-links, those will come last.

Anything we can reasonably push to a lower tier, we should push there.


# Be open

We will always put as much of our code, discussions, presentations, and other
Contributor

I think you're missing an "as possible" here grammatically

Member Author

done.


# Play well with others

Prometheus is a project of convicted and passionate individuals. As we do not
Contributor

I don't think convicted was the word you meant

Contributor

🔒

Member Author

...are you sure...?

also means that we are free to suggest other implementations and projects if
they are a better fit for a particular use-case.

# Be inclusive
Contributor

It seems weird to mix this in with the other technical stuff

Contributor

Is there another place where this is mentioned? I feel like this needs to be somewhere?

Member

Only indirectly in our governance: https://prometheus.io/governance/#values

Maybe if/once we have a general section for developers and getting into Prometheus contributions, it would be a better fit there?

Contributor

This is very vision statementy. It has little to no bearing on the design decisions we make.

Member Author

Should we keep it in there while we don't have a better place, yet?

Member Author

Put in a TODO for now.

## Keep dependencies clear and limited

Any non-trivial system needs to integrate with other systems. To keep the
resulting complexity low, we will always try to have the fewest interfaces
Contributor

Maybe mention to make debugging/understandability easier?

Member Author

done.

Contributor

I'm not seeing this change.

Member

juliusv commented Jun 10, 2018

> Are there any major sections & content missing from philosophy and goals?

Have you intentionally omitted most of the points from #149 so far?

label rewriting, and alert generation.


# Non-Goals
Contributor

I think this is where we put in stuff about assuming you already have CM/DNS/service database/machine database etc., tying back to layering/doing one thing well.

Member Author

You mean to state explicitly that CMDB etc. are outside our scope? Isn't that obvious?

Contributor

We get requests in this area fairly regularly.

Signed-off-by: Richard Hartmann <richih@richih.org>
Signed-off-by: Richard Hartmann <richih@richih.org>
Signed-off-by: Richard Hartmann <richih@richih.org>
content as possible into a form and place which is accessible in the long term,
free of charge.

# Be opionated
Contributor

I'm not sure which section it goes in here, but there should be something about not adding attractive nuisances: features that might be useful to experts for niche use cases, but would be misunderstood by and cause problems for the average user. Put another way, we don't leave chainsaws lying about for users to hurt themselves with.

try to put required complexity into earlier phases, going through them less
often and ideally still while under the control of a smaller subset of people.

One example of this would be the preference of statically linked binaries over
Contributor

This is still here, this is not a goal of the project.

## Keep dependencies clear and limited

Any non-trivial system needs to integrate with other systems. To keep the
resulting complexity low, we will always try to have the fewest interfaces
Contributor

I'm not seeing this change.

evaluation, and alert notifications working.

The second most important function is to give humans context about these alerts
by allowing access to the most recent data Prometheus ingested
Contributor

fullstop

I'd add something around data that's good enough to make useful engineering decisions, rather than 100% perfect data.

aylei pushed a commit to aylei/docs that referenced this pull request Oct 28, 2019
* fix inconsistencies in file names and heading

* use  2nd level heading for all flags

* fix typo in meta

* revert file name change until later

* revert file name

* escape * character
Member

@roidelapluie roidelapluie left a comment

@RichiH if you still want this I have made some extra comments.

title: Goals and Non-Goals
sort_rank: 3
---

Member

I think we miss an introduction here


For Alertmanager on the other hand, it means the exact opposite: meshing all
instances closely together, sharing knowledge about alerts and their
notifications.
Member

I would not say that they are closely meshed, just that they are meshed, as they can lose their mesh by design. I would also then specify what happens when they can't mesh.


# Event handling

Prometheus is dealing with metrics. As such, it will never process and store
Member

And we get into exemplars


We will always put as much of our code, discussions, presentations, and other
content as possible into a form and place which is accessible in the long term,
free of charge.
Member

should we add in the open, too?


# Play well with others

Prometheus is a project of convinced and passionate individuals. As we do not
Member

Maybe we should review this in light of the recent discussions on tsdb code reuse etc.

Member Author

RichiH commented Feb 13, 2020

@roidelapluie I do, it's on my bucket list, thanks!

@beorn7 beorn7 self-requested a review February 13, 2020 15:01
Member

beorn7 commented Mar 17, 2020

@RichiH, this PR seems to get some love on the mailing list. I guess to revive it, it first needs an update from your side before it makes sense for reviewers to comment on details.

Member

grobie commented Oct 15, 2020

@RichiH Any updates here? I think it would be great to get this in in some form.

@beorn7 beorn7 removed their request for review August 8, 2024 13:04
7 participants