Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generic coordinator and worker classes #31

Merged
merged 29 commits into from
Jul 16, 2024
Merged

Conversation

lucabello
Copy link
Contributor

@lucabello lucabello commented Jul 2, 2024

Issue

Our work on HA solutions (Mimir, Loki, and Tempo) is rapidly becoming more complex, making it easy for the respective Coordinator/Worker charms to diverge.

Solution

Most of the Coordinator/Worker charms are in fact shared between the different HA solutions; this PR adds abstractions to simplify the creation and management of Coordinator/Worker charms, also preventing the charms from diverging.

Testing Instructions

To test these classes, we should use them in our Mimir HA deployment, and make sure the solution still functions.

Mimir PR: (to be added)

Additional Context

Copy link
Contributor

@mmkay mmkay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like where this is going. A bit of comments from me but it looks like migration might be pretty straightforward, nice :)

src/cosl/distributed/nginx.py Outdated Show resolved Hide resolved
src/cosl/distributed/nginx.py Outdated Show resolved Hide resolved
src/cosl/distributed/coordinator.py Outdated Show resolved Hide resolved
src/cosl/distributed/coordinator.py Outdated Show resolved Hide resolved
Copy link
Contributor

@sed-i sed-i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I managed to do only a partial review for now.
Should we split out databag_model to a tandem PR? Perhaps taking it outside of this context would help with abstractions?

.gitignore Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
src/cosl/__init__.py Outdated Show resolved Hide resolved
src/cosl/databag_model.py Outdated Show resolved Hide resolved
src/cosl/databag_model.py Outdated Show resolved Hide resolved
src/cosl/distributed/cluster.py Outdated Show resolved Hide resolved
src/cosl/distributed/cluster.py Outdated Show resolved Hide resolved
src/cosl/distributed/cluster.py Outdated Show resolved Hide resolved
src/cosl/databag_model.py Outdated Show resolved Hide resolved
src/cosl/distributed/cluster.py Outdated Show resolved Hide resolved
@lucabello
Copy link
Contributor Author

I answered all the comments so far, and closed a bunch I agreed with and solved :)

As per the databag_model.py abstraction, I explained here that it's something we already use, and it's brought in with basically no changes (except the ones I just made from the PR comments)

@PietroPasotti
Copy link
Contributor

I answered all the comments so far, and closed a bunch I agreed with and solved :)

As per the databag_model.py abstraction, I explained here that it's something we already use, and it's brought in with basically no changes (except the ones I just made from the PR comments)

given it brings quite a bunch of controversy and noise to this PR, I agree we should probably keep ti separate and make this branch depend on that one.

@lucabello
Copy link
Contributor Author

I've moved databag_model.py inside interface.py (i.e., the former cluster.py) to make it clear this databag abstraction is only meant for the coordinated workers.

We should absolutely make a shared databag abstraction to re-use in our charms, but (even though the code is still there) I'm taking the file out of the PR so we don't waste time on that for now :)

Copy link
Contributor

@Abuelodelanada Abuelodelanada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I posted a few comments in the code.

On a more general note I think we should try to avoid as much as we can the coupling between Coordinator and Worker and its collaborators objects.

For instance, JujuTopology, CertHandler, MetricsEndpointProvider, etc are instantiated in the __init__ of the classes.

What do you think about injecting those dependencies instead of creating the instances inside the objects?
I know for that we may need some factory methods/classes... but by reducing the coupling we increase testability and separation of concerns.

src/cosl/coordinated_workers/coordinator.py Outdated Show resolved Hide resolved
src/cosl/coordinated_workers/coordinator.py Outdated Show resolved Hide resolved
src/cosl/coordinated_workers/coordinator.py Outdated Show resolved Hide resolved
src/cosl/coordinated_workers/coordinator.py Outdated Show resolved Hide resolved
src/cosl/helpers.py Show resolved Hide resolved
@lucabello
Copy link
Contributor Author

@Abuelodelanada the “coupling” is 50% of the work here, and it's a feature: the coordinator/workers class are meant to abstract two main things

  • the coordinator/worker logic (is the deployment complete, recommended, etc.);
  • the self-monitoring integrations.

If we were to pass objects like MetricsEndpointProvider to the Coordinator class, there would be no point in having it inside it at all. At some point, we said we want to have a separate object that wraps our COS integrations, and that will also wrap things similarly.

Of course, there is room for improvement:

  • the way we interact with relation data could be made generic and re-used by all of our charms;
  • self-monitoring integrations could be wrapped in a SelfMonitoring object;
  • more stuff, probably.

However, the points above would require discussions, careful design, and that's something I'd love to do — but I don't think we have time for that right now, and we'll have to postpone those.

@lucabello lucabello marked this pull request as ready for review July 10, 2024 11:54
@sed-i
Copy link
Contributor

sed-i commented Jul 10, 2024

@lucabello do we have a tandem PR for this somewhere?

@lucabello
Copy link
Contributor Author

lucabello commented Jul 11, 2024

I have linked the two Mimir PRs where this abstraction is used.

If we don't have other comments, we need to address a few last things:

  • this comment @Abuelodelanada @sed-i what do you think?
  • update-ca-certificates in the Nginx image: I believe @mmkay has already contacted the right people
  • figure out if we can/should have model_uuid in the worker topology without getting warnings from pydantic
  • when the worker is getting data from a coordinator app databag, do we need a leader guard? The answer is no, we don't
  • is there a better way to fetch the charm libraries than what we do know?

src/cosl/coordinated_workers/worker.py Show resolved Hide resolved
src/cosl/coordinated_workers/worker.py Show resolved Hide resolved
src/cosl/coordinated_workers/worker.py Outdated Show resolved Hide resolved
src/cosl/coordinated_workers/worker.py Outdated Show resolved Hide resolved
src/cosl/coordinated_workers/worker.py Show resolved Hide resolved
src/cosl/coordinated_workers/nginx.py Outdated Show resolved Hide resolved
src/cosl/coordinated_workers/coordinator.py Show resolved Hide resolved
src/cosl/coordinated_workers/coordinator.py Show resolved Hide resolved
src/cosl/coordinated_workers/coordinator.py Outdated Show resolved Hide resolved
src/cosl/coordinated_workers/coordinator.py Show resolved Hide resolved
@lucabello lucabello requested a review from sed-i July 15, 2024 15:37
Copy link
Contributor

@sed-i sed-i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!
Thanks for being patient with all the comments :)
Let's see how it goes using it in a charm!

@lucabello lucabello merged commit ac3167d into main Jul 16, 2024
5 checks passed
@lucabello lucabello deleted the coordinator-superclass branch July 16, 2024 07:44
@lucabello lucabello restored the coordinator-superclass branch July 16, 2024 07:45
@lucabello lucabello deleted the coordinator-superclass branch July 16, 2024 07:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants