
feat: support for search reprovisioning #392

Merged
merged 7 commits into main from pitch/reprovisioning-search on Oct 3, 2024

Conversation

@m-alisafaee (Contributor) commented Sep 10, 2024

Adds a Sanic background task that reads data from the database and sends it out as events (sketched below).

Search reprovisioning.

/deploy
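
As a rough sketch of the mechanism described above: a Sanic background task that streams records from the database and re-emits them as events. The endpoint path, helper functions, and payloads here are illustrative assumptions, not the handlers actually added in this PR.

```python
from sanic import Sanic
from sanic.response import json

app = Sanic("renku_data_services_sketch")


async def fetch_all_entities():
    """Stand-in for streaming projects/namespaces/users out of the database."""
    for entity in ({"id": "project-1"}, {"id": "user-1"}):
        yield entity


async def send_event(message_type: str, payload: dict) -> None:
    """Stand-in for writing an event to the message queue."""
    print(message_type, payload)


async def reprovision_search(app: Sanic) -> None:
    """Background task: re-send every entity so the search index can be rebuilt."""
    async for entity in fetch_all_entities():
        await send_event("search.reprovision", entity)


@app.post("/search/reprovision")
async def start_reprovisioning(request):
    # add_task schedules the coroutine on Sanic's event loop and returns
    # immediately, so the HTTP response does not wait for the run to finish.
    request.app.add_task(reprovision_search(request.app))
    return json({"status": "started"}, status=201)
```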

@coveralls commented Sep 10, 2024

Pull Request Test Coverage Report for Build 11163620994

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 204 of 209 (97.61%) changed or added relevant lines in 13 files are covered.
  • 4 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+0.1%) to 90.54%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| components/renku_data_services/authz/authz.py | 17 | 18 | 94.44% |
| components/renku_data_services/namespace/db.py | 6 | 7 | 85.71% |
| components/renku_data_services/project/db.py | 7 | 8 | 87.5% |
| components/renku_data_services/message_queue/core.py | 49 | 51 | 96.08% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| components/renku_data_services/base_api/auth.py | 1 | 89.61% |
| components/renku_data_services/crc/models.py | 1 | 84.57% |
| components/renku_data_services/storage/blueprints.py | 2 | 94.76% |

Totals
  • Change from base Build 11140653681: 0.1%
  • Covered Lines: 9523
  • Relevant Lines: 10518

💛 - Coveralls

@m-alisafaee force-pushed the pitch/reprovisioning-search branch 2 times, most recently from 1bdbd6f to f0100b2 on September 18, 2024 22:20
@RenkuBot (Contributor)

You can access the deployment of this PR at https://renku-ci-ds-392.dev.renku.ch

@m-alisafaee marked this pull request as ready for review September 25, 2024 19:55
@m-alisafaee requested a review from a team as a code owner September 25, 2024 19:55
@olevski (Member) left a comment


Thanks, this looks really good, Mohammad. I have a few questions and requests for changes. For some of the questions it may be easier to have a live discussion, but you tell me what you prefer.

components/renku_data_services/message_queue/api.spec.yaml (outdated, resolved)
pyproject.toml (resolved)
Comment on lines +81 to +82
This table is used to make sure that only one instance of reprovisioning is run at any given time.
It gets updated with the reprovisioning progress.
Member


suggestion: I think we should add a constraint to this table that makes it so that either:

  • it can only ever contain one row,
  • OR it can contain a history of all reprovisionings, but we add an active boolean column and a constraint so that only one row can have active set to true at any time.

i.e. some way so that even if we mess things up in our code, we can only have one reprovisioning happening at a time. A sketch of the second option follows below.
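
As a rough illustration of that second option, assuming a Postgres backend (the table and column names are made up for this sketch, not taken from the PR's schema):

```python
from sqlalchemy import Boolean, Index, text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class ReprovisioningORM(Base):
    """Keeps a history of reprovisionings but allows at most one active row."""

    __tablename__ = "reprovisioning"
    __table_args__ = (
        # Partial unique index: uniqueness is enforced only over rows where
        # active is true, so at most one row can be active at any time.
        Index(
            "uq_reprovisioning_single_active",
            "active",
            unique=True,
            postgresql_where=text("active IS TRUE"),
        ),
    )

    id: Mapped[str] = mapped_column(primary_key=True)
    active: Mapped[bool] = mapped_column(Boolean, default=False)
```

The first option (a single-row table) could be enforced the same way, with a unique index over a constant expression.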

components/renku_data_services/message_queue/db.py (outdated, resolved)
Comment on lines 35 to 39
async with session_maker() as session, session.begin():
    start_event = make_event(
        message_type="reprovisioning.started", payload=v2.ReprovisioningStarted(id=str(reprovisioning.id))
    )
    await event_repo.store_event(session, start_event)
Member


suggestion: I know we check if a reprovisioning is already going on in db.py. But I think it is safer/nicer to not even send the reprovisioning.started event if another reprovisioning is underway. So the first thing we check inside this db transaction should be whether another reprovisioning is already in progress, and if it is, fail/do nothing. We should do this in addition to a constraint on the table in the DB; see the sketch below.
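
A possible shape for that, building on the hunk quoted above (the guard helper reprovisioning_repo.get_active and the error raised are hypothetical placeholders, not code from this PR):

```python
async with session_maker() as session, session.begin():
    # Check for a running reprovisioning inside the same transaction that stores
    # the event, so a concurrent request cannot slip in between check and insert.
    if await reprovisioning_repo.get_active(session) is not None:  # hypothetical helper
        raise RuntimeError("A reprovisioning is already in progress.")
    start_event = make_event(
        message_type="reprovisioning.started",
        payload=v2.ReprovisioningStarted(id=str(reprovisioning.id)),
    )
    await event_repo.store_event(session, start_event)
```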

Contributor Author


We check whether a reprovisioning is active in the blueprint, before calling this function, so the reprovisioning.started message isn't sent twice.

I'll add the DB constraint in a follow-up PR.

Comment on lines +71 to +72
def delete(self) -> BlueprintFactoryResponse:
    """Stop reprovisioning (if any)."""
Member


question: Do we really want this at all? Because if you start a reprovisioning but then stop it, you leave the event queue in a weird inconsistent state. So maybe once you start a reprovisioning you have to wait for it to finish or error out.

Member


My two cents: while a delete action may not be desirable, there should be a way to force a reprovisioning to be marked as errored. You cannot always ensure that a job will report back that it has errored, so an admin needs to be able to mark it that way (and attempt a new one).

Member


Yeah, the DELETE endpoint to stop a reprovisioning will stay.

Comment on lines +87 to +88
id: Mapped[ULID] = mapped_column("id", ULIDType, primary_key=True, default_factory=lambda: str(ULID()), init=False)
start_date: Mapped[datetime] = mapped_column("start_date", DateTime(timezone=True), nullable=False)
Member


question: How do we handle errors during reprovisioning? There are 2 kinds here:

  1. Errors that fully crash the whole reprovisioning process
  2. Errors that result from trying to reprovision one specific entity

At least for no. 1 we should save something in the database and report it back to the admin. For no. 2 the question is whether we want to keep going when we encounter errors with some entities, or fully stop.

And related to this is another question: what do we do if we encounter an error? Should we try to delete/clean up the events that were added so that we can effectively roll back to the state before the reprovisioning request? One possible way to split the two error kinds is sketched below.
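
A minimal, self-contained sketch of that split (all names are illustrative, not from this PR): per-entity failures are collected and the run continues, while a crash of the whole loop is recorded for the admin.

```python
from dataclasses import dataclass, field


@dataclass
class ReprovisioningResult:
    failed_entity_ids: list[str] = field(default_factory=list)
    fatal_error: str | None = None


async def reprovision(entities, send_event) -> ReprovisioningResult:
    """Send one event per entity; keep going on per-entity errors, record fatal ones."""
    result = ReprovisioningResult()
    try:
        for entity in entities:
            try:
                await send_event(entity)
            except Exception:
                # Kind 2: reprovisioning this one entity failed; note it and continue.
                result.failed_entity_ids.append(str(entity))
    except Exception as err:
        # Kind 1: the whole run crashed (e.g. the DB stream broke); persist this
        # so an admin can see it and decide whether to retry or roll back.
        result.fatal_error = str(err)
    return result
```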

@m-alisafaee marked this pull request as ready for review October 3, 2024 12:45
@olevski (Member) left a comment


That is great, just one minor thing I noticed in the apispec.

components/renku_data_services/message_queue/api.spec.yaml (outdated, resolved)
@m-alisafaee merged commit d475878 into main Oct 3, 2024
12 of 13 checks passed
@m-alisafaee deleted the pitch/reprovisioning-search branch October 3, 2024 14:36
@RenkuBot (Contributor) commented Oct 3, 2024

Tearing down the temporary RenkuLab deployment for this PR.

leafty added a commit that referenced this pull request Oct 7, 2024
6 participants