(RK-397) Check for module on disk before skipping implementation #1278

Merged · 1 commit into puppetlabs:main from RK-397 · Feb 16, 2022

Conversation

tvpartytonight (Contributor)

Prior to this change, the puppetfile module loader would check whether the static version had changed after an environment sync. However, if that static version failed to deploy, this check would prevent a later deploy from re-attempting it. As a result, deploys returned without error, because the invalid static version was never re-attempted.

This change simply ensures that the module's install path also exists before skipping, so we can be reasonably certain that what is installed on disk is the version r10k expects.
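
In rough terms, the skip condition goes from "static version unchanged" to "static version unchanged and the install path exists on disk". A minimal sketch of that idea (the names here are illustrative, not r10k's actual internals):

```ruby
# Hypothetical sketch only; method and attribute names are illustrative.
def skip_module_sync?(mod, previously_deployed_version)
  # Pre-existing check: skip when the static version is unchanged.
  version_unchanged = (mod.version == previously_deployed_version)

  # New check: only skip if the module's install path is also present on
  # disk, so a previously failed deploy gets retried.
  version_unchanged && File.directory?(mod.full_path)
end
```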

Please add all notable changes to the "Unreleased" section of the CHANGELOG in the format:

- (JIRA ticket) Summary of changes. [Issue or PR #](link to issue or PR)

@tvpartytonight requested a review from a team on February 16, 2022 17:52
@tvpartytonight (Contributor, Author)

@justinstoller I think I just want to make sure this is the right fix before updating the spec tests.

@tvpartytonight (Contributor, Author)

@puppetlabs/puppetserver-maintainers I pinged Justin directly, but if anyone else has opinions about this, it would be great to hear them.

@tvpartytonight force-pushed the RK-397 branch 2 times, most recently from 7aa1886 to a5f053b on February 16, 2022 20:23
@@ -0,0 +1 @@
This only exists so the directory can be commited to git for testing purposes.
Member
committed has two 't's ; P

Member
I think the convention is also usually to call this file .gitkeep.

Contributor (Author)
TIL

Contributor (Author)
about .gitkeep, not my spelling deficiencies.

@justinstoller (Member)

I don't love this approach. The average per-module time r10k was taking, which was already too much for the scale some people run at, was only 40 ms; i.e., we need to take significantly less than 40 ms per module that we're no-oping. Part of the design of --incremental was to remove any per-module disk access.

IMO, a big part of the issue that instigated this is terrible error reporting in other areas of the product, and this is a band-aid. However, a single directory check should be significantly less than 40 ms, and maybe this is the right time/value trade-off. Let me run this change through our Puppetfile for scale testing and see how it performs.
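
For a rough sense of what a single directory check costs, here is an illustrative micro-benchmark (the path is just an example, and this is not our scale-testing setup):

```ruby
require 'benchmark'

# Example path; substitute a real environment's moduledir.
moduledir = '/etc/puppetlabs/code/environments/production/modules'

elapsed = Benchmark.realtime do
  10_000.times { File.directory?(File.join(moduledir, 'stdlib')) }
end

puts format('avg per File.directory? check: %.4f ms', elapsed / 10_000 * 1000)
```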

@justinstoller (Member)

The ticket uses a module with the wrong tag as the reproducer. Do you know if that's the actual issue, or is that a simplified reproducer? I'm wondering what proportion of the module-install failures users encounter end up failing before the module directory is created.

@tvpartytonight (Contributor, Author)

I don't love the approach either, but I didn't see any other way to address the question of "is this thing actually deployed or not?" Keeping some other state on disk seems just as bad, so I think our hand is really forced here. And since the check is the last part of the `unless`, I think it won't happen unnecessarily for most modules.

I think the general problem is that anyone using --incremental can't reproduce errors they initially saw from a deploy. I don't know whether the written-up ticket is a simplified version of a different issue.

@tvpartytonight (Contributor, Author)

Maybe one optimization that could be implemented is to do a single `ls` of the modules dir that is then cached and used, as opposed to hitting the disk per module.
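
Something like this hand-wavy sketch (not actual r10k code; names are made up for illustration):

```ruby
require 'set'

# Hypothetical: list the moduledir once per deploy and reuse the result,
# instead of stat-ing each module's install path individually.
def deployed_module_names(moduledir)
  @deployed_module_names ||= Set.new(
    Dir.children(moduledir).select do |entry|
      File.directory?(File.join(moduledir, entry))
    end
  )
end

def module_on_disk?(moduledir, name)
  deployed_module_names(moduledir).include?(name)
end
```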

@justinstoller (Member)

I finished my local testing: with a 1233-module Puppetfile, an incremental no-op deploy goes from 1.9 seconds to 2.1 seconds. So it's about 10% slower, but nowhere near the time frames we were seeing before incremental.

@justinstoller (Member)

Regarding the UX, I'm wondering whether we keep some record of errors in the .r10k-deploy.json that we could reference from one run to the next, to display failures on subsequent runs (or just try to fix them).

Also, we should maybe update our docs to say that when users are concerned there may have been failures, they can manually run r10k without the --incremental flag as a workaround, either if they don't have this fix or if the update failed after the module directory was created.

@justinstoller (Member)

FWIW, I checked and didn't find meaningful info in the .r10k-deploy.json for determining whether there were errors.

@justinstoller (Member)

I guess those last three comments were all a long way to say that I think this fix is a good value/time tradeoff to solve a specific usability issue.

@justinstoller merged commit c0d3ba8 into puppetlabs:main on Feb 16, 2022