Moves bundle unpack timeout into OperatorGroup #2952

m1kola
Member

@m1kola m1kola commented Apr 13, 2023

Description of the change:

Moves the operatorframework.io/bundle-unpack-timeout annotation from InstallPlan to OperatorGroup.

Motivation for the change:

operatorframework.io/bundle-unpack-timeout is an internal annotation used for E2E testing.

We need to move this out of InstallPlan in preparation for changes to the unpacking process (see #2942): OLM will soon create unpack jobs before creating the InstallPlan, so we need a new place to set this annotation.

Testing remarks:

Updates existing E2E tests

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied by evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 13, 2023

openshift-ci bot commented Apr 13, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@m1kola m1kola force-pushed the move-bundle-unpack-timeout-to-og branch from eaa0cd8 to e2680ad Compare April 13, 2023 12:03
@m1kola m1kola marked this pull request as ready for review April 13, 2023 12:05
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 13, 2023
@openshift-ci openshift-ci bot requested review from ankitathomas and asmacdo April 13, 2023 12:05
@m1kola m1kola requested a review from awgreene April 13, 2023 12:06
Collaborator

@perdasilva perdasilva left a comment

lgtm

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 13, 2023
Contributor

@ankitathomas ankitathomas left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 20, 2023
@perdasilva perdasilva force-pushed the move-bundle-unpack-timeout-to-og branch from e2680ad to b47f9e1 Compare April 20, 2023 16:08
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 20, 2023

ogs, err := ogLister.List(k8slabels.Everything())
if err != nil || len(ogs) == 0 {
	return ignoreTimeout, nil
Contributor

Shouldn't this return an err?

Contributor

@tmshort tmshort Apr 20, 2023

Also, the check for len(ogs) == 0 is a bit redundant with the check below. Are we treating "0" differently than "not 1"?

Member Author

Shouldn't this return an err?

I looked at a similar piece of code, where we check whether the UnsafeFailForward upgrade strategy is set on an OperatorGroup:

// IsFailForwardEnabled takes a namespaced operatorGroup lister and returns
// true if an operatorGroup exists in the namespace and its upgradeStrategy
// is set to UnsafeFailForward and false otherwise. An error is returned if
// more than one operatorGroup exists in the namespace.
// No error is returned if no OperatorGroups are found to keep the resolver
// backwards compatible.
func IsFailForwardEnabled(ogLister v1listers.OperatorGroupNamespaceLister) (bool, error) {
	ogs, err := ogLister.List(labels.Everything())
	if err != nil || len(ogs) == 0 {
		return false, nil
	}
	if len(ogs) != 1 {
		return false, fmt.Errorf("found %d operatorGroups, expected 1", len(ogs))
	}
	return ogs[0].UpgradeStrategy() == operatorsv1.UpgradeStrategyUnsafeFailForward, nil
}

It allows the case where there is no OperatorGroup. I don't know if that still makes sense: I was under the impression that an OG is required for resolution to succeed, but I didn't dare handle the timeout differently from fail forward.

Also, the check for len(ogs) == 0 is a bit redundant the check below. Are we treating "0" different than "not 1"?
I see that we can

The idea is that we return either the value specified in the annotation or the default value, ignoreTimeout. Given the logic above for the case where no OG exists (len(ogs) == 0), we do not want to return an error, since we can return the default value. When more than one OG exists, we return an error because we do not know which one to respect.

Basically this is exactly the same pattern as in IsFailForwardEnabled, but I'm happy to change it if you think that special handling of the "OG does not exist" situation no longer makes sense.

Collaborator

I wonder if we ought to return the error if the List call fails and remove the || len(ogs) == 0. This lets the caller decide whether to retry or to fall back to the default value. Maybe we could also open a ticket for the old code so it gets rechecked. I'm worried that silent errors around this code could lead to hard-to-diagnose bugs. wdyt?

Member Author

I made this function return the error from List and removed || len(ogs) == 0. From what I know about OLM so far, an OperatorGroup is required, and without it resolution will eventually fail. If either of these errors occurs, the sync is requeued (no extra code is required in the caller).

I'll try to do the same for the old code in IsFailForwardEnabled, but in a separate PR. We'll see how it goes.

Member Author

Created #2957 for IsFailForwardEnabled.

d, err := time.ParseDuration(timeoutStr)
if err != nil {
	logger.Errorf("failed to parse unpack timeout annotation(%s: %s): %v", BundleUnpackTimeoutAnnotationKey, timeoutStr, err)
	return ignoreTimeout, nil
Contributor

Logging vs error return? The behavior here is a bit inconsistent.

Member Author

This works the same way it did before the refactoring: we log an error and ignore the value.

The only case where we want to return an error in this func is when there is more than one OperatorGroup, in which case we cannot proceed.

Member Author

I also decided to change how we handle the duration parsing error. In master it is also a silent error (we just log it); now I return the parse error to the caller, which will requeue.

This error is very unlikely to happen, as the annotation is for internal use only (not documented) and is set programmatically (we only use it for E2E tests). So I think we do not need any special handling for it beyond requeuing.

@m1kola m1kola force-pushed the move-bundle-unpack-timeout-to-og branch from b47f9e1 to 0d20d85 Compare April 24, 2023 11:29
@m1kola
Member Author

m1kola commented Apr 24, 2023

Rebased on top of master to make GitHub happy.

@perdasilva perdasilva force-pushed the move-bundle-unpack-timeout-to-og branch from 0d20d85 to 8438943 Compare April 24, 2023 13:53
@perdasilva
Collaborator

Rebased from this side

@m1kola m1kola force-pushed the move-bundle-unpack-timeout-to-og branch 4 times, most recently from 09dc7a7 to aea944e Compare April 24, 2023 15:32
`operatorframework.io/bundle-unpack-timeout` is an internal annotation used
for E2E testing.

We need to move this out of InstallPlan in preparation for changes
to the unpacking process: OLM will soon create unpack jobs
before creating the InstallPlan, so we need a new place
to set this annotation.

Signed-off-by: Mikalai Radchuk <mradchuk@redhat.com>
@m1kola m1kola force-pushed the move-bundle-unpack-timeout-to-og branch from aea944e to 03b3747 Compare April 24, 2023 16:14
Contributor

@tmshort tmshort left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 24, 2023
@openshift-ci

openshift-ci bot commented Apr 24, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ankitathomas, m1kola, perdasilva, tmshort

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot merged commit 47d6aa1 into operator-framework:master Apr 24, 2023
@m1kola m1kola deleted the move-bundle-unpack-timeout-to-og branch April 24, 2023 21:29
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.