-
Notifications
You must be signed in to change notification settings - Fork 544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Moves bundle unpack timeout into OperatorGroup #2952
Moves bundle unpack timeout into OperatorGroup #2952
Conversation
Skipping CI for Draft Pull Request. |
eaa0cd8
to
e2680ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
e2680ad
to
b47f9e1
Compare
|
||
ogs, err := ogLister.List(k8slabels.Everything()) | ||
if err != nil || len(ogs) == 0 { | ||
return ignoreTimeout, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this return an err
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the check for len(ogs) == 0
is a bit redundant the check below. Are we treating "0" different than "not 1"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this return an
err
?
I looked into a similar piece of code where we check whether UnsafeFailForward
upgrade strategy is set on an OperatorGroup
:
operator-lifecycle-manager/pkg/controller/registry/resolver/fail_forward.go
Lines 12 to 27 in b5a885e
// IsFailForwardEnabled takes a namespaced operatorGroup lister and returns | |
// True if an operatorGroup exists in the namespace and its upgradeStrategy | |
// is set to UnsafeFailForward and false otherwise. An error is returned if | |
// an more than one operatorGroup exists in the namespace. | |
// No error is returned if no OperatorGroups are found to keep the resolver | |
// backwards compatible. | |
func IsFailForwardEnabled(ogLister v1listers.OperatorGroupNamespaceLister) (bool, error) { | |
ogs, err := ogLister.List(labels.Everything()) | |
if err != nil || len(ogs) == 0 { | |
return false, nil | |
} | |
if len(ogs) != 1 { | |
return false, fmt.Errorf("found %d operatorGroups, expected 1", len(ogs)) | |
} | |
return ogs[0].UpgradeStrategy() == operatorsv1.UpgradeStrategyUnsafeFailForward, nil | |
} |
It allows a case when there is no OperatorGroup
. I don't know if it still makes sense. I was under the impression that OG is required for resolution to succeed, but I didn't dare to handle timeout differently from fail forward.
Also, the check for
len(ogs) == 0
is a bit redundant the check below. Are we treating "0" different than "not 1"?
I see that we can
The idea is that we return either a value specified in the annotation or a default value ignoreTimeout
. Given the above logic around a case when OG does not exist (len(ogs) == 0
) we do not want to return an error since we can return default value. In case when more than one OG exist - we return an error because we do not know which one to respect.
Basically this is exactly the same pattern as in IsFailForwardEnabled
, but I'm happy to change me if you think that special handling of "OG does not exist" situation no longer makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we ought to return the error if the List
call fails and remove the || len(ogs) == 0
. This can let the caller decide whether they want to retry, or are happy with using the default value. Maybe we could also add a ticket for the old code so we get this rechecked. I'm worried silent errors around this code could lead to hard to determine bugs. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this function to return an error from List
and removed || len(ogs) == 0
. From what I know about OLM so far - OperatorGroup
is required and without it resolution will fail eventually. In case of these errors - sync will be requeued (no extra code on caller required).
I'll try to the same for old code in IsFailForwardEnabled
, but in a separate PR. Will see how it goes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #2957 for IsFailForwardEnabled
.
d, err := time.ParseDuration(timeoutStr) | ||
if err != nil { | ||
logger.Errorf("failed to parse unpack timeout annotation(%s: %s): %v", BundleUnpackTimeoutAnnotationKey, timeoutStr, err) | ||
return ignoreTimeout, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging vs error return? The behavior here is a bit inconsistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works in the same way it was before the refactoring here: we log an error and ignore the value.
The only case when we want to return an error in this func - is where there are two OperatorGroup
s in which case we can not proceed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also decided to change how we handle duration parsing error. In master it is also a silent error (we just log) - now I return parse error to the caller which will requeue.
This error is very unlikely to happen as annotation is internal use only (not documented) and is set programmatically (we only use it for E2Es). So I think other than requeuing - we do not need any special handling for it.
b47f9e1
to
0d20d85
Compare
Rebased on top of |
0d20d85
to
8438943
Compare
Rebased from this side |
09dc7a7
to
aea944e
Compare
`operatorframework.io/bundle-unpack-timeout` is an internal annotation used for E2E testing. We need to move this out of InstallPlan in preparation to changes in the unpacking process: OLM will soon be creating unpack jobs before creating InstallPlan so we need to find a new place where we can set this annotation. Signed-off-by: Mikalai Radchuk <mradchuk@redhat.com>
aea944e
to
03b3747
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ankitathomas, m1kola, perdasilva, tmshort The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Description of the change:
Moves
operatorframework.io/bundle-unpack-timeout
annotation fromInstallPlan
toOperatorGroup
Motivation for the change:
operatorframework.io/bundle-unpack-timeout
is an internal annotation used for E2E testing.We need to move this out of
InstallPlan
in preparation to changes in the unpacking process (see #2942): OLM will soon be creating unpack jobs before creatingInstallPlan
so we need to find a new place where we can set this annotation.Testing remarks:
Updates existing E2E tests
Reviewer Checklist
/doc
[FLAKE]
are truly flaky and have an issue