Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: IgnoreNotFound prevents a workflow error when the resource does not exist #1834

Closed
wants to merge 6 commits into from

Conversation

dmayle
Copy link
Contributor

@dmayle dmayle commented Dec 11, 2019

This implements the solution proposed in:
#1823

Adds an 'ignoreNotFound' value to the resource object in the workflow manifest so that kubectl won't throw an error when the resource does not exist.

While this was implemented in order to allow checking for the existence of a resource, it is also useful in other cases, such as an idempotent delete step.

@dmayle
Copy link
Contributor Author

dmayle commented Dec 12, 2019

I've checked, and the test is broken in trunk, and the error does not come from this code. Also, I have already signed the CLA

@@ -265,6 +272,12 @@ func (we *WorkflowExecutor) SaveResourceParameters(resourceNamespace string, res
if param.ValueFrom == nil {
continue
}
if we.Template.Resource.IgnoreNotFound && resourceName == "" {
// Resource wasn't found, so we populate result with sentinel value
output := "<NOT_FOUND>"
Copy link
Member

@simster7 simster7 Dec 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you provided a sentinel value? Using kubectl get ... -ignore-not-found=true directly doesn't provide one and I don't think any user would expect a value to be filled in if a resource doesn't exist.

Considering your use case from your proposal:

...
  - name: check-for-existence
    resource:
      action: get
      ignoreNotFound: true
      manifest: |
        apiVersion: v1
        kind: Pod
        metadata:
          name: does-not-exist
          namespace: argo-workflows
    outputs:
      parameters:
      - name: not-found
        valueFrom:
          jsonPath: '{.metadata.name}'

A user would have to know what the sentinel value is before checking it against the result of '{.metadata.name}' to determine if a resource actually exists or not, or would have to check if the result of '{.metadata.name}' and name match. It would be more logical to simply check if '{.metadata.name}' is empty or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason that there is a sentinel value is because we have three separate possibilities from any jsonpath expression: 1) non-empty value like "foo", 2) empty value like "{.metadata.annotations.imageregistry}" == "", and 3) non-existent entity.

If we don't provide a sentinel value, it's still possible to detect the difference between cases (2) and (3), but it requires the developer to either use two steps (one with a known existing jsonpath like .metadata.name followed by the desired jsonpath) or by including a marker in the jsonpath expression (e.g. '<{.metadata.annotations.imageregistry}>' which can than be compared with "<>" for empty vs "" for non-existent). Both of these require considerable extra costs to development, either extra runtime costs or string parsing with every result.

Using a sentinel value provides a nice balance that makes usage simple and efficient for most use cases. In the case where the user happens to be using the exact sentinel value chosen here, they then have to choose a solution like adding markers to the jsonpath.

Adding a custom sentinel value with an additional flag would also solve that problem, but that would be adding more code and configuration for a use case that is unlikely to occur in actual usage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, thanks for explaining.

@alexec What do you think about this? I am leaning towards not including it. Even though it will force developers to have to do an extra check, I think it is more intuitive for users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought wast that it should probably be a sentinel error. They are more clearly defined. However, as I understand it, you want to:

  1. Ignore "not found" errors.
  2. Allow subsequent steps to understand that the resource was not found and act on it.

Am I correct? If that's the case, could we say that .metadata.creationTImestamp is null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexec The main point of this feature is to disassociate the kubectl not-found error from the workflow error domain.
Right now, the "resource" template subsumes all kubectl errors, which would normally be a useful behavior, but in the case of kubectl, they made a decision to return an error code when looking for a resource that doesn't exist (or for deleting one).

While this can be useful in the guise of a shell script, where you can add per-command error handling, in argo-workflows there is no per-step handling of errors, so gracefully handling this case is problematic.

We could switch the behavior over wholesale, but that might be surprising to existing users. This PR just adds the option to the user to set the --ignore-not-found flag on get or delete operations. With this set, deleting a resource becomes idempotent in the case of a race condition (delete if found, ignore if someone else deleted it first), and getting a resource can now also be used to check for the existence of a resource (something that I need enough that I'm already maintaining a local fork to do so).

The added behavior that we're discussing here is specifically about how to return this additional piece of information to the developer in a way that is useful to consume. We only return a single string, so we have to choose how to encode the new information.

As you've indicated, we could force all developers who are trying to retrieve some field value to use two steps in their workflow (One to check a value like.metadata.creationTimestamp and one to check the desired value). This performs an additional API RPC call, adds argo executor overhead, and requires additional workflow complexity.

We could also suggest that the user add their own markers, but that requires them to split out the marker on every step to get a useful value.

The code here returns a fixed sentinel value that is unlikely to appear in the wild. I've chosen the characters to be outside of the set used in kubernetes identifiers, and unlikely to occur in a natural language sentence. This way, the check is as simple as a string comparison for the sentinel.

pkg/apis/workflow/v1alpha1/workflow_types.go Show resolved Hide resolved
@alexec
Copy link
Contributor

alexec commented Dec 19, 2019

@simster7 I've assigned you an owner as you've already looked into this.

@simster7
Copy link
Member

simster7 commented Jan 2, 2020

@dmayle Any plans on finishing this up? We would like to get this merged

@dmayle
Copy link
Contributor Author

dmayle commented Jan 3, 2020 via email

@simster7
Copy link
Member

simster7 commented Jan 3, 2020

No preference! Commits get squashed when merging to master anyway

@simster7 simster7 changed the title IgnoreNotFound prevents a workflow error when the resource does not exist feat: IgnoreNotFound prevents a workflow error when the resource does not exist Jan 6, 2020
@codecov
Copy link

codecov bot commented Jan 7, 2020

Codecov Report

Merging #1834 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1834   +/-   ##
=======================================
  Coverage   10.63%   10.63%           
=======================================
  Files          35       35           
  Lines       24992    24992           
=======================================
  Hits         2657     2657           
  Misses      21996    21996           
  Partials      339      339

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 87c497f...d03aa8d. Read the comment docs.

@dmayle
Copy link
Contributor Author

dmayle commented Jan 7, 2020

The argo CI check failed because of a transient network error connecting to Github. Is there anyway to retrigger just that check?

@simster7
Copy link
Member

simster7 commented Jan 7, 2020

Don't worry about the Argo CI check, we treat that one as flakey

@dmayle
Copy link
Contributor Author

dmayle commented Jan 7, 2020

I've had to rebase again onto master because of the generated workflow file.

@dmayle
Copy link
Contributor Author

dmayle commented Jan 9, 2020

Any thoughts?

@simster7
Copy link
Member

(@dmayle Just FYI, we're a bit busy with the upcoming API Server release and will get to this a couple of days after the release)

@dmayle
Copy link
Contributor Author

dmayle commented Jan 15, 2020

@simster7 thanks for the heads up!

@alexec alexec modified the milestones: v2.5, v2.6 Jan 27, 2020
@alexec
Copy link
Contributor

alexec commented Jan 31, 2020

@simster7 are you happy to be the assigned reviewer of this PR?
@dmayle you'll probably need to do a git merge master on the PR before review.

@alexec alexec removed this from the v2.6 milestone Feb 6, 2020
Copy link
Contributor

@alexec alexec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please can you do a git merge master?

@simster7 simster7 added this to the v2.6 milestone Feb 19, 2020
@alexec alexec removed this from the v2.6 milestone Feb 19, 2020
@CLAassistant
Copy link

CLA assistant check
All committers have signed the CLA.

@simster7
Copy link
Member

@dmayle Seems like #1779 solves the problem in this issue. Can you confirm? Closing for now

@simster7 simster7 closed this May 12, 2020
@dmayle
Copy link
Contributor Author

dmayle commented May 14, 2020

@simster7 #1779 allows me to pass the --ignore-not-found flag, but ExecResource() expects JSON output (doesn't allow empty output). I'll try to get around to a new patch based on this new version

@simster7
Copy link
Member

Thanks @dmayle. Sorry for the absolute crazy delay on this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants