Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore Upgrade Test Scenario2 by creating simple Task and Pipeline resources #6855

Merged
merged 1 commit into from
Jul 10, 2023

Conversation

JeromeJu
Copy link
Member

@JeromeJu JeromeJu commented Jun 21, 2023

Changes

Scenario 2 of the upgrade test aims to install pipeline server from the
preivous release, create resources from that version and test on those
resources against the client of the current release.

This commit restores the resources creation of simple Task and Pipeline
resources to enable the 2nd scenario of the upgrade test to work by
applying the resources from the old server version of pipeline. It was
preivously renamed in #1351 and then removed in #2685.

It fixes the missing piece where the required resources from the previous
release have not been created to accomplish scenario 2 of the upgrade
test. More specifically, it creates a simple Task and Pipeline resources
and then runs the two simple resources with a TaskRun and PipelineRun
created after upgrading to the current version and verify that they both
run successfully.

/kind misc
Closes #6868

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • Has Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

NONE

@tekton-robot tekton-robot added kind/misc Categorizes issue or PR as a miscellaneuous one. release-note-none Denotes a PR that doesnt merit a release note. labels Jun 21, 2023
@tekton-robot tekton-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jun 21, 2023
@lbernick lbernick self-assigned this Jun 21, 2023
@Yongxuanzhang
Copy link
Member

maybe add a link to the issue/discussion?

Copy link
Member

@lbernick lbernick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been tested locally? When I tried running the upgrade tests with these changes, I saw:

==== APPLYING THE RESOURCES TASKRUNS ====
=========================================
>> Applying the resource taskrun
find: ‘/usr/local/google/home/leebernick/tekton/pipeline/examples/taskruns/’: No such file or directory
=============================================
==== APPLYING THE RESOURCES PIPELINERUNS ====
=============================================
>> Applying the resource pipelinerun
find: ‘/usr/local/google/home/leebernick/tekton/pipeline/examples/pipelineruns/’: No such file or directory

Building on Yongxuan's feedback, I think a more descriptive commit message and title would be helpful here. Example title: "Restore missing coverage of upgrading Pipelines version with stored crds in upgrade tests".

Some helpful info in the commit message would be:

  • a description of the user journey we're testing (a user has an existing cluster with pipelines installed and crds created, and wants to upgrade their pipelines version)
  • an explanation for why this piece of functionality got removed (linking to PRs would be helpful here), and why the existing behavior is incorrect

Also, I'd label this a "bug" not "misc" fwiw

echo ">> Applying the resource ${resource}"
# Applying the resources, either *taskruns or * *pipelineruns
for file in $(find ${REPO_ROOT_DIR}/examples/${resource}s/ -name *.yaml | sort); do
perl -p -e 's/gcr.io\/christiewilson-catfactory/$ENV{KO_DOCKER_REPO}/g' ${file} | ko apply -f - || return 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this should have an exit status 1 if it doesn't succeed, and the script is run with set +o errexit, but when I ran this, the script didn't exit when this step failed. Do you know why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the check for the path that we are iterating before the for loop.

Particularly I used v1 examples to avoid creating resources that could already exist in the same namespace.

@JeromeJu JeromeJu force-pushed the upgrade-test-scenario2 branch 2 times, most recently from 948a367 to c0c7d9c Compare June 22, 2023 20:05
@tekton-robot tekton-robot added kind/bug Categorizes issue or PR as related to a bug. and removed kind/misc Categorizes issue or PR as a miscellaneuous one. labels Jun 22, 2023
@JeromeJu
Copy link
Member Author

Example gist: https://gist.github.com/JeromeJu/86e7b96e2716db77ee3b79903527c0cb

Some of the tests failed as expected ie. pipelinerun.yaml because the KO_DOCKER_REPO is not set the same as in CI because we need a kind cluster locally.

Copy link
Member

@lbernick lbernick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on your gist, it looks like the script is still not working as intended. I see

=========================================
==== APPLYING THE RESOURCES TASKRUNS ====
=========================================
>> Applying the resource taskrun
error: error when deleting "STDIN": resource name may not be empty
Error: exit status 1
2023/06/22 19:16:52 error during command execution:exit status 1
taskrun.tekton.dev/tkn-arg-test-d7st5 created
error: error when deleting "STDIN": resource name may not be empty
Error: exit status 1
2023/06/22 19:16:52 error during command execution:exit status 1
taskrun.tekton.dev/array-with-default-xfb59 created
secret "ssh-key-for-git" deleted
serviceaccount "ssh-key-service-account" deleted
Error from server (NotFound): error when deleting "STDIN": taskruns.tekton.dev "authenticating-git-commands" not found
Error: exit status 1
....

I think we want the script to fail if any of these steps fail, rather than relying on specific checks. This will make failures more clear, rather than having to inspect logs to determine success/failure. You can do this with set -e: https://www.tutorialspoint.com/aborting-a-shell-script-on-any-command-fails

@JeromeJu
Copy link
Member Author

Based on your gist, it looks like the script is still not working as intended. I see

=========================================
==== APPLYING THE RESOURCES TASKRUNS ====
=========================================
>> Applying the resource taskrun
error: error when deleting "STDIN": resource name may not be empty
Error: exit status 1
2023/06/22 19:16:52 error during command execution:exit status 1
taskrun.tekton.dev/tkn-arg-test-d7st5 created
error: error when deleting "STDIN": resource name may not be empty
Error: exit status 1
2023/06/22 19:16:52 error during command execution:exit status 1
taskrun.tekton.dev/array-with-default-xfb59 created
secret "ssh-key-for-git" deleted
serviceaccount "ssh-key-service-account" deleted
Error from server (NotFound): error when deleting "STDIN": taskruns.tekton.dev "authenticating-git-commands" not found
Error: exit status 1
....

I think we want the script to fail if any of these steps fail, rather than relying on specific checks. This will make failures more clear, rather than having to inspect logs to determine success/failure. You can do this with set -e: https://www.tutorialspoint.com/aborting-a-shell-script-on-any-command-fails

I totally agree with this that we want the test to fail once any step fails and wouldn't want any red herring in the logs.

But I think these logs come from the ko delete -f - I added at line 186 just to make sure there were no previous examples happen to be created in the namespace then the correct version of resources couldn't get created. These error messages are indicating the resources are not existent in the namespace for the upgrade test, I think they could be ignored.

If this makes sense I hope adding the doc string could help resolve this confusion here?

@JeromeJu JeromeJu force-pushed the upgrade-test-scenario2 branch 2 times, most recently from 1ed71ee to 1054d90 Compare June 23, 2023 12:40
@tekton-robot tekton-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 23, 2023
@lbernick
Copy link
Member

I totally agree with this that we want the test to fail once any step fails and wouldn't want any red herring in the logs.

But I think these logs come from the ko delete -f - I added at line 186 just to make sure there were no previous examples happen to be created in the namespace then the correct version of resources couldn't get created.

It sounds like you're trying to make sure resources get cleaned up between runs of the test, so each local run starts with a clean slate. This is a good idea, but this isn't how I'd recommend doing it. Instead, it would be better to have a function that cleans up any resources that were created when the script exits, regardless of whether it exited with success or failure. Here's an example of where we use trap to undo setup steps when a script exits.

These error messages are indicating the resources are not existent in the namespace for the upgrade test, I think they could be ignored. If this makes sense I hope adding the doc string could help resolve this confusion here?

It's not beneficial to have lines in any script that don't execute successfully. Take a look at the section of this bash cheatsheet titled "Writing robust scripts and debugging". If commands don't execute successfully (i.e. exit status 0), we'd prefer a fail-fast behavior of exiting the script immediately and failing tests. Exiting as soon as an error occurs makes subsequent steps easier to debug, and makes it clear (via test failure) that there is a problem with the script. I don't think it's realistic here to document what error messages the tests are supposed to output, and for someone to have to read through test logs and documentation to understand if the test is working correctly or not.

Let me know if you'd like to pair on this script together, and please address all previous feedback (incl. commit message + PR title) before requesting another round of review.

@tekton-robot tekton-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 28, 2023
@JeromeJu
Copy link
Member Author

JeromeJu commented Jun 28, 2023

Thanks @lbernick for the pointers!

I've added the gist at https://gist.github.com/JeromeJu/92058e20dc2a15ac7a470c71248bd211 for the scenario ii part.
Created a simple Task and Pipeline in the upgrade directory within the upgrade namespace because the verify_pipeline_installation would delete all pipeline CRDs in the current namespace. Also left them there as these might be able to test against the case mentioned and discussed here for a simple Task/ Pipeline to test when we are testing the old server version #5784 (comment)

After some attempts I also felt that it might be easier to keep the whole script in shell script for now and migrate it to go in a whole due to the reasons that there are helpers in the vendor library i.e.

.

@lbernick
Copy link
Member

/hold
I'd like #6871 to be merged first, as it will give us more confidence in the changes we're making to these tests.

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 28, 2023
Copy link
Member

@lbernick lbernick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Jerome, this gist LGTM!

test/e2e-tests-upgrade.sh Outdated Show resolved Hide resolved
test/upgrade/simpleResources.yaml Outdated Show resolved Hide resolved
test/upgrade/simpleResources.yaml Outdated Show resolved Hide resolved
test/upgrade/simpleResources.yaml Show resolved Hide resolved
test/e2e-tests-upgrade.sh Outdated Show resolved Hide resolved
@JeromeJu JeromeJu force-pushed the upgrade-test-scenario2 branch 2 times, most recently from ea47654 to 3ee0608 Compare June 28, 2023 19:34
Copy link
Member

@lbernick lbernick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Jerome, can you please address the feedback about the PR title? Also, I think the PR description references #2685, but the missing function was actually renamed in #1351

test/e2e-tests-upgrade.sh Outdated Show resolved Hide resolved
test/upgrade/simpleResources.yaml Outdated Show resolved Hide resolved
test/upgrade/simpleResources.yaml Show resolved Hide resolved
@JeromeJu JeromeJu changed the title Restore apply_resources for scenario2 of upgrade test Restore Upgrade Test Scenario2 by creating resources Jun 29, 2023
@JeromeJu JeromeJu changed the title Restore Upgrade Test Scenario2 by creating resources Restore Upgrade Test Scenario2 by creating simple Task and Pipeline resources Jun 29, 2023
@JeromeJu JeromeJu force-pushed the upgrade-test-scenario2 branch 4 times, most recently from 61f3ad7 to 21419ba Compare June 29, 2023 18:14
test/e2e-tests-upgrade.sh Outdated Show resolved Hide resolved
@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lbernick

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 30, 2023
…esources

Scenario 2 of the upgrade test aims to install pipeline server from the
preivous release, create resources from that version and test on those
resources against the client of the current release.

This commit restores the resources creation of simple Task and Pipeline
resources to enable the 2nd scenario of the upgrade test to work by
applying the resources from the old server version of pipeline. It was
preivously renamed in tektoncd#1351 and then removed in tektoncd#2685.

It fixes the missing piece where the required resources from the previous
release have not been created to accomplish scenario 2 of the upgrade
test. More specifically, it creates a simple Task and Pipeline resources
and then runs the two simple resources with a TaskRun and PipelineRun
created after upgrading to the current version and verify that they both
run successfully.

/kind misc
Closes tektoncd#6868
@JeromeJu
Copy link
Member Author

JeromeJu commented Jul 7, 2023

/unhold
since #6871 has merged

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 7, 2023
@chitrangpatel
Copy link
Contributor

/assign

@chitrangpatel
Copy link
Contributor

Thanks @JeromeJu !

@chitrangpatel
Copy link
Contributor

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 10, 2023
@tekton-robot tekton-robot merged commit 161ff7d into tektoncd:main Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesnt merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing coverage of stored resources in upgrade tests
5 participants