-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fixing nightly unit test failure #3160
Conversation
Unless service account is explicitly set in taskrun spec, by default, service account is set to "default".
@afrittoli @adshmh we could also look into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error is intermittent and time plays role here since the other unit tests in taskrun_test.go are creating default service account (and never deleted). If this unit test TestReconcile_ExplicitDefaultSA is the first one to start, we are missing default service account and fails with this kind of error.
Great catch @pritidesai !!!
serviceaccounts "default" not found
I want to dig into this a little bit:
- I don't quite understand why "default" wouldn't exist - afaik there is always a "default" serviceaccount in every namespace (i think?)
- The error says "default" was not found, but the change in the PR changes the test to use "default" instead of "pipelines"
I think this makes sense if the problem was that that "pipelines" service account didnt existing, but the error above looks like it's the other way around - could you explain a bit more?
I also don't totally understand how this could have ever worked since I don't see anything creating the "pipelines" service account either 🤔
@@ -359,7 +359,7 @@ func TestReconcile_ExplicitDefaultSA(t *testing.T) { | |||
tb.TaskRunServiceAccountName("test-sa"), | |||
)) | |||
taskruns := []*v1beta1.TaskRun{taskRunSuccess, taskRunWithSaSuccess} | |||
defaultSAName := "pipelines" | |||
defaultSAName := "default" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at the PR that added this change (#1227) and trying to understand the intent, i think the goal was to explicitly set the serviceaccount to something other than default, is that your understanding as well? (or maybe it just needs to be something other than "tekton" https://github.com/tektoncd/pipeline/blob/master/pkg/apis/config/testdata/config-defaults.yaml#L22 ?)
it looks like the default value is "default" so i think by changing this to "default" we might be removing the thing this test is meant to cover
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup thats my understanding as well
This particular unit test is testing two things:
(1) taskrun without having specified service account in the spec unlike the next test case in the same function
(2) changing default service account through config map
Trying to think if it's possible to separate these two.
And I am not sure if we could be removing the thing this test is meant to cover because default value is tekton
according to testdata
and irrespective of what it is set to (other than tekton
), the same value is set in config-map and sent to controller. 🤔
But the value(tekton
) in testdata
is not reflected as default service account either since Scott's error message says the taskrun controller is looking for default
service account instead of tekton
when task spec is missing service account.
I am still looking into this and would appreciate input from config map experts 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran TestReconcile which has a list of tasks without any service account set in the task spec. So taskrun
controller is relying on default service account. The test is creating service account as part of the fake client. Now, this test runs fine which means default service account is default
and not tekton
for taskrun controller and changing this to zoo
in fake client, results in failure:
{"level":"error","ts":1599113959.2378,"logger":"fallback","caller":"taskrun/taskrun.go:377","msg":"Failed to create task run pod for taskrun \"test-taskrun-run-success\": failed to create task run pod \"test-taskrun-run-success\": translating TaskSpec to Pod: serviceaccounts \"default\" not found. Maybe missing or invalid Task foo/test-task", ...
Tracing this error takes us to the creds init where serviceaccount
is hardcoded to default
.
Should this be read from config-map or somewhere else instead of having it hardcoded to default
? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am working on getting this hard coded default out of creds_init.go
👩💻 🤞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have created PR #3168 to remove this hard coded default and also add it to the pipelinerun/taskrun spec
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the problem is the hardcoded default
serviceaccount (although, it would be nice if it wasn't harcoded 😛 )
This unit test creates
The unit test is creating |
@pritidesai i am still confused but I think it's just because I need to take some time to understand this and it's the evening before a long weekend and I just can't do it! Apologies for my confusion and weekend brain! I'll try to summarize it again but you've probably already addressed it:
I think it makes sense to me that the default service account would exist on a running instance, but not in this mock environment, and i can see how the order of test execution could make that sometimes not get created, i just cant understand how the change in the PR goes along with that 😦 (and you've probably explained it already!!) Happy to defer to you in the meantime and I appreciate your thorough analysis! I might be back with more questions later but in the meantime: /approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bobcatfish The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
No worries @bobcatfish yup I can understand it being so tricky and appreciate you following the entire analysis. This screen shot from Scott's error capture might be useful understanding and feel free to hold this unless you are sure. Even when the Since I am not able to reproduce this, I am feeding off of this error capture and changing it to I have opened another PR # 3168 related to the findings from this effort. PR #3168 is implementing changes to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think this is not the right fix 🙃
As far as I understand, we do create the serviceAccount
during the tests. One thing that we do not do (and that we do when we setup tasks, …) is to make sure the informers are up-to-date too.
I take a quick look at it, and this is the "fix" I came up with master...vdemeester:3160-sa-unit-failure (@bobcatfish @pritidesai)
@@ -359,7 +359,7 @@ func TestReconcile_ExplicitDefaultSA(t *testing.T) { | |||
tb.TaskRunServiceAccountName("test-sa"), | |||
)) | |||
taskruns := []*v1beta1.TaskRun{taskRunSuccess, taskRunWithSaSuccess} | |||
defaultSAName := "pipelines" | |||
defaultSAName := "default" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the problem is the hardcoded default
serviceaccount (although, it would be nice if it wasn't harcoded 😛 )
Thanks @vdemeester I am closing this for now assuming this is fixed with PR #3189 🙏 |
Changes
Unless service account is explicitly set in taskrun spec, by default, service account is set to "default".
Scott's error capture says:
test-taskrun-run-success
taskrun has no service account set. Later in the config map,default-service-account
is set topipelines
. But if service account is not specified in thetaskrun
spec, task run controller sets the service account todefault
which is clear from the error message where controller is looking for service accountdefault
instead ofpipelines
.It can also mean that the config map settings/updates done in the unit test are not visible to taskrun controller which needs fix.
This error is intermittent and time plays role here since the other unit tests in
taskrun_test.go
are creatingdefault
service account (and never deleted). If this unit testTestReconcile_ExplicitDefaultSA
is the first one to start, we are missingdefault
service account and fails with this kind of error.Fixes #2815
/cc @bobcatfish @sbwsg
/kind bug
Submitter Checklist
These are the criteria that every PR should meet, please check them off as you
review them:
See the contribution guide for more details.
Double check this list of stuff that's easy to miss:
cmd
dir, please updatethe release Task to build and release this image.
Reviewer Notes
If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.
Release Notes