pull-kubernetes-e2e-gce instantly failing all runs with python error #30759

Closed
liggitt opened this issue Sep 19, 2023 · 17 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@liggitt
Member

liggitt commented Sep 19, 2023

What happened:

looks like something just changed in test-infra and broke the pull-kubernetes-e2e-gce job:

Traceback (most recent call last):
  File "/workspace/./test-infra/jenkins/bootstrap.py", line 1204, in <module>
    bootstrap(ARGS)
  File "/workspace/./test-infra/jenkins/bootstrap.py", line 1072, in bootstrap
    logging.info('Builder: %s', node())
  File "/workspace/./test-infra/jenkins/bootstrap.py", line 590, in node
    os.environ[NODE_ENV] = urllib.request.urlopen(urllib.request.Request(
  File "/usr/lib/python3.9/os.py", line 684, in __setitem__
    value = self.encodevalue(value)
  File "/usr/lib/python3.9/os.py", line 756, in encode
    raise TypeError("str expected, not %s" % type(value).__name__)
TypeError: str expected, not bytes

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

seen in all runs of kubernetes/kubernetes#120755

Please provide links to example occurrences, if any:

Anything else we need to know?:

@liggitt liggitt added the kind/bug Categorizes issue or PR as related to a bug. label Sep 19, 2023
@liggitt
Member Author

liggitt commented Sep 19, 2023

/priority critical-urgent

@k8s-ci-robot
Contributor

There are no sig labels on this issue. Please add an appropriate label by using one of the following commands:

  • /sig <group-name>
  • /wg <group-name>
  • /committee <group-name>

Please see the group list for a listing of the SIGs, working groups, and committees available.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Sep 19, 2023
@aojea
Member

aojea commented Sep 19, 2023

@ameukam it seems the images were bumped recently on the jobs; could this be related to one of the changes in #30695?

@ameukam
Member

ameukam commented Sep 19, 2023

> @ameukam it seems the images were bumped recently on the jobs; could this be related to one of the changes in #30695?

@aojea I don't think it's related. #30695 triggered the build of new images, but the failing test still uses gcr.io/k8s-staging-test-infra/kubekins-e2e:v20230727-ea685f8747-master (from https://prow.k8s.io/prowjob?prowjob=eaf97c3f-3b14-4789-a02c-1a468cb2e565)

@kannon92
Contributor

It impacts a lot more jobs than just e2e-gce.

I saw it this AM when I was trying to test new presubmits I added.

/test pull-crio-cgroupv1-node-e2e-eviction
/test pull-crio-cgroupv1-node-e2e-features
/test pull-crio-cgroupv1-node-e2e-hugepages
/test pull-crio-cgroupv1-node-e2e-resource-managers

Those are some examples of jobs that failed. I also see node-e2e-containerd failing now. It seems to affect any job that uses the bootstrap.py image.

@liggitt
Member Author

liggitt commented Sep 19, 2023

the testgrid is not reflecting all the failures, this appears to be failing on all PRs

@liggitt
Member Author

liggitt commented Sep 19, 2023

@rphillips
Member

@liggitt attached a PR that might fix this issue, but there might be other locations that break.

@BenTheElder
Member

bootstrap.py jobs will clone test-infra so changes to the scripts will be picked up immediately (yes, it's terrible, and deprecated)

changes to the image should be controlled by the tags

@BenTheElder
Member

first failing run I see with this failure is at 11:07AM(ET): https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/120755/pull-kubernetes-e2e-gce/1704150135245639680

That run appears to have gcr.io/k8s-staging-test-infra/kubekins-e2e:v20230727-ea685f8747-master => ea685f8747

The bootstrap script itself hasn't had a commit for 6 months, did we change one of the job parameters, maybe in a preset or similar?

Or actually, maybe something changed with the compute metadata service?

@dims
Member

dims commented Sep 19, 2023

here's how i could recreate the problem

export IMAGE=gcr.io/k8s-staging-test-infra/kubekins-e2e:v20230727-ea685f8747-master
docker run --privileged --rm   --entrypoint=/bin/bash   -it   $IMAGE -c "/bin/bash"
python3
>>> import urllib.request, urllib.error, urllib.parse
>>> import os
>>> x = urllib.request.urlopen(urllib.request.Request('http://www.google.com')).read()
>>> print(type(x))
<class 'bytes'>
>>> os.environ['test']=x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/os.py", line 684, in __setitem__
    value = self.encodevalue(value)
  File "/usr/lib/python3.9/os.py", line 756, in encode
    raise TypeError("str expected, not %s" % type(value).__name__)
TypeError: str expected, not bytes
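
For anyone landing here later: the repro above boils down to `read()` returning `bytes` while `os.environ` only accepts `str` in Python 3. Below is a minimal sketch of the failure and one way around it, using a literal bytes value in place of a real HTTP response. The variable name `EXAMPLE_NODE_NAME` and the value are made up for illustration, and this is not necessarily what the fix PR does.

```python
import os

# Stand-in for urllib.request.urlopen(...).read(), which returns bytes in Python 3.
raw = b"example-node-name\n"

failure = None
try:
    os.environ["EXAMPLE_NODE_NAME"] = raw  # hypothetical variable name
except TypeError as err:
    failure = str(err)  # same message as in the traceback above

# Decode (and strip trailing whitespace) before storing the value.
os.environ["EXAMPLE_NODE_NAME"] = raw.decode("utf-8").strip()
```

Decoding at the point where the response is read keeps the rest of the script working with `str` only.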

@rphillips
Member

Kicked off a test here. Seems to be running.

@BenTheElder
Member

(#30763 should be live in bootstrap.py jobs now and per above comment seems to be working)

@liggitt
Member Author

liggitt commented Sep 19, 2023

confused about why but glad #30763 seems to have worked

@dims
Member

dims commented Sep 19, 2023

#30763 is the right thing to land; long term, that's how the code should work in python3. we were probably lucky that it worked this long

@liggitt
Member Author

liggitt commented Sep 19, 2023

resolving this as the tests are progressing now, thanks for the fix... $1 for tapping with a hammer, $999 for knowing where to tap

@liggitt liggitt closed this as completed Sep 19, 2023
@BenTheElder
Member

I suspect either:

  • The metadata service response changed (maybe the HTTP headers) and caused this path to no longer implicitly decode as UTF-8. I could imagine this URL fetch method internally decodes based on sniffing headers etc.
  • or less likely: We stopped setting NODE_ENV somewhere causing that codepath to run (...unlikely, but lots to dig through to confirm)
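
Either way, a lookup that decodes explicitly would not depend on which of the two happened. Below is a rough sketch of that shape, not the actual bootstrap.py code: `fetch_name` is a hypothetical callable standing in for the metadata request, the value `"NODE_NAME"` for `NODE_ENV` is an assumption (the thread only shows the constant's name), and the `environ` parameter exists only so the sketch can be exercised without touching the real environment.

```python
import os

NODE_ENV = "NODE_NAME"  # assumed value; only the constant's name appears in the traceback


def node(fetch_name, environ=os.environ):
    """Return the node name, fetching it only when NODE_ENV is unset.

    fetch_name is a hypothetical zero-argument callable that returns the
    raw response body (bytes) or an already decoded str.
    """
    if NODE_ENV not in environ:
        value = fetch_name()
        # Decode explicitly instead of relying on any implicit conversion.
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        environ[NODE_ENV] = value.strip()
    return environ[NODE_ENV]


# Usage: the fetch runs only when the variable is missing.
fresh_env = {}
fetched = node(lambda: b"vm-1\n", fresh_env)
preset = node(lambda: b"ignored", {NODE_ENV: "preset"})
```

When `NODE_ENV` is already set the fetch is skipped entirely, which is why a change in whether it gets set could expose the bytes assignment just as a change in the response could.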

8 participants