[e2e failure] [sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] [Serial] [Slow] ReplicaSet Should scale ... #54574
@kubernetes/sig-autoscaling-test-failures
/priority failing-test
@DirectXMan12 the moment this test started failing coincides exactly with the merging of #53743, which BTW is a very large commit to HPA that was not tagged with sig-autoscaling and therefore slipped past us completely. I think we should revert #53743 for now and merge it again after fixing it. cc: @mwielgus
I apologize for missing the SIG autoscaling label (although I'm surprised that the bot didn't complain about it. Perhaps because I'm the one who submitted it?). I'll track down why it's failing.
found the issue. When you write your scaleTargetRef, it's important to actually specify an `apiVersion`.
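For illustration, a minimal Go sketch of an HPA whose `scaleTargetRef` carries an explicit `apiVersion` (the object names here are hypothetical, not taken from the actual e2e code):

```go
package main

import (
	"fmt"

	autoscalingv1 "k8s.io/api/autoscaling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// A fully specified scaleTargetRef: APIVersion, Kind, and Name all set.
	// Leaving APIVersion empty is what broke these e2es once the polymorphic
	// scale client started resolving the scale subresource from it.
	hpa := &autoscalingv1.HorizontalPodAutoscaler{
		ObjectMeta: metav1.ObjectMeta{Name: "example-hpa", Namespace: "default"},
		Spec: autoscalingv1.HorizontalPodAutoscalerSpec{
			ScaleTargetRef: autoscalingv1.CrossVersionObjectReference{
				APIVersion: "apps/v1beta2", // must not be left empty
				Kind:       "ReplicaSet",
				Name:       "example-rs",
			},
			MaxReplicas: 5,
		},
	}
	fmt.Printf("scaling %s %s/%s\n",
		hpa.Spec.ScaleTargetRef.APIVersion,
		hpa.Spec.ScaleTargetRef.Kind,
		hpa.Spec.ScaleTargetRef.Name)
}
```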
will have a PR in a couple minutes
EDIT: @liggitt correctly pointed out that I misread things, and that …
PR posted ^ |
PRs continue to await review
Automatic merge from submit-queue (batch tested with PRs 53645, 54734, 54586, 55015, 54688). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

**Fix Incorrect Scale Subresources and HPA e2e ScaleTargetRefs**

The HPA e2es failed to actually set `apiVersion` on the created HPAs, which was previously ignored. Since the polymorphic scale client was merged, this behavior is no longer tolerated (it was never correct to begin with, but it accidentally worked). Additionally, the `apps` resources have their own version of scale. Until `apps/v1beta1` and `apps/v1beta2` go away, we need to support those versions in the scale client. Together, these broke some of the HPA e2es.

Fixes #54574

```release-note
NONE
```
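As a rough illustration of the versioned-scale point above (a sketch under assumptions, not the actual scale-client code): each group/version that serves a scale subresource returns its own `Scale` kind, so a polymorphic client has to map group-versions to scale kinds along these lines:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
)

func main() {
	// Hypothetical mapping: which Scale kind each group/version serves.
	// apps/v1beta1 and apps/v1beta2 each define their own Scale type, so a
	// polymorphic scale client cannot assume autoscaling/v1 Scale everywhere.
	scaleKinds := map[schema.GroupVersion]schema.GroupVersionKind{
		{Group: "autoscaling", Version: "v1"}: {Group: "autoscaling", Version: "v1", Kind: "Scale"},
		{Group: "apps", Version: "v1beta1"}:   {Group: "apps", Version: "v1beta1", Kind: "Scale"},
		{Group: "apps", Version: "v1beta2"}:   {Group: "apps", Version: "v1beta2", Kind: "Scale"},
	}
	for gv, gvk := range scaleKinds {
		fmt.Printf("%s -> %s\n", gv, gvk)
	}
}
```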
/reopen Unless we decide to punt that job from release-master-blocking, this is now impacting 1.9.0-alpha.3 (kubernetes/sig-release#27)
/kind bug
/status approved-for-milestone
I'm also seeing this on our OS image validation testgrid: https://k8s-testgrid.appspot.com/sig-node-cos-image#e2e-gce-cosbeta-k8sdev-serial
Looking into the current set of failures
Looking at the failure logs, I'm seeing …
So, I tried reproducing locally (provider=local, hack/local-up-cluster.sh), and I cannot. HPA seems to be able to fetch scale properly for replicasets and deployments in a vanilla fresh cluster-up environment. Is there something special about the way we stand up those test environments?
Going to investigate this too.
@spiffxp this appears to be passing now. As highlighted by @MaciekPytel on slack/sig-autoscaling, #55413 might be significant here.
/remove-priority critical-urgent This is still affecting some upgrade tests, which I'm not actively watching yet. Once we hit code freeze, I will be watching them and will bump priority accordingly. Does something need to be cherry-picked into the release-1.8 branch?
@dims: GitHub didn't allow me to assign the following users: frobware. Note that only kubernetes members can be assigned. In response to this: …
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/close |
/reopen
e.g. triage cluster b75045e2cb613e12dca1
@DirectXMan12 @frobware (taking a total guess) are there fixes that need to be cherry-picked into release-1.8? tracking this against v1.9.0-beta.1 (kubernetes/sig-release#34)
/remove-priority important-soon
@spiffxp I'd guess that we'd have to cherry-pick the test suite fixes back to the 1.8 test suite if you've got instances of the 1.8 test suite running against 1.9 code.
The fix needed should be #54586. Let me try and repro locally (1.9 cluster, 1.8 tests) and see what happens.
I've reproduced locally. The backport seems to fix the issue (just doing one final test run). Should have a PR up shortly.
Now tracking against v1.9.0-beta.2 (kubernetes/sig-release#39) |
Automatic merge from submit-queue.

**[e2e] make sure to specify APIVersion in HPA tests**

Previously, the HPA controller ignored APIVersion when resolving the scale subresource for a kind, meaning that if it was set incorrectly in the HPA's scaleTargetRef, it would not matter. This was the case for several of the HPA e2e tests. Since the polymorphic scale client merged into Kubernetes 1.9, and we need to do upgrade testing, APIVersion now matters. This updates the HPA e2es to care about APIVersion by passing kind as a full GroupVersionKind, and not just a string.

Fixes #54574 (again)

```release-note
NONE
```
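A minimal sketch of what that change amounts to (hypothetical names, assuming the test helper previously took a bare kind string):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
)

func main() {
	// Before: the tests passed kind as a bare string ("ReplicaSet"), leaving
	// apiVersion unset. After: a full GroupVersionKind, from which both the
	// apiVersion and kind fields of the scaleTargetRef can be derived.
	gvk := schema.GroupVersionKind{Group: "apps", Version: "v1beta2", Kind: "ReplicaSet"}
	apiVersion, kind := gvk.ToAPIVersionAndKind()
	fmt.Printf("scaleTargetRef: apiVersion=%s kind=%s\n", apiVersion, kind)
}
```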
[MILESTONENOTIFIER] Milestone Issue Needs Attention
@DirectXMan12 @spiffxp @kubernetes/sig-autoscaling-misc
Action required: During code freeze, issues in the milestone should be in progress. This issue has not been updated since Dec 1. Please provide an update.
/close |
/priority critical-urgent
/sig autoscaling
This test case started failing recently and affects a number of jobs: triage report
This is affecting multiple jobs on the release-master-blocking testgrid dashboard, and prevents us from cutting 1.9.0-alpha.2 (kubernetes/sig-release#22). Is there work ongoing to bring this job back to green?
triage cluster b75045e2cb613e12dca1
Suspect range from gci-gce-serial: 060b4b8...51244eb
Suspect range from gci-gke-serial: b1e2d7a...82a52a9