Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kube 1.4 cherry picks #11642

Closed
wants to merge 26 commits into from
Closed

Conversation

soltysh
Copy link
Contributor

@soltysh soltysh commented Oct 28, 2016

Resovles #11565.

@liggitt @smarterclayton ptal
@pweil- @mfojtik fyi

The summary (master #PR: title (cherry-pick #PR) - why):

  • 33052: Bump cAdvisor Godeps version to official release tag - Only updated Godep.json to match 1.4 release, no actual change happened.
  • 33086: Fix possible panic in PodAffinityChecker (33166) - Small fix, but can save us from nil panic
  • 33170: Remove closing audit log file and add error check when writing to audit (33172) - Improve audit logging, especially when log operation fails
  • 33346: disallow user to update loadbalancerSourceRanges (33513) - Changes Service validation, not sure we want to wait, but there's a TODO about being
    able to modify that
  • 32807: Fix race condition in setting node statusUpdateNeeded flag (34038) - Race fix
  • 32914: Limit the number of names per image reported in the node status (34157) - Fix OOM master etcd on big clusters
  • 34000: Set deserialization cache size based on target memory usage (34213) - To reduce memory usage to reasonable levels in smaller clusters
  • 33796: Fix issue in updating device path when volume is attached multiple times (34215) - P0 fixing wrong AWS volume mount issue
  • 33735: Fixes in HPA: consider only running pods; proper denominator in avg (34224) - Update which pods are calculated for HPA
  • 33968: scheduler: initialize podsWithAffinity (34266) - Fix panic in scheduler
  • 34076: Remove headers that are unnecessary for proxy target (34280) - P1 issue
  • 34694: Handle DeletedFinalStateUnknown in NodeController (34803) - Fix panic in NodeController caused by receiving DeletedFinalStateUnknown object from the cache.
  • 34809: NodeController waits for informer sync before doing anything (34813) - P0 issue
  • 34251: Fix nil pointer issue when getting metrics from volume mounter (34296) - Possible nil pointer panic
  • 34851: Only wait for cache syncs once in NodeController (34861) - P0 in branch 1.4, P1 in master
  • 34895: Fix non-starting node controller in 1.4 branch - P0 in branch 1.4
  • 34955: HPA: fixed wrong count for target replicas calculations (35406) - P0 issue
  • 35071: Change merge key for VolumeMount to mountPath (35208) - Fixes VolumeMounts edit operation
  • 35273: Fixed mutation warning in Attach/Detach controller (35633) - Fixes cache mutation
  • 34368: Node status updater should SetNodeStatusUpdateNeeded if it fails to (35653) - Fix node status update

Not sure about:

  • 35205 - contains fixes to etcd3
  • 33699 - moves informer to client

@smarterclayton
Copy link
Contributor

Are any of those cherry picks not from 1.4?

On Fri, Oct 28, 2016 at 10:28 AM, Maciej Szulik notifications@github.com
wrote:

Resovles #11565 #11565.

@liggitt https://github.com/liggitt @smarterclayton
https://github.com/smarterclayton ptal
@pweil- https://github.com/pweil- @mfojtik https://github.com/mfojtik
fyi
The summary (master #PR: title (cherry-pick #PR) - why):

33052: Bump cAdvisor Godeps version to official release tag - Only
updated Godep.json to match 1.4 release, no actual change happened.

33086: Fix possible panic in PodAffinityChecker (33166) - Small fix,
but can save us from nil panic

33170: Remove closing audit log file and add error check when writing
to audit (33172) - Improve audit logging, especially when log operation
fails

33346: disallow user to update loadbalancerSourceRanges (33513) -
Changes Service validation, not sure we want to wait, but there's a TODO
about being
able to modify that

32807: Fix race condition in setting node statusUpdateNeeded flag
(34038) - Race fix

32914: Limit the number of names per image reported in the node status
(34157) - Fix OOM master etcd on big clusters

34000: Set deserialization cache size based on target memory usage
(34213) - To reduce memory usage to reasonable levels in smaller clusters

33796: Fix issue in updating device path when volume is attached
multiple times (34215) - P0 fixing wrong AWS volume mount issue

33735: Fixes in HPA: consider only running pods; proper denominator in
avg (34224) - Update which pods are calculated for HPA

33968: scheduler: initialize podsWithAffinity (34266) - Fix panic in
scheduler

34076: Remove headers that are unnecessary for proxy target (34280) -
P1 issue

34694: Handle DeletedFinalStateUnknown in NodeController (34803) - Fix
panic in NodeController caused by receiving DeletedFinalStateUnknown object
from the cache.

34809: NodeController waits for informer sync before doing anything
(34813) - P0 issue

34251: Fix nil pointer issue when getting metrics from volume mounter
(34296) - Possible nil pointer panic

34851: Only wait for cache syncs once in NodeController (34861) - P0
in branch 1.4, P1 in master

34895: Fix non-starting node controller in 1.4 branch - P0 in branch
1.4

34955: HPA: fixed wrong count for target replicas calculations (35406)

  • P0 issue

    35071: Change merge key for VolumeMount to mountPath (35208) - Fixes

    VolumeMounts edit operation

    35273: Fixed mutation warning in Attach/Detach controller (35633) -

    Fixes cache mutation

    34368: Node status updater should SetNodeStatusUpdateNeeded if it
    fails to (35653) - Fix node status update

Not sure about:

  • 35205 - contains fixes to etcd3
  • 33699 - moves informer to client

You can view, comment on, or merge this pull request online at:

#11642
Commit Summary

  • bump(github.com/google/cadvisor): 0cdf4912793fac9990de3790c27334
    2ec31817fb
  • UPSTREAM: 33086: Fix possible panic in PodAffinityChecker
  • UPSTREAM: 33170: Remove closing audit log file and add error check
    when writing to audit
  • UPSTREAM: 33346: disallow user to update loadbalancerSourceRanges
  • UPSTREAM: 32807: Fix race condition in setting node
    statusUpdateNeeded flag
  • UPSTREAM: 32914: Limit the number of names per image reported in the
    node status
  • UPSTREAM: 34000: Set deserialization cache size based on target
    memory usage
  • Deserialization cache changes
  • UPSTREAM: 33796: Fix issue in updating device path when volume is
    attached multiple times
  • UPSTREAM: 33735: Fixes in HPA: consider only running pods; proper
    denominator in avg
  • UPSTREAM: 33968: scheduler: initialize podsWithAffinity
  • UPSTREAM: 34076: Remove headers that are unnecessary for proxy target
  • UPSTREAM: 34694: Handle DeletedFinalStateUnknown in NodeController
  • UPSTREAM: 34809: NodeController waits for informer sync before doing
    anything
  • UPSTREAM: 34251: Fix nil pointer issue when getting metrics from
    volume mounter
  • UPSTREAM: 34851: Only wait for cache syncs once in NodeController
  • UPSTREAM: 34895: Fix non-starting node controller in 1.4 branch
  • UPSTREAM: 34955: HPA: fixed wrong count for target replicas
    calculations
  • UPSTREAM: 35071: Change merge key for VolumeMount to mountPath
  • UPSTREAM: 35273: Fixed mutation warning in Attach/Detach controller
  • UPSTREAM: 34368: Node status updater should
    SetNodeStatusUpdateNeeded if it fails to

File Changes

Patch Links:


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#11642, or mute the thread
https://github.com/notifications/unsubscribe-auth/ABG_p3H-V5YI0xSiIOqkDc2ADA8jVCajks5q4gaRgaJpZM4Kjf6r
.

@smarterclayton
Copy link
Contributor

[test]

@soltysh
Copy link
Contributor Author

soltysh commented Oct 28, 2016

Are any of those cherry picks not from 1.4?

Nope, just 1.4 branch.

@timothysc
Copy link

timothysc commented Oct 28, 2016

Just out of curiosity, does anything break if we just updated the godep ref? It seems like we're applying a wand here.

Also, a +1 on the etcd3 changes those are fixes to issues we had upstream.

@smarterclayton
Copy link
Contributor

I did a patch rebase but it was failing upstream e2e and is too risky right now. We have to rebase 90 patches upstream right now in order to pick a new bump (which if this had been started two weeks ago would have been the right choice).

@timothysc
Copy link

/cc @mffiedler

@derekwaynecarr
Copy link
Member

This needs to pull in kubernetes/kubernetes#33806 to fix #11199

@soltysh
Copy link
Contributor Author

soltysh commented Oct 31, 2016

@derekwaynecarr I've cherry picked that, but it was a bit tricky, mind having a look if that's ok?
@smarterclayton b0c9f3b and 9607748 are dropped (1st - manually reverted, 2nd - reverted during upgrading cAdvisor)
@bparees I had to revert s2i bump @PI-Victor did, just fyi since it was pointing to non-existent commit

@bparees
Copy link
Contributor

bparees commented Oct 31, 2016

@soltysh did you talk to @PI-Victor about that? those changes are important.

@soltysh soltysh force-pushed the kube_1_4_cherry_picks branch from ddf513b to 64a133a Compare October 31, 2016 12:54
@soltysh
Copy link
Contributor Author

soltysh commented Oct 31, 2016

@bparees yes, if there's a requirement to have those changes in, we need to create a branch and have a official commit that is reachable.

@soltysh
Copy link
Contributor Author

soltysh commented Oct 31, 2016

@derekwaynecarr I had to split the cadvisor cherry-pick into two, one is the bump and the other is that cherry-pick itself. (the commit checker was complaining otherwise).

@ncdc
Copy link
Contributor

ncdc commented Oct 31, 2016

I thought we stated that it was rather late to be pulling in 34000?

@soltysh
Copy link
Contributor Author

soltysh commented Oct 31, 2016

I thought we stated that it was rather late to be pulling in 34000

Haven't heard about it, but if you say so it can be easily dropped.

@ncdc
Copy link
Contributor

ncdc commented Oct 31, 2016

@soltysh see #10719 (comment)

@soltysh
Copy link
Contributor Author

soltysh commented Oct 31, 2016

@ncdc @mfojtik pointed me to the comment before about having those in, I'll blame him ;) but sure I can remove it.

@soltysh soltysh force-pushed the kube_1_4_cherry_picks branch from 64a133a to a802ca1 Compare October 31, 2016 15:45
@soltysh
Copy link
Contributor Author

soltysh commented Oct 31, 2016

I dropped 34000, updated client-go to latest 1.4 and updated generated clientset. Hopefully that should be good.

@derekwaynecarr
Copy link
Member

@soltysh -- the cadvisor bump looks good to me. @sjenning fyi this would fix your issue if you want to verify against this PR.

@sjenning
Copy link
Contributor

@derekwaynecarr on it

@sjenning
Copy link
Contributor

@derekwaynecarr I confirmed this does fix #11199

@soltysh soltysh force-pushed the kube_1_4_cherry_picks branch from a802ca1 to f7027fd Compare November 1, 2016 06:05
@soltysh
Copy link
Contributor Author

soltysh commented Nov 1, 2016

Rebased against latest master and fixed the errors that showed up in tests.

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to f7027fd

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10898/) (Base Commit: 63642d6)

@mwringe
Copy link
Contributor

mwringe commented Nov 1, 2016

Is this being tracking in a bugzilla for OCP anywhere?

@smarterclayton
Copy link
Contributor

smarterclayton commented Nov 1, 2016 via email

@ncdc
Copy link
Contributor

ncdc commented Nov 1, 2016

I'll start looking into the panics.

@ncdc
Copy link
Contributor

ncdc commented Nov 1, 2016

The panic is because our vendored copies of k8s.io/kubernetes/pkg/client/restclient.Config and k8s.io/client-go/1.4/rest.Config don't have the same fields. They're identical except the former adds 1: Timeout. And vendor/k8s.io/kubernetes/test/e2e/framework/framework.go#getClientRepoConfig expects the structs to match.

@timothysc
Copy link

ETA on fixes? This is blocking a bunch of stuff.

@ncdc
Copy link
Contributor

ncdc commented Nov 1, 2016

Working on it

@ncdc
Copy link
Contributor

ncdc commented Nov 1, 2016

This is why the former struct has Timeout and the latter doesn't: #11104

@ncdc ncdc mentioned this pull request Nov 1, 2016
@ncdc
Copy link
Contributor

ncdc commented Nov 1, 2016

Testing a fix in #11709

@smarterclayton
Copy link
Contributor

Moving to #11709

@soltysh soltysh deleted the kube_1_4_cherry_picks branch November 2, 2016 09:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants