
Test flake: openshift router metrics The HAProxy router [It] should expose the profiling endpoints #15356

Closed
sosiouxme opened this issue Jul 19, 2017 · 8 comments
Labels: component/routing, kind/bug, kind/test-flake, priority/P1

sosiouxme (Member) commented Jul 19, 2017

The last three openshift-ansible merge tests all failed the same way, so this should maybe be kind/test-broken:

1 2 3

[Fail] [Conformance][networking][router] openshift router metrics The HAProxy router [It] should expose the profiling endpoints 
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/router/metrics.go:183
sosiouxme added the kind/test-flake label Jul 19, 2017
sosiouxme (Member, Author) commented:

------------------------------
[Conformance][networking][router] openshift router metrics The HAProxy router 
  should expose the profiling endpoints
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/router/metrics.go:184

[BeforeEach] [Top Level]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/test.go:52
[BeforeEach] [Conformance][networking][router] openshift router metrics
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:130
STEP: Creating a kubernetes client
Jul 19 15:29:40.595: INFO: >>> kubeConfig: /etc/origin/master/admin.kubeconfig
STEP: Building a namespace api object
Jul 19 15:29:40.621: INFO: configPath is now "/tmp/extended-test-router-metrics-j471x-c62cz-user.kubeconfig"
Jul 19 15:29:40.621: INFO: The user is now "extended-test-router-metrics-j471x-c62cz-user"
Jul 19 15:29:40.621: INFO: Creating project "extended-test-router-metrics-j471x-c62cz"
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [Conformance][networking][router] openshift router metrics
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/router/metrics.go:65
[It] should expose the profiling endpoints
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/router/metrics.go:184
Jul 19 15:29:40.728: INFO: Creating new exec pod
STEP: preventing access without a username and password
Jul 19 15:29:48.748: INFO: Running '/data/src/github.com/openshift/origin/_output/local/bin/linux/amd64/kubectl --server=https://ip-172-18-2-207.ec2.internal:8443 --kubeconfig=/etc/origin/master/admin.kubeconfig exec --namespace=extended-test-router-metrics-j471x-c62cz execpod -- /bin/sh -c curl -s -o /dev/null -w '%{http_code}' "http://172.18.2.207:1936/debug/pprof/heap"'
Jul 19 15:29:49.605: INFO: stderr: ""
STEP: at /debug/pprof
Jul 19 15:29:49.605: INFO: Running '/data/src/github.com/openshift/origin/_output/local/bin/linux/amd64/kubectl --server=https://ip-172-18-2-207.ec2.internal:8443 --kubeconfig=/etc/origin/master/admin.kubeconfig exec --namespace=extended-test-router-metrics-j471x-c62cz execpod -- /bin/sh -c curl -s -u admin:ssZ4KvoRn1 "http://172.18.2.207:1936/debug/pprof/heap"'
Jul 19 15:29:49.963: INFO: stderr: ""
[AfterEach] [Conformance][networking][router] openshift router metrics
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:131
STEP: Collecting events from namespace "extended-test-router-metrics-j471x-c62cz".
STEP: Found 6 events.
Jul 19 15:29:49.978: INFO: At 2017-07-19 15:29:40 -0400 EDT - event for execpod: {default-scheduler } Scheduled: Successfully assigned execpod to ip-172-18-2-207.ec2.internal
Jul 19 15:29:49.978: INFO: At 2017-07-19 15:29:40 -0400 EDT - event for execpod: {kubelet ip-172-18-2-207.ec2.internal} SuccessfulMountVolume: MountVolume.SetUp succeeded for volume "default-token-8fvrc" 
Jul 19 15:29:49.978: INFO: At 2017-07-19 15:29:42 -0400 EDT - event for execpod: {kubelet ip-172-18-2-207.ec2.internal} Pulling: pulling image "gcr.io/google_containers/hostexec:1.2"
Jul 19 15:29:49.978: INFO: At 2017-07-19 15:29:46 -0400 EDT - event for execpod: {kubelet ip-172-18-2-207.ec2.internal} Pulled: Successfully pulled image "gcr.io/google_containers/hostexec:1.2"
Jul 19 15:29:49.978: INFO: At 2017-07-19 15:29:46 -0400 EDT - event for execpod: {kubelet ip-172-18-2-207.ec2.internal} Created: Created container
Jul 19 15:29:49.978: INFO: At 2017-07-19 15:29:47 -0400 EDT - event for execpod: {kubelet ip-172-18-2-207.ec2.internal} Started: Started container
Jul 19 15:29:49.983: INFO: POD                         NODE                          PHASE    GRACE  CONDITIONS
Jul 19 15:29:49.983: INFO: docker-registry-1-lxcwq     ip-172-18-2-207.ec2.internal  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:22:28 -0400 EDT  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:22:31 -0400 EDT  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:22:28 -0400 EDT  }]
Jul 19 15:29:49.983: INFO: registry-console-1-87bxk    ip-172-18-2-207.ec2.internal  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:22:49 -0400 EDT  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:23:19 -0400 EDT  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:22:49 -0400 EDT  }]
Jul 19 15:29:49.983: INFO: router-1-vkj1l              ip-172-18-2-207.ec2.internal  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:21:44 -0400 EDT  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:22:04 -0400 EDT  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:21:44 -0400 EDT  }]
Jul 19 15:29:49.983: INFO: update-demo-nautilus-5d207  ip-172-18-2-207.ec2.internal  Pending         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:29:47 -0400 EDT  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:29:47 -0400 EDT ContainersNotReady containers with unready status: [update-demo]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:29:47 -0400 EDT  }]
Jul 19 15:29:49.983: INFO: update-demo-nautilus-d4gf4  ip-172-18-2-207.ec2.internal  Pending         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:29:47 -0400 EDT  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:29:47 -0400 EDT ContainersNotReady containers with unready status: [update-demo]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:29:47 -0400 EDT  }]
Jul 19 15:29:49.983: INFO: redis-master-0xrhl          ip-172-18-2-207.ec2.internal  Running  30s    [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:29:19 -0400 EDT  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:29:42 -0400 EDT ContainersNotReady containers with unready status: [redis-master]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:29:19 -0400 EDT  }]
Jul 19 15:29:49.983: INFO: ss-0                        ip-172-18-2-207.ec2.internal  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:29:25 -0400 EDT  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:29:45 -0400 EDT  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:29:25 -0400 EDT  }]
Jul 19 15:29:49.983: INFO: execpod                     ip-172-18-2-207.ec2.internal  Running  1s     [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:29:40 -0400 EDT  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:29:48 -0400 EDT  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-07-19 15:29:40 -0400 EDT  }]
Jul 19 15:29:49.983: INFO: 
Jul 19 15:29:49.985: INFO: 
Logging node info for node ip-172-18-2-207.ec2.internal
Jul 19 15:29:49.986: INFO: Node Info: &Node{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:ip-172-18-2-207.ec2.internal,GenerateName:,Namespace:,SelfLink:/api/v1/nodes/ip-172-18-2-207.ec2.internal,UID:3d7e5ec9-6cb7-11e7-89e6-0ec824a20892,ResourceVersion:2123,Generation:0,CreationTimestamp:2017-07-19 15:19:52 -0400 EDT,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{beta.kubernetes.io/arch: amd64,beta.kubernetes.io/os: linux,kubernetes.io/hostname: ip-172-18-2-207.ec2.internal,region: infra,zone: default,},Annotations:map[string]string{volumes.kubernetes.io/controller-managed-attach-detach: true,},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Spec:NodeSpec{PodCIDR:,ExternalID:ip-172-18-2-207.ec2.internal,ProviderID:aws:////i-0d92f18ae096d9bca,Unschedulable:false,Taints:[],},Status:NodeStatus{Capacity:ResourceList{cpu: {{4 0} {<nil>} 4 DecimalSI},memory: {{16388919296 0} {<nil>} 16004804Ki BinarySI},pods: {{40 0} {<nil>} 40 DecimalSI},},Allocatable:ResourceList{cpu: {{4 0} {<nil>} 4 DecimalSI},memory: {{16284061696 0} {<nil>} 15902404Ki BinarySI},pods: {{40 0} {<nil>} 40 DecimalSI},},Phase:,Conditions:[{OutOfDisk False 2017-07-19 15:29:46 -0400 EDT 2017-07-19 15:19:52 -0400 EDT KubeletHasSufficientDisk kubelet has sufficient disk space available} {MemoryPressure False 2017-07-19 15:29:46 -0400 EDT 2017-07-19 15:19:52 -0400 EDT KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2017-07-19 15:29:46 -0400 EDT 2017-07-19 15:19:52 -0400 EDT KubeletHasNoDiskPressure kubelet has no disk pressure} {Ready True 2017-07-19 15:29:46 -0400 EDT 2017-07-19 15:20:04 -0400 EDT KubeletReady kubelet is posting ready status}],Addresses:[{InternalIP 172.18.2.207} {Hostname ip-172-18-2-207.ec2.internal}],DaemonEndpoints:NodeDaemonEndpoints{KubeletEndpoint:DaemonEndpoint{Port:10250,},},NodeInfo:NodeSystemInfo{MachineID:f9370ed252a14f73b014c1301a9b6d1b,SystemUUID:EC2FA078-0824-E129-024F-32DBB557213C,BootID:a2a3b6ee-f4de-401f-936f-98c09edde325,KernelVersion:3.10.0-514.26.2.el7.x86_64,OSImage:Red Hat Enterprise Linux Server 7.3 (Maipo),ContainerRuntimeVersion:docker://1.12.6,KubeletVersion:v1.7.0+695f48a16f,KubeProxyVersion:v1.7.0+695f48a16f,OperatingSystem:linux,Architecture:amd64,},Images:[{[openshift/origin-federation:3bb18b6 openshift/origin-federation:latest] 1269004315} {[openshift/origin-docker-registry:3bb18b6 openshift/origin-docker-registry:latest] 1132143354} {[openshift/origin-gitserver:3bb18b6 openshift/origin-gitserver:latest] 1119083585} {[openshift/openvswitch:3bb18b6 openshift/openvswitch:latest] 1099707747} {[openshift/node:3bb18b6 openshift/node:latest] 1098026103} {[openshift/origin-keepalived-ipfailover:3bb18b6 openshift/origin-keepalived-ipfailover:latest] 1074817813} {[openshift/origin-haproxy-router:3bb18b6 openshift/origin-haproxy-router:latest] 1069053272} {[openshift/origin-service-catalog:3bb18b6 openshift/origin-service-catalog:latest] 1061978954} {[openshift/origin-deployer:3bb18b6 openshift/origin-deployer:latest] 1048029234} {[openshift/origin-recycler:3bb18b6 openshift/origin-recycler:latest] 1048029234} {[openshift/origin-docker-builder:3bb18b6 openshift/origin-docker-builder:latest] 1048029234} {[openshift/origin-sti-builder:3bb18b6 openshift/origin-sti-builder:latest] 1048029234} {[openshift/origin-f5-router:3bb18b6 openshift/origin-f5-router:latest] 1048029234} {[openshift/origin:3bb18b6 openshift/origin:latest] 1048029234} {[openshift/origin-cluster-capacity:3bb18b6 
openshift/origin-cluster-capacity:latest] 1001769744} {[docker.io/openshift/origin-release@sha256:1f2d230ff0e548e84a8cab04140442d14c192e24a977888682e292a66e3c0310 docker.io/openshift/origin-release:golang-1.8] 888306164} {[docker.io/openshift/origin-release@sha256:e7830c1917b24cd07897d69044d0a831d32563daeed2e5ef8260e080e841ba63] 888265752} {[docker.io/openshift/origin-release@sha256:28ba70dba48e007062d84740596478a34db8ba933955cd7fbd2fe4cb631fb8fe docker.io/openshift/origin-release:golang-1.7] 861320249} {[docker.io/openshift/origin-gce@sha256:409e2746c9e376c304ebe64928926f916e8d2b54e9c63630bbb598a9bc9857fb docker.io/openshift/origin-gce:latest] 779683339} {[docker.io/cockpit/kubernetes@sha256:86c54baed25788404dd22441d32a55a78588325159afb26678a764f46f4f6374 docker.io/cockpit/kubernetes:latest] 400184643} {[openshift/origin-egress-http-proxy:3bb18b6 openshift/origin-egress-http-proxy:latest] 395968520} {[openshift/origin-egress-router:3bb18b6 openshift/origin-egress-router:latest] 364716379} {[openshift/origin-base:3bb18b6 openshift/origin-base:latest] 363040846} {[openshift/origin-pod:3bb18b6 openshift/origin-pod:latest] 213205840} {[openshift/origin-source:3bb18b6 openshift/origin-source:latest] 192530021} {[docker.io/centos@sha256:c1010e2fe2b635822d99a096b1f4184becf5d1c98707cbccae00be663a9b9131 docker.io/centos:7] 192529651} {[openshift/hello-openshift:3bb18b6 openshift/hello-openshift:latest] 5840035} {[gcr.io/google_containers/mounttest@sha256:bec3122ddcf8bd999e44e46e096659f31241d09f5236bc3dc212ea584ca06856 gcr.io/google_containers/mounttest:0.8] 1450761}],VolumesInUse:[],VolumesAttached:[],},}
Jul 19 15:29:49.987: INFO: 
Logging kubelet events for node ip-172-18-2-207.ec2.internal
Jul 19 15:29:49.988: INFO: 
Logging pods the kubelet thinks is on node ip-172-18-2-207.ec2.internal
Jul 19 15:29:49.996: INFO: update-demo-nautilus-5d207 started at 2017-07-19 15:29:47 -0400 EDT (0+1 container statuses recorded)
Jul 19 15:29:49.996: INFO: 	Container update-demo ready: false, restart count 0
Jul 19 15:29:49.996: INFO: redis-master-0xrhl started at 2017-07-19 15:29:19 -0400 EDT (0+1 container statuses recorded)
Jul 19 15:29:49.996: INFO: 	Container redis-master ready: false, restart count 0
Jul 19 15:29:49.996: INFO: update-demo-nautilus-d4gf4 started at 2017-07-19 15:29:47 -0400 EDT (0+1 container statuses recorded)
Jul 19 15:29:49.996: INFO: 	Container update-demo ready: false, restart count 0
Jul 19 15:29:49.996: INFO: execpod started at 2017-07-19 15:29:40 -0400 EDT (0+1 container statuses recorded)
Jul 19 15:29:49.996: INFO: 	Container hostexec ready: true, restart count 0
Jul 19 15:29:49.996: INFO: ss-0 started at 2017-07-19 15:29:25 -0400 EDT (0+1 container statuses recorded)
Jul 19 15:29:49.996: INFO: 	Container nginx ready: true, restart count 0
Jul 19 15:29:49.996: INFO: router-1-vkj1l started at 2017-07-19 15:21:44 -0400 EDT (0+1 container statuses recorded)
Jul 19 15:29:49.996: INFO: 	Container router ready: true, restart count 0
Jul 19 15:29:49.996: INFO: docker-registry-1-lxcwq started at 2017-07-19 15:22:28 -0400 EDT (0+1 container statuses recorded)
Jul 19 15:29:49.996: INFO: 	Container registry ready: true, restart count 0
Jul 19 15:29:49.996: INFO: registry-console-1-87bxk started at 2017-07-19 15:22:49 -0400 EDT (0+1 container statuses recorded)
Jul 19 15:29:49.996: INFO: 	Container registry-console ready: true, restart count 0
W0719 15:29:49.998740   63886 metrics_grabber.go:74] Master node is not registered. Grabbing metrics from Scheduler and ControllerManager is disabled.
Jul 19 15:29:50.016: INFO: 
Latency metrics for node ip-172-18-2-207.ec2.internal
Jul 19 15:29:50.016: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.5 Latency:2m3.001125s}
Jul 19 15:29:50.016: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.99 Latency:2m3.001125s}
Jul 19 15:29:50.016: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.9 Latency:2m3.001125s}
Jul 19 15:29:50.016: INFO: {Operation: Method:pod_start_latency_microseconds Quantile:0.99 Latency:17.948203s}
Jul 19 15:29:50.016: INFO: {Operation:pull_image Method:docker_operations_latency_microseconds Quantile:0.99 Latency:14.114544s}
Jul 19 15:29:50.016: INFO: {Operation:pull_image Method:docker_operations_latency_microseconds Quantile:0.9 Latency:14.114544s}
STEP: Dumping a list of prepulled images on each node...
Jul 19 15:29:50.018: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "extended-test-router-metrics-j471x-c62cz" for this suite.
Jul 19 15:30:06.136: INFO: namespace: extended-test-router-metrics-j471x-c62cz, resource: bindings, ignored listing per whitelist
Jul 19 15:30:06.156: INFO: namespace extended-test-router-metrics-j471x-c62cz deletion completed in 16.137231611s


• Failure [25.562 seconds]
[Conformance][networking][router] openshift router metrics
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/router/metrics.go:186
  The HAProxy router
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/router/metrics.go:185
    should expose the profiling endpoints [It]
    /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/router/metrics.go:184

    Expected
        <string>: [eliding complete garbage]
    to contain substring
        <string>: # runtime.MemStats

    /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/router/metrics.go:183

sdodson (Member) commented Jul 20, 2017

@smarterclayton @deads2k did we regress something with the router metrics port in the rebase?

pweil- (Contributor) commented Jul 20, 2017

@knobunc not sure this is a flake, ptal

deads2k (Contributor) commented Jul 20, 2017

Capabilities were removed from prometheus, so some metrics were removed #15279

deads2k (Contributor) commented Jul 20, 2017

> Capabilities were removed from prometheus, so some metrics were removed #15279

It didn't seem very urgent at the time. If it's important and blocking something, I'd be happy to have some help re-enabling them.

sdodson (Member) commented Jul 20, 2017

Every openshift-ansible PR fails on this "flake". If origin PRs aren't being blocked by this too, I'd also like to understand why that's the case.

deads2k (Contributor) commented Jul 20, 2017

> Every openshift-ansible PR fails on this "flake". If origin PRs aren't being blocked by this too, I'd also like to understand why that's the case.

The queue is slow, but we're getting green tests back: #15364

deads2k (Contributor) commented Jul 20, 2017

Actually, digging into this, these aren't the metrics I disabled. It's complaining about not finding pprof. This is the commit I was thinking about: 9abed32. Looks like I may be innocent of this one.
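
For context on what "pprof" refers to here: Go services expose these endpoints by importing net/http/pprof, and the router serves them on its stats listener (port 1936 in the log above). A minimal sketch of the standard wiring, assuming nothing about the router's actual code:

```go
// Minimal sketch of the standard net/http/pprof wiring; illustrative
// only, not code taken from the origin router.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // side effect: registers the /debug/pprof/* handlers
)

func main() {
	// With the handlers registered, GET /debug/pprof/heap returns a binary
	// profile, while /debug/pprof/heap?debug=1 returns plain text.
	log.Fatal(http.ListenAndServe(":1936", nil))
}
```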

sdodson added the kind/bug label Jul 26, 2017
knobunc added a commit to knobunc/origin that referenced this issue Jul 26, 2017
We were missing the debug=1 argument on the memory stats, so we were getting them in a binary format. With the flag added, we pass the tests properly.

Fixes openshift#15356
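
The format difference behind the failure comes from runtime/pprof itself: the heap profile is written as gzipped binary data when debug=0 (the "complete garbage" elided in the failure output) and as plain text containing the "# runtime.MemStats" section the test greps for when debug=1. A minimal standalone sketch of that behavior (not the test code itself):

```go
// Shows the debug=0 vs debug=1 output formats for the heap profile; the
// /debug/pprof/heap handler passes the "debug" query parameter through
// to WriteTo.
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"strings"
)

func main() {
	var binary, text bytes.Buffer

	// debug=0: what /debug/pprof/heap serves with no query arguments.
	pprof.Lookup("heap").WriteTo(&binary, 0)
	// debug=1: what /debug/pprof/heap?debug=1 serves.
	pprof.Lookup("heap").WriteTo(&text, 1)

	fmt.Println(strings.Contains(binary.String(), "# runtime.MemStats")) // false
	fmt.Println(strings.Contains(text.String(), "# runtime.MemStats"))   // true
}
```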
openshift-merge-robot added a commit that referenced this issue Jul 28, 2017
Automatic merge from submit-queue (batch tested with PRs 15486, 15482)

Correct the url for the memory stats in the router test

We were missing the debug=1 argument on the memory stats, so we were getting them in a binary format. With the flag added, we pass the tests properly.

Fixes #15356