Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[extension/observer/k8sobserver] Flaky test (DATA RACE) - TestExtensionObservePods #29448

Closed
sakulali opened this issue Nov 22, 2023 · 11 comments

Comments

@sakulali
Copy link
Contributor

Component(s)

extension/observer/k8sobserver

Describe the issue you're reporting

https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/6959238824/job/18936065429?pr=26644

?   	github.com/open-telemetry/opentelemetry-collector-contrib/extension/observer/k8sobserver/internal/metadata	[no test files]
W1122 15:17:13.511892   13349 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229: failed to list *v1.Pod: Get "https://mock:12345/api/v1/pods?limit=500&resourceVersion=0": dial tcp: lookup mock on 127.0.0.53:53: server misbehaving
E1122 15:17:13.512096   13349 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://mock:12345/api/v1/pods?limit=500&resourceVersion=0": dial tcp: lookup mock on 127.0.0.53:53: server misbehaving
W1122 15:17:14.514024   13349 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229: failed to list *v1.Pod: Get "https://mock:12345/api/v1/pods?limit=500&resourceVersion=0": dial tcp: lookup mock on 127.0.0.53:53: server misbehaving
E1122 15:17:14.514145   13349 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://mock:12345/api/v1/pods?limit=500&resourceVersion=0": dial tcp: lookup mock on 127.0.0.53:53: server misbehaving
W1122 15:17:16.596549   13349 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229: failed to list *v1.Pod: Get "https://mock:12345/api/v1/pods?limit=500&resourceVersion=0": dial tcp: lookup mock on 127.0.0.53:53: server misbehaving
E1122 15:17:16.596684   13349 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:[229](https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/6959238824/job/18936065429?pr=26644#step:9:230): Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://mock:12345/api/v1/pods?limit=500&resourceVersion=0": dial tcp: lookup mock on 127.0.0.53:53: server misbehaving
==================
WARNING: DATA RACE
Write at 0x00c000477bf0 by goroutine 91:
  github.com/open-telemetry/opentelemetry-collector-contrib/internal/k8sconfig.CreateRestConfig.func1()
      /home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/k8sconfig/config.go:117 +0x5c
  k8s.io/client-go/transport.HTTPWrappersForConfig()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/transport/round_trippers.go:41 +0xcb
  k8s.io/client-go/transport.New()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/transport/transport.go:60 +0x304
  k8s.io/client-go/rest.TransportFor()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/rest/transport.go:68 +0x64
  k8s.io/client-go/rest.HTTPClientFor()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/rest/transport.go:33 +0x30
  k8s.io/client-go/kubernetes.NewForConfig()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/kubernetes/clientset.go:468 +0xa7
  github.com/open-telemetry/opentelemetry-collector-contrib/internal/k8sconfig.MakeClient()
      /home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/k8sconfig/config.go:136 +0x111
  github.com/open-telemetry/opentelemetry-collector-contrib/extension/observer/k8sobserver.newObserver()
      /home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/extension/observer/k8sobserver/extension.go:84 +0xb9
  github.com/open-telemetry/opentelemetry-collector-contrib/extension/observer/k8sobserver.TestExtensionObservePods()
      /home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/extension/observer/k8sobserver/extension_test.go:138 +0x3a4
  testing.tRunner()
      /opt/hostedtoolcache/go/1.20.11/x64/src/testing/testing.go:1576 +0x216
  testing.(*T).Run.func1()
      /opt/hostedtoolcache/go/1.20.11/x64/src/testing/testing.go:1629 +0x47

Previous read at 0x00c000477bf0 by goroutine 83:
  net/http.(*Transport).connectMethodForRequest()
      /opt/hostedtoolcache/go/1.20.11/x64/src/net/http/transport.go:843 +0xe6
  net/http.(*Transport).roundTrip()
      /opt/hostedtoolcache/go/1.20.11/x64/src/net/http/transport.go:580 +0xc46
  net/http.(*Transport).RoundTrip()
      /opt/hostedtoolcache/go/1.20.11/x64/src/net/http/roundtrip.go:17 +0x36
  k8s.io/client-go/transport.(*userAgentRoundTripper).RoundTrip()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/transport/round_trippers.go:168 +0x4e3
  net/http.send()
      /opt/hostedtoolcache/go/1.20.11/x64/src/net/http/client.go:252 +0x942
  net/http.(*Client).send()
      /opt/hostedtoolcache/go/1.20.11/x64/src/net/http/client.go:176 +0x164
  net/http.(*Client).do()
      /opt/hostedtoolcache/go/1.20.11/x64/src/net/http/client.go:716 +0x10de
  net/http.(*Client).Do()
      /opt/hostedtoolcache/go/1.20.11/x64/src/net/http/client.go:582 +0x566
  k8s.io/client-go/rest.(*Request).request()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/rest/request.go:1023 +0x2e1
  k8s.io/client-go/rest.(*Request).Do()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/rest/request.go:1063 +0xef
  k8s.io/client-go/tools/cache.NewFilteredListWatchFromClient.func1()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/listwatch.go:87 +0x224
  k8s.io/client-go/tools/cache.(*ListWatch).List()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/listwatch.go:106 +0xb4
  k8s.io/client-go/tools/cache.(*Reflector).list.func1.2()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:488 +0xbc
  k8s.io/client-go/tools/pager.SimplePageFunc.func1()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/tools/pager/pager.go:40 +0x74
  k8s.io/client-go/tools/pager.(*ListPager).list()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/tools/pager/pager.go:108 +0x207
  k8s.io/client-go/tools/pager.(*ListPager).ListWithAlloc()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/tools/pager/pager.go:89 +0x373
  k8s.io/client-go/tools/cache.(*Reflector).list.func1()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:513 +0x2f4

Goroutine 91 (running) created at:
  testing.(*T).Run()
      /opt/hostedtoolcache/go/1.20.11/x64/src/testing/testing.go:1629 +0x805
  testing.runTests.func1()
      /opt/hostedtoolcache/go/1.20.11/x64/src/testing/testing.go:2036 +0x8d
  testing.tRunner()
      /opt/hostedtoolcache/go/1.20.11/x64/src/testing/testing.go:1576 +0x216
  testing.runTests()
      /opt/hostedtoolcache/go/1.20.11/x64/src/testing/testing.go:2034 +0x87c
  testing.(*M).Run()
      /opt/hostedtoolcache/go/1.20.11/x64/src/testing/testing.go:1906 +0xb44
  main.main()
      _testmain.go:114 +0x2fc

Goroutine 83 (finished) created at:
  k8s.io/client-go/tools/cache.(*Reflector).list()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:479 +0x705
  k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:348 +0x338
  k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:291 +0x44
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
      /home/runner/go/pkg/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/backoff.go:226 +0x48
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
      /home/runner/go/pkg/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/backoff.go:227 +0xce
  k8s.io/client-go/tools/cache.(*Reflector).Run()
      /home/runner/go/pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:290 +0x256
  k8s.io/client-go/tools/cache.(*Reflector).Run-fm()
      <autogenerated>:1 +0x44
  k8s.io/apimachinery/pkg/util/wait.(*Group).StartWithChannel.func1()
      /home/runner/go/pkg/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/wait.go:55 +0x3e
  k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
      /home/runner/go/pkg/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/wait.go:72 +0x73
==================
--- FAIL: TestExtensionObservePods (3.00s)
    testing.go:1446: race detected during execution of test
W1122 15:17:19.612537   13349 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229: failed to list *v1.Pod: Get "https://mock:1[234](https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/6959238824/job/18936065429?pr=26644#step:9:235)5/api/v1/pods?limit=500&resourceVersion=0": dial tcp: lookup mock on 127.0.0.53:53: server misbehaving
E1122 15:17:19.61[264](https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/6959238824/job/18936065429?pr=26644#step:9:265)3   13349 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://mock:12345/api/v1/pods?limit=500&resourceVersion=0": dial tcp: lookup mock on 127.0.0.53:53: server misbehaving
W1122 15:17:20.584617   13349 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229: failed to list *v1.Pod: Get "https://mock:12345/api/v1/pods?limit=500&resourceVersion=0": dial tcp: lookup mock on 127.0.0.53:53: server misbehaving
E1122 15:17:20.584727   13349 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://mock:12345/api/v1/pods?limit=500&resourceVersion=0": dial tcp: lookup mock on 127.0.0.53:53: server misbehaving
W1122 15:17:22.516375   13349 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229: failed to list *v1.Pod: Get "https://mock:12345/api/v1/pods?limit=500&resourceVersion=0": dial tcp: lookup mock on 127.0.0.53:53: server misbehaving
E1122 15:17:22.516490   13349 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://mock:12345/api/v1/pods?limit=500&resourceVersion=0": dial tcp: lookup mock on 127.0.0.53:53: server misbehaving
FAIL
@sakulali sakulali added the needs triage New item requiring triage label Nov 22, 2023
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@sakulali
Copy link
Contributor Author

sakulali commented Nov 22, 2023

It seems that there is concurrent access in

# Tests: 
cmd = go test -v -count=1 -race -timeout 300s -parallel 4 --tags="" -cover ./... -covermode=atomic -args -test.gocoverdir=/path/to/otel/coverage/unit 
cd opentelemetry-collector-contrib/extension/observer/k8sobserver
for i in {1..20}; do echo "Run $i"; ${cmd} &; done

When I commented out the following code, the DATA RACE seems to have disappeared. I'm wondering if authConf.WrapTransport needs to be initialized

authConf.WrapTransport = func(rt http.RoundTripper) http.RoundTripper {
// Don't use system proxy settings since the API is local to the
// cluster
if t, ok := rt.(*http.Transport); ok {
t.Proxy = nil
}
return rt
}

@sakulali
Copy link
Contributor Author

@sakulali
Copy link
Contributor Author

@sakulali
Copy link
Contributor Author

@sakulali
Copy link
Contributor Author

@sakulali
Copy link
Contributor Author

@crobert-1 crobert-1 added flaky test a test is flaky and removed needs triage New item requiring triage labels Nov 30, 2023
mx-psi pushed a commit that referenced this issue Dec 1, 2023
…ACE) of TestExtensionObservePods (#29463)

**Description:**
Try to fix flaky test (DATA RACE) of `TestExtensionObservePods`. Since
`config.ObservePods` is set true by default, for
`TestExtensionObserveServices` and `TestExtensionObserveNodes` test
case, we just focus on services observer and nodes observer, so we don't
need to use `k8sObserver.podListerWatcher` which can avoid data race in
the same process.


**Link to tracking Issue:**

#29448

**Testing:** 
go test for k8sobserver

**Preparation:**
``` Shell
DIR = opentelemetry-collector-contrib/extension/observer/k8sobserver
DATA_DIR = /path/to/otel/coverage/unit
CMD = go test -run TestExtensionObserve -v -count=1 -race -timeout 300s -parallel 4 --tags="" -cover ./... -covermode=atomic -args -test.gocoverdir=${DATA_DIR}
MULTI_CMD = for i in {1..50}; do echo "Run $i"; ./{CMD} &; done
```

**Tests:**
1. I found `config.ObservePods` is set true by default. Then
`TestExtensionObserveServices` and `TestExtensionObserveNodes` will
enable `k8sObserver.podListerWatcher` and try to access the same http
proxy since they are in the same process.
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/39641dfac9482c62ece9505fef89fcff81ee258b/extension/observer/k8sobserver/factory.go#L26-L33


2. I add log(as shown below) for `TestExtensionObserve*` functions and
found test case **in the same process**. Refer to [stack
overflow](https://stackoverflow.com/a/24376644), go test are executed as
goroutines and executed concurrently.
<img width="783" alt="image"
src="https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/3955787/a9baba19-0911-4d49-9395-8fb4a58a5e9b">

> --------- TestExtensionObserveServices process ID: 38910
> --------- TestExtensionObservePods process ID: 38910
> --------- TestExtensionObserveNodes process ID: 38910

3. I add log (as shown below) for http proxy setting addr, and found use
the same address:
<img width="951" alt="image"
src="https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/3955787/a5c8e4e7-b6bd-4d8b-a82f-c9142d31a249">

> -------------- use proxy connectMethodForRequest t.Proxy addr:
0xc0007142f0 -------------------
> -------------- transport t.Proxy (is) nil
> -------------- send request: {Method:GET
URL:https://mock:12345/api/v1/pods?limit=500&resourceVersion=0
Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1
Header:map[Accept:[application/json, */*]
User-Agent:[k8sobserver.test/v0.0.0 (darwin/amd64) kubernetes/$Format]]
Body:<nil> GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false
Host:mock:12345 Form:map[] PostForm:map[] MultipartForm:<nil>
Trailer:map[] RemoteAddr: RequestURI: TLS:<nil> Cancel:<nil>
Response:<nil> ctx:0xc000410120}

4. So we may find one goroutine is writing http proxy and the other
goroutine read with http proxy(HTTP request for **Pods Lister**). As a
result, this has led to data race.

**Inference:**
I found that `config.ObservePods` is set true by default. For
`TestExtensionObserveServices` and `TestExtensionObserveNodes` test
case, we just focus on services observer and nodes observer, so we don't
need to use `k8sObserver.podListerWatcher` which can avoid data race in
the same process.

**Documentation:**

---------

Signed-off-by: sakulali <sakulali@126.com>
@crobert-1
Copy link
Member

Resolved by #29463

@crobert-1
Copy link
Member

+1 frequency

@crobert-1 crobert-1 reopened this Dec 14, 2023
@sakulali
Copy link
Contributor Author

Hi @crobert-1, i found this failed test time seems before the commit time?
image
image

@crobert-1
Copy link
Member

Good catch @sakulali, I saw this on a currently open PR, didn't realize the failed test was that long ago 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants