
Flaky test: TestClientSendsAGoAway #7160

Closed
arvindbr8 opened this issue Apr 22, 2024 · 8 comments · Fixed by #7224

@arvindbr8
Member

This is the first time I'm seeing it:
https://github.com/grpc/grpc-go/actions/runs/8787418013/job/24112571687?pr=7156

Might be worth checking out.

@arvindbr8
Member Author

--- FAIL: Test (47.61s)
    --- FAIL: Test/ClientSendsAGoAway (10.00s)
        tlogger.go:116: INFO clientconn.go:305 [core] [Channel #472]Channel created  (t=+184.125µs)
        tlogger.go:116: INFO clientconn.go:1680 [core] [Channel #472]original dial target is: "127.0.0.1:34251"  (t=+212.779µs)
        tlogger.go:116: INFO clientconn.go:1685 [core] [Channel #472]dial target "127.0.0.1:34251" parse failed: parse "127.0.0.1:34251": first path segment in URL cannot contain colon  (t=+246.482µs)
        tlogger.go:116: INFO clientconn.go:1706 [core] [Channel #472]fallback to scheme "passthrough"  (t=+274.044µs)
        tlogger.go:116: INFO clientconn.go:1714 [core] [Channel #472]parsed dial target is: passthrough:///127.0.0.1:34251  (t=+299.482µs)
        tlogger.go:116: INFO clientconn.go:1837 [core] [Channel #472]Channel authority set to "127.0.0.1:34251"  (t=+325.33µs)
        tlogger.go:116: INFO resolver_wrapper.go:197 [core] [Channel #472]Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "127.0.0.1:34251",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Metadata": null
                }
              ],
              "Endpoints": [
                {
                  "Addresses": [
                    {
                      "Addr": "127.0.0.1:34251",
                      "ServerName": "",
                      "Attributes": null,
                      "BalancerAttributes": null,
                      "Metadata": null
                    }
                  ],
                  "Attributes": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+464.651µs)
        tlogger.go:116: INFO balancer_wrapper.go:103 [core] [Channel #472]Channel switches to new LB policy "pick_first"  (t=+513.843µs)
        tlogger.go:116: INFO clientconn.go:846 [core] [Channel #472 SubChannel #473]Subchannel created  (t=+569.808µs)
        tlogger.go:116: INFO clientconn.go:535 [core] [Channel #472]Channel Connectivity change to CONNECTING  (t=+882.825µs)
        tlogger.go:116: INFO clientconn.go:305 [core] [Channel #472]Channel exiting idle mode  (t=+914.194µs)
        tlogger.go:116: INFO clientconn.go:1207 [core] [Channel #472 SubChannel #473]Subchannel Connectivity change to CONNECTING  (t=+971.722µs)
        tlogger.go:116: INFO clientconn.go:1322 [core] [Channel #472 SubChannel #473]Subchannel picks a new address "127.0.0.1:34251" to connect  (t=+998.061µs)
        tlogger.go:116: INFO clientconn.go:535 [core] [Channel #472]Channel Connectivity change to SHUTDOWN  (t=+1.652745ms)
        tlogger.go:116: INFO resolver_wrapper.go:100 [core] [Channel #472]Closing the name resolver  (t=+1.674466ms)
        tlogger.go:116: INFO balancer_wrapper.go:135 [core] [Channel #472]ccBalancerWrapper: closing  (t=+1.694022ms)
        tlogger.go:116: INFO clientconn.go:1207 [core] [Channel #472 SubChannel #473]Subchannel Connectivity change to SHUTDOWN  (t=+1.721133ms)
        tlogger.go:116: INFO clientconn.go:1553 [core] [Channel #472 SubChannel #473]Subchannel deleted  (t=+1.773004ms)
        tlogger.go:116: INFO clientconn.go:305 [core] [Channel #472]Channel deleted  (t=+1.793723ms)
        goaway_test.go:826: Context timed out
FAIL

@aranjans
Contributor

@arvindbr8 This seems to be working for me. I ran the following commands to cross-check, and all of them passed. Let me know in case I am missing something.

go test -coverprofile=coverage.out -coverpkg=./... ./...
go test ./...
go test ./... -race

@purnesh42H
Contributor

@arvindbr8 I assigned it back to you since @aranjans needs more clarification. Let me know if that's OK.

@arvindbr8
Member Author

Sure.

We discussed this offline:

It's usually hard to reproduce flakes that occur in GitHub CI locally, since our laptops typically have more juice than the free GitHub VMs we use. Traditionally, we have an internal mechanism (discussed in chat with @aranjans) that lets us run a test on the order of 10-100K times to measure its flake rate. Once we have the flake rate, we can decide the priority of this issue.

Regardless, IMO this is worth taking a look at now, since it was checked very recently by @aranjans and his context on the test he added is still fresh.
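Setting the internal mechanism aside, a rough local approximation is `go test`'s built-in repetition flags. This is a sketch; the `./test/` package path is my assumption about where `goaway_test.go` lives:

```shell
# Repeat the flaky test many times; -failfast stops at the first failure,
# and -race catches the data races that often underlie timing-dependent flakes.
go test ./test/ -run 'Test/ClientSendsAGoAway' -count=1000 -failfast -race
```

Dividing the number of failures by `-count` gives a crude flake-rate estimate, though local hardware will understate what the slower CI VMs see.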

@arvindbr8
Member Author

@aranjans -- please let me know if you want to sync up again regarding the mechanism mentioned above.

@aranjans
Contributor

aranjans commented May 9, 2024

Closing this issue as we are not able to repro this.

@aranjans aranjans closed this as completed May 9, 2024
@arjan-bal
Contributor

arjan-bal commented May 10, 2024

Here is a repro of the race condition: https://github.com/grpc/grpc-go/compare/master...arjan-bal:repro-race-cond?expand=1

We should probably wait for the client's Channel to be READY before closing it.
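A minimal sketch of that suggestion, using the public `connectivity` API (the address is a placeholder; this is not the actual fix that landed in the linked PR):

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/connectivity"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Placeholder address; in the real test this comes from the test server.
	conn, err := grpc.NewClient("127.0.0.1:34251",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("failed to create channel: %v", err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Kick the channel out of idle, then block until it reaches READY (or
	// the context expires) so that Close() doesn't race with connection setup.
	conn.Connect()
	for state := conn.GetState(); state != connectivity.Ready; state = conn.GetState() {
		if !conn.WaitForStateChange(ctx, state) {
			log.Fatalf("channel never became READY: %v", ctx.Err())
		}
	}
	log.Println("channel READY; safe to proceed and close")
}
```

`WaitForStateChange` returns false when the context is done, so the loop either observes READY or fails with a timeout instead of hanging forever.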

@arjan-bal arjan-bal reopened this May 10, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 19, 2024