update quic-go to v0.38.1 #2506

Merged: marten-seemann merged 2 commits into master on Aug 25, 2023

Conversation

marten-seemann
Contributor

No description provided.

@@ -178,7 +178,7 @@ func TestHashVerification(t *testing.T) {
 var trErr *quic.TransportError
 require.ErrorAs(t, err, &trErr)
 require.Equal(t, quic.TransportErrorCode(0x12a), trErr.ErrorCode)
-require.Contains(t, trErr.ErrorMessage, "cert hash not found")
+require.Contains(t, errors.Unwrap(trErr).Error(), "cert hash not found")
Contributor Author

That API seems a bit awkward. Maybe quic-go should expose an Error error field on the TransportError? Any thoughts?

Member

As long as we set the error field on quic.TransportError correctly, this seems fine. Anyone who needs an exact check can do errors.Is(err, targetErr). Anyone who needs a substring check can do strings.Contains(err.Error(), "error string").

We can also change this test to:

require.Contains(t, err.Error(), "cert hash not found")

Member

I see the problem: the error field on TransportError is only included in the Error() string if ErrorMessage is empty.

Can we include both ErrorMessage and error.Error() in the final Error() string?

Contributor Author

In principle yes, but there's a lot going on in Error already: https://github.com/quic-go/quic-go/blob/824fd8a2f2eb8a08fe6cef7a693fee6be3819e01/internal/qerr/errors.go#L33-L49

Suggestions welcome!

Member

sukunrt commented Aug 21, 2023

How about we include both this way?

func (e *TransportError) Error() string {
	str := fmt.Sprintf("%s (%s)", e.ErrorCode.String(), getRole(e.Remote))
	if e.FrameType != 0 {
		str += fmt.Sprintf(" (frame type: %#x)", e.FrameType)
	}
	var msg string
	if len(e.ErrorMessage) > 0 {
		msg += ": " + e.ErrorMessage
	}
	if e.error != nil {
		msg += ": " + e.error.Error()
	}
	// If neither ErrorMessage nor the wrapped error produced a message, fall back to the ErrorCode's message.
	if msg == "" && len(e.ErrorCode.Message()) > 0 {
		msg += ": " + e.ErrorCode.Message()
	}
	return str + msg
}

And then within quic-go code we try to ensure that ErrorMessage and error don't have redundant information. That way we don't have to break any API.

I don't like the option of just exporting error, because then clients who want substring matching on errors will have to check both trErr.Error() and trErr.Err.Error().

@marten-seemann
Contributor Author

WebTransport interop is failing with rust-libp2p, but it's working with Chromium natively:
[image]

The only WebTransport-related change is quic-go updating the code point for HTTP datagrams to the value used in the RFC (quic-go/quic-go#3588). The failure would make sense if rust-libp2p was using an outdated version. @mxinden, any idea?

@mxinden
Member

mxinden commented Aug 21, 2023

> The only WebTransport-related change is quic-go updating the code point for HTTP datagrams to the value used in the RFC (quic-go/quic-go#3588). The failure would make sense if rust-libp2p was using an outdated version. @mxinden, any idea?

rust-libp2p seems to be using Chrome 112. Might that be the issue?

https://github.com/libp2p/rust-libp2p/blob/8050db77744c6733dcc3f72335f340a5129eb841/interop-tests/Dockerfile.chromium#L26

@marten-seemann
Contributor Author

112 seems pretty recent, it's from April. Wouldn't expect that to be the problem, although I haven't tried it yet.

I'm a bit confused by the log output. Are the tests running in parallel and not synchronizing their log output?

Running test spec: chromium-js-v0.46 x go-libp2p-head (wss, noise, yamux)
Failure AbortError: The operation was aborted
    at abortChildProcess (node:child_process:720:27)
    at EventTarget.onAbortListener (node:child_process:790:7)
    at EventTarget.[nodejs.internal.kHybridDispatch] (node:internal/event_target:741:20)
    at EventTarget.dispatchEvent (node:internal/event_target:683:26)
    at abortSignal (node:internal/abort_controller:368:10)
    at AbortController.abort (node:internal/abort_controller:402:5)
    at Timeout._onTimeout (/home/runner/work/_actions/libp2p/test-plans/master/multidim-interop/src/compose-runner.ts:51:55)
    at listOnTimeout (node:internal/timers:569:17)
    at processTimers (node:internal/timers:512:7) {
  code: 'ABORT_ERR',
  cmd: 'docker compose -f /tmp/compose-runner/chromium-rust-v0-52-x-go-libp2p-head--webtransport-/compose.yaml up --exit-code-from=dialer --renew-anon-volumes',
  stdout: 'Attaching to chromium-rust-v0_52_x_go-libp2p-head__webtransport_-dialer-1, chromium-rust-v0_52_x_go-libp2p-head__webtransport_-listener-1, chromium-rust-v0_52_x_go-libp2p-head__webtransport_-redis-1\n' +
    "chromium-rust-v0_52_x_go-libp2p-head__webtransport_-redis-1     | 1:C 21 Aug 2023 05:15:49.300 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.\n" +
    'chromium-rust-v0_52_x_go-libp2p-head__webtransport_-redis-1     | 1:C 21 Aug 2023 05:15:49.300 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo\n' +
    'chromium-rust-v0_52_x_go-libp2p-head__webtransport_-redis-1     | 1:C 21 Aug 2023 05:15:49.300 * Redis version=7.2.0, bits=64, commit=00000000, modified=0, pid=1, just started\n' +
    'chromium-rust-v0_52_x_go-libp2p-head__webtransport_-redis-1     | 1:C 21 Aug 2023 05:15:49.300 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf\n' +
    'chromium-rust-v0_52_x_go-libp2p-head__webtransport_-redis-1     | 1:M 21 Aug 2023 05:15:49.301 * monotonic clock: POSIX clock_gettime\n' +
    'chromium-rust-v0_52_x_go-libp2p-head__webtransport_-redis-1     | 1:M 21 Aug 2023 05:15:49.301 * Running mode=standalone, port=6379.\n' +
    'chromium-rust-v0_52_x_go-libp2p-head__webtransport_-redis-1     | 1:M 21 Aug 2023 05:15:49.302 * Server initialized\n' +
    'chromium-rust-v0_52_x_go-libp2p-head__webtransport_-redis-1     | 1:M 21 Aug 2023 05:15:49.302 * Ready to accept connections tcp\n',
  stderr: ' Network chromium-rust-v0_52_x_go-libp2p-head__webtransport__default  Creating\n' +
    ' Network chromium-rust-v0_52_x_go-libp2p-head__webtransport__default  Created\n' +
    ' Container chromium-rust-v0_52_x_go-libp2p-head__webtransport_-redis-1  Creating\n' +
    ' Container chromium-rust-v0_52_x_go-libp2p-head__webtransport_-redis-1  Created\n' +
    ' Container chromium-rust-v0_52_x_go-libp2p-head__webtransport_-listener-1  Creating\n' +
    ' Container chromium-rust-v0_52_x_go-libp2p-head__webtransport_-dialer-1  Creating\n' +
    ' Container chromium-rust-v0_52_x_go-libp2p-head__webtransport_-listener-1  Created\n' +
    ' Container chromium-rust-v0_52_x_go-libp2p-head__webtransport_-dialer-1  Created\n' +
    'chromium-rust-v0_52_x_go-libp2p-head__webtransport_-dialer-1    | [1692594949.828][SEVERE]: bind() failed: Cannot assign requested address (99)\n' +
    'chromium-rust-v0_52_x_go-libp2p-head__webtransport_-listener-1  | 2023/08/21 05:15:49 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.\n' +
    'chromium-rust-v0_52_x_go-libp2p-head__webtransport_-listener-1  | 2023/08/21 05:15:49 My multiaddr is:  [/ip4/127.0.0.1/udp/43830/quic-v1/webtransport/certhash/uEiBXKnbCC1zsFgvTQq570fXSXnwMAEfg5wnV1vP0EJtt9Q/certhash/uEiCtfrWN2cmhJkeNyHyGrviCnwvi4cMEfahD9v7YDMtotg /ip4/192.168.80.4/udp/43830/quic-v1/webtransport/certhash/uEiBXKnbCC1zsFgvTQq570fXSXnwMAEfg5wnV1vP0EJtt9Q/certhash/uEiCtfrWN2cmhJkeNyHyGrviCnwvi4cMEfahD9v7YDMtotg]\n',
  [cause]: DOMException [AbortError]: This operation was aborted
      at new DOMException (node:internal/per_context/domexception:53:5)
      at AbortController.abort (node:internal/abort_controller:400:18)
      at Timeout._onTimeout (/home/runner/work/_actions/libp2p/test-plans/master/multidim-interop/src/compose-runner.ts:51:55)
      at listOnTimeout (node:internal/timers:569:17)
      at processTimers (node:internal/timers:512:7)
}
Finished: chromium-js-v0.46 x go-libp2p-head (wss, noise, yamux) { handshakePlusOneRTTMillis: 116, pingRTTMilllis: 19 }

The relevant line seems to be the following:

chromium-rust-v0_52_x_go-libp2p-head__webtransport_-dialer-1    | [1692594949.828][SEVERE]: bind() failed: Cannot assign requested address (99)\n

Can you make sense of this?

@mxinden
Member

mxinden commented Aug 21, 2023

Triggered once more to see whether this is an intermittent failure.

https://github.com/libp2p/go-libp2p/actions/runs/5922005874/job/16061429456

@mxinden
Member

mxinden commented Aug 21, 2023

> I'm a bit confused by the log output. Are the tests running in parallel and not synchronizing their log output?

As far as I can tell, they do run in parallel:

https://github.com/libp2p/test-plans/blob/03f5d992622d74f1fa07674fea8481b51fbe6afe/multidim-interop/testplans.ts#L86-L102

I don't think stdout nor stderr is synchronized across those test executions.

@marten-seemann
Contributor Author

> As far as I can tell, they do run in parallel:
>
> https://github.com/libp2p/test-plans/blob/03f5d992622d74f1fa07674fea8481b51fbe6afe/multidim-interop/testplans.ts#L86-L102
>
> I don't think stdout nor stderr is synchronized across those test executions.

We should probably fix that at some point. For reference, I had to deal with a very similar problem in the quic-go cross compilation workflow recently: quic-go/quic-go#3809

@marten-seemann
Contributor Author

> Triggered once more to see whether this is an intermittent failure.

Same failure. I already reran it a few times before. This is an actual failure.

@mxinden
Member

mxinden commented Aug 22, 2023

Documenting my suspicions thus far:

chromium-rust-v0_52_x_go-libp2p-head__webtransport_-dialer-1    | [1692594949.828][SEVERE]: bind() failed: Cannot assign requested address (99)\n

I am assuming this is the chromium driver failing to bind to its port:

https://github.com/libp2p/rust-libp2p/blob/e974efb7558a88195a36647c3d6af4ca00c50bf3/interop-tests/src/bin/wasm_ping.rs#L103-L109

compose-runner.ts eventually times out the test:

2023-08-21T09:24:24.8375575Z   [cause]: DOMException [AbortError]: This operation was aborted
2023-08-21T09:24:24.8375999Z       at new DOMException (node:internal/per_context/domexception:53:5)
2023-08-21T09:24:24.8376410Z       at AbortController.abort (node:internal/abort_controller:400:18)
2023-08-21T09:24:24.8377152Z       at Timeout._onTimeout (/home/runner/work/_actions/libp2p/test-plans/master/multidim-interop/src/compose-runner.ts:51:55)
2023-08-21T09:24:24.8377621Z       at listOnTimeout (node:internal/timers:569:17)
2023-08-21T09:24:24.8377996Z       at processTimers (node:internal/timers:512:7)
2023-08-21T09:24:24.8378381Z }

What this suspicion cannot explain is why this is failing on this, and only this, pull request.

Also why would the port be taken? There aren't multiple rust-libp2p WASM tests spawned here.

@marten-seemann
Contributor Author

> I am assuming this is the chromium driver failing to bind to its port:
>
> https://github.com/libp2p/rust-libp2p/blob/e974efb7558a88195a36647c3d6af4ca00c50bf3/interop-tests/src/bin/wasm_ping.rs#L103-L109

I'm not really familiar with the Chromium driver, but I have a similar Chrome interop test in webtransport-go, and it looks like it doesn't (explicitly?) need to bind to any port: https://github.com/quic-go/webtransport-go/blob/master/interop/interop.py#L28-L32. Not sure if that helps?

@MarcoPolo
Collaborator

The port may be a red herring, as a successful run emits the same bind() error:

multidim-interop-dialer-1    | [1692727052.188][SEVERE]: bind() failed: Cannot assign requested address (99)
multidim-interop-listener-1  | 2023/08/22 17:57:32 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
multidim-interop-listener-1  | 2023/08/22 17:57:32 My multiaddr is:  [/ip4/127.0.0.1/udp/46570/quic-v1/webtransport/certhash/uEiAFypI9WPyJyJiJSvf4OYer8O-6KPB8gQohc5c5WpNfpQ/certhash/uEiAzEc1WmADBQDcGxxrb5rqDr66Wu4BdLHTuEE31XAjzWA /ip4/172.19.0.3/udp/46570/quic-v1/webtransport/certhash/uEiAFypI9WPyJyJiJSvf4OYer8O-6KPB8gQohc5c5WpNfpQ/certhash/uEiAzEc1WmADBQDcGxxrb5rqDr66Wu4BdLHTuEE31XAjzWA]
multidim-interop-dialer-1    | {"handshakePlusOneRTTMillis":37.699,"pingRTTMilllis":0.2}
multidim-interop-dialer-1 exited with code 0

@MarcoPolo
Collaborator

I've successfully run the test by updating to Chrome 115.

FROM selenium/standalone-chrome:115.0

I think the issue then is that rust-libp2p v0.52 is using an older version of Chrome. @mxinden let's update this.

As a note, js-libp2p uses playwright, which always downloads the latest version of the browsers we are testing against. This kind of goes against my own view of reproducibility, but it is convenient. I haven't spent the effort to figure out how to prevent playwright from doing this, and it's a bit harder since it's 2 layers down the abstraction stack (aegir -> playwright-test -> playwright).

@marten-seemann
Contributor Author

Thanks for digging into this @MarcoPolo!

@mxinden Could you update this in rust-libp2p asap? The v0.31 release is scheduled for Monday next week, and we'd like to get this quic-go update in (to ship GSO support). If that timeline doesn't work out, we'll have to disable the interop test to be able to move forward here.

@mxinden
Member

mxinden commented Aug 23, 2023

> Thanks for digging into this @MarcoPolo!

👍 thanks @MarcoPolo.

> @mxinden Could you update this in rust-libp2p asap? The v0.31 release is scheduled for Monday next week, and we'd like to get this quic-go update in (to ship GSO support). If that timeline doesn't work out, we'll have to disable the interop test to be able to move forward here.

Got it. I will do the update either today or tomorrow.

mxinden added a commit to mxinden/rust-libp2p that referenced this pull request Aug 23, 2023
Our WASM Webtransport interoperability tests previously used Chrome 112. This Chrome version fails
to connect to go-libp2p with quic-go v0.38.0. See libp2p/go-libp2p#2506 for
failure. This is due to quic-go v0.38.0 moving to the updated code point for HTTP datagrams. See
quic-go/quic-go#3588 for details.

This commit upgrades our interop tests to use Chrome 115.
mergify bot pushed a commit to libp2p/rust-libp2p that referenced this pull request Aug 23, 2023
Our WASM Webtransport interoperability tests previously used Chrome 112. This Chrome version fails to connect to go-libp2p with quic-go v0.38.0. See libp2p/go-libp2p#2506 for failure. This is due to quic-go v0.38.0 moving to the updated code point for HTTP datagrams. See quic-go/quic-go#3588 for details.

This commit upgrades our interop tests to use Chrome 115.

Pull-Request: #4383.
mxinden added a commit to libp2p/test-plans that referenced this pull request Aug 23, 2023
Bump rust-libp2p v0.52 interop version to include
libp2p/rust-libp2p#4383 which fixes issue in
libp2p/go-libp2p#2506.
mxinden added a commit to libp2p/test-plans that referenced this pull request Aug 24, 2023
Bump rust-libp2p v0.52 interop version to include
libp2p/rust-libp2p#4383 which fixes issue in
libp2p/go-libp2p#2506.
@mxinden
Member

mxinden commented Aug 24, 2023

libp2p/test-plans#273 is merged. Latest run succeeded.

https://github.com/libp2p/go-libp2p/actions/runs/5922005874/job/16169252860

@marten-seemann marten-seemann changed the title from "update quic-go to v0.38.0" to "update quic-go to v0.38.1" Aug 25, 2023
@marten-seemann
Contributor Author

Updated quic-go to v0.38.1.

@marten-seemann marten-seemann merged commit fea268b into master Aug 25, 2023
11 checks passed