Stream max-lifetime feature should ensure clean exit #6

Closed
jmacd opened this issue Aug 10, 2023 · 2 comments · Fixed by #24

Comments

@jmacd
Contributor

jmacd commented Aug 10, 2023

@moh-osman3 has been working on f5/otel-arrow-adapter#211, which was in flight when this repository migration began. @lquerel had asked for the mechanism to be configurable on the server side, which made more sense to @moh-osman3 and me after we studied the problem further. @moh-osman3 will rebase that work into this repository.

Meanwhile, there is one flaky test that I will ignore to move forward with the migration.

@jmacd
Contributor Author

jmacd commented Aug 10, 2023

Failure is seen in #4.

=== RUN   TestStreamGracefulShutdown
panic: test timed out after 10m0s

goroutine 5200 [running]:
testing.(*M).startAlarm.func1()
	/opt/hostedtoolcache/go/1.19.12/x64/src/testing/testing.go:2036 +0x8e
created by time.goFunc
	/opt/hostedtoolcache/go/1.19.12/x64/src/time/sleep.go:176 +0x32

goroutine 1 [chan receive, 9 minutes]:
testing.(*T).Run(0xc0004b0000, {0xf7deba?, 0x518865?}, 0xfc1370)
	/opt/hostedtoolcache/go/1.19.12/x64/src/testing/testing.go:1494 +0x37a
testing.runTests.func1(0xc0001fdb60?)
	/opt/hostedtoolcache/go/1.19.12/x64/src/testing/testing.go:1846 +0x6e
testing.tRunner(0xc0004b0000, 0xc00045fcd8)
	/opt/hostedtoolcache/go/1.19.12/x64/src/testing/testing.go:1446 +0x10b
testing.runTests(0xc0001ebf40?, {0x1851380, 0x11, 0x11}, {0x7f2c9b1375b8?, 0x40?, 0x1861940?})
	/opt/hostedtoolcache/go/1.19.12/x64/src/testing/testing.go:1844 +0x456
testing.(*M).Run(0xc0001ebf40)
	/opt/hostedtoolcache/go/1.19.12/x64/src/testing/testing.go:1726 +0x5d9
main.main()
	_testmain.go:79 +0x1aa

goroutine 4816 [select, 9 minutes]:
github.com/open-telemetry/otel-arrow/collector/gen/exporter/otlpexporter/internal/arrow.(*healthyTestChannel).onRecv.func1()
	/home/runner/work/otel-arrow/otel-arrow/collector/gen/exporter/otlpexporter/internal/arrow/common_test.go:195 +0x9a
reflect.Value.call({0xe03100?, 0xc0006c2480?, 0xc00056ba00?}, {0xf690dd, 0x4}, {0x18933f8, 0x0, 0x40f45f?})
	/opt/hostedtoolcache/go/1.19.12/x64/src/reflect/value.go:584 +0x8c5
reflect.Value.Call({0xe03100?, 0xc0006c2480?, 0x0?}, {0x18933f8?, 0x4d8816?, 0xe51f00?})
	/opt/hostedtoolcache/go/1.19.12/x64/src/reflect/value.go:368 +0xbc
github.com/golang/mock/gomock.(*Call).DoAndReturn.func1({0x0, 0x0, 0xc0004e86a0?})
	/home/runner/go/pkg/mod/github.com/golang/mock@v1.6.0/gomock/call.go:132 +0x485
github.com/golang/mock/gomock.(*Controller).Call(0xc010b0aa50, {0xee4ce0, 0xc0004e86a0}, {0xf69261, 0x4}, {0x0, 0x0, 0x0})
	/home/runner/go/pkg/mod/github.com/golang/mock@v1.6.0/gomock/controller.go:251 +0xf7
github.com/open-telemetry/otel-arrow/api/experimental/arrow/v1/mock.(*MockArrowStreamService_ArrowStreamClient).Recv(0xc0004e86a0)
	/home/runner/work/otel-arrow/otel-arrow/api/experimental/arrow/v1/mock/arrow_service_mock.go:129 +0x59
github.com/open-telemetry/otel-arrow/collector/gen/exporter/otlpexporter/internal/arrow.(*Stream).read(0xc0001da1b0, {0xc021050280?, 0x0?})
	/home/runner/work/otel-arrow/otel-arrow/collector/gen/exporter/otlpexporter/internal/arrow/stream.go:351 +0x35
github.com/open-telemetry/otel-arrow/collector/gen/exporter/otlpexporter/internal/arrow.(*Stream).run(0xc0001da1b0, {0x10f4358?, 0xc021050200?}, 0xc00d07b1b8, {0x0, 0x0, 0x0})
	/home/runner/work/otel-arrow/otel-arrow/collector/gen/exporter/otlpexporter/internal/arrow/stream.go:167 +0x246
github.com/open-telemetry/otel-arrow/collector/gen/exporter/otlpexporter/internal/arrow.(*streamTestCase).start.func1()
	/home/runner/work/otel-arrow/otel-arrow/collector/gen/exporter/otlpexporter/internal/arrow/stream_test.go:84 +0x95
created by github.com/open-telemetry/otel-arrow/collector/gen/exporter/otlpexporter/internal/arrow.(*streamTestCase).start
	/home/runner/work/otel-arrow/otel-arrow/collector/gen/exporter/otlpexporter/internal/arrow/stream_test.go:82 +0xb8

goroutine 4815 [chan receive, 9 minutes]:
github.com/open-telemetry/otel-arrow/collector/gen/exporter/otlpexporter/internal/arrow.(*streamTestCase).get(...)
	/home/runner/work/otel-arrow/otel-arrow/collector/gen/exporter/otlpexporter/internal/arrow/stream_test.go:119
github.com/open-telemetry/otel-arrow/collector/gen/exporter/otlpexporter/internal/arrow.TestStreamGracefulShutdown(0xc000378340?)
	/home/runner/work/otel-arrow/otel-arrow/collector/gen/exporter/otlpexporter/internal/arrow/stream_test.go:149 +0x33b
testing.tRunner(0xc0003784e0, 0xfc1370)
	/opt/hostedtoolcache/go/1.19.12/x64/src/testing/testing.go:1446 +0x10b
created by testing.(*T).Run
	/opt/hostedtoolcache/go/1.19.12/x64/src/testing/testing.go:1493 +0x35f
FAIL	github.com/open-telemetry/otel-arrow/collector/gen/exporter/otlpexporter/internal/arrow	600.106s
?   	github.com/open-telemetry/otel-arrow/collector/gen/exporter/otlpexporter/internal/arrow/grpcmock	[no test files]

@jmacd
Contributor Author

jmacd commented Aug 16, 2023

See @moh-osman3's investigation in grpc/grpc-go#6504

jmacd added a commit that referenced this issue Aug 24, 2023
…_stream_lifetime (#23)

As discussed in grpc/grpc-go#6504, the client
should add jitter when configuring `max_connection_age_grace` because we
expect each stream will create a new connection. Since connection storms
will not be spread automatically by gRPC in this case, apply client
jitter.

Part of #6.
jmacd changed the title from "Stream max-lifetime feature moving to server-side & test flakiness" to "Stream max-lifetime feature should ensure clean exit" Aug 25, 2023
jmacd closed this as completed in #24 Aug 31, 2023
jmacd pushed a commit that referenced this issue Aug 31, 2023
Fixes #6

Applying what we learned from
grpc/grpc-go#6504 (comment),
which pointed out that the server returns EOF directly when the client
calls CloseSend(), which causes an error signal on spans.

Instead, the server should check whether it received EOF from the client
(indicating CloseSend() was called) and send StatusOK to the client. The
client will know to restart the stream when it gets a response with
`batchID=-1` and `status=OK`.
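
The handshake described above could be sketched roughly as follows. Every type and function here (`batch`, `status`, `serverStream`, `serve`, `shouldRestart`, `fakeStream`) is a hypothetical stand-in chosen to keep the example self-contained; none of them is the generated gRPC interface or the actual otel-arrow receiver code. Only the protocol is taken from the commit message: on EOF from the client, reply with `batchID=-1` and an OK status instead of returning the EOF as an error.

```go
package main

import (
	"errors"
	"io"
)

// Hypothetical message types standing in for the real protocol messages.
type batch struct{ ID int64 }

type status struct {
	BatchID int64
	OK      bool
}

// serverStream is a hypothetical, minimal view of a bidirectional stream.
type serverStream interface {
	Recv() (*batch, error)
	Send(*status) error
}

// serve acknowledges each batch. When the client half-closes the stream
// (Recv reports io.EOF), it sends a final OK status with BatchID -1 and
// returns nil, rather than surfacing EOF as an error.
func serve(s serverStream) error {
	for {
		b, err := s.Recv()
		if errors.Is(err, io.EOF) {
			return s.Send(&status{BatchID: -1, OK: true})
		}
		if err != nil {
			return err
		}
		if err := s.Send(&status{BatchID: b.ID, OK: true}); err != nil {
			return err
		}
	}
}

// shouldRestart is the client-side check: the sentinel response means the
// old stream ended cleanly and a new one can be started.
func shouldRestart(st *status) bool {
	return st.BatchID == -1 && st.OK
}

// fakeStream is an in-memory test double that reports io.EOF once its
// input is exhausted, mimicking a client that called CloseSend().
type fakeStream struct {
	in  []*batch
	out []*status
}

func (f *fakeStream) Recv() (*batch, error) {
	if len(f.in) == 0 {
		return nil, io.EOF
	}
	b := f.in[0]
	f.in = f.in[1:]
	return b, nil
}

func (f *fakeStream) Send(st *status) error {
	f.out = append(f.out, st)
	return nil
}
```

Feeding `serve` a `fakeStream` holding two batches yields three responses; only the last one satisfies `shouldRestart`, and `serve` returns nil, so nothing is recorded as an error.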