
gRPC response sender is not thread safe #349

Closed
npepinpe opened this issue May 3, 2022 · 0 comments · Fixed by #358
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/flake Categorizes issue or PR as related to a flaky test

Comments


npepinpe commented May 3, 2022

Description

There's a bug in the interaction between the GrpcToLogStreamGateway and the GrpcResponseWriter. The gateway runs every call on a single-threaded executor: external calls such as GrpcToLogStreamGateway#responseCallback are invoked from the engine's thread context, but their actual logic is executed on the gateway's own thread. This frees the engine to do other things, such as reading and processing records, and potentially responding to more commands.

However, the actual mapping of the response happens in the gateway's thread context. When mapping the response, the gateway accesses the GrpcResponseWriter's internal buffer view to get the response record. This buffer is not a thread-safe object, so its contents may have changed between the time the gateway received the call to respond and the time it actually maps the response.
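A minimal sketch of the race, using hypothetical names rather than the real classes: the gateway task captures a reference to the writer's live, reusable buffer, so by the time the task runs on the executor thread the engine may already have overwritten it. A copy taken on the calling (engine) thread before hand-off stays stable. The CountDownLatch here only forces the problematic interleaving deterministically for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BufferRaceSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical stand-in for the response writer's reusable buffer,
    // currently holding "response #1".
    byte[] sharedBuffer = new byte[] {1};

    ExecutorService gatewayExecutor = Executors.newSingleThreadExecutor();
    CountDownLatch engineReusedBuffer = new CountDownLatch(1);

    // Unsafe: the task captures a reference to the live buffer.
    byte[] liveView = sharedBuffer;
    // Safe: snapshot the bytes on the caller's (engine) thread.
    byte[] snapshot = sharedBuffer.clone();

    gatewayExecutor.submit(() -> {
      try {
        // Force the bad interleaving: wait until the engine has
        // already reused the buffer before we map the response.
        engineReusedBuffer.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      System.out.println("live view: " + liveView[0]);
      System.out.println("snapshot: " + snapshot[0]);
    });

    // Meanwhile the "engine" reuses the buffer for response #2
    // before the gateway task has mapped response #1.
    sharedBuffer[0] = 2;
    engineReusedBuffer.countDown();

    gatewayExecutor.shutdown();
    gatewayExecutor.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```

The live view observes the second response's bytes while the snapshot still holds the first, which is why copying (or fully mapping) the response on the calling thread avoids the corruption.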

This can cause responses to be corrupted and never sent out; instead, the response mapper in the GrpcToLogStreamGateway throws an exception. Unfortunately, the executor swallows this exception, so from our point of view "nothing" happens and no response is sent.
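The swallowing behaviour is standard ExecutorService semantics, not something specific to this code: submit() wraps the task in a FutureTask, which stores any thrown exception in the returned Future instead of letting it reach the thread's uncaught-exception handler, so the failure stays invisible unless someone calls get(). A small sketch of both the silent failure and one way to surface it:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SwallowedExceptionSketch {
  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();

    // submit() returns a Future; if the task throws, the exception is
    // captured in the Future and nothing is printed or logged by default.
    Future<?> result = executor.submit(() -> {
      throw new IllegalStateException("failed to map response");
    });

    try {
      result.get(5, TimeUnit.SECONDS);
    } catch (ExecutionException e) {
      // Only by inspecting the Future does the error become visible;
      // in production code this is where you would log it.
      System.out.println("caught: " + e.getCause().getMessage());
    }

    executor.shutdown();
  }
}
```

Checking the returned Future (or wrapping task bodies in a try/catch that logs) is what makes such failures observable instead of silent.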

Expected behaviour

Responses are properly mapped and sent out, and errors are displayed when they occur.

Reproduction steps

You can run the WorkerTest for the TestContainers extension in a loop until failure. It typically takes ~50 runs for the failure to occur, but can take up to ~500; YMMV.

Environment

  • OS:
  • Version:
@npepinpe added the kind/bug (Categorizes issue or PR as related to a bug) and kind/flake (Categorizes issue or PR as related to a flaky test) labels and removed the Type: Bug label on May 3, 2022
@remcowesterhoud remcowesterhoud self-assigned this May 10, 2022