
v1.SubscriberClient#streamingPull throws DEADLINE_EXCEEDED error after 15 minutes #1885

Open
edosrecki opened this issue Feb 8, 2024 · 5 comments
Labels: api: pubsub · priority: p3 · type: feature request

edosrecki commented Feb 8, 2024

We use the v1.SubscriberClient#streamingPull method to create a bidirectional stream, which we use to consume and ack Pub/Sub messages. After 15 minutes the client throws a DEADLINE_EXCEEDED (code 4) error. We have put a lot of effort into investigating why and where this error is thrown.

There is a default 15-minute timeout for the streamingPull RPC call, which is defined in the @google-cloud/pubsub library here. It is possible to override this timeout by providing a different value when calling streamingPull, like so:

const { v1 } = require('@google-cloud/pubsub')

const subscriber = new v1.SubscriberClient()
const stream = subscriber.streamingPull({ timeout: 60 * 60 * 1000 /* 1 hour */ })

However, this only postpones the DEADLINE_EXCEEDED error rather than preventing it.
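For context, this is roughly how such a gapic-level stream is driven (a minimal sketch only; the subscription name, ack deadline, and handlers are placeholders rather than our actual code):

const { v1 } = require('@google-cloud/pubsub')

const subscriber = new v1.SubscriberClient()
const stream = subscriber.streamingPull({ timeout: 60 * 60 * 1000 /* 1 hour */ })

// The first request configures the stream: which subscription to pull from and
// how long the server waits for an ack before redelivering a message.
stream.write({
  subscription: 'projects/my-project/subscriptions/my-subscription', // placeholder
  streamAckDeadlineSeconds: 60,
})

// Each StreamingPullResponse carries a batch of received messages; they are
// acked by writing their ackIds back onto the same stream.
stream.on('data', (response) => {
  const ackIds = response.receivedMessages.map((m) => m.ackId)
  if (ackIds.length > 0) {
    stream.write({ ackIds })
  }
})

stream.on('error', (err) => {
  // After ~15 minutes this fires with code 4 (DEADLINE_EXCEEDED).
  console.error('streamingPull error:', err.code, err.message)
})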

We have tracked down where this error is thrown: it happens in the @grpc/grpc-js library, in the runDeadline method of the ResolvingCall class. The timer that is set when the ResolvingCall instance is created should be cleared when the response is received, in the outputStatus method. The outputStatus method is called from the onReceiveStatus handler, or from the onReceiveMessage handler if certain conditions are met.

However, the onReceiveStatus handler is NEVER called, which can be seen if you turn on debug logs:

GRPC_NODE_VERBOSITY=DEBUG
GRPC_NODE_TRACE=resolving_call # only focus on resolving_call class

You can then clearly see that the resolving_call is instantiated and that the timer is created:

D 2024-02-08T14:46:36.689Z | v1.9.14 54511 | resolving_call | [0] Created
D 2024-02-08T14:46:36.689Z | v1.9.14 54511 | resolving_call | [0] Deadline: 2024-02-08T15:01:36.688Z
D 2024-02-08T14:46:36.689Z | v1.9.14 54511 | resolving_call | [0] Deadline will be reached in 899999ms

When a message is received, you also see:

D 2024-02-08T14:50:18.370Z | v1.9.14 54731 | resolving_call | [0] Received message
D 2024-02-08T14:50:18.371Z | v1.9.14 54731 | resolving_call | [0] Finished filtering received message

But you never see a 'Received status' log, which means that the outputStatus method never gets called to clear the DEADLINE_EXCEEDED timer.

I believe that this is a bug. Keep in mind that it only happens with v1.SubscriberClient; it works correctly with the latest Subscription class. When using that class, the gRPC debug output looks different and you see 'Received status' logs, which means the timer gets restarted every time you receive a message:

D 2024-02-08T14:58:18.115Z | v1.9.14 55006 | resolving_call | [20] Received metadata
D 2024-02-08T14:58:18.115Z | v1.9.14 55006 | resolving_call | [20] Received message
D 2024-02-08T14:58:18.115Z | v1.9.14 55006 | resolving_call | [20] Finished filtering received message
D 2024-02-08T14:58:18.119Z | v1.9.14 55006 | resolving_call | [20] Received status
D 2024-02-08T14:58:18.119Z | v1.9.14 55006 | resolving_call | [20] ended with status: code=0 details=""

Environment details

  • OS: macOS 14.2.1
  • Node.js version: 20.11.0
  • npm version: 10.2.4
  • @google-cloud/pubsub version: 4.3.0

Steps to reproduce

See the repository below for the code and instructions to reproduce the issue:

https://github.com/emartech/pub-sub-deadline-exceeded-repro

Feb 8, 2024: edosrecki added the priority: p2 and type: bug labels; the product-auto-label bot added the api: pubsub label.
May 20, 2024: hongalex assigned hongalex and feywind, then unassigned hongalex.
May 21, 2024: feywind relabeled the issue as type: feature request and priority: p3, removing priority: p2 and type: bug.

feywind commented May 21, 2024

@edosrecki Thanks for the issue. Our intent is that users use the library's streaming pull facilities instead of calling the gapic-level functions directly. The service periodically disconnects streams and requires the client side to manage that directly (reconnecting streams on failure, etc.), so it's a lot of code work.

Is there a use case that requires using the gapic-level streaming pull function? If so, we'd be interested in hearing about it so we can potentially adapt the higher-level one to cover it.
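For reference, the intended higher-level usage looks roughly like this (a minimal sketch; the subscription name is a placeholder):

const { PubSub } = require('@google-cloud/pubsub')

const pubsub = new PubSub()
const subscription = pubsub.subscription('my-subscription') // placeholder name

// The library manages the underlying streaming pull connections, including
// reconnecting when the service closes a stream.
subscription.on('message', (message) => {
  // ... process message.data ...
  message.ack()
})

subscription.on('error', (err) => {
  console.error('subscription error:', err)
})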

@ik-southpole

We have been experiencing the same issue. We are using v1.SubscriberClient because we required more control over batch processing. It seems that until this bug is addressed, one must implement reconnection manually, because the subscriber client does not respect DEADLINE_EXCEEDED in retry.retryCodes.
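The kind of manual reconnection we mean is roughly the following (a sketch only; the subscription name, ack deadline, and error handling are placeholders):

const { v1 } = require('@google-cloud/pubsub')

const subscriber = new v1.SubscriberClient()

function openStream() {
  const stream = subscriber.streamingPull()
  stream.write({
    subscription: 'projects/my-project/subscriptions/my-subscription', // placeholder
    streamAckDeadlineSeconds: 60,
  })
  stream.on('data', (response) => {
    // ... hand response.receivedMessages to the application and ack them ...
  })
  stream.on('error', (err) => {
    if (err.code === 4 /* DEADLINE_EXCEEDED */) {
      // The deadline described in this issue fired; open a fresh stream.
      openStream()
    } else {
      console.error('streamingPull error:', err)
    }
  })
}

openStream()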


feywind commented Jun 5, 2024

@ik-southpole Are you able to share some details about what's not working about batch processing as it is?

The implementation for our handwritten layer is here, if you can't avoid going that route:

https://github.com/googleapis/nodejs-pubsub/blob/main/src/message-stream.ts

Maybe that would help figure out if there's something missing.


ik-southpole commented Jun 5, 2024

Our process requires buffering n messages before processing and acknowledging them all at once. However, the Subscription interface, which implements the EventEmitter interface, emits messages individually, while we needed to handle them in batches as Message[]. Additionally, when using Subscription, the 'Ack message count by delivery type' metric always showed pull instead of streaming_pull. To achieve the higher throughput recommended by the documentation, we opted for streaming_pull. Consequently, with the same subscriber implementation, we observed that Subscription achieved 700 acks per second, whereas our custom v1.SubscriberClient#streamingPull consumer doubled our application's throughput.
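To illustrate the batching pattern (a simplified sketch; the batch size, subscription name, and processBatch handler are hypothetical, and flow control and ack-deadline extension are omitted):

const { v1 } = require('@google-cloud/pubsub')

const BATCH_SIZE = 100 // placeholder batch size

const subscriber = new v1.SubscriberClient()
const stream = subscriber.streamingPull()
stream.write({
  subscription: 'projects/my-project/subscriptions/my-subscription', // placeholder
  streamAckDeadlineSeconds: 60,
})

let buffer = []
stream.on('data', async (response) => {
  buffer.push(...response.receivedMessages)
  while (buffer.length >= BATCH_SIZE) {
    const batch = buffer.splice(0, BATCH_SIZE)
    await processBatch(batch.map((m) => m.message)) // hypothetical batch handler
    // Ack the whole batch in one write once processing succeeds.
    stream.write({ ackIds: batch.map((m) => m.ackId) })
  }
})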


feywind commented Jun 6, 2024

@ik-southpole Thanks for the info. I know that there's a desire to provide a way to receive chunks of messages as Message[] to user callbacks, but I'm not sure where it is on the roadmap.

@kamalaboulhosn Do you have thoughts on this?
