Pub/Sub messages failed due to DEADLINE_EXCEEDED #3867
Comments
Could you tell us what your application should do in these circumstances? Knowing that will make it easier to make recommendations; I think some form of load shedding is still generally required, though.
@pongad Our application is dumping millions of data objects to another system as part of a migration, and we use Pub/Sub to publish all of those objects. If the application crashes, the only thing we can do is restart the dump: the migrated data is critical and not a single record can be missed, and with over 50M records it becomes impractical to read the delta from the database (queries become very slow with the necessary WHERE clauses), so the only option is a full restart.
The good news is that in the meantime I tried a new approach that puts some brakes on the messages being posted to Pub/Sub and gives it some breathing space. We maintain a count that is the difference between messages posted and messages delivered (success or failure). Whenever that count exceeds 800, we sleep all incoming threads for 12 seconds (the RPC total timeout is 10 seconds) to let Pub/Sub clear its outstanding messages. So far I have processed around 30M records with 0 failures, versus roughly 10% of messages failing earlier. Could something similar be added to Pub/Sub out of the box, so that the client recognizes when it is overloaded and gives itself breathing space until outstanding messages clear?
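The scheme described above can be sketched in plain Java. This is a minimal illustration, not part of the Pub/Sub client: the class and method names are mine, and the threshold and pause values mirror the numbers in the comment (800 outstanding, 12-second pause).

```java
import java.util.concurrent.atomic.AtomicLong;

// Back-pressure sketch: track (messages posted - messages completed) and
// pause publishing threads while the backlog exceeds a threshold.
final class BacklogGate {
    private final AtomicLong outstanding = new AtomicLong();
    private final long threshold;
    private final long pauseMillis;

    BacklogGate(long threshold, long pauseMillis) {
        this.threshold = threshold;     // e.g. 800, as in the comment above
        this.pauseMillis = pauseMillis; // e.g. 12_000, just over the RPC total timeout
    }

    // Call before each publish; sleeps while the backlog is too large.
    void awaitCapacity() throws InterruptedException {
        while (outstanding.get() >= threshold) {
            Thread.sleep(pauseMillis);
        }
        outstanding.incrementAndGet();
    }

    // Call from the publish callback, on success *and* on failure,
    // otherwise the count never drains and publishing stalls.
    void markDone() {
        outstanding.decrementAndGet();
    }

    long outstanding() {
        return outstanding.get();
    }
}
```

The release-on-failure detail matters: if failed publishes are not counted as delivered, the gate deadlocks once enough messages error out.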
This makes sense. I think we added something like this in the beginning, but then found that different people want different behavior: if you have a batch job you want to slow down, while for something interactive it might make sense to error out instead. I think it might be better to add some kind of adapter on top, like:

```java
Publisher pub = Publisher.newBuilder().....
BlockingPublisher blockPub = new BlockingPublisher(pub, 1000); // at most 1000 in flight at once
for (int i = 0; i < 1000; i++) {
    blockPub.publish(someMessage); // these don't block
}
blockPub.publish(anotherMessage); // blocks until one of the above finishes
```

@JesseLovelace @chingor13 WDYT? We should get the Pub/Sub team to sign off before actually doing this.
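One way the proposed `BlockingPublisher` adapter could be implemented is with a semaphore of in-flight permits. `BlockingPublisher` is not part of the client library; the sketch below also uses `CompletableFuture` and a plain function as a stand-in for the client's `Publisher`/`ApiFuture` so it is self-contained.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Hypothetical adapter matching the proposal above: publish() blocks once
// maxOutstanding publishes are in flight, and a permit is returned when the
// underlying publish completes.
final class BlockingPublisher {
    private final Function<String, CompletableFuture<String>> delegate;
    private final Semaphore inFlight;

    BlockingPublisher(Function<String, CompletableFuture<String>> delegate,
                      int maxOutstanding) {
        this.delegate = delegate;
        this.inFlight = new Semaphore(maxOutstanding);
    }

    CompletableFuture<String> publish(String message) throws InterruptedException {
        inFlight.acquire(); // blocks when the in-flight limit is reached
        return delegate.apply(message)
                // release on success *and* failure, or permits leak
                .whenComplete((messageId, error) -> inFlight.release());
    }
}
```

Against the real client, the delegate would be `Publisher::publish` with the returned `ApiFuture` bridged via a callback that releases the permit.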
I am seeing this error as well. @pongad @ayushj158 can you both explain what exactly is going on here? I don't understand the finer details of the bug by reading your comments above. Can one of you elaborate on this please? Where and why is the DEADLINE_EXCEEDED exception generated? Is it generated at the pubsub client before sending the messages or is it at the pubsub server? I have all the flow control settings configured in my pubsub client app. Here is how it looks:
I see DEADLINE_EXCEEDED error in the logs at random intervals. I am not ingesting too much traffic either. It is less than 1MB per second at the moment. The version of the client i am using is here:
Here is the complete exception stacktrace:
Appreciate your help. Thanks.
@sholavanalli Where are you setting those parameters? I haven't worked on this code for a while, but I don't think they are currently respected.
@sholavanalli As pongad mentioned, I am also not aware of such arguments being respected by the Pub/Sub client. As for your question: it fails at the Pub/Sub client because of the timeout, which occurs when the load you put on the client exceeds what it is able to send to the Pub/Sub server; basically you are overloading the client. In our case we used load shedding to give Pub/Sub some breathing space as soon as we identified it was getting clogged. I have used load shedding to send more than 130M records and it worked seamlessly without a single failure, though it does degrade performance a little. If you need that solution until GCP ships one out of the box, I can help you there.
@pongad I set those parameters in the flow control settings on the Pub/Sub client. The client batches messages to Pub/Sub when one of the three thresholds is reached.
This needs throttling, and that will be available once we implement the new batcher with flow control. Closing this issue; we will track it in #3003.
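For readers landing here later: more recent versions of the Java client expose publisher-side flow control through the batching settings, which gives the blocking behavior discussed in this thread without a custom adapter. A configuration sketch, with an example project, topic, and limits of my choosing:

```java
import com.google.api.gax.batching.BatchingSettings;
import com.google.api.gax.batching.FlowControlSettings;
import com.google.api.gax.batching.FlowController;
import com.google.cloud.pubsub.v1.Publisher;
import com.google.pubsub.v1.TopicName;

// LimitExceededBehavior.Block makes publish() wait whenever the
// outstanding-message or outstanding-bytes limit is reached.
Publisher publisher =
    Publisher.newBuilder(TopicName.of("my-project", "my-topic"))
        .setBatchingSettings(
            BatchingSettings.newBuilder()
                .setFlowControlSettings(
                    FlowControlSettings.newBuilder()
                        .setMaxOutstandingElementCount(1000L)
                        .setMaxOutstandingRequestBytes(100L * 1024L * 1024L) // 100 MiB
                        .setLimitExceededBehavior(
                            FlowController.LimitExceededBehavior.Block)
                        .build())
                .build())
        .build();
```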
Environment details
Steps to reproduce
Stacktrace
```
io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED
[grpc-default-worker-ELG-2-2] 7267 --- INFO i.g.n.s.i.n.h.c.h.DefaultHttp2ConnectionDecoder - [id: 0x204dac20, L:/10.8.61.36:58898 - R:pubsub.googleapis.com/172.217.20.74:443] ignoring HEADERS frame for stream RST_STREAM sent. {}
```
Code snippet
```
com.google.cloud.pubsub.v1.Publisher.publish(com.google.pubsub.v1.PubsubMessage)
```
Any additional information below
We are publishing bulk messages to Pub/Sub and intermittently a few messages fail with the DEADLINE_EXCEEDED error shown in the stacktrace above.
I checked one of the previously open issues, #2722, which states the issue is resolved, but I am not sure what the resolution is:
- will just upgrading to a newer version fix the issue, or
- do we need to add some custom timeout settings (if yes, where: RetrySettings)?
@pongad @kir-titievsky
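On the RetrySettings question: the publish deadline is governed by the publisher's retry settings, so raising the per-RPC and total timeouts gives a backlogged client more time before DEADLINE_EXCEEDED fires (it mitigates the symptom; it does not remove the underlying overload). A configuration sketch with illustrative topic name and durations:

```java
import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.pubsub.v1.Publisher;
import com.google.pubsub.v1.TopicName;
import org.threeten.bp.Duration;

// Widen the RPC and total timeouts on the publisher; values are examples.
Publisher publisher =
    Publisher.newBuilder(TopicName.of("my-project", "my-topic"))
        .setRetrySettings(
            RetrySettings.newBuilder()
                .setInitialRpcTimeout(Duration.ofSeconds(10))
                .setMaxRpcTimeout(Duration.ofSeconds(60))
                .setTotalTimeout(Duration.ofMinutes(10))
                .build())
        .build();
```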