Shield HTTP request handlers from async cancellations. #4314
1004 tests run: 964 passed, 0 failed, 40 skipped (full report)
The comment gets automatically updated with the latest test results.
b613988 at 2023-06-01T05:37:59.704Z
8658d82 to c53deb9
4cbf1aa to 3b27ee6
68d605b to 90431c8
f7b5731 to d015049
90431c8 to 3a6f946
If the timeline is already being deleted, return an error. We used to notice the duplicate request and error out in persist_index_part_with_deleted_flag(), but it's better to detect it earlier. Add an explicit lock for the deletion.
Note: This doesn't do anything about the async cancellation problem (GitHub issue #3478): if the original HTTP request is dropped because the client disconnected, the timeline deletion stops half-way through the operation. That needs to be fixed, too, but that's a separate story.
(This is a simpler replacement for PR #4194. I'm also working on the cancellation shielding, see PR #4314.)
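A minimal sketch of the "explicit lock for the deletion" idea, not the pageserver's actual code. The names `Timeline`, `delete_lock` and `DeleteError` are hypothetical, and `std::sync::Mutex` is used here only to keep the example dependency-free. A second deletion attempt that arrives while the first still holds the lock is rejected up front instead of failing later in the operation.

```rust
use std::sync::{Mutex, TryLockError};

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum DeleteError {
    AlreadyInProgress,
}

/// Hypothetical timeline with a lock that is held for the whole deletion.
struct Timeline {
    delete_lock: Mutex<()>,
}

impl Timeline {
    fn new() -> Self {
        Timeline { delete_lock: Mutex::new(()) }
    }

    /// Run `work` (the actual deletion steps) only if no other deletion
    /// currently holds the lock; otherwise fail immediately.
    fn delete(&self, work: impl FnOnce()) -> Result<(), DeleteError> {
        let _guard = match self.delete_lock.try_lock() {
            Ok(guard) => guard,
            // Someone else is deleting this timeline right now.
            Err(TryLockError::WouldBlock) => return Err(DeleteError::AlreadyInProgress),
            // A previous deletion panicked; this sketch simply takes over.
            Err(TryLockError::Poisoned(poisoned)) => poisoned.into_inner(),
        };
        work();
        Ok(())
        // `_guard` is dropped here, releasing the lock.
    }
}
```

Because `try_lock` never blocks, the duplicate request gets its error right away. Note that this sketch does not remember that a deletion already completed; it only detects one that is still running.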
d015049 to 7777ee2
All the prerequisite PRs have now been merged, and I have rebased this over 'main'. This is now ready to be reviewed and merged.
d13a3ad to c9b8eaf
Unsure what the compilation errors are, maybe in relation to
c9b8eaf to e3e41e5
Some of them at least, yes. I don't understand the Re-triggered.
e3e41e5 to 68750b7
We now spawn a new task for every HTTP request, and wait on the JoinHandle. If Hyper drops the Future, the spawned task will keep running. This protects the rest of the pageserver code from unexpected async cancellations.
This creates a CancellationToken for each request and passes it to the handler function. If the HTTP request is dropped by the client, the CancellationToken is signaled. None of the handler functions make use of the CancellationToken currently, but now they could.
The CancellationToken arguments also work like documentation. When you're looking at a function signature and you see that it takes a CancellationToken as an argument, it's a nice hint that the function might run for a long time, and won't be async cancelled. The default assumption in the pageserver is now that async functions are not cancellation-safe anyway, unless explicitly marked as such, but this is a nice extra reminder.
Spawning a task for each request is OK from a performance point of view because spawning is very cheap in Tokio, and none of our HTTP requests are very performance critical anyway.
Fixes issue #3478
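The shielding pattern described above can be sketched with the standard library alone. This is an analogy under stated assumptions, not the code in this PR: an OS thread stands in for the spawned Tokio task, an `mpsc` channel stands in for awaiting the JoinHandle, and the hypothetical `CancelFlag` stands in for the CancellationToken. `DisconnectGuard`, `PendingRequest` and `spawn_shielded` are likewise made-up names. Dropping the `PendingRequest` models Hyper dropping the request future.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;

/// Hypothetical stand-in for a cancellation token: a shared boolean flag.
#[derive(Clone, Default)]
struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
    fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Signals the flag when dropped while still armed, i.e. when the caller
/// went away before the handler produced a result.
struct DisconnectGuard {
    flag: CancelFlag,
    armed: bool,
}

impl Drop for DisconnectGuard {
    fn drop(&mut self) {
        if self.armed {
            self.flag.cancel();
        }
    }
}

/// The caller's side of a request; dropping it models a client disconnect.
struct PendingRequest<T> {
    result: mpsc::Receiver<T>,
    guard: DisconnectGuard,
}

impl<T> PendingRequest<T> {
    /// Block until the handler finishes, then disarm the guard.
    fn wait(mut self) -> T {
        let value = self.result.recv().expect("handler panicked");
        self.guard.armed = false;
        value
    }
}

/// Run `handler` on its own thread so that dropping the returned
/// `PendingRequest` cannot stop it half-way through.
fn spawn_shielded<T, F>(handler: F) -> (PendingRequest<T>, thread::JoinHandle<()>)
where
    T: Send + 'static,
    F: FnOnce(CancelFlag) -> T + Send + 'static,
{
    let flag = CancelFlag::default();
    let handler_flag = flag.clone();
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || {
        // The receiver may already be gone; the work still ran to completion.
        let _ = tx.send(handler(handler_flag));
    });
    let guard = DisconnectGuard { flag, armed: true };
    (PendingRequest { result: rx, guard }, worker)
}
```

In this sketch a handler that wants to stop early has to check `is_cancelled()` itself; nothing interrupts it from the outside, which is the property the PR is after.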
Per discussion, this is intentionally enabled even when compiled without the testing feature.
Because the handler will now continue running, that's not a serious issue anymore. But a line in the log might still be useful.
With the refactoring of testing_api_handler, the 'testing' handlers are compiled even if the 'testing' feature is not enabled. They are unreachable, but the compiler doesn't realize that.
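One way handlers can end up compiled but unreachable is a runtime `cfg!` check in place of a `#[cfg]` attribute. The sketch below is an assumption about the general shape only: the signature, the `ApiError` type and the message text are hypothetical, and the real testing_api_handler may look different.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ApiError {
    BadRequest(String),
}

/// Hypothetical wrapper: `handler` is always type-checked and compiled,
/// but it is only ever called when the `testing` feature is enabled.
fn testing_api_handler<F>(desc: &str, handler: F) -> Result<String, ApiError>
where
    F: FnOnce() -> Result<String, ApiError>,
{
    // `cfg!` expands to a plain `true` or `false` constant, so both branches
    // are compiled; `#[cfg(...)]` would remove the code entirely instead.
    if cfg!(feature = "testing") {
        handler()
    } else {
        Err(ApiError::BadRequest(format!(
            "cannot {} because this build has no testing APIs",
            desc
        )))
    }
}
```

With this shape, the handlers passed in are still built in non-testing builds even though the `else` branch is the only one that can run.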
For the cancellation guard to work, we need to do the tokio::spawn() inside the request_span(), not the other way round. Fix test case, now that the first timeline_delete() continues to run in the background, even if the HTTP connection is closed.
68750b7 to b613988
Thanks for the review!