[v24.1.x] CORE-8082 cloud_io: add missing error handling to #24079


@pgellert pgellert commented Nov 8, 2024

Backport of PR #24059

Fixes #24077

Cherry pick conflicts:

  • Bazel build is not present on this older branch
  • remote.cc has moved to a different path

This change allows passing in a `retry_chain_logger`, which wraps
`ss::logger` rather than inheriting from it.
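
The diff for this commit is not shown here, so the following is only a minimal sketch of one way a helper can accept both a plain logger and a wrapper that does not inherit from it: make the helper a template on the logger type. `base_logger`, `prefix_logger`, and `report_failure` are hypothetical stand-ins, not the real `ss::logger`, `retry_chain_logger`, or any Redpanda function.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a plain logger such as ss::logger.
class base_logger {
public:
    void warn(const std::string& msg) { lines.push_back("WARN " + msg); }
    std::vector<std::string> lines; // captured output, for illustration only
};

// Hypothetical stand-in for a wrapper such as retry_chain_logger:
// it holds a reference to the plain logger instead of inheriting from it.
class prefix_logger {
public:
    prefix_logger(base_logger& inner, std::string prefix)
      : _inner(inner)
      , _prefix(std::move(prefix)) {}

    void warn(const std::string& msg) { _inner.warn(_prefix + " " + msg); }

private:
    base_logger& _inner;
    std::string _prefix;
};

// Templated on the logger type, so any type exposing
// warn(const std::string&) is accepted; no common base class is needed.
template<typename Logger>
void report_failure(Logger& log, const std::string& what) {
    assert(!what.empty());
    log.warn("download failed: " + what);
}
```

A function taking `base_logger&` directly would reject `prefix_logger`, because there is no inheritance relationship to convert through; the template sidesteps that at the cost of moving the helper into a header.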

(cherry picked from commit f388831)
The call to `drain_response_stream` may throw various transport-related
errors (see the Broken Pipe error observed in CI below).
These errors should be handled inside the `remote::download_object`
method, because callers expect download-related errors to be
communicated via the `download_result` return type rather than
through an exception. Some of these errors (like the broken pipe error
below) are also retryable, but the previous implementation did not
retry them.

These exceptions are often ignored by the caller and may be printed as
"Exceptional future ignored" log lines, which cause CI failures and are
less useful for debugging.

Below is an example of one such ignored exceptional future in the
remote partition finalizing background fiber:
```
INFO  2024-10-29 12:41:17,708 [shard 1:main] cloud_storage - [fiber474 kafka/fuzzy-operator-6356-dzxvff/4] - remote_partition.cc:1406 - Finalizing remote storage state...
DEBUG 2024-10-29 12:41:17,723 [shard 1:main] cloud_io - [fiber819~0|1|19984ms] - remote.cc:430 - Receive OK response from "37836c6f-30b0-482f-bb4e-0f3dffdb5cbe/meta/kafka/fuzzy-operator-6356-dzxvff/1_3447/manifest.bin"
WARN  2024-10-29 12:41:17,723 [shard 1:main] http - /37836c6f-30b0-482f-bb4e-0f3dffdb5cbe/meta/kafka/fuzzy-operator-6356-dzxvff/1_3447/manifest.bin - client.cc:414 - receive error std::__1::system_error (error generic:32, System error during SSL read: [error:FFFFFFFF80000020:system library::Broken pipe]: Broken pipe)
WARN  2024-10-29 12:41:17,723 [shard 1:main] seastar - Exceptional future ignored: std::__1::system_error (error generic:32, System error during SSL read: [error:FFFFFFFF80000020:system library::Broken pipe]: Broken pipe), backtrace: 0xa73be23 0xa392e05 0x360a6b8 0x9352157 0x360a71a 0xa48cc6f 0xa49045c 0xa4e77ca 0xa402f3f /opt/redpanda/lib/libc.so.6+0x961b6 /opt/redpanda/lib/libc.so.6+0x11839b
```
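
The handling described above can be sketched in plain C++ without Seastar. This is an illustration under stated assumptions, not the code from the PR: the `download_result` enum here is a simplified stand-in, `is_retryable` and `download_with_handling` are hypothetical names, and the choice of which error codes count as retryable (and what is returned when attempts run out) is assumed.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <system_error>

// Simplified, hypothetical stand-in for the real download_result type.
enum class download_result { success, timedout, failed };

// Assumption: broken pipe and connection reset are treated as transient.
inline bool is_retryable(const std::system_error& e) {
    const auto& code = e.code();
    return code == std::errc::broken_pipe
           || code == std::errc::connection_reset;
}

// `drain` stands in for draining the response stream and may throw.
// Exceptions never escape: every outcome is reported via the return value.
inline download_result download_with_handling(
  const std::function<void()>& drain, int max_attempts) {
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        try {
            drain();
            return download_result::success;
        } catch (const std::system_error& e) {
            if (!is_retryable(e)) {
                return download_result::failed;
            }
            // Retryable transport error: loop around and try again.
        } catch (...) {
            // Unknown error type: report failure instead of propagating.
            return download_result::failed;
        }
    }
    // Assumption: exhausting all attempts is reported as a timeout.
    return download_result::timedout;
}
```

With this shape, a caller that discards the result no longer leaves an unhandled exception behind, which is what produced the "Exceptional future ignored" line in the log above.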

(cherry picked from commit ad14537)
@pgellert pgellert added this to the v24.1.x-next milestone Nov 8, 2024
@pgellert pgellert added the kind/backport PRs targeting a stable branch label Nov 8, 2024
@pgellert pgellert self-assigned this Nov 8, 2024
@pgellert pgellert marked this pull request as ready for review November 8, 2024 15:50
@pgellert pgellert requested review from Lazin, nvartolomei, a team and BenPope and removed request for a team November 8, 2024 15:50
@vbotbuildovich

the below tests from https://buildkite.com/redpanda/redpanda/builds/57846#01930c79-b3a1-437f-b18d-195d6d1737d7 have failed and will be retried

cloud_storage_rpfixture

@piyushredpanda piyushredpanda merged commit 6764fd6 into redpanda-data:v24.1.x Nov 9, 2024
18 checks passed
@BenPope BenPope modified the milestones: v24.1.x-next, v24.1.18 Nov 18, 2024
Labels
area/redpanda kind/backport PRs targeting a stable branch