storcon: forward requests from stepped down instance to the current leader (#8954)

## Problem

It turns out that we can't rely on external orchestration to promptly route traffic to the new leader. This induces downtime. Forwarding provides a safe way out.

## Safety

We forward when:
1. Request is not one of ["/control/v1/step_down", "/status", "/ready", "/metrics"]
2. Current instance is in [`LeadershipStatus::SteppedDown`] state
3. There is a leader in the database to forward to
4. Leader from step (3) is not the current instance

If a storcon instance is persisted in the database, then we know that it is the current leader. There's one exception: the window between handling the step-down request and the new leader updating the database.

Let's treat the happy case first. The stepped down node does not produce any side effects, since all request handling happens on the leader.

As for the edge case, we are guaranteed to always have a maximum of two running instances. Hence, if we are in the edge case scenario, the leader persisted in the database is the stepped down instance that received the request. Condition (4) above covers this scenario.

## Summary of changes

* Conversion utilities for reqwest <-> hyper. I'm not happy with these, but I don't see a better way. Open to suggestions.
* Add request forwarding logic
* Update each request handler. Again, not happy with this. If anyone knows a nice way to wrap the handlers, lmk. Me and Joonas tried :/
* Update each handler to maybe forward
* Tweak tests to showcase new behaviour
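
To make the four forwarding conditions from the Safety section concrete, here is a minimal sketch of the decision as a pure function. It is not the code from this commit. Only the exempt paths and `LeadershipStatus::SteppedDown` come from the commit message. The other enum variants, `LeaderRecord`, `forward_target`, and the exact-match path comparison are assumptions made for illustration.

```rust
/// Paths that are always handled locally, as listed in the commit message.
const NEVER_FORWARDED: [&str; 4] = ["/control/v1/step_down", "/status", "/ready", "/metrics"];

/// Only `SteppedDown` is named in the commit message; the other variants are assumed.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LeadershipStatus {
    Leader,
    SteppedDown,
    Candidate,
}

/// Hypothetical stand-in for the leader row persisted in the database.
#[derive(Debug, Clone, PartialEq, Eq)]
struct LeaderRecord {
    address: String,
}

/// Returns the address to forward to, or `None` if the request must be handled locally.
fn forward_target<'a>(
    path: &str,
    status: LeadershipStatus,
    persisted_leader: Option<&'a LeaderRecord>,
    own_address: &str,
) -> Option<&'a str> {
    // (1) Exempt paths are never forwarded (exact match is an assumption).
    if NEVER_FORWARDED.iter().any(|p| *p == path) {
        return None;
    }
    // (2) Only a stepped down instance forwards.
    if status != LeadershipStatus::SteppedDown {
        return None;
    }
    // (3) There must be a leader in the database to forward to.
    let leader = persisted_leader?;
    // (4) Never forward to ourselves: this covers the window in which the
    //     database still names the stepped down instance as leader.
    if leader.address == own_address {
        return None;
    }
    Some(leader.address.as_str())
}
```

Keeping the decision free of I/O means each condition can be unit tested separately. In particular, condition (4) is what prevents a forwarding loop during the step-down window.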
b719d58
5056 tests run: 4882 passed, 0 failed, 174 skipped (full report)
Flaky tests (5)

Postgres 17
- test_pageserver_compaction_smoke: release-arm64
- test_ondemand_wal_download_in_replication_slot_funcs: release-arm64

Postgres 16
- test_slots_and_branching: release-arm64

Postgres 15
- test_tenant_config: release-arm64

Postgres 14
- test_tenant_config: release-arm64

Code coverage* (full report)
- functions: 32.0% (7413 of 23182 functions)
- lines: 49.8% (59577 of 119717 lines)
* collected from Rust tests only
b719d58 at 2024-09-17T12:27:56.696Z :recycle: