storcon: forward requests from stepped down instance to the current leader (#8954)

## Problem
It turns out that we can't rely on external orchestration to promptly
route traffic to the new leader, which causes downtime.
Forwarding provides a safe way out.

## Safety
We forward when:
1. Request is not one of ["/control/v1/step_down", "/status", "/ready",
"/metrics"]
2. Current instance is in [`LeadershipStatus::SteppedDown`] state
3. There is a leader in the database to forward to
4. Leader from step (3) is not the current instance
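
The four conditions above can be sketched as a single pure function. This is only an illustration, not the code in this commit: `forward_target`, `NEVER_FORWARDED`, the string addresses and the simplified `LeadershipStatus` enum below are all assumed names for the sketch.

```rust
/// Simplified stand-in for the controller's leadership state (assumed shape).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LeadershipStatus {
    Leader,
    SteppedDown,
    Candidate,
}

/// Paths that are always handled locally, per condition (1).
const NEVER_FORWARDED: [&str; 4] = ["/control/v1/step_down", "/status", "/ready", "/metrics"];

/// Returns the address to forward to, or `None` if the request
/// should be handled by the current instance.
fn forward_target<'a>(
    path: &str,
    status: LeadershipStatus,
    persisted_leader: Option<&'a str>, // hypothetical: leader address read from the database
    own_address: &str,                 // hypothetical: address of the current instance
) -> Option<&'a str> {
    // (1) Some endpoints must always be served locally.
    if NEVER_FORWARDED.iter().any(|p| *p == path) {
        return None;
    }
    // (2) Only a stepped down instance forwards.
    if status != LeadershipStatus::SteppedDown {
        return None;
    }
    // (3) There has to be a leader in the database to forward to.
    let leader = persisted_leader?;
    // (4) Never forward to ourselves.
    if leader == own_address {
        return None;
    }
    Some(leader)
}
```

With these assumptions, a stepped down instance at `http://a` that sees `http://b` persisted as leader would forward `/v1/tenant` to `http://b`, while `/status` is still answered locally. If the persisted leader is still `http://a`, nothing is forwarded.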

If a storcon instance is persisted in the database, then we know that it
is the current leader.
There's one exception: the window between handling the step-down request
and the new leader updating the database.

Let's treat the happy case first. The stepped down node does not produce
any side effects, since all request handling happens on the leader.

As for the edge case, we are guaranteed to have at most two running
instances. Hence, if we are in the edge case, the leader persisted in the
database is the stepped down instance that received the request.
Condition (4) above covers this scenario.

## Summary of changes
* Conversion utilities for reqwest <-> hyper. I'm not happy with these,
but I don't see a better way. Open to suggestions.
* Add request forwarding logic
* Update each request handler. Again, not happy with this. If anyone
knows a nice way to wrap the handlers, lmk. Me and Joonas tried :/
* Update each handler to maybe forward
* Tweak tests to showcase new behaviour
VladLazar authored Sep 17, 2024
1 parent 2db840d commit b719d58
Showing 2 changed files with 607 additions and 35 deletions.

1 comment on commit b719d58

@github-actions

5056 tests run: 4882 passed, 0 failed, 174 skipped (full report)


Flaky tests (5)

Postgres 17

Postgres 16

Postgres 15

Postgres 14

Code coverage* (full report)

  • functions: 32.0% (7413 of 23182 functions)
  • lines: 49.8% (59577 of 119717 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
b719d58 at 2024-09-17T12:27:56.696Z
