cmd: ensure http servers shut down gracefully #262

Merged: 1 commit merged into master from graceful-shutdown on Nov 4, 2019

Conversation

mccutchen (Contributor)

Problem

Both sso-proxy and sso-auth abort all in-flight requests on shutdown. Instead, they should stop accepting new requests and allow in-flight requests to complete before exiting.

Addressing this will make deploys of the sso services less disruptive to end users.

Solution

Because graceful shutdown is just tricky enough to get right, here we add a new internal httpserver package that exposes a single function, Run(), which will run an http server and arrange for graceful shutdown when receiving SIGTERM or SIGINT. A shutdown timeout must also be provided, after which the server will abort any outstanding requests.
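
Roughly, the new package has this shape (a simplified sketch of the idea, not the actual code; details of the real implementation may differ):

package httpserver

import (
    "context"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

// The signals that trigger graceful shutdown. Exposed as a variable so
// that tests can substitute a test-only signal (see the testing notes
// further down).
var shutdownSignals = []os.Signal{syscall.SIGTERM, syscall.SIGINT}

// Run serves srv until a shutdown signal arrives, then shuts down
// gracefully, aborting any requests still in flight after shutdownTimeout.
func Run(srv *http.Server, shutdownTimeout time.Duration) error {
    // Serve in the background; ErrServerClosed is the expected result of a
    // clean Shutdown and is not treated as a failure.
    errc := make(chan error, 1)
    go func() {
        if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            errc <- err
        }
    }()

    // Block until the server fails on its own or a shutdown signal arrives.
    sigc := make(chan os.Signal, 1)
    signal.Notify(sigc, shutdownSignals...)
    select {
    case err := <-errc:
        return err
    case <-sigc:
    }

    // Stop accepting new connections and give in-flight requests up to
    // shutdownTimeout to finish; after that, Shutdown aborts whatever is
    // still outstanding and returns the context's error.
    ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout)
    defer cancel()
    return srv.Shutdown(ctx)
}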

Notes

Reproducing

The easiest way I know to reproduce the behavior uses httpbin via the quickstart:

  1. In the quickstart dir, run docker-compose up -d to bring up the demo services
  2. In a browser, visit http://httpbin.sso.localtest.me/delay/9 to start a request that will take 9 seconds to complete
  3. Before those 9 seconds elapse, stop the sso-proxy container (note the -t 30 to tell docker to wait up to 30 seconds before forcibly killing it):
docker stop -t 30 $(docker ps | grep /bin/sso-proxy | awk '{print $1}')

On latest master, you should immediately get an HTTP 502 error.

With these changes, you should instead get a successful response after the 9 seconds have elapsed, and then sso-proxy will be shut down.

Unit tests

I'm not entirely sure how to test this, given that

  1. it relies on the process receiving either SIGTERM or SIGINT, and
  2. the simple httpserver.Run() interface manages the whole lifecycle of the server, leaving no straightforward way to control the listener or figure out what port it is listening on (assuming we'd want it to choose a random available port)

If anybody has any good ideas for tackling (1) I'm all ears; the issues in (2) are under our control, so we could conceivably make the interface more flexible and use a trick like the one sketched below to choose a random port in advance (though this would be inherently racy).
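
For reference, the usual shape of that random-port trick is roughly the following (a general sketch, not necessarily the exact code linked; it only needs the standard library's net package). The race comes from the window between closing the probe listener and the server re-binding the port, during which another process could claim it.

import "net"

// freePort asks the kernel for an unused TCP port by binding to port 0,
// records the assigned port, and then releases the listener.
func freePort() (int, error) {
    l, err := net.Listen("tcp", "127.0.0.1:0")
    if err != nil {
        return 0, err
    }
    defer l.Close()
    return l.Addr().(*net.TCPAddr).Port, nil
}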

I'll also leave a few notes inline below.

@mccutchen added the "bug" (Something isn't working) label on Oct 23, 2019
codecov bot commented on Oct 23, 2019

Codecov Report

Merging #262 into master will increase coverage by 0.04%.
The diff coverage is 70.58%.

@@            Coverage Diff             @@
##           master     #262      +/-   ##
==========================================
+ Coverage   62.14%   62.18%   +0.04%     
==========================================
  Files          53       54       +1     
  Lines        4166     4197      +31     
==========================================
+ Hits         2589     2610      +21     
- Misses       1390     1399       +9     
- Partials      187      188       +1
Impacted Files Coverage Δ
internal/proxy/options.go 84.67% <ø> (ø) ⬆️
internal/auth/configuration.go 51.53% <100%> (+0.24%) ⬆️
internal/pkg/httpserver/httpserver.go 66.66% <66.66%> (ø)

@mccutchen (Contributor, Author) left a comment

Some notes/Qs for reviewers!

Review comments (since resolved) were left on cmd/sso-auth/main.go, cmd/sso-proxy/main.go, go.sum, and internal/pkg/httpserver/httpserver.go.
@mccutchen (Contributor, Author)

@jphines thanks for the feedback; I think I've addressed all your comments.

"I'm not entirely sure how to test this"

So I did add some tests; please let me know what you think of the approach I took. In short, the unit tests redefine the set of shutdown signals that httpserver.Run() listens for and then send the new shutdown signal to the test process to exercise this behavior.
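
Roughly, the mechanism looks like this (a simplified sketch with illustrative names, not the actual test code; it assumes the package exposes the signals Run() waits on as a shutdownSignals variable, and it needs net/http, os, syscall, testing, and time):

func TestRunGracefulShutdown(t *testing.T) {
    // Swap in a test-only signal so we don't have to send the test process
    // a real SIGTERM or SIGINT.
    shutdownSignals = []os.Signal{syscall.SIGUSR1}

    srv := &http.Server{Addr: "127.0.0.1:0", Handler: http.NotFoundHandler()}

    // Deliver the shutdown signal to this very process once Run() has had a
    // moment to install its signal handler.
    go func() {
        time.Sleep(50 * time.Millisecond)
        syscall.Kill(syscall.Getpid(), syscall.SIGUSR1)
    }()

    if err := Run(srv, time.Second); err != nil {
        t.Fatalf("expected clean shutdown, got: %v", err)
    }
}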

Note, though, the giant dumb caveat here, which explains why this doesn't perfectly test the behavior and why I'm not sure the extra complexity needed to test it perfectly is worth it:

"timeout elapsed": {
shutdownTimeout: 50 * time.Millisecond,
requestDelay: 250 * time.Millisecond,
expectShutdownErr: true,
// In real usage, we would expect the request to be aborted when
// the server is shut down and its process exits before it can
// finish responding.
//
// But because we're running the server within the test process,
// which does not exit after shutdown, the goroutine handling the
// long-running request does not seem to get canceled and the
// request ends up completing successfully even after the server
// has shut down.
//
// Properly testing this would require something like re-running
// the test binary as a subprocess to which we can send SIGTERM,
// but doing that would add a lot more complexity (e.g. having it
// bind to a random available port and then running a separate
// subprocess to figure out the port to which it is bound, all in a
// cross-platform way).
//
// If we wanted to go that route, some examples of the general
// approach can be seen here:
//
// - http://cs-guy.com/blog/2015/01/test-main/#toc_3
// - https://talks.golang.org/2014/testing.slide#23
expectRequestErr: false,
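
For what it's worth, the subprocess approach mentioned in that comment would look roughly like the following (a sketch of the general pattern from the links above, not something this change implements; it glosses over discovering the subprocess's port and starting the in-flight request, and it needs net/http, os, os/exec, syscall, testing, and time):

// The test re-runs its own binary as a subprocess, guarded by an
// environment variable, sends that subprocess SIGTERM, and asserts that
// it exits cleanly.
func TestGracefulShutdownSubprocess(t *testing.T) {
    if os.Getenv("GO_WANT_SHUTDOWN_SERVER") == "1" {
        // Subprocess mode: run a real server until it is signaled.
        srv := &http.Server{Addr: "127.0.0.1:0", Handler: http.NotFoundHandler()}
        if err := Run(srv, 5*time.Second); err != nil {
            os.Exit(1)
        }
        os.Exit(0)
    }

    cmd := exec.Command(os.Args[0], "-test.run=TestGracefulShutdownSubprocess")
    cmd.Env = append(os.Environ(), "GO_WANT_SHUTDOWN_SERVER=1")
    if err := cmd.Start(); err != nil {
        t.Fatal(err)
    }

    // A real test would discover the subprocess's port and start a slow
    // request here before signaling; omitted for brevity.
    time.Sleep(100 * time.Millisecond)
    if err := cmd.Process.Signal(syscall.SIGTERM); err != nil {
        t.Fatal(err)
    }
    if err := cmd.Wait(); err != nil {
        t.Fatalf("subprocess exited with error: %v", err)
    }
}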

@jphines (Contributor) commented on Nov 4, 2019

@mccutchen I agree that the additional complexity necessary to "properly" test the signal handling doesn't seem worth it at this point in time.

This seems reasonable to land as-is.

Thank you so much for the contribution; this is a great changeset. 💯 🚀

jphines previously approved these changes on Nov 4, 2019
@mccutchen merged commit a29ba90 into master on Nov 4, 2019
@mccutchen deleted the graceful-shutdown branch on November 4, 2019 at 15:57