Skip to content

Commit

Permalink
[keyserver] run a temporary express health check server during migrat…
Browse files Browse the repository at this point in the history
…ion for AWS load balancer health checks

Summary:
We had a problem when running our last keyserver migration on Ashoat's keyserver where the migration took much longer than prior migrations. The AWS load balancer's default grace period for the ecs task ran out and
the migration process was cut off in the middle. This diff runs a temporary health check express server during the migration process so AWS can establish the ecs task as healthy

Was originally worried that we would need to fully stop the health check express server, but given that it's only on the master process, I don't believe this is necessary. Additionally, the default `close` provided function prevents new connections

Test Plan:
Tested this locally by creating a dummy migration that slept for 10 seconds. During this time, keyserver endpoints were unavailable but the health check returned an OK response. Upon completing the migration,
both the health route and keyserver endpoints were available.

Also temporarily modified the health check route to timeout infinitely. Upon finishing the migration, the ongoing health check was terminated and the express server keyserver endpoints was able to be initialized.

Will not land without testing on AWS, ensuring a healthy primary node goes online with a dummy migration running past the default AWS grace period

Reviewers: varun, ashoat

Reviewed By: ashoat

Subscribers: tomek

Differential Revision: https://phab.comm.dev/D13172
  • Loading branch information
wyilio committed Sep 9, 2024
1 parent 4320412 commit 8e5753f
Show file tree
Hide file tree
Showing 4 changed files with 46 additions and 0 deletions.
10 changes: 10 additions & 0 deletions keyserver/flow-typed/npm/stoppable_vx.x.x.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
// flow-typed signature: 4ffffaa246e31defe6bd7054e7869482
// flow-typed version: <<STUB>>/stoppable_v1.1.0/flow_v0.202.1

declare module 'stoppable' {
declare module.exports: {
<T: http$Server | ?https$Server>(server: T, grace?: number): T & {
+stop: (callback?: (error: ?Error, gracefully: boolean) => void) => void,
}
}
}
1 change: 1 addition & 0 deletions keyserver/package.json
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,7 @@
"sharp": "^0.30.5",
"siwe": "^2.1.4",
"sql-template-strings": "^2.2.2",
"stoppable": "^1.1.0",
"tcomb": "^3.2.29",
"twin-bcrypt": "^2.1.1",
"uuid": "^3.4.0",
Expand Down
30 changes: 30 additions & 0 deletions keyserver/src/keyserver.js
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ import type { $Request, $Response } from 'express';
import expressWs from 'express-ws';
import os from 'os';
import qrcode from 'qrcode';
import stoppable from 'stoppable';

import './cron/cron.js';
import { qrCodeLinkURL } from 'lib/facts/links.js';
Expand Down Expand Up @@ -110,13 +111,42 @@ void (async () => {

if (cluster.isMaster) {
if (isPrimaryNode) {
const healthCheckApp = express();
healthCheckApp.use(express.json({ limit: '250mb' }));
healthCheckApp.get('/health', (req: $Request, res: $Response) => {
res.send('OK');
});

// We use stoppable to allow forcibly stopping the health check server
// on the master process so that non-master processes can successfully
// initialize their express servers on the same port without conflict
const healthCheckServer = stoppable(
healthCheckApp.listen(
parseInt(process.env.PORT, 10) || 3000,
listenAddress,
),
0,
);

const didMigrationsSucceed: boolean = await migrate();
if (!didMigrationsSucceed) {
// The following line uses exit code 2 to ensure nodemon exits
// in a dev environment, instead of restarting. Context provided
// in https://github.com/remy/nodemon/issues/751
process.exit(2);
}

if (healthCheckServer) {
await new Promise((resolve, reject) => {
healthCheckServer.stop(err => {
if (err) {
reject(err);
} else {
resolve();
}
});
});
}
}

if (shouldDisplayQRCodeInTerminal && isPrimaryNode) {
Expand Down
5 changes: 5 additions & 0 deletions yarn.lock
Original file line number Diff line number Diff line change
Expand Up @@ -22492,6 +22492,11 @@ stealthy-require@^1.1.1:
resolved "https://registry.yarnpkg.com/stealthy-require/-/stealthy-require-1.1.1.tgz#35b09875b4ff49f26a777e509b3090a3226bf24b"
integrity sha512-ZnWpYnYugiOVEY5GkcuJK1io5V8QmNYChG62gSit9pQVGErXtrKuPC55ITaVSukmMta5qpMU7vqLt2Lnni4f/g==

stoppable@^1.1.0:
version "1.1.0"
resolved "https://registry.yarnpkg.com/stoppable/-/stoppable-1.1.0.tgz#32da568e83ea488b08e4d7ea2c3bcc9d75015d5b"
integrity sha512-KXDYZ9dszj6bzvnEMRYvxgeTHU74QBFL54XKtP3nyMuJ81CFYtABZ3bAzL2EdFUaEwJOBOgENyFj3R7oTzDyyw==

stopwords-iso@^1.1.0:
version "1.1.0"
resolved "https://registry.yarnpkg.com/stopwords-iso/-/stopwords-iso-1.1.0.tgz#dc303db6b0842d4290bc1339b4eaf37b94219395"
Expand Down

0 comments on commit 8e5753f

Please sign in to comment.