Gateway downstream health checks #3930

Merged Apr 2, 2020 (10 commits)

@trevor-scheer trevor-scheer commented Mar 30, 2020

Purpose

This PR introduces a new configuration option to the gateway: performServiceHealthChecks. This option adds a precautionary step to the gateway's load and schema update code paths.

A gateway with this setting turned on will send a simple GraphQL query ({ __typename }) to all downstream services to ensure each service is live and responsive. It performs these queries on initial schema load as well as during a schema update. The interesting failure behaviors are:

  • On load: throw an error, so the gateway fails to load.
  • On schema update: "rollback" (never roll forward). Log the event, but continue serving requests with the old schema.
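
The two failure behaviors above can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual implementation: the names SketchGateway, Composition, HealthCheck, load, and update are all hypothetical, and the health check itself is injected as a function so the sketch makes no assumptions about how the { __typename } query is sent.

```typescript
// Hypothetical types for illustration; these are not the gateway's real internals.
type HealthCheck = (serviceUrl: string) => Promise<boolean>;

interface Composition {
  schemaId: string;
  serviceUrls: string[];
}

class SketchGateway {
  private current: Composition | null = null;
  private checkService: HealthCheck;
  private log: (message: string) => void;

  constructor(
    checkService: HealthCheck,
    log: (message: string) => void = (message) => console.warn(message),
  ) {
    this.checkService = checkService;
    this.log = log;
  }

  // A service counts as healthy only if its check resolves to true.
  // A rejected check (e.g. a network error) is treated as unhealthy.
  private async allHealthy(candidate: Composition): Promise<boolean> {
    const results = await Promise.all(
      candidate.serviceUrls.map((url) =>
        this.checkService(url).catch(() => false),
      ),
    );
    return results.every((healthy) => healthy);
  }

  // On load: any failing service aborts startup with an error.
  async load(initial: Composition): Promise<void> {
    if (!(await this.allHealthy(initial))) {
      throw new Error("Service health check failed on load");
    }
    this.current = initial;
  }

  // On schema update: never roll forward onto a failing composition.
  // Log the event and keep serving the previous schema.
  async update(next: Composition): Promise<boolean> {
    if (!(await this.allHealthy(next))) {
      const kept = this.current ? this.current.schemaId : "none";
      this.log(
        `Health check failed for ${next.schemaId}; still serving ${kept}`,
      );
      return false;
    }
    this.current = next;
    return true;
  }

  activeSchemaId(): string | null {
    return this.current ? this.current.schemaId : null;
  }
}
```

The design point is that the candidate schema is only assigned after every check passes, so a failed update leaves the previously loaded state untouched rather than requiring an explicit undo.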

TODO:

  • Docs
  • Changelog
  • PR Description

@trevor-scheer trevor-scheer force-pushed the trevor/gateway-downstream-health-checks branch from 1aeb4c4 to 88d049a Compare March 30, 2020 20:06
@trevor-scheer trevor-scheer requested a review from abernix March 31, 2020 00:10
@trevor-scheer trevor-scheer marked this pull request as ready for review March 31, 2020 22:54
@trevor-scheer trevor-scheer changed the title [WIP] Gateway downstream health checks Gateway downstream health checks Apr 1, 2020
@abernix abernix left a comment

This looks wonderful! Really a nice addition to the reliability of Gateway and I think addresses the largest concern raised in https://github.com/apollographql/apollo-server/issues/3540.

In that regard, we may also want to provide documentation for how to use it with Apollo Server's onHealthCheck to meet the requirements of most health-check probes, which might be as simple as:

const gateway = new ApolloGateway({ performServiceHealthChecks: true });
const server = new ApolloServer({
  gateway,
  onHealthCheck() {
    /* Assuming no existing functionality */
    return gateway.performServiceHealthChecks();
  }
});

... and then a recommendation to leverage https://server:port/.well-known/apollo/server-health as the endpoint to ping.
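
On the probing side, a check against that endpoint might look like the sketch below. It assumes the endpoint answers with a 2xx status when onHealthCheck resolves and a non-2xx status when it rejects. The names isGatewayHealthy, healthUrl, and FetchLike are hypothetical, and the fetch function is passed in rather than assumed to exist globally.

```typescript
// Only the part of a fetch-style response that this probe needs.
interface ProbeResponse {
  ok: boolean; // true for 2xx status codes
}

// Hypothetical injection point: any fetch-compatible function fits this shape.
type FetchLike = (url: string) => Promise<ProbeResponse>;

const HEALTH_PATH = "/.well-known/apollo/server-health";

// Join the base URL and the health path without doubling the slash.
function healthUrl(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "") + HEALTH_PATH;
}

// Resolves to true only if the endpoint answered with a 2xx status.
// Network errors are reported as unhealthy instead of being thrown.
async function isGatewayHealthy(
  baseUrl: string,
  fetchFn: FetchLike,
): Promise<boolean> {
  try {
    const response = await fetchFn(healthUrl(baseUrl));
    return response.ok;
  } catch (error) {
    return false;
  }
}
```

In a runtime that provides a global fetch, a call such as isGatewayHealthy("https://server:port", fetch) should fit this signature; in practice most orchestrators would issue the equivalent HTTP GET themselves.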

Nothing blocking here, and you should feel free to move this forward however you like, but I did leave a few comments within.

packages/apollo-gateway/src/index.ts (outdated, resolved)
packages/apollo-gateway/src/index.ts (outdated, resolved)
packages/apollo-gateway/src/index.ts (outdated, resolved)
packages/apollo-gateway/src/index.ts (resolved)
docs/source/api/apollo-gateway.mdx (outdated, resolved)
packages/apollo-gateway/CHANGELOG.md (outdated, resolved)
packages/apollo-gateway/src/index.ts (resolved)
@trevor-scheer trevor-scheer force-pushed the trevor/gateway-downstream-health-checks branch from 5755f83 to 565d96e Compare April 2, 2020 03:57
@abernix abernix left a comment

565d96e looks good to me, even if it's a complicated beast.

Let's not do it now, but in the future, we might consider exposing a __testing property if it makes things notably simpler. Hiding that from TypeScript might take some slight but worthwhile crafting.

packages/apollo-gateway/src/index.ts (outdated, resolved)
@trevor-scheer trevor-scheer force-pushed the trevor/gateway-downstream-health-checks branch from 49692bb to 81d9af1 Compare April 2, 2020 19:32
@trevor-scheer trevor-scheer merged commit 5eef2a6 into release-2.12.0 Apr 2, 2020
@trevor-scheer trevor-scheer deleted the trevor/gateway-downstream-health-checks branch April 2, 2020 19:39
abernix pushed a commit to apollographql/federation that referenced this pull request Sep 4, 2020
This commit introduces a new configuration option to the gateway: performServiceHealthChecks. This option adds an additional
precautionary step to the gateway load and schema update code paths.

A gateway with this setting turned on will send a simple graphql query
to all downstream services ({ __typename }) to ensure the service is
live and responsive. It performs these queries on initial schema load
as well as during a schema update. The interesting failure behaviors are:

* On load: throw an error, failure to load.
* On schema update: "rollback" (never roll forward). Log the event, but continue serving requests with the old schema.
Apollo-Orig-Commit-AS: apollographql/apollo-server@5eef2a6
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 16, 2023