Ever wanted to run Federated GraphQL in AWS Lambda?
In this repo we do a comparison of Cold/Warm Start times of Federated GraphQL solutions (Cosmo, Mesh, Apollo Gateway, Apollo Router) and provide a minimal/scrappy build of Apollo Router built for Lambda via Amazon Linux 2, as well as a version of Cosmo built in Go that you can utilize.
- ⚡️ TL;DR: I'd recommend the lambda-cosmo-custom alternative (available as bootstrap-cosmo-arm in the Releases), which is a lot less hacky and much more performant (300-500ms Cold Starts). See the README for details on how to use it.
- ⚡️ TL;DR 2: Of the Apollo Router variants, lambda-directly-optimized beats all other variants and is on par with the alternatives for Cold and Warm Starts (use the bootstrap-directly-optimized-graviton-arm-size binary).
Overview:
- Motivation
- Measurements
- How to use
- Comparison: Federation via Apollo Router (Cold Starts)
- Comparison: Federation via Apollo Router (Warm Starts)
- Comparison: Rust Subgraph in AWS Lambda
- Comparison: Federation via Apollo Gateway
- Comparison: Federation via GraphQL Mesh
- Comparison: Federation via Cosmo Router
Serverless is great for getting started with a low-cost Cloud setup that'll scale you from zero to profitable without having to worry about infrastructure overhead. That said, there are not many great Federated GraphQL solutions that work out of the box for Serverless. Router from Apollo and Cosmo from Wundergraph are both tailored to long-running processes, e.g. in a k8s cluster. Mesh and Apollo Gateway are both JavaScript programs, which incur a massive penalty in Cold Start times and are thus not a great solution.
Currently Apollo Router does not support running in AWS Lambda (apollographql/router#364). Instead it focuses on running as a long-lived process, which means that it's less optimized for quick startup and is built with dependencies that do not mesh with Lambda's Amazon Linux 2 environment. The same goes for Cosmo, although with Cosmo we can actually use the binary, albeit with a bit of indirection.
But what if we were a little bit creative? Could we get it to work? The answer is: Yes! (sorta)...
This repository contains five examples:
- lambda-with-server/: Spins up an Apollo Router using the apollo-router crate, and proxies Lambda Events to the HTTP server locally.
- lambda-directly/: Uses the TestHarness that Apollo Router uses to easily make GraphQL requests in its tests without needing a full Router. The Lambda takes the incoming event, runs it through the TestHarness and returns the result.
- lambda-directly-optimized/: Same approach as lambda-directly, but we only construct the TestHarness once and then reuse it across all invocations. We also optimize loading configurations as well as initializing the Supergraph by doing it during Lambda's Initialization phase, which runs with full resources. Additionally, we build this for the ARM architecture and also optimize it for the AWS Graviton CPU.
- lambda-cosmo: A small Rust wrapper that starts the Cosmo binary and proxies events to the server.
- lambda-cosmo-custom: Spins up a Cosmo server using the Cosmo Router and proxies Lambda Events to the HTTP server locally, similar to lambda-with-server.
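The "proxy Lambda Events to the HTTP server locally" step used by the server-based variants can be pictured roughly as below. This is a minimal sketch using only the standard library, not the code from this repository: LambdaEvent is a hypothetical, trimmed-down stand-in for the real event payload, and the address passed to proxy is whatever local address the router happens to listen on.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Hypothetical, trimmed-down view of an incoming Lambda HTTP event.
/// The real payload carries more fields (headers, query string, ...).
pub struct LambdaEvent {
    pub method: String,
    pub path: String,
    pub body: String,
}

/// Render the event as a raw HTTP/1.1 request aimed at the local server.
pub fn to_http_request(event: &LambdaEvent, host: &str) -> String {
    format!(
        "{} {} HTTP/1.1\r\nHost: {}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        event.method,
        event.path,
        host,
        event.body.len(), // byte length of the body
        event.body
    )
}

/// Send the request to the local server and return the raw response text.
/// `addr` is an assumption, e.g. "127.0.0.1:4000".
pub fn proxy(event: &LambdaEvent, addr: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(to_http_request(event, addr).as_bytes())?;
    // "Connection: close" lets us read until the server closes the socket.
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

A real wrapper would also forward headers and map the HTTP response back into the response shape Lambda expects; both are left out here to keep the idea visible.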
We do some additional tricks to reduce the size of the Apollo variants in the bootstrap-directly-optimized-graviton-arm-size binary, which has an impact on Cold Starts:
- We remove location details, panic string formatting, and abort on panic
- We rebuild and optimize libstd with build-std, which combined with the above brings us from ~71MB down to ~49MB.
We also tried upx to reduce the size of the binaries. Unfortunately, the overhead of decompressing the binary significantly increases Cold Start times, e.g. lambda-directly-optimized goes up from 0.8s to 2.5s, despite a binary reduction from 73.71MB to 18MB.
Check out the code and Dockerfile for each. There's really not a lot going on, and it is a minimal implementation compared to what you'd want in Production. My current recommendation would be either to use the bootstrap-directly-optimized-graviton-arm binary produced from the lambda-directly-optimized approach in AWS Lambda, or to run Apollo Router in App Runner, which it does extremely well (I can max out the allowed 200 concurrent requests on a 0.25 CPU and 0.5GB Memory setting).
Measurement (ms) | GraphQL Mesh (512 MB) | GraphQL Mesh (1024 MB) | GraphQL Mesh (2048 MB) | lambda-directly-optimized (512 MB) | lambda-directly-optimized (1024 MB) | lambda-directly-optimized (2048 MB) | Cosmo (512 MB) | Cosmo (1024 MB) | Cosmo (2048 MB) | Apollo Gateway (512 MB) | Apollo Gateway (1024 MB) | Apollo Gateway (2048 MB) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Average warm start response time | 10.2 ms | 10 ms | 10.3 ms | 6.8 ms | 6.2 ms | 6.8 ms | 10.7 ms | 10 ms | 9.8 ms | 8.8 ms | 8.9 ms | 9.8 ms |
Average cold start response time | 615.9 ms | 609.8 ms | 565.2 ms | 703.3 ms | 681.8 ms | 678 ms | 442.9 ms | 464.7 ms | 427.7 ms | 1037.7 ms | 871.2 ms | 851 ms |
Fastest warm response time | 6.9 ms | 7.9 ms | 8 ms | 5 ms | 5 ms | 6 ms | 6.9 ms | 7.9 ms | 7.9 ms | 6.9 ms | 6.9 ms | 6.9 ms |
Slowest warm response time | 38.9 ms | 38.9 ms | 38.9 ms | 11 ms | 11 ms | 9 ms | 19 ms | 11.9 ms | 10.9 ms | 12 ms | 12 ms | 10.9 ms |
Fastest cold response time | 495.9 ms | 495.9 ms | 495.9 ms | 625 ms | 625 ms | 625 ms | 328 ms | 328 ms | 328 ms | 797 ms | 797 ms | 797 ms |
Slowest cold response time | 877 ms | 786.9 ms | 786.9 ms | 2724 ms | 804 ms | 724.9 ms | 581 ms | 531 ms | 505 ms | 1170 ms | 1039.9 ms | 898 ms |
Of the Apollo variants specifically:
Approach | Advantage | Performance |
---|---|---|
lambda-with-server | · Full router functionality (almost) | · Cold Start: ~1.58s · Warm Start: ~49ms |
lambda-directly | · No need to wait for a server to start first (lower overhead) | · Cold Start: ~1.32s · Warm Start: ~314ms |
lambda-directly-optimized | · No need to wait for a server to start first (lower overhead) · Built for ARM · Optimized for the Graviton CPU | Optimized for size: · Cold Start: ~0.7s · Warm Start: ~20ms. Optimized for speed: · Cold Start: ~0.9s · Warm Start: ~20ms |
Each of the approaches is generic and can be used as-is. You can simply grab whichever variant you want from the Releases page, which uploads the bootstrap artifact from each of them.
For example, let's say we want to try running lambda-directly-optimized.
First we create a folder to hold our artifacts in, which we will .zip up and deploy to Lambda:
$ mkdir apollo-router
Then we download the relevant bootstrap binary:
$ curl -sSL https://github.com/codetalkio/apollo-router-lambda/releases/latest/download/bootstrap-directly-optimized-graviton-arm-size -o bootstrap
$ mv bootstrap ./apollo-router/bootstrap
Now we just need to add our router.yaml and supergraph.graphql, since the service looks these up from its own folder during startup:
# From wherever your router.yaml is:
$ cp router.yaml ./apollo-router/router.yaml
# From wherever your supergraph.graphql is:
$ cp supergraph.graphql ./apollo-router/supergraph.graphql
You now have the following contents in your apollo-router folder:
.
├── apollo-router
│  ├── bootstrap
│  ├── router.yaml
│  └── supergraph.graphql
And you're ready to deploy using your preferred method of AWS CDK/SAM/SLS/SST/CloudFormation/Terraform.
The lambda-directly-optimized approach is the only one that enters the realm of "acceptable" cold starts. Still high, but almost always below 1 second. Both of the other approaches unfortunately have quite a high cold start time. Of those two, lambda-directly wins by a tiny margin over lambda-with-server, but neither is great. None of the variants talk to any subgraphs; this purely measures the overhead of startup.
lambda-with-server
A good 450ms of this is spent just waiting for the Router to spin up:
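To make that waiting time concrete, here is a minimal sketch of what waiting for a local server can look like, assuming the wrapper simply polls the router's port until it accepts connections. The repository may well do this differently; wait_until_ready, the 50ms connect timeout, and the 10ms poll interval are all illustrative choices.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `addr` until a TCP connection succeeds or `timeout` elapses.
/// Returns how long the wait took, which is time added to the cold start,
/// or `None` if the server never became reachable in time.
pub fn wait_until_ready(addr: SocketAddr, timeout: Duration) -> Option<Duration> {
    let started = Instant::now();
    while started.elapsed() < timeout {
        // A successful connect means the server is listening.
        if TcpStream::connect_timeout(&addr, Duration::from_millis(50)).is_ok() {
            return Some(started.elapsed());
        }
        // Back off briefly before the next attempt.
        sleep(Duration::from_millis(10));
    }
    None
}
```

Whatever the exact mechanism, the first invocation cannot be served until this returns, which is why the approaches that skip the server have less startup overhead.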
Breakdown of only the router (making no queries to subgraphs):
Measurement (ms) | 128 MB | 256 MB | 512 MB | 1024 MB | 2048 MB |
---|---|---|---|---|---|
Average warm start response time | 8.3 ms | 8.7 ms | 7.6 ms | 7.6 ms | 8 ms |
Average cold start response time | 2870.9 ms | 2570.4 ms | 2174.1 ms | 1012.8 ms | 943.4 ms |
Fastest warm response time | 6 ms | 6 ms | 6 ms | 6.9 ms | 6.9 ms |
Slowest warm response time | 16.9 ms | 16.9 ms | 16.9 ms | 16.9 ms | 16.9 ms |
Fastest cold response time | 837 ms | 837 ms | 837 ms | 837 ms | 837 ms |
Slowest cold response time | 3861.9 ms | 3861.9 ms | 2612.9 ms | 1625 ms | 1139 ms |
lambda-directly
lambda-directly-optimized (optimized for speed)
A few samples of lambda-directly-optimized (optimized for speed) Cold Starts:
Breakdown of only the router (making no queries to subgraphs):
Measurement (ms) | 128 MB | 256 MB | 512 MB | 1024 MB | 2048 MB |
---|---|---|---|---|---|
Average warm start response time | 9.7 ms | 5.4 ms | 5.6 ms | 6.1 ms | 5.8 ms |
Average cold start response time | 858 ms | 837.6 ms | 775.5 ms | 768.3 ms | 753.2 ms |
Fastest warm response time | 4.9 ms | 4.9 ms | 4.9 ms | 4.9 ms | 4.9 ms |
Slowest warm response time | 23 ms | 8 ms | 7 ms | 7 ms | 7 ms |
Fastest cold response time | 719 ms | 719 ms | 719 ms | 719 ms | 719 ms |
Slowest cold response time | 1075 ms | 981.9 ms | 981.9 ms | 981.9 ms | 868 ms |
lambda-directly-optimized (optimized for size)
A few samples of lambda-directly-optimized (optimized for size) Cold Starts:
Breakdown of only the router (making no queries to subgraphs):
Measurement (ms) | 128 MB | 256 MB | 512 MB | 1024 MB | 2048 MB |
---|---|---|---|---|---|
Average warm start response time | 5.2 ms | 5.6 ms | 5.2 ms | 5.6 ms | 5.5 ms |
Average cold start response time | 735.8 ms | 735.6 ms | 698.1 ms | 698.8 ms | 688.1 ms |
Fastest warm response time | 4 ms | 4 ms | 4.9 ms | 4.9 ms | 4.9 ms |
Slowest warm response time | 72.9 ms | 20.9 ms | 9.9 ms | 8 ms | 8 ms |
Fastest cold response time | 617 ms | 617 ms | 617 ms | 617 ms | 617 ms |
Slowest cold response time | 985 ms | 985 ms | 894.9 ms | 894.9 ms | 762 ms |
Here we see both lambda-directly-optimized and lambda-with-server shine. Once they have started the Apollo Router/TestHarness, they have relatively little overhead. lambda-directly, on the other hand, builds a TestHarness on each new request and keeps paying that high cost, which slows it down.
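The difference between rebuilding on every request and building once can be sketched with the standard library alone. Harness below is a hypothetical stand-in that is expensive to build and cheap to call; it is not the apollo-router TestHarness API, and the inline schema string is only a placeholder.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the expensive setup has run.
pub static BUILDS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the router/TestHarness.
pub struct Harness {
    supergraph: String,
}

impl Harness {
    /// Pretend this parses config and the supergraph (the slow part).
    pub fn build(supergraph: &str) -> Self {
        BUILDS.fetch_add(1, Ordering::SeqCst);
        Harness { supergraph: supergraph.to_string() }
    }

    /// Cheap per-request work.
    pub fn execute(&self, query: &str) -> String {
        format!("{} bytes of schema handled {}", self.supergraph.len(), query)
    }
}

/// Process-wide instance, built at most once and then shared.
static HARNESS: OnceLock<Harness> = OnceLock::new();

/// lambda-directly style: pay the build cost on every invocation.
pub fn handle_rebuilding(query: &str) -> String {
    Harness::build("type Query { me: String }").execute(query)
}

/// lambda-directly-optimized style: build once, reuse afterwards.
pub fn handle_reusing(query: &str) -> String {
    HARNESS
        .get_or_init(|| Harness::build("type Query { me: String }"))
        .execute(query)
}
```

Calling handle_reusing three times bumps BUILDS once, while three calls to handle_rebuilding bump it three times. In a Lambda you would typically trigger the one-time build before entering the event loop, so it happens during the Initialization phase instead of on the first request.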
Each of these examples talks to one warm subgraph implemented in Rust, to simulate a real warm run.
lambda-with-server
lambda-directly
lambda-directly-optimized (optimized for size)
For comparison, to show how far we could go, here's a subgraph in Rust implemented using async-graphql and wrapped up in cargo-lambda.
Cold Start (201ms):
Warm Start (8ms):
To have something to compare the Apollo Router PoC more directly against, here's one alternative using Apollo Gateway.
Cold start (1.23s):
Warm start (120ms):
Breakdown of only the router (making no queries to subgraphs):
Measurement (ms) | 512 MB | 1024 MB | 2048 MB |
---|---|---|---|
Average warm start response time | 8.8 ms | 8.9 ms | 9.8 ms |
Average cold start response time | 1037.7 ms | 871.2 ms | 851 ms |
Fastest warm response time | 6.9 ms | 6.9 ms | 6.9 ms |
Slowest warm response time | 12 ms | 12 ms | 10.9 ms |
Fastest cold response time | 797 ms | 797 ms | 797 ms |
Slowest cold response time | 1170 ms | 1039.9 ms | 898 ms |
As another comparison point against the Apollo Router PoC, here's an alternative using GraphQL Mesh.
Cold start (956ms):
Breakdown of only the router (making no queries to subgraphs):
Measurement (ms) | 512 MB | 1024 MB | 2048 MB |
---|---|---|---|
Average warm start response time | 10.2 ms | 10 ms | 10.3 ms |
Average cold start response time | 615.9 ms | 609.8 ms | 565.2 ms |
Fastest warm response time | 6.9 ms | 7.9 ms | 8 ms |
Slowest warm response time | 38.9 ms | 38.9 ms | 38.9 ms |
Fastest cold response time | 495.9 ms | 495.9 ms | 495.9 ms |
Slowest cold response time | 877 ms | 786.9 ms | 786.9 ms |
As another comparison point against the Apollo Router PoC, here's an alternative using Cosmo Router, specifically the variant from lambda-cosmo-custom.
Cold start (339ms):
Breakdown of only the router (making no queries to subgraphs):
Measurement (ms) | ms-cosmo (512 MB) | ms-cosmo (1024 MB) | ms-cosmo (2048 MB) |
---|---|---|---|
Average warm start response time | 10.7 ms | 10 ms | 9.8 ms |
Average cold start response time | 442.9 ms | 464.7 ms | 427.7 ms |
Fastest warm response time | 6.9 ms | 7.9 ms | 7.9 ms |
Slowest warm response time | 19 ms | 11.9 ms | 10.9 ms |
Fastest cold response time | 328 ms | 328 ms | 328 ms |
Slowest cold response time | 581 ms | 531 ms | 505 ms |