From 71cb018497e4b55348d36fa0fde66a0b016fa173 Mon Sep 17 00:00:00 2001 From: flakey5 <73616808+flakey5@users.noreply.github.com> Date: Fri, 29 Nov 2024 14:53:14 -0800 Subject: [PATCH] update Signed-off-by: flakey5 <73616808+flakey5@users.noreply.github.com> --- docs/README.md | 9 ++++ docs/architecture.md | 50 +++++++++++++++++-- docs/{debugging.md => debugging-prod.md} | 9 +++- docs/deploying.md | 6 +-- docs/dev-setup.md | 25 ++++------ docs/r2.md | 26 ++++------ docs/release-process.md | 28 ++++------- docs/sops/README.md | 6 +++ docs/sops/incident-flow.md | 40 ++++++--------- docs/sops/rolling-back-a-release.md | 13 +++-- docs/sops/switch-between-worker-and-origin.md | 15 +++--- 11 files changed, 129 insertions(+), 98 deletions(-) rename docs/{debugging.md => debugging-prod.md} (68%) diff --git a/docs/README.md b/docs/README.md index f04e878..5714cfa 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,3 +1,12 @@ # Documentation Documentation for the Release Worker. + +## Table of Contents + +- [Architecture](./architecture.md) +- [Dev Setup](./dev-setup.md) +- [Debugging Production](./debugging-prod.md) +- [Deploying](./deploying.md) +- [R2](./r2.md) +- [Node.js Release Process](./release-process.md) diff --git a/docs/architecture.md b/docs/architecture.md index dd0b627..0e3ffc6 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -1,13 +1,55 @@ # Architecture -Documentation on the architecture of the worker (i.e. how it works, how it fits -into Node.js' infrastructure, etc.). +Documentation on the architecture of the worker (i.e. how it works, how it fits into Node.js' infrastructure, etc.). 
## Network Request Flow -How a request flows through Node.js' infrastructure as a whole +A high-level overview of how a request flows through Node.js' infrastructure: + +```mermaid +flowchart LR + request[Request] --> cloudflare(Cloudflare Routing Rules) + cloudflare -- /dist/, /download/, /docs/, /api/, /metrics/ --> worker@{ shape: procs, label: "Release Worker"} + cloudflare -- /... --> website(Website) + worker -- Cache miss --> r2[(R2 bucket)] + worker -- Error --> originServer(Origin Server) + originServer + website + r2 +``` ## Worker Request Flow -How the Release Worker routes requests +The Release Worker uses a middleware approach to routing requests. + +When an instance of the worker starts up, it registers a number of routes and their middlewares. +It then builds a "chain" of middlewares to call in the same order they're given to handle the request. + +When a request hits the worker, the router gives it to the first middleware in the chain. +That middleware can then either handle the request and return a response or pass it on to the next middleware. +This goes on until the request is handled or we run out of middlewares to handle the request, at which point we throw an error. + +We currently have the following middlewares (in no particular order): + +- [CacheMiddleware](../src/middleware/cacheMiddleware.ts) - Caches responses to GET requests. +- [R2Middleware](../src/middleware/r2Middleware.ts) - Fetches a resource from R2. +- [OriginMiddleware](../src/middleware/originMiddleware.ts) - Fetches a resource from the origin server. + Used as a fallback if the R2 middleware fails. +- [NotFoundMiddleware](../src/middleware/notFoundMiddleware.ts) - Handles not found requests. +- [OptionsMiddleware](../src/middleware/optionsMiddleware.ts) - Handles OPTIONS requests. +- [SubstituteMiddleware](../src/middleware/subtituteMiddleware.ts) - Handles requests that need URL substitution (e.g. `/dist/latest/` -> `/dist/`) and then feeds them back into the router. 
+ +### Diagram +```mermaid +flowchart TD + request[Request] --> worker(Release Worker) + worker --> routerHandle("Router.handle") + routerHandle -- HTTP GET --> cacheMiddleware("Cache Middleware") + routerHandle -- HTTP HEAD --> r2Middleware + routerHandle -- HTTP OPTIONS --> optionsMiddleware("Options Middleware") + routerHandle -- Request --> substituteMiddleware("Substitute Middleware") + substituteMiddleware -- Substituted Request --> routerHandle + cacheMiddleware -- Cache miss --> r2Middleware("R2 Middleware") + r2Middleware -- Error --> originMiddleware("Origin Middleware") +``` diff --git a/docs/debugging.md b/docs/debugging-prod.md similarity index 68% rename from docs/debugging.md rename to docs/debugging-prod.md index 09d6bc6..b3e3695 100644 --- a/docs/debugging.md +++ b/docs/debugging-prod.md @@ -1,10 +1,15 @@ -# Debugging +# Debugging Prod Steps to aid with debugging the Release Worker's production environment. +> [!NOTE] +> This is mostly meant for Node.js Web Infra team members. +> Some of these steps require access to resources only made available to Collaborators. + ## Steps -- Check [Sentry](https://nodejs-org.sentry.io/issues/?project=4506191181774848). All errors should be reported here. +- Check [Sentry](https://nodejs-org.sentry.io/issues/?project=4506191181774848). + All errors should be reported here. - If a local reproduction is found, Cloudflare has an implementation of [Chrome's DevTools](https://developers.cloudflare.com/workers/observability/dev-tools/). diff --git a/docs/deploying.md b/docs/deploying.md index cfb5865..ba0ec02 100644 --- a/docs/deploying.md +++ b/docs/deploying.md @@ -4,10 +4,8 @@ Guide on how to deploy the Release Worker. ## Staging Deployments -The Release Worker is automatically deployed to its staging environment when a -new commit is pushed to the `main` branch through the [Deploy Worker](https://github.com/nodejs/release-cloudflare-worker/actions/workflows/deploy.yml) action. 
+The Release Worker is automatically deployed to its staging environment when a new commit is pushed to the `main` branch through the [Deploy Worker](https://github.com/nodejs/release-cloudflare-worker/actions/workflows/deploy.yml) action. ## Production Deployments -The Release Worker is deployed to its production environment by a Collaborator -manually running the [Deploy Worker](https://github.com/nodejs/release-cloudflare-worker/actions/workflows/deploy.yml) action. +The Release Worker is deployed to its production environment by a Collaborator manually running the [Deploy Worker](https://github.com/nodejs/release-cloudflare-worker/actions/workflows/deploy.yml) action. diff --git a/docs/dev-setup.md b/docs/dev-setup.md index abcfa2a..c5360c8 100644 --- a/docs/dev-setup.md +++ b/docs/dev-setup.md @@ -6,29 +6,25 @@ Documentation on how to run the Release Worker locally. ### 1. Prepare environment -Read and follow the [Getting Started](../CONTRIBUTING.md) guide to get your -local environment setup. +Read and follow the [Getting Started](../CONTRIBUTING.md) guide to get your local environment setup. ### 2. Setup your Cloudflare account -Currently we run the worker in [remote mode](https://developers.cloudflare.com/workers/testing/local-development/#develop-using-remote-resources-and-bindings) as there isn't a nice way to -locally populate an R2 bucket. This means that, to run the Release Worker -locally, you must have a Cloudflare account that has an R2 bucket named -`dist-prod`. You will also need to populate the bucket yourself. +Currently we run the worker in [remote mode](https://developers.cloudflare.com/workers/testing/local-development/#develop-using-remote-resources-and-bindings) as there isn't a nice way to locally populate an R2 bucket. +This means that, to run the Release Worker locally, you must have a Cloudflare account that has an R2 bucket named +`dist-prod`. +You will also need to populate the bucket yourself. 
-Both of these will hopefully change in the future to make running the Release -Worker easier. +Both of these will hopefully change in the future to make running the Release Worker easier. ### 3. Create secrets for directory listings This step is optional but recommended. -The Release Worker uses R2's S3 API for directory listings. In order for -directory listings to work, you need to make an R2 API key for your `dist-prod` -bucket and provide it to the worker. +The Release Worker uses R2's S3 API for directory listings. +In order for directory listings to work, you need to make an R2 API key for your `dist-prod` bucket and provide it to the worker. -Generating the API key can be done through the Cloudflare dashboard -[here](https://dash.cloudflare.com/?account=/r2/api-tokens). +Generating the API key can be done through the Cloudflare dashboard [here](https://dash.cloudflare.com/?account=/r2/api-tokens). Then, make a `.dev.vars` file in the root of this repository with the following: @@ -39,5 +35,4 @@ S3_ACCESS_KEY_SECRET= ### 4. Run the worker -Start the worker locally with `npm start`. You may be prompted to log into -your Cloudflare account. +Start the worker locally with `npm start`. You may be prompted to log into your Cloudflare account. diff --git a/docs/r2.md b/docs/r2.md index b1c82c1..932d73a 100644 --- a/docs/r2.md +++ b/docs/r2.md @@ -2,8 +2,8 @@ ## What is it? -[R2](https://developers.cloudflare.com/r2/) is Cloudflare's blob storage -provider. +[R2](https://developers.cloudflare.com/r2/) is Cloudflare's blob storage provider. +We use it to store all of the release assets stored by the Release Worker. ## Noteworthy points @@ -12,31 +12,23 @@ provider. R2 stores files flatly, meaning a directory does not exist in R2. However, R2 allows characters such as slashes (/) in an object's name. -For directories we can then specify a prefix (like `nodejs/release/`) and R2 -will only return objects that has a name that starts with that prefix. 
- -## How are we using it? - -R2 holds the entire contents of the release assets served by the worker. +For directories we can then specify a prefix (like `nodejs/release/`) and R2 will only return objects whose names start with that prefix. ### Bindings API -R2 allows integration with Workers through their [bindings API](https://developers.cloudflare.com/r2/api/workers/workers-api-usage/). We use this when fetching files. +R2 allows integration with Workers through their [bindings API](https://developers.cloudflare.com/r2/api/workers/workers-api-usage/). +We use this when fetching files. ### S3 API -Due to some performance issues we were seeing with R2's `list` binding command, -we opted to use R2's S3 API for listing directories. +Due to some performance issues we were seeing with R2's `list` binding command, we opted to use R2's S3 API for listing directories. ### Buckets We have two R2 buckets: -- `dist-staging` - Holds staged releases. This bucket is private and should not - be publicly accessible. +- `dist-staging` - Holds staged releases. This bucket is private and should not be publicly accessible. -- `dist-prod` - Holds released versions of Node.js. Everything in this bucket - should be considered publicly accessible. +- `dist-prod` - Holds released versions of Node.js. Everything in this bucket should be considered publicly accessible. 
-(see [Release Process](./release-process.md) for more information on how we use -these buckets) +(see [Release Process](./release-process.md) for more information on how we use these buckets) diff --git a/docs/release-process.md b/docs/release-process.md index 1ed1dfd..b6d4142 100644 --- a/docs/release-process.md +++ b/docs/release-process.md @@ -1,11 +1,10 @@ # Release Process -Documentation on the general order of events that happen when releasing a new -version of Node.js +Documentation on the general order of events that happen when releasing a new version of Node.js > [!NOTE] -> This focuses on the flow of release assets (binaries, doc files). This may -> not include the full process for releases (i.e. getting necessary approvals). +> This focuses on the flow of release assets (binaries, doc files). +> This may not include the full process for releases (i.e. getting necessary approvals). ## Release types @@ -40,22 +39,17 @@ These branches no longer receive new releases. New builds are scheduled on the release CI (https://ci-release.nodejs.org). These builds compile Node.js on the various platforms and compile the docs. -Upon a build completing successfully, the build's output (binaries, doc files) -will then be uploaded to the origin server and the `dist-staging` bucket in -Node.js' Cloudflare account. +Upon a build completing successfully, the build's output (binaries, doc files) will then be uploaded to the origin server and the `dist-staging` bucket in Node.js' Cloudflare account. -The release assets synced to the origin server are under -`/home/staging/nodejs/` path. The release assets synced to the -`dist-staging` bucket are under the `/nodejs/` [_prefix_](./r2.md#directories). +The release assets synced to the origin server are under `/home/staging/nodejs/` path. +The release assets synced to the `dist-staging` bucket are under the `/nodejs/` [_prefix_](./r2.md#directories). ### 2. 
Release promotion -When a release is ready to be released, it is promoted. For mainline releases, -this is done by the releaser running the [`release.sh`](https://github.com/nodejs/node/tree/main/tools/release.sh) -script in the Node.js repository. For nightly releases, this is done once a day. +When a release is ready to be released, it is promoted. +For mainline releases, this is done by the releaser running the [`release.sh`](https://github.com/nodejs/node/tree/main/tools/release.sh) script in the Node.js repository. +For nightly releases, this is done once a day by [automated tooling](https://github.com/nodejs/build/blob/main/ansible/www-standalone/tools/promote/promote_nightly.sh). -On the origin server, the release's assets are copied from -`/home/staging/nodejs/` to `/home/dist/nodejs/`. +On the origin server, the release's assets are copied from `/home/staging/nodejs/` to `/home/dist/nodejs/`. -For R2, the release's assets are copied from the `dist-staging` bucket to the -`dist-prod` bucket. +For R2, the release's assets are copied from the `dist-staging` bucket to the `dist-prod` bucket. diff --git a/docs/sops/README.md b/docs/sops/README.md index 11e1f49..a16a41b 100644 --- a/docs/sops/README.md +++ b/docs/sops/README.md @@ -1,3 +1,9 @@ # Standard Operating Procedures Documents detailing standardized processes for the Release Worker. + +## Table of Contents + +- [Incident Flow](./incident-flow.md) +- [Rolling Back a Release](./rolling-back-a-release.md) +- [Switching between the Worker and Origin Server](./switch-between-worker-and-origin.md) diff --git a/docs/sops/incident-flow.md b/docs/sops/incident-flow.md index ce61dfe..6560da1 100644 --- a/docs/sops/incident-flow.md +++ b/docs/sops/incident-flow.md @@ -7,27 +7,22 @@ Procedure for what to do if there's an incident with the Release Worker. 1. If the incident was caused by a recent change, try [rollbacking the release](./rolling-back-a-release.md). -2. 
If the incident affects traffic towards the Release Worker, update the - Node.js status page (https://status.nodejs.org). If it is a ongoing security - incident that we cannot disclose publicly yet, do not includes the details - of the incident in the status page. +2. If the incident affects traffic towards the Release Worker, update the Node.js status page (https://status.nodejs.org). + If it is an ongoing security incident that we cannot disclose publicly yet, do not include the details of the incident in the status page. - Optional, but preferably updates will be echoed on social media. - - Please also monitor any issues in repositories such as - this one, - [nodejs/node](https://github.com/nodejs/node), and - [nodejs/nodejs.org](https://github.com/nodejs/nodejs.org) for users asking - about the incident and link them to the status page. + - For any prolonged incidents, please consider pinning an issue tracking the incident so as to avoid spam. + + - Please also monitor any issues in repositories such as this one, + [nodejs/node](https://github.com/nodejs/node), + and [nodejs/nodejs.org](https://github.com/nodejs/nodejs.org) + for users asking about the incident and link them to the status page. 3. [Steps for debugging the worker when it's deployed](../debugging.md) -4. If there is an ongoing security incident requiring code changes, a force - push to the `main` branch can be performed by a - [Collaborator](../CONTRIBUTING.md#contributing) if there is reasonable risk - that opening a PR with the change would allow more bad actors to exploit the - vulnerability. The code changes must still be approved by another - Collaborator before the force push is performed, however. +4. If there is an ongoing security incident requiring code changes, a force push to the `main` branch can be performed by a [Collaborator](../CONTRIBUTING.md#contributing) if there is reasonable risk that opening a PR with the change would allow more bad actors to exploit the vulnerability. 
+ The code changes must still be approved by another Collaborator before the force push is performed, however. 5. If the issue requires support from Cloudflare, try reaching out through the `ext-nodejs-cloudflare` channel in the OpenJS Slack. @@ -38,16 +33,11 @@ Procedure for what to do if there's an incident with the Release Worker. ## What qualifies an an incident? -There is no exact criteria, however, these cases will most likely call for an -incident to be declared: +There are no exact criteria; however, these cases will most likely call for an incident to be declared: -1. The production deployment of the Release Worker is unavailable to the public - or is otherwise operating in a way that impacts users' abilities to interact - with it en masse. This includes behaviors that we are responsible for and - those that Cloudflare is responsible for. +1. The production deployment of the Release Worker is unavailable to the public or is otherwise operating in a way that impacts users' abilities to interact with it en masse. + This includes behaviors that we are responsible for and those that Cloudflare is responsible for. -2. There is a ongoing security issue that involves the production deployment of - the Release Worker. +2. There is an ongoing security issue that involves the production deployment of the Release Worker. -Note the Node.js Web Infrastructure, Build, and TSC teams can declare an -incident wherever they see fit, however. +Note the Node.js Web Infrastructure, Build, and TSC teams can declare an incident wherever they see fit, however. diff --git a/docs/sops/rolling-back-a-release.md b/docs/sops/rolling-back-a-release.md index 97bf11f..256ddc8 100644 --- a/docs/sops/rolling-back-a-release.md +++ b/docs/sops/rolling-back-a-release.md @@ -1,9 +1,10 @@ # Rolling Back A Release > [!WARNING] -> Rolling back a release should only be done when necessary, such as a -> quick-fix for an on-going incident, and by a [Collaborator](../CONTRIBUTING.md#contributing). 
-> The Web Infrastructure team should be made aware each time this happens. +> Rolling back a release should only be done when necessary, +> such as a quick-fix for an on-going incident, +> and by a [Collaborator](../CONTRIBUTING.md#contributing). +> The Web Infrastructure team should be aware each time this happens. ## Option A: via Github Actions @@ -17,10 +18,8 @@ This is the preferred way, but takes a little bit longer. 4. Merge PR & Deploy it -If the rollback is prompted by an incident where the worker is entirely -unavailable (i.e. all requests failing) or there is a security vulnerability -present, a Collaborator may forcibly push the commit reverting the release onto -the `main` branch. +If the rollback is prompted by an incident where the worker is entirely unavailable (i.e. all requests failing) or there is a security vulnerability present, +a Collaborator may forcibly push the commit reverting the release onto the `main` branch. ## Option B: via Cloudflare Dash diff --git a/docs/sops/switch-between-worker-and-origin.md b/docs/sops/switch-between-worker-and-origin.md index a0f87b5..3fdafeb 100644 --- a/docs/sops/switch-between-worker-and-origin.md +++ b/docs/sops/switch-between-worker-and-origin.md @@ -1,14 +1,16 @@ # Switching Between The Worker and The Origin Server -Steps for toggling server production traffic between the Release Worker and -origin server. +Steps for toggling server production traffic between the Release Worker and origin server. + +This is most relevant during incidents involving the Release Worker. ## Option A. Worker Routes You need write access to Node.js' Cloudflare account for this option. > [!NOTE] -> This assumes the Cloudflare config for the origin server has remained in-tact and is still production ready. +> This assumes the Cloudflare config for the origin server has remained in-tact +> and is still production ready. ### Steps @@ -20,7 +22,6 @@ You need write access to Node.js' Cloudflare account for this option. 
- Go to [src/routes/index.ts](../../src/routes/index.ts). -- Order the `R2Middleware`'s and `OriginMiddleware`'s to reflect the correct - order that they should be invoked in. For example, prioritizing the origin - server over R2 means the `OriginMiddleware` should appear before the - `R2Middleware`, and vice-versa for prioritizing R2. +- Order the `R2Middleware` and `OriginMiddleware` entries to reflect the order in which they should be invoked. + For example, prioritizing the origin server over R2 means the `OriginMiddleware` should appear before the `R2Middleware`. + To prioritize R2, reverse that order.
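
The ordering step above, together with the middleware chain described in `docs/architecture.md`, can be illustrated with a small sketch. This is not the worker's actual code: `Middleware`, `buildChain`, `fakeR2`, and `fakeOrigin` are hypothetical names, the chain is synchronous for brevity (the real middlewares are presumably asynchronous), and the R2 stand-in passes a request on when an object is missing rather than when a fetch fails. It only shows why the position of an entry in the list decides which backend answers first.

```typescript
// Hypothetical types for illustration; the real worker's interfaces may differ.
type Next = () => string;

interface Middleware {
  // Returns a response body, or calls next() to pass the request on.
  handle(path: string, next: Next): string;
}

// Builds a chain that calls the middlewares in the order they are given.
function buildChain(middlewares: Middleware[]): (path: string) => string {
  return (path: string): string => {
    const dispatch = (index: number): string => {
      const current = middlewares[index];
      if (current === undefined) {
        // We ran out of middlewares without anyone handling the request.
        throw new Error(`no middleware handled ${path}`);
      }
      return current.handle(path, () => dispatch(index + 1));
    };
    return dispatch(0);
  };
}

// Stand-in for the R2 middleware: serves known objects, otherwise passes on.
function fakeR2(objects: Set<string>): Middleware {
  return {
    handle: (path, next) => (objects.has(path) ? `r2:${path}` : next()),
  };
}

// Stand-in for the origin middleware: always answers.
const fakeOrigin: Middleware = {
  handle: (path) => `origin:${path}`,
};

const r2 = fakeR2(new Set(['/dist/index.json']));

// R2 is tried first and the origin stand-in acts as the fallback.
const r2First = buildChain([r2, fakeOrigin]);

// Origin is tried first; R2 is never reached because the origin stand-in always answers.
const originFirst = buildChain([fakeOrigin, r2]);
```

With these stand-ins, `r2First('/dist/index.json')` is answered by the R2 stand-in, while `originFirst('/dist/index.json')` is answered by the origin stand-in, so swapping the two entries is enough to change which backend takes priority.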