Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MNTOR-2253: update release process doc #5084

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
309 changes: 98 additions & 211 deletions docs/release_process.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,11 @@

## Environments

- [Production][prod] - Run by SRE team in GCP
- [Stage][stage] - Run by SRE team in GCP
- [Dev][dev] - Run by ENGR team in Heroku
- [Production][prod] - Run manually by ENGR team (unless we need Env Var changes)
- [Stage][stage] - Run automatically on PR merges
- Locals: Run by ENGRs on their own devices. (See [README][readme] and other [`docs/`][docs].)

## Code branches
## Development

Standard Monitor development follows a branching strategy similar to
[GitHub Flow][github-flow], where all branches stem directly from `main` and
Expand All @@ -17,7 +16,7 @@ are merged back to `main`:
2. Make changes
3. Create a pull request to `main`
4. Address review comments
5. Merge pull request
5. Merge the pull request

```mermaid
%%{init: { 'theme': 'base', 'gitGraph': {'rotateCommitLabel': true} } }%%
Expand Down Expand Up @@ -58,14 +57,11 @@ merge back to `main` when they are ready.

## Release Timeline

The standard release interval for Monitor is 1 week, meaning every week there
will be a new version of the Monitor web app on the [Production][prod]
environment. To do this, we first release code to [Dev][dev] and
[Stage][stage].
The standard release interval for Monitor is one week, meaning there should be at least one new version of the Monitor web app on the [Production][prod] environment. However, since we've started doing 1-click deploy to production, our release cycle can become more and more frequent.

## Preview Deployment

Every time a PR is open, a docker image is created and deployed to the
Every time a PR is opened, a docker image is created and deployed to the
preview deployment environment powered by GCP Cloud Run / CloudSQL.
A brand new database is created and schema is migrated specific to that PR.
A brand new Cloud Run service is set up and cleaned up along with the
Expand All @@ -75,224 +71,115 @@ set up and changes are ready to be reviewed.

## Release to Stage

Every commit to `main` is automatically deployed to the [Stage][stage] server
via Github Actions and Jenkins.

### Create Release Notes on GitHub

Every commit to `main` is automatically deployed to the [Stage][stage] server via Github Actions and Jenkins.

### PR Merges
PRs can only be merged when they passes all the required checks:
* Lint
* Build
* Unit Tests
* E2E Tests
* Deploy Previews

A PR also needs at least one approval from an ENGR team member to be merged into `main`.

Once a PR is successfully merged:
1. ensure that the merge commit in `main` branch passes all checks and a docker image is successfully deployed.
2. Jenkins will kick off the deployment of the latest built docker image to stage environment
3. A webhook will send status messages into the `#fx-monitor-engineering` channel.
* Watch for messages: `FX MONITOR STAGE STARTED` and `FX MONITOR STAGE COMPLETE`

## Release to Production
Before releasing to production, we need to assess the current state of our work on stage. We need to cross-reference what's already on stage and what's been greenlit by QA. To do this, we need to find the difference between what was released last time in production and what we currently have on stage.

### Check the diff in Release Notes and notify the team
[make a release on GitHub][github-new-release] in order to check the difference.
1. Choose the tag you want for your release. Use today's date (e.g., `2024.09.01`)
2. Type the same tag name for the release title (e.g., `2024.09.01`)
3. Click the "Generate release notes" button!
4. *DO NOT* press "Publish release" yet
5. Copy and Paste the release notes in the engineering slack channel so the team is aware
6. Go through the PRs, cross-reference the tickets in the PRs with the [Jira][jira] board to see if QA has approved the tickets. If anything is unclear, make sure to tag the author of the PR.
7. If anything has not been properly tested, make a note, and again, double check with the person
8. If everything looks good, proceed to release, otherwise refer to the section `Stage-fixes` below.

### Update Production Environment Variables
In the cases where we need to update or add new environment variables, we need to get help from SRE:
1. File an [SRE ticket][sre-board] for the env var change.
- In the title, make sure to mention "Production"
- Make sure to include the value and the correct variable name
- Make sure to specify if it's a `secret` or a regular variable
2. When appropriate, wait for SRE to make the changes before proceeding with the production release.

### 1-click Production Release
After you push the tag to GitHub, you should also
[make a pre-release on GitHub][github-new-release] for the tag.

1. Choose the tag you just pushed (e.g., `2022.08.02`)
2. Type the same tag name for the release title (e.g., `2022.08.02`)
3. Click "Previous tag:" and choose the tag currently on production.
- You can find this at [the `__version__` endpoint][prod-version].
4. Click the "Generate release notes" button!
5. Check the pre-release box.
6. Click "Publish release"
[make a release on GitHub][github-new-release] for the tag.

1. After all the checks above look good, click "Publish release"
2. Go to the `main` branch and make sure all the checks succeeded
3. Go to [DockerHub][dockerhub] to ensure that a tag with today's date is present.
4. Run [E2E cron][e2e] against stage (with the latest update)
* if there are errors, make sure the cause is understood
* fix the e2e errors or change the tests when appropriate before proceeding
6. Check the stage Sentry and GCP error logs
7. Run [1-Click Deploy Github Action][1-click deploy]
* Click on `Run workflow`
* `Branch:main` is selected
* `prod` is selected for environment
* Input the tag created earlier (today's date, e.g., `2024.09.01`)
* Click on `Run workflow` when ready
8. A webhook will send status messages into the `#fx-monitor-engineering` channel.
* Watch for messages: `pushing to production started` and `successfully deployed to production`
9. After successful deploy, conduct some basic sanity check:
* Check sentry prod project for a spike in any new issues
* Check [grafana dashboard][grafana-dashboard] for any unexpected spike in ops
* Spot-check the site for basic functionality

### Update Jira

On [our Jira board](https://mozilla-hub.atlassian.net/jira/software/c/projects/MNTOR/boards/447), take a look at the tickets listed under "Merged tot main". If those were included in the release you just created, drag those tickets to the "Done" column, in the "Merged" section (sections will appear as you start dragging). This will notify QA that they can verify the behaviour on stage.

If you're unsure whether a ticket was included in the release, ask the person it is assigned to to move it if needed.

## Release to Prod

We leave the tag on [Stage][stage] for a week so that we (and especially QA)
can check the tag on GCP infrastucture before we deploy it to production. To
deploy the tag to production:

1. Run [e2e] tests against the stage
- Ensure the tests are successful / Resolve any issues before moving to the next step
2. File an [SRE ticket][sre-board] to deploy the tag to [Prod][prod].
- Include a link to the GitHub Release
- Include a link to successful e2e tests
- You can assign it directly to our primary SRE for the day
3. When SRE starts the deploy, "cloudops-jenkins" will send status messages
into the #fx-monitor-engineering channel.
4. When you see `PROMOTE PROD COMPLETE`, do some checks on prod:
- Check sentry prod project for a spike in any new issues
- Check [grafana dashboard][grafana-dashboard] for any unexpected spike in ops
- Spot-check the site for basic functionality
- Ping SDET to run end-to-end tests on prod
5. Update the GitHub Release from "pre-release" to a full release and reference the production deploy SRE Jira ticket.
On our [Jira][jira] board, review the tickets listed under "Merged to main." If those were included in the release you just created, drag those tickets to either the "Promoted to Prod" or "Done" column. This will notify QA that they can verify the behavior on Prod if necessary.

If you're unsure whether a ticket was included in the release, ask the assigned person to move it if needed.
## Stage-fixes

Ideally, every change can ride the regular weekly release "trains". But
sometimes we need to make and release changes before the regularly scheduled
release.

### "Clean `main`" flow

If a bug is caught on [Stage][stage] in a tag that is scheduled to go to
[Prod][prod], we need to fix the bug before the scheduled prod deploy. If
`main` is "clean" - i.e., nothing else has merged yet, we can use the regular
GitHub Flow:

1. Create a stage-fix branch from the tag. E.g.:
- `git branch stage-fix-2022.08.02 2022.08.02`
- `git switch stage-fix-2022.08.02`
2. Make changes
3. Create a pull request to `main`
4. Address review comments
5. Merge pull request
6. Make and push a new tag. E.g.: `2022.08.02.01`

```mermaid
%%{init: { 'theme': 'base' } }%%
gitGraph
commit
branch change-1
commit
commit
checkout main
branch change-2
commit
checkout main
merge change-1
branch change-3
commit
commit
checkout main
merge change-2 tag: "2022.08.02"
checkout change-3
commit
commit
checkout main
branch stage-fix-2022.08.02
commit
checkout main
merge stage-fix-2022.08.02 tag: "2022.08.02.01"
```

### "Dirty `main`" flow

If a bug is caught on [Stage][stage] in a tag that is scheduled to go to
[Prod][prod], we need to fix the bug before the scheduled prod deploy. If
`main` is "dirty" - i.e., other changes have merged, we can make the new tag
from the stage-fix branch.

1. Create a stage-fix branch from the tag. E.g.:
- `git branch stage-fix-2022.08.02 2022.08.02`
- `git switch stage-fix-2022.08.02`
2. Make changes
3. Create a pull request to `main`
4. Address review comments
5. Merge pull request
6. Make and push a new tag _from the `stage-fix` branch_

```mermaid
%%{init: { 'theme': 'base' } }%%
gitGraph
commit
branch change-1
commit
commit
checkout main
branch change-2
commit
checkout main
merge change-1
branch change-3
commit
commit
checkout main
merge change-2 tag: "2022.08.02"
checkout change-3
commit
commit
checkout main
branch stage-fix-2022.08.02
commit tag: "2022.08.02.01"
checkout main
merge change-3
merge stage-fix-2022.08.02
```

### Creating GitHub Release Notes for stage-fix release

Whether you make a "clean" or "dirty" stage-fix, after you push the new tag to
GitHub, you should [make a pre-release on GitHub][github-new-release] for the
new release tag.

1. Choose the tag you just pushed (e.g., `2022.08.02.01`)
2. Type the same tag name for the releae title (e.g., `2022.08.02.01`)
3. Click "Previous tag:" and choose the previous tag. (e.g., `2022.08.02`)
4. Click the "Generate release notes" button!
5. Check the pre-release box.
6. Click "Publish release"

## Example of regular release + "clean" stage-fix release + regular release

```mermaid
%%{init: { 'theme': 'base' } }%%
gitGraph
commit
branch change-1
commit
commit
checkout main
branch change-2
commit
checkout main
merge change-1
branch change-3
commit
commit
checkout main
merge change-2 tag: "2022.08.02"
checkout change-3
commit
commit
checkout main
branch stage-fix-1
commit
checkout main
merge stage-fix-1 tag: "2022.08.02.01"
checkout main
merge change-3
branch change-4
commit
checkout main
branch change-5
commit
commit
commit
checkout main
merge change-4
merge change-5 tag: "2022.08.09"
```

Ideally, every change can ride the regular weekly release "trains". But sometimes, not everything in `main` can go out. Since we've adopted feature flags, these scenarios are becoming rarer. However, we still cannot guarantee that they never happen.

Wherever feature flags aren't applicable, there are generally two scenarios we need to consider:
1. If the diff in changes is minimal (eg. can be traced back to a PR or two), the easiest way is to revert
2. If the diff is not minimal, or a significant portion of the tickets haven't been QA'd:
* we can choose to delay the release (ask the team for consensus)
* we can create a separate release branch

### Revert
1. Revert the PR(s)
2. Create a Github [Release][github-new-release]
3. Revert the revert after production deployment is successful
* After the revert of revert is successfully merged into `main`, stage should be automatically put back to the state before Production release

### Separate release branch
1. Create a branch on top of `main`
2. Work on taking out the features that should not be included (not feature-flagged)
3. Create a Github [Release][github-new-release]
* In the release, make sure to pick your branch (`main` is default)
* Generate the release note, double check and make sure that it makes sense
4. Proceed with the production release


## Future

Since the "clean main" flow is simpler, we are working towards a release
process where `main` is _always_ clean - even if changes have been merged to
it. To keep `main` clean, we will need to make use of feature-flags to
effectively hide any changes that are not ready for production. See the
[feature flags][feature-flags] docs for more.

When we are confident that `main` can always be released, we may get rid of
release tags completely, and move to something more like a GitLab Flow where we
merge from `main` to long-running branches for `dev`, `stage`, `pre-prod`, and
`prod` environments.
After adding 1-click production deploy capability and broadly adopting [feature flags][feature-flags], we are looking into ways to increase our production release frequency. The main challenge here is to coordiate our QA effort with our latest stage CICD deployments.

![Future release process with long-running
branches](release-process-future-long-branches.png "Future release process with
long-running branches")
We are starting to look into creating daily GitHub pre-releases via GHA, and once QA'd, having these deployed automatically or manually by base load engineers.

[prod]: https://monitor.firefox.com/
[stage]: https://stage.firefoxmonitor.nonprod.cloudops.mozgcp.net/
[dev]: https://fx-breach-alerts.herokuapp.com/
[readme]: https://github.com/mozilla/blurts-server/blob/main/README.md
[docs]: https://github.com/mozilla/blurts-server/tree/main/docs
[github-flow]: https://docs.github.com/en/get-started/quickstart/github-flow
[github-heroku-incident]: https://blog.heroku.com/april-2022-incident-review
[service-account]: https://mana.mozilla.org/wiki/display/TS/List+of+Heroku+service+accounts
[calver]: https://calver.org/
[sre-board]: https://mozilla-hub.atlassian.net/jira/software/c/projects/SVCSE/boards/316
[github-new-release]: https://github.com/mozilla/blurts-server/releases/new
[prod-version]: https://monitor.firefox.com/__version__
[grafana-dashboard]: https://earthangel-b40313e5.influxcloud.net/d/dEpkGp4Wz/fx-monitor?orgId=1&from=now-7d&to=now
[e2e]: https://github.com/mozilla/blurts-server/actions/workflows/e2e_cron.yml
[jira]: https://mozilla-hub.atlassian.net/jira/software/c/projects/MNTOR/boards/447
[dockerhub]: https://hub.docker.com/r/mozilla/blurts-server/tags
[1-click deploy]: https://github.com/mozilla/blurts-server/actions/workflows/production_deploy.yml
Loading