ADR for deployment slots (#149)
* Update ADR template based on changes in TI; write ADR for deployment slots; clarify sticky settings
* Add more info about decision process

---------

Co-authored-by: Samuel Aquino <saquino@flexion.us>
Co-authored-by: jherrflexion <118225331+jherrflexion@users.noreply.github.com>
Co-authored-by: jcrichlake <145698165+jcrichlake@users.noreply.github.com>
4 people authored Sep 4, 2024
1 parent 5c6cc50 commit f02320d
Showing 3 changed files with 79 additions and 3 deletions.
27 changes: 25 additions & 2 deletions adr/001-architecture-decision-records.md
@@ -1,6 +1,6 @@
# 1. Architecture Decision Records

Date: 2024-05-14
Date: 2024-05-14, updated 2024-09-03

## Decision

@@ -21,5 +21,28 @@ We want to record our architectural decisions so that...
- New team members who join can see why we made the decisions we made.
- The team can revise or revisit decisions with more confidence and context.

### Related Issues
## Impact

_The outcomes of the decision, both positive and negative. This section explains the impact of the decision, such as trade-offs, risks, and what needs to be done to implement it._

### Positive

- **Transparency**: ADRs make decision-making more transparent, helping current and future team members understand the rationale behind decisions.
- **Historical Context**: They provide valuable historical context, aiding in future decision-making and avoiding repeated mistakes.
- **Onboarding**: ADRs speed up the onboarding process by quickly familiarizing new team members with architectural decisions.
- **Consistency**: A standardized format ensures consistent documentation, making records easier to maintain and reference.

### Negative

- **Overhead**: Maintaining ADRs requires time and effort.
- **Outdated Records**: If not regularly updated, ADRs can become outdated and misleading.

### Risks

- **Incomplete Documentation**: Not all decisions may be documented, leading to gaps in the record.
- **Misalignment**: ADRs may not always match the actual implementation, causing confusion.

## Related Issues

- #1
- #13
52 changes: 52 additions & 0 deletions adr/010-deployment-slots.md
@@ -0,0 +1,52 @@
# 10. Deployment Slots for Zero Downtime Deploys for the Web App

Date: 2024-09-03

## Decision
1. We will use Azure Web App Deployment Slots to facilitate zero-downtime deploys of the SFTP Ingestion Service web app.
2. Because the ingestion service is queue-driven, and to keep both the pre-live and production slots healthy,
we will use `sticky_settings` to keep queue configuration on only the production slot in each environment.

## Status

Accepted.

## Context
1. Even though the Ingestion Service's queue-driven workflow is resilient to small downtimes, implementing zero-downtime
deploys is a standard best practice. Using Azure Deployment Slots also lets us have fast and easy rollbacks in addition to
zero-downtime deployment, and is consistent with the workflow we're using in TI.
2. Because the Ingestion Service is queue-driven, turning 'off' the pre-live slot (which routes HTTP traffic) doesn't
stop it from reading queues. To prevent actions from being duplicated, we're keeping queue configuration settings only
on the production/live slot, which will leave the pre-live slot running and healthy, but not active.
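
The sticky-settings arrangement described in item 2 can be sketched in Terraform roughly as follows. This is a minimal illustration, not the project's actual configuration: the resource labels, the `EXAMPLE_QUEUE_NAME` setting, and the referenced resource group and service plan are hypothetical and assumed to be defined elsewhere. Only the `sticky_settings` block and the slot resource type come from the `azurerm` provider.

```hcl
# Sketch only: names and values below are hypothetical placeholders.
resource "azurerm_linux_web_app" "example" {
  name                = "example-ingestion-app"
  resource_group_name = azurerm_resource_group.example.name # assumed to exist elsewhere
  location            = azurerm_resource_group.example.location
  service_plan_id     = azurerm_service_plan.example.id # assumed to exist elsewhere

  site_config {}

  app_settings = {
    WEBSITES_PORT      = 8080
    EXAMPLE_QUEUE_NAME = "example-queue" # hypothetical queue-related setting
  }

  # Settings named here stay with the production slot when slots are swapped,
  # so they never travel to the pre-live slot.
  sticky_settings {
    app_setting_names = ["EXAMPLE_QUEUE_NAME"]
  }
}

resource "azurerm_linux_web_app_slot" "pre_live" {
  name           = "pre-live"
  app_service_id = azurerm_linux_web_app.example.id

  site_config {}

  # No queue settings here: the pre-live slot runs and stays healthy
  # but has no queue configuration to read from.
  app_settings = {
    WEBSITES_PORT = 8080
  }
}
```

With this shape, a swap moves the application code between slots while the queue configuration remains attached to production, which is what prevents duplicated actions.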

Even though there are some significant downsides to Deployment Slots, they're Azure's recommended
approach to zero-downtime deploys (ZDD), and they're lower effort and lower risk than the alternatives.
Other options to achieve ZDD are Kubernetes (significantly more complexity and effort), creating
our own custom deploy system (significantly more complexity, effort, and risk), or switching to
a cloud service provider that makes this easier, like AWS (not currently in scope as an option).

## Impact
### Positive
- **Zero-downtime deploys**: Deploys no longer interrupt the running service, which is a standard best practice.
- **Easy rollback**: Deployment slots make it easy to roll back to the previous version of the
app if we find errors after deploy.
- **Consistency**: Deployment Slots are an Azure feature specifically designed to enable
zero-downtime deployment. We use deployment slots in all ingestion service environments and
in the Trusted Intermediary web app.

### Negative
- **Incomplete support for Linux**: The auto-swap feature is not available for Linux-based web apps like ours,
so we had to include an explicit swapping step in our updated deployment process.
- **Opaque responses from `az webapp deployment slot swap` CLI**: When there are issues swapping slots, the CLI doesn't
return any details about the issue. The swapping operation can also take as much as 20 minutes
to time out if there's a silent failure, which slows down deploy and validation.
- **Steep learning curve**: Most of the official docs and unofficial resources
(such as blogs and tutorials) for deployment slots are written for people using Windows
servers and Microsoft-published programming languages. This lack of support for other platforms
and languages means a lot more trial and error is involved.

### Risks
- Because of the incomplete support for and documentation of our use case, we may not have
chosen the optimal implementation of this feature. It may also be time-consuming to
troubleshoot if we run into future issues.
- Future developers may be confused by which settings should be `sticky` and which should not.
3 changes: 2 additions & 1 deletion operations/template/app.tf
@@ -112,7 +112,8 @@ resource "azurerm_linux_web_app" "sftp" {
}

# When adding new settings that are needed for the live app but shouldn't be used in the pre-live
# slot, add them to `sticky_settings` as well as `app_settings` for the main app resource
# slot, add them to `sticky_settings` as well as `app_settings` for the main app resource.
# All queue-related settings should be `sticky` so that the pre-live slot does not send or consume messages.
app_settings = {
DOCKER_REGISTRY_SERVER_URL = "https://${azurerm_container_registry.registry.login_server}"
WEBSITES_PORT = 8080