Merge pull request #503 from GSA/main
Production deploy 9/25/23
ccostino authored Sep 25, 2023
2 parents da0734b + cc5b114 commit 5e53f06
Showing 7 changed files with 233 additions and 102 deletions.
84 changes: 76 additions & 8 deletions .github/workflows/adr-accepted.yml
@@ -1,18 +1,86 @@
name: ADR accepted

on:
issues:
types:
- closed

permissions:
contents: read

jobs:
main:
name: ADR accepted
accept:
runs-on: ubuntu-latest

steps:
- name: memorialize the ADR
uses: 18F/adr-automation/accepted@actioning

- name: check for tags
if: "${{ !contains(github.event.issue.labels.*.name, 'ADR: accepted' )}}"
shell: bash
run: exit 0

- name: checkout main branch
uses: actions/checkout@v3
with:
ref: main
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
label: "ADR: accepted"
path: docs/adrs
ssh-key: ${{ secrets.SSH_PRIVATE_KEY }}

- name: get ADR number
id: next
shell: bash
run: |
mkdir -p docs/adrs
LAST_ADR=$(ls docs/adrs/*.md | grep -Eo "/[0-9]+-" | sort | tail -n1 | grep -Eo "[0-9]+")
LAST_ADR=$(echo "$LAST_ADR" | sed -E 's/^0+//')
NEXT_ADR=$(($LAST_ADR + 1))
NEXT_ADR=$(printf "%04i" "$NEXT_ADR")
echo "number=$NEXT_ADR" >> "$GITHUB_OUTPUT"
- name: get date
id: date
shell: bash
run: echo "date=$(date +'%B %d, %Y')" >> "$GITHUB_OUTPUT"

- name: build filename
id: filename
shell: bash
run: |
SLUG=$(printf '%q\n' "${{ github.event.issue.title }}" | tr A-Z a-z)
SLUG=$(printf '%q\n' "$SLUG" | iconv -c -t ascii//TRANSLIT)
SLUG=$(printf '%q\n' "$SLUG" | sed -E 's/[^a-z0-9]+/-/g' | sed -E 's/-+/-/g' | sed -E 's/^-+|-+$//g')
FILENAME="docs/adrs/${{ steps.next.outputs.number }}-$SLUG.md"
echo "slug=$SLUG" >> "$GITHUB_OUTPUT"
echo "filename=$FILENAME" >> "$GITHUB_OUTPUT"
- name: write the ADR
uses: DamianReeves/write-file-action@v1.2
with:
path: ${{ steps.filename.outputs.filename }}
write-mode: overwrite
contents: |
# ${{ github.event.issue.title }}
Status: Accepted
Date: ${{ steps.date.outputs.date }}
${{ github.event.issue.body }}
- name: branch, commit, and open PR
shell: bash
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
BRANCH="adr/auto/${{ steps.filename.outputs.slug }}"
git config --global user.email "tts@gsa.gov"
git config --global user.name "Notify ADR Automation"
git checkout -b $BRANCH
git add docs/adrs/*.md
git commit -m "add ADR ${{ steps.next.outputs.number }}: ${{ github.event.issue.title }}"
git push -f origin $BRANCH
gh pr create \
--title "Add ADR ${{ steps.next.outputs.number }} to the repo" \
--body "This pull request was opened automatically because #${{ github.event.issue.number }} was closed after being marked as an approved ADR. It contains a markdown file capturing the ADR body at the time the issue was closed. Please verify that the markdown is correct before merging!" || true
gh pr merge $BRANCH --auto --squash || true
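The workflow above builds the new ADR's filename in bash. It takes the highest number already in `docs/adrs`, adds one, zero-pads the result to four digits, and slugifies the issue title. The Python sketch below restates that logic so it can be unit-tested outside of Actions. It is only an illustration, and all function names are hypothetical. `unicodedata` NFKD normalization approximates `iconv -t ascii//TRANSLIT`. It may transliterate some characters differently, and it drops anything it cannot decompose to ASCII. The sketch also gates on the `ADR: accepted` label with an early check. In the workflow itself, a step that runs `exit 0` ends only that step. Later steps still run unless they carry their own `if:` condition.

```python
import re
import unicodedata
from pathlib import PurePosixPath
from typing import Iterable

ACCEPTED_LABEL = "ADR: accepted"


def should_record(labels: Iterable[str]) -> bool:
    """Only closed issues carrying the accepted label become ADR files."""
    return ACCEPTED_LABEL in labels


def next_adr_number(existing_paths: Iterable[str]) -> str:
    """Highest existing NNNN- filename prefix plus one, zero-padded to four digits."""
    numbers = []
    for path in existing_paths:
        match = re.match(r"(\d+)-", PurePosixPath(path).name)
        if match:
            numbers.append(int(match.group(1)))
    # With no numbered ADRs yet, start at 0001 (the bash version also yields 1).
    return f"{max(numbers, default=0) + 1:04d}"


def slugify(title: str) -> str:
    """Lowercase, reduce to ASCII, and join alphanumeric runs with single hyphens."""
    slug = title.lower()
    # Approximation of `iconv -t ascii//TRANSLIT`: decompose accents, drop non-ASCII.
    slug = unicodedata.normalize("NFKD", slug).encode("ascii", "ignore").decode("ascii")
    # One regex replaces the three sed passes: runs of other characters become one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", slug)
    return slug.strip("-")


def adr_filename(title: str, existing_paths: Iterable[str], adr_dir: str = "docs/adrs") -> str:
    return f"{adr_dir}/{next_adr_number(existing_paths)}-{slugify(title)}.md"
```

For example, `adr_filename("Use Café Names!", ["docs/adrs/0001-a.md", "docs/adrs/0012-b.md"])` gives `docs/adrs/0013-use-cafe-names.md`.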
96 changes: 49 additions & 47 deletions README.md
@@ -100,53 +100,55 @@ A direct installation of PostgreSQL will not put the `createdb` command on your

## Documentation

- [Infrastructure overview](#infrastructure-overview)
- [GitHub Repositories](#github-repositories)
- [Terraform](#terraform)
- [AWS](#aws)
- [New Relic](#new-relic)
- [Onboarding](#onboarding)
- [Setting up the infrastructure](#setting-up-the-infrastructure)
- [Testing](#testing)
- [CI testing](#ci-testing)
- [Manual testing](#manual-testing)
- [To run a local OWASP scan](#to-run-a-local-owasp-scan)
- [Deploying](#deploying)
- [Egress Proxy](#egress-proxy)
- [Sandbox environment](#sandbox-environment)
- [Database management](#database-management)
- [Initial state](#initial-state)
- [Data Model Diagram](#data-model-diagram)
- [Migrations](#migrations)
- [Purging user data](#purging-user-data)
- [One-off tasks](#one-off-tasks)
- [How messages are queued and sent](#how-messages-are-queued-and-sent)
- [Writing public APIs](#writing-public-apis)
- [Overview](#overview)
- [Documenting APIs](#documenting-apis)
- [New APIs](#new-apis)
- [API Usage](#api-usage)
- [Connecting to the API](#connecting-to-the-api)
- [Postman Documentation](#postman-documentation)
- [Using OpenAPI documentation](#using-openapi-documentation)
- [Queues and tasks](#queues-and-tasks)
- [Priority queue](#priority-queue)
- [Celery scheduled tasks](#celery-scheduled-tasks)
- [US Notify](#us-notify)
- [System Description](#system-description)
- [Run Book](#run-book)
- [ Alerts, Notifications, Monitoring](#-alerts-notifications-monitoring)
- [ Restaging Apps](#-restaging-apps)
- [ Smoke-testing the App](#-smoke-testing-the-app)
- [ Configuration Management](#-configuration-management)
- [ DNS Changes](#-dns-changes)
- [Exporting test results for compliance monitoring](#exporting-test-results-for-compliance-monitoring)
- [ Known Gotchas](#-known-gotchas)
- [ User Account Management](#-user-account-management)
- [ SMS Phone Number Management](#-sms-phone-number-management)
- [Data Storage Policies \& Procedures](#data-storage-policies--procedures)
- [Potential PII Locations](#potential-pii-locations)
- [Data Retention Policy](#data-retention-policy)
- [Infrastructure overview](./docs/all.md#infrastructure-overview)
- [GitHub Repositories](./docs/all.md#github-repositories)
- [Terraform](./docs/all.md#terraform)
- [AWS](./docs/all.md#aws)
- [New Relic](./docs/all.md#new-relic)
- [Onboarding](./docs/all.md#onboarding)
- [Setting up the infrastructure](./docs/all.md#setting-up-the-infrastructure)
- [Using the logs](./docs/all.md#using-the-logs)
- [Testing](./docs/all.md#testing)
- [CI testing](./docs/all.md#ci-testing)
- [Manual testing](./docs/all.md#manual-testing)
- [To run a local OWASP scan](./docs/all.md#to-run-a-local-owasp-scan)
- [Deploying](./docs/all.md#deploying)
- [Egress Proxy](./docs/all.md#egress-proxy)
- [Managing environment variables](./docs/all.md#managing-environment-variables)
- [Sandbox environment](./docs/all.md#sandbox-environment)
- [Database management](./docs/all.md#database-management)
- [Initial state](./docs/all.md#initial-state)
- [Data Model Diagram](./docs/all.md#data-model-diagram)
- [Migrations](./docs/all.md#migrations)
- [Purging user data](./docs/all.md#purging-user-data)
- [One-off tasks](./docs/all.md#one-off-tasks)
- [How messages are queued and sent](./docs/all.md#how-messages-are-queued-and-sent)
- [Writing public APIs](./docs/all.md#writing-public-apis)
- [Overview](./docs/all.md#overview)
- [Documenting APIs](./docs/all.md#documenting-apis)
- [New APIs](./docs/all.md#new-apis)
- [API Usage](./docs/all.md#api-usage)
- [Connecting to the API](./docs/all.md#connecting-to-the-api)
- [Postman Documentation](./docs/all.md#postman-documentation)
- [Using OpenAPI documentation](./docs/all.md#using-openapi-documentation)
- [Queues and tasks](./docs/all.md#queues-and-tasks)
- [Priority queue](./docs/all.md#priority-queue)
- [Celery scheduled tasks](./docs/all.md#celery-scheduled-tasks)
- [US Notify](./docs/all.md#us-notify)
- [System Description](./docs/all.md#system-description)
- [Run Book](./docs/all.md#run-book)
- [ Alerts, Notifications, Monitoring](./docs/all.md#-alerts-notifications-monitoring)
- [ Restaging Apps](./docs/all.md#-restaging-apps)
- [ Smoke-testing the App](./docs/all.md#-smoke-testing-the-app)
- [ Configuration Management](./docs/all.md#-configuration-management)
- [ DNS Changes](./docs/all.md#-dns-changes)
- [Exporting test results for compliance monitoring](./docs/all.md#exporting-test-results-for-compliance-monitoring)
- [ Known Gotchas](./docs/all.md#-known-gotchas)
- [ User Account Management](./docs/all.md#-user-account-management)
- [ SMS Phone Number Management](./docs/all.md#-sms-phone-number-management)
- [Data Storage Policies \& Procedures](./docs/all.md#data-storage-policies--procedures)
- [Potential PII Locations](./docs/all.md#potential-pii-locations)
- [Data Retention Policy](./docs/all.md#data-retention-policy)

## License && public domain

7 changes: 7 additions & 0 deletions app/celery/tasks.py
@@ -193,6 +193,12 @@ def save_sms(self, service_id, notification_id, encrypted_notification, sender_i
return

try:
job_id = notification.get("job", None)
created_by_id = None
if job_id:
job = dao_get_job_by_id(job_id)
created_by_id = job.created_by_id

saved_notification = persist_notification(
template_id=notification["template"],
template_version=notification["template_version"],
@@ -203,6 +209,7 @@ def save_sms(self, service_id, notification_id, encrypted_notification, sender_i
api_key_id=None,
key_type=KEY_TYPE_NORMAL,
created_at=datetime.utcnow(),
created_by_id=created_by_id,
job_id=notification.get("job", None),
job_row_number=notification.get("row_number", None),
notification_id=notification_id,
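The new lines in `save_sms` attribute an SMS to the user who created its job, and they leave `created_by_id` as `None` when the notification has no job. Below is a minimal sketch of that lookup. The DAO is passed in as a function so the lookup can run without a database. `Job` and `resolve_created_by_id` are hypothetical stand-ins for the app's model and its `dao_get_job_by_id` call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Job:
    """Hypothetical stand-in for the app's Job model."""

    id: str
    created_by_id: Optional[str]


def resolve_created_by_id(
    notification: dict, get_job_by_id: Callable[[str], Job]
) -> Optional[str]:
    """Return the creator of the notification's job, or None if it has no job."""
    job_id = notification.get("job")
    if not job_id:
        return None
    return get_job_by_id(job_id).created_by_id
```

In the app, `get_job_by_id` corresponds to `dao_get_job_by_id`, and the result is passed as the `created_by_id` argument of `persist_notification`.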
24 changes: 24 additions & 0 deletions app/clients/cloudwatch/aws_cloudwatch.py
@@ -4,6 +4,7 @@
import time

from boto3 import client
from flask import current_app

from app.clients import AWS_CLIENT_CONFIG, Client
from app.cloudfoundry_config import cloud_config
@@ -50,6 +51,7 @@ def _get_log(self, my_filter, log_group_name, sent_at):
# Check all cloudwatch logs from the time the notification was sent (currently 5 minutes previously) until now
now = round(time.time() * 1000)
beginning = sent_at
current_app.logger.info(f"TIME RANGE TO CHECK {beginning} to {now}")
next_token = None
all_log_events = []
while True:
@@ -72,13 +74,20 @@ def _get_log(self, my_filter, log_group_name, sent_at):
all_log_events.extend(log_events)
if len(log_events) > 0:
# We found it
current_app.logger.info(
f"WE FOUND THE EVENT WE WERE LOOKING FOR? {log_events}"
)
break
next_token = response.get("nextToken")
if not next_token:
break
if len(all_log_events) == 0:
print(f"WE FOUND NO LOG EVENTS OVER TIME RANGE {beginning} to {now}")
return all_log_events

def check_sms(self, message_id, notification_id, created_at):
if os.getenv("LOCALSTACK_ENDPOINT_URL"):
current_app.logger.info("GADZOOKS WE ARE RUNNING WITH LOCALSTACK")
region = cloud_config.sns_region
# TODO this clumsy approach to getting the account number will be fixed as part of notify-api #258
account_number = cloud_config.ses_domain_arn
@@ -87,24 +96,39 @@ def check_sms(self, message_id, notification_id, created_at):
account_number = account_number[0]

log_group_name = f"sns/{region}/{account_number}/DirectPublishToPhoneNumber"
current_app.logger.info(
f"LOG GROUP NAME: {log_group_name} MESSAGE ID: {message_id}"
)
filter_pattern = '{$.notification.messageId="XXXXX"}'
filter_pattern = filter_pattern.replace("XXXXX", message_id)
all_log_events = self._get_log(filter_pattern, log_group_name, created_at)
current_app.logger.info(f"NUMBER OF ALL LOG EVENTS {len(all_log_events)}")

if all_log_events and len(all_log_events) > 0:
current_app.logger.info(
"SHOULD RETURN SUCCESS BECAUSE WE FOUND A SUCCESS MESSAGE FOR MESSAGE ID"
)
event = all_log_events[0]
message = json.loads(event["message"])
current_app.logger.info(f"MESSAGE {message}")
return "success", message["delivery"]["providerResponse"]

log_group_name = (
f"sns/{region}/{account_number}/DirectPublishToPhoneNumber/Failure"
)
current_app.logger.info(f"FAILURE LOG GROUP NAME {log_group_name}")
all_failed_events = self._get_log(filter_pattern, log_group_name, created_at)
current_app.logger.info(
f"NUMBER OF ALL FAILED LOG EVENTS {len(all_failed_events)}"
)
if all_failed_events and len(all_failed_events) > 0:
current_app.logger.info("SHOULD RETURN FAILED BECAUSE WE FOUND A FAILURE")
event = all_failed_events[0]
message = json.loads(event["message"])
current_app.logger.info(f"MESSAGE {message}")
return "failure", message["delivery"]["providerResponse"]

print(f"RAISING EXCEPTION FOR MESSAGE_ID {message_id}")
raise Exception(
f"No event found for message_id {message_id} notification_id {notification_id}"
)
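`_get_log` pages through CloudWatch Logs from the notification's send time until now and stops at the first page that contains a match. `check_sms` searches the success log group first and then the `/Failure` group. The sketch below restates that flow using the boto3 CloudWatch Logs `filter_log_events` call. Its parameters are `logGroupName`, `filterPattern`, `startTime`/`endTime` in epoch milliseconds, and `nextToken`. The call itself is elided from this diff, so the sketch assumes `_get_log` wraps `filter_log_events`. It also assumes the client is passed in, for instance `boto3.client("logs", region_name=region)`. Three things differ from the code above: the sketch builds the filter pattern with an f-string instead of `str.replace`, and it raises `LookupError` rather than a bare `Exception`. Its function names are hypothetical.

```python
import json
import time
from typing import Any, List, Optional, Tuple


def get_log_events(
    logs_client: Any,
    filter_pattern: str,
    log_group_name: str,
    start_ms: int,
    end_ms: Optional[int] = None,
) -> List[dict]:
    """Page through filter_log_events, stopping at the first page with a match."""
    if end_ms is None:
        end_ms = round(time.time() * 1000)
    request = {
        "logGroupName": log_group_name,
        "filterPattern": filter_pattern,
        "startTime": start_ms,
        "endTime": end_ms,
    }
    all_events: List[dict] = []
    while True:
        response = logs_client.filter_log_events(**request)
        events = response.get("events", [])
        all_events.extend(events)
        if events:
            break  # found a matching event; no need to keep paging
        next_token = response.get("nextToken")
        if not next_token:
            break  # no more pages in this time range
        request["nextToken"] = next_token
    return all_events


def check_sms_status(
    logs_client: Any, region: str, account_number: str, message_id: str, sent_at_ms: int
) -> Tuple[str, str]:
    """Return ("success" or "failure", providerResponse) for an SNS SMS delivery."""
    base_group = f"sns/{region}/{account_number}/DirectPublishToPhoneNumber"
    # JSON filter pattern; doubled braces produce literal { and } in the f-string.
    pattern = f'{{$.notification.messageId="{message_id}"}}'
    for log_group, status in ((base_group, "success"), (f"{base_group}/Failure", "failure")):
        events = get_log_events(logs_client, pattern, log_group, sent_at_ms)
        if events:
            message = json.loads(events[0]["message"])
            return status, message["delivery"]["providerResponse"]
    raise LookupError(f"No delivery log event found for message_id {message_id}")
```

Because the client is injected, a small fake object with a `filter_log_events(**kwargs)` method is enough to test both branches without touching AWS.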
46 changes: 43 additions & 3 deletions docs/all.md
@@ -5,12 +5,14 @@
- [New Relic](#new-relic)
- [Onboarding](#onboarding)
- [Setting up the infrastructure](#setting-up-the-infrastructure)
- [Using the logs](#using-the-logs)
- [Testing](#testing)
- [CI testing](#ci-testing)
- [Manual testing](#manual-testing)
- [To run a local OWASP scan](#to-run-a-local-owasp-scan)
- [Deploying](#deploying)
- [Egress Proxy](#egress-proxy)
- [Managing environment variables](#managing-environment-variables)
- [Sandbox environment](#sandbox-environment)
- [Database management](#database-management)
- [Initial state](#initial-state)
@@ -85,13 +87,17 @@ In addition to terraform directories in the api and admin apps above:

## Terraform

We use Terraform to manage our infrastructure, providing consistent setups across the environments.

Our Terraform configurations manage components via cloud.gov. This means that the configurations should work out of the box if you are using a Cloud Foundry platform, but will not work for setups based on raw AWS.

### Development

There are several remote services required for local development:

* s3
* ses
* sns
* S3
* SES
* SNS

Credentials for these services are created by running:

@@ -205,6 +211,20 @@ Example answers for toll-free registration form

![example answers for toll-free registration form](./toll-free-registration.png)

# Using the logs

If you're using the `cf` CLI, you can run `cf logs notify-api-ENV` and/or `cf logs notify-admin-ENV` to stream logs in real time. Add `--recent` to dump the most recent logs and exit instead of streaming. Logs move quickly, though, so that window may be short.

For general log searching, [the cloud.gov Kibana instance](https://logs.fr.cloud.gov/) is powerful, though it takes some effort to learn. For shortcuts to errors, some team members have New Relic access.

The links below will open a filtered view with logs from both applications, which can then be filtered further. However, for the links to work, you need to paste them into the URL bar while *already* logged into and viewing the Kibana page. If not, you'll just be redirected to the generic dashboard.

- Production: https://logs.fr.cloud.gov/app/discover#/view/218a6790-596d-11ee-a43a-090d426b9a38
- Demo: https://logs.fr.cloud.gov/app/discover#/view/891392a0-596e-11ee-921a-1b6b2f4d89ed
- Staging: https://logs.fr.cloud.gov/app/discover#/view/73d7c820-596e-11ee-a43a-090d426b9a38

Once in the view, you'll likely want to adjust the time range in the upper right of the page.
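The `cf logs` invocations above follow a `notify-<app>-<env>` naming pattern. The helper below is hypothetical and simply builds and runs that command. The environment name you pass in (for example `staging`) is an assumption, so check `cf apps` for the exact app names in your space.

```python
import subprocess
from typing import List

APPS = ("api", "admin")


def cf_logs_command(app: str, env: str, recent: bool = False) -> List[str]:
    """Build the argv for `cf logs notify-<app>-<env>` (naming taken from the docs above)."""
    if app not in APPS:
        raise ValueError(f"app must be one of {APPS}, got {app!r}")
    command = ["cf", "logs", f"notify-{app}-{env}"]
    if recent:
        # --recent dumps recent logs and exits instead of streaming.
        command.append("--recent")
    return command


def show_logs(app: str, env: str, recent: bool = True) -> None:
    """Run the command; requires the cf CLI to be installed and logged in."""
    subprocess.run(cf_logs_command(app, env, recent), check=True)
```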

# Testing

```
@@ -304,6 +324,26 @@ application to a select list of allowed domains.
Update the allowed domains by updating `deploy-config/egress_proxy/notify-api-<env>.allow.acl`
and deploying an updated version of the application through the normal deploy process.

## Managing environment variables

For an environment variable to make its way into the cloud.gov environment, it *must* end up in the `manifest.yml` file. Based on the deployment approach described above, there are 2 ways for this to happen.

### Secret environment variables

Because secrets are pulled from GitHub, they must be passed from our action to the deploy action and then placed into `manifest.yml`. This means each secret must appear in 4 places:

- [ ] The GitHub secrets store
- [ ] The deploy action in the `env` section using the format `SECRET_NAME: ${{ secrets.SECRET_NAME }}`
- [ ] The deploy action in the `push_arguments` section using the format `--var SECRET_NAME="$SECRET_NAME"`
- [ ] The manifest using the format `SECRET_NAME: ((SECRET_NAME))`

### Public environment variables

Public env vars make up the configuration in `deploy-config`. They are all pulled in by the `--vars-file` line in the deploy action. To add or update one, make sure it appears in 2 places:

- [ ] The relevant YAML file in `deploy-config` using the format `var_name: value`
- [ ] The manifest using the format `((var_name))`
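A manifest placeholder that nothing supplies is likely to make `cf push` fail, so a quick pre-deploy check can help. Below is a minimal sketch. It assumes placeholders use the `((name))` syntax shown above, that public values come from the keys of the `deploy-config` YAML file, and that secrets arrive through `--var` flags. `secret_snippets` prints the deploy-action and manifest entries for a new secret. All function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Set

# Matches manifest placeholders such as ((SECRET_NAME)) or ((var_name)).
PLACEHOLDER = re.compile(r"\(\(\s*([\w.-]+)\s*\)\)")


def manifest_placeholders(manifest_text: str) -> Set[str]:
    """All variable names referenced as ((name)) in the manifest."""
    return set(PLACEHOLDER.findall(manifest_text))


def unresolved_placeholders(
    manifest_text: str, vars_file_keys: Iterable[str], push_var_names: Iterable[str]
) -> Set[str]:
    """Placeholders supplied neither by the vars file nor by a --var flag."""
    supplied = set(vars_file_keys) | set(push_var_names)
    return manifest_placeholders(manifest_text) - supplied


def secret_snippets(name: str) -> Dict[str, str]:
    """The deploy-action and manifest lines a new secret needs."""
    return {
        "env": f"{name}: ${{{{ secrets.{name} }}}}",
        "push_arguments": f'--var {name}="${name}"',
        "manifest": f"{name}: (({name}))",
    }
```

To collect `vars_file_keys`, one option is to load the relevant `deploy-config` file with PyYAML's `yaml.safe_load` and take the keys of the resulting dict.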

## Sandbox environment

There is a sandbox space, complete with terraform and `deploy-config/sandbox.yml` file available