Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deploy Datasette to fly.io instead of Cloud Run #3018

Merged
merged 6 commits into from
Nov 9, 2023
Merged

Deploy Datasette to fly.io instead of Cloud Run #3018

merged 6 commits into from
Nov 9, 2023

Conversation

jdangerx
Copy link
Member

@jdangerx jdangerx commented Nov 6, 2023

This got a little more complicated than I had hoped, because I ran into image size limitations with datasette publish fly. There are a few workarounds, documented in the publish.py script.

However, this deploy script now works from a pudl-etl docker container running in GCP! So I have hope that it should work in the nightly build.

Right now, the entire "inspect databases, compress databases, build docker image, deploy" process takes about an hour:

  • 20 min to inspect
  • 5 min to compress
  • 25 min to build docker image, mostly in uploading 3.6 GB of context to the build server
  • 10 min to upload docker image
  • 2 min: actual startup time once the docker image is uploaded to the fly.io registry

This might go faster on a bigger box - the daz-dev VM is on e2-standard-4 and the nightly builds run on e2-highmem-8.

Before we merge this, we need to add a FLY_ACCESS_TOKEN to github secrets, or pull the secret from Google Secret Manager within the container. Open to either approach.

After we merge this, we need to:

  • wait for nightly builds to successfully deploy to fly.io
  • move our DNS to point data.catalyst.coop at catalyst-coop-pudl.fly.dev
  • delete our cloud deploy service

@jdangerx jdangerx requested a review from bendnorman November 6, 2023 17:46
Copy link

codecov bot commented Nov 6, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (f25e921) 88.7% compared to head (a4eb7ac) 88.7%.
Report is 5 commits behind head on dev.

Additional details and impacted files
@@          Coverage Diff          @@
##             dev   #3018   +/-   ##
=====================================
  Coverage   88.7%   88.7%           
=====================================
  Files         91      90    -1     
  Lines      11010   10988   -22     
=====================================
- Hits        9768    9752   -16     
+ Misses      1242    1236    -6     
Files Coverage Δ
src/pudl/metadata/classes.py 86.6% <100.0%> (+<0.1%) ⬆️

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@bendnorman bendnorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Mostly clarifying questions.

devtools/datasette/publish.py Outdated Show resolved Hide resolved
.gitignore Show resolved Hide resolved
@@ -97,7 +97,7 @@ if [[ ${PIPESTATUS[0]} == 0 ]]; then
# Deploy the updated data to datasette
if [ $GITHUB_REF = "dev" ]; then
gcloud config set run/region us-central1
source ~/devtools/datasette/publish.sh
python ~/devtools/datasette/publish.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we be notified if this script fails so we don't run into the scenario where we're unaware of many datasette deployment failures?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, since the notify_slack happens before we even try to publish - yeah, that's a good point, though I'm not sure if we had this functionality in the old world.

Do you think the following diff would work?

diff --git a/docker/gcp_pudl_etl.sh b/docker/gcp_pudl_etl.sh
index 5e0929b33..0fe5c26fe 100644
--- a/docker/gcp_pudl_etl.sh
+++ b/docker/gcp_pudl_etl.sh
@@ -85,10 +85,8 @@ function notify_slack() {
 # 2>&1 redirects stderr to stdout.
 run_pudl_etl 2>&1 | tee $LOGFILE
 
-# Notify slack if the etl succeeded.
+# if pipeline is successful, distribute + publish datasette
 if [[ ${PIPESTATUS[0]} == 0 ]]; then
-    notify_slack "success"
-
     # Dump outputs to s3 bucket if branch is dev or build was triggered by a tag
     if [ $GITHUB_ACTION_TRIGGER = "push" ] || [ $GITHUB_REF = "dev" ]; then
         copy_outputs_to_distribution_bucket
@@ -99,8 +97,13 @@ if [[ ${PIPESTATUS[0]} == 0 ]]; then
         gcloud config set run/region us-central1
         python ~/devtools/datasette/publish.py
     fi
-else
-    notify_slack "failure"
 fi
 
+# Notify slack about entire pipeline's success or failure;
+# PIPESTATUS[0] either refers to the failed ETL run or the last distribution
+# task that was run above
+if [[ ${PIPESTATUS[0]} == 0 ]]; then notify_slack "success" else notify_slack
+    "failure" fi
+
+
 shutdown_vm

Annoyingly, the most straightforward way I can imagine testing it is "include assert False at the end of publish.py and wait for the nightly build to fail there" - any ideas?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To test it out on this branch, you could run the fast ETL, skip the tests and include assert False at the end of publish.py.

That logic looks correct! Any suggestions for how to capture the logs for the distribute + publish datasette steps? With your current proposal if, the ETL will succeed and the datasette deployment fails, we'll get a notification that something failed but won't find any failures in the logs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I captured the logs by tee -a $LOGFILE so that the publish logs just end up at the end of the pudl-etl.log. This worked for debugging a failed deploy just now so I think that's good enough :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it thank you!

devtools/datasette/fly/run.sh Show resolved Hide resolved
@jdangerx jdangerx requested a review from bendnorman November 7, 2023 15:58
@bendnorman
Copy link
Member

Just a reminder to open a PR for the .github/workflows/build-deploy-pudl.yml in main once these changes are in dev.

@jdangerx jdangerx force-pushed the fly-io branch 2 times, most recently from f735653 to c49bfcb Compare November 8, 2023 17:08
@jdangerx jdangerx merged commit d8512b5 into dev Nov 9, 2023
9 checks passed
@jdangerx jdangerx deleted the fly-io branch November 9, 2023 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants