Project Setup

Prerequisites:

Python 3

This project was tested with Python 3.11. Use a Python version manager and a virtual environment to install your dependencies.

Poetry: Python Dependency Manager

To install Poetry you can view the installation instructions here.

Google Cloud Platform Account

Sign up for a free test account here, and enable billing.

Prefect Cloud

Sign up for a free account here.

Google Cloud CLI

Installation instructions for gcloud are here.

Terraform

You can view the installation instructions for Terraform here.

Git and GitHub Repository

To install Git, see the instructions here. Steps for creating a remote GitHub repository are here.
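
As a quick sanity check before starting the setup steps, the sketch below reports which of the command-line prerequisites are on your PATH. The function name check_tools is illustrative and not part of this repository, and the command names (python3, poetry, gcloud, terraform, git) assume default installations.

```shell
# Report, for each tool name given, whether it can be found on PATH.
# Returns a non-zero status if at least one tool is missing.
check_tools() {
  _missing=0
  for _tool in "$@"; do
    if command -v "$_tool" >/dev/null 2>&1; then
      echo "ok       $_tool"
    else
      echo "missing  $_tool"
      _missing=1
    fi
  done
  return "$_missing"
}

# Command names are assumptions based on default installs of each prerequisite.
check_tools python3 poetry gcloud terraform git \
  || echo "Install the missing tools before continuing."
```

This only confirms that each command exists. It does not check versions, so it is still worth running python3 --version to confirm you are on the tested Python 3.11.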


Setup Steps:

  1. Clone this repository

  2. Remove git history with rm -rf .git

  3. Rename the env file to .env

  4. Reinitialize git with git init

  5. Create a virtual environment with python -m venv venv and activate it with source venv/bin/activate

  6. Install dependencies with poetry install --no-root

  7. Go to Prefect Cloud and create a workspace and an API key

  8. Fill out all the PREFECT-related environment variables in your .env file except PREFECT_API_URL and PREFECT_API_KEY

  9. Export your environment variables with set -o allexport && source .env && set +o allexport

  10. Run make prefect-api-url to get your Prefect API URL. If you're getting authentication errors, make sure your PREFECT_API_KEY environment variable is unset. See this bug.

  11. Uncomment the PREFECT_API_URL env var, and set it to what showed up in your terminal. Also uncomment the PREFECT_API_KEY env variable.

  12. Run gcloud init and follow the instructions to set up your project.

  13. Run gcloud info to check that everything is configured correctly. You should see that your CLI is configured to use the project you created.

  14. Enter your newly created project ID into the .env file, and fill out the other environment variables that relate to GCP. Apart from the project ID, the GCP environment variables are names for resources that will be created later and do not exist yet. More info is in the comments of the .env file. Make sure you do not have trailing spaces in your environment variable names.

  15. Export your environment variables again as you've added some new ones, with set -o allexport && source .env && set +o allexport

  16. Enable Google Cloud billing.

  17. Run make gcp-setup. This will enable the GCP services used in this project, create a service account with editor permissions, and download a JSON-format API key to the path you specified in the .env file.

  18. Make sure to add the GCP service account key file to your .gitignore so it's not version controlled.

  19. In Prefect Cloud, create 2 blocks: a GitHub Storage block and a GCP credentials block. For the GCP credentials block, enter the JSON directly, as the block will be accessed by Cloud Run, which will not have access to your local file system.

  20. Go into your terraform directory with cd terraform

  21. Run terraform init to initialize.

  22. Run terraform plan to see the changes to be applied.

  23. Run terraform apply to deploy your resources.

  24. Set up your GitHub Actions secrets for CI/CD (most are the same as the env vars in the .env file):

    | Secret name | Description |
    | --- | --- |
    | GCP_COMPUTE_ENGINE_NAME | Same as in .env |
    | GCP_COMPUTE_ENGINE_REGION | Same as TF_VAR_COMPUTE_ENGINE_REGION in .env |
    | GCP_DATASET_NAME | Same as in .env |
    | GCP_DATASET_TABLE_NAME | Same as in .env |
    | GCP_PROJECT_ID | Same as in .env |
    | GCP_RESOURCE_REGION | Same as in .env |
    | GCP_SERVICE_ACCOUNT_API_KEY_BASE64 | The JSON key of your GCP service account, converted to a base64-encoded string. Blog post about it here |
    | GCP_SERVICE_ACCOUNT_EMAIL | Same as in .env |
    | PREFECT_AGENT_QUEUE_NAME | Same as in .env |
    | PREFECT_API_KEY | Same as in .env |
    | PREFECT_API_URL | Same as in .env |
    | PREFECT_CLOUD_RUN_BLOCK_NAME | Name that you would like to give your Cloud Run block in the Prefect UI. You don't need to create this block; GitHub Actions will. Just choose the name. |
    | PREFECT_GCP_CREDENTIALS_BLOCK_NAME | The name of the GCP credentials block you created in the Prefect UI a few steps above. |
    | PREFECT_GITHUB_BLOCK_NAME | The name of the GitHub block you created in the Prefect UI a few steps above. |

  25. In dbt/nyc_stats/models, make sure that the schema.yml file matches your BigQuery setup. In stg_complaints.sql, make sure that the source reference matches the name of your BigQuery table. Also check that all references to databases and tables in the core folder are correct.

  26. Push the code to your own remote repository. With the help of GitHub Actions, this will automatically create a Docker image and push it to Artifact Registry, so that your flows can use that infrastructure when running. It will also create a CloudRunJob block in Prefect Cloud.

    git add .
    git commit -m 'initial commit'
    git remote add origin url-of-your-git-repo
    git branch -M main
    git push -u origin main
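
Two of the steps above are easy to get subtly wrong: step 14 warns about stray spaces in .env, and the GCP_SERVICE_ACCOUNT_API_KEY_BASE64 secret needs the service account key as a base64 string. The following is a minimal sketch of both, not something shipped with this repository. The helper names find_stray_whitespace and encode_key_base64 are hypothetical, and any file paths you pass are your own. Stripping newlines from the base64 output is a choice made here so the result is identical with GNU and BSD/macOS base64; check your workflow if it expects a different format.

```shell
# Print (with line numbers) every line of an env file where the variable
# name is followed by whitespace before "=", or where the line ends in
# whitespace. Prints nothing and returns non-zero if the file is clean.
find_stray_whitespace() {
  grep -nE '^[A-Za-z_][A-Za-z0-9_]*[[:space:]]+=|[[:space:]]+$' "$1"
}

# Print the contents of a file as one base64 line with no line breaks.
encode_key_base64() {
  base64 < "$1" | tr -d '\n'
}
```

For example, find_stray_whitespace .env lists any offending lines, and encode_key_base64 followed by the path to your downloaded JSON key prints a string you can paste into the GitHub secret. Avoid running the second command where the output could end up in shared logs, since the encoded string is still a credential.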