Configure Production Image Setup using LocalExecutor #4511

Closed
6 tasks done
btylerburton opened this issue Oct 25, 2023 · 6 comments

btylerburton commented Oct 25, 2023

User Story

The Data.gov team would like to implement a production-quality Airflow installation using the LocalExecutor. This is necessary to establish a baseline for Airflow performance on Cloud.gov, and will allow us to compare performance against other executors more quantitatively.

Related:

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN I run `cf login -a api.fr.cloud.gov --sso` and authenticate
    AND I run `cf target -o gsa-datagov -s <development|staging|production>`
    WHEN I run `cf push airflow --vars-file my_vars_file --strategy rolling`
    THEN I expect to see the Airflow admin panel when I visit the expected cloud.gov internal URL
    AND I will see my test DAG loaded there.
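
The THEN step above could also be checked from a script rather than by eye. The following is a minimal sketch, assuming an Airflow 2.x webserver, which exposes a `/health` endpoint reporting the status of the metadatabase and the scheduler. The function names and the `base_url` argument are hypothetical and not part of this issue, and the internal route must be reachable from wherever the script runs.

```python
import json
from urllib.request import urlopen


def is_airflow_healthy(health: dict) -> bool:
    """Return True only if the metadatabase and the scheduler both report 'healthy'."""
    return all(
        health.get(component, {}).get("status") == "healthy"
        for component in ("metadatabase", "scheduler")
    )


def fetch_health(base_url: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON body of <base_url>/health (hypothetical helper)."""
    with urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
        return json.load(resp)
```

Calling `is_airflow_healthy(fetch_health(base_url))` with the deployed app's URL would return `True` only when both components are up. It does not confirm that the test DAG is loaded, so that part still needs a look at the admin panel.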

Background

Given the overwhelming number of options for configuration, establishing a performance baseline with a LocalExecutor configured for production, with an external RDS DB, will allow us to benchmark other solutions more effectively.

Cloud.gov also supports building applications from images. We already work with containers locally, and have traditionally supported both an image-to-container solution for local development and a buildpack-to-container solution for production, so this POC will attempt to unify those two environments.

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

  • Build production image using Breeze tools
  • Configure Airflow to use LocalExecutor instead of SequentialExecutor
  • Configure manifest.yml to use that image when deploying separate applications
  • Configure a cloud.gov Postgres RDS database to work with Airflow
  • Deploy a test DAG to the applications on cloud.gov, either by extending the Docker image and COPYing the DAG into the container's DAG folder, or by mounting a persistent storage volume.
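
As an illustration of the LocalExecutor and database steps above, here is a minimal sketch of deriving Airflow settings from the `VCAP_SERVICES` JSON that Cloud Foundry injects into a bound app. It assumes the database is bound under the `aws-rds` service label and that its credentials include a `uri` field; both are assumptions about the broker, not something confirmed in this issue. The function name is hypothetical. `AIRFLOW__DATABASE__SQL_ALCHEMY_CONN` is the variable name used by Airflow 2.3 and later; earlier 2.x releases read `AIRFLOW__CORE__SQL_ALCHEMY_CONN`.

```python
import json


def airflow_env_from_vcap(vcap_services: str, service_label: str = "aws-rds") -> dict:
    """Build Airflow env settings from a VCAP_SERVICES JSON string (hypothetical helper)."""
    services = json.loads(vcap_services)
    instances = services.get(service_label, [])
    if not instances:
        raise ValueError(f"no bound service with label {service_label!r}")

    # Assumption: the first bound instance is the Airflow metadata database.
    uri = instances[0]["credentials"]["uri"]

    # SQLAlchemy 1.4+ no longer accepts the legacy "postgres://" scheme.
    if uri.startswith("postgres://"):
        uri = "postgresql://" + uri[len("postgres://"):]

    return {
        "AIRFLOW__CORE__EXECUTOR": "LocalExecutor",
        "AIRFLOW__DATABASE__SQL_ALCHEMY_CONN": uri,
    }
```

A start script could call this with `os.environ["VCAP_SERVICES"]` and export the result before launching the webserver and scheduler. Keeping the connection string out of `manifest.yml` avoids committing credentials.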
@btylerburton
Contributor Author

Deployed prod image successfully using airflow standalone

Image

@btylerburton
Contributor Author

Airflow is working, and we are seeing logs!

Image

@btylerburton
Contributor Author

Draft PR is parked at GSA/datagov-harvester#2

README and other cleanup TBD.

@btylerburton
Contributor Author

After much struggle with the Docker image, we pivoted to using the Python buildpack and were happily greeted with a healthy scheduler.

Image
Image

@btylerburton
Contributor Author

PR is here: GSA/datagov-harvester#3

@github-project-automation github-project-automation bot moved this from 👀 Needs Review [2] to ✔ Done in data.gov team board Nov 21, 2023
@gujral-rei gujral-rei moved this from ✔ Done to 🗄 Closed in data.gov team board Nov 22, 2023
@btylerburton btylerburton added H2.0/orchestrator and removed H2.0/Harvest-General General Harvesting 2.0 Issues labels Dec 13, 2023
@btylerburton btylerburton removed their assignment Jan 10, 2024