[spike: 4d] Standing up Airflow ECS Executor POC on Cloud.gov #4503

Closed
3 tasks
btylerburton opened this issue Oct 19, 2023 · 4 comments
btylerburton commented Oct 19, 2023

Purpose

In order to reap the benefits of cloud.gov, datagovteam would like to explore running Airflow using the new AWS ECS Executor.

Given the above question, prototyping is needed to provide factual knowledge to inform future steps.

4d of effort has been allocated, and once complete, findings will be demonstrated and specific future actions will be decided.

Acceptance Criteria

[ACs should be clearly demo-able/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN I run cf login -a api.fr.cloud.gov --sso and authenticate
    AND I run cf target -o gsa-datagov -s <development|staging|production>
    WHEN I run cf push airflow-ecs-test --vars-file my_vars_file --strategy rolling
    THEN I expect to see the instance of the airflow admin panel when I visit the expected cloud.gov internal URL
    AND I will see my test DAG loaded here.

Background

There is a new Executor which will support ECS. This is a WIP, but it is expected to be merged into core soon and to be production-ready by the time we would need it.

This spike is to prototype through any general configuration challenges and, more specifically, how to configure a connection between the ECS Tasks and the Airflow Metadata Database.

Here's a system diagram of the ECS Container Executor:

[Image: system diagram of the ECS Container Executor (screenshot, 2023-10-19)]

This is the WIP PR into core: apache/airflow#34381

And here is Airflow's documentation on integrating an External DB: https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html
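For the cloud.gov side of that connection, here is a minimal Python sketch of how the metadata DB connection string might be derived from a bound brokered database. It is an illustration under assumptions, not something tested in this spike: the service label `aws-rds`, the credential keys (`username`, `password`, `host`, `port`, `db_name`), the Postgres/psycopg2 driver, and the function name `sql_alchemy_conn_from_vcap` are all assumed. Airflow 2.3+ reads the result from `AIRFLOW__DATABASE__SQL_ALCHEMY_CONN`. It does not address how ECS Tasks outside cloud.gov would reach the same database, which is the open question here.

```python
import json
import os
from urllib.parse import quote


def sql_alchemy_conn_from_vcap(vcap_services_json, label="aws-rds"):
    """Build a SQLAlchemy URI from a Cloud Foundry VCAP_SERVICES JSON string.

    Assumptions (hypothetical, adjust to the real binding): the database is
    bound under `label`, is Postgres, and exposes the credential keys below.
    """
    services = json.loads(vcap_services_json)
    instances = services.get(label, [])
    if not instances:
        raise LookupError(f"no bound service with label {label!r}")

    creds = instances[0]["credentials"]
    # Percent-encode user and password so characters like '@' or '/'
    # cannot break the URI.
    user = quote(str(creds["username"]), safe="")
    password = quote(str(creds["password"]), safe="")
    return (
        f"postgresql+psycopg2://{user}:{password}"
        f"@{creds['host']}:{creds['port']}/{creds['db_name']}"
    )


if __name__ == "__main__":
    raw = os.environ.get("VCAP_SERVICES")
    if raw:
        # Export this value as AIRFLOW__DATABASE__SQL_ALCHEMY_CONN before
        # starting the Airflow webserver and scheduler.
        print(sql_alchemy_conn_from_vcap(raw))
```

Something like this would run in the app's start command, before Airflow starts, so the credentials never need to be written into a vars file.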

Sketch

  • Manually deploy Airflow with the ECS Executor in the development space
    • Include an externally configured DB for the Airflow Metadata Database
@btylerburton btylerburton changed the title [spike: 5d] Standing up Airflow ECS Executor POC on Cloud.gov [spike: 4d] Standing up Airflow ECS Executor POC on Cloud.gov Oct 19, 2023
@btylerburton btylerburton moved this to 🏗 In Progress [8] in data.gov team board Oct 19, 2023
@btylerburton btylerburton self-assigned this Oct 19, 2023
@btylerburton
Contributor Author

Given the response from cloud.gov support that networking an internal DB via an external route to something outside cloud.gov is not something they can support under their security controls, this option presents the same issues as the K8s executor.

In light of this, this ticket is going back into planning in favor of establishing a baseline with the LocalExecutor within our ecosystem and then determining whether we need to write our own custom CFExecutor or incur the security risks of running an RDB as a brokered service in AWS.

@btylerburton btylerburton moved this from 🏗 In Progress [8] to New Dev in data.gov team board Oct 25, 2023
@btylerburton
Contributor Author

Moving back to "new dev"

@nickumia

networking an internal DB via an external route to something outside cloud.gov

Does this mean that you were trying to get a public endpoint for a cloud.gov-managed RDS instance?

I don't think that the issues are exactly the same as with the K8s Executor. Securing RDS would be a lot simpler than securing K8s. But, of course, a cloud.gov-native executor would be the best option. Another hack could be to create a proxy app that forwards cloud.gov RDS to wherever it needs to connect.
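To make the proxy idea concrete, here is a rough Python sketch of a plain TCP forwarder using only the standard library's `asyncio`. It is an illustration only: the function names `pipe` and `start_proxy` and all hosts and ports are hypothetical, and it ignores authentication, TLS, and whether cloud.gov would permit routing raw TCP to such an app at all, so the security concerns raised above still apply.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes from reader to writer until the reading side closes."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


async def start_proxy(listen_host, listen_port, target_host, target_port):
    """Start a TCP server that forwards each client connection to the target."""

    async def handle(client_reader, client_writer):
        try:
            upstream_reader, upstream_writer = await asyncio.open_connection(
                target_host, target_port
            )
        except OSError:
            # Target (e.g. the database) is unreachable: drop the client.
            client_writer.close()
            return
        # Relay both directions concurrently until either side closes.
        await asyncio.gather(
            pipe(client_reader, upstream_writer),
            pipe(upstream_reader, client_writer),
            return_exceptions=True,
        )

    return await asyncio.start_server(handle, listen_host, listen_port)
```

In a setup like this, the proxy app would be bound to the RDS service inside cloud.gov and would be the only thing the external workers talk to, which is also why it would need its own access controls.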

@btylerburton btylerburton moved this from New Dev to 🧊 Icebox in data.gov team board Dec 14, 2023
@btylerburton
Contributor Author

Closing as we've pivoted away from Airflow

@github-project-automation github-project-automation bot moved this from 🧊 Icebox to ✔ Done in data.gov team board Mar 28, 2024
@gujral-rei gujral-rei moved this from ✔ Done to 🗄 Closed in data.gov team board Mar 28, 2024