A framework for building Airflow DAGs via YAML files. Castor comprises four modules:
- Config files
- DAG factory
- Task creator
- Task strategies
- Operator Factory
A set of YAMLs files defined by the user. Each YAML file represents an Airflow DAG.
The YAML comprises two sections: dag
and tasks
.
The dag
section contains all oficial parameters supported by an Airflow DAG. Check this for more information.
This is an example of a DAG section in a YAML file:
dag:
dag_id: 'init_castor_dag'
default_args: '{"owner": "castor", "start_date": "2021-06-13"}'
schedule_interval: '@once'
catchup: False
tags:
- example
The parameters a task should include are:
- [Mandatory] task_name: Name for the task
- [Mandatory] strategy: The strategy that should be used by the Task Creator to create the task (e.g., PythonOperatorStrategy)
- [Optional] depends_on: list of dependencies of the task. This are name of other tasks previously defined
- [Optional] args: Arguments supported by the Airflow operator associated to the Task Strategy
This is an example of a task
section in a YAML file:
- name: 'task_name'
strategy: 'strategy_name'
depends_on:
- 'AnotherTask'
- 'AnotherTask'
- ...
args:
retries: 2
trigger_rule: 'all_success'
provide_context: True
python_callable: 'print_params'
op_kwargs:
param1: 'value1'
This is a YAML file containing a simple Airflow DAG for showing Castor capabilities.
dag:
dag_id: 'init_castor_dag'
default_args: '{"owner": "castor", "start_date": "2021-06-13"}'
schedule_interval: '@once'
catchup: False
tags:
- example
tasks:
- name: 'start'
strategy: 'DummyOperatorStrategy'
- name: 't1'
strategy: 'PythonOperatorStrategy'
depends_on:
- 'start'
args:
retries: 2
trigger_rule: 'all_success'
provide_context: True
python_callable: 'print_params'
op_kwargs:
param1: 'value1'
- name: 't2'
strategy: 'PythonOperatorStrategy'
depends_on:
- 'start'
args:
retries: 2
trigger_rule: 'all_success'
provide_context: True
python_callable: 'print_params'
op_kwargs:
param1: 'value1'
- name: 't3'
strategy: 'PythonOperatorStrategy'
depends_on:
- 't1'
- 't2'
args:
retries: 2
trigger_rule: 'all_success'
provide_context: True
python_callable: 'print_params'
op_kwargs:
param1: 'value1'
- name: 'end'
strategy: 'DummyOperatorStrategy'
depends_on:
- 't3'
The DAG Factory is responsible for creating the DAGs based on the configuration defined in the YAML file.
The Task Creator is responsible for creating DAG tasks based on task strategies.
A task strategy represents a strategy in which a task can be executed. A strategy can be based on an Airflow operator (e.g., PythonOperatorStrategy).
The strategies supported by Castor at this moment in time are:
It is responsible for creating Airflow Operators based on a set of parameters supplied by the DAG Factory.
The operators supported by Castor at this moment in time are: