Contributions are welcome and are greatly appreciated! Every little bit helps, and credit will always be given.
Report bugs through Apache JIRA.
Please report relevant information and preferably code that exhibits the problem.
Look through the JIRA issues for bugs. Anything is open to whoever wants to implement it.
Look through the Apache JIRA for features.
Any unassigned "Improvement" issue is open to whoever wants to implement it.
We've created the operators, hooks, macros and executors we needed, but we've made sure that this part of Airflow is extensible. New operators, hooks, macros and executors are very welcome!
Airflow could always use better documentation, whether as part of the official Airflow docs, in docstrings, in docs/*.rst files, or even on the web as blog posts or articles.
The best way to send feedback is to open an issue on Apache JIRA.
If you are proposing a new feature:
- Explain in detail how it would work.
- Keep the scope as narrow as possible to make it easier to implement.
- Remember that this is a volunteer-driven project, and that contributions are welcome :)
The latest API documentation is usually available here.
To generate a local version:
- Set up an Airflow development environment.
- Install the doc extra.
pip install -e '.[doc]'
- Generate and serve the documentation as follows:
cd docs
./build.sh
./start_doc_server.sh
NOTE: The docs build script build.sh requires bash 4.0 or greater. If you are building on macOS, you can install the latest version of bash with Homebrew.
Before you submit a pull request (PR) from your forked repo, check that it meets these guidelines:
Include tests, either as doctests, unit tests, or both, in your pull request.
The Airflow repo uses Travis CI to run the tests and codecov to track coverage. You can set up both for free on your fork (see Travis CI Testing Framework usage guidelines). Doing so helps you make sure that your PR does not break the build and that it helps increase coverage.
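As a minimal sketch of what "doctests, unit tests, or both" can look like, the helper below carries a doctest in its docstring and is also covered by a unittest test case. The function `normalize_dag_id` is hypothetical and exists only for illustration; it is not part of Airflow.

```python
import unittest


def normalize_dag_id(raw):
    """Return a lowercase, underscore-separated identifier.

    Hypothetical helper used only to illustrate the two test styles.

    >>> normalize_dag_id("My Example DAG")
    'my_example_dag'
    """
    # split() without arguments collapses runs of whitespace
    return "_".join(raw.strip().lower().split())


class TestNormalizeDagId(unittest.TestCase):
    """Unit test covering a case the doctest does not show."""

    def test_collapses_whitespace(self):
        self.assertEqual(normalize_dag_id("  a   b "), "a_b")
```

Saved as a module, the doctest can be run with `python -m doctest <file>` and the test case with `python -m unittest <file>`.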
Rebase your fork, squash commits, and resolve all conflicts.
When merging PRs, wherever possible try to use Squash and Merge instead of Rebase and Merge.
Make sure every pull request has an associated JIRA ticket. The JIRA link should also be added to the PR description.
Preface your commit's subject & PR title with [AIRFLOW-XXX] COMMIT_MSG where XXX is the JIRA number. For example: [AIRFLOW-5574] Fix Google Analytics script loading. We compose Airflow release notes from all commit titles in a release. By placing the JIRA number in the commit title and hence in the release notes, we let Airflow users look into JIRA and GitHub PRs for more details about a particular change.
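The title convention above can be checked mechanically. The following is a sketch only: the regular expression and the function name `parse_commit_title` are assumptions made for illustration, not a check that the project itself ships.

```python
import re

# "[AIRFLOW-<number>] <message>"; this pattern is an assumption derived
# from the convention described above, not an official project check.
COMMIT_TITLE_RE = re.compile(r"^\[AIRFLOW-(\d+)\] (\S.*)$")


def parse_commit_title(title):
    """Return (jira_number, message), or None if the title does not match."""
    match = COMMIT_TITLE_RE.match(title)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

For the example title given above, `parse_commit_title("[AIRFLOW-5574] Fix Google Analytics script loading")` yields the JIRA number 5574 and the message text, while a title without the prefix yields `None`.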
Add an Apache License header to all new files.
If you have pre-commit hooks enabled, they automatically add license headers during commit.
If your pull request adds functionality, make sure to update the docs as part of the same PR. A docstring is often sufficient. Make sure to follow the Sphinx-compatible standards.
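This guide does not spell out which docstring conventions it has in mind. As a sketch, assuming the common Sphinx field-list style (`:param:`, `:type:`, `:return:`, `:rtype:`), a documented function could look like this. The function `build_greeting` is hypothetical.

```python
def build_greeting(name, punctuation="!"):
    """Build a greeting for the given name.

    :param name: Name of the person to greet.
    :type name: str
    :param punctuation: Trailing punctuation mark.
    :type punctuation: str
    :return: The formatted greeting.
    :rtype: str
    """
    return "Hello, {}{}".format(name, punctuation)
```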
Make sure your code passes all the static code checks we have in place. The easiest way to make sure of that is to use pre-commit hooks.
Run tests locally before opening a PR.
Make sure the pull request works for Python 3.6 and 3.7.
Adhere to guidelines for commit messages described in this article. This makes the lives of those who come after you a lot easier.
There are two environments, available on Linux and macOS, that you can use to develop Apache Airflow:
- Local virtualenv development environment that supports running unit tests and can be used in your IDE.
- Breeze Docker-based development environment that provides an end-to-end CI solution with all software dependencies covered.
The table below summarizes differences between the two environments:
Property | Local virtualenv | Breeze environment |
---|---|---|
Test coverage | | |
Setup | | |
Installation difficulty | | |
Team synchronization | | |
Reproducing CI failures | | |
Ability to update | | |
Disk space and CPU usage | | |
IDE integration | | |
We typically recommend using both of these environments, depending on your needs.
All details about using and running local virtualenv environment for Airflow can be found in LOCAL_VIRTUALENV.rst.
Benefits:
- Packages are installed locally. No container environment is required.
- You can benefit from local debugging within your IDE.
- With the virtualenv in your IDE, you can benefit from autocompletion and running tests directly from the IDE.
Limitations:
You have to keep your dependencies and local environment consistent with other development environments that you have on your local machine.
You cannot run tests that require external components, such as MySQL, Postgres, Hadoop, Mongo, Cassandra, Redis, etc.
The tests in Airflow are a mixture of unit and integration tests, and some of them require these components to be set up. Local virtualenv supports only real unit tests. Technically, to run integration tests, you can configure and install the dependencies on your own, but it is usually complex. Instead, we recommend using the Breeze development environment with all required packages pre-installed.
You need to make sure that your local environment is consistent with other developer environments. This often leads to a "works for me" syndrome. The Breeze container-based solution provides a reproducible environment that is consistent with other developers.
You are STRONGLY encouraged to also install and use pre-commit hooks for your local virtualenv development environment. Pre-commit hooks can speed up your development cycle a lot.
All details about using and running Airflow Breeze can be found in BREEZE.rst.
The Airflow Breeze solution is intended to ease your local development as "It's a Breeze to develop Airflow".
Benefits:
- Breeze is a complete environment that includes external components, such as a MySQL database, Hadoop, Mongo, Cassandra, Redis, etc., required by some Airflow tests. Breeze provides a preconfigured Docker Compose environment where all these services are available and can be used by tests automatically.
- The Breeze environment is almost the same as the one used in Travis CI automated builds. So, if the tests run in your Breeze environment, they will work in Travis CI as well.
Limitations:
- Breeze environment takes significant space in your local Docker cache. There are separate environments for different Python and Airflow versions, and each of the images takes around 3GB in total.
- Though Airflow Breeze setup is automated, it takes time. The Breeze environment uses pre-built images from DockerHub and it takes time to download and extract those images. Building the environment for a particular Python version takes less than 10 minutes.
- The Breeze environment runs in the background, taking precious resources such as disk space and CPU. You can stop the environment manually after you use it or even use a bare environment to decrease resource usage.
NOTE: Breeze CI images are not supposed to be used in production environments. They are optimized for repeatability of tests, maintainability and speed of building rather than production performance. The production images are not yet officially published.
We check our code quality via static code checks. See STATIC_CODE_CHECKS.rst for details.
Your code must pass all the static code checks in Travis CI in order to be eligible for Code Review. The easiest way to make sure your code is good before pushing is to use pre-commit checks locally as described in the static code checks documentation.
We support the following types of tests:
- Unit tests are Python nose tests launched with run-tests. Unit tests are available both in the Breeze environment and local virtualenv.
- Integration tests are available in the Breeze development environment that is also used for Airflow Travis CI tests. Integration tests are special tests that require additional services running, such as Postgres, MySQL, Kerberos, etc. These tests are not yet clearly marked as integration tests, but soon they will be clearly separated by the pytest annotations.
- System tests are automatic tests that use external systems like Google Cloud Platform. These tests are intended for an end-to-end DAG execution.
For details on running different types of Airflow tests, see TESTING.rst.
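To picture the difference between a unit test and an integration test that needs a running service, here is a sketch using only the standard library's unittest skip decorator. It is a stand-in for the pytest annotations mentioned above, not the project's mechanism. The `INTEGRATION_POSTGRES` environment variable and the `integration_enabled` helper are hypothetical.

```python
import os
import unittest


def integration_enabled(name):
    """Hypothetical switch: INTEGRATION_<NAME>=true marks a service as available."""
    return os.environ.get("INTEGRATION_" + name.upper(), "").lower() == "true"


class TestPureLogic(unittest.TestCase):
    """Unit test: runs anywhere, no external services needed."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)


@unittest.skipUnless(integration_enabled("postgres"), "requires a running Postgres")
class TestNeedsPostgres(unittest.TestCase):
    """Integration test: skipped unless the service is flagged as available."""

    def test_service_flag(self):
        # A real test would connect to the database here.
        self.assertTrue(integration_enabled("postgres"))
```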
When developing features, you may need to persist information to the metadata database. Airflow has a built-in Alembic module to handle all schema changes. Alembic must be installed on your development machine before continuing with migration.
# starting at the root of the project
$ pwd
~/airflow
# change to the airflow directory
$ cd airflow
$ alembic revision -m "add new field to db"
Generating ~/airflow/airflow/migrations/versions/12341123_add_new_field_to_db.py
airflow/www/ contains all yarn-managed, front-end assets. Flask-Appbuilder itself comes bundled with jQuery and bootstrap. While they may be phased out over time, these packages are currently not managed with yarn.
Make sure you are using recent versions of node and yarn. No problems have been found with node>=8.11.3 and yarn>=1.19.1.
Make sure yarn is available in your environment.
To install it on macOS:
- Run the following commands (taken from this source):
brew install node --without-npm
brew install yarn
yarn config set prefix ~/.yarn
- Add ~/.yarn/bin to your PATH so that commands you install globally are usable.
- Set up your .bashrc file and then source ~/.bashrc to reflect the change.
export PATH="$HOME/.yarn/bin:$PATH"
- Install third party libraries defined in package.json by running the following commands within the airflow/www/ directory:
# from the root of the repository, move to where our JS package.json lives
cd airflow/www/
# run yarn install to fetch all the dependencies
yarn install
These commands install the libraries in a new node_modules/ folder within www/.
- Should you add or upgrade a node package, you should run yarn add --dev <package> for packages needed in development or yarn add <package> for packages used by the code, and push the newly generated package.json and yarn.lock files so that we get a reproducible build. See the Yarn docs for more info.
To parse and generate bundled files for Airflow, run either of the following commands:
# Compiles the production / optimized js & css
yarn run prod
# Starts a web server that manages and updates your assets as you modify them
yarn run dev
We try to enforce a more consistent style and follow the JS community guidelines.
Once you add or modify any JavaScript code in the project, please make sure it follows the guidelines defined in the Airbnb JavaScript Style Guide.
Apache Airflow uses ESLint as a tool for identifying and reporting on patterns in JavaScript. To use it, run any of the following commands:
# Check JS code in .js and .html files, and report any errors/warnings
yarn run lint
# Check JS code in .js and .html files, report any errors/warnings and fix them if possible
yarn run lint:fix
Typically, you start your first contribution by reviewing open tickets at Apache JIRA.
For example, suppose you want to have the following sample ticket assigned to you: AIRFLOW-5934: Add extra CC: to the emails sent by Airflow.
In general, your contribution includes the following stages:
- Make your own fork of the Apache Airflow main repository.
- Create a local virtualenv, initialize the Breeze environment, and install pre-commit framework. If you want to add more changes in the future, set up your own Travis CI fork.
- Join devlist and set up a Slack account.
- Make the change and create a Pull Request from your fork.
- Ping the #development channel on Slack and mention the relevant people in comments on your PR. Be persistent, but be considerate.
From the apache/airflow repo, create a fork.
Configure the Docker-based Breeze development environment and run tests.
You can use the default Breeze configuration as follows:
Install the latest versions of the Docker Community Edition and Docker Compose and add them to the PATH.
Enter Breeze:
./breeze
Breeze starts with downloading the Airflow CI image from the Docker Hub and installing all required dependencies.
Enter the Docker environment and mount your local sources to make them immediately visible in the environment.
Create a local virtualenv, for example:
mkvirtualenv myenv --python=python3.6
- Initialize the created environment:
./breeze --initialize-local-virtualenv
- Open your IDE (for example, PyCharm) and select the virtualenv you created as the project's default virtualenv in your IDE.
For effective collaboration, make sure to join the following Airflow groups:
- Mailing lists:
- Developer’s mailing list mailto:dev-subscribe@airflow.apache.org (quite substantial traffic on this list)
- All commits mailing list: mailto:commits-subscribe@airflow.apache.org (very high traffic on this list)
- Airflow users mailing list: mailto:users-subscribe@airflow.apache.org (reasonably small traffic on this list)
- Issues on Apache’s JIRA
- Slack (chat)
Update the local sources to address the JIRA ticket.
For example, to address this example JIRA ticket, do the following:
- Read about email configuration in Airflow.
- Find the class you should modify. For the example ticket, this is email.py.
- Find the test class where you should add tests. For the example ticket, this is test_email.py.
- Modify the class and add necessary code and unit tests.
- Run the unit tests from the IDE or local virtualenv as you see fit.
- Run the tests in Breeze.
- Run and fix all the static checks. If you have pre-commits installed, this step is automatically run while you are committing your code. If not, you can do it manually via git add and then pre-commit run.
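To give a feel for the "modify the class and add unit tests" step for the example ticket, here is a self-contained sketch built on the standard library's email package. It is not the code in Airflow's email.py or test_email.py. The function `build_message` and its `cc` parameter are hypothetical and only illustrate pairing a small change with a test.

```python
import unittest
from email.message import EmailMessage


def build_message(to, subject, body, cc=None):
    """Build an email message; cc is an optional list of extra recipients."""
    msg = EmailMessage()
    msg["To"] = ", ".join(to)
    msg["Subject"] = subject
    if cc:
        # Only add the header when CC recipients were actually given
        msg["Cc"] = ", ".join(cc)
    msg.set_content(body)
    return msg


class TestBuildMessage(unittest.TestCase):
    def test_cc_header_is_set(self):
        msg = build_message(["to@example.com"], "Hi", "Body", cc=["cc@example.com"])
        self.assertEqual(str(msg["Cc"]), "cc@example.com")

    def test_cc_header_is_absent_by_default(self):
        msg = build_message(["to@example.com"], "Hi", "Body")
        self.assertIsNone(msg["Cc"])
```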
Rebase your fork, squash commits, and resolve all conflicts.
Re-run the static code checks.
Create a pull request with the following title for the sample ticket:
[AIRFLOW-5934] Added extra CC: field to the Airflow emails.
Make sure to follow other PR guidelines described in this document.
Note that committers will use Squash and Merge instead of Rebase and Merge when merging PRs, and your commits will be squashed into a single commit.