Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create upstream_tasks parameter for dependencies independent of data transfers #585

Merged
merged 28 commits into from
Sep 8, 2022

Conversation

dimberman
Copy link
Collaborator

@dimberman dimberman commented Jul 28, 2022

Description

What is the current behavior?

Currently there is "native" way to set dependencies between tasks without passing either a table or a dataframe between said tasks. This is especially apparent when chaining raw_sql functions such as here:

#569

In order to have a solution that works we need somehting that feels pythonic and doesn't rely on traditional airflow >> operators

closes: #569

What is the new behavior?

This PR adds in the ability to set dependencies using a parameter upstream_tasks. This parameter can be used at runtime to set deps without passing data. Notably we don't want to actually render the data (imagine a use-case where a task relies on 10 dataframe tasks)

Does this introduce a breaking change?

No

Checklist

  • Created tests which fail without the change (if possible)
  • Extended the README / documentation, if necessary
  • Add documentation around upstream_tasks
  • solution should not render the data since the data is never used.

@codecov
Copy link

codecov bot commented Aug 2, 2022

Codecov Report

Merging #585 (a5a6e9f) into main (62303bd) will decrease coverage by 0.13%.
The diff coverage is 88.57%.

@@            Coverage Diff             @@
##             main     #585      +/-   ##
==========================================
- Coverage   93.26%   93.12%   -0.14%     
==========================================
  Files          43       45       +2     
  Lines        1855     1877      +22     
  Branches      232      237       +5     
==========================================
+ Hits         1730     1748      +18     
- Misses         97      100       +3     
- Partials       28       29       +1     
Impacted Files Coverage Δ
...sdk/src/astro/sql/operators/upstream_task_mixin.py 69.23% <69.23%> (ø)
python-sdk/src/astro/sql/operators/append.py 100.00% <100.00%> (ø)
...thon-sdk/src/astro/sql/operators/base_decorator.py 93.13% <100.00%> (+0.13%) ⬆️
...ython-sdk/src/astro/sql/operators/base_operator.py 100.00% <100.00%> (ø)
python-sdk/src/astro/sql/operators/cleanup.py 97.77% <100.00%> (+0.02%) ⬆️
python-sdk/src/astro/sql/operators/dataframe.py 93.24% <100.00%> (+0.18%) ⬆️
python-sdk/src/astro/sql/operators/drop.py 100.00% <100.00%> (ø)
python-sdk/src/astro/sql/operators/export_file.py 93.93% <100.00%> (ø)
python-sdk/src/astro/sql/operators/load_file.py 97.14% <100.00%> (ø)
python-sdk/src/astro/sql/operators/merge.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Collaborator

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimberman, I believe this PR does not close the original problem reported by #585, which is the comments at the beginning of the SQL statement were not working.

It does solve the issue of chaining run_raw_sql statements and operators which return None in the XComArg. We should probably log a Github issue for this and link it to this PR.

@@ -392,3 +392,41 @@ or

In all scenarios, even if the user gives a non-temporary table, only temporary
tables will actually be deleted.

## Tying Astro SDK decorators to traditional Airflow Operators
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the getting started tutorial the best place for this documentation?

I feel it could be best placed within our Sphinx documentation as a reference page. I believe the idea of the tutorial was to walk users through a first example of the Python SDK without introducing too many possibilities. Could you confirm this, @mikeshwe ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tatiana this is one of the most common questions we get asked (how can I use astro-sdk with traditional airflow operators) so it seemed like the getting started page was a good place to do it. Glad to discuss other options though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fair. 👍

docs/getting-started/GETTING_STARTED.md Outdated Show resolved Hide resolved
src/astro/sql/operators/append.py Outdated Show resolved Hide resolved
src/astro/sql/operators/upstream_tasks.py Outdated Show resolved Hide resolved
@dimberman dimberman marked this pull request as ready for review August 11, 2022 19:21
src/astro/sql/operators/drop.py Outdated Show resolved Hide resolved
src/astro/sql/operators/export_file.py Outdated Show resolved Hide resolved
src/astro/sql/operators/load_file.py Outdated Show resolved Hide resolved
src/astro/sql/operators/merge.py Outdated Show resolved Hide resolved
Copy link
Member

@feluelle feluelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@dimberman dimberman merged commit af802fd into main Sep 8, 2022
@dimberman dimberman deleted the setup-dependencies branch September 8, 2022 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add the ability set task dependency without a data dependency
3 participants