Openlineage support - Add Extractor for AppendOperator #899

Closed
8 tasks
kaxil opened this issue Sep 23, 2022 · 0 comments · Fixed by #1038
Labels
feature New feature or request priority/high High priority product/python-sdk Label describing products

Comments

Collaborator

kaxil commented Sep 23, 2022

Please describe the feature you'd like to see
We should be able to extract OpenLineage information from the AppendOperator.

Describe the solution you'd like

Acceptance Criteria

  • Post a screenshot of how it looks in the OpenLineage/Marquez UI
  • All checks and tests in the CI should pass
  • Unit tests (90% code coverage or more, once available)
  • Integration tests (if the feature relates to a new database or external service)
  • Docstrings in reStructuredText for each method, class, function, and module-level attribute (including an example DAG showing how it should be used)
  • Exception handling in case of errors
  • Logging (are we exposing useful information to the user? e.g. source and destination)
  • Improve the documentation (README, Sphinx, and any other relevant)
@kaxil kaxil added feature New feature or request priority/high High priority labels Sep 23, 2022
@kaxil kaxil added this to the 1.2.0 milestone Sep 23, 2022
@kaxil kaxil added the product/python-sdk Label describing products label Sep 23, 2022
pankajastro added a commit that referenced this issue Oct 14, 2022
# Description
Added methods to collect lineage for the Append operator.
Exposed `get_openlineage_facets` on the operator, which PythonExtractor
uses once task execution is complete. Exposed
`openlineage_dataset_name` to get the dataset's qualified name and
`openlineage_dataset_namespace` to get the dataset's namespace on the
BaseTable object. Since a Table is the input/output of an astro-sdk
operator, I feel this is the right place to expose them. Depending on
the table type, the BaseTable `openlineage_dataset_name` and
`openlineage_dataset_namespace` construct the right dataset name and
namespace by calling the `openlineage_dataset_name` and
`openlineage_dataset_namespace` methods of the matching astro-sdk
database class at runtime.
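
The delegation described above could look roughly like the sketch below. It is
illustrative only and does not reproduce the astro-sdk code: `ExampleDatabase`,
`Table`, `Dataset`, `Lineage`, `AppendOperatorSketch`, and the `exampledb://`
naming scheme are hypothetical stand-ins. Only the names
`get_openlineage_facets`, `openlineage_dataset_name`, and
`openlineage_dataset_namespace` come from the description. Whether the real
attributes are methods or properties, and what the real return types are, is
an assumption here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    """Hypothetical stand-in for an OpenLineage dataset (namespace + name)."""
    namespace: str
    name: str


@dataclass
class Lineage:
    """Hypothetical container for the inputs and outputs of one task."""
    inputs: List[Dataset]
    outputs: List[Dataset]


class ExampleDatabase:
    """Hypothetical database class that owns the naming rules for its tables."""

    def __init__(self, host: str) -> None:
        self.host = host

    def openlineage_dataset_namespace(self) -> str:
        # Assumed scheme; a real database class would build its own URI.
        return f"exampledb://{self.host}"

    def openlineage_dataset_name(self, table: "Table") -> str:
        # Assumed qualified-name format: <schema>.<table>.
        return f"{table.schema}.{table.name}"


@dataclass
class Table:
    """Stand-in for BaseTable: it delegates naming to its database at runtime."""
    name: str
    schema: str
    database: ExampleDatabase

    def openlineage_dataset_namespace(self) -> str:
        return self.database.openlineage_dataset_namespace()

    def openlineage_dataset_name(self) -> str:
        return self.database.openlineage_dataset_name(self)


def to_dataset(table: Table) -> Dataset:
    """Convert a table into the dataset record reported as lineage."""
    return Dataset(
        namespace=table.openlineage_dataset_namespace(),
        name=table.openlineage_dataset_name(),
    )


class AppendOperatorSketch:
    """Stand-in for the Append operator: rows of source are appended to target."""

    def __init__(self, source_table: Table, target_table: Table) -> None:
        self.source_table = source_table
        self.target_table = target_table

    def get_openlineage_facets(self, task_instance=None) -> Lineage:
        # An extractor would call this once the task has finished.
        return Lineage(
            inputs=[to_dataset(self.source_table)],
            outputs=[to_dataset(self.target_table)],
        )


if __name__ == "__main__":
    db = ExampleDatabase("localhost")
    op = AppendOperatorSketch(Table("staging", "main", db), Table("final", "main", db))
    print(op.get_openlineage_facets())
```

Keeping the naming logic on the database class means the operator does not need
to know which backend a table lives in. It only asks each table for its
namespace and name.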


## What is the current behavior?
We do not collect lineage for the Append operator

closes: #899 


## What is the new behavior?
Add lineage support for the Append operator


## Does this introduce a breaking change?
No

### Checklist
- [ ] Created tests which fail without the change (if possible)
- [ ] Extended the README / documentation, if necessary

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>