feat: build column lineage using sqlglot #46

Merged
LucasRoesler merged 17 commits into main from test_sqlglot on Jun 30, 2023

eyelesbarrow
Contributor

@eyelesbarrow eyelesbarrow commented Jun 23, 2023

This extracts and builds column-level lineage data from an SAP HANA database using sqlglot.
The column lineage is built from sqlglot nodes, which contain upstream and downstream data, before being transformed into workunits to be sent to DataHub.
At this point, the code is still WIP, with testing and experimentation for edge cases still ongoing.
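
To illustrate the idea of walking lineage nodes and collecting upstream/downstream column pairs, here is a minimal sketch. It does not use sqlglot itself: `LineageNode`, `iter_edges`, and the table and column names are hypothetical stand-ins, and the real node class in sqlglot and the workunit format expected by DataHub will differ.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class LineageNode:
    """Hypothetical stand-in for a lineage node (not sqlglot's class).

    Holds a qualified column name and the nodes it is derived from.
    """

    name: str
    upstream: List["LineageNode"] = field(default_factory=list)


def iter_edges(node: LineageNode) -> Iterator[Tuple[str, str]]:
    """Yield (upstream_column, downstream_column) pairs, depth first."""
    for parent in node.upstream:
        yield (parent.name, node.name)
        yield from iter_edges(parent)


# A view column computed from two columns of a source table.
total = LineageNode(
    "sales_view.total",
    [LineageNode("orders.price"), LineageNode("orders.quantity")],
)

edges = list(iter_edges(total))
for upstream_col, downstream_col in edges:
    print(f"{upstream_col} -> {downstream_col}")
```

Each pair could then be mapped onto whatever lineage record the sink expects; that mapping is left out here because it depends on the DataHub version in use.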

eyelesbarrow and others added 5 commits June 12, 2023 17:51
* feat: add notes on sap hana queries and file-based lineage yaml file

* feat: add new datasource and annotations

* fix: added logger.debug, removed references to internal sap hana dev, fixed white spaces

* added new docker compose

* feat: add integration tests

* fixing the check_golden_file

* feat: add integration tests

* feat: after task fix

* feat: adjust test for integration

* feat: adjust test for integration, edited pypro and test_hana

* feat: adjusted the decorators

* Update pyproject.toml

* Update pyproject.toml

* adding new test and headers

* changes to test_hana and other files

* added headers to test_helpers and other files

* Update ingestion.py to remove the constraints for SYS

* Update datahub_sap_hana/ingestion.py

Co-authored-by: Lucas Roesler <roesler.lucas@gmail.com>

* Update datahub_sap_hana/ingestion.py

Co-authored-by: Lucas Roesler <roesler.lucas@gmail.com>

* remove the execution block in ingestion.py

* Update hana_recipe.yaml to change sink type to console

* Update README.md

* removed query

* Update hana_to_file.yml

* Update the testing section

* Update pyproject.toml

Co-authored-by: Lucas Roesler <roesler.lucas@gmail.com>

* Update notes.md

* Update notes.md

* Update docker-compose.yml

* Update test_hana.py

* Update docker-compose.yml

* Update tests/README.md

Co-authored-by: Lucas Roesler <roesler.lucas@gmail.com>

* Update tests/docker-compose.yml

Co-authored-by: Lucas Roesler <roesler.lucas@gmail.com>

* changes to filenames, deleted notes.md

* changes to filenames

* changed file names

* ran task fix

* added deepdiff

---------

Co-authored-by: kay_alave <kay.alave@gmail.com>
Co-authored-by: Lucas Roesler <roesler.lucas@gmail.com>
Member

@LucasRoesler LucasRoesler left a comment

Just a first round of suggestions, mostly about project logistics and file structure. I will read through the new ingestion logic now.

Review threads (outdated, resolved):
LICENSE
datahub_sap_hana/hana_recipe.yaml
pyproject.toml
tests/test_helpers/click_helpers.py
datahub_sap_hana/col_lineage/sqlglot_test.py
Member

@LucasRoesler LucasRoesler left a comment

I think the most important thing you can do first is to add docstrings to all of your functions.

Review threads (outdated, resolved):
datahub_sap_hana/ingestion.py (17 threads)
schema.py (2 threads)
pyproject.toml (1 thread)
kay_alave and others added 5 commits June 26, 2023 15:39
Signed-off-by: Lucas Roesler <roesler.lucas@gmail.com>
This adds two major features

1. the inspector is now described as a protocol, which allows us to
   define and use an inspector with caching
2. using this caching, we now ensure that the column names used in the
   lineage use casing matching the value returned from the inspector.

Signed-off-by: Lucas Roesler <roesler.lucas@gmail.com>
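
The two features in this commit message can be sketched roughly as follows. This is only an illustration under assumptions: the `Inspector` protocol, `CachedInspector`, `match_casing`, and the `get_columns` signature are hypothetical names chosen for the example (the method name mirrors SQLAlchemy's inspector, but the actual protocol in this PR may look different), and `FakeInspector` stands in for a real database connection.

```python
from typing import Dict, List, Protocol, Tuple


class Inspector(Protocol):
    """Hypothetical protocol: anything that can list a table's columns."""

    def get_columns(self, table_name: str, schema: str) -> List[Dict[str, str]]:
        ...


class FakeInspector:
    """Stand-in for a database-backed inspector; counts how often it is hit."""

    def __init__(self) -> None:
        self.calls = 0

    def get_columns(self, table_name: str, schema: str) -> List[Dict[str, str]]:
        self.calls += 1
        return [{"name": "ORDER_ID"}, {"name": "Price"}]


class CachedInspector:
    """Wraps any Inspector and remembers results per (schema, table)."""

    def __init__(self, inner: Inspector) -> None:
        self._inner = inner
        self._cache: Dict[Tuple[str, str], List[Dict[str, str]]] = {}

    def get_columns(self, table_name: str, schema: str) -> List[Dict[str, str]]:
        key = (schema, table_name)
        if key not in self._cache:
            self._cache[key] = self._inner.get_columns(table_name, schema)
        return self._cache[key]


def match_casing(inspector: Inspector, schema: str, table_name: str, column: str) -> str:
    """Return the column name as the inspector spells it, if it is known."""
    for col in inspector.get_columns(table_name, schema):
        if col["name"].lower() == column.lower():
            return col["name"]
    return column  # unknown column: leave the name untouched


fake = FakeInspector()
cached = CachedInspector(fake)
order_id = match_casing(cached, "sales", "orders", "order_id")
price = match_casing(cached, "sales", "orders", "PRICE")
```

Because `CachedInspector` satisfies the same protocol as the object it wraps, callers such as `match_casing` do not need to know whether caching is in place, and repeated lookups for the same table reach the underlying inspector only once.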
Review threads (outdated, resolved):
LICENSE
sap_hana_results.json
console_recipe.yaml
@eyelesbarrow eyelesbarrow changed the title Test sqlglot feat: build column lineage using sqlglot Jun 30, 2023
@eyelesbarrow eyelesbarrow marked this pull request as ready for review June 30, 2023 08:48
Signed-off-by: Lucas Roesler <roesler.lucas@gmail.com>
@LucasRoesler LucasRoesler merged commit fa4cc77 into main Jun 30, 2023
@LucasRoesler LucasRoesler deleted the test_sqlglot branch June 30, 2023 09:56