feat: build column lineage using sqlglot #46
* feat: add notes on sap hana queries and file-based lineage yaml file
* feat: add new datasource and annotations
* fix: added logger.debug, removed references to internal sap hana dev, fixed white spaces
* added new docker compose
* feat: add integration tests
* fixing the check_golden_file
* feat: add integration tests
* feat: after task fix
* feat: adjust test for integration
* feat: adjust test for integration, edited pypro and test_hana
* feat: adjusted the decorators
* Update pyproject.toml
* Update pyproject.toml
* adding new test and headers
* changes to test_hana and other files
* added headers to test_helpers and other files
* Update ingestion.py to remove the constraints for SYS
* Update datahub_sap_hana/ingestion.py Co-authored-by: Lucas Roesler <roesler.lucas@gmail.com>
* Update datahub_sap_hana/ingestion.py Co-authored-by: Lucas Roesler <roesler.lucas@gmail.com>
* remove the execution block in ingestion.py
* Update hana_recipe.yaml to change sink type to console
* Update README.md
* removed query
* Update hana_to_file.yml
* Update the testing section
* Update pyproject.toml Co-authored-by: Lucas Roesler <roesler.lucas@gmail.com>
* Update notes.md
* Update notes.md
* Update docker-compose.yml
* Update test_hana.py
* Update docker-compose.yml
* Update tests/README.md Co-authored-by: Lucas Roesler <roesler.lucas@gmail.com>
* Update tests/docker-compose.yml Co-authored-by: Lucas Roesler <roesler.lucas@gmail.com>
* changes to filenames, deleted notes.md
* changes to filenames
* changed file names
* ran task fix
* added deepdiff

---------

Co-authored-by: kay_alave <kay.alave@gmail.com>
Co-authored-by: Lucas Roesler <roesler.lucas@gmail.com>
Just the first round of suggestions, which are more about project logistics and file structure. I will read through the new ingestion logic now.
I think the most important thing you can do first is to add docstrings to all of your functions.
Signed-off-by: Lucas Roesler <roesler.lucas@gmail.com>
This adds two major features:
1. The inspector is now described as a protocol, which allows us to define and use an inspector with caching.
2. Using this caching, we now ensure that the column names used in the lineage use casing matching the value returned from the inspector.

Signed-off-by: Lucas Roesler <roesler.lucas@gmail.com>
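For readers who have not seen the pattern, here is a rough sketch of an inspector protocol with a caching wrapper and a casing lookup built on top of it. It is not the code from this PR: the names `Inspector`, `FakeInspector`, `CachedInspector`, and `match_casing`, as well as the simplified `get_columns(schema, table)` signature, are assumptions made for illustration only.

```python
from typing import Dict, List, Protocol, Tuple


class Inspector(Protocol):
    """Anything that can list the column names of a table (hypothetical interface)."""

    def get_columns(self, schema: str, table: str) -> List[str]: ...


class FakeInspector:
    """Stand-in for a real database inspector; counts how often it is queried."""

    def __init__(self) -> None:
        self.calls = 0

    def get_columns(self, schema: str, table: str) -> List[str]:
        self.calls += 1
        return ["Id", "CustomerName"]


class CachedInspector:
    """Wraps any Inspector and remembers the columns of each table."""

    def __init__(self, inner: Inspector) -> None:
        self._inner = inner
        self._cache: Dict[Tuple[str, str], List[str]] = {}

    def get_columns(self, schema: str, table: str) -> List[str]:
        key = (schema, table)
        if key not in self._cache:
            # Only the first lookup per table reaches the wrapped inspector.
            self._cache[key] = self._inner.get_columns(schema, table)
        return self._cache[key]


def match_casing(inspector: Inspector, schema: str, table: str, column: str) -> str:
    """Return the column name as the inspector spells it (case-insensitive match)."""
    for known in inspector.get_columns(schema, table):
        if known.lower() == column.lower():
            return known
    return column  # unknown column: leave it unchanged
```

Because `CachedInspector` satisfies the same protocol as the inspector it wraps, callers such as `match_casing` do not need to know whether caching is in use.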
This PR extracts and builds column-level lineage data from a SAP HANA database using sqlglot.
The column lineage is built from sqlglot nodes, which contain upstream and downstream data, and is then transformed into workunits that are sent to DataHub.
At this point, the code is still WIP; testing and experimentation for edge cases are ongoing.
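As an illustration of the node-to-lineage step described above, the sketch below flattens a tree of lineage nodes into (source column, target column) pairs, which is roughly the shape needed before building workunits. It does not use sqlglot or DataHub directly: `LineageNode` is a hypothetical stand-in, and it assumes that each node lists the nodes it is derived from in a `downstream` attribute.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class LineageNode:
    """Hypothetical stand-in for a lineage node: a column plus the nodes it reads from."""

    name: str
    downstream: List["LineageNode"] = field(default_factory=list)


def collect_edges(root: LineageNode) -> List[Tuple[str, str]]:
    """Walk the tree and return unique (source_column, target_column) pairs."""
    edges: List[Tuple[str, str]] = []
    seen: Set[Tuple[str, str]] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.downstream:
            edge = (child.name, node.name)
            if edge not in seen:
                # Record each edge once and descend only the first time it is seen.
                seen.add(edge)
                edges.append(edge)
                stack.append(child)
    return edges


# Example: a report column derived from two columns of a source table.
root = LineageNode(
    "report.total",
    [LineageNode("orders.amount"), LineageNode("orders.tax")],
)
edges = collect_edges(root)
```

Each pair could then be mapped to whatever upstream/downstream structure the workunit builder expects; that mapping is left out here because it depends on the DataHub classes used in the PR.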