
Column lineage graph endpoint #2124

Merged
merged 1 commit into from
Oct 7, 2022


pawel-big-lebowski (Collaborator) commented Sep 16, 2022

Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>

Problem

PR #2096 added support for storing column-lineage information from lineage events in the database. This PR exposes that column lineage through a graph endpoint, following the proposal (https://github.com/MarquezProject/marquez/blob/main/proposals/2045-column-lineage-endpoint.md).

Closes: #2114
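For illustration, a client request to the new endpoint might be built as follows. This is a minimal sketch under several assumptions:

  • the endpoint lives at /api/v1/column-lineage;
  • it takes nodeId and depth query parameters, like the existing lineage endpoint;
  • a column is addressed by a node id of the form datasetField:<namespace>:<dataset>:<field>;
  • the API listens on localhost:5000.

The class, method and example dataset names are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ColumnLineageRequest {
  // Hypothetical node id layout for a single column (dataset field).
  static String datasetFieldNodeId(String namespace, String dataset, String field) {
    return String.join(":", "datasetField", namespace, dataset, field);
  }

  // Builds a GET URI for the assumed /api/v1/column-lineage path.
  // The node id is URL-encoded, so ':' becomes %3A in the query string.
  static URI columnLineageUri(String baseUrl, String nodeId, int depth) {
    String query = "nodeId=" + URLEncoder.encode(nodeId, StandardCharsets.UTF_8)
        + "&depth=" + depth;
    return URI.create(baseUrl + "/api/v1/column-lineage?" + query);
  }

  public static void main(String[] args) {
    String nodeId = datasetFieldNodeId("food_delivery", "public.delivery_7_days", "customer_email");
    System.out.println(columnLineageUri("http://localhost:5000", nodeId, 20));
  }
}
```

Encoding the node id keeps the colon separators from being confused with other URI syntax. URI.getQuery() returns the decoded form again.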

Solution

  • A new NodeType, DATASET_FIELD, is added.
  • The column-lineage endpoint returns serialized Lineage objects, similar to the existing lineage endpoint.
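The graph shape described in these two points can be sketched with a small in-memory model. Field nodes of type DATASET_FIELD are connected by origin/destination edges, and each node holds its incoming and outgoing edges. This is a simplified, hypothetical mirror of the Lineage payload using Java 16+ records. The class names, the other enum constants and the exact fields are assumptions, not Marquez's actual classes.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ColumnLineageGraph {
  // Illustrative node kinds; DATASET_FIELD is the kind introduced for columns.
  enum NodeType { DATASET, JOB, RUN, DATASET_FIELD }

  // A directed edge: data flows from the origin field to the destination field.
  record Edge(String origin, String destination) {}

  record Node(String id, NodeType type, Set<Edge> inEdges, Set<Edge> outEdges) {}

  // Insertion-ordered so the resulting graph is stable and easy to inspect.
  private final Map<String, Node> nodes = new LinkedHashMap<>();

  private Node fieldNode(String id) {
    return nodes.computeIfAbsent(id,
        k -> new Node(k, NodeType.DATASET_FIELD, new LinkedHashSet<>(), new LinkedHashSet<>()));
  }

  // Records that the `input` column contributes to the `output` column.
  void addColumnEdge(String input, String output) {
    Edge edge = new Edge(input, output);
    fieldNode(input).outEdges().add(edge);
    fieldNode(output).inEdges().add(edge);
  }

  // Snapshot of all nodes, analogous to the node list a Lineage response carries.
  List<Node> graph() {
    return List.copyOf(nodes.values());
  }
}
```

Storing edges on both endpoints lets a client walk the graph in either direction without re-scanning the whole edge list.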

Note: All database schema changes require discussion. Please link the issue for context.

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • [ ] Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've updated the CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)
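Flyway's versioned migrations follow the pattern V<version>__<description>.sql. The version separates its parts with dots or underscores, and a double underscore comes before the description. The check below is a simplified, hypothetical filename validator for that pattern. Flyway also supports other prefixes (for example R for repeatable migrations) and configurable prefixes and separators, which this regex ignores.

```java
import java.util.regex.Pattern;

final class FlywayNaming {
  // "V", a version such as 1, 1.2 or 1_2, the "__" separator,
  // a description using underscores instead of spaces, then ".sql".
  private static final Pattern VERSIONED =
      Pattern.compile("^V\\d+(?:[._]\\d+)*__[A-Za-z0-9_]+\\.sql$");

  static boolean isVersionedMigration(String fileName) {
    return VERSIONED.matcher(fileName).matches();
  }

  public static void main(String[] args) {
    System.out.println(isVersionedMigration("V2__add_column.sql"));
  }
}
```

A single underscore between version and description (V2_add_column.sql) is rejected, which is a common naming slip.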

boring-cyborg bot added the api (API layer changes) label Sep 16, 2022
codecov bot commented Sep 19, 2022

Codecov Report

Merging #2124 (8e66689) into main (b9abb19) will increase coverage by 0.51%.
The diff coverage is 91.83%.

@@             Coverage Diff              @@
##               main    #2124      +/-   ##
============================================
+ Coverage     75.82%   76.33%   +0.51%     
- Complexity     1063     1099      +36     
============================================
  Files           209      214       +5     
  Lines          5013     5139     +126     
  Branches        403      407       +4     
============================================
+ Hits           3801     3923     +122     
+ Misses          763      762       -1     
- Partials        449      454       +5     
Impacted Files Coverage Δ
api/src/main/java/marquez/db/ColumnLineageDao.java 100.00% <ø> (ø)
api/src/main/java/marquez/db/DatasetFieldDao.java 100.00% <ø> (ø)
.../src/main/java/marquez/service/DelegatingDaos.java 0.00% <ø> (ø)
...main/java/marquez/service/models/LineageEvent.java 83.56% <75.00%> (-1.52%) ⬇️
...c/main/java/marquez/api/ColumnLineageResource.java 80.00% <80.00%> (ø)
...i/src/main/java/marquez/service/models/NodeId.java 64.48% <80.00%> (+2.38%) ⬆️
api/src/main/java/marquez/db/OpenLineageDao.java 95.53% <86.66%> (+0.24%) ⬆️
...arquez/db/mappers/ColumnLineageNodeDataMapper.java 90.00% <90.00%> (ø)
...ain/java/marquez/service/ColumnLineageService.java 97.14% <97.14%> (ø)
api/src/main/java/marquez/MarquezContext.java 85.71% <100.00%> (+0.78%) ⬆️
... and 7 more


pawel-big-lebowski force-pushed the column-lineage-graph-endpoint branch 3 times, most recently from d47f846 to f79a812, September 21, 2022 09:07
pawel-big-lebowski force-pushed the column-lineage-graph-endpoint branch 2 times, most recently from ad91189 to 5260505, September 28, 2022 09:19
boring-cyborg bot added the docs label Sep 28, 2022
pawel-big-lebowski changed the title from "add column lineage graph endpoint" to "Column lineage graph endpoint" Sep 28, 2022
@@ -88,4 +95,59 @@ void doUpsertColumnLineageRow(
},
value = "values")
List<ColumnLineageRow> rows);

@SqlQuery(
Collaborator Author


This is the most important piece of the PR: a recursive query that extracts the column-lineage graph.
Only the column_lineage table is joined to obtain the graph nodes.
Other tables are used only to enrich the nodes found.
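The two-phase approach described in this comment can be sketched in memory: first traverse only the lineage rows, then enrich the fields found. This is an illustrative analogue, not the PR's SQL. A level-by-level walk stands in for the recursive query. The row type and the type map standing in for the enrichment tables are hypothetical. The sketch also assumes an upstream walk (from an output field to its input fields) bounded by a depth limit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ColumnLineageTraversal {
  // One row of a hypothetical in-memory stand-in for the column_lineage table.
  record ColumnLineageRow(String inputField, String outputField) {}

  // Phase 1: walk only the lineage rows, one level per iteration (like a recursive
  // query), from an output field to the input fields it was derived from.
  static Set<String> upstreamFields(List<ColumnLineageRow> rows, String start, int maxDepth) {
    Map<String, List<String>> inputsOf = new HashMap<>();
    for (ColumnLineageRow row : rows) {
      inputsOf.computeIfAbsent(row.outputField(), k -> new ArrayList<>()).add(row.inputField());
    }
    Set<String> visited = new LinkedHashSet<>(List.of(start));
    List<String> frontier = List.of(start);
    for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
      List<String> next = new ArrayList<>();
      for (String field : frontier) {
        for (String input : inputsOf.getOrDefault(field, List.of())) {
          if (visited.add(input)) { // skip fields already seen, which also stops cycles
            next.add(input);
          }
        }
      }
      frontier = next;
    }
    return visited;
  }

  // Phase 2: enrich the fields found with metadata held elsewhere (here, a type map).
  static Map<String, String> enrich(Set<String> fields, Map<String, String> fieldTypes) {
    Map<String, String> enriched = new LinkedHashMap<>();
    for (String field : fields) {
      enriched.put(field, fieldTypes.getOrDefault(field, "UNKNOWN"));
    }
    return enriched;
  }
}
```

Tracking visited fields keeps the walk finite when lineage contains cycles. A recursive SQL query needs an equivalent guard, such as a depth limit or de-duplication of rows.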

Base automatically changed from add-column-level-lineage to main September 30, 2022 09:23
@@ -533,8 +533,8 @@ public static class ColumnLineageOutputColumn extends BaseJsonModel {
@ToString
public static class ColumnLineageInputField extends BaseJsonModel {

@NotNull private String datasetNamespace;
Collaborator Author


Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
pawel-big-lebowski merged commit 496566e into main Oct 7, 2022
pawel-big-lebowski deleted the column-lineage-graph-endpoint branch October 7, 2022 12:53
Development

Successfully merging this pull request may close these issues.

Column lineage graph endpoint
3 participants