Fix bug that caused a single run event to create multiple jobs #2162

Merged
merged 1 commit into from
Oct 5, 2022

collado-mike
Collaborator

Signed-off-by: Michael Collado collado.mike@gmail.com

Problem

As described in #2158, a single run may send multiple run events, where the start event contains a parent run and subsequent events do not. In that case, each subsequent event creates a new job without a parent, causing duplicate jobs with the same name.
Closes: #2158

Solution

During OpenLineage event handling, check whether a run with the given ID already exists. If it does, do not create a new job (or parent); use the job that is already associated with the run.
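
To illustrate the idea, here is a minimal, self-contained sketch of the lookup-before-create pattern. It is not the actual `OpenLineageDao` change: the `RunJobResolver` class, the `Job` type, and the in-memory map (standing in for the database lookup by run id) are all hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical illustration only; names and types are assumptions,
// not the real Marquez classes.
final class RunJobResolver {

  /** Simplified job record: a name plus an optional parent run id. */
  static final class Job {
    final UUID uuid;
    final String name;
    final Optional<UUID> parentRunId;

    Job(UUID uuid, String name, Optional<UUID> parentRunId) {
      this.uuid = uuid;
      this.name = name;
      this.parentRunId = parentRunId;
    }
  }

  // Stand-in for the runs table: maps a run id to the job it belongs to.
  private final Map<UUID, Job> jobByRunId = new HashMap<>();
  private int jobsCreated = 0;

  /**
   * Returns the job for an incoming run event. If the run id has been seen
   * before, the already-associated job is reused, even when this event
   * carries no parent run. A job is created only for an unseen run id.
   */
  Job resolveJob(UUID runId, String jobName, Optional<UUID> parentRunId) {
    Job existing = jobByRunId.get(runId);
    if (existing != null) {
      return existing; // do not create a second job (or parent)
    }
    Job created = new Job(UUID.randomUUID(), jobName, parentRunId);
    jobByRunId.put(runId, created);
    jobsCreated++;
    return created;
  }

  int jobsCreated() {
    return jobsCreated;
  }
}
```

With this shape, a start event that carries a parent run followed by a complete event without one resolves to the same job, so only one job is created for the run.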

Note: All database schema changes require discussion. Please link the issue for context.

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've updated the CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

Signed-off-by: Michael Collado <collado.mike@gmail.com>
@boring-cyborg boring-cyborg bot added the api API layer changes label Oct 5, 2022

codecov bot commented Oct 5, 2022

Codecov Report

Merging #2162 (05573b8) into main (67e9249) will increase coverage by 0.03%.
The diff coverage is 96.00%.

@@             Coverage Diff              @@
##               main    #2162      +/-   ##
============================================
+ Coverage     75.78%   75.82%   +0.03%     
- Complexity     1061     1063       +2     
============================================
  Files           209      209              
  Lines          5006     5013       +7     
  Branches        403      403              
============================================
+ Hits           3794     3801       +7     
  Misses          763      763              
  Partials        449      449              
Impacted Files Coverage Δ
api/src/main/java/marquez/db/OpenLineageDao.java 95.29% <96.00%> (+0.07%) ⬆️

Contributor

@mobuchowski mobuchowski left a comment

LGTM 👍

@collado-mike collado-mike merged commit b9abb19 into main Oct 5, 2022
@collado-mike collado-mike deleted the fix/align_parents_for_lineage_events branch October 5, 2022 16:20
Successfully merging this pull request may close these issues.

Lineage data generated from PythonOperator results in 500 Error