Update OpenLineageDao to handle airflow run uuid conflicts #2097

Merged
collado-mike merged 2 commits into main from fix/airflow_run_uuid_conflicts
Sep 1, 2022

Conversation

collado-mike
Collaborator

Problem

As described in OpenLineage/OpenLineage#1056, the OpenLineage Airflow integration has been generating conflicting UUIDs based on the DAG name and the DagRun id without accounting for different namespaces. In Marquez installations that have multiple Airflow deployments with duplicated DAG names, we generate jobs whose parents have the wrong namespace.

Solution

The root-cause fix belongs in the OpenLineage repo, but this change alleviates the problem for Airflow installations that continue to publish events with the older OpenLineage library. It checks the namespace of the existing parent run and verifies that it matches the namespace in the ParentRunFacet. If they differ, it generates a new parent run id, which is written with the correct namespace. A new test verifies this behavior.
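The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `ParentRunResolver`, the method `resolveParentRunId`, and its parameters are hypothetical names, and deriving the replacement id as a name-based UUID from the namespace, job name, and original run id is an assumption made so that repeated events resolve to the same parent.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.UUID;

// Hypothetical helper, for illustration only; not part of OpenLineageDao.
final class ParentRunResolver {
  private ParentRunResolver() {}

  /**
   * Returns the parent run id to use for an incoming event.
   *
   * @param facetRunId      run id taken from the event's ParentRunFacet
   * @param facetNamespace  job namespace taken from the ParentRunFacet
   * @param facetJobName    job name taken from the ParentRunFacet
   * @param storedNamespace namespace of the run already stored under
   *                        facetRunId, or empty if no such run exists yet
   */
  static UUID resolveParentRunId(
      UUID facetRunId,
      String facetNamespace,
      String facetJobName,
      Optional<String> storedNamespace) {
    // No existing run, or the existing run is in the expected namespace:
    // the id from the facet is safe to use as-is.
    if (!storedNamespace.isPresent() || storedNamespace.get().equals(facetNamespace)) {
      return facetRunId;
    }
    // The id is already taken by a run in a different namespace, so derive a
    // replacement. Assumption: a deterministic name-based UUID, so every event
    // for the same DAG run maps to the same new parent id.
    String seed = facetNamespace + "." + facetJobName + "." + facetRunId;
    return UUID.nameUUIDFromBytes(seed.getBytes(StandardCharsets.UTF_8));
  }
}
```

With this shape, two Airflow deployments that both publish a DAG with the same name and the same conflicting run id would end up with distinct parent runs, each recorded under its own namespace, while events from a single deployment keep their original parent run id.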

Note: All database schema changes require discussion. Please link the issue for context.

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've updated the CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

Signed-off-by: Michael Collado <collado.mike@gmail.com>
… have parents

Signed-off-by: Michael Collado <collado.mike@gmail.com>
@boring-cyborg boring-cyborg bot added the api API layer changes label Sep 1, 2022
@collado-mike collado-mike requested review from fm100 and removed request for wslulciuc September 1, 2022 17:03
@collado-mike collado-mike merged commit 27b54ed into main Sep 1, 2022
@collado-mike collado-mike deleted the fix/airflow_run_uuid_conflicts branch September 1, 2022 22:33