Support inputFacets and outputFacets #2417
Codecov Report
@@ Coverage Diff @@
## main #2417 +/- ##
============================================
+ Coverage 83.60% 83.85% +0.24%
- Complexity 1213 1234 +21
============================================
Files 231 233 +2
Lines 5520 5629 +109
Branches 266 269 +3
============================================
+ Hits 4615 4720 +105
- Misses 762 766 +4
Partials 143 143
... and 2 files with indirect coverage changes
- @Nullable final Map<String, Object> facets) {
+ @Nullable final Map<String, Object> facets,
+ @Nullable final List<RunDatasetFacets> inputFacets,
+ @Nullable final List<RunDatasetFacets> outputFacets) {
The class Run defines inputVersions and outputVersions as lists of DatasetVersionId entries. Given that a DatasetVersionId is needed for RunDatasetFacets (along with the in/out facets), can we define a class DatasetVersionIdAndFacets as:
class DatasetVersionIdAndFacets {
final DatasetVersionId datasetVersionId;
final ImmutableMap<String, Object> facets;
}
Then, update Run to use DatasetVersionIdAndFacets for inputVersions and outputVersions?
public class Run {
.
final List<DatasetVersionIdAndFacets> inputVersions;
List<DatasetVersionIdAndFacets> outputVersions;
.
.
It just seems RunDatasetFacets is really a DatasetVersionId with facets?
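A minimal sketch of the shape proposed above. The DatasetVersionId here is a simplified stand-in (its namespace, name, and version fields are assumptions, not taken from this PR), Run is reduced to the two fields under discussion, and java.util collections replace Guava's ImmutableMap so the snippet is self-contained:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for Marquez's DatasetVersionId; the fields are assumed.
final class DatasetVersionId {
  final String namespace;
  final String name;
  final UUID version;

  DatasetVersionId(String namespace, String name, UUID version) {
    this.namespace = namespace;
    this.name = name;
    this.version = version;
  }
}

// The wrapper proposed in the review: a dataset version id plus its facets.
final class DatasetVersionIdAndFacets {
  final DatasetVersionId datasetVersionId;
  final Map<String, Object> facets;

  DatasetVersionIdAndFacets(DatasetVersionId datasetVersionId, Map<String, Object> facets) {
    this.datasetVersionId = datasetVersionId;
    // Defensive copy so later changes to the caller's map do not leak in.
    this.facets = Collections.unmodifiableMap(new HashMap<>(facets));
  }
}

// Reduced to the two fields under discussion; the real Run has more.
final class Run {
  final List<DatasetVersionIdAndFacets> inputVersions;
  final List<DatasetVersionIdAndFacets> outputVersions;

  Run(
      List<DatasetVersionIdAndFacets> inputVersions,
      List<DatasetVersionIdAndFacets> outputVersions) {
    this.inputVersions = Collections.unmodifiableList(new ArrayList<>(inputVersions));
    this.outputVersions = Collections.unmodifiableList(new ArrayList<>(outputVersions));
  }
}
```

With this shape, each entry in inputVersions or outputVersions carries its own facets, so a separate list of facets keyed by dataset version is no longer needed.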
Mind also updating the spec/openapi.yml with the additional properties?
@wslulciuc I agree that merging these two entries is fine, but I would like to avoid And in the class name. Could we stay with RunDatasetFacets, which contains both dataset version ids and facets? I've modified the code to return Run in that form.
@wslulciuc can we proceed with the PR?
Yes! Sorry, I can't hide it. I've been heads down focusing on other things and this PR totally slipped off my radar. Ok, so final thoughts here: we'll want to avoid being overly specific and using facets in the name RunDatasetFacets. We'll most likely add additional properties other than just datasetVersionId and facets, so let's just go with either:
DatasetVersion
Or, we can be more specific (as in/out dataset versions will most likely contain different properties at some point):
InputDatasetVersion
OutDatasetVersion
Thoughts?
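A rough sketch of the more specific option, using the two names suggested in the comment. The fields are limited to the two properties mentioned there, and the dataset version id is simplified to a String for illustration; both choices are assumptions rather than the merged code:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical input-side type: a dataset version id plus its input facets.
final class InputDatasetVersion {
  final String datasetVersionId; // simplified; the real id type is richer
  final Map<String, Object> facets;

  InputDatasetVersion(String datasetVersionId, Map<String, Object> facets) {
    this.datasetVersionId = datasetVersionId;
    this.facets = Collections.unmodifiableMap(new HashMap<>(facets));
  }
}

// Hypothetical output-side type, named as in the comment above.
final class OutDatasetVersion {
  final String datasetVersionId; // simplified; the real id type is richer
  final Map<String, Object> facets;

  OutDatasetVersion(String datasetVersionId, Map<String, Object> facets) {
    this.datasetVersionId = datasetVersionId;
    this.facets = Collections.unmodifiableMap(new HashMap<>(facets));
  }
}
```

The two types start out identical. Keeping them separate means either side can later gain properties of its own without affecting the other, which is the motivation given in the comment.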
@pawel-big-lebowski, approving so as not to block the PR, based on our offline discussion (i.e. you're cool with my suggested naming). Great work, and merge when ready 👍
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
Signed-off-by: Xavier-Cliquennois <xavier.cliquennois@wearegraphite.io>
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
Problem
Marquez does not support inputFacets and outputFacets sent in OpenLineage events. They are neither saved nor exposed in the API.

Closes: #2320

Solution
We do have a dataset_facets table designed for this, but we don't store inputFacets and outputFacets there.

Although inputFacets and outputFacets are contained within an OpenLineage event as dataset properties, we want to expose them in the API as part of a run because they describe specific runs rather than datasets or dataset versions.

Checklist
CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary).
sql database schema migration according to Flyway's naming convention (if relevant)
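
To illustrate the Solution section above, here is a small sketch of lifting per-dataset facets out of an event so they can be reported at the run level. It models the event's inputs or outputs as plain maps with namespace, name, and inputFacets or outputFacets keys. The class RunFacetExtractor and its collect method are hypothetical names for illustration only, and this is not how Marquez persists the facets (the PR stores them in the dataset_facets table):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Marquez.
final class RunFacetExtractor {
  // Collects the facet map stored under facetKey ("inputFacets" or
  // "outputFacets") for every dataset that has one, keyed by "namespace/name".
  static Map<String, Map<String, Object>> collect(
      List<Map<String, Object>> datasets, String facetKey) {
    Map<String, Map<String, Object>> byDataset = new LinkedHashMap<>();
    for (Map<String, Object> dataset : datasets) {
      Object facets = dataset.get(facetKey);
      if (facets instanceof Map) {
        @SuppressWarnings("unchecked")
        Map<String, Object> typed = (Map<String, Object>) facets;
        String key = dataset.get("namespace") + "/" + dataset.get("name");
        byDataset.put(key, typed);
      }
    }
    return byDataset;
  }
}
```

Calling collect(inputs, "inputFacets") and collect(outputs, "outputFacets") for a single event would give the two groups of facets that the PR attaches to the run. Datasets without the requested key are skipped.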