Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-12883] Add coder for ReadableFileCoder that supports MetadataCoderV2 #15510

Merged
merged 8 commits into from
Oct 6, 2021

Conversation

brachi-wernick
Copy link
Contributor

@brachi-wernick brachi-wernick commented Sep 14, 2021

Currently, there is a separated coder for Metadata which cares about Metadata#lastModifiedMillis().
The issue is that the ReadableFileCoder always uses the default Metadata coder.
We need to create a similar coder for ReadableFileCoder: ReadableFileCoderV2 which just use the corresponding MetadataCoderV2.

Without doing so lastModifiedMillis will remain zero.

There is also no way to create custom coder in our side since FileIO.ReadableFile has no public method to initiate new instance of it.

Also check this thread in StackOverflow

brachipa added 4 commits September 14, 2021 23:12
… add-coder-readable-file-v2

# Conflicts:
#	sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/ReadableFileCoderV2.java
@pabloem
Copy link
Member

pabloem commented Sep 16, 2021

Maybe it makes better sense to ensure the MetadataCoder will actually encode+decode the metadata's lastModifiedMillis field? Why did you choose to not make that change instead? I think that would work best.

cc: @reuvenlax

@pabloem pabloem self-requested a review September 16, 2021 17:49
@brachi-wernick
Copy link
Contributor Author

brachi-wernick commented Sep 17, 2021

Maybe it makes better sense to ensure the MetadataCoder will actually encode+decode the metadata's lastModifiedMillis field? Why did you choose to not make that change instead? I think that would work best.

cc: @reuvenlax

@pabloem why initially this separation was needed between the coders? MetadataCoder and MetadataCoderV2, I just keep this style in the ReadableFileCoder.

@pabloem
Copy link
Member

pabloem commented Sep 23, 2021

good observation - here's the PR for the first V2 coder (#6914)

@robertwb do you have thoughts on adding a new coder version vs the need to preserve compatibility here?

@robertwb
Copy link
Contributor

I think the only way to preserver backwards compatibility here is to add a new coder and also add an opt-in mechanism for users to select this new coder. It may be worth considering making this coder more extensible so this doesn't happen in the future (e.g. storing the values as a key-value map and ignoring unknown fields, which is less efficient but probably doesn't matter for this coder vs. the chances that we'll want to add a new field in the future.)

@brachi-wernick
Copy link
Contributor Author

Creating more extendable coder is definitely better, I would suggest the bellow:

  1. Add MetadataCoder the ability to get list of fields names needed to be encoded/decode.
  2. ReadableFileCoder will get MetadataCoder instance in its constructor parameters. to make the configuration and adjustments only in one place(MetadataCoder)

We can make this also in 2 phases-PRs, (start with number 2 just to solve the current issue and then enhance MetadataCoder for next future fields)

@pabloem
Copy link
Member

pabloem commented Oct 4, 2021

alright taking a look...

import org.apache.beam.sdk.coders.VarIntCoder;
import java.util.Collections;
import java.util.List;
import org.apache.beam.sdk.coders.*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this asterisk-based import is causing the PreCommit to fail (we try to avoid this sort of import) - can you import only the classes you need?

@pabloem
Copy link
Member

pabloem commented Oct 6, 2021

LGTM. @brachi-wernick are you able to follow up with change #1?

@pabloem pabloem merged commit 2028e6c into apache:master Oct 6, 2021
@brachipa
Copy link

brachipa commented Oct 7, 2021

are you able to follow up with change #1?

I will work on it and open a diff PR for it and link it to that PR thanks!

@brachipa
Copy link

LGTM. @brachi-wernick are you able to follow up with change #1?

In case you missed it, I submitted a PR #15699 for change 1.

dmitriikuzinepam pushed a commit to dmitriikuzinepam/beam that referenced this pull request Nov 2, 2021
…bleFileCoder that supports MetadataCoderV2

* add coder readable files

* add coder readable files

* :sdks:java:core:spotlessApply

* [BEAM-12883] add ability tp set custom MetadataCoder for ReadableFileCoder

* [BEAM-12883] add ability tp set custom MetadataCoder for ReadableFileCoder

* [BEAM-12883] add ability t0 set custom MetadataCoder using StructuredCoder

* [BEAM-12883] remove asterisk-based import

Co-authored-by: brachipa <brachipa@moonactive.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants