Spec: Inconsistency around files_count #5338
@Fokko, sorry, but I think I was wrong in my initial review. I think instead of renaming these, we actually want to rename the fields that we write so that they match the spec.
Because we use the same column for both
Looks like the problem here is that the original
This is what I meant by this comment:
I'm not sure why I thought the constant renames looked good though, since that would rename the constants to match the out-of-date manifest field names, which we should update instead.
Thanks for the review @rdblue 🙌
When building the Manifest mappers for Python, @rdblue noticed that `added_data_files_count` should be `added_files_count` according to the spec.

However, the Java implementation writes this field to Avro as `added_data_files_count`:

iceberg/api/src/main/java/org/apache/iceberg/ManifestFile.java
Lines 44 to 49 in 8104769
Luckily this doesn't affect the reading/writing because it is position based.
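
To illustrate why position-based reading hides the mismatch, here is a minimal Python sketch. It is not Iceberg's actual reader: `read_by_position` and `read_by_name` are hypothetical helpers, the field lists are trimmed to the three counters, and the spec names for the existing/deleted counters are assumed by analogy with `added_files_count`.

```python
# Simplified, hypothetical field lists (assumption: only the three counters).
SPEC_FIELDS = ["added_files_count", "existing_files_count", "deleted_files_count"]
JAVA_FIELDS = [
    "added_data_files_count",
    "existing_data_files_count",
    "deleted_data_files_count",
]


def read_by_position(row, field_names):
    """Pair each value with the reader's field name at the same index."""
    if len(row) != len(field_names):
        raise ValueError("row and schema must have the same number of fields")
    return dict(zip(field_names, row))


def read_by_name(record, field_names):
    """Look up each reader field by name; unmatched names come back as None."""
    return {name: record.get(name) for name in field_names}


row = (3, 5, 1)  # values in write order

# What a writer using the Java names would produce.
written = read_by_position(row, JAVA_FIELDS)

# A position-based reader using the spec names still gets every value.
by_position = read_by_position(row, SPEC_FIELDS)

# A name-based reader using the spec names would find nothing.
by_name = read_by_name(written, SPEC_FIELDS)
```

In this sketch `by_position` carries the same values as `written` under different keys, while `by_name` loses all three, which is why the naming difference is only confusing rather than breaking as long as fields are matched by position.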
However, it is confusing, and I think we should resolve it. We could do this either by changing the Java implementation, which probably works, or by changing the spec. I know that this isn't a lightweight change, but we could take it into consideration.
I think we should also update the references in the code to `{added,existing,deleted}_data_files_count` to make everything consistent and avoid confusion in the future.

Closes #8684