-
Notifications
You must be signed in to change notification settings - Fork 796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix 'ColumnPath not found' error reading Parquet files with nested REPEATED fields #5102
Fix 'ColumnPath not found' error reading Parquet files with nested REPEATED fields #5102
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the slow review, this looks good. Left a minor improvement suggestion.
FYI @sunchao as you're likely more familiar with this code than I am
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
late LGTM too, thanks @tustvold for the ping.
@@ -274,12 +274,12 @@ impl TreeBuilder { | |||
row_group_reader, | |||
)?; | |||
|
|||
Reader::RepeatedReader( | |||
return Ok(Reader::RepeatedReader( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: instead of returning early here we could also just remove the path.pop()
above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will that not change the path passed to reader_tree above? Does that not matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried removing path.pop()
early on and the recursive call to reader_tree()
started failing because it did not find the fields it expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, nvm then 😂
Which issue does this PR close?
Closes #5064.
Rationale for this change
reader_tree()
maintains a stack of field names as it recurses through nested schemas. When it encounters aREPEATED
field, it performs two pops for only one push and ends up being off by one. The change ensures the counts of pushes and pops match.Are there any user-facing changes?
No