fix: only read the data file's fields from the page table and not the whole dataset's fields #2095
During the refactor that split things into lance-file and lance-table, I introduced a bug. There were several code paths for opening a file reader, and I consolidated them into one. However, when reading one data file out of several, we ended up trying to load too many fields from the page table (more fields than exist in that file's page table).
Most of the time, this was OK. The read would run past the end of the page table and pick up part of the schema (effectively garbage data). We never actually accessed this data, so we got away with it. However, if the fragment schema was considerably larger than the data file schema, the read would go past the end of the file and cause an error.
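To make the failure mode concrete, here is a minimal toy sketch of it. It is not the actual Lance reader or file format: the layout (a page table of little-endian `(offset, length)` pairs followed directly by schema bytes) and the names `ENTRY`, `build_toy_file`, and `read_page_table` are assumptions made up for illustration.

```python
import struct

# Hypothetical layout: one (offset, length) pair of little-endian i64 per page.
ENTRY = struct.Struct("<qq")


def build_toy_file(num_fields: int, num_batches: int, trailer: bytes) -> bytes:
    """Return a page table for num_fields x num_batches pages, then trailer bytes."""
    table = b"".join(
        ENTRY.pack(100 * i, 100) for i in range(num_fields * num_batches)
    )
    return table + trailer


def read_page_table(buf: bytes, num_fields: int, num_batches: int):
    """Read num_fields * num_batches entries from the start of buf."""
    count = num_fields * num_batches
    needed = count * ENTRY.size
    if needed > len(buf):
        # Stands in for the I/O error raised when a read runs past end of file.
        raise ValueError(f"read of {needed} bytes exceeds file size {len(buf)}")
    return [ENTRY.unpack_from(buf, i * ENTRY.size) for i in range(count)]


# The data file stores only 2 fields; the 20-byte trailer stands in for the schema.
toy = build_toy_file(num_fields=2, num_batches=1, trailer=b"schema-bytes-here!!!")

# Correct: ask only for the fields this data file actually holds.
correct = read_page_table(toy, num_fields=2, num_batches=1)

# Buggy but silent: one extra field fits inside the trailer, so the read
# "succeeds" and the last entry is garbage decoded from schema bytes.
overread = read_page_table(toy, num_fields=3, num_batches=1)

# Buggy and loud: a much larger schema pushes the read past the end of the file.
try:
    read_page_table(toy, num_fields=10, num_batches=1)
    failed = False
except ValueError:
    failed = True
```

In this sketch the fix corresponds to passing the data file's own field count to `read_page_table` rather than the field count of the wider schema.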