fix: only read the data file's fields from the page table and not the whole dataset's fields #2095

westonpace · 2024-03-20T13:11:45Z

During the refactor that split things into lance-file and lance-table, I introduced a bug. There were a number of code paths for opening a file reader, and I consolidated them into one. However, when reading a single data file from a fragment made up of several data files, we used the whole dataset's fields instead of the data file's fields, so we tried to load more fields from the page table than the page table actually contains.

Most of the time, this was harmless. The read would run past the end of the page table and grab part of the schema (effectively garbage data). We never actually accessed this data, so we got away with it. However, if the fragment schema was considerably larger than the data file schema, the read would go past the end of the file and cause an error.
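To make the failure mode concrete, here is a minimal sketch of the arithmetic involved. It is not the lance-file reader code: the function names, the 16-byte entry size, and the file layout in the usage note are assumptions chosen for illustration.

```rust
// Illustrative sketch only. The names and the layout are assumptions,
// not the actual lance-file API or on-disk format.

/// Assumed size of one page table entry: an (offset, length) pair of i64 values.
const ENTRY_BYTES: u64 = 16;

/// Byte range `[start, end)` of a page table that holds one entry
/// per (field, batch) pair.
fn page_table_range(start: u64, num_fields: u64, num_batches: u64) -> (u64, u64) {
    (start, start + num_fields * num_batches * ENTRY_BYTES)
}

/// Simulates the read: it succeeds as long as the requested range stays
/// inside the file, even if it overruns the real page table.
fn check_read(
    file_len: u64,
    start: u64,
    num_fields: u64,
    num_batches: u64,
) -> Result<(u64, u64), String> {
    let (begin, end) = page_table_range(start, num_fields, num_batches);
    if end > file_len {
        Err(format!(
            "read of {}..{} goes past the end of the file ({} bytes)",
            begin, end, file_len
        ))
    } else {
        Ok((begin, end))
    }
}
```

Take a hypothetical data file with 2 fields and 4 batches, whose page table starts at byte 1000 and is followed by 200 bytes of trailing metadata (file length 1328). `check_read(1328, 1000, 2, 4)` covers exactly the page table. `check_read(1328, 1000, 3, 4)` still succeeds but silently spills into the trailing bytes, which is the "got away with it" case. `check_read(1328, 1000, 10, 4)` runs past the end of the file and returns an error, which is the case this PR fixes by sizing the read from the data file's own fields.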

codecov-commenter · 2024-03-20T13:40:59Z

Codecov Report

Attention: Patch coverage is 88.78505%, with 12 lines in your changes missing coverage. Please review.

Project coverage is 80.68%. Comparing base (faed71b) to head (bbd219b).

Files	Patch %	Lines
rust/lance-file/src/reader.rs	90.32%	1 Missing and 8 partials ⚠️
rust/lance/src/index/vector/hnsw.rs	0.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2095      +/-   ##
==========================================
- Coverage   80.69%   80.68%   -0.01%     
==========================================
  Files         160      160              
  Lines       47022    47112      +90     
  Branches    47022    47112      +90     
==========================================
+ Hits        37946    38014      +68     
- Misses       6922     6928       +6     
- Partials     2154     2170      +16

Flag	Coverage Δ
unittests	`80.68% <88.78%> (-0.01%)`	⬇️

Fix a bug where we would attempt to read past the end of the page table

bbd219b

westonpace requested review from BubbleCal and eddyxu March 20, 2024 13:41

BubbleCal approved these changes Mar 20, 2024

wjones127 approved these changes Mar 20, 2024

westonpace merged commit 69c1a9d into lancedb:main Mar 20, 2024
17 checks passed
