Does csv_index::RandomAccessSimple store approximate or exact byte offsets? #375

Open
gavinwahl opened this issue Sep 5, 2024 · 1 comment
@gavinwahl

One piece of documentation says csv_index::RandomAccessSimple stores indices to byte offsets corresponding to the start of records, while another piece of documentation says it stores /approximate/ offsets. Which is it? If approximate, how would an approximate index be used to locate the actual start of a record?

exact: https://github.com/BurntSushi/rust-csv/blob/master/csv-index/src/lib.rs#L19
approximate: https://github.com/BurntSushi/rust-csv/blob/master/csv-index/src/simple.rs#L14

@BurntSushi
Owner

BurntSushi commented Sep 6, 2024

I can't remember, sadly. I think what that's referring to is that there may be cases where the byte offset is before what a human might consider to be the start of a CSV record when reading the CSV data, but that the byte offset is still correct assuming you use the csv crate (or its underlying csv-core implementation) to read the record from that position. This might sound weird, and that's because it is. For example, csv-core ignores empty lines, so if you have the following data (note the leading empty line):


foo,bar,baz

Then there are technically 2 valid byte offsets for the start of the foo,bar,baz record: 0 or 1 (assuming \n record delimiters). I think the language in the docs is just being a bit sneaky about not guaranteeing one or the other. It should (but doesn't) mention the fundamental invariant though: if you seek to that byte offset in the data and start the csv reader at that point, then you'll get the corresponding ith record.
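To make the invariant concrete, here is a minimal std-only sketch. It does not use the csv or csv-index crates: `read_record_at` is a hypothetical toy reader (no quoting support) that, like the behavior described above, skips empty lines before returning a record. Under that assumption, seeking to offset 0 or offset 1 yields the same record.

```rust
use std::io::{BufRead, Cursor, Seek, SeekFrom};

/// Toy stand-in for a CSV reader (NOT the csv crate): seeks to `offset`,
/// skips any empty lines, and returns the next line split on commas.
fn read_record_at(data: &[u8], offset: u64) -> Option<Vec<String>> {
    let mut cur = Cursor::new(data);
    cur.seek(SeekFrom::Start(offset)).ok()?;
    let mut line = String::new();
    loop {
        line.clear();
        if cur.read_line(&mut line).ok()? == 0 {
            return None; // reached the end of the data without a record
        }
        let trimmed = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
        if !trimmed.is_empty() {
            return Some(trimmed.split(',').map(String::from).collect());
        }
        // Empty line: ignore it and keep reading.
    }
}

fn main() {
    // An empty first line followed by one record, with \n delimiters.
    let data = b"\nfoo,bar,baz\n";
    for offset in [0u64, 1] {
        println!("offset {}: {:?}", offset, read_record_at(data, offset));
    }
}
```

Both offsets are acceptable index entries for this record in the toy model, which is one way to read "approximate": the stored offset need not be the first byte of the record's text, only a position from which the reader produces that record.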

@BurntSushi BurntSushi added the doc label Sep 6, 2024