One piece of documentation says csv_index::RandomAccessSimple stores indices to byte offsets corresponding to the start of records, while another piece of documentation says it stores /approximate/ offsets. Which is it? If approximate, how would an approximate index be used to locate the actual start of a record?
I can't remember, sadly. I think what that's referring to is that there may be cases where the byte offset is before what a human might consider to be the start of a CSV record when reading the CSV data, but that the byte offset is still correct assuming you use the csv crate (or its underlying csv-core implementation) to read the record for that position. This might sound weird, and that's because it is. For example, csv-core ignores empty lines, so if you have an empty line followed by this record:
foo,bar,baz
Then there are technically 2 valid byte offsets for the start of the foo,bar,baz record: 0 or 1 (assuming \n record delimiters). I think the language in the docs is just being a bit sneaky about not guaranteeing one or the other. It should (but doesn't) mention the fundamental invariant though: if you seek to that byte offset in the data and start the csv reader at that point, then you'll get the corresponding ith record.
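To make that invariant concrete, here is a minimal sketch using only the standard library. It is a toy stand-in, not the csv or csv-index API: read_record_at and build_index are hypothetical helpers, and the sketch assumes \n record delimiters and no quoting. It only illustrates why more than one offset can be "correct" for the same record when the reader skips empty lines.

```rust
/// Toy stand-in for a CSV reader (NOT the csv crate's API): starting at
/// byte `offset`, skip any empty lines, then return the fields of the
/// next record. Assumes `\n` record delimiters and no quoted fields.
fn read_record_at(data: &str, offset: usize) -> Option<Vec<&str>> {
    data.get(offset..)?
        .split('\n')
        .find(|line| !line.is_empty())
        .map(|line| line.split(',').collect())
}

/// Hypothetical index builder: for each non-empty line, store the byte
/// offset of its first byte. An index that stored the offset of a
/// preceding empty line instead would satisfy the same invariant.
fn build_index(data: &str) -> Vec<usize> {
    let mut index = Vec::new();
    let mut offset = 0;
    for line in data.split('\n') {
        if !line.is_empty() {
            index.push(offset);
        }
        offset += line.len() + 1; // +1 for the `\n` delimiter
    }
    index
}
```

With the data "\nfoo,bar,baz\n", calling read_record_at at offset 0 or at offset 1 returns the same fields, so either offset would be an acceptable index entry for record 0 under the invariant described above.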
exact: https://github.com/BurntSushi/rust-csv/blob/master/csv-index/src/lib.rs#L19
approximate: https://github.com/BurntSushi/rust-csv/blob/master/csv-index/src/simple.rs#L14