Hazard with `ReadBytesExt` methods and continuing after errors #208

eric-seppanen · 2024-08-02T21:04:00Z

I don't think this is a bug in byteorder. But I thought it demonstrates an interesting hazard that might be worth documenting for others.

Some byteorder::ReadBytesExt methods rely on std::io::Read::read_exact, which has a warning about the error case:

If this function encounters an “end of file” before completely filling the buffer, it returns an error of the kind ErrorKind::UnexpectedEof. The contents of buf are unspecified in this case.

... If this function returns an error, it is unspecified how many bytes it has read, but it will never read more than would be necessary to completely fill the buffer.

This means that e.g. byteorder::read_u16 has the same hazard: if it encounters the end of the reader, the reader is now in an unspecified state.

So code like this is wrong:

pub fn parse_packet(packet: &[u8]) -> Vec<u16> {
    let mut result = Vec::new();
    let mut reader = Cursor::new(packet);

    // Read as many u16s as we can
    while let Ok(word) = reader.read_u16::<LittleEndian>() {
        result.push(word);
    }
    // If there was a byte left over, take that too.
    if let Ok(odd_end_val) = reader.read_u8() {
        result.push(odd_end_val.into())
    }
    result
}

I recently stumbled over code that did exactly this, because the standard library changed its (unspecified) behavior. In older versions, the function above behaved as expected (remainder bytes stayed in the reader after the error). In rust 1.80.0, the remainder bytes are consumed. The following unit test demonstrates (comment out the assert for your version):

#[test]
fn test_odd() {
    let input = [1, 2, 3];
    let output = parse_packet(&input);

    // Rust <= 1.79
    assert_eq!(output, [0x0201, 0x0003]);

    // Rust 1.80
    assert_eq!(output, [0x0201]);
}

The byteorder::ReadBytesExt methods are documented to track the read_exact error behavior:

Errors

This method returns the same errors as Read::read_exact.

But maybe more should be done? I'm not sure how often this pattern might exist in the wild. Maybe the docs should be extended to warn people that the input stream can't be relied on after an error is returned?

The text was updated successfully, but these errors were encountered:

BurntSushi · 2024-08-02T21:27:41Z

Thanks for the write-up.

I'm fine adding a note to the docs. Patches welcome.

I'm not sure byteorder itself should be trying anything else here. I'm not sure there's anything else to be done. And in general, I consider it bad juju to keep doing reads while ignoring errors anyway.

eric-seppanen · 2024-08-02T21:37:52Z

I agree for io::Read impls in general.

I also have sympathy for anyone that missed this detail, because a Cursor over a slice looks infallible but still has this "unspecified state" constraint. A Cursor along with byteorder::ReadBytesExt is an temptingly easy way to slap together a decoder of a simple binary protocol.

I think that perhaps rust-lang/rust#125123 is the change that modified the example's behavior under 1.80.0. The change occurs between nightly-2024-05-21 and nightly-2024-05-22.

std::io::Read::read_exact documentation says that if an error occurs, the state of the reader is unspecified: it may have consumed some number of bytes (between zero and the size of the buffer). This makes it unwise to continue reading after an error, since it's not possible to know where the read begins. A caller may be surprised by this issue, for example by calling read_u16() until it fails, then calling read_u8() to collect a remainder byte. This is not guaranteed to work. This was specifically observed to behave one way on rust 1.79.0, and then a different way in rust 1.80.0 when using std::io::Cursor as the reader. Closes BurntSushi#208 (Documents the problem, which is the best we can do.)

workingjubilee · 2024-08-03T01:34:07Z

If the new behavior of Cursor's implementation seems more inconvenient for actual usage, I am happy to review another PR to adjust such implementation details. Unspecified is unspecified, after all. But if we do so then we should at least internally document our apparent preference.

( It also seems likely to me that Read has been implemented "both ways" by various types in the ecosystem, so this definitely can't be expected to cut in a particular direction for Read in general. )

eric-seppanen · 2024-08-03T02:18:16Z

@workingjubilee I don't think Cursor should change.

I ran my test against a std::io::File and the new Cursor behavior matches the File behavior: trying to read consumes some bytes. So it was the old Cursor behavior that was a bit weird, and relying on that behavior was wrong.

workingjubilee · 2024-08-03T02:23:49Z

Thanks for the additional context! That's useful to know.

eric-seppanen mentioned this issue Aug 2, 2024

ReadBytesExt: document the danger of reading after an error #209

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hazard with `ReadBytesExt` methods and continuing after errors #208

Hazard with `ReadBytesExt` methods and continuing after errors #208

eric-seppanen commented Aug 2, 2024 •

edited

Loading

Errors

BurntSushi commented Aug 2, 2024

eric-seppanen commented Aug 2, 2024 •

edited

Loading

workingjubilee commented Aug 3, 2024 •

edited

Loading

eric-seppanen commented Aug 3, 2024

workingjubilee commented Aug 3, 2024

Hazard with ReadBytesExt methods and continuing after errors #208

Hazard with ReadBytesExt methods and continuing after errors #208

Comments

eric-seppanen commented Aug 2, 2024 • edited Loading

Errors

BurntSushi commented Aug 2, 2024

eric-seppanen commented Aug 2, 2024 • edited Loading

workingjubilee commented Aug 3, 2024 • edited Loading

eric-seppanen commented Aug 3, 2024

workingjubilee commented Aug 3, 2024

Hazard with `ReadBytesExt` methods and continuing after errors #208

Hazard with `ReadBytesExt` methods and continuing after errors #208

eric-seppanen commented Aug 2, 2024 •

edited

Loading

eric-seppanen commented Aug 2, 2024 •

edited

Loading

workingjubilee commented Aug 3, 2024 •

edited

Loading