Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential Unsound: 1 out-of-bound read and 5 unaligned memory access. #52

Closed
YoshikiTakashima opened this issue Aug 13, 2020 · 8 comments

Comments

@YoshikiTakashima
Copy link
Contributor

Hello.

I'm Yoshiki, a PhD student at CMU.

We are testing a tool to automatically generate test cases from API data and existing tests.

A few of our generated test cases were reported as "unsound" by Miri, mostly due to unaligned or out-of-bound memory. I've attached a Tarball that contains the test cases that induce this behavior.

Please note that, because the framework leverages existing tests as templates, some of the test cases overlap with existing test cases for the library. In particular,

decode(BIG5, b"", &"");//LAYER:0

also shows up in the manually written test cases.

In case this is intended behavior, or you would prefer if I focused on other parts of the code, please let me know.

Thanks.
~Yoshiki

@hsivonen
Copy link
Owner

Thank you!

My results so far:

  • I can't reproduce the out_of_bound case locally in Miri.
  • I have stepped through cases 118, 140, and 596 in x86 debug executions, and the computations were OK.

Based on reading the error messages and based on my previous discussion with @RalfJung, I believe these to be instances of the Miri issue mentioned in the error messages.

Due to predating align_to in the standard library, encoding_rs does its own alignment math. As I understand it, the Miri errors are based on pointers carrying alignment metadata in the Miri execution as opposed to Miri actually checking whether pointer math has been done correctly when casting from a pointer with smaller alignment to a pointer with a larger alignment.

Also, as I understand it, migrating to align_to would result in the interesting code paths getting tested but would result in Miri taking the least interesting code path.

Does this analysis look correct to you?

@RalfJung
Copy link
Contributor

@hsivonen what you say makes sense for unaligned accesses, but should not lead to out-of-bounds errors.

I should really prioritize fixing that alignment problem...

@YoshikiTakashima
Copy link
Contributor Author

Hi @hsivonen.

Thank you for the analysis on alignment. I agree with you that these are safe behaviors given the nature of the library.

With regard to out-of-bound, I should correct myself that what MIRI reports is not a read per-se, but rather creating a pointer that is outside the bounds of allocated memory. If you run the cargo miri test inside the attached tarball, you should get a long trace and an error that looks like this.

error: Undefined Behavior: inbounds test failed: pointer must be in-bounds at offset 10, but is outside bounds of alloc93891 which has size 2

Is the out_of_bound() test case passing on your side?

Since decode(BIG5, b"", &""); also shows up in the manually written test cases, I was able to replicate the identical behavior by cloning the latest version of this repo, and running cargo miri test inside the cloned directory. Although, in this case, the memory offset will be 8 and you will have to wait about 15 seconds for MIRI to reach that specific test case.

For your information, my toolchain versions are:

$ cargo --version
cargo 1.46.0-nightly (fede83ccf 2020-07-02)
$ cargo miri --version
miri 0.1.0 (fd81012 2020-06-28)

and the tarball imports encoding_rs = "0.8.23".

Finally, thank you for the help @RalfJung.

Thanks,
~Yoshiki

@RalfJung
Copy link
Contributor

Due to predating align_to in the standard library, encoding_rs does its own alignment math. As I understand it, the Miri errors are based on pointers carrying alignment metadata in the Miri execution as opposed to Miri actually checking whether pointer math has been done correctly when casting from a pointer with smaller alignment to a pointer with a larger alignment.

Since this issue was reported, Miri has been changed to properly support hand-rolled alignment math, at the expense of having to run alignment-related tests a dozen times or so to be sure that they do not pass by chance.

@RalfJung
Copy link
Contributor

RalfJung commented Aug 22, 2020

@hsivonen I can reproduce an OOB error, though it seems to be slightly different. With latest Miri, when I run cargo +miri miri test -- -- big5 for this crate, I get

error: Undefined Behavior: inbounds test failed: pointer must be in-bounds at offset 8, but is outside bounds of alloc316674 which has size 6
    --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mut_ptr.rs:232:18
     |
232  |         unsafe { intrinsics::offset(self, count) as *mut T }
     |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ inbounds test failed: pointer must be in-bounds at offset 8, but is outside bounds of alloc316674 which has size 6
     |
     = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
     = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
             
     = note: inside `std::ptr::mut_ptr::<impl *mut u16>::offset` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mut_ptr.rs:232:18
     = note: inside `std::ptr::mut_ptr::<impl *mut u16>::add` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mut_ptr.rs:643:18
note: inside `ascii::ascii_to_basic_latin` at src/ascii.rs:214:29
    --> src/ascii.rs:214:29
     |
214  |                         if (dst.add(src_until_alignment) as usize) & ALU_ALIGNMENT_MASK != 0 {
     |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
1475 |         basic_latin_alu!(ascii_to_basic_latin, u8, u16, ascii_to_basic_latin_stride_alu);
     |         --------------------------------------------------------------------------------- in this macro invocation
note: inside `handles::Utf16Destination::copy_ascii_from_check_space_astral` at src/handles.rs:693:17
    --> src/handles.rs:693:17
     |
693  |                 ascii_to_basic_latin(src_remaining.as_ptr(), dst_remaining.as_mut_ptr(), length)
     |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `big5::Big5Decoder::decode_to_utf16_raw` at src/macros.rs:187:19
    --> src/macros.rs:187:19
     |
187  |               match dest.$copy_ascii(&mut $source) {
     |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     | 
    ::: src/big5.rs:76:5
     |
76   | /     ascii_compatible_two_byte_decoder_functions!(
77   | |         {
78   | |             // If lead is between 0x81 and 0xFE, inclusive,
79   | |             // subtract offset 0x81.
...    |
163  | |         check_space_astral,
164  | |         false);
     | |_______________- in this macro invocation
note: inside `variant::VariantDecoder::decode_to_utf16_raw` at src/variant.rs:130:48
    --> src/variant.rs:130:48
     |
130  |             VariantDecoder::Big5(ref mut v) => v.decode_to_utf16_raw(src, dst, last),
     |                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `Decoder::decode_to_utf16_checking_end` at src/macros.rs:1613:39
    --> src/macros.rs:1613:39
     |
1613 |           let (result, read, written) = self.variant
     |  _______________________________________^
1614 | |                                           .$decode_to_utf_raw(src, dst, last);
     | |_____________________________________________________________________________^
     | 
    ::: src/lib.rs:4194:5
     |
4194 | /     public_decode_function!(/// Incrementally decode a byte stream into UTF-16
4195 | |                             /// _without replacement_.
4196 | |                             ///
4197 | |                             /// See the documentation of the struct for
...    |
4208 | |                             decode_to_utf16_checking_end_with_offset,
4209 | |                             u16);
     | |_________________________________- in this macro invocation
note: inside `Decoder::decode_to_utf16_without_replacement` at src/macros.rs:1303:28
    --> src/macros.rs:1303:28
     |
1303 |                       return self.$decode_to_utf_checking_end(src, dst, last);
     |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     | 
    ::: src/lib.rs:4194:5
     |
4194 | /     public_decode_function!(/// Incrementally decode a byte stream into UTF-16
4195 | |                             /// _without replacement_.
4196 | |                             ///
4197 | |                             /// See the documentation of the struct for
...    |
4208 | |                             decode_to_utf16_checking_end_with_offset,
4209 | |                             u16);
     | |_________________________________- in this macro invocation
note: inside `Decoder::decode_to_utf16` at src/lib.rs:4159:43
    --> src/lib.rs:4159:43
     |
4159 |               let (result, read, written) = self.decode_to_utf16_without_replacement(
     |  ___________________________________________^
4160 | |                 &src[total_read..],
4161 | |                 &mut dst[total_written..],
4162 | |                 last,
4163 | |             );
     | |_____________^
note: inside `testing::decode_to_utf16_with_boundary` at src/testing.rs:112:13
    --> src/testing.rs:112:13
     |
112  |             decoder.decode_to_utf16(tail, &mut dest[total_written..], true);
     |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `testing::decode_to_utf16_impl` at src/testing.rs:79:9
    --> src/testing.rs:79:9
     |
79   |         decode_to_utf16_with_boundary(encoding, head, tail, expect);
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `testing::decode_without_padding_impl` at src/testing.rs:40:5
    --> src/testing.rs:40:5
     |
40   |     decode_to_utf16_impl(encoding, bytes, &utf16_from_utf8(expect)[..], padding);
     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `testing::decode` at src/testing.rs:25:9
    --> src/testing.rs:25:9
     |
25   |         decode_without_padding_impl(encoding, &vec[..], &string[..], i);
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `big5::tests::decode_big5` at src/big5.rs:272:9
    --> src/big5.rs:272:9
     |
272  |         decode(BIG5, bytes, expect);
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `big5::tests::test_big5_decode` at src/big5.rs:285:9
    --> src/big5.rs:285:9
     |
285  |         decode_big5(&[0x61u8, 0x62u8], &"\u{0061}\u{0062}");
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure at src/big5.rs:280:5
    --> src/big5.rs:280:5
     |
280  | /     fn test_big5_decode() {
281  | |         // Empty
282  | |         decode_big5(b"", &"");
283  | |
...    |
357  | |         decode_big5(&[0x61u8, 0x81u8], &"\u{0061}\u{FFFD}");
358  | |     }
     | |_____^

@hsivonen
Copy link
Owner

@RalfJung @YoshikiTakashima Thank you. The OOB case indeed materialized an OOB pointer though didn't dereference it. Fixed in #53 . I'm leaving this issue open to remind me to check the SIMD code for the same pattern.

@YoshikiTakashima
Copy link
Contributor Author

@hsivonen Got it.

Big thanks to @bjorn3, who suggested the fix.

@hsivonen
Copy link
Owner

@RalfJung @YoshikiTakashima Thank you. The OOB case indeed materialized an OOB pointer though didn't dereference it. Fixed in #53 . I'm leaving this issue open to remind me to check the SIMD code for the same pattern.

For clarity, closing this issue and spinning revision of the SIMD code into a distinct issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants