Unexpected key alterations when writing in LMDB can be caused by an undefined behavior #261
Comments
Don't transmute pointers to integers. Just cast them with There's a chance transmuting pointers to integers could cause LLVM miscompilations (if this turns out to be the cause please LMK!) |
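A minimal sketch of the cast-based alternative to a transmute, assuming a plain `as usize` cast is what is meant here. The helper name `ptr_to_addr` is hypothetical and is not part of heed.

```rust
/// Converts a raw pointer to its address with an `as` cast
/// instead of going through `std::mem::transmute`.
/// (Hypothetical helper, for illustration only.)
fn ptr_to_addr(ptr: *const u8) -> usize {
    ptr as usize
}
```

For example, `ptr_to_addr(&byte as *const u8)` yields the address of `byte`, and casting that integer back with `as *const u8` compares equal to the original pointer.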
Thanks for the answer! That was our original code actually but I tried to use a transmute hoping it would change something... It didn't 😔 EDIT: I patched heed to use |
Hey @RalfJung, Can I ask you for a little bit of help on this one? Do you have time to help us? I am sure the bug is coming from undefined behavior somewhere, maybe in the You can easily reproduce the bug by cloning the |
A watchpoint in gdb should catch the change, no? |
Yes, it was a good idea! |
I'm afraid I don't really have the time to dig into large foreign codebases like this. I'm happy to answer specific questions such as "is this small self-contained bit of code UB under these circumstances". :) But it looks like you got a lead, so I hope you can figure it out that way. :D |
LMDB can modify the position of the cursor in case of a failure ("The cursor is positioned at the new item, or on failure usually near it." docs). However, in this case |
Yes, thanks for your investigation!
From the function |
Thanks for those pointers into the LMDB code base. I have a theory as to why this is happening and why this is undefined behavior on the Rust side.

When the call to It ends up calling As per the comment related to line 6915, LMDB chooses to shorten the data entry in the B+ tree to reclaim unused space on the leaf page (2 bytes). It does this by first deleting the entry with key "and" (the call to

Now comes the tricky part. The instances of the

So where does the UB then come from? Let's take a look at the method signature of (Bear with me, I'm getting to the part about how this causes UB :-) ). Next, let's look at line

    pub unsafe fn into_val(value: &[u8]) -> ffi::MDB_val {
        ffi::MDB_val {
            mv_data: value.as_ptr() as *mut libc::c_void,
            mv_size: value.len(),
        }
    }
If we had to pinpoint the single point where this goes wrong, then where would it be? I don't believe LMDB is at fault. The documentation at The sequence of events that eventually triggers the UB is that it takes an immutable reference
As to why LMDB modifies the key, it's because the entire node for that particular key is removed due to the shorter data value that is being upserted (and then a new shorter node is re-inserted, but the program is then already in a state of UB, so all bets are off). I hope this analysis helps! |
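The effect described above can be sketched with a toy model in safe Rust. Everything here is an assumption made for illustration: the page layout (`#`, a 2-byte header `HH`, the key "and", then "antoine"), the function names, and the offsets are invented and do not reflect LMDB's real page format. The point is only that bytes at a fixed position change once the node holding them is removed, whereas a key copied beforehand does not.

```rust
use std::ops::Range;

/// Toy "leaf page": one marker byte, a 5-byte node (2-byte header + key "and"),
/// followed by the bytes of another entry, "antoine".
/// This layout is made up for the example; it is not LMDB's.
fn toy_page() -> Vec<u8> {
    b"#HHandantoine".to_vec()
}

/// Removes the node occupying `node` by shifting the rest of the page left,
/// loosely imitating what deleting a node does to the bytes that follow it.
fn delete_node(page: &mut [u8], node: Range<usize>) {
    let end = page.len();
    page.copy_within(node.end..end, node.start);
}

/// Returns (bytes at the key position before the delete,
///          bytes at the same position after the delete,
///          an owned copy of the key taken before the delete).
fn demo() -> (Vec<u8>, Vec<u8>, Vec<u8>) {
    let mut page = toy_page();
    let key_pos = 3..6; // where "and" lives in the toy page

    let before = page[key_pos.clone()].to_vec();
    let owned_key = before.clone(); // defensive copy, independent of the page

    // The node holding the key (header + key) is removed from the page.
    delete_node(&mut page, 1..6);

    // Same position, different bytes: this is what a stale view would read.
    let after = page[key_pos].to_vec();
    (before, after, owned_key)
}
```

In this toy model the position that held "and" reads "toi" after the delete, while the owned copy still holds "and". Safe Rust forces the use of positions rather than a live `&[u8]` here, which is exactly the guarantee that a reference into memory modified by foreign code would violate.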
@Pointerbender I'm not too familiar with the LMDB API, but what you are describing should only be a problem if the Specifically, there's this line: If it's the case that If |
Hello,
I was trying to reproduce this bug meilisearch/transplant#237, so I made a simple binary running only the necessary command to reproduce the bug, you can find it here: https://github.com/irevoire/milli_bug. I use this repository to run all of the commands below.
I also inserted a lot of debug prints in heed (the LMDB Rust wrapper) so, as you can see in the Cargo.toml, I'm using my own version of heed that you can find here: https://github.com/irevoire/heed, on the "main" branch.

So basically, to summarize what I found: at some point, milli tries to insert an entry with a key equal to "and" by calling heed::mdb::lmdb_ffi::mdb_cursor_put, but in the end, the &[u8] milli gave to mdb_cursor_put got modified and contains "toi". I also found out that this "toi" string is a view inside of the word "antoine". I didn't check, but it is likely in the same memory allocation.

It seems like the Mdb_val struct, which is the equivalent of a slice, i.e. a pointer and a length, is modified in a way that moves the original pointer forward and makes us read memory at the wrong address, so we read "toi" instead of "and" while keeping the correct length. Also, I checked this recent Reddit post about a similar bug. I tried to patch the RwCursor::put_current method by putting the &mut inside of the unsafe block, but it didn't change anything; at least it didn't fix the bug.

@Kerollmops already tried to run cargo miri on the heed project but, unfortunately, as it is a wrapper on top of LMDB and miri doesn't check memory mapping correctness, it can't be checked with this great tool.

If you want to reproduce the bug, you can clone my repository and run cargo build. Then run rust-gdb target/debug/milli_bug, then:
- b lmdb_ffi.rs:118
- r
- c
- key: x/s (*key).mv_data, the value should be "and".
- mdb_cursor_put: n 5
- key: x/s key.data_ptr, the value should be "toi".

This is not supposed to be possible since LMDB does not modify this variable in place, at least that's what we think, but LMDB is a well-battle-tested library.
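To illustrate the "pointer and a length" description, here is a small sketch of how such a struct behaves as a view into a buffer. `MdbVal` is a local stand-in that mirrors the two fields used by `into_val`, not the real FFI type, and `val_at`, `from_val`, `read_view` and the buffer contents are hypothetical names and data chosen for the example.

```rust
use std::ffi::c_void;
use std::slice;

/// Local stand-in for a pointer-plus-length value (not the real FFI type).
#[repr(C)]
struct MdbVal {
    mv_size: usize,
    mv_data: *mut c_void,
}

/// Builds a view of `len` bytes starting at `offset` inside `buf`.
fn val_at(buf: &[u8], offset: usize, len: usize) -> MdbVal {
    assert!(offset + len <= buf.len(), "view must stay inside the buffer");
    MdbVal {
        mv_size: len,
        // SAFETY: the assert above keeps the offset inside `buf`.
        mv_data: unsafe { buf.as_ptr().add(offset) } as *mut c_void,
    }
}

/// Reinterprets the view as a byte slice.
///
/// SAFETY: the caller must guarantee that the pointed-to memory is valid
/// for `mv_size` bytes and is not modified while the slice is in use.
unsafe fn from_val<'a>(val: &MdbVal) -> &'a [u8] {
    unsafe { slice::from_raw_parts(val.mv_data as *const u8, val.mv_size) }
}

/// Reads the bytes seen through a view and copies them out immediately.
fn read_view(buf: &[u8], offset: usize, len: usize) -> Vec<u8> {
    let val = val_at(buf, offset, len);
    // SAFETY: `buf` outlives this call and is not modified during it.
    let bytes = unsafe { from_val(&val) };
    bytes.to_vec()
}
```

With the made-up buffer `b"andantoine"`, `read_view(buf, 0, 3)` reads "and", and the same length read 5 bytes further on, `read_view(buf, 5, 3)`, reads "toi". That matches the symptom of a correct length combined with a wrong address, though it does not show which of the two (a moved pointer, or changed bytes under an unchanged pointer) actually happens in heed.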