-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Figure out a story for non-determinism and external communication #653
Comments
So, here are my thoughts on that: Miri will have two modes. One is deterministic, that's the current mode. Then there is the non-deterministic mode. In non-deterministic mode, there is a seed that is used as the source for all non-determinism in Miri -- IOW, Miri has an RNG that it seeds and then uses whenever the program makes a non-deterministic choice. In non-deterministic mode, when the Ideally, we'd have the following invariants:
Some problems / open questions:
(I should add that I personally thing test suites should always use RNGs with a fixed seed, to avoid spurious failures. But (a) clearly many people have a different opinion, and (b) there are other sources of non-determinism that we definitely want to support, so we might as well support this one too.) |
I think we could just seed the time with the random number generator and just increment by a random nonzero value every time
Seems reasonable
We can just always choose a random seed if none is supplied, and emit a message before running the code that dumps the seed and informs the user how to rerun with said seed and a disclaimer mentioning that it's now nondeterministic and might accidentally skip over certain kinds of UB. |
That seems extremely confusing. And people will likely ask for file system or network access eventually. I feel like accessing the clock falls into the same category. |
Both for the clock, the random number generator, etc. I think miri needs to provide two things. First, the ability to initialize these in such a way that the results are "deterministic". Second, the ability to choose a non-deterministic "source" that miri uses to initialize these. E.g. by default miri's clock can always start at zero, but maybe an user wants to initialize the clock from the system random device to use miri as a "fuzzing" tool (independently of whether that's a good or bad idea). If miri finds an issue, it would be helpful for the user to input whatever seed miri obtained from the random device - that is, miri needs to both output this information on error, and allow the user to somehow input it. The seeds for the different sources of non-determinism are not stable API-wise. E.g. different PRNGs have differently-sized seeds, and miri should not be stuck on a particular choice of PRNG forever. |
I disagree for the clock. The clock falls into the category "communication with the external world", which is different from "pure" non-determinism. I think we should not conflate the two.
I don't understand the point of this comment. Nothing I said sticks Miri to any particular choice of PRNG. And what do seed sizes have to do with anything? |
If I get an error and want to reproduce, I need to be able to pass the seed used by miri for its own internal PRNG, e.g., Later, miri gets updated to use a different PRNG, which requires a 16 element seed. This breaks my CI because I'm not passing |
All RNGs can be seeded from a byte slice, right? I was just going to suggest |
Yes.
Maybe use |
Actually this brings me to another concern I forgot to mention: if people run crypto code in Miri and expect that to be secure, we have a problem. Given that we want to start everything from a fixed seed, the randomness from If we accept that Miri shouldn't run crypto, it's really not a big deal to initialize with 64bits instead of more. I think |
Part of rust-lang#653 This allows us to properly implement getrandom(), which unlocks the default HashMap type (e.g. HashMap<K, V>) with RandomState) This commit adds a new '-Zmiri-seed=<seed>' option. When present, this option takes a 64-bit hex value, which is used as the seed to an internal PRNG. This PRNG is used to implement the 'getrandom()' syscall. When '-Zmiri-seed' is not passed, 'getrandom()' will be disabled. To allow using 'rand' properly, I bumped the patch versions of a few dependencies, which brings them up to the current version of the 'rand' crate
Part of rust-lang#653 This allows us to properly implement getrandom(), which unlocks the default HashMap type (e.g. HashMap<K, V>) with RandomState) This commit adds a new '-Zmiri-seed=<seed>' option. When present, this option takes a 64-bit hex value, which is used as the seed to an internal PRNG. This PRNG is used to implement the 'getrandom()' syscall. When '-Zmiri-seed' is not passed, 'getrandom()' will be disabled.
Part of rust-lang#653 This allows us to properly implement getrandom(), which unlocks the default HashMap type (e.g. HashMap<K, V>) with RandomState) This commit adds a new '-Zmiri-seed=<seed>' option. When present, this option takes a 64-bit hex value, which is used as the seed to an internal PRNG. This PRNG is used to implement the 'getrandom()' syscall. When '-Zmiri-seed' is not passed, 'getrandom()' will be disabled.
Part of rust-lang#653 This allows us to properly implement getrandom(), which unlocks the default HashMap type (e.g. HashMap<K, V>) with RandomState) This commit adds a new '-Zmiri-seed=<seed>' option. When present, this option takes a 64-bit hex value, which is used as the seed to an internal PRNG. This PRNG is used to implement the 'getrandom()' syscall. When '-Zmiri-seed' is not passed, 'getrandom()' will be disabled.
Seeding RNGs with system entropy now works in Miri. However, the question remains what our story shall be for other kinds of interaction with the OS, like accessing the file system or querying the current time. |
I've opened an issue in the UCGs to track up to which level floating-point arithmetic should be deterministic: rust-lang/unsafe-code-guidelines#123 E.g. always non-deterministic vs some degree of determinisms going from "always deterministic" to "for the same inputs, host, target, compiler options, and toolchain", or "for the same input, host, target, and compiler options (only - independent of toolchain)" or "for the same input, host, target, and default compiler options (independent of toolchain - some compiler options might introduce non-determinism, excluding optimization levels)" or "same as before, but where optimization level is allowed to introduce non-determinism", etc. I think it would be interesting to hear what the limits would be with respect to what would be feasible to implement in miri. |
What's the story in regard to pointer math, considering that is also non-deterministic? I've been playing around with |
That only somewhat solves the issue. While miri does accept this, it returns usize::max_value() indicating that it failed aligning the pointer, which probably doesn't solve the problem. |
That would be #224.
Well, basically anything someone is willing to actually implement. ;) The moment things get symbolic, though, everything becomes very painful. We did that for allocations because we have to anyways for CTFE; not sure if this is worth the cost for floating point. |
So, does that mean that it would be "ok" (just hard to implement), for the results of floating-point math to start depending on LLVM's inlining heuristics? That would mean that a "good" miri implementation would need to replicate the behavior of LLVM's optimization pipeline, optimizing the MIR accordingly, and evaluating the MIR after LLVM optimizations, including backend optimizations, and would probably need to do so for the different LLVM versions that Rust supports today (from LLVM 6 to LLVM 9). I suppose this would need a MIR -> LLVM-IR -> opt (with backend opts) -> LLVM-IR -> MIR pass in miri or similar (preserving types, lifetimes, etc.). |
I don't think that would be a reasonable language specification. But discussing language spec is outside the scope of a Miri issue. that would be a UCG discussion. I think proper handling of floating point needs some foundational work to have a spec before we can implement anything reasonable in Miri. Incidentally, some of my colleagues at MPI are working on exactly that. ;) |
Closing in favor of #800: the discussion here mixed up the non-determinism story (which we are close to resolving, with intptrcast becoming the default mode soon) and external communication. Also, RNGs (which the issue here was originally about) have worked for quite some time already now. |
Playground:
yet many tests use a random number generator. We might want to intercept the FFI calls involved in this, and do something with them (probably forward them? no idea).
The text was updated successfully, but these errors were encountered: