Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out a story for non-determinism and external communication #653

Closed
gnzlbg opened this issue Mar 7, 2019 · 19 comments
Closed

Figure out a story for non-determinism and external communication #653

gnzlbg opened this issue Mar 7, 2019 · 19 comments
Labels
A-interpreter Area: affects the core interpreter C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement

Comments

@gnzlbg
Copy link

gnzlbg commented Mar 7, 2019

Playground:

extern crate rand;

pub fn foo() {
    let _ = rand::thread_rng();
}

fn main() {
    foo();
}

yet many tests use a random number generator. We might want to intercept the FFI calls involved in this, and do something with them (probably forward them? no idea).

@RalfJung
Copy link
Member

RalfJung commented Mar 8, 2019

Yeah -- this is blocked on having some flag and infrastructure for non-determinism.

Let's re-purpose this issue to discuss that.

Related issues: #224 #641

@RalfJung RalfJung changed the title rand::thread_rng() fails under miri Figure out a story for non-determinism Mar 8, 2019
@RalfJung RalfJung mentioned this issue Mar 8, 2019
10 tasks
@RalfJung
Copy link
Member

RalfJung commented Mar 8, 2019

So, here are my thoughts on that: Miri will have two modes. One is deterministic, that's the current mode. Then there is the non-deterministic mode. In non-deterministic mode, there is a seed that is used as the source for all non-determinism in Miri -- IOW, Miri has an RNG that it seeds and then uses whenever the program makes a non-deterministic choice.

In non-deterministic mode, when the NR_GETRANDOM syscall is made, we generate a new random number from Miri's RNG and return that. We also (eventually) use this same RNG to pick base a address for allocations, so that we can support the full range of pointer arithmetic. If/when Miri supports running multiple threads, this same RNG will control the behavior of the scheduler.

Ideally, we'd have the following invariants:

  1. Miri with a fixed seed is still deterministic, meaning its behavior is fully reproducible (across platforms). However we do not guarantee that different versions of Miri will make the same program run the same way with a fixed seed! More non-deterministic choices might be added, so this guarantee of reproducibility only applies to a fixed version of Miri.
  2. When deterministic Miri runs a program without error, then the program will behave the same under non-deterministic Miri.
  3. Even stronger: the only case where deterministic and non-deterministic Miri differ in behavior is when deterministic Miri shows an Error in the "unsupported" category (see Improve error message: Unsupported operation vs. undefined behavior #417).

Some problems / open questions:

  • Can't call foreign function: clock_gettime #641 is about supporting clock_gettime, that would be a kind of non-determinism not governed by the RNG and hence would violate invariant (1). Should such operations be subject to yet another flag? We'd have three "levels" of Miri: deterministic, seeded non-determinism, and the last level would also allow interaction with the outside world and entirely give up on reproducibility.
  • Our current pointer equality test violates invariant (2). Maybe we should just declare pointer equality unsupported in almost all cases when running in deterministic mode?
  • What is the default mode? At least for cargo miri, probably the mode that supports most things should be default. However, note that non-determinism (and even more so external interaction) weakens the checks Miri does: just because it didn't find UB with one seed, doesn't mean there is no UB with other seeds.

(I should add that I personally thing test suites should always use RNGs with a fixed seed, to avoid spurious failures. But (a) clearly many people have a different opinion, and (b) there are other sources of non-determinism that we definitely want to support, so we might as well support this one too.)

@oli-obk
Copy link
Contributor

oli-obk commented Mar 8, 2019

is about supporting clock_gettime, that would be a kind of non-determinism not governed by the RNG and hence would violate invariant (1). Should such operations be subject to yet another flag?

I think we could just seed the time with the random number generator and just increment by a random nonzero value every time clock_gettime is called.

Maybe we should just declare pointer equality unsupported in almost all cases when running in deterministic mode?

Seems reasonable

However, note that non-determinism (and even more so external interaction) weakens the checks Miri does: just because it didn't find UB with one seed, doesn't mean there is no UB with other seeds.

We can just always choose a random seed if none is supplied, and emit a message before running the code that dumps the seed and informs the user how to rerun with said seed and a disclaimer mentioning that it's now nondeterministic and might accidentally skip over certain kinds of UB.

@RalfJung
Copy link
Member

RalfJung commented Mar 8, 2019

I think we could just seed the time with the random number generator and just increment by a random nonzero value every time clock_gettime is called.

That seems extremely confusing. And people will likely ask for file system or network access eventually. I feel like accessing the clock falls into the same category.

@gnzlbg
Copy link
Author

gnzlbg commented Mar 8, 2019

Both for the clock, the random number generator, etc. I think miri needs to provide two things.

First, the ability to initialize these in such a way that the results are "deterministic". Second, the ability to choose a non-deterministic "source" that miri uses to initialize these.

E.g. by default miri's clock can always start at zero, but maybe an user wants to initialize the clock from the system random device to use miri as a "fuzzing" tool (independently of whether that's a good or bad idea). If miri finds an issue, it would be helpful for the user to input whatever seed miri obtained from the random device - that is, miri needs to both output this information on error, and allow the user to somehow input it.

The seeds for the different sources of non-determinism are not stable API-wise. E.g. different PRNGs have differently-sized seeds, and miri should not be stuck on a particular choice of PRNG forever.

@RalfJung
Copy link
Member

RalfJung commented Mar 8, 2019

Both for the clock, the random number generator, etc. I think miri needs to provide two things.

First, the ability to initialize these in such a way that the results are "deterministic". Second, the ability to choose a non-deterministic "source" that miri uses to initialize these.

I disagree for the clock. The clock falls into the category "communication with the external world", which is different from "pure" non-determinism. I think we should not conflate the two.

The seeds for the different sources of non-determinism are not stable API-wise. E.g. different PRNGs have differently-sized seeds, and miri should not be stuck on a particular choice of PRNG forever.

I don't understand the point of this comment. Nothing I said sticks Miri to any particular choice of PRNG. And what do seed sizes have to do with anything?
Miri does not care about what PRNG your program uses. It will intercept FFI calls to, e.g., libc::syscall(NR_GETRANDOM, ..) and follow the contract of that syscall by filling in some bytes non-deterministically. It will use its own PRNG for that purpose, but that has nothing to do with which PRNG the interpreted program uses.

@gnzlbg
Copy link
Author

gnzlbg commented Mar 8, 2019

I don't understand the point of this comment.

If I get an error and want to reproduce, I need to be able to pass the seed used by miri for its own internal PRNG, e.g., [1, 2, 3, 4]. I want to test for this error, so I add cargo miri test --with-prng-seed=1,2,3,4 to my CI and call it a day.

Later, miri gets updated to use a different PRNG, which requires a 16 element seed. This breaks my CI because I'm not passing --with-prng-seed enough arguments.

@RalfJung
Copy link
Member

RalfJung commented Mar 8, 2019

All RNGs can be seeded from a byte slice, right? I was just going to suggest -Zmiri-seed=<hex string>. Or we might just make it a 64bit hex number and use seed_from_u64.

@RalfJung RalfJung added C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement A-interpreter Area: affects the core interpreter labels Mar 8, 2019
@gnzlbg
Copy link
Author

gnzlbg commented Mar 8, 2019

All RNGs can be seeded from a byte slice, right?

Yes.

Or we might just make it a 64bit hex number and use seed_from_u64.

Maybe use -Zmiri-seed-u64= for that ? rand recommends at least a 100-bit wide seed, but this often depends on the PRNG underneath.

@RalfJung
Copy link
Member

RalfJung commented Mar 8, 2019

Actually this brings me to another concern I forgot to mention: if people run crypto code in Miri and expect that to be secure, we have a problem. Given that we want to start everything from a fixed seed, the randomness from NR_GETRANDOM will not be suited for crypto.

If we accept that Miri shouldn't run crypto, it's really not a big deal to initialize with 64bits instead of more.

I think -Zmiri-seed-u64 exposes implementation details unnecessarily. The seed is going to be a hex string, most likely; we just have to decide whether we interpret this hex string as u64 (meaning it has a max length, but seed_from_u64 makes sure that the RNG will be well-seeded even if you use, say, 1) or whether we interpret it as [u8] and feed the entire thing to the seed directly (meaning you can properly seed arbitrarily-sized RNGs, but if you use 1 as the seed the randomness is going to be pretty bad).

Aaron1011 added a commit to Aaron1011/miri that referenced this issue Apr 7, 2019
Part of rust-lang#653

This allows us to properly implement getrandom(),
which unlocks the default HashMap type (e.g. HashMap<K, V>)
with RandomState)

This commit adds a new '-Zmiri-seed=<seed>' option. When present,
this option takes a 64-bit hex value, which is used as the seed
to an internal PRNG. This PRNG is used to implement the 'getrandom()'
syscall.

When '-Zmiri-seed' is not passed, 'getrandom()' will be disabled.

To allow using 'rand' properly, I bumped the patch versions of a
few dependencies, which brings them up to the current version
of the 'rand' crate
Aaron1011 added a commit to Aaron1011/miri that referenced this issue Apr 7, 2019
Part of rust-lang#653

This allows us to properly implement getrandom(),
which unlocks the default HashMap type (e.g. HashMap<K, V>)
with RandomState)

This commit adds a new '-Zmiri-seed=<seed>' option. When present,
this option takes a 64-bit hex value, which is used as the seed
to an internal PRNG. This PRNG is used to implement the 'getrandom()'
syscall.

When '-Zmiri-seed' is not passed, 'getrandom()' will be disabled.
Aaron1011 added a commit to Aaron1011/miri that referenced this issue Apr 7, 2019
Part of rust-lang#653

This allows us to properly implement getrandom(),
which unlocks the default HashMap type (e.g. HashMap<K, V>)
with RandomState)

This commit adds a new '-Zmiri-seed=<seed>' option. When present,
this option takes a 64-bit hex value, which is used as the seed
to an internal PRNG. This PRNG is used to implement the 'getrandom()'
syscall.

When '-Zmiri-seed' is not passed, 'getrandom()' will be disabled.
Aaron1011 added a commit to Aaron1011/miri that referenced this issue Apr 7, 2019
Part of rust-lang#653

This allows us to properly implement getrandom(),
which unlocks the default HashMap type (e.g. HashMap<K, V>)
with RandomState)

This commit adds a new '-Zmiri-seed=<seed>' option. When present,
this option takes a 64-bit hex value, which is used as the seed
to an internal PRNG. This PRNG is used to implement the 'getrandom()'
syscall.

When '-Zmiri-seed' is not passed, 'getrandom()' will be disabled.
@RalfJung
Copy link
Member

Seeding RNGs with system entropy now works in Miri. However, the question remains what our story shall be for other kinds of interaction with the OS, like accessing the file system or querying the current time.

@RalfJung RalfJung changed the title Figure out a story for non-determinism Figure out a story for non-determinism and external communication Apr 26, 2019
@gnzlbg
Copy link
Author

gnzlbg commented Apr 26, 2019

I've opened an issue in the UCGs to track up to which level floating-point arithmetic should be deterministic: rust-lang/unsafe-code-guidelines#123

E.g. always non-deterministic vs some degree of determinisms going from "always deterministic" to "for the same inputs, host, target, compiler options, and toolchain", or "for the same input, host, target, and compiler options (only - independent of toolchain)" or "for the same input, host, target, and default compiler options (independent of toolchain - some compiler options might introduce non-determinism, excluding optimization levels)" or "same as before, but where optimization level is allowed to introduce non-determinism", etc.

I think it would be interesting to hear what the limits would be with respect to what would be feasible to implement in miri.

@CryZe
Copy link

CryZe commented Apr 26, 2019

What's the story in regard to pointer math, considering that is also non-deterministic? I've been playing around with cargo miri test and noticed that some dependencies use pointer math to align pointers for some speedups (in particular memchr), which is where miri refuses to continue. Maybe there already is a solution to that, but at least the way it is written there, breaks miri (https://github.com/BurntSushi/rust-memchr/blob/master/src/fallback.rs#L64)

@bjorn3
Copy link
Member

bjorn3 commented Apr 26, 2019

@CryZe there is https://doc.rust-lang.org/stable/std/primitive.pointer.html#method.align_offset.

@CryZe
Copy link

CryZe commented Apr 26, 2019

That only somewhat solves the issue. While miri does accept this, it returns usize::max_value() indicating that it failed aligning the pointer, which probably doesn't solve the problem.

@RalfJung
Copy link
Member

What's the story in regard to pointer math, considering that is also non-deterministic?

That would be #224.

I think it would be interesting to hear what the limits would be with respect to what would be feasible to implement in miri.

Well, basically anything someone is willing to actually implement. ;) The moment things get symbolic, though, everything becomes very painful. We did that for allocations because we have to anyways for CTFE; not sure if this is worth the cost for floating point.

@gnzlbg
Copy link
Author

gnzlbg commented Apr 29, 2019

Well, basically anything someone is willing to actually implement.

So, does that mean that it would be "ok" (just hard to implement), for the results of floating-point math to start depending on LLVM's inlining heuristics? That would mean that a "good" miri implementation would need to replicate the behavior of LLVM's optimization pipeline, optimizing the MIR accordingly, and evaluating the MIR after LLVM optimizations, including backend optimizations, and would probably need to do so for the different LLVM versions that Rust supports today (from LLVM 6 to LLVM 9). I suppose this would need a MIR -> LLVM-IR -> opt (with backend opts) -> LLVM-IR -> MIR pass in miri or similar (preserving types, lifetimes, etc.).

@RalfJung
Copy link
Member

So, does that mean that it would be "ok" (just hard to implement), for the results of floating-point math to start depending on LLVM's inlining heuristics?

I don't think that would be a reasonable language specification. But discussing language spec is outside the scope of a Miri issue. that would be a UCG discussion.

I think proper handling of floating point needs some foundational work to have a spec before we can implement anything reasonable in Miri. Incidentally, some of my colleagues at MPI are working on exactly that. ;)

@RalfJung
Copy link
Member

Closing in favor of #800: the discussion here mixed up the non-determinism story (which we are close to resolving, with intptrcast becoming the default mode soon) and external communication. Also, RNGs (which the issue here was originally about) have worked for quite some time already now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-interpreter Area: affects the core interpreter C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement
Projects
None yet
Development

No branches or pull requests

5 participants