
Removing per-thread file descriptors #13

Closed
newpavlov opened this issue Mar 15, 2019 · 10 comments · Fixed by #25

Comments

@newpavlov
Member

newpavlov commented Mar 15, 2019

I think we can replace TLS for several targets by using a mutable static RawFd initialized via std::sync::Once. I think it should be safe to read from /dev/urandom using a single file descriptor from several threads without any synchronization, but it's better to check. A small experiment shows that reading a huge amount of random data in one thread does not block other threads.
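
Something along these lines (a rough sketch; error handling is deliberately elided here and discussed below):

use std::fs::File;
use std::os::unix::io::{IntoRawFd, RawFd};
use std::sync::Once;

static INIT: Once = Once::new();
static mut URANDOM_FD: RawFd = -1;

fn get_fd() -> RawFd {
    INIT.call_once(|| {
        // `into_raw_fd` "leaks" the `File`, so the descriptor stays
        // open for the lifetime of the process.
        let file = File::open("/dev/urandom").expect("failed to open /dev/urandom");
        unsafe { URANDOM_FD = file.into_raw_fd() };
    });
    // `call_once` guarantees the write above happened-before this read.
    unsafe { URANDOM_FD }
}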

@dhardy
Member

dhardy commented Mar 15, 2019

Interesting idea. I guess why not?

I wouldn't expect the kernel to care about high-performance simultaneous reading of /dev/urandom from multiple threads anyway, so I don't think performance or blocking would be (much) different either way, but as you say it does avoid the need for TLS on these platforms.

RawFd is type c_int so it should even be possible to use an atomic int — otherwise, I believe Cell should be preferred over mutable statics.
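
E.g. something like this (just a sketch; zero doubles as the "unset" marker here):

use std::sync::atomic::{AtomicUsize, Ordering};

static URANDOM_FD: AtomicUsize = AtomicUsize::new(0);

fn try_get_fd() -> Option<usize> {
    match URANDOM_FD.load(Ordering::Acquire) {
        0 => None, // not initialised yet
        fd => Some(fd),
    }
}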

@newpavlov
Member Author

newpavlov commented Mar 15, 2019

Yes, I thought about using atomic, but:

  1. c_int may be only 16 bits wide in some rare cases
  2. AtomicI32 will be stable only from Rust 1.34
  3. IIUC zero is a valid descriptor: usually it's used for stdin, but in some circumstances opening a file can result in a zero fd.

I believe Cell should be preferred over mutable statics.

Can you elaborate? Either way we need a static, so why not make it mutable directly, without Cell? Mutable static initialization is even used as an example for Once::call_once.

One thing I am not sure about is how to process an error from File::open("/dev/urandom"). Due to the limitations of Once I was thinking about the following option: adding an Errored state to the enum stored in the static, i.e. if the attempt to open /dev/urandom fails the first time, getrandom will always return errors. We could have used call_once_force, but unfortunately it's currently unstable. The most common error case will probably be exhausted file descriptors, so it may happen in practice.
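
Roughly what I have in mind (a sketch; `Source` and `get_fd` are illustrative names, and note that the original error is lost in the Errored state):

use std::fs::File;
use std::os::unix::io::{IntoRawFd, RawFd};
use std::sync::Once;

enum Source {
    Uninit,
    Fd(RawFd),
    Errored, // the first open failed; every later call keeps failing
}

static INIT: Once = Once::new();
static mut SOURCE: Source = Source::Uninit;

// Returns None in the Errored state; the caller would map that to
// getrandom's error type.
fn get_fd() -> Option<RawFd> {
    INIT.call_once(|| {
        let source = match File::open("/dev/urandom") {
            Ok(f) => Source::Fd(f.into_raw_fd()),
            Err(_) => Source::Errored, // the io::Error is dropped here
        };
        unsafe { SOURCE = source };
    });
    match unsafe { &SOURCE } {
        Source::Fd(fd) => Some(*fd),
        _ => None,
    }
}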

@dhardy
Member

dhardy commented Mar 16, 2019

Firstly, AtomicUsize is already stable (though of course larger than necessary). The 0 state isn't a problem since you need to initialise with Once either way (to avoid opening multiple times); of course this means it's not necessary to use atomics.

Mutable statics vs Cell — as I understand it, static mut is disliked, mostly because it is easier to abuse and is redundant with UnsafeCell. There's actually a pre-RFC about removing static mut (2015).

IIRC it's impossible to Clone errors in general, though you could get around that by unwrapping the errno (on some platforms at least). The other potential issue with this approach is that it prevents retry on failure. An alternative might be to use a mutex to only allow one initialisation attempt at once, and return the error directly on failure, retrying on each call.
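
Sketch of what I mean (using the lazy_static crate for the static Mutex; names illustrative):

use std::fs::File;
use std::io;
use std::os::unix::io::{IntoRawFd, RawFd};
use std::sync::Mutex;

lazy_static::lazy_static! {
    static ref FD: Mutex<Option<RawFd>> = Mutex::new(None);
}

fn get_fd() -> io::Result<RawFd> {
    let mut guard = FD.lock().unwrap();
    match *guard {
        Some(fd) => Ok(fd),
        None => {
            // On failure the error is returned directly and the state stays
            // None, so the next call retries the open.
            let fd = File::open("/dev/urandom")?.into_raw_fd();
            *guard = Some(fd);
            Ok(fd)
        }
    }
}

This does take the lock on every call, though; avoiding that after init is what the flow below is about.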

@newpavlov
Member Author

AtomicUsize is already stable (though of course larger than necessary)

I was more afraid of 16-bit targets, but it looks like the *NIXes supported by getrandom do not run on them, so indeed we could use AtomicIsize for now and migrate to AtomicI32 later. Then we could do the following: use two atomics, the first storing the current state (not-initialized, initializing, use syscall, use fd) and the second storing the file descriptor. We will need to be careful with orderings, but at first glance it shouldn't be too complex.

@dhardy
Member

dhardy commented Mar 17, 2019

It depends on whether the goal is to initialise at least once or exactly once; mutexes are better for the latter, and the std lib is in the process of switching to the parking_lot lightweight mutexes, so this should have very little overhead for doing the init.

However, post-init, mutexes are not required, thus the following flow seems to be what we want:

  1. Check whether initialised (atomic bool), then go to 2, or 3 if not
  2. Read FD from global Cell; done
  3. Wait for lock on global mutex, then
  4. Check again; if ready, go to 2
  5. Try to open file or fail
  6. Update Cell and atomic and return FD

This is a bit complex, but so is Once internally. If we know there is no harm in opening the random device twice, we could skip the mutex, though even if this is safe, repeated sys-calls will have much more overhead than the mutex.
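
In code, the flow above might look roughly like this (a sketch in current Rust, where Mutex::new is const; the FD slot is an atomic because a plain global Cell is not Sync):

use std::fs::File;
use std::io;
use std::os::unix::io::{IntoRawFd, RawFd};
use std::sync::Mutex;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

static READY: AtomicBool = AtomicBool::new(false);
static FD: AtomicUsize = AtomicUsize::new(0); // the "global Cell"
static INIT_LOCK: Mutex<()> = Mutex::new(());

fn get_fd() -> io::Result<RawFd> {
    // 1. Check whether initialised.
    if READY.load(Ordering::Acquire) {
        // 2. Read FD from the global slot; done.
        return Ok(FD.load(Ordering::Relaxed) as RawFd);
    }
    // 3. Wait for lock on the global mutex, then
    let _guard = INIT_LOCK.lock().unwrap();
    // 4. Check again; if ready, go to 2.
    if READY.load(Ordering::Acquire) {
        return Ok(FD.load(Ordering::Relaxed) as RawFd);
    }
    // 5. Try to open the file or fail (READY stays false, so the next
    //    call retries).
    let fd = File::open("/dev/urandom")?.into_raw_fd();
    // 6. Update the slot and the atomic and return the FD.
    FD.store(fd as usize, Ordering::Relaxed);
    READY.store(true, Ordering::Release);
    Ok(fd)
}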

@newpavlov
Member Author

newpavlov commented Mar 17, 2019

If we know there is no harm in opening the random device twice

Because we store the descriptor and forget the initial File, we would leak one descriptor in this scenario, which is undesirable.

I thought about a simpler algorithm:

use std::sync::atomic::{AtomicUsize, Ordering};

const USE_SYSCALL: usize = 1 << 0;
const USE_FD: usize = 1 << 1;
const INIT: usize = 1 << 2;

static STATE: AtomicUsize = AtomicUsize::new(0);

// Inside getrandom(buf):
let state = STATE.load(Ordering::Acquire);
if state & USE_SYSCALL != 0 {
    return use_syscall(buf);
} else if state & USE_FD != 0 {
    return use_fd(buf);
}

loop {
    let state = STATE.fetch_or(INIT, Ordering::SeqCst);

    return if state & USE_SYSCALL != 0 {
        use_syscall(buf)
    } else if state & USE_FD != 0 {
        use_fd(buf)
    } else if state & INIT == 0 {
        // This function initializes the source, sets the appropriate bit
        // to 1 in STATE and fills the given `buf`. If an error happens
        // during initialization, it sets the INIT bit back to 0 and
        // returns the error.
        init_and_use(buf)
    } else {
        // We may want to sleep for a short duration instead,
        // see: https://github.com/rust-lang/rust/issues/46774
        std::thread::yield_now();
        continue;
    };
}

I think we can change the first ordering from Acquire to Relaxed, but it's better to check with someone more experienced.

With this approach if several threads use uninitialized getrandom simultaneously, we may waste a certain number of loop cycles compared to the parking_lot approach, but I don't think it's a problem in this case.

@dhardy
Member

dhardy commented Mar 17, 2019

No, I don't think we need to worry about wasted CPU cycles too much, but using a hand-crafted spin-mutex (especially one without an RAII-style guard) feels like a step back from Rust's safety goals.

BTW the first state read + if logic is redundant; otherwise this is a very neat algorithm.

@newpavlov
Member Author

using a hand-crafted spin-mutex (especially one without an RAII-style guard) feels like a step back from Rust's safety goals

The same can be said about using raw file descriptors and NonZeroU32 instead of an enum. Assuming the code is correct, I don't think it should pose any problems; after all, it's just an implementation detail of a low-level library.

The first read is not redundant: it uses a less restrictive atomic ordering, so I believe it should be more efficient.

@ebarnard

@newpavlov I think that first ordering has to be Acquire, depending on how you're storing the file descriptor.

I have an alternative implementation which races to be the first to store a file descriptor and avoids the spin-lock problem.

I think the orderings are correct but I would like someone with more experience to take a look.

use std::sync::atomic::{AtomicUsize, Ordering};

// `use_syscall`, `use_fd`, `open_fd` and `close_fd` are placeholders for
// the actual implementations.
static STATE: AtomicUsize = AtomicUsize::new(0);
static FD: AtomicUsize = AtomicUsize::new(0);

const USE_SYSCALL: usize = 1;
const USE_FD: usize = 2;

// Inside getrandom(buf):
// `STATE` never becomes uninitialised so we can use a `Relaxed` load for the fast path.
let state = STATE.load(Ordering::Relaxed);
if state & USE_SYSCALL != 0 {
    return use_syscall(buf);
} else if state & USE_FD != 0 {
    // `FD` is either valid or 0 as we have not synchronised with the release-store to either
    // `STATE` or `FD`. If it is 0 we continue on to the slow path.
    let fd = FD.load(Ordering::Relaxed);
    if fd != 0 {
        return use_fd(fd, buf);
    }
}

// The acquire-load synchronises with any prior release-store to `STATE` so `Relaxed` stores to
// `FD` are visible.
let state = STATE.load(Ordering::Acquire);

if state == USE_SYSCALL {
    use_syscall(buf)
} else if state == USE_FD {
    // The prior acquire-load of `STATE` guarantees that the preceding store to `FD` will be
    // visible so we can use a `Relaxed` load.
    let fd = FD.load(Ordering::Relaxed);
    use_fd(fd, buf)
} else {
    let syscall_available = use_syscall(buf);
    if syscall_available {
        // Release-store to `STATE` to synchronise with any subsequent acquire-loads of `STATE`.
        STATE.store(USE_SYSCALL, Ordering::Release);
    } else {
        let new_fd = open_fd();

        // We do not care about orderings here. Only one thread will successfully store to `FD` and
        // will then release-store to `STATE` so its write to `FD` becomes visible to other
        // threads. Therefore a `Relaxed` ordering is used.
        let current_fd = FD.compare_and_swap(0, new_fd, Ordering::Relaxed);

        if current_fd == 0 {
            // We have just initialised `FD` for the first time. Update `STATE` with a
            // release-store to synchronise with any subsequent acquire-loads of `STATE`.
            STATE.store(USE_FD, Ordering::Release);
            use_fd(new_fd, buf)
        } else {
            // `FD` has already been initialised by another thread. That thread will perform
            // the release-store to `STATE`. Close the file descriptor we have just opened and read
            // from the one stored in `FD`.
            close_fd(new_fd);
            use_fd(current_fd, buf)
        }
    }
};

@ebarnard

Damn. Didn't see #25.
