
Unnecessary sources of overhead in uniffi #244

Open
thomcc opened this issue Aug 14, 2020 · 5 comments

thomcc commented Aug 14, 2020

(Uh, this was way more than I intended to write about this 😓... I think my thoughts on the locking situation were probably worth writing down, since in general they help explain what I believe the problem space to be, what led me to the HandleMap approach, why locking in the other languages is tricky, etc. This was probably important to get down somewhere, so I'm glad I did.)


I've shared this crate with a few people, but in general they respond negatively when they realize how much overhead uniffi introduces. (Filing this because a friend of mine finally got back to me about uniffi today, but I've heard this 3 or so times now.)

There are two main causes of overhead:

  1. Serialization across the boundary.
  2. Using ConcurrentHandleMap, which requires locking a std::sync::RwLock and a std::sync::Mutex, and performing a number of validity checks on the handle.

Serialization

The 1st could be avoided for e.g. #[repr(C)] structs and enums with a defined repr (C, iN, or C+iN; note that this is well-defined for way more cases than you'd expect, thanks to the needs of cbindgen and webrender).

Passing and returning Vec<T> for ffi-safe T or String can be done by converting to a #[repr(C)] struct containing (pointer, length, capacity).

Hell, right now we use nul-terminated C-strings (I think), and there's no real reason to do that over e.g. #[repr(C)] struct Str(*const c_char, usize);.
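For illustration, here's a minimal sketch of what that could look like (FfiBuffer and the method names here are illustrative, not uniffi's actual FFI types):

```rust
use std::os::raw::c_char;

// Owned buffer: a Vec<u8>'s (pointer, length, capacity) triple, moved
// across the FFI without any serialization.
#[repr(C)]
pub struct FfiBuffer {
    ptr: *mut u8,
    len: usize,
    cap: usize,
}

impl FfiBuffer {
    pub fn from_vec(v: Vec<u8>) -> Self {
        // ManuallyDrop hands ownership of the allocation to the struct.
        let mut v = std::mem::ManuallyDrop::new(v);
        FfiBuffer { ptr: v.as_mut_ptr(), len: v.len(), cap: v.capacity() }
    }

    /// Safety: call at most once, on a buffer produced by `from_vec`.
    pub unsafe fn into_vec(self) -> Vec<u8> {
        Vec::from_raw_parts(self.ptr, self.len, self.cap)
    }
}

// Borrowed, non-nul-terminated string view, as suggested above.
#[repr(C)]
pub struct Str(*const c_char, usize);
```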

Locking

There are various options:

  1. Supporting the use of non-std::sync locking (probably parking_lot) in ffi_support::ConcurrentHandleMap. This is essentially trivial.

  2. An ffi_support::ConcurrentHandleMap which is really a RwLock<HandleMap<RwLock<T>>> or RwLock<HandleMap<AtomicRefCell<T>>> instead of a Mutex (see the sketch after this list). Note that RwLock has higher overhead than Mutex, so it would really only be good if we found a way to opt out of the locking that happens on the non-Rust side of the FFI. OTOH, AtomicRefCell is really only viable because we do all the locking on the other side.

  3. There exist algorithms for writing lock-free HandleMap-alikes*. This is not a good choice for uniffi: we're using handles because they're more "obviously correct" than pointers, and lock-free data structures are almost never "obviously correct".

  4. Finally, and what I think is a better safety-vs-complexity trade-off: just use pointers. (Note that you need to enforce T: Send for any type you do this with, and you will be relying on locking happening in the caller. More on that in a sec.) Continued below.
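Here's a rough sketch of the shape option 2 could take (illustrative only, not ffi_support's actual implementation; a plain HashMap stands in for the real HandleMap, and atomic_refcell is the external crate providing AtomicRefCell):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

use atomic_refcell::AtomicRefCell;

pub type Handle = u64;

pub struct LowContentionMap<T> {
    // Coarse RwLock for insert/remove; per-entry AtomicRefCell so reads of
    // *different* handles don't serialize on a single Mutex.
    entries: RwLock<HashMap<Handle, AtomicRefCell<T>>>,
    next_handle: AtomicU64,
}

impl<T> LowContentionMap<T> {
    pub fn new() -> Self {
        LowContentionMap {
            entries: RwLock::new(HashMap::new()),
            next_handle: AtomicU64::new(1),
        }
    }

    pub fn insert(&self, value: T) -> Handle {
        let handle = self.next_handle.fetch_add(1, Ordering::Relaxed);
        self.entries.write().unwrap().insert(handle, AtomicRefCell::new(value));
        handle
    }

    // The foreign side must serialize mutations per handle; AtomicRefCell
    // turns a violation of that into a panic instead of undefined behavior.
    pub fn call_mut<R>(&self, handle: Handle, f: impl FnOnce(&mut T) -> R) -> R {
        let map = self.entries.read().unwrap();
        let cell = map.get(&handle).expect("invalid handle");
        f(&mut *cell.borrow_mut())
    }
}
```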

Using Pointers

This is what we used to do, but stopped because it was too hard to be sure about correctness, e.g. double-free and such, especially in the face of manually written bindings. Sanitizers help here (issue forthcoming), but it's also just way less of an issue if you can be sure mistakes don't happen because you're generating the code.

For anything super tricky, like shared ownership (e.g. cases where you'd have a handle owned by multiple types), https://github.com/Manishearth/triomphe/ is stellar, and contains ffi-safe Arc utilities extracted from code used in servo and firefox. That said, I don't think we've ever needed or wanted this; it's better to just have the owner of the handle itself be shared.

Additionally, for T: Send + Sync, you still can never give out a &mut T, but you also don't need to. These types are thread-safe and don't need any locking. We even use this in a-s for the interrupt handles. This capability seems important, even if you don't care about the overhead.

Note that you do need to ensure that these pointers are only freed once. Atomic types can help there, and triomphe can for the hard cases. See how the destruction of the interrupt handle was done.
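As a sketch of that free-exactly-once pattern (my own names, not the actual a-s code): atomically swap the pointer for null on destroy, so a buggy second call becomes a no-op rather than a double-free.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::Arc;

pub struct State; // stand-in for some Send + Sync interrupt state

pub struct InterruptHandle {
    inner: AtomicPtr<State>,
}

impl InterruptHandle {
    pub fn new(state: Arc<State>) -> Self {
        InterruptHandle {
            inner: AtomicPtr::new(Arc::into_raw(state) as *mut State),
        }
    }

    pub fn destroy(&self) {
        // Atomically take the pointer; only one caller can observe non-null.
        let ptr = self.inner.swap(std::ptr::null_mut(), Ordering::AcqRel);
        if !ptr.is_null() {
            // Reconstruct and drop the strong reference exactly once.
            unsafe { drop(Arc::from_raw(ptr)) };
        }
    }
}
```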

Ensuring non-Rust locks are sufficient

So, locking. Since you'd be relying on the locks on the non-Rust side of the FFI, you need to be sure that either your lock is sufficiently powerful, or that you prevent it from being used in a way that would cause an issue. This can be tricky.

Let's look at Kotlin/Java here, since it's a good, illuminating example:

@Synchronized in Java/Kotlin is not equivalent to Mutex; it's a recursive/reentrant mutex. As far as I can tell, there are no non-reentrant locks in Java's stdlib, probably because if you don't care about "unique access" as a core part of your safety requirements... a reentrant mutex is just a mutex which deadlocks less.

Anyway, Rust cares, and so there's a big difference between a recursive mutex and a non-recursive one: while &Mutex<T> => &mut T is sound, a &RecursiveMutex<T> can only soundly give out a &T. This might seem pointless, but RecursiveMutex<T> still upgrades T: Send to RecursiveMutex<T>: Send + Sync. You could give an Arc<RecursiveMutex<RefCell<T>>> to other threads, although in Rust doing so would seem very confused.
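For concreteness, parking_lot's ReentrantMutex in Rust already behaves exactly this way: its guard only derefs to &T, never &mut T, and mutation has to go through interior mutability. A tiny sketch:

```rust
use std::cell::RefCell;
use parking_lot::ReentrantMutex;

fn main() {
    let lock = ReentrantMutex::new(RefCell::new(0_i32));
    let a = lock.lock();   // guard derefs to &RefCell<i32>, never &mut i32
    let b = lock.lock();   // relocking on the same thread: fine, no deadlock
    *a.borrow_mut() += 1;  // mutation goes through the RefCell, not the guard
    drop(b);
    drop(a);
    assert_eq!(*lock.lock().borrow(), 1);
}
```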

Anyway, while @Synchronized seems non-viable, Java has a recursive mutex in its java.util.concurrent.locks package with enough functionality to easily build what we need: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantLock.html. This offers an isHeldByCurrentThread method, which we just need to check before locking. That is, we'd do if (lock.isHeldByCurrentThread()) { throw SomeException(...); } lock.lock() for all locking, and then we can soundly hand out a &mut T to Rust.

That said, we could build this on top of @Synchronized too, by testing a flag after we acquire the lock. (Or, equivalently, by reimplementing only the writer tracking of RefCell<T>.)

For other targets... Swift has NSLock, which is enough, and it can use pthread_mutex_t as well. No clue about Python, but I've always heard it didn't even have threads? No idea. If that's truly the case and you can't get a system mutex or anything, then you can do it with a flag or a RefCell-alike.


Footnote: lock-free handlemaps

* Okay, so, as I mentioned, it's a bad fit for uniffi, but I think lock-free concurrent handlemaps are a really cool data structure, so I can't resist giving some links:

Tokio has one here: https://docs.rs/tokio/0.2.22/src/tokio/util/slab/mod.rs.html, but they very recently switched away from it to an implementation that's lower-performance but can return memory to the OS rather than always keeping it at the high-water mark. They haven't cut a release with the new version, so maybe they're still going to modify it to make it lock-free? No clue. Maybe it's not worth it.

Tokio's slab itself is a modified version of the algorithm described in https://www.microsoft.com/en-us/research/uploads/prod/2019/06/mimalloc-tr-v1.pdf.

At my last job, we had one of these in our game engine, based on the code here: https://cbloomrants.blogspot.com/2013/05/05-08-13-lock-free-weak-reference-table.html (note that it looks like it's talking about a different thing, but it's actually just a slightly more powerful version of our HandleMap/Handles, in that it's also reference counted — a lot of the implementation is in the source zip file). That said, as-is the algorithm requires an AtomicU128 on 64-bit platforms, which Rust doesn't expose even where it's available (like x86_64), although there's a trick to work around this that works on most 64-bit architectures. Also, it looks like C++ game code, which some people find hard to read.

Anyway, yeah, I very strongly doubt that uniffi is interested in being the owner of a tricky concurrent data structure, no matter how interesting it may be.



rfk commented Aug 18, 2020

Thanks for writing this up! Partly inspired by this issue, over in #248 I've tried to write up some ADRs to explain the current set of tradeoffs that have led to these sources of overhead, and highlight places where we might revisit those choices over time.


rfk commented Dec 20, 2020

> Hell, right now we use nul-terminated C-strings (I think), and really there's no real reason to do that over
> e.g. #[repr(C)] struct Str(*const c_char, usize);.

FWIW, we have now switched away from nul-terminated C strings and are doing something similar to the above ptr+len-in-a-struct approach. There's still a lot more copying than there needs to be, but it's a step in the right direction.


rfk commented Mar 18, 2021

> Additionally, for T: Send + Sync, you still can never give a &mut T out, but you also don't need to. These types are thread safe,
> and don't need any locking. We even use this in a-s for the interrupt handles. This capability seems important, even if you
> don't care about the overhead.

Oh hey, we did this! Or at least something vaguely-similarly-shaped anyway, ref ADR-0003.

Sorry, I just quite enjoy watching us slowly chip away at all the good ideas in this thread :-)


rfk commented Mar 18, 2021

> Ensuring non-Rust locks are sufficient

FWIW, my current thinking is that we should go in exactly the opposite direction here, and insist that anything passed over the FFI is Sync + Send, and hence that any locking is managed internally by the Rust code. We might be able to provide some macros to make this easier in simple cases, but we've been bitten a bit recently by the hidden locking that happens inside the UniFFI bindings, so moving it somewhere more clearly visible to developers seems like a win on balance.
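A minimal sketch of the shape that direction implies (illustrative, not uniffi's actual generated code): the exposed object is Sync + Send, every method takes &self, and any mutable state is guarded by a Mutex inside the Rust implementation, where the locking is visible to the developer.

```rust
use std::sync::Mutex;

// Sync + Send by construction: all mutable state lives behind an internal
// Mutex, so it's safe to call from any foreign thread without the bindings
// doing any hidden locking.
pub struct Counter {
    value: Mutex<i64>,
}

impl Counter {
    pub fn new() -> Self {
        Counter { value: Mutex::new(0) }
    }

    // &self, never &mut self; the lock is taken explicitly, in Rust.
    pub fn increment(&self) -> i64 {
        let mut v = self.value.lock().unwrap();
        *v += 1;
        *v
    }
}
```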


rfk commented May 17, 2021

> Finally, and what I think is a better safety vs complexity trade-off: just use pointers. (Note that you need to enforce T: Send
> for any type you do this with, and will be relying on locking happening in the caller. More on that in a sec).

Cross-referencing for anyone following this thread - we are actively working on a plan to replace our current use of handle maps with raw pointers. The initial plan is to wrap object instances in an Arc and use Arc::into_raw/Arc::from_raw for transiting the FFI boundary. We opted to start with the builtin Arc rather than an external crate because there are some places where the Arc might end up in user-visible function signatures, but we'll see how it goes in practice.

Ref #430 for evolving details of the plan.
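A minimal sketch of that into_raw/from_raw transit (the function names and MyObject are illustrative, not the actual generated scaffolding): ownership of one strong count crosses the FFI as a raw pointer, and each use temporarily reconstructs the Arc without dropping it.

```rust
use std::sync::Arc;

pub struct MyObject; // stand-in for a generated interface object

#[no_mangle]
pub extern "C" fn myobject_new() -> *const MyObject {
    // Transfer one strong reference to the foreign side as a raw pointer.
    Arc::into_raw(Arc::new(MyObject))
}

#[no_mangle]
pub extern "C" fn myobject_clone(ptr: *const MyObject) -> *const MyObject {
    // Temporarily reconstruct the Arc, bump the count, and forget it so the
    // caller's original reference isn't dropped.
    let arc = unsafe { Arc::from_raw(ptr) };
    let cloned = Arc::clone(&arc);
    std::mem::forget(arc);
    Arc::into_raw(cloned)
}

#[no_mangle]
pub extern "C" fn myobject_free(ptr: *const MyObject) {
    if !ptr.is_null() {
        // Reconstruct and drop the strong reference exactly once.
        unsafe { drop(Arc::from_raw(ptr)) };
    }
}
```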
