
Add SGX thread parker #123

Merged: 8 commits, merged into Amanieu:master on Mar 22, 2019
Conversation

@faern (Collaborator) commented Mar 18, 2019

Adds a ThreadParker implementation that is specific to SGX, so that platform does not need to fall back to the generic spin-loop-based implementation.

This is in preparation of rust-lang/rust#56410.

Ping @jethrogb, who is familiar with the platform. I have read the documentation for wait and send, and I have tried cross-compiling this code with --target x86_64-fortanix-unknown-sgx to make sure it at least compiles. But I have not actually run it, since I'm not sure how to do that. Please advise on the implementation if you have any feedback.
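For readers unfamiliar with the platform, the two usercalls pair up roughly like this (a minimal sketch against the std::os::fortanix_sgx::usercalls API; the helper names are mine, not from the PR):

use std::io;
use std::os::fortanix_sgx::usercalls::{
    self,
    raw::{Tcs, EV_UNPARK, WAIT_INDEFINITE},
};

// Parked side: block until EV_UNPARK is delivered to this thread's TCS.
// The call may return spuriously, so the caller must re-check its own state.
fn wait_for_unpark() -> io::Result<u64> {
    usercalls::wait(EV_UNPARK, WAIT_INDEFINITE)
}

// Unparking side: deliver EV_UNPARK to the parked thread's TCS.
fn send_unpark(target: Tcs) -> io::Result<()> {
    usercalls::send(EV_UNPARK, Some(target))
}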

It would be good if someone with access to an SGX machine could run the benchmark tool in parking_lot before and after this PR, so we can get some rough numbers on the performance of parking_lot compared to the standard library locks. See this comment: rust-lang/rust#56410 (comment)

@jethrogb left a comment

Cool!

(Review threads on core/src/lib.rs and core/src/thread_parker/sgx.rs, resolved)
@jethrogb

I tried to run the benchmark but it just hangs. I can investigate later.

@faern (Collaborator, Author) commented Mar 18, 2019

> I tried to run the benchmark but it just hangs. I can investigate later.

Let's save the benchmarks until we have verified that the implementation is correct, then. How about the normal test suite?

@jethrogb

> Reverse SGX cfg check to solve experimental target_vendor

Wow, neat trick! NB. target_vendor was stabilized in 1.33.
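(Context for other readers: before 1.33, cfg(target_vendor) was feature-gated, so merely writing #[cfg(target_vendor = "fortanix")] was a hard error on stable compilers. The commit title suggests flipping the check so the unstable cfg is never named; one plausible shape of that trick, keying off the stable target_env instead, purely as an illustration:)

// Select the SGX parker without ever naming the unstable `target_vendor`;
// among supported targets, only x86_64-fortanix-unknown-sgx sets
// target_env = "sgx".
#[cfg(all(target_arch = "x86_64", target_env = "sgx"))]
mod imp {
    // SGX ThreadParker would live here.
}

// Everything else falls back to the generic spin-loop parker.
#[cfg(not(all(target_arch = "x86_64", target_env = "sgx")))]
mod imp {
    // Generic fallback ThreadParker would live here.
}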

@jethrogb

All unit tests pass, except rwlock::tests::test_rwlock_recursive, which calls the unsupported std::thread::sleep.

@faern (Collaborator, Author) commented Mar 18, 2019

Well, that's a really good start! I have now modified the test to not use thread::sleep on SGX, but rather just yield the thread 100 times, in the hope that this gives the second thread time to reach the write-locking call. I'm not sure 100 is a suitable number of yields; I guess running the tests a few times will have to determine that.
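The change is along these lines (a sketch; wait_a_bit is a hypothetical helper, the real test inlines this, and the sleep duration is illustrative):

#[cfg(not(target_env = "sgx"))]
fn wait_a_bit() {
    std::thread::sleep(std::time::Duration::from_millis(100));
}

#[cfg(target_env = "sgx")]
fn wait_a_bit() {
    // thread::sleep is unsupported on SGX, so yield repeatedly instead to
    // give the other thread a chance to reach its write-lock call.
    for _ in 0..100 {
        std::thread::yield_now();
    }
}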

@faern (Collaborator, Author) commented Mar 18, 2019

If all tests pass but the benchmarks appear to "hang", then maybe they are just incredibly slow and need some more time to complete? 🤔 If not, we might have a bug that the tests don't catch.

@jethrogb

But it says it's going to run for one second?

@faern (Collaborator, Author) commented Mar 18, 2019

> But it says it's going to run for one second?

Yeah, but the last argument, testIterations, multiplies that. If testIterations is 2, then each test will run for two seconds.

EDIT: Yes, it's supposed to be time limited. But you never know; when things act weird, try weird things :) You said earlier that it might be sketchy to depend on Instant::now() for scheduling on SGX, so maybe whatever the benchmarks do to limit the time of each run does not work there for some reason.

(Review thread on .travis.yml, resolved)
@jethrogb commented Mar 19, 2019

The hanging seems to be somehow related to sleep panicking. It's possible this is fixed as part of rust-lang/rust#59136. Alternatively, this could be rust-lang/rust#58042 (which looks like it might not get merged).

I was able to run the benchmarks by getting rid of the sleeps:

diff --git a/benchmark/src/mutex.rs b/benchmark/src/mutex.rs
index 519b2c6..2cd96cb 100644
--- a/benchmark/src/mutex.rs
+++ b/benchmark/src/mutex.rs
@@ -16,7 +16,7 @@ use std::cell::UnsafeCell;
 use std::sync::atomic::{AtomicBool, Ordering};
 use std::sync::{Arc, Barrier};
 use std::thread;
-use std::time::Duration;
+use std::time::{Duration, SystemTime};
 
 trait Mutex<T> {
     fn new(v: T) -> Self;
@@ -133,7 +133,10 @@ fn run_benchmark<M: Mutex<f64> + Send + Sync + 'static>(
         }));
     }
 
-    thread::sleep(Duration::from_secs(seconds_per_test as u64));
+    let start = SystemTime::now();
+    let max = Duration::from_secs(seconds_per_test as u64);
+    while start.elapsed().unwrap() < max {}
+
     keep_going.store(false, Ordering::Relaxed);
     threads.into_iter().map(|x| x.join().unwrap().0).collect()
 }
diff --git a/benchmark/src/rwlock.rs b/benchmark/src/rwlock.rs
index 0b724b5..23837c8 100644
--- a/benchmark/src/rwlock.rs
+++ b/benchmark/src/rwlock.rs
@@ -17,7 +17,7 @@ use std::cell::UnsafeCell;
 use std::sync::atomic::{AtomicBool, Ordering};
 use std::sync::{Arc, Barrier};
 use std::thread;
-use std::time::Duration;
+use std::time::{Duration, SystemTime};
 
 trait RwLock<T> {
     fn new(v: T) -> Self;
@@ -210,7 +210,10 @@ fn run_benchmark<M: RwLock<f64> + Send + Sync + 'static>(
         }));
     }
 
-    thread::sleep(Duration::new(seconds_per_test as u64, 0));
+    let start = SystemTime::now();
+    let max = Duration::from_secs(seconds_per_test as u64);
+    while start.elapsed().unwrap() < max {}
+
     keep_going.store(false, Ordering::Relaxed);
 
     let run_writers = writers

Here are some bad benchmarks (testIterations=10):

mutex - 66d67bb (master)

- Running with 2 threads
- 1 iterations inside lock, 0 iterations outside lock
- 1 seconds per test
        name         |    average     |     median     |    std.dev.   
parking_lot::Mutex   |  16676.166 kHz |  15104.691 kHz |   7120.573 kHz
std::sync::Mutex     |     97.798 kHz |     92.837 kHz |     31.932 kHz

mutex - e958443 (this PR)

- Running with 2 threads
- 1 iterations inside lock, 0 iterations outside lock
- 1 seconds per test
        name         |    average     |     median     |    std.dev.   
parking_lot::Mutex   |  17004.625 kHz |  16708.983 kHz |   8024.508 kHz
std::sync::Mutex     |     79.578 kHz |     69.929 kHz |     33.624 kHz

rwlock - 66d67bb (master)

- Running with 1 writer threads and 1 reader threads
- 1 iterations inside lock, 0 iterations outside lock
- 1 seconds per test
parking_lot::RwLock  - [write]  28628.331 kHz [read]    428.672 kHz
seqlock::SeqLock     - [write]  31885.639 kHz [read]  11868.928 kHz
std::sync::RwLock    - [write]    453.986 kHz [read]   4768.948 kHz

rwlock - e958443 (this PR)

- Running with 1 writer threads and 1 reader threads
- 1 iterations inside lock, 0 iterations outside lock
- 1 seconds per test
parking_lot::RwLock  - [write]  39197.440 kHz [read]    183.720 kHz
seqlock::SeqLock     - [write]  35924.117 kHz [read]   4215.011 kHz
std::sync::RwLock    - [write]    521.821 kHz [read]   7776.407 kHz

The RwLock read performance seems worrisome. Then again, this is on my dual-core system, where one core is just busy-waiting.

@faern (Collaborator, Author) commented Mar 19, 2019

> The RwLock read performance seems worrisome. Then again, this is on my dual-core system, where one core is just busy-waiting.

Given that the same benchmark run also shows seqlock::SeqLock as 2.8x slower and std::sync::RwLock 1.6x faster, even though the code for those did not change, I do not trust the accuracy of that benchmark very much.

If you occupy 50% of your CPU resources with busy-looping in the benchmark runner, it's not very strange that the results become a bit unreliable.

Can you do multiple runs and try to see some general trend/average?

@jethrogb

I've run it multiple times; RwLock read is always very slow. I can try running this on a machine with more cores later.

@faern (Collaborator, Author) commented Mar 19, 2019

Not sure if it will affect your results, but try running the benchmarks with --features nightly, since that will activate all the bells and whistles of parking_lot.

@Amanieu (Owner) commented Mar 20, 2019

This is because parking_lot prioritizes writers over readers when there is contention. If you do a benchmark with only reader threads then you will get much better numbers.

@faern force-pushed the add-sgx-thread-parker branch 2 times, most recently from 9b1508d to 9abfb1c, on March 20, 2019
@faern (Collaborator, Author) commented Mar 20, 2019

@jethrogb Can you verify whether the change I made to the test_rwlock_recursive test works reliably?

I have tried rebasing my libstd PR, #119, on top of this locally and ran the entire dist-various-2 test, which includes SGX. It passed. So unless we find more evidence that this is slowing things down, I'm starting to feel this is ready for review/merge.

I tried running SGX stuff myself, including the tests/benches, but my Rust build server is running in a Xen VM, and it seems that's not really possible without patching Xen?

    // released to avoid blocking the queue for too long.
    #[inline]
    pub fn unpark(self) {
        usercalls::send(EV_UNPARK, Some(self.0)).expect("send returned error");
@Amanieu (Owner):

If we implement timeouts in the future, the send here can actually fail if the thread's wait timed out and it then exited. I assume that all wait calls are supposed to properly handle unexpected EV_UNPARK events.

@jethrogb:

> the send here can actually fail if the thread's wait timed out and it then exited.

In that case, the event will actually be queued for the next thread that's going to run at that TCS. But I suppose once we implement SGX2, TCSes can be dynamic, so your point still holds then.

> I assume that all wait calls are supposed to properly handle unexpected EV_UNPARK events.

Yes.

@faern (Collaborator, Author):

OK, I will change this to just ignore any returned error. Or do you think I should check specifically for the InvalidInput error returned on an invalid TCS and ignore only that ErrorKind?


I'm always in favor of more specific error checking :)

@faern (Collaborator, Author) commented Mar 20, 2019

I have now changed it to only panic on errors other than InvalidInput. But if we decide to use debug_assert_eq! in the other discussion, then the consistent thing here would be to only check this error when debug_assertions are enabled as well.
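Concretely, the error handling described here can look like this (a sketch of the described behavior; the panic message and exact structure are mine):

use std::io::ErrorKind;
use std::os::fortanix_sgx::usercalls::{
    self,
    raw::{Tcs, EV_UNPARK},
};

fn unpark(tcs: Tcs) {
    if let Err(error) = usercalls::send(EV_UNPARK, Some(tcs)) {
        // InvalidInput means the target TCS is invalid (e.g. the thread's
        // wait timed out and it exited); anything else is unexpected
        // misbehavior by the operating environment.
        if error.kind() != ErrorKind::InvalidInput {
            panic!("send returned an unexpected error: {:?}", error);
        }
    }
}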

    pub fn park(&self) {
        while self.parked.load(Ordering::Acquire) {
            let res = usercalls::wait(EV_UNPARK, WAIT_INDEFINITE).expect("wait returned error");
            assert_eq!(res & EV_UNPARK, EV_UNPARK);
@Amanieu (Owner):

Can you make this a debug_assert_eq!, for consistency with the other parkers, and so that we avoid crashes on unexpected errors in production code? Same with the expect.

@jethrogb:

That wouldn't be appropriate:

https://edp.fortanix.com/docs/api/fortanix_sgx_abi/struct.Usercalls.html#method.wait

> Enclaves must not assume that this call only returns in response to valid events generated by the enclave. This call may return for invalid event sets, or before timeout has expired even though no event is pending.

I guess it doesn't say it explicitly, but even though those conditions mustn't be assumed, any deviation should be considered misbehavior by the operating environment, and an abort is appropriate.

@faern (Collaborator, Author):

The parking algorithm does not really care why or how we were woken up. All it needs to do is check self.parked and possibly go back to sleep.

So I guess ultimately this comes down to whether we want parking_lot to fail fast and loud on any event we did not anticipate, or to just ignore it and carry on with its task. The second seems to be what it does on other platforms, so I'll change to that for now, and you can debate which one should be the final version.
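A sketch of that tolerant loop (with the checks demoted to debug assertions, as suggested above; the field layout is assumed):

use std::os::fortanix_sgx::usercalls::{
    self,
    raw::{EV_UNPARK, WAIT_INDEFINITE},
};
use std::sync::atomic::{AtomicBool, Ordering};

struct ThreadParker {
    parked: AtomicBool,
}

impl ThreadParker {
    fn park(&self) {
        while self.parked.load(Ordering::Acquire) {
            // Whatever woke us (a real unpark, a spurious return, or an odd
            // event set), just loop around and re-check `parked`.
            if let Ok(events) = usercalls::wait(EV_UNPARK, WAIT_INDEFINITE) {
                debug_assert_eq!(events & EV_UNPARK, EV_UNPARK);
            }
        }
    }
}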

@jethrogb

> I have tried rebasing my libstd PR, #119, on top of this locally and ran the entire dist-various-2 test, which includes SGX. It passed.

SGX is a tier 2 target, so the CI only builds it; it doesn't run any tests.

> I tried running SGX stuff myself, including the tests/benches, but my Rust build server is running in a Xen VM, and it seems that's not really possible without patching Xen?

The only hypervisor I'm aware of with production-ready SGX support is Hyper-V, and only in Azure.

I'll do some more testing today.

@Amanieu merged commit ba43375 into Amanieu:master on Mar 22, 2019
@Amanieu (Owner) commented Mar 22, 2019

Thanks!

@faern deleted the add-sgx-thread-parker branch on March 22, 2019