-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add documentation for the combining lock #683
Conversation
03487ef
to
385a5ef
Compare
I happen to have another related documented code in case it helps: https://github.com/llvm/llvm-project/pull/101916/files P.S. MCS queue is not the correct name for MCS lock. I will need to correct that. |
This adds some documentation to make the combining lock easier to understand. This is working towards documenting the changes for the 0.7 release.
385a5ef
to
0211bb7
Compare
Thanks for the link. If you have any suggestions for making this clearer please do add them here. I'm thinking of refactoring the code a little bit more. |
@SchrodingerZhu @vishalgupta97 I'd be really keen to get your feedback on this PR. Does the markdown file make sense? |
Yes. I like the detailed document with performance data. One difference in my LLVM-libc patch was that if I grab the lock at the first place, I will try to call the templated lambda before spilling anything explicitly on stack. I think you have also refactored it in the same way in the latest commit. |
docs/combininglock.md
Outdated
The core idea of the combining lock is that it holds a queue of operations that need to be executed while holding the lock, | ||
and then the thread holding the lock can execute multiple threads' operations in a single lock acquisition. | ||
This can reduce the amount of cache misses as a single thread performs multiple updates, | ||
and thus overall the system will have fewer cache misses. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cache misses for shared data will go down, but there still be cache misses for bringing in the queue node (containing information about what operation to execute (function pointer)). Overall, the system might not have fewer cache misses.
|
||
* Back off strategies | ||
* Integration with Futex/Waitonaddress | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For queue: A->B->C
Before executing B, a prefetch for C's queue node can be issued, so that when the time comes to execute C, some of the latency can be hidden.
src/snmalloc/ds/combininglock.h
Outdated
|
||
// Notify the thread that we completed its work. | ||
// Note that this needs to be done last, as we can't read | ||
// curr->next after setting curr->status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the load (curr->next) is done, Line 159 and Line 164 can be reordered.
This can be bad for performance, however, with the MCS queue lock the threads spin on individual cache lines so some of the worst effects of [spinning are mitigated](https://pdos.csail.mit.edu/papers/linux:lock.pdf). | ||
However, for a more general purpose the combining lock should have a back off strategy to increase the time between checks the longer it waits. | ||
|
||
The second improvement would be to integrate the combining lock with the OS level support for waiting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What Rust's synchronization routines are generally using is to spin 100
(a somehow chosen number) times before going into sleep, similar to the following:
int remaining_spins = SPIN_COUNT;
// Do spin polling for certain amount of time.
while (remaining_spins > 0) {
if (header.status.load(cpp::MemoryOrder::RELAXED) !=
RawLambdaLockHeader::WAITING)
break;
sleep_briefly();
remaining_spins--;
}
// If we used up all spins, we may need to go to sleep.
if (remaining_spins == 0) {
FutexWordType expected = RawLambdaLockHeader::WAITING;
if (header.status.compare_exchange_strong(
expected, RawLambdaLockHeader::SLEEPING,
cpp::MemoryOrder::ACQ_REL, cpp::MemoryOrder::RELAXED))
header.status.wait(RawLambdaLockHeader::SLEEPING);
}
However, that was for polling shared words. Not sure if exponential backoff or other schemes would become more applicable in MCS lock.
This adds some documentation to make the combining lock easier to understand. This is working towards documenting the changes for the 0.7 release.