
Add documentation for the combining lock #683

Merged: 4 commits into microsoft:main, Oct 5, 2024

Conversation

mjp41 (Member) commented Oct 2, 2024

This adds some documentation to make the combining lock easier to understand. This is working towards documenting the changes for the 0.7 release.

SchrodingerZhu (Contributor) commented Oct 2, 2024

I happen to have some related documented code, in case it helps:

https://github.com/llvm/llvm-project/pull/101916/files


P.S. "MCS queue" is not the correct name for the MCS lock; I will need to correct that.

mjp41 (Member, Author) commented Oct 2, 2024

> I happen to have some related documented code, in case it helps:
>
> https://github.com/llvm/llvm-project/pull/101916/files
>
> P.S. "MCS queue" is not the correct name for the MCS lock; I will need to correct that.

Thanks for the link. If you have any suggestions for making this clearer, please do add them here. I'm thinking of refactoring the code a little more.

mjp41 (Member, Author) commented Oct 2, 2024

@SchrodingerZhu @vishalgupta97 I'd be really keen to get your feedback on this PR. Does the markdown file make sense?

SchrodingerZhu (Contributor):

Yes. I like the detailed document with performance data.

One difference in my LLVM-libc patch is that if the lock is grabbed on the first attempt, the templated lambda is called directly, before spilling anything explicitly to the stack. I think you have also refactored it in the same way in the latest commit.
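
A minimal sketch of that fast path may help. This is an assumption-laden illustration: the names `CombiningLock` and `with` are invented here, not the actual snmalloc or LLVM-libc identifiers, and the slow path is stubbed out as a plain spin so the example stays self-contained.

```cpp
#include <atomic>
#include <utility>

struct CombiningLock {
  std::atomic<bool> locked{false};

  template <typename F>
  void with(F&& f) {
    // Fast path: lock acquired on the first attempt, so the templated
    // lambda is invoked directly and no queue node is spilled to the stack.
    if (!locked.exchange(true, std::memory_order_acquire)) {
      std::forward<F>(f)();
      locked.store(false, std::memory_order_release);
      return;
    }
    // Slow path (stubbed): a real combining lock would enqueue a node
    // here and either wait or have the lock holder run `f` on our behalf.
    while (locked.exchange(true, std::memory_order_acquire)) { /* spin */ }
    std::forward<F>(f)();
    locked.store(false, std::memory_order_release);
  }
};

// Usage: lock.with([&] { shared_counter++; });
```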

> The core idea of the combining lock is that it holds a queue of operations that need to be executed while holding the lock,
> and then the thread holding the lock can execute multiple threads' operations in a single lock acquisition.
> This can reduce the number of cache misses, as a single thread performs multiple updates,
> and thus overall the system will have fewer cache misses.


Cache misses for shared data will go down, but there will still be cache misses for bringing in each queue node (which contains information about the operation to execute, e.g. a function pointer). Overall, the system might not have fewer cache misses.
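
A minimal sketch may make the trade-off concrete; the names here (`CombiningNode`, `run_queue`) are illustrative, not the snmalloc implementation. Each waiter publishes a node carrying a function pointer, and the lock holder walks the queue executing everyone's operations. Each node typically lives on another thread's stack, so loading it is itself a likely cache miss, which is the cost pointed out above.

```cpp
#include <atomic>

struct CombiningNode {
  std::atomic<int> status{0};                 // 0 = waiting, 1 = done
  std::atomic<CombiningNode*> next{nullptr};  // Next waiter in the queue.
  void (*op)(void*);                          // Operation to run under the lock.
  void* env;                                  // State captured for the operation.
};

// Executed by the lock holder. Detecting the true end of the queue
// (via the lock's tail pointer) is elided from this sketch.
inline void run_queue(CombiningNode* head) {
  for (CombiningNode* curr = head; curr != nullptr;) {
    curr->op(curr->env);  // Execute this waiter's operation.
    CombiningNode* next = curr->next.load(std::memory_order_acquire);
    // Wake the waiter only after `next` has been read: once status is
    // set, the waiter may return and its stack node may be reused.
    curr->status.store(1, std::memory_order_release);
    curr = next;
  }
}
```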


> * Back-off strategies
> * Integration with Futex/WaitOnAddress


For the queue A->B->C: before executing B, a prefetch for C's queue node can be issued, so that when the time comes to execute C, some of the latency can be hidden. A sketch of this follows.
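
A hypothetical sketch of this suggestion, reusing the `CombiningNode` type from the sketch above; `__builtin_prefetch` is the GCC/Clang intrinsic, and other compilers would need an equivalent hint.

```cpp
inline void run_queue_prefetching(CombiningNode* head) {
  for (CombiningNode* curr = head; curr != nullptr;) {
    CombiningNode* next = curr->next.load(std::memory_order_acquire);
    if (next != nullptr)
      __builtin_prefetch(next);  // Start fetching C's node early.
    curr->op(curr->env);         // Execute B while C's node is in flight.
    curr->status.store(1, std::memory_order_release);
    curr = next;
  }
}
```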


```cpp
// Notify the thread that we completed its work.
// Note that this needs to be done last, as we can't read
// curr->next after setting curr->status
```


Once the load of curr->next is done, Line 159 and Line 164 can be reordered.
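
A small illustrative helper (again reusing the `CombiningNode` sketch, not the actual code around the quoted lines) capturing why the order is fixed:

```cpp
inline CombiningNode* notify_and_advance(CombiningNode* curr) {
  // 1. Read next BEFORE notifying: after the status store, the waiter
  //    may return and its stack node may be reused or overwritten.
  CombiningNode* next = curr->next.load(std::memory_order_acquire);
  // 2. Release store: publishes the queued operation's effects to the
  //    waiter and marks its work as complete.
  curr->status.store(1, std::memory_order_release);
  // 3. `curr` must not be dereferenced again after this point.
  return next;
}
```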

> This can be bad for performance; however, with the MCS queue lock the threads spin on individual cache lines, so some of the worst effects of [spinning are mitigated](https://pdos.csail.mit.edu/papers/linux:lock.pdf).
> However, for more general-purpose use, the combining lock should have a back-off strategy that increases the time between checks the longer a thread waits.
>
> The second improvement would be to integrate the combining lock with the OS-level support for waiting.
SchrodingerZhu (Contributor) commented Oct 3, 2024

Rust's synchronization routines generally spin a fixed number of times (100, a somewhat arbitrarily chosen number) before going to sleep, similar to the following:

```cpp
int remaining_spins = SPIN_COUNT;
// Do spin polling for a certain amount of time.
while (remaining_spins > 0) {
  if (header.status.load(cpp::MemoryOrder::RELAXED) !=
      RawLambdaLockHeader::WAITING)
    break;
  sleep_briefly();
  remaining_spins--;
}
// If we used up all the spins, we may need to go to sleep.
if (remaining_spins == 0) {
  FutexWordType expected = RawLambdaLockHeader::WAITING;
  if (header.status.compare_exchange_strong(
          expected, RawLambdaLockHeader::SLEEPING,
          cpp::MemoryOrder::ACQ_REL, cpp::MemoryOrder::RELAXED))
    header.status.wait(RawLambdaLockHeader::SLEEPING);
}
```

However, that was for polling a shared word. I'm not sure whether exponential back-off or other schemes would be more applicable to an MCS lock.
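
For comparison, a hedged sketch of what exponential back-off could look like for such a wait loop; `wait_with_backoff` and its constants are invented for illustration, and whether this beats fixed-count spinning for MCS-style nodes (each already spinning on a private cache line) is exactly the open question above.

```cpp
#include <atomic>
#include <thread>

inline void wait_with_backoff(const std::atomic<int>& status) {
  int pause = 1;
  constexpr int kMaxPause = 1 << 10;
  while (status.load(std::memory_order_acquire) == 0 /* WAITING */) {
    for (int i = 0; i < pause; i++)
      std::this_thread::yield();  // Or a cheaper pause/spin-hint instruction.
    if (pause < kMaxPause)
      pause *= 2;  // Exponential back-off between polls.
  }
}
```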

mjp41 merged commit c770769 into microsoft:main on Oct 5, 2024
52 checks passed
mjp41 deleted the combininglockdoc branch on November 19, 2024