Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ABA primitives with LL/SC emulation? #50

Closed
nwf opened this issue May 13, 2019 · 7 comments
Closed

Replace ABA primitives with LL/SC emulation? #50

nwf opened this issue May 13, 2019 · 7 comments

Comments

@nwf
Copy link
Collaborator

nwf commented May 13, 2019

CHERI builds primarily on RISC platforms, specifically MIPS and RISC V, which have native LL/SC support. AFAICT, MIPS R4K promises relatively strong LL/SC operation (permitting loads and stores from the core having done the LL); RISC V promises little but presumably real implementations will also be relatively strong. Anyway, on these platforms, I think ds/aba.h is more of a hindrance than a help; the sole user, ds/mpmcstack.h is forced to phrase its operations in terms of CAS and so will always require ABA protection. While the existing non-X86 paths in ds/aba.h optimistically fall back to just a pointer-sized CAS, this fails to deliver effective ABA protection

I am unsure what the correct answer is, but I think, at least broadly, that mpmcstack should be rewritten in terms of a LL/SC primitive that can be provided by some platform-specific code. I haven't come up with a better answer than something like this, tho', which isn't great, so other thoughts are more than welcome. Notably, this approach, I think, relies very, very heavily on the compiler's ability to inline to produce good code.

  template<typename T, Construction c = RequiresInit>
  class LLSC
  {
    /* Cmp defined as in existing ds/aba */
  private:
   Cmp value;

  public:
    inline bool llsc(std::function<bool, T*> f, bool* out)
    {
      T t;
      // Load link t from this, or load t and the generation counter
      *out = f(&t);
      if (*out)
      {
        // Store conditional or CAS2 with incremented generation counter
      }
      else
      {
        return true; // will cause exit from calling loop, which can look at *out
      }
    }
  }
@mjp41
Copy link
Member

mjp41 commented May 13, 2019

We really require the only way to make a Cmp is Cmp read(), then I think the current interface may be fine for LL/SC. I.e. the

using Cmp = T*;

Is not good enough. But the idea is that you can only compare exchange with a previously read value. The previously read value would be LL, and then the compare exchange would be a SC.

I am unsure on the semantics of LL/SC around multiple LL operations, i.e. what would happen if a thread has multiple ABA operations in progress, then I am not sure this API would protect enough.

@nwf
Copy link
Collaborator Author

nwf commented May 14, 2019

Most implementations permit only a single LL/SC in flight at once, which is a bit of a travesty, but is what we have. That said, I have been hacking up a LL/SC implementation of ds/aba.h following your suggestion that load() is LL and will let you know how it goes.

@mjp41
Copy link
Member

mjp41 commented May 14, 2019

Great thanks.

We probably want the Debug version to store the address read from in the Cmp. So that we can check the exchange is on the right location.

In fact, it might be interesting to change the API a little

    struct Cmp
    {
      T* ptr;
      uintptr_t aba;  // For X86
      T** addr;  // ABA protected location read from

      bool compare_exchange(T* t) { ... }
    };

Then compare_exchange is guaranteed to be on the same location.

So effectively would do something like

    ABA a = ...;
    Cmp c = a.read();
    do
    {
       ....
    } while (c.exchange(new_v));

in the mpmcstack.

@mjp41
Copy link
Member

mjp41 commented Apr 2, 2020

@nwf Did you ever get any further on this? I realised the ARM implementation does not actually do the load-link/store-conditional, so is broken. I am about to work on this, but if you have thoughts please let me know.

@nwf
Copy link
Collaborator Author

nwf commented Apr 3, 2020

No, I think whatever I did down this path I long-since abandoned. The work for Cornucopia used some rather hackish inline assembler, as found in bb81acc from https://github.com/CTSRD-CHERI/snmalloc .

@mjp41
Copy link
Member

mjp41 commented Apr 3, 2020

@nwf, Thanks Wes, I think the new API I have implemented is roughly what we agreed here, so hopefully will match LL/SC semantics for ARM/MIPS/RISC V. I have a back up implementation using a lock, so any platform should work, and then we can make it non-blocking with custom implementations later.

@mjp41
Copy link
Member

mjp41 commented Apr 10, 2020

Done in #161

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants