Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[metal] Revise NodeManager's implementation due to weak memory order #2008

Merged
merged 3 commits into from
Oct 31, 2020

Conversation

k-ye
Copy link
Member

@k-ye k-ye commented Oct 30, 2020

It's turned out that #2000's approach didn't really work, due to Metal's weak memory order guarantee...

Problem

Previously, each pointer cell stores a NodeManager::ElemIndex::raw_. This is basically an index in the data_list. The index then gets mapped to a chunk in the list, then a slot within that chunk.

This design has involved two places that would require atomic operations:

  1. Reading or allocating the cell's value.
  2. Reading or allocating the chunk in data_list.
def allocate():
  while cell is not valid:
    if atomic_cas(&cell, 1):  # <-- 1st atomic, `1` means lock
      addr = atomically allocate from `data_list`  # <-- 2nd atomic
      store `addr` into `cell`  # now `cell` is valid

def get():
  idx = atomically read from cell
  atomically load `addr` from `data_list` using `idx`
  return `addr`

If allocate() is done in thread A, such that allocate() in another thread B sees that cell is already valid, due to the relaxed memory order, B's get() could still observe invalid addr...

Solution

Just store the allocated pointer offset (32-bit, ListManagerData::ReservedElemPtrOffs) into cell directly. This avoids the second lookup into data_list. Note that to reduce code change, NodeManager::ElemIndex is just an alias for ListManagerData::ReservedElemPtrOffs now.

Related issue = #1740, #1174

[Click here for the format server]


@k-ye k-ye changed the title [metal] Revise NodeManager/ListManager's implementation due to weak memory order [metal] Revise NodeManager's implementation due to weak memory order Oct 30, 2020
Copy link
Member

@yuanming-hu yuanming-hu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! As always I fully trust your implementation.

Regarding the memory order issue, I find memory fences pretty useful in the CUDA backend. There are multiple levels of fences: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#memory-fence-functions I guess Metal may have something similar. Not sure if it is related though.

@codecov
Copy link

codecov bot commented Oct 30, 2020

Codecov Report

Merging #2008 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2008   +/-   ##
=======================================
  Coverage   43.51%   43.51%           
=======================================
  Files          45       45           
  Lines        6264     6264           
  Branches     1109     1109           
=======================================
  Hits         2726     2726           
  Misses       3365     3365           
  Partials      173      173           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dea88d0...822bf6c. Read the comment docs.

@k-ye
Copy link
Member Author

k-ye commented Oct 30, 2020

Regarding the memory order issue, I find memory fences pretty useful in the CUDA backend.

Yep. Unfortunately, Metal's synchronization barriers are pretty rudimentary compared to CUDA's... The doc only mentions barriers that are for both execution and memory, and it's only scoped to threadgroups (~= a CUDA block)...

Screen Shot 2020-10-31 at 8 51 38

Thinking further, Metal clearly doesn't support more advanced memory orders, given that the only memory order it has in the atomics is metal::memory_order_relaxed...

@k-ye k-ye merged commit a878232 into taichi-dev:master Oct 31, 2020
@k-ye k-ye deleted the mtl-ptr-fix branch October 31, 2020 00:42
@yuanming-hu yuanming-hu mentioned this pull request Oct 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants