[metal] Revise NodeManager's implementation due to weak memory order #2008

k-ye · 2020-10-30T14:02:36Z

It turned out that #2000's approach didn't really work, due to Metal's weak memory order guarantees.

Problem

Previously, each pointer cell stored a NodeManager::ElemIndex::raw_, which is basically an index into data_list. The index was first mapped to a chunk in the list, then to a slot within that chunk.
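To make the index-to-chunk-to-slot mapping concrete, here is a minimal sketch. The power-of-two chunk size and the names `LOG2_ELEMS_PER_CHUNK` and `locate` are assumptions made for illustration; this PR does not state how the real mapping is computed.

```python
# Hypothetical chunk size: 8 slots per chunk. The real value is not given in this PR.
LOG2_ELEMS_PER_CHUNK = 3
ELEMS_PER_CHUNK = 1 << LOG2_ELEMS_PER_CHUNK


def locate(raw_index):
    """Split a flat element index into (chunk, slot within that chunk)."""
    chunk = raw_index >> LOG2_ELEMS_PER_CHUNK
    slot = raw_index & (ELEMS_PER_CHUNK - 1)
    return chunk, slot
```

Under these assumptions, index 19 falls into chunk 2, slot 3. The point is only that resolving an index needs a second lookup (the chunk) before the final address is known.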

This design involved two places that required atomic operations:

1. Reading or allocating the cell's value.
2. Reading or allocating the chunk in data_list.

def allocate():
  while cell is not valid:
    if atomic_cas(&cell, 1):  # <-- 1st atomic, `1` means lock
      addr = atomically allocate from `data_list`  # <-- 2nd atomic
      store `addr` into `cell`  # now `cell` is valid

def get():
  idx = atomically read from cell
  atomically load `addr` from `data_list` using `idx`
  return `addr`

Suppose thread A runs allocate(), and allocate() in another thread B then sees that cell is already valid. Because of the relaxed memory order, B's subsequent get() could still observe an invalid addr.
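The failure can be pictured with a deterministic toy model of one possible interleaving. This is only a sketch: it assumes that thread A's two stores become visible to thread B in the opposite of program order, and the `OldScheme` class, the index `5`, and the address `0x1000` are made up for illustration. It does not run on real atomics.

```python
INVALID = None


class OldScheme:
    """Toy model of the old two-level layout, as seen from thread B."""

    def __init__(self):
        self.cell = INVALID   # holds an index into data_list once allocated
        self.data_list = {}   # index -> address, as currently visible to B


def get(state):
    """Thread B's lookup: read the cell, then resolve it through data_list."""
    idx = state.cell
    if idx is INVALID:
        return INVALID
    return state.data_list.get(idx, INVALID)


def run_reordered():
    state = OldScheme()
    idx, addr = 5, 0x1000  # made-up values

    # Thread A wrote data_list first and the cell second, but we assume
    # the cell store becomes visible to thread B first.
    state.cell = idx
    seen_by_b = get(state)        # B sees a valid cell, but no address yet

    state.data_list[idx] = addr   # A's earlier store finally becomes visible
    seen_later = get(state)
    return seen_by_b, seen_later
```

In this model B's first `get()` returns an invalid result even though the cell already looks valid, which is the symptom described above.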

Solution

Just store the allocated pointer offset (32-bit, ListManagerData::ReservedElemPtrOffs) into cell directly. This avoids the second lookup into data_list. To keep the code change small, NodeManager::ElemIndex is now just an alias for ListManagerData::ReservedElemPtrOffs.
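A rough sketch of the revised scheme follows, with the cell holding the final offset so that get() is a single read. Everything here is an illustrative assumption rather than the actual Metal code: `Cell` emulates atomics with a Python lock, `BumpAllocator` is a hypothetical stand-in for the list manager, and the sentinel values `INVALID = 0` and `LOCKED = 1` follow the `1 means lock` convention from the pseudocode above.

```python
import threading

INVALID = 0  # cell not allocated yet
LOCKED = 1   # some thread is allocating (the `1` in the pseudocode)


class Cell:
    """One 32-bit cell; the lock only emulates hardware atomics."""

    def __init__(self):
        self._value = INVALID
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def store(self, value):
        with self._guard:
            self._value = value

    def compare_and_swap(self, expected, desired):
        with self._guard:
            if self._value == expected:
                self._value = desired
                return True
            return False


class BumpAllocator:
    """Hypothetical stand-in for the list manager: hands out offsets."""

    def __init__(self, first_offset=64, stride=16):
        self._next = first_offset
        self._stride = stride
        self._guard = threading.Lock()

    def reserve(self):
        with self._guard:
            offs = self._next
            self._next += self._stride
            return offs


def allocate(cell, allocator):
    """Return the cell's offset, allocating it exactly once."""
    while True:
        value = cell.load()
        if value > LOCKED:               # already a valid offset
            return value
        if cell.compare_and_swap(INVALID, LOCKED):
            offs = allocator.reserve()   # only the lock holder allocates
            cell.store(offs)             # the cell now holds the final offset
            return offs
        # Otherwise another thread holds the lock: spin and re-read.


def get(cell):
    """A single read of the cell; no second lookup is needed."""
    return cell.load()
```

Because the value read from the cell is already the offset, a reader that sees a valid cell has everything it needs from that one atomic read.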

Related issue = #1740, #1174

yuanming-hu

Thank you! As always I fully trust your implementation.

Regarding the memory order issue, I find memory fences pretty useful in the CUDA backend. There are multiple levels of fences: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#memory-fence-functions

I guess Metal may have something similar. Not sure if it is related, though.

codecov · 2020-10-30T23:51:29Z

Codecov Report

Merging #2008 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #2008   +/-   ##
=======================================
  Coverage   43.51%   43.51%           
=======================================
  Files          45       45           
  Lines        6264     6264           
  Branches     1109     1109           
=======================================
  Hits         2726     2726           
  Misses       3365     3365           
  Partials      173      173

Last update dea88d0...822bf6c.

k-ye · 2020-10-30T23:57:41Z

> Regarding the memory order issue, I find memory fences pretty useful in the CUDA backend.

Yep. Unfortunately, Metal's synchronization barriers are pretty rudimentary compared to CUDA's. The doc only mentions barriers that cover both execution and memory, and they are scoped only to threadgroups (roughly a CUDA block).

Thinking further, Metal clearly doesn't support more advanced memory orders, given that the only memory order it has in the atomics is metal::memory_order_relaxed...

[metal] Revise NodeManager/ListManager's implementation

c30f943

k-ye requested review from yuanming-hu and taichi-gardener October 30, 2020 14:02

k-ye changed the title ~~[metal] Revise NodeManager/ListManager's implementation due to weak memory order~~ [metal] Revise NodeManager's implementation due to weak memory order Oct 30, 2020

[skip ci] enforce code format

afb66ee

yuanming-hu approved these changes Oct 30, 2020

Merge branch 'master' into mtl-ptr-fix

822bf6c

k-ye merged commit a878232 into taichi-dev:master Oct 31, 2020

k-ye deleted the mtl-ptr-fix branch October 31, 2020 00:42

yuanming-hu mentioned this pull request Oct 31, 2020

[release] v0.7.4 #2013

Merged
