Make scheduler-allocated data collectible #119

Merged: 4 commits, Jul 16, 2020

shwestrick
Collaborator

This implements two fixes:

  • Local data allocated by scheduler workers during their idle loops is now collectible
    • Scheduler threads are properly initialized at depth 1
    • Each thread now has a "minimum local collection depth" which for scheduler workers is set at 1. For fork-join threads, this is set at 2, to disallow local collections at the root of the user hierarchy.
    • Examples of such local data: random number generation state, idle timing counters, etc.
  • Scheduler-allocated threads are now collectible
    • These threads are now placed in the hierarchy and are subject to local collections.
    • (By "scheduler-allocated thread" I mean the threads that are allocated in order to execute stolen tasks.)
    • (Previously, these thread objects lived at depth 0, in the "global heap", which essentially made them uncollectible.)
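The "minimum local collection depth" rule can be sketched as a simple range check. This is a minimal illustration, not the runtime's actual code: the struct, the constant names, and the assumption that a local collection covers depths from the minimum up to the thread's current depth are all hypothetical. Only the values 1 (scheduler workers) and 2 (fork-join threads) come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the values 1 and 2 are taken from the PR description. */
enum {
  MIN_LC_DEPTH_SCHEDULER = 1, /* scheduler workers: depth 1 is collectible */
  MIN_LC_DEPTH_FORK_JOIN = 2  /* fork-join threads: root of user hierarchy is protected */
};

/* Simplified stand-in for a thread record (not the real runtime struct). */
struct lc_thread {
  uint32_t currentDepth;
  uint32_t minLocalCollectionDepth;
};

/* True if a heap at `depth` may be in scope for a local collection by `t`.
 * Assumption: the scope is [minLocalCollectionDepth, currentDepth]. */
static bool depth_in_local_scope(const struct lc_thread *t, uint32_t depth) {
  assert(t != NULL);
  return depth >= t->minLocalCollectionDepth && depth <= t->currentDepth;
}
```

Under these assumptions, a scheduler worker sitting at depth 1 can collect its own depth-1 heap, while a fork-join thread at depth 3 can collect depths 2 and 3 but never depth 1.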

Some preliminary testing has shown that these fixes are important in some cases. The runtime overhead appears to be small.

There was a bug in the initialization that places the scheduler
threads in the actual hierarchy: some scheduler threads were at
depth 1, and others were at depth 0. This fixes that bug.

Recall, the intended architecture is below, where H is the root
of the heap hierarchy (from the perspective of the source program)
and S0, S1, ... are the heaps of the scheduler threads, up to
(N-1) for N processors. The "global" heap, which is the only heap
at depth 0, contains any data allocated before scheduler
initialization, as well as shared scheduler data (e.g. deques)
and new thread records.

      +--------------------------+
   0  |         global           |
      +--------------------------+
        |      |       |      |
      +---+  +----+  +----+
   1  | H |  | S0 |  | S1 |  ...
      +---+  +----+  +----+

The cool thing about this architecture is that the existing local
collection algorithm works for collecting scheduler data S0, S1 etc.
without any modifications.
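As a rough sketch of the layout in the diagram, the hierarchy can be modeled as one depth-0 heap with H and the per-processor scheduler heaps as its depth-1 children. All type and function names here (heap_sketch, hierarchy_sketch, hierarchy_init, MAX_WORKERS) are hypothetical and exist only for illustration; the real runtime structures are more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WORKERS 64 /* arbitrary bound chosen for this sketch */

/* Simplified heap node: just a depth and a parent link. */
struct heap_sketch {
  uint32_t depth;
  const struct heap_sketch *parent; /* NULL for the global heap */
};

struct hierarchy_sketch {
  struct heap_sketch global;             /* the only heap at depth 0 */
  struct heap_sketch userRoot;           /* H: root of the user hierarchy */
  struct heap_sketch sched[MAX_WORKERS]; /* S0 .. S(N-1), one per processor */
  size_t numProcs;
};

/* Builds the intended shape: H and every S_i hang off global at depth 1. */
static void hierarchy_init(struct hierarchy_sketch *h, size_t numProcs) {
  assert(h != NULL && numProcs >= 1 && numProcs <= MAX_WORKERS);
  h->global.depth = 0;
  h->global.parent = NULL;
  h->userRoot.depth = 1;
  h->userRoot.parent = &h->global;
  h->numProcs = numProcs;
  for (size_t i = 0; i < numProcs; i++) {
    h->sched[i].depth = 1;
    h->sched[i].parent = &h->global;
  }
}
```

Because each S_i is an ordinary depth-1 heap with the same shape as H, a collector that only looks at depth and parent links needs no special case for scheduler heaps.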

Originally, we tried accomplishing this by setting (in Parallel_init)
the depths of the other threads to 1. And actually, there was redundancy
because the initThreadAndHeap calls (init-world.c) allocated extra threads
at depth 1. However, due to other usage of threads in the MLton libraries
(e.g. Thread.register, used for FFI), the "current thread" of the
additional processors at the time of Parallel_init was not necessarily the
same physical thread as the one that will be used for the scheduler.

To fix this, we set initial thread depths to 0 and wait until we know
for sure that we are using a scheduler thread to set the depth to 1.

The existing GC policy disables local collections at shallow
depths, which implicitly prevented scheduler collections as well.
To fix this, I gave each thread a minLocalCollectionDepth
field: the minimum depth permitted to be in scope for
a local collection. By default, it is set according to the GC
policy, but it can be overridden explicitly. The scheduler threads
set their own minLocalCollectionDepth to permit collections.

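The two steps described above (start every thread at depth 0 with the policy default, then promote it once it is known to be a scheduler thread) might look roughly like this. The field name minLocalCollectionDepth comes from the text; everything else (gc_policy_sketch, worker_thread, thread_init, become_scheduler_thread, may_collect_locally) is a hypothetical stand-in, and the policy value used in the usage note is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical GC policy: only the default minimum depth matters here. */
struct gc_policy_sketch {
  uint32_t minLocalDepth;
};

/* Simplified thread record (not the runtime's actual struct). */
struct worker_thread {
  uint32_t depth;
  uint32_t minLocalCollectionDepth;
};

/* Every thread starts at depth 0 and inherits the policy default. */
static void thread_init(struct worker_thread *t, const struct gc_policy_sketch *p) {
  assert(t != NULL && p != NULL);
  t->depth = 0;
  t->minLocalCollectionDepth = p->minLocalDepth;
}

/* Called only once we know this thread really runs the scheduler loop:
 * move it to depth 1 and override the minimum so depth 1 is collectible. */
static void become_scheduler_thread(struct worker_thread *t) {
  assert(t != NULL);
  t->depth = 1;
  t->minLocalCollectionDepth = 1;
}

/* A local collection is possible only if the thread's own depth is permitted. */
static bool may_collect_locally(const struct worker_thread *t) {
  assert(t != NULL);
  return t->depth >= t->minLocalCollectionDepth;
}
```

With an assumed policy default of 2, a freshly initialized thread cannot collect locally; after become_scheduler_thread it sits at depth 1 with a minimum of 1, so its idle-loop data becomes collectible.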
"Move" new threads into the hierarchy by setting their depths
(and the depths of their hierarchical heaps) immediately before
switching to them. This allows thread objects and their stacks
to be GC'ed as normal objects.

Currently, a thread which begins executing at depth D lives at
depth D-1 in the hierarchy. I tried depth D, but ran into
bugs: I believe this doesn't play nicely with the down-pointer
which is created at join points, from the ancestor thread to the
child thread. By putting the thread at depth D-1, the thread object
itself remains out-of-scope of local collections until after the
join, at which point the thread is no longer needed.
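The D-1 placement can be illustrated with a small sketch. The names (thread_obj_sketch, move_into_hierarchy, object_in_scope) are hypothetical, and the scope model is an assumption made for illustration: a thread running at depth D is taken to collect only depths at or below D in the tree (numerically >= D), so an object at depth D-1 stays out of its reach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified view of a thread: where it executes vs. where its object lives. */
struct thread_obj_sketch {
  uint32_t execDepth;   /* depth at which the thread begins executing (D) */
  uint32_t objectDepth; /* depth of the thread object itself (D-1) */
};

/* "Move" a new thread into the hierarchy just before switching to it. */
static void move_into_hierarchy(struct thread_obj_sketch *t, uint32_t startDepth) {
  assert(t != NULL && startDepth >= 1);
  t->execDepth = startDepth;
  t->objectDepth = startDepth - 1;
}

/* Is the thread object inside a local collection covering [scopeMin, scopeMax]? */
static bool object_in_scope(const struct thread_obj_sketch *t,
                            uint32_t scopeMin, uint32_t scopeMax) {
  assert(t != NULL);
  return t->objectDepth >= scopeMin && t->objectDepth <= scopeMax;
}
```

For a thread that starts at depth 3, the object lives at depth 2: a collection by the running child over depth 3 does not touch it, while a collection over depth 2 after the join (when the thread is no longer needed) does.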
@shwestrick shwestrick merged commit d58ac92 into MPLLang:master Jul 16, 2020
@shwestrick shwestrick deleted the thread-gc-fixes-merge branch July 16, 2020 16:12