Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow tasks to migrate among threads #35688

Closed
JeffBezanson opened this issue May 1, 2020 · 5 comments
Closed

allow tasks to migrate among threads #35688

JeffBezanson opened this issue May 1, 2020 · 5 comments
Labels
multithreading Base.Threads and related functionality

Comments

@JeffBezanson
Copy link
Member

Currently when a task blocks, it must be restarted on the same thread it ran on before. This is because some of our code (in both the runtime and in generated code) assumes the ptls pointer cannot change during a task. We need to arrange to reload it in the right places.

We've been aware of this limitation the whole time, but I don't believe there's an issue to track it yet.

@JeffBezanson JeffBezanson added the multithreading Base.Threads and related functionality label May 1, 2020
@tkf
Copy link
Member

tkf commented May 5, 2020

I brought it up in #35686 (comment) but what would happen in @spawn @async f()? Does it prevent thread migration inside f?

@yuyichao
Copy link
Contributor

yuyichao commented May 5, 2020

Reloading tls sounds hard, but I think we can instead expose a task local storage instead.

@sgaure
Copy link

sgaure commented May 11, 2020

Since this bites me with an occasional 30-60% performance hit I've had a look at it. (I'm an old cilk-fan, and sometimes throw in spawns indiscriminately). If I define MIGRATE_TASKS in options.h and rebuild julia, some test code I've made now runs fine with a stable performance. However, other real-world code invariably segfaults in gc.

I've not managed to figure out where the ptls needs to be reloaded. Presumably it's only after a task has been switched out, but that seems to be taken care of with the ifdef's. The julia part of the scheduler does not seem to save the ptls across any jl_switch. I must admit that I do not fully understand the gc, is it somewhere there the ptls gets outdated?

@sgaure
Copy link

sgaure commented May 21, 2020

I have done a little experiment. I have uncommented #define MIGRATE_TASKS in options.h to run with the incomplete thread migration. It turns out that it's not only the ptls that must be reloaded, that wouldn't help, because its loading is oddly broken. So, I run on linux, an amd Ryzen 7 2700X, on commit 9655c21252 with the MIGRATE_TASKS modification.

mythread() = ccall(:pthread_self, Culong, ())
myptls() = ccall(:jl_get_ptls_states, Culong, ())

function tfun()
    @ccall printf("tid:%d, pt:%p ptls:%p\n"::Cstring, 
                  Threads.threadid()::Cint, mythread()::Culong,myptls()::Culong)::Cvoid
    sleep(0.01)
    Threads.threadid(), mythread()
end
function sfun(t)
    fetch(t)
    @ccall printf("Tid:%d, pt:%p ptls:%p\n"::Cstring, 
                  Threads.threadid()::Cint, mythread()::Culong,myptls()::Culong)::Cvoid
    Threads.threadid(), mythread(), myptls()
end

tfun(); sfun(1)
tt = [Threads.@spawn tfun() for i in 1:10]
ss = fetch.([Threads.@spawn sfun($t) for t in tt])
unique(ss)

Now, what are we doing here?

We are comparing three ways to identify a thread in Julia. The Threads.threadid() looks up the local thread storage by calling jl_get_ptls_states() and extract the tid-field, this is done in jl_threadid() in threading.c. The mythread() calls the pthread_self() which returns a thread id, an unsigned long identifying the current OS-thread. myptls() returns the address of the thread local storage block in julia by calling jl_get_ptls_states().

Now, given any of a tid, a pthread or a ptls, the other two are determined, after all its all the same thread. But this is not true.

The output is:

tid:1, pt:0x7f561429b240 ptls:0x7f56142974d0
Tid:1, pt:0x7f561429b240 ptls:0x7f56142974d0
tid:4, pt:0x7f55fffff700 ptls:0x7f55ffffb990
tid:4, pt:0x7f55fffff700 ptls:0x7f55ffffb990
tid:4, pt:0x7f55fffff700 ptls:0x7f55ffffb990
tid:4, pt:0x7f55fffff700 ptls:0x7f55ffffb990
tid:4, pt:0x7f55fffff700 ptls:0x7f55ffffb990
tid:4, pt:0x7f55fffff700 ptls:0x7f55ffffb990
tid:4, pt:0x7f55fffff700 ptls:0x7f55ffffb990
tid:4, pt:0x7f55fffff700 ptls:0x7f55ffffb990
tid:3, pt:0x7f5604db6700 ptls:0x7f5604db2990
tid:2, pt:0x7f56055b7700 ptls:0x7f56055b3990
Tid:3, pt:0x7f55fffff700 ptls:0x7f5604db2990
Tid:3, pt:0x7f55fffff700 ptls:0x7f5604db2990
Tid:2, pt:0x7f55fffff700 ptls:0x7f56055b3990
Tid:2, pt:0x7f56055b7700 ptls:0x7f56055b3990
Tid:2, pt:0x7f56055b7700 ptls:0x7f56055b3990
Tid:4, pt:0x7f56055b7700 ptls:0x7f55ffffb990
<here it hangs indefinitely>

In the four last lines, the pthread changes without the tid changing, and then the tid changes without the pthread changing. The ptls and the tid are however always in sync, it seems.

The only way this can happen is if the jl_get_ptls_states() returns different thread local storage with successive calls in the same thread. And the same value with calls from different threads. That is, the jl_get_ptls_states_static() in julia.h does not seem to work as it should. This will create all sorts of havoc with scheduling and gc. How that is possible, I am not certain. I thought that the JL_CONST_FUNC attribute on some of these functions could be the culprit, but I tried to remove it to no avail.

Moreover, if I replace the sleep() with a yield(), everything seems fine and in sync.

Julia Version 1.6.0-DEV.63
Commit 9655c21252* (2020-05-20 01:37 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: AMD Ryzen 7 2700X Eight-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-10.0.0 (ORCJIT, znver1)

@JeffBezanson
Copy link
Member Author

Fixed by #40715

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multithreading Base.Threads and related functionality
Projects
None yet
Development

No branches or pull requests

4 participants