Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

task splitting: change additive accumulation to multiplicative #53408

Merged
merged 3 commits into from
Feb 28, 2024

Conversation

StefanKarpinski
Copy link
Member

@StefanKarpinski StefanKarpinski commented Feb 20, 2024

Issue raised here. Given this definition, look at output values:

using .Threads

macro fetch(ex) :(fetch(@spawn($(esc(ex))))) end

function taskCorrelatedXoshiros(+ = +)
    r11, r10 = @fetch (@fetch(rand(UInt64)), rand(UInt64))
    r01, r00 = (@fetch(rand(UInt64)), rand(UInt64))
    r01 + r10 - (r00 + r11)
end

Before:

julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000))
9-element Vector{UInt64}:
 0x0000000000000000
 0x0000000000000001
 0x00000000007fffff
 0x0000000000800000
 0x0000000000800001
 0xffffffffff7fffff
 0xffffffffff800000
 0xffffffffff800001
 0xffffffffffffffff

After:

julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000))
1000-element Vector{UInt64}:
 0x000420647a085960
 0x0038c5b889b585c6
 0x007b29fae8107ff7
 0x00e73b9e883ac1c8
                  
 0xfe68be9c0dde1e88
 0xfeca042354218c35
 0xfeeb8203e470c96b
 0xfff5dbb8771307b9

julia> sort!(unique(taskCorrelatedXoshiros(*) for _ = 1:1000))
1000-element Vector{UInt64}:
 0x00126f951c1e56dc
 0x0025a82477ffac08
 0x002dd82c9986457a
 0x00a713c4d56a3dbc
                  
 0xfe2e40a5b345095e
 0xfe77b90881967436
 0xfea2559be63f1701
 0xff88b5b28cefac5f

@KristofferC KristofferC added the backport 1.11 Change should be backported to release-1.11 label Feb 20, 2024
Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do these new a[1:4] values come from?

@Seelengrab
Copy link
Contributor

I guess this part of the explanation/doc above jl_rng_split should be adjusted too:

It also means that when accumulating the dot product incrementally, as
described in SplitMix, we don't need to multiply weights by anything, we simply
add the random weight for the current task tree depth to the parent's dot
product to derive the child's dot product.

@StefanKarpinski
Copy link
Member Author

Where do these new a[1:4] values come from?

As the comment says, they are "random additive offsets".

@StefanKarpinski
Copy link
Member Author

I guess this part of the explanation/doc above jl_rng_split should be adjusted too:

Yes, I'm not adjusting and comment text until we settle on something.

@StefanKarpinski StefanKarpinski marked this pull request as draft February 20, 2024 19:41
@StefanKarpinski StefanKarpinski changed the title sk/mulsplit task splitting: change additive accumulation to multiplicative Feb 20, 2024
@oscardssmith
Copy link
Member

Does this change notably affect the task spawn time? (I'm assuming not, because add, multiply and byteswap are all really fast, but good to make sure). Otherwise, this looks like a good change.

@StefanKarpinski
Copy link
Member Author

I get identical timings between master and this rebased onto master—about 65ns. Would be good if someone could benchmark on different systems. I'm on an M2 system, so x86 would be helpful.

@carstenbauer
Copy link
Member

carstenbauer commented Feb 22, 2024

Just ran a quick and dirty benchmark on one of our compute nodes (x86 AMD Zen 3).

# master (a1db8daafd*)
julia> @btime @spawn(nothing);
  65.741 ns (4 allocations: 480 bytes)

# this PR rebased onto master
julia> @btime @spawn(nothing);
  66.098 ns (4 allocations: 480 bytes)

@KristofferC KristofferC mentioned this pull request Feb 26, 2024
28 tasks
@StefanKarpinski StefanKarpinski force-pushed the sk/mulsplit branch 2 times, most recently from eb6055f to 9b691d9 Compare February 27, 2024 18:14
In response to the demonstrated ease of constructing linearly correlated
task states, and thus linearly related task RNG outputs, this change
departs from using linear mixing for the compression function, which is
simply dot product in those RNGs. I've shown that we can use any mixing
function that's doubly bijective. This change uses `(2c+1)(2w+1)>>1` to
mix the LCG state and the xoshiro register state and _then_ applies the
PCG output function, which is designed to mask linearity.
src/task.c Outdated Show resolved Hide resolved
src/task.c Outdated Show resolved Hide resolved
src/task.c Outdated Show resolved Hide resolved
@StefanKarpinski StefanKarpinski marked this pull request as ready for review February 28, 2024 20:30
@StefanKarpinski
Copy link
Member Author

Ok, this is ready to go from my perspective.

@vtjnash vtjnash added the merge me PR is reviewed. Merge when all tests are passing label Feb 28, 2024
@oscardssmith oscardssmith merged commit 77c0672 into master Feb 28, 2024
8 checks passed
@oscardssmith oscardssmith deleted the sk/mulsplit branch February 28, 2024 23:06
@oscardssmith oscardssmith removed the merge me PR is reviewed. Merge when all tests are passing label Feb 28, 2024
KristofferC added a commit that referenced this pull request Mar 1, 2024
Backported PRs:
- [x] #53361 <!-- 🤖 [master] Bump the SparseArrays stdlib from c9f7293
to cb602d7 -->
- [x] #53300 <!-- allow external absint to hold custom data in
`codeinst.inferred` -->
- [x] #53342 <!-- Add `Base.wrap` to docs -->
- [x] #53372 <!-- Silence warnings in `test/file.jl` -->
- [x] #53357 <!-- 🤖 [master] Bump the Pkg stdlib from 6dd0e7c9e to
76070d295 -->
- [x] #53373 <!-- fix sysimage-native-code=no option with pkgimages -->
- [x] #53333 <!-- More consistent return value for annotations, take 2
-->
- [x] #53354 <!-- fix code coverage bug in tail position and `else` -->
- [x] #53407 <!-- fix sysimage-native-code=yes option -->
- [x] #53388 <!-- Fix documentation: thread pool of main thread -->
- [x] #53355 <!-- Fix synchronization issues on the GC scheduler. -->
- [x] #53429 <!-- Subtype: skip slow-path in `local_∀_∃_subtype` if
inputs contain no ∃ typevar. -->
- [x] #53437 <!-- Add debug variant of loader_trampolines.o -->
- [x] #53284 <!-- Add annotate! method for AnnotatedIOBuffer -->
- [x] #53466 <!-- [MozillaCACerts_jll] Update to v2023-12-12 -->
- [x] #53467 <!-- [LibGit2_jll] Update to v1.7.2 -->
- [x] #53326 <!-- RFC: when loading code for internal purposes, load
stdlib files directly, bypassing DEPOT_PATH, LOAD_PATH, and stale checks
-->
- [x] #53332
- [x] #53320 <!-- Add `Sys.isreadable, Sys.iswriteable`, update `ispath`
-->
- [x] #53476

Contains multiple commits, manual intervention needed:
- [ ] #53285 <!-- Add update mechanism for Terminfo, and common
user-alias data -->

Non-merged PRs with backport label:
- [ ] #53424 <!-- yet more atomics & cache-line fixes on work-stealing
queue -->
- [ ] #53408 <!-- task splitting: change additive accumulation to
multiplicative -->
- [ ] #53403 <!-- Move parallel precompilation to Base -->
- [ ] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` -->
- [ ] #53391 <!-- Default to the medium code model in x86 linux -->
- [ ] #53125 <!-- coverage: count coverage where explicitly requested by
inference only -->
- [ ] #52694 <!-- Reinstate similar for AbstractQ for backward
compatibility -->
@KristofferC KristofferC mentioned this pull request Mar 1, 2024
60 tasks
KristofferC pushed a commit that referenced this pull request Mar 1, 2024
Issue raised
[here](https://discourse.julialang.org/t/linear-relationship-between-xoshiro-tasks/110454).
Given this definition, look at output values:
```jl
using .Threads

macro fetch(ex) :(fetch(@Spawn($(esc(ex))))) end

function taskCorrelatedXoshiros(+ = +)
    r11, r10 = @fetch (@fetch(rand(UInt64)), rand(UInt64))
    r01, r00 = (@fetch(rand(UInt64)), rand(UInt64))
    r01 + r10 - (r00 + r11)
end
```
Before:
```jl
julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000))
9-element Vector{UInt64}:
 0x0000000000000000
 0x0000000000000001
 0x00000000007fffff
 0x0000000000800000
 0x0000000000800001
 0xffffffffff7fffff
 0xffffffffff800000
 0xffffffffff800001
 0xffffffffffffffff
```
After:
```jl
julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000))
1000-element Vector{UInt64}:
 0x000420647a085960
 0x0038c5b889b585c6
 0x007b29fae8107ff7
 0x00e73b9e883ac1c8
                  ⋮
 0xfe68be9c0dde1e88
 0xfeca042354218c35
 0xfeeb8203e470c96b
 0xfff5dbb8771307b9

julia> sort!(unique(taskCorrelatedXoshiros(*) for _ = 1:1000))
1000-element Vector{UInt64}:
 0x00126f951c1e56dc
 0x0025a82477ffac08
 0x002dd82c9986457a
 0x00a713c4d56a3dbc
                  ⋮
 0xfe2e40a5b345095e
 0xfe77b90881967436
 0xfea2559be63f1701
 0xff88b5b28cefac5f
```
tecosaur pushed a commit to tecosaur/julia that referenced this pull request Mar 4, 2024
…Lang#53408)

Issue raised
[here](https://discourse.julialang.org/t/linear-relationship-between-xoshiro-tasks/110454).
Given this definition, look at output values:
```jl
using .Threads

macro fetch(ex) :(fetch(@Spawn($(esc(ex))))) end

function taskCorrelatedXoshiros(+ = +)
    r11, r10 = @fetch (@fetch(rand(UInt64)), rand(UInt64))
    r01, r00 = (@fetch(rand(UInt64)), rand(UInt64))
    r01 + r10 - (r00 + r11)
end
```
Before:
```jl
julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000))
9-element Vector{UInt64}:
 0x0000000000000000
 0x0000000000000001
 0x00000000007fffff
 0x0000000000800000
 0x0000000000800001
 0xffffffffff7fffff
 0xffffffffff800000
 0xffffffffff800001
 0xffffffffffffffff
```
After:
```jl
julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000))
1000-element Vector{UInt64}:
 0x000420647a085960
 0x0038c5b889b585c6
 0x007b29fae8107ff7
 0x00e73b9e883ac1c8
                  ⋮
 0xfe68be9c0dde1e88
 0xfeca042354218c35
 0xfeeb8203e470c96b
 0xfff5dbb8771307b9

julia> sort!(unique(taskCorrelatedXoshiros(*) for _ = 1:1000))
1000-element Vector{UInt64}:
 0x00126f951c1e56dc
 0x0025a82477ffac08
 0x002dd82c9986457a
 0x00a713c4d56a3dbc
                  ⋮
 0xfe2e40a5b345095e
 0xfe77b90881967436
 0xfea2559be63f1701
 0xff88b5b28cefac5f
```
KristofferC pushed a commit that referenced this pull request Mar 6, 2024
Issue raised
[here](https://discourse.julialang.org/t/linear-relationship-between-xoshiro-tasks/110454).
Given this definition, look at output values:
```jl
using .Threads

macro fetch(ex) :(fetch(@Spawn($(esc(ex))))) end

function taskCorrelatedXoshiros(+ = +)
    r11, r10 = @fetch (@fetch(rand(UInt64)), rand(UInt64))
    r01, r00 = (@fetch(rand(UInt64)), rand(UInt64))
    r01 + r10 - (r00 + r11)
end
```
Before:
```jl
julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000))
9-element Vector{UInt64}:
 0x0000000000000000
 0x0000000000000001
 0x00000000007fffff
 0x0000000000800000
 0x0000000000800001
 0xffffffffff7fffff
 0xffffffffff800000
 0xffffffffff800001
 0xffffffffffffffff
```
After:
```jl
julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000))
1000-element Vector{UInt64}:
 0x000420647a085960
 0x0038c5b889b585c6
 0x007b29fae8107ff7
 0x00e73b9e883ac1c8
                  ⋮
 0xfe68be9c0dde1e88
 0xfeca042354218c35
 0xfeeb8203e470c96b
 0xfff5dbb8771307b9

julia> sort!(unique(taskCorrelatedXoshiros(*) for _ = 1:1000))
1000-element Vector{UInt64}:
 0x00126f951c1e56dc
 0x0025a82477ffac08
 0x002dd82c9986457a
 0x00a713c4d56a3dbc
                  ⋮
 0xfe2e40a5b345095e
 0xfe77b90881967436
 0xfea2559be63f1701
 0xff88b5b28cefac5f
```

(cherry picked from commit 77c0672)
mkitti pushed a commit to mkitti/julia that referenced this pull request Mar 7, 2024
…Lang#53408)

Issue raised
[here](https://discourse.julialang.org/t/linear-relationship-between-xoshiro-tasks/110454).
Given this definition, look at output values:
```jl
using .Threads

macro fetch(ex) :(fetch(@Spawn($(esc(ex))))) end

function taskCorrelatedXoshiros(+ = +)
    r11, r10 = @fetch (@fetch(rand(UInt64)), rand(UInt64))
    r01, r00 = (@fetch(rand(UInt64)), rand(UInt64))
    r01 + r10 - (r00 + r11)
end
```
Before:
```jl
julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000))
9-element Vector{UInt64}:
 0x0000000000000000
 0x0000000000000001
 0x00000000007fffff
 0x0000000000800000
 0x0000000000800001
 0xffffffffff7fffff
 0xffffffffff800000
 0xffffffffff800001
 0xffffffffffffffff
```
After:
```jl
julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000))
1000-element Vector{UInt64}:
 0x000420647a085960
 0x0038c5b889b585c6
 0x007b29fae8107ff7
 0x00e73b9e883ac1c8
                  ⋮
 0xfe68be9c0dde1e88
 0xfeca042354218c35
 0xfeeb8203e470c96b
 0xfff5dbb8771307b9

julia> sort!(unique(taskCorrelatedXoshiros(*) for _ = 1:1000))
1000-element Vector{UInt64}:
 0x00126f951c1e56dc
 0x0025a82477ffac08
 0x002dd82c9986457a
 0x00a713c4d56a3dbc
                  ⋮
 0xfe2e40a5b345095e
 0xfe77b90881967436
 0xfea2559be63f1701
 0xff88b5b28cefac5f
```
KristofferC added a commit that referenced this pull request Mar 17, 2024
Backported PRs:
- [x] #39071 <!-- Add a lazy `logrange` function and `LogRange` type -->
- [x] #51802 <!-- Allow AnnotatedStrings in log messages -->
- [x] #53369 <!-- Orthogonalize re-indexing for FastSubArrays -->
- [x] #48050 <!-- improve `--heap-size-hint` arg handling -->
- [x] #53482 <!-- add IR encoding for EnterNode -->
- [x] #53499 <!-- Avoid compiler warning about redefining jl_globalref_t
-->
- [x] #53507 <!-- update staled `Core.Compiler.Effects` documentation
-->
- [x] #53408 <!-- task splitting: change additive accumulation to
multiplicative -->
- [x] #53523 <!-- add back an alias for `check_top_bit` -->
- [x] #53377 <!-- add _readdirx for returning more object info gathered
during dir scan -->
- [x] #53525 <!-- fix InteractiveUtils call in Base.runtests on failure
-->
- [x] #53540 <!-- use more efficient `_readdirx` for tab completion -->
- [x] #53545 <!-- use `_readdirx` for `walkdir` -->
- [x] #53551 <!-- revert "Add @create_log_macro for making custom styled
logging macros (#52196)" -->
- [x] #53554 <!-- Always return a value in 1-d circshift! of
abstractarray.jl -->
- [x] #53424 <!-- yet more atomics & cache-line fixes on work-stealing
queue -->
- [x] #53571 <!-- Update Documenter to v1.3 for inventory writing -->
- [x] #53403 <!-- Move parallel precompilation to Base -->
- [x] #53589 <!-- add back `unsafe_convert` to pointer for arrays -->
- [x] #53596 <!-- build: remove extra .a file -->
- [x] #53606 <!-- fix error path in `precompilepkgs` -->
- [x] #53004 <!-- Unexport with, at_with, and ScopedValue from Base -->
- [x] #53629 <!-- typo fix in scoped values docs -->
- [x] #53630 <!-- sroa: Fix incorrect scope counting -->
- [x] #53598 <!-- Use Base parallel precompilation to build stdlibs -->
- [x] #53649 <!-- precompilepkgs: package in boths deps and weakdeps are
in fact only weak -->
- [x] #53671 <!-- Fix bootstrap Base precompile in cross compile
configuration -->
- [x] #52125 <!-- Load Pkg if not already to reinstate missing package
add prompt -->
- [x] #53602 <!-- Handle zero on arrays of unions of number types and
missings -->
- [x] #53516 <!-- permit NamedTuple{<:Any, Union{}} to be created -->
- [x] #53643 <!-- Bump CSL to 1.1.1 to fix libgomp bug -->
- [x] #53679 <!-- move precompile workload back from Base -->
- [x] #53663 <!-- add isassigned methods for reinterpretarray -->
- [x] #53662 <!-- [REPL] fix incorrectly cleared line after completions
accepted -->
- [x] #53611 <!-- Linalg: matprod_dest for Diagonal and adjvec -->
- [x] #53659 <!-- fix #52025, re-allow all implicit pointer casts in
cconvert for Array -->
- [x] #53631 <!-- LAPACK: validate input parameters to throw informative
errors -->
- [x] #53628 <!-- Make some improvements to the Scoped Values
documentation. -->
- [x] #53655 <!-- Change tbaa of ptr_phi to tbaa_value  -->
- [x] #53391 <!-- Default to the medium code model in x86 linux -->
- [x] #53699 <!-- Move `isexecutable, isreadable, iswritable` to
`filesystem.jl` -->
- [x] #41232 <!-- Fix linear indexing for ReshapedArray if the parent
has offset axes -->
- [x] #53527 <!-- Enable analyzegc checks for try catch and fix found
issues -->
- [x] #52092 
- [x] #53682 <!-- Increase build precompilation -->
- [x] #53720 
- [x] #53553 <!-- typeintersect: fix `UnionAll` unaliasing bug caused by
innervars. -->

Contains multiple commits, manual intervention needed:
- [ ] #53305 <!-- Propagate inbounds in isassigned with CartesianIndex
indices -->

Non-merged PRs with backport label:
- [ ] #53736 <!-- fix literal-pow to return the right type when the base
is -1 -->
- [ ] #53707 <!-- Make ScopedValue public -->
- [ ] #53696 <!-- add invokelatest to on_done callback in bracketed
paste -->
- [ ] #53660 <!-- put Logging back in default sysimage -->
- [ ] #53509 <!-- revert moving "creating packages" from Pkg.jl -->
- [ ] #53452 <!-- RFC: allow Tuple{Union{}}, returning Union{} -->
- [ ] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` -->
- [ ] #52694 <!-- Reinstate similar for AbstractQ for backward
compatibility -->
- [ ] #51928 <!-- Styled markdown, with a few tweaks -->
- [ ] #51816 <!-- User-themable stacktraces -->
- [ ] #51811 <!-- Make banner size depend on terminal size -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
@KristofferC KristofferC removed the backport 1.11 Change should be backported to release-1.11 label Mar 18, 2024
@KristofferC
Copy link
Member

I want to see what package's tests "broke" from the RNG stream changing here.

@nanosoldier runtests(; vs = "@c80a9647166d59782eff6be8c75a5052e17483d7")

@StefanKarpinski
Copy link
Member Author

Hopefully not too disruptive since it only changes the RNG stream in child tasks, not the seeded task; I suspect that it's fairly rare for packages to do multithreaded deterministic RNG testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants