-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
task splitting: change additive accumulation to multiplicative #53408
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do these new a[1:4]
values come from?
I guess this part of the explanation/doc above
|
As the comment says, they are "random additive offsets". |
Yes, I'm not adjusting and comment text until we settle on something. |
86c50dd
to
4926f77
Compare
Does this change notably affect the task spawn time? (I'm assuming not, because add, multiply and byteswap are all really fast, but good to make sure). Otherwise, this looks like a good change. |
I get identical timings between master and this rebased onto master—about 65ns. Would be good if someone could benchmark on different systems. I'm on an M2 system, so x86 would be helpful. |
Just ran a quick and dirty benchmark on one of our compute nodes (x86 AMD Zen 3). # master (a1db8daafd*)
julia> @btime @spawn(nothing);
65.741 ns (4 allocations: 480 bytes)
# this PR rebased onto master
julia> @btime @spawn(nothing);
66.098 ns (4 allocations: 480 bytes) |
eb6055f
to
9b691d9
Compare
In response to the demonstrated ease of constructing linearly correlated task states, and thus linearly related task RNG outputs, this change departs from using linear mixing for the compression function, which is simply dot product in those RNGs. I've shown that we can use any mixing function that's doubly bijective. This change uses `(2c+1)(2w+1)>>1` to mix the LCG state and the xoshiro register state and _then_ applies the PCG output function, which is designed to mask linearity.
9b691d9
to
9bc2c6c
Compare
Ok, this is ready to go from my perspective. |
Backported PRs: - [x] #53361 <!-- 🤖 [master] Bump the SparseArrays stdlib from c9f7293 to cb602d7 --> - [x] #53300 <!-- allow external absint to hold custom data in `codeinst.inferred` --> - [x] #53342 <!-- Add `Base.wrap` to docs --> - [x] #53372 <!-- Silence warnings in `test/file.jl` --> - [x] #53357 <!-- 🤖 [master] Bump the Pkg stdlib from 6dd0e7c9e to 76070d295 --> - [x] #53373 <!-- fix sysimage-native-code=no option with pkgimages --> - [x] #53333 <!-- More consistent return value for annotations, take 2 --> - [x] #53354 <!-- fix code coverage bug in tail position and `else` --> - [x] #53407 <!-- fix sysimage-native-code=yes option --> - [x] #53388 <!-- Fix documentation: thread pool of main thread --> - [x] #53355 <!-- Fix synchronization issues on the GC scheduler. --> - [x] #53429 <!-- Subtype: skip slow-path in `local_∀_∃_subtype` if inputs contain no ∃ typevar. --> - [x] #53437 <!-- Add debug variant of loader_trampolines.o --> - [x] #53284 <!-- Add annotate! method for AnnotatedIOBuffer --> - [x] #53466 <!-- [MozillaCACerts_jll] Update to v2023-12-12 --> - [x] #53467 <!-- [LibGit2_jll] Update to v1.7.2 --> - [x] #53326 <!-- RFC: when loading code for internal purposes, load stdlib files directly, bypassing DEPOT_PATH, LOAD_PATH, and stale checks --> - [x] #53332 - [x] #53320 <!-- Add `Sys.isreadable, Sys.iswriteable`, update `ispath` --> - [x] #53476 Contains multiple commits, manual intervention needed: - [ ] #53285 <!-- Add update mechanism for Terminfo, and common user-alias data --> Non-merged PRs with backport label: - [ ] #53424 <!-- yet more atomics & cache-line fixes on work-stealing queue --> - [ ] #53408 <!-- task splitting: change additive accumulation to multiplicative --> - [ ] #53403 <!-- Move parallel precompilation to Base --> - [ ] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` --> - [ ] #53391 <!-- Default to the medium code model in x86 linux --> - [ ] #53125 <!-- coverage: count coverage where explicitly requested by inference only --> - [ ] #52694 <!-- Reinstate similar for AbstractQ for backward compatibility -->
Issue raised [here](https://discourse.julialang.org/t/linear-relationship-between-xoshiro-tasks/110454). Given this definition, look at output values: ```jl using .Threads macro fetch(ex) :(fetch(@Spawn($(esc(ex))))) end function taskCorrelatedXoshiros(+ = +) r11, r10 = @fetch (@fetch(rand(UInt64)), rand(UInt64)) r01, r00 = (@fetch(rand(UInt64)), rand(UInt64)) r01 + r10 - (r00 + r11) end ``` Before: ```jl julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000)) 9-element Vector{UInt64}: 0x0000000000000000 0x0000000000000001 0x00000000007fffff 0x0000000000800000 0x0000000000800001 0xffffffffff7fffff 0xffffffffff800000 0xffffffffff800001 0xffffffffffffffff ``` After: ```jl julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000)) 1000-element Vector{UInt64}: 0x000420647a085960 0x0038c5b889b585c6 0x007b29fae8107ff7 0x00e73b9e883ac1c8 ⋮ 0xfe68be9c0dde1e88 0xfeca042354218c35 0xfeeb8203e470c96b 0xfff5dbb8771307b9 julia> sort!(unique(taskCorrelatedXoshiros(*) for _ = 1:1000)) 1000-element Vector{UInt64}: 0x00126f951c1e56dc 0x0025a82477ffac08 0x002dd82c9986457a 0x00a713c4d56a3dbc ⋮ 0xfe2e40a5b345095e 0xfe77b90881967436 0xfea2559be63f1701 0xff88b5b28cefac5f ```
…Lang#53408) Issue raised [here](https://discourse.julialang.org/t/linear-relationship-between-xoshiro-tasks/110454). Given this definition, look at output values: ```jl using .Threads macro fetch(ex) :(fetch(@Spawn($(esc(ex))))) end function taskCorrelatedXoshiros(+ = +) r11, r10 = @fetch (@fetch(rand(UInt64)), rand(UInt64)) r01, r00 = (@fetch(rand(UInt64)), rand(UInt64)) r01 + r10 - (r00 + r11) end ``` Before: ```jl julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000)) 9-element Vector{UInt64}: 0x0000000000000000 0x0000000000000001 0x00000000007fffff 0x0000000000800000 0x0000000000800001 0xffffffffff7fffff 0xffffffffff800000 0xffffffffff800001 0xffffffffffffffff ``` After: ```jl julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000)) 1000-element Vector{UInt64}: 0x000420647a085960 0x0038c5b889b585c6 0x007b29fae8107ff7 0x00e73b9e883ac1c8 ⋮ 0xfe68be9c0dde1e88 0xfeca042354218c35 0xfeeb8203e470c96b 0xfff5dbb8771307b9 julia> sort!(unique(taskCorrelatedXoshiros(*) for _ = 1:1000)) 1000-element Vector{UInt64}: 0x00126f951c1e56dc 0x0025a82477ffac08 0x002dd82c9986457a 0x00a713c4d56a3dbc ⋮ 0xfe2e40a5b345095e 0xfe77b90881967436 0xfea2559be63f1701 0xff88b5b28cefac5f ```
Issue raised [here](https://discourse.julialang.org/t/linear-relationship-between-xoshiro-tasks/110454). Given this definition, look at output values: ```jl using .Threads macro fetch(ex) :(fetch(@Spawn($(esc(ex))))) end function taskCorrelatedXoshiros(+ = +) r11, r10 = @fetch (@fetch(rand(UInt64)), rand(UInt64)) r01, r00 = (@fetch(rand(UInt64)), rand(UInt64)) r01 + r10 - (r00 + r11) end ``` Before: ```jl julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000)) 9-element Vector{UInt64}: 0x0000000000000000 0x0000000000000001 0x00000000007fffff 0x0000000000800000 0x0000000000800001 0xffffffffff7fffff 0xffffffffff800000 0xffffffffff800001 0xffffffffffffffff ``` After: ```jl julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000)) 1000-element Vector{UInt64}: 0x000420647a085960 0x0038c5b889b585c6 0x007b29fae8107ff7 0x00e73b9e883ac1c8 ⋮ 0xfe68be9c0dde1e88 0xfeca042354218c35 0xfeeb8203e470c96b 0xfff5dbb8771307b9 julia> sort!(unique(taskCorrelatedXoshiros(*) for _ = 1:1000)) 1000-element Vector{UInt64}: 0x00126f951c1e56dc 0x0025a82477ffac08 0x002dd82c9986457a 0x00a713c4d56a3dbc ⋮ 0xfe2e40a5b345095e 0xfe77b90881967436 0xfea2559be63f1701 0xff88b5b28cefac5f ``` (cherry picked from commit 77c0672)
…Lang#53408) Issue raised [here](https://discourse.julialang.org/t/linear-relationship-between-xoshiro-tasks/110454). Given this definition, look at output values: ```jl using .Threads macro fetch(ex) :(fetch(@Spawn($(esc(ex))))) end function taskCorrelatedXoshiros(+ = +) r11, r10 = @fetch (@fetch(rand(UInt64)), rand(UInt64)) r01, r00 = (@fetch(rand(UInt64)), rand(UInt64)) r01 + r10 - (r00 + r11) end ``` Before: ```jl julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000)) 9-element Vector{UInt64}: 0x0000000000000000 0x0000000000000001 0x00000000007fffff 0x0000000000800000 0x0000000000800001 0xffffffffff7fffff 0xffffffffff800000 0xffffffffff800001 0xffffffffffffffff ``` After: ```jl julia> sort!(unique(taskCorrelatedXoshiros() for _ = 1:1000)) 1000-element Vector{UInt64}: 0x000420647a085960 0x0038c5b889b585c6 0x007b29fae8107ff7 0x00e73b9e883ac1c8 ⋮ 0xfe68be9c0dde1e88 0xfeca042354218c35 0xfeeb8203e470c96b 0xfff5dbb8771307b9 julia> sort!(unique(taskCorrelatedXoshiros(*) for _ = 1:1000)) 1000-element Vector{UInt64}: 0x00126f951c1e56dc 0x0025a82477ffac08 0x002dd82c9986457a 0x00a713c4d56a3dbc ⋮ 0xfe2e40a5b345095e 0xfe77b90881967436 0xfea2559be63f1701 0xff88b5b28cefac5f ```
Backported PRs: - [x] #39071 <!-- Add a lazy `logrange` function and `LogRange` type --> - [x] #51802 <!-- Allow AnnotatedStrings in log messages --> - [x] #53369 <!-- Orthogonalize re-indexing for FastSubArrays --> - [x] #48050 <!-- improve `--heap-size-hint` arg handling --> - [x] #53482 <!-- add IR encoding for EnterNode --> - [x] #53499 <!-- Avoid compiler warning about redefining jl_globalref_t --> - [x] #53507 <!-- update staled `Core.Compiler.Effects` documentation --> - [x] #53408 <!-- task splitting: change additive accumulation to multiplicative --> - [x] #53523 <!-- add back an alias for `check_top_bit` --> - [x] #53377 <!-- add _readdirx for returning more object info gathered during dir scan --> - [x] #53525 <!-- fix InteractiveUtils call in Base.runtests on failure --> - [x] #53540 <!-- use more efficient `_readdirx` for tab completion --> - [x] #53545 <!-- use `_readdirx` for `walkdir` --> - [x] #53551 <!-- revert "Add @create_log_macro for making custom styled logging macros (#52196)" --> - [x] #53554 <!-- Always return a value in 1-d circshift! of abstractarray.jl --> - [x] #53424 <!-- yet more atomics & cache-line fixes on work-stealing queue --> - [x] #53571 <!-- Update Documenter to v1.3 for inventory writing --> - [x] #53403 <!-- Move parallel precompilation to Base --> - [x] #53589 <!-- add back `unsafe_convert` to pointer for arrays --> - [x] #53596 <!-- build: remove extra .a file --> - [x] #53606 <!-- fix error path in `precompilepkgs` --> - [x] #53004 <!-- Unexport with, at_with, and ScopedValue from Base --> - [x] #53629 <!-- typo fix in scoped values docs --> - [x] #53630 <!-- sroa: Fix incorrect scope counting --> - [x] #53598 <!-- Use Base parallel precompilation to build stdlibs --> - [x] #53649 <!-- precompilepkgs: package in boths deps and weakdeps are in fact only weak --> - [x] #53671 <!-- Fix bootstrap Base precompile in cross compile configuration --> - [x] #52125 <!-- Load Pkg if not already to reinstate missing package add prompt --> - [x] #53602 <!-- Handle zero on arrays of unions of number types and missings --> - [x] #53516 <!-- permit NamedTuple{<:Any, Union{}} to be created --> - [x] #53643 <!-- Bump CSL to 1.1.1 to fix libgomp bug --> - [x] #53679 <!-- move precompile workload back from Base --> - [x] #53663 <!-- add isassigned methods for reinterpretarray --> - [x] #53662 <!-- [REPL] fix incorrectly cleared line after completions accepted --> - [x] #53611 <!-- Linalg: matprod_dest for Diagonal and adjvec --> - [x] #53659 <!-- fix #52025, re-allow all implicit pointer casts in cconvert for Array --> - [x] #53631 <!-- LAPACK: validate input parameters to throw informative errors --> - [x] #53628 <!-- Make some improvements to the Scoped Values documentation. --> - [x] #53655 <!-- Change tbaa of ptr_phi to tbaa_value --> - [x] #53391 <!-- Default to the medium code model in x86 linux --> - [x] #53699 <!-- Move `isexecutable, isreadable, iswritable` to `filesystem.jl` --> - [x] #41232 <!-- Fix linear indexing for ReshapedArray if the parent has offset axes --> - [x] #53527 <!-- Enable analyzegc checks for try catch and fix found issues --> - [x] #52092 - [x] #53682 <!-- Increase build precompilation --> - [x] #53720 - [x] #53553 <!-- typeintersect: fix `UnionAll` unaliasing bug caused by innervars. --> Contains multiple commits, manual intervention needed: - [ ] #53305 <!-- Propagate inbounds in isassigned with CartesianIndex indices --> Non-merged PRs with backport label: - [ ] #53736 <!-- fix literal-pow to return the right type when the base is -1 --> - [ ] #53707 <!-- Make ScopedValue public --> - [ ] #53696 <!-- add invokelatest to on_done callback in bracketed paste --> - [ ] #53660 <!-- put Logging back in default sysimage --> - [ ] #53509 <!-- revert moving "creating packages" from Pkg.jl --> - [ ] #53452 <!-- RFC: allow Tuple{Union{}}, returning Union{} --> - [ ] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` --> - [ ] #52694 <!-- Reinstate similar for AbstractQ for backward compatibility --> - [ ] #51928 <!-- Styled markdown, with a few tweaks --> - [ ] #51816 <!-- User-themable stacktraces --> - [ ] #51811 <!-- Make banner size depend on terminal size --> - [ ] #51479 <!-- prevent code loading from lookin in the versioned environment when building Julia -->
I want to see what package's tests "broke" from the RNG stream changing here. @nanosoldier |
Hopefully not too disruptive since it only changes the RNG stream in child tasks, not the seeded task; I suspect that it's fairly rare for packages to do multithreaded deterministic RNG testing. |
Issue raised here. Given this definition, look at output values:
Before:
After: