Unexpected behaviour of SharedArray in single core usage #10773

Closed
nilshg opened this issue Apr 8, 2015 · 21 comments
Labels
parallelism Parallel or distributed computation system:windows Affects only Windows

Comments

nilshg commented Apr 8, 2015

See this discussion in the Julia users group:

When running a @sync @parallel loop that writes its results into different SharedArrays on just one core, some of the returned arrays contain the results assigned to other arrays. This does not happen when the code is run on multiple cores. I'm copying my original example from the users group below; in this example the returned array r2 will contain the results of r3, while the three arrays computed in parallel contain the expected results:

x1 = linspace(1, 3, 3)
x2 = linspace(1, 3, 3)
x3 = linspace(1, 3, 3)

function getresults(x1::Array, x2::Array, x3::Array)
  # Allocate three shared output arrays
  result1 = SharedArray(Float64, (3,3,3))
  result2 = similar(result1)
  result3 = similar(result1)

  # Fill the arrays in a (possibly distributed) loop
  @sync @parallel for a=1:3
    for b=1:3
      for c=1:3
        result1[a,b,c] = x1[a]*x2[b]*x3[c]
        result2[a,b,c] = sqrt(x1[a]*x2[b]*x3[c])
        result3[a,b,c] = (x1[a]*x2[b]*x3[c])^2
      end
    end
  end
  return sdata(result1), sdata(result2), sdata(result3)
end

# Compute function using 1 core
(r1,r2,r3) = getresults(x1, x2, x3)

# Add remaining cores as workers, compute again
nprocs()==CPU_CORES || addprocs(CPU_CORES-1)
(r1_par,r2_par,r3_par) = getresults(x1, x2, x3)
nilshg commented Apr 8, 2015

Just to add, one could "fix" the example above by initializing the result arrays as

result1 = nprocs() > 1 ? SharedArray(Float64, (3,3,3)) : Array(Float64, (3,3,3))

in case this isn't actually a bug but expected behaviour for SharedArray. If that is the case, I would at least vote for mentioning it in the docs, as I for one spent half a day trying to figure out why my results changed so dramatically before realizing I had simply forgotten to add workers...

@simonster (Member)

@nilshg I tried with two systems and wasn't able to reproduce this. Can you give the output of versioninfo()?

nilshg commented Apr 9, 2015

Versioninfo:

Julia Version 0.3.7
Commit cb9bcae* (2015-03-23 21:36 UTC)
Platform Info:
  System: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

On this system, I'm getting the following:

sum(abs(r1-r1_par))  # 0.0
sum(abs(r2-r2_par))  # 2672.719
sum(abs(r3-r3_par))  # 0.0 
sum(abs(r2-r3_par))  # 0.0

The problem does not occur on the same machine using Julia Version 0.4.0-dev+4157 though.

ihnorton added the parallelism (Parallel or distributed computation) label Apr 11, 2015
timholy commented Apr 11, 2015

Works for me (sum(abs(r2-r2_par)) == 0) on

julia> versioninfo()
Julia Version 0.3.7-pre+1
Commit d15f183* (2015-02-17 22:12 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7 CPU       L 640  @ 2.13GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Nehalem)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

tkelman commented Apr 11, 2015

I can reproduce the problem with

Julia Version 0.3.6-pre+76
Commit 79846f8 (2015-02-17 00:52 UTC)
Platform Info:
  System: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

so it's likely a Windows-specific quirk in the SharedArray implementation. I think @twadleigh wrote that code?

tkelman added the system:windows (Affects only Windows) label Apr 12, 2015
@twadleigh (Contributor)

I did write the code for the Windows implementation. I didn't, however, do any testing beyond what was already in the testbed for the POSIX implementation.

tkelman commented Apr 14, 2015

Thanks Tracy. It would be helpful if someone who has a Windows machine and a bit of time could try tracking down the OS API calls that underlie the SharedArray operations and figure out more precisely what causes this.

@twadleigh (Contributor)

I just noticed that @nilshg says it is working on 0.4, which makes me scratch my head a bit.

tkelman commented Apr 14, 2015

We seem to be getting more and more "fixed on master but don't know by what" bugs. Unless we can find some obviously related bugfix that would be simple to backport, trying to bisect this on Windows could be a lot of work and might point to some major restructuring of internals that can't be backported.

nilshg commented Apr 14, 2015

Apologies, I might have been a little quick in saying that it works on 0.4; I just went back to double-check and now I'm getting the same (wrong) results as on 0.3.7. Maybe others who are running both versions could quickly verify this?

@twadleigh (Contributor)

I think I just found the bug, and it is probably only Windows-specific by accident. Check out:

shm_seg_name = string("/jl", getpid(), round(Int64,time() * 10^9))

The shared segment name is generated, in part, from the system time. If you create shared arrays in quick succession (as in this example), you will get non-unique segment names.

Is the time returned from time() lower-resolution on Windows? If so, that could be why the problem is only noticeable there.

Anyway, the fix should be simple.
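
For illustration, reusing the name-generation expression quoted above: when time() advances in coarse steps, two names produced back-to-back can be identical, so both SharedArrays end up mapping the same segment.

# Illustrative only: n1 and n2 collide whenever both calls see the same time() value.
n1 = string("/jl", getpid(), round(Int64, time() * 10^9))
n2 = string("/jl", getpid(), round(Int64, time() * 10^9))
n1 == n2  # true on a low-resolution clock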

@twadleigh (Contributor)

Another reason why this may work on POSIX vs. Windows: there is no analog of shm_unlink on Windows. It is a no-op there.

Still, the fix is to uniquify the segment name.

tkelman commented Apr 14, 2015

Good catch! I would not be at all surprised if time() were lower-resolution on Windows.

timholy commented Apr 14, 2015

That's indeed really good debugging, @twadleigh. What about using tempname?

tkelman commented Apr 14, 2015

There are some still-unresolved platform discrepancies regarding tempname - #9053

@ViralBShah (Member)

Cc @amitmurthy

@twadleigh (Contributor)

Would the pid plus a sufficiently long randstring be safe enough? Or maybe the pid plus a munged stringification of a gensym?

mbauman commented Apr 14, 2015

Maybe try time_ns() instead of time()? That uses a different C call that should have higher precision.
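
A one-line sketch of that suggestion (illustrative only, not a tested patch):

# time_ns() is a monotonic nanosecond counter, so back-to-back calls are far
# less likely to repeat than time() on a coarse clock.
shm_seg_name = string("/jl", getpid(), time_ns())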

@ihnorton (Member)

Rather than time, this could be done with Base.Random.uuid4. Or on Windows there is also CoCreateGuid (I don't know how the strength compares).
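
A sketch of the UUID variant (illustrative only, assuming a Julia build where Base.Random.uuid4 is available):

# A UUID makes the name unique without relying on the clock at all.
shm_seg_name = string("/jl", Base.Random.uuid4())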

@twadleigh (Contributor)

I'm going to put together a PR with a name made from some digits of the pid, some digits of time, and padded with randstring characters.

twadleigh added a commit to twadleigh/julia that referenced this issue Apr 18, 2015
Compensates for the lack of an analog of `shm_unlink` in Windows.

Addresses JuliaLang#10773.
@twadleigh (Contributor)

Went with 6 digits of pid with a long randstring.
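
A hedged sketch of what such a name could look like (hypothetical, not the merged patch):

# Keep the last six digits of the pid and append a long random suffix so that
# names cannot collide even within a single clock tick.
shm_seg_name = @sprintf("/jl%06d%s", getpid() % 10^6, randstring(20))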

twadleigh added a commit to twadleigh/julia that referenced this issue Apr 19, 2015
Compensates for the lack of an analog of `shm_unlink` in Windows.

Addresses JuliaLang#10773.
mbauman pushed a commit to mbauman/julia that referenced this issue Jun 6, 2015
Compensates for the lack of an analog of `shm_unlink` in Windows.

Addresses JuliaLang#10773.