build results vary from parallelism #4611

Open
bmwiedemann opened this issue Jul 28, 2023 · 20 comments
Labels
build related to the build process

Comments

@bmwiedemann
Contributor

bmwiedemann commented Jul 28, 2023

Bug report

Bug summary

After fixing #4609, there is some other remaining issue, and my tools indicate that it depends on the number of cores I give the build VM.

Code for reproduction

Build once each in a 1-core VM and a 2-core VM:

osc co openSUSE:Factory/python-yt && cd $_
for N in 1 2 ; do
    osc build --noservice --vm-type=kvm -j$N --keep-pkg=RPMS.$N
    md5sum `find /var/tmp/build-root/standard-x86_64/home/abuild/rpmbuild/ -name \*.c|sort` > $N.md5
done
diff -u ?.md5

Actual outcome

bounding_volume_hierarchy.cpython-310-x86_64-linux-gnu.so and other binaries vary

/home/abuild/rpmbuild/BUILD/yt-4.1.4/build/lib.linux-x86_64-cpython-311/yt/utilities/lib/bounding_volume_hierarchy.cpp        2039-08-30 07:17:28.000000000 +0000
@@ -7059,7 +7059,7 @@            
                                 goto __pyx_L18;
                                 __pyx_L18:;
                                 #ifdef _OPENMP
-                                #pragma omp critical(__pyx_parallel_lastprivates1)                              
+                                #pragma omp critical(__pyx_parallel_lastprivates0)                                  
                                 #endif /* _OPENMP */
                                 {   
                                     __pyx_parallel_temp0 = __pyx_v_i;

Expected outcome

It should be possible to create bit-identical results (currently, that only works by doing all builds in 1-core VMs).

Version Information

  • Operating System: openSUSE-Tumbleweed 20230724
  • Python Version: 3.9 - 3.11
  • yt version: 4.1.4
  • Other Libraries (if applicable):

This bug was found while working on reproducible builds for openSUSE.

@welcome

welcome bot commented Jul 28, 2023

Hi, and welcome to yt! Thanks for opening your first issue. We have an issue template that helps us gather the relevant information needed to diagnose and fix the issue.

@neutrinoceros
Member

I half-remember similar problems being raised in the past by downstream packagers (sometimes parallel builds may even crash). Any leads on what we should be on the lookout for?

@bmwiedemann
Contributor Author

From the context, it could be coming from somewhere inside Cython; it would not be the first compiler that has trouble with parallel processing.

But I know too little about your setup. python-scikit-learn seems to have a similar problem.

@neutrinoceros added the build ("related to the build process") label on Jul 31, 2023
@neutrinoceros
Member

related issue: #4278

@bmwiedemann
Contributor Author

4.4.0 still has this problem, so it seems to be unrelated to #4278.

I used strace to find where pixelization_routines.cpp comes from and found that it is written twice by python3 -mpip wheel . Is there some internal parallelism? Maybe those two variants differ, and when building on multicore machines a race decides which variant wins (i.e., is written last)?

Also, the omp in the diff suggests that OpenMP could be involved.

@chrishavlin
Contributor

bounding_volume_hierarchy.cpython-310-x86_64-linux-gnu.so and other binaries vary

Also, the omp in the diff suggests that OpenMP could be involved.

Both pixelization_routines.pyx and bounding_volume_hierarchy.pyx contain extra distutils directives that add OpenMP compiler arguments:

# distutils: extra_compile_args = CPP14_FLAG OMP_ARGS
# distutils: extra_link_args = CPP14_FLAG OMP_ARGS

and there are a number of other files (that utilize cython.parallel) with the same directives -- here's the list of files that use the extra OMP args internally: bounding_volume_hierarchy.pyx, image_samplers.pyx, image_utilities.pyx, geometry_utils.pyx, pixelization_routines.pyx, _octree_raytracing.pyx. I'd be curious to know whether the list of binaries that vary matches that list, which could indicate an issue with how OMP is being linked.

@bmwiedemann
Contributor Author

I found that the variations are only in 2 .cpp files:

/home/abuild/rpmbuild/BUILD/yt-4.4.0/yt/utilities/lib/misc_utilities.cpp
/home/abuild/rpmbuild/BUILD/yt-4.4.0/yt/utilities/lib/pixelization_routines.cpp

This is definitely before compiling and linking; somehow the .cpp code generation introduces non-determinism.

I did a build with debuginfo that contains these diffs at the very end:
https://rb.zq1.de/compare.factory-20241129/diffs/python-yt-4.4-diff.txt
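
For reference, the comparison itself can be scripted; here is a minimal Python sketch (the build-root paths are hypothetical) that hashes the generated .cpp files in two build trees and lists the ones that differ:

import hashlib
from pathlib import Path

def digests(root):
    # md5 of every generated .cpp file below the given build tree
    return {str(p.relative_to(root)): hashlib.md5(p.read_bytes()).hexdigest()
            for p in sorted(Path(root).rglob("*.cpp"))}

a = digests("build-root-1core")   # hypothetical paths
b = digests("build-root-2core")
for name in sorted(set(a) | set(b)):
    if a.get(name) != b.get(name):
        print("varies:", name)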

@bmwiedemann
Contributor Author

The variations are somewhat random. After 5 tries, I also had these once in
/home/abuild/rpmbuild/BUILD/yt-4.4.0/yt/utilities/lib/image_samplers.cpp

@matthewturk
Member

Is it possible that if we "upgraded" all the C++ to use the same C++ standard the variations would lessen or disappear?

@bmwiedemann
Contributor Author

I don't think so, because this diff is already in the C++ source code. The code that generates pixelization_routines.cpp from pixelization_routines.pyx is the part that likely needs fixing. If it operates by the principle of "garbage in => garbage out", fixing the input data (.pyx) could also help.

@neutrinoceros
Member

Do we have any indication that this could (or couldn't) be an upstream bug in Cython?

@bmwiedemann
Contributor Author

Here is one data point: I found 160 packages in openSUSE that use cython, but the only one of them with this kind of variation in a .cpp file is python-yt.

@matthewturk
Member

That's a genuinely convincing data point. ;-)

The thinking behind my question was that if we're including them (via .pxd inclusion, for instance) and some temporary file is generated, the order in which that temporary file gets created could be the dominant factor.

@chrishavlin
Contributor

Some progress, I think. First, I was able to reproduce the difference locally: setting OMP_NUM_THREADS in my environment (e.g., export OMP_NUM_THREADS=3) and building as normal (I already had OpenMP installed), the generated pixelization_routines.cpp and misc_utilities.cpp end up with those #pragma omp critical(__pyx_parallel_lastprivates0) lines reported in @bmwiedemann's initial comment. On rebuilding, I'd occasionally get variations in the trailing integer of __pyx_parallel_lastprivates0, which leads to the diffs that @bmwiedemann described.

Looking up what that pragma omp critical directive actually means (link):

the omp critical directive identifies a section of code that must be executed by a single thread at a time.

The .pyx lines that generate these omp directives correspond to prange loops, which is potentially troubling if this directive means that those prange loops will be forced to run as single-threaded loops.
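
For illustration, here is a minimal Cython sketch (a hypothetical file, not taken from yt) of the pattern in question: a prange loop that re-acquires the GIL to call PyErr_CheckSignals. Cythonizing something like this with OpenMP enabled appears to be what emits the #pragma omp critical(__pyx_parallel_lastprivates<N>) block around the lastprivate copy-out (the __pyx_parallel_temp0 = __pyx_v_i assignment seen in the diff above), with <N> coming from an internal counter.

# minimal_repro.pyx -- hypothetical sketch, not yt code
from cython.parallel import prange
from cpython.exc cimport PyErr_CheckSignals

def walk(int n):
    cdef int i
    cdef double total = 0.0
    for i in prange(n, nogil=True, schedule="dynamic"):
        if i % 100000 == 0:
            with gil:
                # re-acquiring the GIL inside the prange body is what
                # appears to trigger the critical section in the generated C++
                PyErr_CheckSignals()
        total += 1.0
    return total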

Potential fixes

pixelization_routines.pyx

The two prange loops re-acquire the GIL to check for error signals; removing those checks removes the omp critical directives. I.e., building with the following diff:

diff --git a/yt/utilities/lib/pixelization_routines.pyx b/yt/utilities/lib/pixelization_routines.pyx
index a6a9f23d9..3846777b1 100644
--- a/yt/utilities/lib/pixelization_routines.pyx
+++ b/yt/utilities/lib/pixelization_routines.pyx
@@ -1213,10 +1213,6 @@ def pixelize_sph_kernel_projection(
             local_buff[i] = 0.0
 
         for j in prange(0, posx.shape[0], schedule="dynamic"):
-            if j % 100000 == 0:
-                with gil:
-                    PyErr_CheckSignals()
-
             xiter[1] = yiter[1] = ziter[1] = 999
 
             if check_period[0] == 1:
@@ -1569,9 +1565,6 @@ def pixelize_sph_kernel_slice(
             local_buff[i] = 0.0
 
         for j in prange(0, posx.shape[0], schedule="dynamic"):
-            if j % 100000 == 0:
-                with gil:
-                    PyErr_CheckSignals()

results in a pixelization_routines.cpp without those #pragma omp critical(__pyx_parallel_lastprivates0) lines. I'm not actually sure whether the directive would result in the entire prange loop running on a single thread (bad) or just the error check (probably fine).

misc_utilities.pyx

Slightly more complicated: misc_utilities.pyx similarly re-acquires the GIL within a prange loop in order to check for errors and update a progress bar, but it also includes a conditional break in the prange. I had to remove both, with the following diff, to avoid the #pragma omp critical(__pyx_parallel_lastprivates0) lines in the generated cpp:

diff --git a/yt/utilities/lib/misc_utilities.pyx b/yt/utilities/lib/misc_utilities.pyx
index 2a5e61efa..cee791500 100644
--- a/yt/utilities/lib/misc_utilities.pyx
+++ b/yt/utilities/lib/misc_utilities.pyx
@@ -979,8 +979,6 @@ def gravitational_binding_energy(
     cdef np.float64_t this_potential
 
     n_q = mass.size
-    pbar = get_pbar("Calculating potential for %d cells with %d thread(s)" % (n_q,num_threads),
-        n_q)
 
     # using reversed iterator in order to make use of guided scheduling
     # (inner loop is getting more and more expensive)
@@ -1003,14 +1001,7 @@ def gravitational_binding_energy(
                    (y_i - y_o) * (y_i - y_o) +
                    (z_i - z_o) * (z_i - z_o))
         total_potential += this_potential
-        if truncate and this_potential / kinetic > 1.:
-            break
-        with gil:
-            PyErr_CheckSignals()
-            # this call is not thread safe, but it gives a reasonable approximation
-            pbar.update()
-
-    pbar.finish()
+
     return total_potential

Removing the progress bar is not a big deal IMO. But it doesn't seem correct to simply remove the early break -- is this potentially a cython bug? Or does it make sense for a break to force single-threaded execution?
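
One possible direction, shown only as an untested sketch (the function and names below are hypothetical, not a proposed yt patch): replace the break with a shared stop flag that later iterations check and skip on. Whether this actually avoids the generated critical section, and whether the changed truncation semantics are acceptable, would both need to be verified.

import numpy as np
cimport cython
from cython.parallel import prange

@cython.boundscheck(False)
@cython.wraparound(False)
def sum_until(double[:] values, double threshold):
    cdef Py_ssize_t j
    cdef double total = 0.0
    # one-element flag shared by all threads; the race on it is benign
    # because it only ever changes from 0 to 1
    cdef int[:] stop = np.zeros(1, dtype=np.intc)
    for j in prange(values.shape[0], nogil=True, schedule="guided"):
        if stop[0]:
            continue
        total += values[j]
        if values[j] > threshold:
            stop[0] = 1
    return total

Note that, unlike break, iterations already claimed by other threads will still run, so the cutoff point can differ slightly from a sequential early exit.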

thoughts

So it seems that either of the following in a prange results in the #pragma omp critical(__pyx_parallel_lastprivates0) directives being added to the generated cpp:

  • re-acquiring the gil within a prange loop (to check error state)
  • adding a break

And because those __pyx_parallel_lastprivates0 lines vary in their final integer depending on the number of threads (and, as I've also seen, between builds with the same number of threads), you end up with diffs in the cpp and in the resulting binaries.

What I don't know is whether these are yt bugs, potential cython bugs, or even bugs at all. But that's where it's coming from...

(note I did push up the above diffs to a branch here, but have not submitted a PR because I think some discussion is warranted first)

@neutrinoceros
Member

@chrishavlin wow, thanks for this in-depth inquiry!
I think it's fine to remove these calls to PyErr_CheckSignals (they should be mostly useful for debugging anyway), and I agree that dropping a progress bar isn't a big deal. I suspect the easiest way to learn whether a break inside prange creating a critical section is a bug in Cython or not would be to ask upstream.

@bmwiedemann
Contributor Author

Thanks @chrishavlin for the great analysis. I tested your patch and found that yt-4.4.0/yt/utilities/lib/image_samplers.cpp still has similar variations (the variation might be low-entropy, so it could need a few tries to trigger):

--- /var/tmp/build-root.12/.mount/home/abuild/rpmbuild/BUILD/yt-4.4.0/yt/utilities/lib/image_samplers.cpp       2024-12-06 14:55:24.336666665 +0000
+++ /var/tmp/build-root.12b/.mount/home/abuild/rpmbuild/BUILD/yt-4.4.0/yt/utilities/lib/image_samplers.cpp      2041-01-08 04:12:36.353333332 +0000
@@ -25213,7 +25213,7 @@
                                 goto __pyx_L26;
                                 __pyx_L26:;
                                 #ifdef _OPENMP
-                                #pragma omp critical(__pyx_parallel_lastprivates0)
+                                #pragma omp critical(__pyx_parallel_lastprivates1)
                                 #endif /* _OPENMP */
                                 {   
                                     __pyx_parallel_temp0 = __pyx_v_i;
@@ -26264,7 +26264,7 @@
                                 goto __pyx_L27;
                                 __pyx_L27:;
                                 #ifdef _OPENMP
-                                #pragma omp critical(__pyx_parallel_lastprivates1)
+                                #pragma omp critical(__pyx_parallel_lastprivates2)
                                 #endif /* _OPENMP */
                                 {
                                     __pyx_parallel_temp0 = __pyx_v_i;

@chrishavlin
Contributor

@bmwiedemann image_samplers.pyx includes a similar check for Python error signals (re-acquiring the GIL) within a prange. I'll try removing that too and will let you know when I update my branch.

@neutrinoceros yeah, asking upstream is probably the thing to do, but I'll see if I can work out a simpler reproducible example first.

@chrishavlin
Contributor

chrishavlin commented Dec 6, 2024

OK, removing the critical sections from the generated image_samplers.cpp is a bit more complicated -- the Python error check needs to be removed as expected, but a small refactor is also needed because of these lines that occur within a prange:

for i in range(Nch):
    self.image[vi, vj, i] = idata.rgba[i]

That access to self.image gives rise to one of the critical directives: the fix will be to refactor this to use a local image buffer for each omp thread (e.g., similar to what's done in pixelize_sph_kernel_projection in pixelization_routines.pyx with the local_buff variable here).
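
For reference, here is a rough, self-contained sketch of that local-buffer pattern (simplified and with hypothetical names; the real version is the local_buff logic in pixelize_sph_kernel_projection): each OpenMP thread allocates and fills its own scratch buffer, and the per-thread results are merged into the shared array once at the end.

cimport cython
from cython.parallel import parallel, prange
from libc.stdlib cimport malloc, free

@cython.boundscheck(False)
@cython.wraparound(False)
def accumulate(double[:] out, double[:] values):
    cdef Py_ssize_t i, j, n = out.shape[0]
    cdef double* local_buff
    with nogil, parallel():
        # one scratch buffer per OpenMP thread
        local_buff = <double*> malloc(sizeof(double) * n)
        for i in range(n):
            local_buff[i] = 0.0
        for j in prange(values.shape[0], schedule="dynamic"):
            local_buff[j % n] += values[j]
        # merge this thread's buffer into the shared output;
        # acquiring the GIL serializes the merge across threads
        with gil:
            for i in range(n):
                out[i] += local_buff[i]
        free(local_buff)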

@matthewturk
Member

A different image buffer for each thread would be a pretty big memory increase -- we're looking at Npix by Nch, which gets pretty big for images we'd want to keep small. Can't we instead cache a reference to it (so that it's not doing self.) and then use standard OMP directives to make sure there's no overlap?

@chrishavlin
Contributor

chrishavlin commented Dec 6, 2024

Can't we instead cache a reference to it (so that it's not doing self.) and then use standard OMP directives to make sure there's no overlap?

Good point on the memory increase -- this does sound like a better approach. And I don't think this particular loop should actually have any overlap as it is, but I could use some ideas on which "standard OMP directives" to use here :) I know you can use omp functions directly...
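
One possible reading of that suggestion, as an untested sketch (hypothetical names, not the actual image_samplers code): hoist the self.image access into a typed memoryview before the parallel loop so the prange body only touches C-level buffers; since, as noted above, each (vi, vj) pixel should be written by exactly one iteration, plain writes shouldn't need any omp critical section.

cimport cython
from cython.parallel import prange

@cython.boundscheck(False)
@cython.wraparound(False)
def fill_pixels(double[:, :, :] image, double[:, :] rgba, long[:] vi, long[:] vj):
    # in the sampler this would be a one-time
    #     cdef np.float64_t[:, :, :] image = self.image
    # before the parallel region, so the loop never goes through `self`
    cdef Py_ssize_t k, i, nch = image.shape[2]
    for k in prange(vi.shape[0], nogil=True, schedule="static"):
        for i in range(nch):
            image[vi[k], vj[k], i] = rgba[k, i]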
