-
Notifications
You must be signed in to change notification settings - Fork 3
/
BUGS
58 lines (57 loc) · 4.32 KB
/
BUGS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
---------------------------------------------------------------------------
SET_Delete_Node and MAP_Delete_Node do not seem to be correct
in terms of returning the next node in turn (valgrinded out)
--------------------------------------------------------------------------------
cvi.c needs a better input point selection, see: ./cvitest inp/cvi.13
-----------------------------------------------------------------------------
Contact detection still crude: see inp/tetpart.py
-------------------------------------------------------------
Sweep algorithms get stuck when run on many cpus (e.g. -np 8 with mpi1.py and N=20)
GUESS: insertion sort is run on badly ordered sets with O(n ^2) complexity
2011: seems to be still there
------------------------------------------------------------------------------------
Too many cpus when compared to problem size cause parallel consitency issues:
e.g. => 10x10x1 simplified core on 64 CPUs
------------------------------------------------------------------------------------
Zoltan bug for 10x10x1 simplified core and 32 CPUs =>
body extents contain constraint extents and yet for example:
Body extents Zoltan_LB_Box_Assign processors: 28
Constraint point Zoltan_LB_Point_Assign processor: 6
------------------------------------------------------------------------------------
Intel compilers based compilation gets stuck in parallel (with openmpi/intel)
-------------------------------------------------------------------------------------------------------------------------
Run inp/ellcomp.py for > 30000 ellipsoids on >= 16 CPUs. They are initially all well separated.
Q: Why is the initial PARBALL time so dominant?
A: Zoltan rebalancing time is very long even though little change is made to the domain. As soon as you get behind
one cluste node the communication time inside of Zoltan becomes quite long.
Current solution: Repartition every 'dom->updatefreq' steps
A: that was a cluster node bug in fact
-------------------------------------------------------------------------------------------------------------------------
valgrind any example with HDF5 output enabled and look into the initial warnings:
==30178== Syscall param write(buf) points to uninitialised byte(s)
==30178== at 0x101862C22: write (in /usr/lib/system/libsystem_kernel.dylib)
==30178== by 0x100F07EF5: H5FD_sec2_write (in /usr/local/hdf5/lib/libhdf5.7.dylib)
==30178== by 0x1000DCACD: write_frame (pbf.c:82)
==30178== by 0x1000DC9A2: PBF_Close (pbf.c:268)
==30178== by 0x10013982C: write_state (sol.c:251)
==30178== by 0x100139054: SOLFEC_Run (sol.c:747)
==30178== by 0x10012D381: lng_RUN (lng.c:7537)
==30178== by 0x100CFCD8B: PyEval_EvalFrameEx (in /System/Library/Frameworks/Python.framework/Versions/2.7/Python)
==30178== by 0x100CF9351: PyEval_EvalCodeEx (in /System/Library/Frameworks/Python.framework/Versions/2.7/Python)
==30178== by 0x100CF8DCA: PyEval_EvalCode (in /System/Library/Frameworks/Python.framework/Versions/2.7/Python)
==30178== by 0x100D1900D: PyParser_ASTFromFile (in /System/Library/Frameworks/Python.framework/Versions/2.7/Python)
==30178== by 0x100D190B0: PyRun_FileExFlags (in /System/Library/Frameworks/Python.framework/Versions/2.7/Python)
==30178== Address 0x10b206cd6 is 134 bytes inside a block of size 140 alloc'd
==30178== at 0x1004BD3B1: malloc (vg_replace_malloc.c:303)
==30178== by 0x100F0BFAD: H5FL_malloc (in /usr/local/hdf5/lib/libhdf5.7.dylib)
==30178== by 0x1000DCACD: write_frame (pbf.c:82)
==30178== by 0x1000DC9A2: PBF_Close (pbf.c:268)
==30178== by 0x10013982C: write_state (sol.c:251)
==30178== by 0x100139054: SOLFEC_Run (sol.c:747)
==30178== by 0x10012D381: lng_RUN (lng.c:7537)
==30178== by 0x100CFCD8B: PyEval_EvalFrameEx (in /System/Library/Frameworks/Python.framework/Versions/2.7/Python)
==30178== by 0x100CF9351: PyEval_EvalCodeEx (in /System/Library/Frameworks/Python.framework/Versions/2.7/Python)
==30178== by 0x100CF8DCA: PyEval_EvalCode (in /System/Library/Frameworks/Python.framework/Versions/2.7/Python)
==30178== by 0x100D1900D: PyParser_ASTFromFile (in /System/Library/Frameworks/Python.framework/Versions/2.7/Python)
==30178== by 0x100D190B0: PyRun_FileExFlags (in /System/Library/Frameworks/Python.framework/Versions/2.7/Python)
-------------------------------------------------------------------------------------------------------------------------