Investigate segfault when building docs #119

Closed
JoshuaLampert opened this issue Mar 26, 2024 · 6 comments · Fixed by #120

@JoshuaLampert
Member

We sometimes observe a segfault when building the docs in CI (e.g. here), and I can also reproduce it locally, though only intermittently. I was able to reproduce the segfault (again, only sometimes) with a minimal example in which only the troubleshooting.md page is built and its only content is

```julia
using P4est
```
@JoshuaLampert
Member Author

Ok, I finally localized the problem. What I wrote above is not right. The problem does not occur when only troubleshooting.md is built (I didn't know that all *.md files located in docs/src are actually built, even if they are not in the pages list). The actual problem lies in introduction.md, more precisely in the p4est_destroy call, e.g. here. Running julia --project=. inside the main P4est.jl directory, I sometimes get, e.g.:

julia> using P4est, MPI; MPI.Init()
MPI.ThreadLevel(2)

julia> connectivity = p4est_connectivity_new_periodic()
Ptr{p4est_connectivity} @0x0000000000a0b750

julia> connectivity_pw = PointerWrapper(connectivity)
PointerWrapper{p4est_connectivity}(Ptr{p4est_connectivity} @0x0000000000a0b750)

julia> connectivity_pw.num_trees[]
1

julia> p4est = p4est_new_ext(MPI.COMM_WORLD, connectivity, 0, 0, true, 0, C_NULL, C_NULL)
Into p4est_new with min quadrants 0 level 0 uniform 1
New p4est with 1 trees on 1 processors
Initial level 0 potential global quadrants 1 per tree 1
Done p4est_new with 1 total quadrants
Ptr{P4est.LibP4est.p4est} @0x000000000099ac90

julia> p4est_pw = PointerWrapper(p4est)
PointerWrapper{P4est.LibP4est.p4est}(Ptr{P4est.LibP4est.p4est} @0x000000000099ac90)

julia> p4est_pw.connectivity.num_trees[]
1

julia> p4est_connectivity_destroy(connectivity)

julia> p4est_destroy(p4est)

[17232] signal 11 (1): Segmentation fault
in expression starting at REPL[9]:1
sc_free_aligned at /workspace/srcdir/p4est-2.8/sc/src/sc.c:469 [inlined]
sc_free at /workspace/srcdir/p4est-2.8/sc/src/sc.c:677
sc_array_reset at /workspace/srcdir/p4est-2.8/sc/src/sc_containers.c:167
p4est_destroy at /workspace/srcdir/p4est-2.8/src/p4est.c:522
p4est_destroy at /home/lampert/.julia/dev/P4est/src/LibP4est.jl:4247
unknown function (ip: 0x745e47f0c5b5)
jl_apply at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/julia.h:2165 [inlined]
do_call at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_value at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/interpreter.c:675
jl_interpret_toplevel_thunk at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/interpreter.c:815
jl_toplevel_eval_flex at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_toplevel_eval_flex at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/toplevel.c:886
jl_toplevel_eval_flex at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/toplevel.c:886
jl_toplevel_eval_flex at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:428 [inlined]
eval_user_input at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:224
repl_backend_loop at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:320
#start_repl_backend#59 at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:305
start_repl_backend at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:302
#run_repl#72 at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:461
run_repl at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:447
unknown function (ip: 0x745e47f01b96)
#1127 at ./client.jl:441
jfptr_YY.1127_17256 at /home/lampert/opt/julia/julia-1.11.0-alpha1/share/julia/compiled/v1.11/REPL/u0gqU_F0RV5.so (unknown line)
jl_apply at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/julia.h:2165 [inlined]
jl_f__call_latest at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1030 [inlined]
invokelatest at ./essentials.jl:1027 [inlined]
run_main_repl at ./client.jl:425
repl_main at ./client.jl:562 [inlined]
_start at ./client.jl:536
jfptr__start_69069.1 at /home/lampert/opt/julia/julia-1.11.0-alpha1/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/julia.h:2165 [inlined]
true_main at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/jlapi.c:898
jl_repl_entrypoint at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/jlapi.c:1057
main at julia (unknown line)
unknown function (ip: 0x745e5e029d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 2107404 (Pool: 2107307; Big: 97); GC: 3
Segmentation fault

It could be related to PointerWrapper: after removing the PointerWrapper lines above, I did not get the segfault in several tries.

@ranocha
Member

ranocha commented Mar 27, 2024

Are PointerWrappers collected by the GC? Does anything change if you call the GC manually?

@JoshuaLampert
Member Author

Difficult to say, as it works most of the time anyway. Putting GC.gc() before p4est_destroy(p4est) has always worked for me, but that doesn't necessarily mean it fixes the problem. If PointerWrappers were collected by the GC, we would rather expect a segfault after manually garbage collecting, right? But that does not seem to be the case.

@ranocha
Member

ranocha commented Mar 27, 2024

It's indeed weird. PointerWrappers are also just structs (nothing mutable), so they should be allocated on the stack and the GC should not interact with them.
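
The claim about immutable structs can be checked with a small stand-in type. This is only a sketch: `DemoWrapper` is a hypothetical type defined here for illustration, and the assumption that the real PointerWrapper has the same layout (a single `Ptr` field) is not confirmed in this thread.

```julia
# Hypothetical stand-in: an immutable struct holding only a raw pointer.
# The layout of the real PointerWrapper is an assumption, not taken from P4est.jl.
struct DemoWrapper{T}
    pointer::Ptr{T}
end

# Wrap a null pointer; nothing is dereferenced here.
w = DemoWrapper{Cint}(C_NULL)

# An isbits type contains no references to GC-managed objects.
wrapper_is_bits = isbitstype(DemoWrapper{Cint})

# Instances of a plain `struct` are immutable.
wrapper_is_mutable = ismutable(w)
```

If both checks behave as expected, the wrapper itself gives the GC nothing to track. The memory behind the pointer is owned by the C library either way, so its lifetime depends only on the explicit destroy calls.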

@lcw
Contributor

lcw commented Mar 27, 2024

I am not sure this is your issue, but you need to call p4est_destroy(p4est) before calling p4est_connectivity_destroy(connectivity). This is because p4est_destroy references the connectivity here:
https://github.com/cburstedde/p4est/blob/9d68673609ebd4e44e7ee026ddbdefe53d6b8b2e/src/p4est.c#L512-L513.
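
A minimal sketch of the teardown order described above, reusing only the calls from the session earlier in this thread. It assumes a single-process run with P4est.jl and MPI.jl installed, and it has not been checked against the docs build itself.

```julia
using P4est, MPI

MPI.Init()

# Create the connectivity first; the forest created next keeps a pointer to it.
connectivity = p4est_connectivity_new_periodic()

# Same arguments as in the REPL session above.
p4est = p4est_new_ext(MPI.COMM_WORLD, connectivity, 0, 0, true, 0, C_NULL, C_NULL)

# Read through the wrapper while both objects are still alive.
p4est_pw = PointerWrapper(p4est)
num_trees = p4est_pw.connectivity.num_trees[]

# Tear down in reverse order of creation: the forest first, then the connectivity,
# because p4est_destroy still reads from the connectivity.
p4est_destroy(p4est)
p4est_connectivity_destroy(connectivity)
```

Destroying in reverse order of creation keeps the connectivity valid for as long as the forest can still refer to it.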

@JoshuaLampert
Member Author

Thanks a lot @lcw! That would make sense. In the tests we always have p4est_destroy before p4est_connectivity_destroy. That would also explain why we've never noticed it there, but only in the docs.
