Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libgap: Fix GC crashes on aarch64 #34701

Closed
collares opened this issue Oct 28, 2022 · 5 comments · Fixed by #35114
Closed

libgap: Fix GC crashes on aarch64 #34701

collares opened this issue Oct 28, 2022 · 5 comments · Fixed by #35114

Comments

@collares
Copy link
Contributor

collares commented Oct 28, 2022

A refactor in #27946 introduced "unprotected" (not surrounded by GAP_Enter/GAP_Leave) GAP_ValueGlobalVariable calls. I believe this might be a GC hazard, because after updating to GAP 4.12.1 I started seeing aarch64 crashes on NixOS infrastructure such as:

#0  0x0000fffff79740e8 in wait4 ()
#1  0x0000fffff5dc6b78 in print_enhanced_backtrace ()
#2  0x0000fffff5dc8190 in sigdie ()
#3  0x0000fffff5dcb1c0 in cysigs_signal_handler ()
#4  0x0000fffff7ffb7cc in __kernel_rt_sigreturn ()
#5  0x0000ffff99a0bf28 in ConvString ()
#6  0x0000000000000000 in ?? ()
#7  0x0000000000000000 in ?? ()
#8  0x0000000000000000 in ?? ()
#9  0x0000ffff99989930 in Pr ()
#10 0x0000ffff9998aa18 in CloseOutput ()
#11 0x0000ffff99884828 in capture_stdout () at /build/sage-src-9.7/pkgs/sagemath-standard/sage/libs/gap/element.pyx:154
...

I also see cases where capture_stdout throws errors such as sage.libs.gap.util.GAPError: Error, Length: <list> must be a list (not the integer 255) and then crashes. Both types of errors are fixed by this ticket.

Note that I am nesting GAP_Enter/GAP_Leave calls because I didn't remove the preexisting calls inside capture_stdout. That's because I feared removing the innermost calls might create a new footgun (and I believe nested GAP_Enter/GAP_Leave calls are explicitly supported), but removing them should cause no problem. Removing them might even be preferable for performance reasons, I don't know.

This ticket's branch is based on #34391.

(Edit: I previously thought that holding onto a char* past the GAP_Enter/GAP_Leave blocks, as is done in GapElement's __str__ and _repr_ methods, could also cause GC hazards. It doesn't seem to be a problem in practice, though, and I am not qualified to tell if it's a problem in theory.)

Depends on #34391

CC: @embray @videlec @dimpase

Component: interfaces

Author: Mauricio Collares

Issue created by migration from https://trac.sagemath.org/ticket/34701

Edit: Removed branch. The code changes are now at #35114.

@collares collares added this to the sage-9.8 milestone Oct 28, 2022
@collares

This comment has been minimized.

@collares

This comment has been minimized.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Oct 29, 2022

Changed commit from 49be111 to 90acc7f

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Oct 29, 2022

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

90acc7fSurround GAP_ValueGlobalVariable calls by GAP_Enter/GAP_Leave

@mkoeppe
Copy link
Contributor

mkoeppe commented Jan 28, 2023

Author: Mauricio Collares

@mkoeppe mkoeppe removed this from the sage-9.8 milestone Jan 29, 2023
@vbraun vbraun closed this as completed in 0d01c38 Mar 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants