Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make display detection more robust. #916

Merged
merged 6 commits into from
Jul 27, 2023

Conversation

nuclearsandwich
Copy link
Contributor

Attempting to improve the display detection for the Linux GPU agents to resolve #906.

I am not certain whether this change makes the detection logic much more
readable but I do think it makes it slightly more robust while also
spreading it out a little.
5 seconds is probably on the long side but if 5s is not long enough then
  my hypothesis that the issue is startup time may be incorrect.
Substitutions don't print by default when print suppression is on.
Comment on lines +12 to +15
# Strip the path and leading X from the X11 socket
# but check that the resulting string is numeric and
# non-empty before exporting.
DISPLAY=:$(basename $i | sed -n -E 's/^X([0-9]+)/:\1/p')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$i is no longer just the value of the DISPLAY variable so once confirming a likely active X11 server we need to get the value of DISPLAY for export. Compared with the prior substitution this one suppresses values that do not match X[0-9]+.

Comment on lines 16 to 18
if [ -n $DISPLAY ]; then
export DISPLAY
fi
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the variable value could be empty, but an earlier socket file may have set a DISPLAY, only export the current value if it is not empty.

Comment on lines +86 to +87
# Wait for lightdm service to restart X11.
sleep 5
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 seconds is possibly way too long, but if this does not work as expected we likely need a different hypothesis.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to figure out how we can inspect systemd activations, maybe a loop with a timeout on systemctl is-active can work here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm not certain that the watching lightdm activation / status would be sufficient to resolve the race since I don't know if lightdm uses systemd's notification API for actually declaring readiness or if it's a simple/exec type service which is considered active as soon as the process runs https://www.freedesktop.org/software/systemd/man/systemd.service.html in which case we'd still need to wait ourselves for X11 to start.

Copy link
Contributor

@j-rivero j-rivero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a fix for bash. I left some other minor comments.

# but check that the resulting string is numeric and
# non-empty before exporting.
DISPLAY=:$(basename $i | sed -n -E 's/^X([0-9]+)/:\1/p')
if [ -n $DISPLAY ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if [ -n $DISPLAY ]; then
if [[ -n $DISPLAY ]]; then

dragons are in [ ] single brackets https://www.shellcheck.net/wiki/SC2070

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I added quotes in 73ba462 I prefer POSIX compatible shell to Bash-isms where possible, although since this is an explicitly bash script I don't mind being overridden on this point.

Comment on lines +86 to +87
# Wait for lightdm service to restart X11.
sleep 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to figure out how we can inspect systemd activations, maybe a loop with a timeout on systemctl is-active can work here.

@j-rivero j-rivero merged commit 1fa30be into master Jul 27, 2023
1 check passed
@j-rivero j-rivero deleted the nuclearsandwich/export-display-variable branch July 27, 2023 16:13
Crola1702 added a commit that referenced this pull request Jul 31, 2023
Signed-off-by: Crola1702 <cristobal.arroyo@ekumenlabs.com>
j-rivero added a commit that referenced this pull request Aug 1, 2023
j-rivero added a commit that referenced this pull request Aug 1, 2023
* Resurrect changes in (#916) (#978)
* Remove extra : in DISPLAY variable

Signed-off-by: Jose Luis Rivero <jrivero@osrfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DISPLAY environment variable race condition when checking GPU support
2 participants