Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FvwmPager crashes at restart #1137

Open
dirteat opened this issue Dec 10, 2024 · 18 comments
Open

FvwmPager crashes at restart #1137

dirteat opened this issue Dec 10, 2024 · 18 comments
Assignees
Labels
type:bug Something's broken!
Milestone

Comments

@dirteat
Copy link

dirteat commented Dec 10, 2024

Hi there,

  • Fvwm3 version (run: fvwm3 --version)
    fvwm3 1.1.1 (released)
    with support for: ReadLine, XPM, PNG, SVG, Shape, XShm, SM, Bidi text, XRandR, XRender, XCursor, XFT, NLS

  • Linux distribution or BSD name/version
    Mageia, devel version Cauldron

  • Platform (run: uname -sp)
    x86_64

Expected Behaviour

My config Swallow the FvwmPager into a button, works fine when fvwm3 starts, but the pager disappear when hitting the "restart" menu and the logs show that it crashed.

Enabling logging

pkill -USR2 fvwm3

Nothing in the logs appears.

Steps to Reproduce

How can the problem be reproduced? For this, the following is helpful:

  • Reduce the problem to the smallest fvwm configuration example (where
    possible). In progress!

  • Does the problem also happen with Fvwm2?
    The problem does not appear with fvwm2, but it was also not present with the previous version of fvwm3, 1.0.6a

Include your configuration with this issue.
It is very long, I'll provide a thin one once I have isolated the minimal config triggering it. Possibly the swallowing?

Test (EnvIsSet FVWM_IS_FVWM3) *DeskButtonsBox: (Swallow (NoHints) "Desk 0" "FvwmPager")

Does Fvwm3 crash?

No, but the Pager module does:

: [🡕] Process 2649567 (FvwmPager) of user 1000 dumped core.

Stack trace of thread 2649567:
#0 0x0000000000408d4d set_vp_size_and_loc (FvwmPager + 0x8d4d)
#1 0x000000000040e198 initialize_fpmonitor_windows.part.0 (FvwmPager + 0xe198)
#2 0x000000000040ed81 init_fvwm_pager (FvwmPager + 0xed81)
#3 0x000000000040755b main (FvwmPager + 0x755b)
#4 0x00007fa3be2d1737 __libc_start_call_main (libc.so.6 + 0x23737)
#5 0x00007fa3be2d17f5 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x237f5)
#6 0x0000000000407981 _start (FvwmPager + 0x7981)
ELF object binary architecture: AMD x86-64

Cheers,
Chris.

@dirteat dirteat added the type:bug Something's broken! label Dec 10, 2024
@ThomasAdam ThomasAdam self-assigned this Dec 10, 2024
@ThomasAdam ThomasAdam added this to the 1.1.2 milestone Dec 10, 2024
@ThomasAdam ThomasAdam added this to FVWM3 Dec 10, 2024
@github-project-automation github-project-automation bot moved this to To do in FVWM3 Dec 10, 2024
@somiaj
Copy link
Collaborator

somiaj commented Dec 10, 2024

One thing I do notice is your FvwmButtons configuration shouldn't use the name of the page, if you add a name to Desk 0, the swallow command may not be able to match. Use the class/resource. Something like this would be better.
Also can you share your full FvwmPager configuration.

Test (EnvIsSet FVWM_IS_FVWM3) *DeskButtonsBox: (Swallow (NoHints) "FvwmPager" "FvwmPager")

@somiaj
Copy link
Collaborator

somiaj commented Dec 10, 2024

@ThomasAdam the only thing I can see is the method set_vp_size_and_loc does divide by fp->virtual_scr.{VxPages,VyPages,VWwidth,VHeight}, is there any chance one of those is zero. This isn't something I've ran into in the numerous times I restart fvwm though. Wonder if there is some issue with their monitor setup?

@ThomasAdam
Copy link
Member

ThomasAdam commented Dec 10, 2024

@somiaj

Maybe. But the stacktrace would indicate a SIGFPE -- although I can't tell from the informatioon we've been provided if that's true.

@dirteat -- I'm going to need to see a backtrace with debug symbols, please. I'd also want to see a fuller log from $FVWM_USERDIR/fvwm3-output.log. If you're able to reproduce this, you can add -v to the fvwm3 commandline, and send that log file through.

@dirteat
Copy link
Author

dirteat commented Dec 10, 2024

All right, thanks for the feedback, I'll fix my swallow! I'll provide a bt, but the segfault is occuring one over many restarts (I can tell now, I've been toying with that since a few hours), and at all restarts the pager is not restarted. So the segfault and my issue here are not in one-to-one correspondence.

BUT, I finally found out the culprit in my configuration:

*FvwmPager: HilightColorset * 8

The number "8" is just a user-defined colorset:

Colorset 8 fg white, bg black, RootTransparent buffer, Tint green 35, \
	 bgTint green 35 

The Pager not restarting is occurring as soon as I set this option. Reading the man, I've just discovered it was deprecated, so this might be my fault indeed. That config is following me since decades :)

I think you should be able to reproduce. First start with that option set, everything works fine, a restart and the pager is gone without any log.

Let me know if you need me to do other tests. I'll backtrace the segfault if I can trigger it in a rational way.

Thanks!!

NB: The monitor setup is unusual indeed (beamer on HDMI0, DP-4 main screen. Due to their different aspect ratios and resolution, HDMI0 is centered within DP4).

HDMI-0 connected 1920x1080+320+260
DP-4 connected primary 2560x1600+0+0

@ThomasAdam
Copy link
Member

@dirteat

I really don't think the HilightColorset * 8 config line is the problem.

Given:

Stack trace of thread 2649567:
#0 0x0000000000408d4d set_vp_size_and_loc (FvwmPager + 0x8d4d)

What I need from this, is a reliable stack trace to tell me where in set_vp_size_and_loc the issue is occurring. I really don't want to guess either, and with a stacktrace with debug symbols, I won't have to.

@ExecutorElassus
Copy link

ExecutorElassus commented Dec 11, 2024

@dirteat

The Pager not restarting is occurring as soon as I set this option. Reading the man, I've just discovered it was deprecated, so this might be my fault indeed. That config is following me since decades :)

Can you tell me what man page you're looking at, and what needs changing? I also have a very old config, and this week's update has caused the colors on FvwmPager to act weird (the active highlight color is now default light gray, not the colorset I specified).

@somiaj
Copy link
Collaborator

somiaj commented Dec 11, 2024

@ExecutorElassus Please don't piggy back on other issues, it distracts from the issue at hand which is FvwmPager crashing. If the FvwmPager manual page doesn't help you figure out how to set the colors you want for the pager, you can ask for help on the forums.

@dirteat
Copy link
Author

dirteat commented Dec 11, 2024

@ThomasAdam
I will provide a backtrace.

This is weird, I can reproduce 100% of the time on my machine, without even using a Swallowing, that the Pager disappear at restart only when I set:

*FvwmPager: HilightColorset * 8

@ExecutorElassus
Yes, the Pager is now default light gray, without HilightColorset, I have no idea how to color it (man fvwm3all).

@dirteat
Copy link
Author

dirteat commented Dec 11, 2024

@ThomasAdam

I really don't think the HilightColorset * 8 config line is the problem.

You're probably right, I've tested on another computer running fvwm3 1.1.1 within mono screen output (DP-0 connected primary 5120x2160+0+0), I don't get the bug anymore even though I am setting the *FvwmPager: HilightColorset * 8... Backtrace coming....

@dirteat
Copy link
Author

dirteat commented Dec 11, 2024

gdb -p $pidofpager running and waiting... (the segfault is not systematic).

Still, I've narrowed down the issue a bit more. In addition to HilightColorset, my screen configuration has to be 2 screens without a match on the upper right corner!?

For instance, this configuration works (Pager restart fine with HilightColorset set)
pagerok

Any other configuration I've tried with 2 screens not touching in the upper right corner fail, like this one (the default):
nopager

@ThomasAdam
Copy link
Member

@dirteat

Yeah -- I know the problem will be with how FvwmPager is reacting to monitor information. I'm just waiting for the bt full backtrace from GDB.

@somiaj -- without wanting to second guess this, it's possible we might need to implement the taxi-cab logic in FvwmPager in fpmonitor_from_xy()?

@somiaj
Copy link
Collaborator

somiaj commented Dec 11, 2024

@ThomasAdam I don't think that is the issue, although most the time fpmonitor_from_xy() is used to determine which monitor was clicked, which needs to return NULL if in dead space (i.e. don't move the view port of a monitor area that was not clicked). There are a few instances that it will fallback to the current monitor, but all locations are checking if the return is NULL so I don't think that is the problem here. I could implement finding the closest, but unsure if it will help this issue.

The issue appears to me is why a monitor doesn't have its virtual size and pages set correctly, as I cannot think of what else would fail in the function that was reported having trouble during the init. Since the user seems to suggest this is related to some option, could options be getting in the way or preventing getting the monitor information correctly? Being able to peak at the monitor struts at the time of the crash is needed, and I assume the coredump will tell us that.

@somiaj
Copy link
Collaborator

somiaj commented Dec 12, 2024

If we want to try to randomly test things, assuming the error is division by zero or some undefined value in the fpmonitor strut, here is an attempt to initialize values a bit.

diff --git a/modules/FvwmPager/fpmonitor.c b/modules/FvwmPager/fpmonitor.c
index e9802b83f..010f59ec1 100644
--- a/modules/FvwmPager/fpmonitor.c
+++ b/modules/FvwmPager/fpmonitor.c
@@ -32,6 +32,12 @@ struct fpmonitor *fpmonitor_new(struct monitor *m)
                fp->CPagerWin = NULL;
        fp->m = m;
        fp->disabled = false;
+       fp->virtual_scr.Vx = 0;
+       fp->virtual_scr.Vy = 0;
+       fp->virtual_scr.VWidth = m->si->w;
+       fp->virtual_scr.VHeight = m->si->h;
+       fp->virtual_scr.VxPages = 1;
+       fp->virtual_scr.VyPages = 1;
        TAILQ_INSERT_TAIL(&fp_monitor_q, fp, entry);
 
        return (fp);

So apply this patch and see if that changes things in any way.

@somiaj
Copy link
Collaborator

somiaj commented Dec 12, 2024

Looking at your monitor setup, is one monitor set on top of another monitor, so they are both viewing the same basic area?

@dirteat
Copy link
Author

dirteat commented Dec 12, 2024

assuming the error is division by zero or some undefined value

In the segfault I reported before, the logs also show

Dec 10 20:45:29 brenva kernel: traps: FvwmPager[2747572] trap divide error ip:408d4d sp:7ffe4704efe0 error:0 in FvwmPager[406000+2e000]

Looking at your monitor setup, is one monitor set on top of another monitor, so they are both viewing the same basic area?

Yes, but notice that their height/width are different, so they never cover exactly the same region.

@dirteat
Copy link
Author

dirteat commented Dec 12, 2024

So apply this patch and see if that changes things in any way.

I gave a try, the pager does not appear at all whatever the config, and restarting fvwm3, I see the process FvwmPager getting multiplied. I reverted to 1.1.1 bare. Still no segfault, just some logs when I toy with arandr moving the two displays around while trying to trigger a crash:

[1734034309.540183]  log opened (because of SIGUSR2)
[1734034320.332185] scan_screens: Case 2: processing 2 monitors
[1734034320.342643] scan_screens: Case 2.1: new monitor
[1734034320.342662] monitor_mark_new: Added new monitor: HDMI-0 (0x16caf3c0)
[1734034320.343941] monitor_update_ewmh: Applying EWMH changes to new HDMI-0
[1734034320.343948] monitor_update_ewmh: new_monitor: HDMI-0 ((nil)) compared to ((nil))
[1734034320.400354] monitor_emit_broadcast: DP-4: emit monitor primary change
[1734034320.400451] _execute_command_line: No such command 'RandRFunc'
[1734034320.400458] monitor_emit_broadcast: HDMI-0: emit: 4
[1734034320.400460] monitor_emit_broadcast: HDMI-0: emit monitor enabled
[1734034320.400509] _execute_command_line: No such command 'RandRFunc'
[1734034559.899757]  log opened (because of SIGUSR2)
[1734034741.696943] scan_screens: Case 2: processing 2 monitors
[1734034741.697713] monitor_mark_changed: HDMI-0: x: 320, y: 260, w: 1920, h: 1080 comp: x: 2560, y: 216, w: 1920, h: 1080
[1734034741.697728] scan_screens: Case 2.2: HDMI-0 changed
[1734034741.701711] monitor_emit_broadcast: DP-4: emit monitor primary change
[1734034741.701855] _execute_command_line: No such command 'RandRFunc'
[1734034741.701865] monitor_emit_broadcast: HDMI-0: emit: 256
[1734034741.701870] monitor_emit_broadcast: HDMI-0: emit monitor changed
[1734034741.701939] _execute_command_line: No such command 'RandRFunc'
[1734034741.702031] dispatch_event: Ignoring duplicate event 42565

@ThomasAdam
Copy link
Member

Thanks, @dirteat -- I think I see the problem.

@dirteat
Copy link
Author

dirteat commented Dec 15, 2024

A quick update, I haven't managed to caputure any FvwmPager segfault under gdb, so I have no idea how those were triggered. In the logs, in addition to the typical messages quoted earlier, I have sometimes, series of:

1734036411.459067] monitor_emit_broadcast: HDMI-0: emit monitor changed
[1734036411.459156] _execute_command_line: No such command 'RandRFunc'
[1734036576.928890] CreateConditionMask: Use comma instead of whitespace to separate conditions
[1734036576.928942] CreateConditionMask: Use comma instead of whitespace to separate conditions
[1734036786.653634] CreateConditionMask: Use comma instead of whitespace to separate conditions
[1734036786.653663] CreateConditionMask: Use comma instead of whitespace to separate conditions
[1734037033.064203] CreateConditionMask: Use comma instead of whitespace to separate conditions
[1734037033.064236] CreateConditionMask: Use comma instead of whitespace to separate conditions
[1734037313.755545] CreateConditionMask: Use comma instead of whitespace to separate conditions
[1734037313.755597] CreateConditionMask: Use comma instead of whitespace to separate conditions
[1734037344.593259] CreateConditionMask: Use comma instead of whitespace to separate conditions
[1734037344.593303] CreateConditionMask: Use comma instead of whitespace to separate conditions
[1734037359.451997] CreateConditionMask: Use comma instead of whitespace to separate conditions
[1734037359.452019] CreateConditionMask: Use comma instead of whitespace to separate conditions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug Something's broken!
Projects
Status: To do
Development

No branches or pull requests

4 participants