Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complex dirty region causes overflow of xrdp comms buffer #318

Closed
metalefty opened this issue Jul 7, 2024 · 21 comments · Fixed by #319 or #321
Closed

Complex dirty region causes overflow of xrdp comms buffer #318

metalefty opened this issue Jul 7, 2024 · 21 comments · Fixed by #319 or #321

Comments

@metalefty
Copy link
Member

metalefty commented Jul 7, 2024

Version: xrdp v0.10.0 and xorgxrdp v0.10.1

Steps to reproduce:

  1. Use xterm as WM for simplicity
  2. Connect from mstsc.exe with 4K screen (3840x2160)
  3. Run xeyes -geometry 3840x2160 (exactly the same screen size with xrandr reports)
  4. Xorg hangs up

I see the following xorgxrdp log. This hang-up doesn't happen v0.9. So it is related to recent GFX change.

I recalled #64 from the log with the following log line. It looks like a buffer overrun. I also recalled #5 and #62.

[  2341.417] rdpClientConSendMsg: overrun error len, 35029 stream size 32868, client count 2

It looks like a buffer overrun, I tried to increase buffer size to 64k. I don't see any obvious buffer overrun with 64k buffer but Xorg still hangs up.

[  2320.111] rdpClientConProcessMsgClientInput: invalidate x 0 y 0 cx 5759 cy 2163
[  2329.463] rdpRRGetInfo:
[  2329.464] rdpRRCrtcGetGamma: 0x55993aa226f0 0x55993aa3ebf0 0x55993aa3eff0 0x55993aa3edf0
[  2329.464] rdpRRCrtcGetGamma: 0x55993aa226f0 0x55993aa3ebf0 0x55993aa3eff0 0x55993aa3edf0
[  2329.465] rdpRRCrtcGetGamma: 0x55993aa3f200 0x55993aa2ec50 0x55993aa2f050 0x55993aa2ee50
[  2329.465] rdpRRCrtcGetGamma: 0x55993aa3f200 0x55993aa2ec50 0x55993aa2f050 0x55993aa2ee50
[  2341.417] rdpClientConSendMsg: overrun error len, 35029 stream size 32868, client count 2
[  2398.054] rdpClientConProcessMsgClientInput: invalidate x 0 y 0 cx 5759 cy 2163
[  2529.347] rdpClientConProcessMsgClientInput: invalidate x 0 y 0 cx 5759 cy 2163
[  2534.024] rdpClientConRecv: g_sck_recv failed(returned 0)
[  2534.024] rdpClientConRecvMsg: error
[  2534.024] rdpClientConCheck: rdpClientConGotData failed
[  2534.024] rdpClientConDisconnect:
[  2534.024] rdpRemoveClientConFromDev: removing clientCon 0x55993aa3fcf0
[  2534.024] (EE)
[  2534.024] (EE) Backtrace:
[  2534.025] (EE) 0: /usr/libexec/Xorg (xorg_backtrace+0x89) [0x559939442619]
[  2534.025] (EE) 1: /usr/libexec/Xorg (0x55993926c000+0x1df149) [0x55993944b149]
[  2534.025] (EE) 2: /lib64/libc.so.6 (0x7fac2bc00000+0x54db0) [0x7fac2bc54db0]
[  2534.025] (EE) 3: /lib64/libc.so.6 (0x7fac2bc00000+0xa154c) [0x7fac2bca154c]
[  2534.025] (EE) 4: /lib64/libc.so.6 (raise+0x16) [0x7fac2bc54d06]
[  2534.025] (EE) 5: /lib64/libc.so.6 (abort+0xd3) [0x7fac2bc287f3]
[  2534.025] (EE) 6: /lib64/libc.so.6 (0x7fac2bc00000+0x29130) [0x7fac2bc29130]
[  2534.026] (EE) 7: /lib64/libc.so.6 (0x7fac2bc00000+0xab617) [0x7fac2bcab617]
[  2534.026] (EE) 8: /lib64/libc.so.6 (0x7fac2bc00000+0xad30c) [0x7fac2bcad30c]
[  2534.026] (EE) 9: /lib64/libc.so.6 (free+0x55) [0x7fac2bcaf955]
[  2534.026] (EE) 10: /usr/lib64/xorg/modules/libxorgxrdp.so (0x7fac2b24c000+0xc12b) [0x7fac2b25812b]
[  2534.026] (EE) 11: /usr/lib64/xorg/modules/libxorgxrdp.so (rdpClientConCheck+0x13c) [0x7fac2b25a14c]
[  2534.026] (EE) 12: /usr/libexec/Xorg (WakeupHandler+0xba) [0x5599392d01ca]
[  2534.026] (EE) 13: /usr/libexec/Xorg (WaitForSomething+0x1ef) [0x55993944484f]
[  2534.026] (EE) 14: /usr/libexec/Xorg (0x55993926c000+0x4c3dd) [0x5599392b83dd]
[  2534.027] (EE) 15: /lib64/libc.so.6 (0x7fac2bc00000+0x3feb0) [0x7fac2bc3feb0]
[  2534.027] (EE) 16: /lib64/libc.so.6 (__libc_start_main+0x80) [0x7fac2bc3ff60]
[  2534.027] (EE) 17: /usr/libexec/Xorg (_start+0x25) [0x5599392b8ea5]
[  2534.027] (EE)
[  2534.027] (EE)
Fatal server error:
[  2534.027] (EE) Caught signal 6 (Aborted). Server aborting
[  2534.027] (EE)
[  2534.027] (EE)
Please consult the The X.Org Foundation support
         at http://wiki.x.org
 for help.
[  2534.027] (EE) Please also check the log file at ".xorgxrdp.10.log" for additional information.
[  2534.027] (EE)
[  2534.027] rdpmouseControl: what 4
[  2534.027] rdpkeybControl: what 4
[  2534.027] rdpLeaveVT:
@metalefty
Copy link
Member Author

@matt335672 @Nexarian @jsorg71 Can you reproduce this and have a look at this?

@matt335672
Copy link
Member

matt335672 commented Jul 10, 2024

I've just tried to reproduce and I've not managed it so far.

I've got a breakpoint set on rdpClientCon.c:500 where the log message is generated. It's not firing.

xrdp commit : d6fce2f17355ffb3c783ffb2a31c418b3e930b26
xorgxrdp commit : 0e13e5d (tag: v0.10.1)

In startwm.sh I've just got exec xterm -geometry 80x40

I'm definitely negotiating GFX. Client is Windows 10.

I'm probably missing something. Can you share a suitable .ini file?

@metalefty
Copy link
Member Author

.ini has quite simple diff.

An important point, "Use all monitors" and multimon is required. (I can reproduce with a single monitor though).

This is likely to require at least a 4K screen size.

diff --git a/sesman/sesman.ini.in b/sesman/sesman.ini.in
index 685af28b..7606f5d0 100644
--- a/sesman/sesman.ini.in
+++ b/sesman/sesman.ini.in
@@ -140,7 +140,7 @@ EnableSyslog=true
 ; CentOS 8              :  param=/usr/libexec/Xorg
 ; FreeBSD (from 2022Q4) :  param=/usr/local/libexec/Xorg
 ;
-param=Xorg
+param=/usr/libexec/Xorg
 ; Leave the rest parameters as-is unless you understand what will happen.
 param=-config
 param=xrdp/xorg.conf

Here is another Xorg log.

[185866.422] rdpClientConSendMsg: overrun error len, 36725 stream size 32868, client count 2
[185869.812] (EE)
[185869.812] (EE) Backtrace:
[185869.812] (EE) 0: /usr/libexec/Xorg (xorg_backtrace+0x7d) [0x55c58cf3de7d]
[185869.812] (EE) 1: /usr/libexec/Xorg (0x55c58cd83000+0x1bec3d) [0x55c58cf41c3d]
[185869.812] (EE) 2: /lib64/libpthread.so.0 (0x7f2cd6ba8000+0x12d10) [0x7f2cd6bbad10]
[185869.812] (EE) 3: /usr/libexec/Xorg (dixLookupProperty+0x50) [0x55c58cdfed10]
[185869.812] (EE) 4: /usr/libexec/Xorg (0x55c58cd83000+0x7c89e) [0x55c58cdff89e]
[185869.812] (EE) 5: /usr/libexec/Xorg (0x55c58cd83000+0x5c5de) [0x55c58cddf5de]
[185869.812] (EE) 6: /usr/libexec/Xorg (0x55c58cd83000+0x60686) [0x55c58cde3686]
[185869.812] (EE) 7: /lib64/libc.so.6 (__libc_start_main+0xe5) [0x7f2cd681dd85]
[185869.812] (EE) 8: /usr/libexec/Xorg (_start+0x2e) [0x55c58cdcd02e]
[185869.812] (EE)
[185869.812] (EE) Segmentation fault at address 0x0
[185869.812] (EE)
Fatal server error:
[185869.812] (EE) Caught signal 11 (Segmentation fault). Server aborting

@matt335672
Copy link
Member

matt335672 commented Jul 11, 2024

Thanks. I'll have another go. I can probably set up a VM with two 4K monitors.

@matt335672
Copy link
Member

I'm still not able to reproduce this with a Windows 10 client and two 4K monitors.

In some ways I'm surprised this is happening, as with xterm as a WM and running xterm as an application, everything is in black-and-white. I'd have though that would compress really well.

Could you post your xrdp.log and full Xorg.log for a crash? I can see you're not running with two identical monitors but I can't see the exact geometry. That might have something to do with it. Also, with a full xrdp.log I can check I'm negotiating the same codec.

I've tried with and without --enable-glamor on xorgxrdp.

@metalefty
Copy link
Member Author

@matt335672 Ah, I made a serious typo. xterm at step 3 is actually xeyes. They are completely different software and, naturally, you couldn't reproduce. Could you try again? If you are still not able to reproduce, I will collect log and post.

@matt335672
Copy link
Member

Thanks for the update!

Following your new instructions I've reproduced this on a single 4K monitor. Stack trace is as follows:-

(gdb) where
#0  rdpClientConSendMsg (dev=0x56e76232f6a0, clientCon=0x56e76264e370) at rdpClientCon.c:500
#1  0x00007117547e16e9 in rdpClientConSendPending (dev=0x56e76232f6a0, clientCon=0x56e76264e370) at rdpClientCon.c:533
#2  0x00007117547e7951 in rdpClientConSendPaintRectShmFd (dev=0x56e76232f6a0, clientCon=0x56e76264e370, id=0x7ffd77ef4780, dirtyReg=0x56e762911ee0, copyRects=0x56e762946620, 
    numCopyRects=787) at rdpClientCon.c:2735
#3  0x00007117547e7f26 in rdpCapRect (clientCon=0x56e76264e370, cap_rect=0x7ffd77ef4770, id=0x7ffd77ef4780) at rdpClientCon.c:2857
#4  0x00007117547e8230 in rdpDeferredUpdateCallback (timer=0x56e76245be10, now=2554056, arg=0x56e76264e370) at rdpClientCon.c:2953
#5  0x000056e7610c7fc0 in ?? ()
#6  0x000056e7610c8248 in WaitForSomething ()
#7  0x000056e760f583e7 in ?? ()
#8  0x000056e760f5c6b4 in ?? ()
#9  0x0000711754e29d90 in __libc_start_call_main (main=main@entry=0x56e760f455d0, argc=argc@entry=11, argv=argv@entry=0x7ffd77ef4a18) at ../sysdeps/nptl/libc_start_call_main.h:58
#10 0x0000711754e29e40 in __libc_start_main_impl (main=0x56e760f455d0, argc=11, argv=0x7ffd77ef4a18, init=<optimised out>, fini=<optimised out>, rtld_fini=<optimised out>, 
    stack_end=0x7ffd77ef4a08) at ../csu/libc-start.c:392
#11 0x000056e760f45605 in _start ()

I've worked out where the 'overrun' message is coming from. We call rdpClientConPreCheck() before sending a frame to make sure there's space in the buffer. However, if the size we pass in is bigger than the maximum size of the buffer, we don't resize the buffer.

I've patched that as follows:-

--- a/module/rdpClientCon.c
+++ b/module/rdpClientCon.c
@@ -1748,7 +1748,8 @@ rdpClientConPreCheck(rdpPtr dev, rdpClientCon *clientCon, int in_size)
             rv = 1;
         }
         clientCon->count = 0;
-        init_stream(clientCon->out_s, 0);
+        // Make sure stream is now big enough for the requested size
+        init_stream(clientCon->out_s, in_size + 20);
         s_push_layer(clientCon->out_s, iso_hdr, 8);
     }
 

@jsorg71 - could you check my logic?

However, like you, I'm still seeing the X server hang up - I just don't get the message. So there seem to be two things going on here. I'll take another look Monday.

@jsorg71
Copy link
Contributor

jsorg71 commented Jul 14, 2024

rdpClientConPreCheck is to check if the new message can fit, if not, send what's there to make room.
This buffer is not meant to grow. Maybe we can change that but I'd rather not. I think it's 32KB.
I wonder what is causing the size to get big. I see there are 787 crects, that is ok. I can't see how many drects, maybe that is it. I'll try to reproduce it.

@matt335672
Copy link
Member

Understood - thanks.

@matt335672
Copy link
Member

Here are some variables from a recent fail:-

Variable Value
start_frame_bytes 16
num_rects_d 2499
num_rects_c 1612
wiretosurface2_bytes 32921
end_frame_bytes 12
size 32961

There's a couple of observations here:-

  1. num_rects_d + num_rects_c (4111) exceeds RDP_MAX_TILES (4096). Is this significant?
  2. If num_rects_d + num_rects_c was 4096, size would still be 32841 which would exceed the buffer available.

RDP_MAX_TILES was last changed by fcc9ab2

@matt335672
Copy link
Member

I've made the following logging patch so I can see what the drects look like:-

--- a/module/rdpClientCon.c
+++ b/module/rdpClientCon.c
@@ -2684,6 +2684,12 @@ rdpClientConSendPaintRectShmFd(rdpPtr dev, rdpClientCon *clientCon,
         size += end_frame_bytes;        /* end frame message */
         size += 4;                      /* message 62 data_bytes */
 
+        if (size > (clientCon->out_s->size - 20))
+        {
+            // Way too big - pre-check will fail
+            LLOGLN(0, ("Size check prob: requested=%d max=%d", size, (clientCon->out_s->size - 20)));
+        }
+
         rdpClientConPreCheck(dev, clientCon, size);
         s = clientCon->out_s;
         out_uint16_le(s, 62);
@@ -2840,6 +2846,16 @@ rdpCapRect(rdpClientCon *clientCon, BoxPtr cap_rect, struct image_data *id)
     cap_dirty_save = rdpRegionCreate(NullBox, 0);
     rdpRegionCopy(cap_dirty_save, cap_dirty);
     num_rects = REGION_NUM_RECTS(cap_dirty);
+    if (num_rects > 2000)
+    {
+        BoxPtr rects_d = REGION_RECTS(cap_dirty);
+        int i;
+        LLOGLN(0, ("Dirty count = %d", num_rects));
+        for (i = 0; i < num_rects; ++i)
+        {
+            LLOGLN(0, ("dirty[%d] = (%d, %d, %d, %d)", i, rects_d[i].x1, rects_d[i].y1, rects_d[i].x2, rects_d[i].y2)); 
+        }
+    }
     if (num_rects > 0)
     {
         rects = 0;

Log file attached
xorgxrdp.12.log

@matt335672
Copy link
Member

I can now see the problem, but I don't really understand the detail of this code. @jsorg71 - this really needs your expertise.

When xeyes is run, it's drawing a large number of small damage rectangles around the circular edge of the eyes. With a large monitor, these are considerable. When the copy rectangles are added, we get an overflow of the buffer. xeyes is a great test tool.

I can get xeyes working with this patch. It's NOT production quality:-

--- a/module/rdpClientCon.c
+++ b/module/rdpClientCon.c
@@ -2684,6 +2684,12 @@ rdpClientConSendPaintRectShmFd(rdpPtr dev, rdpClientCon *clientCon,
         size += end_frame_bytes;        /* end frame message */
         size += 4;                      /* message 62 data_bytes */
 
+        if (size > (clientCon->out_s->size - 20))
+        {
+            // Way too big - pre-check will fail
+            LLOGLN(0, ("Size check prob: requested=%d max=%d", size, (clientCon->out_s->size - 20)));
+        }
+
         rdpClientConPreCheck(dev, clientCon, size);
         s = clientCon->out_s;
         out_uint16_le(s, 62);
@@ -2836,10 +2842,17 @@ rdpCapRect(rdpClientCon *clientCon, BoxPtr cap_rect, struct image_data *id)
     LLOGLN(10, ("rdpCapRect: cap_rect x1 %d y1 %d x2 %d y2 %d",
                cap_rect->x1, cap_rect->y1, cap_rect->x2, cap_rect->y2));
     rdpRegionIntersect(cap_dirty, cap_dirty, clientCon->dirtyRegion);
+    num_rects = REGION_NUM_RECTS(cap_dirty);
+    if (num_rects > 2000) // What should this be??
+    {
+        // Collapse down to the bounding rectangle
+        rdpRegionUnionRect(cap_dirty, rdpRegionExtents(cap_dirty));
+        num_rects = REGION_NUM_RECTS(cap_dirty);
+    }
+
     /* make a copy of cap_dirty because it may get altered */
     cap_dirty_save = rdpRegionCreate(NullBox, 0);
     rdpRegionCopy(cap_dirty_save, cap_dirty);
-    num_rects = REGION_NUM_RECTS(cap_dirty);
     if (num_rects > 0)
     {
         rects = 0;

This also stops the X server crashing.

@jsorg71
Copy link
Contributor

jsorg71 commented Jul 16, 2024

We can't do much with crects but drect, yea, we can use rdpRegionExtents to simplify it. It's probably easier on the system to encode more pixels than have this complex region. There is a define for this, MAX_CAPTURE_RECTS that we lost with GFX(my fault) that should be put back.
There is also another change I want to make in rdpDeferredUpdateCallback where it deals with clientCon->dev->monitorCount < 1 and uses MAX_CAPTURE_PIXELS with bands. Since the RFX encoder can handle chunking out large updates, we don't need that. In fact, this code path won't work with H264.
I can create a PR.

@jsorg71
Copy link
Contributor

jsorg71 commented Jul 16, 2024

I create #319 for this

@jsorg71
Copy link
Contributor

jsorg71 commented Jul 16, 2024

It looks like I removed the use of MAX_CAPTURE_RECTS in 83e434e

@metalefty
Copy link
Member Author

Reopen until fix in v0.10.

@metalefty metalefty reopened this Jul 17, 2024
@matt335672 matt335672 linked a pull request Jul 17, 2024 that will close this issue
@matt335672 matt335672 changed the title shmem related buffer overrun? Complex dirty region causes overflow of xrdp comms buffer Jul 17, 2024
@metalefty
Copy link
Member Author

Merged into v0.10 branch, I will release v0.10.2 in July or mid-August.

@Mogaba
Copy link

Mogaba commented Oct 25, 2024

I encountered the exact same error with v0.10.1 but in a different scenario: when I grab a window and start to drag it around fast, eventually everything freezes and the same errors appear in the log file.
I cannot upgrade to v0.10.2 right now, can someone please confirm that this it the same issue?

I use XFCE if it matters.

Update: I did some testing and it seems it happens only if you disable effects AND enable "Hide content of windows - When moving".

@matt335672
Copy link
Member

@Mogaba - look for the buffer overrun error mentioned in the first post in this thread.

@Mogaba
Copy link

Mogaba commented Oct 25, 2024

@Mogaba - look for the buffer overrun error mentioned in the first post in this thread.

Yep, it's the same error:

rdpClientConSendMsg: overrun error len, 44365 stream size 32868, client count 2

Does it mean it's probably be fixed by upgrading to v0.10.2?

@matt335672
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants