
dss_ssn_specified_client packetdrill test fails (timeout) #98

Closed
matttbe opened this issue Oct 6, 2020 · 4 comments · Fixed by multipath-tcp/packetdrill#20

matttbe commented Oct 6, 2020

This test is available here: https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/dss/dss_ssn_specified_client.pkt

It does not fail on the current net-next branch, but it does fail with the current export branch.

Here is the output of a git bisect:

47d4beeecd06f283840cf2def08f0952a6c689c7 is the first bad commit
commit 47d4beeecd06f283840cf2def08f0952a6c689c7
Date:   Mon Oct 5 05:09:46 2020 +0000

    mptcp: refactor shutdown and close

    We must not close the subflows before all the MPTCP level
    data, comprising the DATA_FIN has been acked at the MPTCP
    level, otherwise we could be unable to retransmit as needed.

    __mptcp_wr_shutdown() shutdown is responsible to check for the
    correct status and close all subflows. Is called by the output
    path after spooling any data and at shutdown/close time.

    In a similar way, __mptcp_destroy_sock() is responsible to clean-up
    the MPTCP level status, and is called when the msk transition
    to TCP_CLOSE.

    The protocol level close() does not force anymore the TCP_CLOSE
    status, but orphan the msk socket and all the subflows.
    Orphaned msk sockets are forciby closed after a timeout or
    when all MPTCP-level data is acked.

    There is a caveat about keeping the orphaned subflows around:
    the TCP stack can asynchronusly call tcp_cleanup_ulp() on them via
    tcp_close(). To prevent accessing freed memory on later MPTCP
    level operations, the msk acquires a reference to each subflow
    socket and prevent subflow_ulp_release() from releasing the
    subflow context before __mptcp_destroy_sock().

    The additional subflow references are released by __mptcp_done()
    and the async ULP release is detected checking ULP ops. If such
    field has been already cleared by the ULP release path, the
    dangling context is freed directly by __mptcp_done().

 net/mptcp/options.c  |   2 +-
 net/mptcp/protocol.c | 269 ++++++++++++++++++++++++++++++++++++---------------
 net/mptcp/protocol.h |  10 +-
 net/mptcp/subflow.c  |  21 +++-
 4 files changed, 216 insertions(+), 86 deletions(-)
bisect run success
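The subflow-context lifetime rule described in the bisected commit message (the msk holds an extra reference, subflow_ulp_release() must not free the context while the msk still needs it, and __mptcp_done() frees a context whose ULP ops were already cleared) can be illustrated with a small user-space model. This is a hypothetical sketch: `struct subflow_ctx_model`, `ulp_release_model()` and `msk_done_model()` are made-up names, not kernel APIs, and the two booleans stand in for the real socket reference and the ULP ops pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a subflow context (not the kernel's struct). */
struct subflow_ctx_model {
	bool msk_holds_ref;   /* msk took an extra reference at close() time */
	bool ulp_ops_cleared; /* the TCP stack already ran the ULP release */
};

static int ctx_frees; /* number of contexts freed, for illustration */

static void ctx_free(struct subflow_ctx_model *ctx)
{
	free(ctx);
	ctx_frees++;
}

/* Models subflow_ulp_release(), which tcp_close() may run asynchronously. */
static void ulp_release_model(struct subflow_ctx_model *ctx)
{
	ctx->ulp_ops_cleared = true;
	if (ctx->msk_holds_ref)
		return; /* the msk still uses the context: leave it for the msk */
	ctx_free(ctx);
}

/* Models __mptcp_done() dropping the msk's extra reference. */
static void msk_done_model(struct subflow_ctx_model *ctx)
{
	ctx->msk_holds_ref = false;
	if (ctx->ulp_ops_cleared)
		ctx_free(ctx); /* ULP release already ran: free the dangling context */
	/* otherwise the later ULP release frees it */
}
```

Whichever of the two paths runs last frees the context, so it is released exactly once and never while the msk can still dereference it.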
matttbe added the bug label Oct 6, 2020

matttbe commented Oct 6, 2020

I am going to apply the patches recently shared by @pabeni and retest.

matttbe commented Oct 6, 2020

Note that I still have the issue with the latest export branch, updated 10 minutes ago.

matttbe commented Oct 6, 2020

I had the opportunity to look into this a bit more: the FIN is not sent, and the state is still ESTABLISHED:

# tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes

# packetdrill -vvv dss_ssn_specified_client.pkt &
[3] 207
socket syscall: 1601999857.704910
setsockopt syscall: 1601999857.707578
fcntl syscall: 1601999857.710214
fcntl syscall: 1601999857.712603
connect syscall: 1601999857.715185
outbound sniffed packet:  0.010372 S 971032171:971032171(0) win 65535 <mss 1460,sackOK,TS val 2492150980 ecr 0,nop,wscale 8,mp_capable v1 flags: |H| >
inbound injected packet:  0.020924 S. 0:0(0) ack 971032172 win 65535 <mss 1460,sackOK,TS val 700 ecr 2492150980,nop,wscale 8,mp_capable v1 flags: |H| sender_key: 2>
outbound sniffed packet:  0.029541 . 971032172:971032172(0) ack 1 win 256 <nop,nop,TS val 2492150999 ecr 700,mp_capable v1 flags: |H| sender_key: 2310140576115051019 receiver_key: 2>
15:57:37.715175 tun0  Out IP 192.168.234.140.48588 > 192.0.2.1.8080: Flags [S], seq 971032171, win 65535, options [mss 1460,sackOK,TS val 2492150980 ecr 0,nop,wscale 8,mptcp capable v1], length 0
15:57:37.734267 tun0  In  IP 192.0.2.1.8080 > 192.168.234.140.48588: Flags [S.], seq 0, ack 971032172, win 65535, options [mss 1460,sackOK,TS val 700 ecr 2492150980,nop,wscale 8,mptcp capable v1 {0x200000000000000}], length 0
15:57:37.734344 tun0  Out IP 192.168.234.140.48588 > 192.0.2.1.8080: Flags [.], ack 1, win 256, options [nop,nop,TS val 2492150999 ecr 700,mptcp capable v1 {0xbca8d449d440f20,0x200000000000000}], length 0
getsockopt syscall: 1601999857.944294
fcntl syscall: 1601999858.153181
inbound injected packet:  0.551480 P. 1:1001(1000) ack 971032172 win 450 <nop,nop,dss dack8 16733486346948834518 dsn8 13263177308786788773 ssn 1 dll 1000 no_checksum flags: MmAa>
outbound sniffed packet:  0.561527 . 971032172:971032172(0) ack 1001 win 264 <nop,nop,TS val 2492151531 ecr 700,dss dack8 13263177308786789773 flags: Aa>
15:57:38.266261 tun0  In  IP 192.0.2.1.8080 > 192.168.234.140.48588: Flags [P.], seq 1:1001, ack 1, win 450, options [nop,nop,mptcp dss ack 16733486346948834518 seq 13263177308786788773 subseq 1 len 1000], length 1000: HTTP
15:57:38.266330 tun0  Out IP 192.168.234.140.48588 > 192.0.2.1.8080: Flags [.], ack 1001, win 264, options [nop,nop,TS val 2492151531 ecr 700,mptcp dss ack 13263177308786789773], length 0
read syscall: 1601999858.575194
write syscall: 1601999858.577826
outbound sniffed packet:  0.873012 P. 971032172:971032272(100) ack 1001 win 264 <nop,nop,TS val 2492151843 ecr 700,dss dack8 13263177308786789773 dsn8 16733486346948834518 ssn 1 dll 100 no_checksum flags: MmAa,nop,nop>
inbound injected packet:  0.887556 . 1001:1001(0) ack 971032272 win 450 <dss dack8 16733486346948834618 flags: Aa>
15:57:38.577815 tun0  Out IP 192.168.234.140.48588 > 192.0.2.1.8080: Flags [P.], seq 1:101, ack 1001, win 264, options [nop,nop,TS val 2492151843 ecr 700,mptcp dss ack 13263177308786789773 seq 16733486346948834518 subseq 1 len 100,nop,nop], length 100: HTTP
15:57:38.598978 tun0  In  IP 192.0.2.1.8080 > 192.168.234.140.48588: Flags [.], ack 101, win 450, options [mptcp dss ack 16733486346948834618], length 0
close syscall: 1601999858.999334
outbound sniffed packet:  1.294509 . 971032272:971032272(0) ack 1001 win 264 <nop,nop,TS val 2492151864 ecr 700,dss dack8 13263177308786789773 dsn8 16733486346948834618 ssn 0 dll 1 no_checksum flags: MmAaF,nop,nop>
15:57:38.999312 tun0  Out IP 192.168.234.140.48588 > 192.0.2.1.8080: Flags [.], ack 1001, win 264, options [nop,nop,TS val 2492151864 ecr 700,mptcp dss fin ack 13263177308786789773 seq 16733486346948834618 subseq 0 len 1,nop,nop], length 0

# netstat -tnp
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 192.168.234.140:48588   192.0.2.1:8080          ESTABLISHED -                   

# cat dss_ssn_specified_client.pkt 
// connect() function, connection initiated by the kernel
--tolerance_usecs=100000
`../common/defaults.sh`


0.0   socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
+0.0  setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0.0  fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
+0.0  fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0

// Establish connection and verify that there was no error.

+0.0  connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0.0   > S 0:0(0) <mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 8,mpcapable v1 flags[flag_h] nokey>
+0.0   < S. 0:0(0) ack 1 win 65535 <mss 1460,sackOK,TS val 700 ecr 100,nop,wscale 8,mpcapable v1 flags[flag_h] key[skey=2] >
+0.0   > . 1:1(0) ack 1 <nop, nop, TS val 100 ecr 700,mpcapable v1 flags[flag_h] key[ckey,skey]>
+0.200 getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
+0.205 fcntl(3, F_SETFL, O_RDWR) = 0   // set back to blocking
+0.1   < P. 1:1001(1000) ack 1 win 450  <nop, nop, dss dack8=1 dsn8=1 ssn=1 dll=1000 nocs>
+0.0   > . 1:1(0) ack 1001 <nop, nop, TS val 100 ecr 700,dss dack8=1001 ssn=1 dll=0 nocs>
+0.3  read(3, ..., 1000) = 1000
+0.0 write(3,..., 100) = 100
+0.0   > P. 1:101(100) ack 1001 <nop, nop, TS val 100 ecr 700, dss dack8=1001 dsn8=1 ssn=1 dll=100 nocs, nop, nop>
+0.0   < .  1001:1001(0) ack 101 win 450 <dss dack8=101 nocs>
+0.4 close(3) = 0
+0.0   > . 101:101(0) ack 1001 <nop, nop,TS val 100 ecr 700,dss dack8=1001 dsn8=101 ssn=0 dll=1 nocs fin, nop, nop>
+0.0   > F. 101:101(0) ack 1001 <nop, nop,TS val 100 ecr 700, dss dack8=1001 dsn8=101 ssn=0 dll=1 nocs fin, nop, nop>
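The client side of the script follows the usual non-blocking connect pattern: set O_NONBLOCK with F_GETFL/F_SETFL, call connect() (which returns EINPROGRESS), read the outcome from SO_ERROR, then switch back to blocking mode before read()/write()/close(). A minimal C sketch of those steps, assuming Linux, where IPPROTO_MPTCP is 262 (defined here only as a fallback for older headers); `set_nonblocking()`, `pending_socket_error()` and `mptcp_connect_start()` are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262 /* Linux value; older libc headers may lack it */
#endif

/* Set or clear O_NONBLOCK, keeping the other file status flags, like the
 * script's F_GETFL / F_SETFL calls. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd, bool on)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	flags = on ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
	return fcntl(fd, F_SETFL, flags);
}

/* Read the deferred result of a non-blocking connect() via SO_ERROR.
 * Returns 0 if connected, otherwise an errno value. Real code would first
 * wait for writability with poll(); the script just checks 200 ms later. */
static int pending_socket_error(int fd)
{
	int err = 0;
	socklen_t len = sizeof(err);

	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return errno;
	return err;
}

/* Open an MPTCP socket and start a non-blocking connect() to addr.
 * Returns the fd (the handshake may still be in progress) or -1. */
int mptcp_connect_start(const struct sockaddr_in *addr)
{
	int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);

	if (fd < 0)
		return -1; /* e.g. a kernel built without MPTCP */
	if (set_nonblocking(fd, true) < 0 ||
	    (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) < 0 &&
	     errno != EINPROGRESS)) {
		close(fd);
		return -1;
	}
	return fd;
}
```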

dcaratti self-assigned this Oct 9, 2020
dcaratti commented:

> I got the opportunity to look a bit more and the FIN is not sent, the status is still ESTABLISHED:

As per a recent (verbal) discussion, this is probably intentional: commit 47d4bee requires an outbound DATA_FIN to be explicitly acknowledged at the MPTCP level, i.e. by a DATA_ACK that covers it. Something like:

--- a/gtests/net/mptcp/dss/dss_ssn_specified_client.pkt
+++ b/gtests/net/mptcp/dss/dss_ssn_specified_client.pkt
@@ -24,4 +24,5 @@
 +0.0   < .  1001:1001(0) ack 101 win 450 <dss dack8=101 nocs>
 +0.4 close(3) = 0
 +0.0   > . 101:101(0) ack 1001 <nop, nop,TS val 100 ecr 700,dss dack8=1001 dsn8=101 ssn=0 dll=1 nocs fin, nop, nop>
-+0.0   > F. 101:101(0) ack 1001 <nop, nop,TS val 100 ecr 700, dss dack8=1001 dsn8=101 ssn=0 dll=1 nocs fin, nop, nop>
++0.0   < . 1001:1001(0) ack 101 win 450 <dss dack8=102 nocs>
++0.0   > F. 101:101(0) ack 1001 <nop, nop,TS val 100 ecr 700, dss dack8=1001 nocs>
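The numbers in the trace and in the proposed script change follow the same DSS accounting. Each DATA_ACK in the trace is the peer's DSN plus the data-level length (13263177308786788773 + 1000 = 13263177308786789773, and 16733486346948834518 + 100 = 16733486346948834618). The DATA_FIN is sent at relative dsn8=101 with dll=1 and no payload, so it takes one octet of data sequence space: the peer's dack8=101 only covers the 100 data bytes, while dack8=102 also covers the DATA_FIN. A small C sketch of that arithmetic; all helper names are hypothetical, and the absolute-to-relative mapping is an assumption inferred from the trace, not taken from packetdrill's source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* DATA_ACK expected once a DSS mapping of `dll` bytes starting at `dsn` has
 * been received in full; uint64_t arithmetic wraps modulo 2^64 like the
 * 64-bit data sequence space carried in dsn8/dack8. */
static uint64_t expected_data_ack(uint64_t dsn, uint16_t dll)
{
	return dsn + dll;
}

/* Hypothetical helper, inferred from comparing the trace with the script:
 * map an absolute DSN to the 1-based relative value used in the .pkt file,
 * given the absolute DSN of the first data byte in that direction. */
static uint64_t relative_dsn(uint64_t abs_dsn, uint64_t first_dsn)
{
	return abs_dsn - first_dsn + 1;
}

/* Wrap-safe "a comes after b" in the 64-bit data sequence space, in the
 * style of the kernel's before()/after() TCP sequence helpers. */
static bool dsn_after(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) > 0;
}

/* A DATA_FIN sent at fin_dsn with dll=1 and no payload occupies one octet
 * of data sequence space, so only a DATA_ACK beyond fin_dsn (that is,
 * at least fin_dsn + 1) acknowledges it. */
static bool data_fin_acked(uint64_t fin_dsn, uint64_t data_ack)
{
	return dsn_after(data_ack, fin_dsn);
}
```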

The problem with the current export branch is that changing the script as in the above example is not sufficient to make Linux transmit an outbound FIN/ACK packet. After the kernel under test receives the DACK8 equal to 102, sk_state transitions from TCP_FIN_WAIT_1 to TCP_FIN_WAIT_2, so after the call to mptcp_check_data_fin(), the following block:


+       /* if the msk data is completely acked, or the socket timedout,
+        * there is no point in keeping around an orphaned sk
+        */
+       if (sock_flag(sk, SOCK_DEAD) &&
+           (mptcp_check_close_timeout(sk) ||
+           (state != sk->sk_state && sk->sk_state == TCP_CLOSE))) {
+               inet_sk_state_store(sk, TCP_CLOSE);
+               __mptcp_destroy_sock(sk, 0);
+               goto unlock;
+       }
+

is not executed: sk->sk_state is now TCP_FIN_WAIT_2, not TCP_CLOSE, and the close timeout has not expired yet.
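To make the failure mode concrete, the quoted condition can be reduced to a pure function of the socket flags and states. This is a simplified model, not kernel code: `enum model_state` and `should_destroy_orphan()` are hypothetical, and the mptcp_check_close_timeout() result is passed in as a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the TCP states involved (not kernel values). */
enum model_state { M_ESTABLISHED, M_FIN_WAIT1, M_FIN_WAIT2, M_CLOSE };

/* Mirrors the quoted condition: an orphaned (SOCK_DEAD) msk is destroyed
 * on close timeout, or when its state changed during this pass and the new
 * state is TCP_CLOSE. */
static bool should_destroy_orphan(bool sock_dead, bool close_timeout,
				  enum model_state old_state,
				  enum model_state new_state)
{
	return sock_dead &&
	       (close_timeout ||
		(old_state != new_state && new_state == M_CLOSE));
}
```

The FIN_WAIT_1 to FIN_WAIT_2 transition with no timeout evaluates to false, which matches the observed behaviour: nothing is destroyed, so no TCP FIN goes out until the close timeout fires.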

matttbe pushed a commit that referenced this issue Oct 17, 2020
when the data-fin is acked on all subflows, the socket goes in
FIN_WAIT_2 state. Call __mptcp_destroy_sock() to transmit a TCP FIN.

Closes: #98
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
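The fix described in this commit message amounts to also destroying an orphaned msk once the DATA_FIN has been acked and the socket sits in FIN_WAIT_2, since __mptcp_destroy_sock() is what transmits the TCP FIN. A sketch of one way to extend the condition quoted in the earlier comment, using hypothetical names; whether the actual patch also requires a state change during the same pass is an assumption here, so the commit itself remains the reference for the exact logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the TCP states involved (not kernel values). */
enum model_state { M_ESTABLISHED, M_FIN_WAIT1, M_FIN_WAIT2, M_CLOSE };

/* Assumed shape of the fix: besides TCP_CLOSE, a transition into
 * FIN_WAIT_2 (DATA_FIN acked) also lets an orphaned msk be destroyed,
 * which is what sends the subflow TCP FIN. */
static bool should_destroy_orphan_fixed(bool sock_dead, bool close_timeout,
					enum model_state old_state,
					enum model_state new_state)
{
	if (!sock_dead)
		return false;
	if (close_timeout)
		return true;
	return old_state != new_state &&
	       (new_state == M_CLOSE || new_state == M_FIN_WAIT2);
}
```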
dcaratti added a commit to dcaratti/packetdrill that referenced this issue Oct 22, 2020
otherwise packetdrill will need to wait 60 seconds before getting the
TCP FIN packet.

Closes: multipath-tcp/mptcp_net-next#98
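The 60-second wait mentioned in this commit message presumably corresponds to the orphan close-timeout path (mptcp_check_close_timeout() in the condition quoted earlier). A wrap-safe elapsed-time check of that kind can be sketched as follows; the 60 s value is taken from the commit message, while the 32-bit millisecond clock and `close_timeout_expired()` are hypothetical stand-ins for the kernel's jiffies-based check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed timeout, taken from the packetdrill commit message (60 seconds). */
#define MODEL_CLOSE_TIMEOUT_MS 60000u

/* True once at least MODEL_CLOSE_TIMEOUT_MS have elapsed since close_ts_ms
 * on a free-running 32-bit millisecond clock. Unsigned subtraction stays
 * correct across a counter wrap, like jiffies arithmetic. */
static bool close_timeout_expired(uint32_t now_ms, uint32_t close_ts_ms)
{
	uint32_t elapsed = now_ms - close_ts_ms;

	return elapsed >= MODEL_CLOSE_TIMEOUT_MS;
}
```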
matttbe pushed a commit that referenced this issue Oct 24, 2020
Lockdep complains at boot:

=============================
[ BUG: Invalid wait context ]
5.7.0-05093-g46d91ecd597b #98 Not tainted
-----------------------------
swapper/1 is trying to lock:
0000000060931b98 (&desc[i].request_mutex){+.+.}-{3:3}, at: __setup_irq+0x11d/0x623
other info that might help us debug this:
context-{4:4}
1 lock held by swapper/1:
 #0: 000000006074fed8 (sigio_spinlock){+.+.}-{2:2}, at: sigio_lock+0x1a/0x1c
stack backtrace:
CPU: 0 PID: 1 Comm: swapper Not tainted 5.7.0-05093-g46d91ecd597b #98
Stack:
 7fa4fab0 6028dfd1 0000002a 6008bea5
 7fa50700 7fa50040 7fa4fac0 6028e016
 7fa4fb50 6007f6da 60959c18 00000000
Call Trace:
 [<60023a0e>] show_stack+0x13b/0x155
 [<6028e016>] dump_stack+0x2a/0x2c
 [<6007f6da>] __lock_acquire+0x515/0x15f2
 [<6007eb50>] lock_acquire+0x245/0x273
 [<6050d9f1>] __mutex_lock+0xbd/0x325
 [<6050dc76>] mutex_lock_nested+0x1d/0x1f
 [<6008e27e>] __setup_irq+0x11d/0x623
 [<6008e8ed>] request_threaded_irq+0x169/0x1a6
 [<60021eb0>] um_request_irq+0x1ee/0x24b
 [<600234ee>] write_sigio_irq+0x3b/0x76
 [<600383ca>] sigio_broken+0x146/0x2e4
 [<60020bd8>] do_one_initcall+0xde/0x281

Because we hold sigio_spinlock and then get into requesting
an interrupt with a mutex.

Change the spinlock to a mutex to avoid that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
jenkins-tessares pushed a commit that referenced this issue May 28, 2021
[  612.157429] ==================================================================
[  612.158275] BUG: KASAN: use-after-free in process_one_work+0x90/0x9b0
[  612.158801] Read of size 8 at addr ffff88810a31ca60 by task kworker/2:9/2382

[  612.159611] CPU: 2 PID: 2382 Comm: kworker/2:9 Tainted: G
OE     5.13.0-rc2+ #98
[  612.159623] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.14.0-1.fc33 04/01/2014
[  612.159640] Workqueue:  0x0 (deferredclose)
[  612.159669] Call Trace:
[  612.159685]  dump_stack+0xbb/0x107
[  612.159711]  print_address_description.constprop.0+0x18/0x140
[  612.159733]  ? process_one_work+0x90/0x9b0
[  612.159743]  ? process_one_work+0x90/0x9b0
[  612.159754]  kasan_report.cold+0x7c/0xd8
[  612.159778]  ? lock_is_held_type+0x80/0x130
[  612.159789]  ? process_one_work+0x90/0x9b0
[  612.159812]  kasan_check_range+0x145/0x1a0
[  612.159834]  process_one_work+0x90/0x9b0
[  612.159877]  ? pwq_dec_nr_in_flight+0x110/0x110
[  612.159914]  ? spin_bug+0x90/0x90
[  612.159967]  worker_thread+0x3b6/0x6c0
[  612.160023]  ? process_one_work+0x9b0/0x9b0
[  612.160038]  kthread+0x1dc/0x200
[  612.160051]  ? kthread_create_worker_on_cpu+0xd0/0xd0
[  612.160092]  ret_from_fork+0x1f/0x30

[  612.160399] Allocated by task 2358:
[  612.160757]  kasan_save_stack+0x1b/0x40
[  612.160768]  __kasan_kmalloc+0x9b/0xd0
[  612.160778]  cifs_new_fileinfo+0xb0/0x960 [cifs]
[  612.161170]  cifs_open+0xadf/0xf20 [cifs]
[  612.161421]  do_dentry_open+0x2aa/0x6b0
[  612.161432]  path_openat+0xbd9/0xfa0
[  612.161441]  do_filp_open+0x11d/0x230
[  612.161450]  do_sys_openat2+0x115/0x240
[  612.161460]  __x64_sys_openat+0xce/0x140

When mod_delayed_work is called to modify the delay of pending work,
it might return false and queue a new work when pending work is
already scheduled or when try to grab pending work failed.

So, Increase the reference count when new work is scheduled to
avoid use-after-free.

Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
jenkins-tessares pushed a commit that referenced this issue Oct 15, 2021
Commit

  3c73b81 ("x86/entry, selftests: Further improve user entry sanity checks")

added a warning if AC is set when in the kernel.

Commit

  662a022 ("x86/entry: Fix AC assertion")

changed the warning to only fire if the CPU supports SMAP.

However, the warning can still trigger on a machine that supports SMAP
but where it's disabled in the kernel config and when running the
syscall_nt selftest, for example:

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 49 at irqentry_enter_from_user_mode
  CPU: 0 PID: 49 Comm: init Tainted: G                T 5.15.0-rc4+ #98 e6202628ee053b4f310759978284bd8bb0ce6905
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
  RIP: 0010:irqentry_enter_from_user_mode
  ...
  Call Trace:
   ? irqentry_enter
   ? exc_general_protection
   ? asm_exc_general_protection
   ? asm_exc_general_protectio

IS_ENABLED(CONFIG_X86_SMAP) could be added to the warning condition, but
even this would not be enough in case SMAP is disabled at boot time with
the "nosmap" parameter.

To be consistent with "nosmap" behaviour, clear X86_FEATURE_SMAP when
!CONFIG_X86_SMAP.

Found using entry-fuzz + satrandconfig.

 [ bp: Massage commit message. ]

Fixes: 3c73b81 ("x86/entry, selftests: Further improve user entry sanity checks")
Fixes: 662a022 ("x86/entry: Fix AC assertion")
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20211003223423.8666-1-vegard.nossum@oracle.com
jenkins-tessares pushed a commit that referenced this issue Jul 10, 2022
With arch_prepare_bpf_trampoline removed on x86:

  [...]
  #98/1    lsm_cgroup/functional:SKIP
  #98      lsm_cgroup:SKIP
  Summary: 1/0 PASSED, 1 SKIPPED, 0 FAILED

Fixes: dca85aa ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20220630224203.512815-1-sdf@google.com
matttbe pushed a commit that referenced this issue May 3, 2024
There are several places where either chan->lock or chan->vchan.lock was
not held. Add appropriate locking. This fixes lockdep warnings like

[   31.077578] ------------[ cut here ]------------
[   31.077831] WARNING: CPU: 2 PID: 40 at drivers/dma/xilinx/xilinx_dpdma.c:834 xilinx_dpdma_chan_queue_transfer+0x274/0x5e0
[   31.077953] Modules linked in:
[   31.078019] CPU: 2 PID: 40 Comm: kworker/u12:1 Not tainted 6.6.20+ #98
[   31.078102] Hardware name: xlnx,zynqmp (DT)
[   31.078169] Workqueue: events_unbound deferred_probe_work_func
[   31.078272] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   31.078377] pc : xilinx_dpdma_chan_queue_transfer+0x274/0x5e0
[   31.078473] lr : xilinx_dpdma_chan_queue_transfer+0x270/0x5e0
[   31.078550] sp : ffffffc083bb2e10
[   31.078590] x29: ffffffc083bb2e10 x28: 0000000000000000 x27: ffffff880165a168
[   31.078754] x26: ffffff880164e920 x25: ffffff880164eab8 x24: ffffff880164d480
[   31.078920] x23: ffffff880165a148 x22: ffffff880164e988 x21: 0000000000000000
[   31.079132] x20: ffffffc082aa3000 x19: ffffff880164e880 x18: 0000000000000000
[   31.079295] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[   31.079453] x14: 0000000000000000 x13: ffffff8802263dc0 x12: 0000000000000001
[   31.079613] x11: 0001ffc083bb2e34 x10: 0001ff880164e98f x9 : 0001ffc082aa3def
[   31.079824] x8 : 0001ffc082aa3dec x7 : 0000000000000000 x6 : 0000000000000516
[   31.079982] x5 : ffffffc7f8d43000 x4 : ffffff88003c9c40 x3 : ffffffffffffffff
[   31.080147] x2 : ffffffc7f8d43000 x1 : 00000000000000c0 x0 : 0000000000000000
[   31.080307] Call trace:
[   31.080340]  xilinx_dpdma_chan_queue_transfer+0x274/0x5e0
[   31.080518]  xilinx_dpdma_issue_pending+0x11c/0x120
[   31.080595]  zynqmp_disp_layer_update+0x180/0x3ac
[   31.080712]  zynqmp_dpsub_plane_atomic_update+0x11c/0x21c
[   31.080825]  drm_atomic_helper_commit_planes+0x20c/0x684
[   31.080951]  drm_atomic_helper_commit_tail+0x5c/0xb0
[   31.081139]  commit_tail+0x234/0x294
[   31.081246]  drm_atomic_helper_commit+0x1f8/0x210
[   31.081363]  drm_atomic_commit+0x100/0x140
[   31.081477]  drm_client_modeset_commit_atomic+0x318/0x384
[   31.081634]  drm_client_modeset_commit_locked+0x8c/0x24c
[   31.081725]  drm_client_modeset_commit+0x34/0x5c
[   31.081812]  __drm_fb_helper_restore_fbdev_mode_unlocked+0x104/0x168
[   31.081899]  drm_fb_helper_set_par+0x50/0x70
[   31.081971]  fbcon_init+0x538/0xc48
[   31.082047]  visual_init+0x16c/0x23c
[   31.082207]  do_bind_con_driver.isra.0+0x2d0/0x634
[   31.082320]  do_take_over_console+0x24c/0x33c
[   31.082429]  do_fbcon_takeover+0xbc/0x1b0
[   31.082503]  fbcon_fb_registered+0x2d0/0x34c
[   31.082663]  register_framebuffer+0x27c/0x38c
[   31.082767]  __drm_fb_helper_initial_config_and_unlock+0x5c0/0x91c
[   31.082939]  drm_fb_helper_initial_config+0x50/0x74
[   31.083012]  drm_fbdev_dma_client_hotplug+0xb8/0x108
[   31.083115]  drm_client_register+0xa0/0xf4
[   31.083195]  drm_fbdev_dma_setup+0xb0/0x1cc
[   31.083293]  zynqmp_dpsub_drm_init+0x45c/0x4e0
[   31.083431]  zynqmp_dpsub_probe+0x444/0x5e0
[   31.083616]  platform_probe+0x8c/0x13c
[   31.083713]  really_probe+0x258/0x59c
[   31.083793]  __driver_probe_device+0xc4/0x224
[   31.083878]  driver_probe_device+0x70/0x1c0
[   31.083961]  __device_attach_driver+0x108/0x1e0
[   31.084052]  bus_for_each_drv+0x9c/0x100
[   31.084125]  __device_attach+0x100/0x298
[   31.084207]  device_initial_probe+0x14/0x20
[   31.084292]  bus_probe_device+0xd8/0xdc
[   31.084368]  deferred_probe_work_func+0x11c/0x180
[   31.084451]  process_one_work+0x3ac/0x988
[   31.084643]  worker_thread+0x398/0x694
[   31.084752]  kthread+0x1bc/0x1c0
[   31.084848]  ret_from_fork+0x10/0x20
[   31.084932] irq event stamp: 64549
[   31.084970] hardirqs last  enabled at (64548): [<ffffffc081adf35c>] _raw_spin_unlock_irqrestore+0x80/0x90
[   31.085157] hardirqs last disabled at (64549): [<ffffffc081adf010>] _raw_spin_lock_irqsave+0xc0/0xdc
[   31.085277] softirqs last  enabled at (64503): [<ffffffc08001071c>] __do_softirq+0x47c/0x500
[   31.085390] softirqs last disabled at (64498): [<ffffffc080017134>] ____do_softirq+0x10/0x1c
[   31.085501] ---[ end trace 0000000000000000 ]---

Fixes: 7cbb0c6 ("dmaengine: xilinx: dpdma: Add the Xilinx DisplayPort DMA engine driver")
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Link: https://lore.kernel.org/r/20240308210034.3634938-2-sean.anderson@linux.dev
Signed-off-by: Vinod Koul <vkoul@kernel.org>
matttbe pushed a commit that referenced this issue Nov 15, 2024
Zijian Zhang says:

====================
Several fixes to test_sockmap and added push/pop logic for msg_verify_data
Before the fixes, some of the tests in test_sockmap are problematic,
resulting in pseudo-correct result.

1. txmsg_pass is not set in some tests, as a result, no eBPF program is
attached to the sockmap.
2. In SENDPAGE, a wrong iov_length in test_send_large may result in some
test skippings and failures.
3. The calculation of total_bytes in msg_loop_rx is wrong, which may cause
msg_loop_rx end early and skip some data tests.

Besides, for msg_verify_data, I added push/pop checking logic to function
msg_verify_data and added more tests for different cases.

After that, I found that there are some bugs in bpf_msg_push_data,
bpf_msg_pop_data and sk_msg_reset_curr, and fix them. I guess the reason
why they have not been exposed is that because of the above problems, they
will not be triggered.

With the fixes, we can pass the sockmap test with data integrity test now.
However, the fixes to test_sockmap expose more problems in sockhash test
with SENDPAGE and ktls with SENDPAGE.

v1 -> v2:
  - Rebased to the latest bpf-next net branch.

The problem I observed,
1. In sockhash test, a NULL pointer kernel BUG will be reported for nearly
every cork test. More inspections are needed for splice_to_socket.

BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 0 P4D 0
Oops: Oops: 0000 [#3] PREEMPT SMP PTI
CPU: 3 UID: 0 PID: 2122 Comm: test_sockmap 6.12.0-rc2.bm.1-amd64+ #98
Tainted: [D]=DIE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:splice_to_socket+0x34a/0x480
Call Trace:
 <TASK>
 ? __die_body+0x1e/0x60
 ? page_fault_oops+0x159/0x4d0
 ? exc_page_fault+0x7e/0x180
 ? asm_exc_page_fault+0x26/0x30
 ? splice_to_socket+0x34a/0x480
? __memcg_slab_post_alloc_hook+0x205/0x3c0
? alloc_pipe_info+0xd6/0x1f0
? __kmalloc_noprof+0x37f/0x3b0
direct_splice_actor+0x40/0x100
splice_direct_to_actor+0xfd/0x290
? __pfx_direct_splice_actor+0x10/0x10
do_splice_direct_actor+0x82/0xb0
? __pfx_direct_file_splice_eof+0x10/0x10
do_splice_direct+0x13/0x20
? __pfx_direct_splice_actor+0x10/0x10
do_sendfile+0x33c/0x3f0
__x64_sys_sendfile64+0xa7/0xc0
do_syscall_64+0x62/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
 </TASK>
Modules linked in:
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---

2. txmsg_pass are not set before, and some tests are skipped. Now after
the fixes, we have some failure cases now. More fixes are needed either
for the selftest or the ktls kernel code.

1/ 6 sockhash:ktls:txmsg test passthrough:OK
2/ 6 sockhash:ktls:txmsg test redirect:OK
3/ 1 sockhash:ktls:txmsg test redirect wait send mem:OK
4/ 6 sockhash:ktls:txmsg test drop:OK
5/ 6 sockhash:ktls:txmsg test ingress redirect:OK
6/ 7 sockhash:ktls:txmsg test skb:OK
7/12 sockhash:ktls:txmsg test apply:OK
8/12 sockhash:ktls:txmsg test cork:OK
9/ 3 sockhash:ktls:txmsg test hanging corks:OK
detected data corruption @Iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @Iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
10/11 sockhash:ktls:txmsg test push_data:FAIL
detected data corruption @Iov[0]:0 17 != 00, 00 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @Iov[0]:0 17 != 00, 00 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @Iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @Iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @Iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @Iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @Iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @Iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
11/17 sockhash:ktls:txmsg test pull-data:FAIL
recv failed(): Invalid argument
rx thread exited with err 1.
recv failed(): Invalid argument
rx thread exited with err 1.
recv failed(): Bad message
rx thread exited with err 1.
detected data corruption @Iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @Iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
12/ 9 sockhash:ktls:txmsg test pop-data:FAIL
recv failed(): Bad message
rx thread exited with err 1.
recv failed(): Bad message
rx thread exited with err 1.
13/ 6 sockhash:ktls:txmsg test push/pop data:FAIL
14/ 1 sockhash:ktls:txmsg test ingress parser:OK
15/ 0 sockhash:ktls:txmsg test ingress parser2:OK
Pass: 11 Fail: 17
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>