net/tunnel: wait until all sk_user_data reader finish before releasing the sock

There is a race condition in vxlan: when a vxlan device is deleted
while packets are being received, the sock may be released after
vxlan_sock vs has already been fetched from sk_user_data. Later, in
vxlan_ecn_decapsulate() or vxlan_get_sk_family(), we then get a
NULL pointer dereference, e.g.

   #0 [ffffa25ec6978a38] machine_kexec at ffffffff8c669757
   #1 [ffffa25ec6978a90] __crash_kexec at ffffffff8c7c0a4d
   #2 [ffffa25ec6978b58] crash_kexec at ffffffff8c7c1c48
   #3 [ffffa25ec6978b60] oops_end at ffffffff8c627f2b
   #4 [ffffa25ec6978b80] page_fault_oops at ffffffff8c678fcb
   #5 [ffffa25ec6978bd8] exc_page_fault at ffffffff8d109542
   #6 [ffffa25ec6978c00] asm_exc_page_fault at ffffffff8d200b62
      [exception RIP: vxlan_ecn_decapsulate+0x3b]
      RIP: ffffffffc1014e7b  RSP: ffffa25ec6978cb0  RFLAGS: 00010246
      RAX: 0000000000000008  RBX: ffff8aa000888000  RCX: 0000000000000000
      RDX: 000000000000000e  RSI: ffff8a9fc7ab803e  RDI: ffff8a9fd1168700
      RBP: ffff8a9fc7ab803e   R8: 0000000000700000   R9: 00000000000010ae
      R10: ffff8a9fcb748980  R11: 0000000000000000  R12: ffff8a9fd1168700
      R13: ffff8aa000888000  R14: 00000000002a0000  R15: 00000000000010ae
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   #7 [ffffa25ec6978ce8] vxlan_rcv at ffffffffc10189cd [vxlan]
   #8 [ffffa25ec6978d90] udp_queue_rcv_one_skb at ffffffff8cfb6507
   #9 [ffffa25ec6978dc0] udp_unicast_rcv_skb at ffffffff8cfb6e45
  #10 [ffffa25ec6978dc8] __udp4_lib_rcv at ffffffff8cfb8807
  #11 [ffffa25ec6978e20] ip_protocol_deliver_rcu at ffffffff8cf76951
  #12 [ffffa25ec6978e48] ip_local_deliver at ffffffff8cf76bde
  #13 [ffffa25ec6978ea0] __netif_receive_skb_one_core at ffffffff8cecde9b
  #14 [ffffa25ec6978ec8] process_backlog at ffffffff8cece139
  #15 [ffffa25ec6978f00] __napi_poll at ffffffff8ceced1a
  #16 [ffffa25ec6978f28] net_rx_action at ffffffff8cecf1f3
  #17 [ffffa25ec6978fa0] __softirqentry_text_start at ffffffff8d4000ca
  #18 [ffffa25ec6978ff0] do_softirq at ffffffff8c6fbdc3

Reproducer: https://github.com/Mellanox/ovs-tests/blob/master/test-ovs-vxlan-remove-tunnel-during-traffic.sh

Fix this by waiting for all sk_user_data readers to finish before
releasing the sock.

Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Fixes: 6a93cc9 ("udp-tunnel: Add a few more UDP tunnel APIs")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
liuhangbin authored and davem330 committed Dec 12, 2022
1 parent 2f623aa commit 3cf7203
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions net/ipv4/udp_tunnel_core.c
@@ -176,6 +176,7 @@ EXPORT_SYMBOL_GPL(udp_tunnel_xmit_skb);
 void udp_tunnel_sock_release(struct socket *sock)
 {
 	rcu_assign_sk_user_data(sock->sk, NULL);
+	synchronize_rcu();
 	kernel_sock_shutdown(sock, SHUT_RDWR);
 	sock_release(sock);
 }
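
For readers less familiar with the pattern, the ordering this change
relies on (unpublish the pointer, wait for readers that may still hold
it, only then release) can be sketched in userspace. This is a rough
model and not kernel code: it assumes C11 atomics and pthreads, and
tunnel_state, user_data, active_readers, tunnel_release() and run_demo()
are hypothetical names invented for illustration. The reader counter is
a crude stand-in for an RCU read-side critical section, and the spin
loop is a crude stand-in for synchronize_rcu(), which in the kernel only
waits for readers that started before the call.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-socket tunnel state (e.g. vxlan_sock). */
struct tunnel_state {
	atomic_int released;	/* set once the owner has torn the state down */
};

static struct tunnel_state state;
static _Atomic(struct tunnel_state *) user_data;	/* models sk_user_data */
static atomic_int active_readers;	/* readers inside a read-side section */
static atomic_int violations;		/* readers that used released state */
static atomic_int reader_started;

/* Models the receive path: fetch the pointer, then keep using it. */
static void *reader(void *arg)
{
	(void)arg;
	for (int i = 0; i < 100000; i++) {
		atomic_fetch_add(&active_readers, 1);	/* ~ rcu_read_lock() */
		struct tunnel_state *vs = atomic_load(&user_data);
		atomic_store(&reader_started, 1);
		/* In the kernel, touching released state is the reported crash. */
		if (vs && atomic_load(&vs->released))
			atomic_fetch_add(&violations, 1);
		atomic_fetch_sub(&active_readers, 1);	/* ~ rcu_read_unlock() */
	}
	return NULL;
}

/* Models udp_tunnel_sock_release() with the added wait. */
static void tunnel_release(void)
{
	/* ~ rcu_assign_sk_user_data(sk, NULL): new readers now see NULL. */
	atomic_store(&user_data, NULL);
	/* ~ synchronize_rcu(): wait until no reader can still hold the pointer. */
	while (atomic_load(&active_readers) != 0)
		;
	/* ~ sock_release(): only now is it safe to tear the state down. */
	atomic_store(&state.released, 1);
}

/* Runs teardown concurrently with traffic; returns the violation count. */
int run_demo(void)
{
	pthread_t t;

	atomic_store(&state.released, 0);
	atomic_store(&violations, 0);
	atomic_store(&reader_started, 0);
	atomic_store(&user_data, &state);
	if (pthread_create(&t, NULL, reader, NULL) != 0)
		return -1;
	while (!atomic_load(&reader_started))
		;	/* make sure teardown overlaps with the reader */
	tunnel_release();
	pthread_join(t, NULL);
	return atomic_load(&violations);
}
```

Calling run_demo() from a small main() should report zero violations.
Dropping the wait loop in tunnel_release() reintroduces the window in
which a reader that fetched the pointer before it was cleared goes on to
use state that has already been released, which corresponds to the race
described in the commit message.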