Force fallback to TCP #241

Closed
dulive opened this issue Nov 18, 2021 · 11 comments

dulive commented Nov 18, 2021

Hello,

Is it possible to force an arbitrary MPTCP connection to fall back to TCP from userspace, for example using Netlink?

@VenkateswaranJ

Hello,

An MPTCP connection will fall back to TCP in the following scenarios:

  1. One of the hosts (server or client) is not MPTCP-capable.
  2. A buggy middlebox in your network removes or modifies the MPTCP option in the TCP header.
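For illustration, here is a minimal sketch of how a host can tell whether a segment carries MP_CAPABLE, the option (TCP option kind 30, subtype 0 per RFC 8684) used to negotiate MPTCP during the handshake. It assumes the raw bytes of the TCP options area as input, and the helper names are hypothetical. If the SYN/ACK comes back without MP_CAPABLE, for either of the reasons above, the connection continues as regular TCP.

```python
from typing import Iterator, Tuple

TCPOPT_EOL = 0
TCPOPT_NOP = 1
TCPOPT_MPTCP = 30      # IANA TCP option kind for MPTCP
MPTCP_SUB_CAPABLE = 0  # MP_CAPABLE subtype (RFC 8684)


def iter_tcp_options(opts: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (kind, data) pairs from a raw TCP options area."""
    i = 0
    while i < len(opts):
        kind = opts[i]
        if kind == TCPOPT_EOL:
            return
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(opts):
            return  # truncated option: stop parsing
        length = opts[i + 1]
        if length < 2 or i + length > len(opts):
            return  # malformed length: stop parsing
        yield kind, opts[i + 2:i + length]
        i += length


def has_mp_capable(opts: bytes) -> bool:
    """True if the options carry MP_CAPABLE (kind 30, subtype 0)."""
    for kind, data in iter_tcp_options(opts):
        # The subtype is the upper 4 bits of the first byte after kind/length.
        if kind == TCPOPT_MPTCP and data and data[0] >> 4 == MPTCP_SUB_CAPABLE:
            return True
    return False
```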

Currently, you can't force an active MPTCP connection to fall back to TCP via a Netlink command.

dulive (author) commented Nov 18, 2021

> Currently, you can't force an active MPTCP connection to TCP fallback via Netlink command.

Are there any plans to support it in the future, or is it out of scope?

VenkateswaranJ commented Nov 18, 2021

The current Netlink command implementation only focuses on managing MPTCP connections (creating subflows, destroying subflows, etc.).
You can have a look here: #186

But can you share your use case? Why do you want to force an active MPTCP connection to fall back to TCP in the middle of a communication?

dulive (author) commented Nov 18, 2021

Actually, it wouldn't be in the middle of a communication but rather when an MPTCP connection is created or established.

Basically, I want to force a fallback to TCP if an MPTCP connection was created/established on an interface where I don't want to use MPTCP (for example, because the interface is connected to an unprotected/untrusted network), or if I don't trust the other side to use MPTCP.

VenkateswaranJ commented Nov 18, 2021

Sorry for misunderstanding your question. If you know the untrusted interface beforehand, I would suggest creating a TCP socket instead of an MPTCP socket to initiate connections over that interface. If you only learn about the untrusted interface after the connection is established, there is no straightforward way to force that connection to fall back (at least as far as I know).
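A minimal sketch of that suggestion in Python: Python 3.10+ exposes `socket.IPPROTO_MPTCP` on Linux, and 262 is the Linux value used as a fallback constant here. The `UNTRUSTED_INTERFACES` set and `open_stream_socket()` are hypothetical, and the sketch only chooses the protocol; binding or routing the socket through the interface is left out.

```python
import socket

# Python 3.10+ exposes IPPROTO_MPTCP on Linux; 262 is the Linux value.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)

# Hypothetical policy: interfaces on which MPTCP must not be used.
UNTRUSTED_INTERFACES = {"wlan0"}


def open_stream_socket(interface: str) -> socket.socket:
    """Return a plain TCP socket for untrusted interfaces, MPTCP otherwise."""
    if interface not in UNTRUSTED_INTERFACES:
        try:
            return socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
        except OSError:
            # The kernel refused an MPTCP socket: use plain TCP instead.
            pass
    return socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
```

Because the choice is made before `connect()`, no fallback is ever needed on the untrusted interface; the MPTCP option simply never appears in the SYN.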

dulive (author) commented Nov 18, 2021

This is not for an application that uses MPTCP, but for a daemon/mptcpd plugin that would control the use of MPTCP on the system (for example, if any application tried to use MPTCP through an untrusted interface, the plugin would force it to TCP).

> there is no straightforward way to force that connection to fallback(at least as far as I know).

I see, but would it work if the packets were intercepted with Netfilter and altered to be plain TCP?

matttbe (member) commented Nov 18, 2021

Hello,

I don't think we should do policy management via the Path Manager.

With Netfilter, you can drop TCP options. Would that not be OK for you?
I don't know if we can do something "similar" with other "security" features implemented in the kernel.
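What "dropping" an option means on the wire can be sketched as follows. According to the iptables-extensions documentation, TCPOPTSTRIP replaces the stripped options with NOPs rather than removing them, so the TCP header length does not change. The function below (hypothetical name) mimics that for kind 30 on the raw options area only; it ignores the TCP checksum, which a real middlebox, including the kernel, also has to update.

```python
TCPOPT_EOL = 0
TCPOPT_NOP = 1
TCPOPT_MPTCP = 30


def strip_tcp_option(opts: bytes, kind_to_strip: int = TCPOPT_MPTCP) -> bytes:
    """Return a copy of the TCP options area with every option of
    kind_to_strip overwritten by NOP bytes (same total length)."""
    out = bytearray(opts)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(out):
            break  # truncated option: leave the rest untouched
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break  # malformed length: leave the rest untouched
        if kind == kind_to_strip:
            out[i:i + length] = bytes([TCPOPT_NOP]) * length
        i += length
    return bytes(out)
```

Keeping the length constant avoids having to rewrite the data offset and the IP total length, which is presumably why the target pads with NOPs instead of shrinking the header.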

@VenkateswaranJ

> This is not for a software/application to use MPTCP but for a daemon/mptcpd plugin that would control the use of MPTCP on the system (for example, if an application on the system tried to use MPTCP through an untrusted interface it would force it to TCP).

Hmm, the mptcpd daemon can't do that. It can only manage active MPTCP connections (e.g. create/destroy subflows, announce/remove IPs, etc.). It's not possible to make any changes at the MPTCP connection level.

> I see, but could it work if the packets were intercepted with Netfilter and altered to be simply TCP?

Yes, you can do that. I tried it some time ago for testing: the connection falls back to an infinite mapping.

Also check this issue: multipath-tcp/mptcp#420

#!/usr/bin/env python3
# Test middlebox: mangles the MPTCP option (TCP option kind 30) of every
# forwarded IPv4 packet. Needs root, scapy and the netfilterqueue package.
import os

from netfilterqueue import NetfilterQueue
from scapy.all import IP, TCP

MPTCP_KIND = 30
QUEUE_NUM = 1
NFQUEUE_RULE = "FORWARD -j NFQUEUE --queue-num %d" % QUEUE_NUM

# Hand every forwarded packet to this script and enable IPv4 forwarding.
os.system("iptables -A " + NFQUEUE_RULE)
os.system("sysctl net.ipv4.ip_forward=1")


def has_mptcp_option(options):
    # Assumes Scapy decodes the MPTCP option as a (30, raw_bytes) tuple.
    return any(opt[0] == MPTCP_KIND for opt in options)


def modify(packet):
    try:
        ip_pkt = IP(packet.get_payload())
        tcp = ip_pkt.getlayer(TCP)
        if tcp is not None and has_mptcp_option(tcp.options):
            # Replace the MPTCP option(s) with a single empty kind-30 option,
            # wherever they appear in the option list.
            opts = [opt for opt in tcp.options if opt[0] != MPTCP_KIND]
            opts.append((MPTCP_KIND, b""))
            tcp.options = opts
            # Let Scapy recompute total length, data offset and checksums.
            del ip_pkt.len
            del ip_pkt.chksum
            del tcp.dataofs
            del tcp.chksum
            print(tcp.options)
            packet.set_payload(bytes(ip_pkt))
    except Exception as e:
        print(e)  # on any error, forward the packet unmodified
    packet.accept()


nfqueue = NetfilterQueue()
nfqueue.bind(QUEUE_NUM, modify)
try:
    print("[*] waiting for data")
    nfqueue.run()
except KeyboardInterrupt:
    pass
finally:
    nfqueue.unbind()
    # Delete only the rule added above instead of flushing every chain.
    print("Removing the NFQUEUE rule.")
    os.system("iptables -D " + NFQUEUE_RULE)

matttbe self-assigned this Nov 18, 2021
dulive (author) commented Nov 18, 2021

> With Netfilter, you can drop TCP options. Would it not be OK for you?

As I have never used Netfilter, I was looking for something easier and more direct, since there are already events for MPTCP connection creation and establishment. At the same time, queuing all IPv4 and IPv6 packets just to modify those that are MPTCP and from/to an untrusted/unprotected interface seems a bit too much for a simple mptcpd plugin, for example. But it is always something I can try.

> Hmm, Mptcpd daemon can't do that. It can only manage active mptcp connection (like create/destroy subflow, announce/remove IP etc., ). It's not possible to make any changes at the mptcp connection level.

Well, a plugin can extend mptcpd to do so; it can have its own Netlink socket and everything, but that is not related to this issue/question.

> Also check this issue: multipath-tcp/mptcp#420

Will do so.

matttbe (member) commented Nov 19, 2021

>> With Netfilter, you can drop TCP options. Would it not be OK for you?
>
> As I never used Netfilter I was looking if there was already something more easy and direct since there are events for MPTCP communication creation and establishment. At the same time, queuing all IPv4 and IPv6 packets just to modify those which are MPTCP and from/to an untrusted/unprotected interface seems a little too much for a simple Mptcpd plugin for example. But is always something that I can try.

With iptables, for example, you can match MPTCP traffic with -p tcp --tcp-option 30. Then you can use the TCPOPTSTRIP target to strip the MPTCP option. Additionally, you can restrict this to one interface and to SYN packets only. For example:

iptables -w -t mangle -A OUTPUT -o eth0 -p tcp --tcp-option 30 --tcp-flags SYN,ACK SYN -j TCPOPTSTRIP --strip-options 30

(same with ip6tables, similar with nftables or other tools on top of them)
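The single rule above can be generated for several interfaces and for both IPv4 and IPv6. This is a minimal sketch with a hypothetical helper name; it reuses the matches from the rule above but places it in the mangle table since, as far as I know, the kernel only accepts the TCPOPTSTRIP target there.

```python
from typing import Iterable, List

MPTCP_KIND = "30"


def mptcp_strip_rules(interfaces: Iterable[str]) -> List[List[str]]:
    """Build iptables/ip6tables argv lists that strip the MPTCP option
    from outgoing SYNs (SYN set, ACK clear) on each given interface."""
    ifaces = list(interfaces)  # iterated once per tool, so materialize it
    rules = []
    for tool in ("iptables", "ip6tables"):
        for iface in ifaces:
            rules.append([
                tool, "-w", "-t", "mangle", "-A", "OUTPUT",
                "-o", iface,
                "-p", "tcp", "--tcp-option", MPTCP_KIND,
                "--tcp-flags", "SYN,ACK", "SYN",
                "-j", "TCPOPTSTRIP", "--strip-options", MPTCP_KIND,
            ])
    return rules
```

Each returned list could then be run as root, e.g. with `subprocess.run(rule, check=True)`. Stripping the option from the SYN alone is enough: the peer never sees MP_CAPABLE, so the connection is plain TCP from the start.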

So does it make sense not to do policy actions from the PM? :)

dulive (author) commented Nov 19, 2021

Interesting, I didn't know about that; it's probably the best solution.
Now I just have to learn how to do that in C.

> So does it make sense not to do some policy actions from the PM? :)

With what you said, it makes a lot of sense.

Thank you both.

dulive closed this as completed Nov 19, 2021
jenkins-tessares pushed a commit that referenced this issue May 26, 2022
We got the following issue:
EXT4-fs (loop0): mounted filesystem without journal. Opts: ,errors=continue
ext4_get_first_dir_block: bh->b_data=0xffff88810bee6000 len=34478
ext4_get_first_dir_block: *parent_de=0xffff88810beee6ae bh->b_data=0xffff88810bee6000
ext4_rename_dir_prepare: [1] parent_de=0xffff88810beee6ae
==================================================================
BUG: KASAN: use-after-free in ext4_rename_dir_prepare+0x152/0x220
Read of size 4 at addr ffff88810beee6ae by task rep/1895

CPU: 13 PID: 1895 Comm: rep Not tainted 5.10.0+ #241
Call Trace:
 dump_stack+0xbe/0xf9
 print_address_description.constprop.0+0x1e/0x220
 kasan_report.cold+0x37/0x7f
 ext4_rename_dir_prepare+0x152/0x220
 ext4_rename+0xf44/0x1ad0
 ext4_rename2+0x11c/0x170
 vfs_rename+0xa84/0x1440
 do_renameat2+0x683/0x8f0
 __x64_sys_renameat+0x53/0x60
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f45a6fc41c9
RSP: 002b:00007ffc5a470218 EFLAGS: 00000246 ORIG_RAX: 0000000000000108
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f45a6fc41c9
RDX: 0000000000000005 RSI: 0000000020000180 RDI: 0000000000000005
RBP: 00007ffc5a470240 R08: 00007ffc5a470160 R09: 0000000020000080
R10: 00000000200001c0 R11: 0000000000000246 R12: 0000000000400bb0
R13: 00007ffc5a470320 R14: 0000000000000000 R15: 0000000000000000

The buggy address belongs to the page:
page:00000000440015ce refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x10beee
flags: 0x200000000000000()
raw: 0200000000000000 ffffea00043ff4c8 ffffea0004325608 0000000000000000
raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88810beee580: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff88810beee600: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff88810beee680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                  ^
 ffff88810beee700: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff88810beee780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
Disabling lock debugging due to kernel taint
ext4_rename_dir_prepare: [2] parent_de->inode=3537895424
ext4_rename_dir_prepare: [3] dir=0xffff888124170140
ext4_rename_dir_prepare: [4] ino=2
ext4_rename_dir_prepare: ent->dir->i_ino=2 parent=-757071872

The reason is that the first directory entry has a 'rec_len' of 34478, which leads to an illegal
parent entry. Currently, we do not check the directory entry after reading the directory block
in 'ext4_get_first_dir_block'.
To solve this issue, check the directory entry in 'ext4_get_first_dir_block'.

[ Trigger an ext4_error() instead of just warning if the directory is
  missing a '.' or '..' entry.   Also make sure we return an error code
  if the file system is corrupted.  -TYT ]

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220414025223.4113128-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
jenkins-tessares pushed a commit that referenced this issue Jul 20, 2023
Add a big batch of test coverage to assert all aspects of the tcx opts
attach, detach and query API:

  # ./vmtest.sh -- ./test_progs -t tc_opts
  [...]
  #238     tc_opts_after:OK
  #239     tc_opts_append:OK
  #240     tc_opts_basic:OK
  #241     tc_opts_before:OK
  #242     tc_opts_chain_classic:OK
  #243     tc_opts_demixed:OK
  #244     tc_opts_detach:OK
  #245     tc_opts_detach_after:OK
  #246     tc_opts_detach_before:OK
  #247     tc_opts_dev_cleanup:OK
  #248     tc_opts_invalid:OK
  #249     tc_opts_mixed:OK
  #250     tc_opts_prepend:OK
  #251     tc_opts_replace:OK
  #252     tc_opts_revision:OK
  Summary: 15/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20230719140858.13224-8-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
matttbe pushed a commit that referenced this issue Aug 17, 2023
Add several new tcx test cases to improve test coverage. This also includes
a few new tests with ingress instead of clsact qdisc, to cover the fix from
commit dc644b5 ("tcx: Fix splat in ingress_destroy upon tcx_entry_free").

  # ./test_progs -t tc
  [...]
  #234     tc_links_after:OK
  #235     tc_links_append:OK
  #236     tc_links_basic:OK
  #237     tc_links_before:OK
  #238     tc_links_chain_classic:OK
  #239     tc_links_chain_mixed:OK
  #240     tc_links_dev_cleanup:OK
  #241     tc_links_dev_mixed:OK
  #242     tc_links_ingress:OK
  #243     tc_links_invalid:OK
  #244     tc_links_prepend:OK
  #245     tc_links_replace:OK
  #246     tc_links_revision:OK
  #247     tc_opts_after:OK
  #248     tc_opts_append:OK
  #249     tc_opts_basic:OK
  #250     tc_opts_before:OK
  #251     tc_opts_chain_classic:OK
  #252     tc_opts_chain_mixed:OK
  #253     tc_opts_delete_empty:OK
  #254     tc_opts_demixed:OK
  #255     tc_opts_detach:OK
  #256     tc_opts_detach_after:OK
  #257     tc_opts_detach_before:OK
  #258     tc_opts_dev_cleanup:OK
  #259     tc_opts_invalid:OK
  #260     tc_opts_mixed:OK
  #261     tc_opts_prepend:OK
  #262     tc_opts_replace:OK
  #263     tc_opts_revision:OK
  [...]
  Summary: 44/38 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/8699efc284b75ccdc51ddf7062fa2370330dc6c0.1692029283.git.daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>