Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gnrc_ipv6: fix UDP issue #5926

Closed

Conversation

mattiantonini
Copy link
Contributor

Hi All,
in #5596 we discussed about the problem of propagation of a UDP packet when it passes through another 6LoWPAN node. I closed that PR and I open this new one with the updated code. It should work also for non-6LoWPAN networks.

@miri64 miri64 self-assigned this Oct 11, 2016
@miri64 miri64 added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: network Area: Networking GNRC labels Oct 11, 2016
@Yonezawa-T2
Copy link
Contributor

Yonezawa-T2 commented Oct 12, 2016

I don't see what's the problem and why this solves it. If there is a problem, I guess the packet is invalid, or corrupted in some other module. Can we get a packet dump from Wireshark or debug output? Alternatively, can we get runtime state of pkt structure with debugger? If we get a packet dump, we can modify tests/gnrc_ipv6_ext or tests/gnrc_sixlowpan to replay it without a board, and we can trace execution with a debugger.

Even if this does fix the problem, this code may not work for 6LoWPAN border router, that is both 6LoWPAN node and IPv6 node.

@@ -855,7 +855,9 @@ static void _receive(gnrc_pktsnip_t *pkt)
ipv6 = gnrc_pktbuf_mark(pkt, sizeof(ipv6_hdr_t), GNRC_NETTYPE_IPV6);

first_ext = pkt;
#ifndef MODULE_GNRC_SIXLOWPAN
pkt->type = GNRC_NETTYPE_UNDEF; /* snip is no longer IPv6 */
Copy link
Contributor

@Yonezawa-T2 Yonezawa-T2 Oct 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line makes sense only for 6LoWPAN packets. See #5596 (comment). So #ifndef MODULE_GNRC_SIXLOWPAN doesn't make sense.
If we doesn't set pkt->type = GNRC_NETTYPE_UNDEF when the 6LoWPAN module is enabled, other modules (e.g. gnrc_icmpv6, gnrc_udp) will be confused because a snip without IPv6 header is marked as a IPv6 snip. Recall #5596 (comment).

[edit]pkt->type is updated gnrc_ipv6_demux, so this does not affect gnrc_icmpv6 or gnrc_udp. [/edit]

@OlegHahm
Copy link
Member

@mattiantonini, could you describe the problem and a way to reproduce it?

@mattiantonini
Copy link
Contributor Author

mattiantonini commented Oct 12, 2016

@OlegHahm, I'm working with 2 samr21-xpro: on the first is running gnrc_border_router (I'll call it A) and on the other (I'll call it B) is running gnrc_networking. I've well configured my scenario infact I can ping both my nodes from linux shell. But, when I send a UDP packet to B (with nc) it is forwarded correctly on tap interface (I seen it on wireshark) but it arrives corrupted (wrong checksum) to B and it is dropped by UDP thread. I've enabled packet dump and the packet arrives with different packet lengths in ipv6 and udp headers (fixed to 8, it is the UDP header length) and the udp payload is removed.

@smlng
Copy link
Member

smlng commented Oct 12, 2016

Sounds like I experience something similar: I have multihop setup as well (using RPL) I can successfully ping any node in the network, even over multiple hops. But when I send a UDP message to same node, along the same path it does not get through.

But I have to take a closer look into this ...

@aabadie
Copy link
Contributor

aabadie commented Oct 13, 2016

I also have the same problem: you can reproduce it by following the RIOT border router IoT-LAB tutorial and use the 2016.07 release or latest master instead of the riot-upstream repository provided by iot-lab team and which is based on 2016.04.
I think gnrc UDP through gnrc BR is broken for quite some time now (see #5596 which was opened the 4th of july).

I would also recommend a non regression testing infrastructure for this.

@aabadie
Copy link
Contributor

aabadie commented Oct 13, 2016

I tested this PR on iot-lab : UDP issue fixed. I'm +1 to merge this one.

@@ -888,11 +890,19 @@ static void _receive(gnrc_pktsnip_t *pkt)
if (byteorder_ntohs(hdr->len) < pkt->size) {
gnrc_pktbuf_realloc_data(pkt, byteorder_ntohs(hdr->len));
}
#ifdef MODULE_GNRC_SIXLOWPAN
else if (byteorder_ntohs(hdr->len) >
(gnrc_pkt_len_upto(pkt, GNRC_NETTYPE_IPV6))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the node have both 6LoWPAN and IPv6 interface, this condition may not catch invalid packets.

Running debugger on tests/gnrc_ipv6_ext with 6LoWPAN enabled (i.e. adding USEMODULE += gnrc_sixlowpan_default to the Makefile) shows gnrc_pkt_len_upto(pkt, GNRC_NETTYPE_IPV6) == 82 while byteorder_ntohs(hdr->len) == 42 because the former includes IPv6 header. So, a malicious packet with byteorder_ntohs(hdr->len) == 82 will pass this validation even if its actual length is 42.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this is my biggest problem with this PR too. Please add a check if the interface is a 6LoWPAN interface + leave the check below in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(second only if GNRC_NETIF_NUM > 1)

@Yonezawa-T2
Copy link
Contributor

Yonezawa-T2 commented Oct 14, 2016

I found that iphc_nhc_udp_encode checks an UDP snip is marked as IPv6 snip to see whether the packet is forwarded from non-6LoWPAN IPv6 interface.

if (udp->type == GNRC_NETTYPE_IPV6) {

    if (udp->type == GNRC_NETTYPE_IPV6) {
        /* forwarded ipv6 packet */
        size_t diff = sizeof(udp_hdr_t) - nhc_len;
        for (size_t i = nhc_len; i < (udp->size - diff); i++) {
            udp_data[i] = udp_data[i + diff];
        }
        /* NOTE: gnrc_pktbuf_realloc_data overflow if (udp->size - diff) < 4 */
        gnrc_pktbuf_realloc_data(udp, (udp->size - diff));
    }
    else {
        /* shrink udp allocation to final size */
        gnrc_pktbuf_realloc_data(udp, nhc_len);
        DEBUG("6lo iphc nhc: set udp len to %d\n", (int) nhc_len);
    }

iphc_nhc_udp_encode is a function that overwrites UDP header snip with compressed one, and trim the snip to the compressed size.

If we do not do pkt->type = GNRC_NETTYPE_UNDEF in gnrc_ipv6.c, and the packet is forwareded form non-6LoWPAN IPv6 interface, the snip will be like the following:

pkt
↓
{ type: GNRC_NETTYPE_NETIF, ... }
↓ next == ipv6_hdr
{ type: GNRC_NETTYPE_IPV6, ... }
↓ next == udp
{ type: GNRC_NETTYPE_IPV6, size: sizeof(udp_hdr_t) + size_of_payload, ... }
/* ↑ actually, this snip contains UDP header and payload despite its type  */

If the packet is forwareded from 6LoWPAN interface, the packet will be like the following:

pkt
↓
{ type: GNRC_NETTYPE_NETIF, ... }
↓ next == ipv6_hdr
{ type: GNRC_NETTYPE_IPV6, ... }
↓ next == udp
{ type: GNRC_NETTYPE_UDP, size: sizeof(udp_hdr_t), ... }  /* UDP header */
↓ next
{ type: GNRC_NETTYPE_UNDEF, ... } /* UDP payload */

In former case, iphc_nhc_udp_encode copies the payload before clipping the snip and trim to the size including the compressed header and the payload. In latter case, iphc_nhc_udp_encode just clips the snip to the size of the compressed header because the payload is managed by a dedicated snip.

If we do pkt->type = GNRC_NETTYPE_UNDEF in gnrc_ipv6.c, iphc_nhc_udp_encode is confused and throws away the payload. This is the problem @mattiantonini and @aabadie met with.

But in the first place, why is this if-statement necessary? If the udp snip does not contain payload, udp->size == sizeof(udp_hdr_t) and the for-loop is no-op. So I propose unifying the two cases:

diff --git a/sys/net/gnrc/network_layer/sixlowpan/iphc/gnrc_sixlowpan_iphc.c b/sys/net/gnrc/network_layer/sixlowpan/iphc/gnrc_sixlowpan_iphc.c
index 5305fd3..09041c9 100644
--- a/sys/net/gnrc/network_layer/sixlowpan/iphc/gnrc_sixlowpan_iphc.c
+++ b/sys/net/gnrc/network_layer/sixlowpan/iphc/gnrc_sixlowpan_iphc.c
@@ -568,20 +568,12 @@ inline static size_t iphc_nhc_udp_encode(gnrc_pktsnip_t *udp, ipv6_hdr_t *ipv6_h
     /* Set UDP header ID (rfc6282#section-5). */
     ipv6_hdr->nh |= NHC_UDP_ID;

-    if (udp->type == GNRC_NETTYPE_IPV6) {
-        /* forwarded ipv6 packet */
-        size_t diff = sizeof(udp_hdr_t) - nhc_len;
-        for (size_t i = nhc_len; i < (udp->size - diff); i++) {
-            udp_data[i] = udp_data[i + diff];
-        }
-        /* NOTE: gnrc_pktbuf_realloc_data overflow if (udp->size - diff) < 4 */
-        gnrc_pktbuf_realloc_data(udp, (udp->size - diff));
-    }
-    else {
-        /* shrink udp allocation to final size */
-        gnrc_pktbuf_realloc_data(udp, nhc_len);
-        DEBUG("6lo iphc nhc: set udp len to %d\n", (int) nhc_len);
+    size_t diff = sizeof(udp_hdr_t) - nhc_len;
+    for (size_t i = nhc_len; i < (udp->size - diff); i++) {
+      udp_data[i] = udp_data[i + diff];
     }
+    /* NOTE: gnrc_pktbuf_realloc_data overflow if (udp->size - diff) < 4 */
+    gnrc_pktbuf_realloc_data(udp, (udp->size - diff));

     return nhc_len;
 }

The if-statement is introduced by @jfischer-phytec-iot (#4544 and #5232). @jfischer-phytec-iot, @gebart, and @miri64, can we unify the two cases?

@miri64
Copy link
Member

miri64 commented Oct 14, 2016

Wow, thanks for the great analysis like always @Yonezawa-T2! I have my hands full with sock, the release and prep for the hackathon at the moment, but I will see if I find some time to squeeze this in too.

@miri64 miri64 added this to the Release 2016.10 milestone Oct 14, 2016
@miri64
Copy link
Member

miri64 commented Oct 14, 2016

For now I added it only as a "Known issue", since I doubt we will merge it in the current state.

@smlng
Copy link
Member

smlng commented Oct 14, 2016

@miri64 I just tested the alternative solution proposed by @Yonezawa-T2 and it works. Moreover it helps to resolve my multihop UDP/CoAP issue, see #5926 here.

My setup is as follows: RPi <--> RIOT A <--> RIOT B , when I send a CoAP get from RPi to RIOT B the answer got stuck in RIOT A without the fix, with this fix it gets through. However, what bothers me is that the CoAP get request packets gains 1 byte in size when it passes from RPi through RIOT A and the response as well. To clarify (sniffed with wireshark on a second RPi):

  1. RPi sends GET /temperature to RIOT B via RIOT A, overall packet size 99B
  2. RIOT A receives CoAP request, forwards packet to RIOT A, overall packet size now 100B
  3. RIOT B sends CoAP response to RPi via RIOT A, overall packet size 93B
  4. RIOT A receives CoAP request, forwards packet to RPi, overall packet size now 94B

But wireshark also Report BAD FCS on all those packets with 1B extra ...

@miri64
Copy link
Member

miri64 commented Oct 14, 2016

@smlng Can maybe put your sniff data somewhere so I can analyse this?

@smlng
Copy link
Member

smlng commented Oct 14, 2016

@miri64 sure! See here - the link is valid until 22.10.16.

@miri64
Copy link
Member

miri64 commented Oct 14, 2016

I'm looking at multihop_coap_with_udpfix.pcapng right now. Which one of those is RIOT A, which one is RIOT B and which is the RPi per the scenario you described above (because the sizes don't match and I don't see any problems with the packets themselves)

@miri64
Copy link
Member

miri64 commented Oct 14, 2016

Btw, that the 6LoWPAN frame changes after one hop is to be expected: since we can't/can use the link-layer address for IID compression we need to put it inline.

@smlng
Copy link
Member

smlng commented Oct 14, 2016

@miri64 ah gees, totally forgot to name which is which 😄 here you go:

  • RPi: fe80::1ac0:ffee:c0ff:ee01, fd16:abcd:ef01:3::1 and fd16:abcd:ef01:3:1ac0:ffee:c0ff:ee01
  • RIOT A: fe80::f2e2:4e7e:8874:3016 and fd16:abcd:ef01:3:f2e2:4e7e:8874:3016
  • RIOT B: fe80::f2e2:4e53:887a:3016 and fd16:abcd:ef01:3:f2e2:4e53:887a:3016

And the communication was:

  1. 2x RPi -> RIOT A GET /temperature
  2. 2x RPi -> RIOT B GET /temperature

@miri64
Copy link
Member

miri64 commented Oct 15, 2016

I cleaned up the dump, have a look the additional byte is the hop limit (its now 63 and thus not compressible anymore). So everything alright up until this point.

However, the FCS is wrong, as you pointed out. I guess that's because for some reason the CoAP payload get's shredded:

CON

1st hop:

0000   40 01 ad f2 bb 74 65 6d 70 65 72 61 74 75 72 65  @....temperature

2nd hop:

0000   40 01 ad f2 bb 74 65 6d 70 65 72 ff 27 f0 72 ff  @....temper.'.r.

(75f0 and 65ff)

ACK

1st hop:

0000   60 45 ad f2 c2 00 00 ff 32 37 2e 31 37           `E......27.17

2nd hop:

0000   60 45 ad f2 c2 00 00 ff 27 f0 2e ff 27           `E......'...'

(37f0 and 31ff)

So the replacement goes to the same values but at different positions (even in the overall 6LoWPAN frame)

@miri64
Copy link
Member

miri64 commented Oct 15, 2016

oh before the f0 shift there is also a shift to 27 :-/

@miri64
Copy link
Member

miri64 commented Oct 31, 2016

Postponed due to feature freeze

@miri64 miri64 modified the milestones: Release 2017.01, Release 2016.10 Oct 31, 2016
@OlegHahm
Copy link
Member

OlegHahm commented Nov 2, 2016

But this isn't a feature, is it?

@miri64
Copy link
Member

miri64 commented Nov 2, 2016

Right, sorry. But I think this PR still needs a lot of work ;-).

@miri64 miri64 modified the milestones: Release 2016.10, Release 2017.01 Nov 2, 2016
@aabadie
Copy link
Contributor

aabadie commented Nov 2, 2016

That would be great if a solution to the broken border router (using ethos/slip) could be found for this release (just putting a bit of extra pressure ;) ). We need this for IoT-LAB !

@miri64
Copy link
Member

miri64 commented Nov 2, 2016

Yes, but this is not it in the current state since it would break other stuff ;-).

@kYc0o
Copy link
Contributor

kYc0o commented Nov 6, 2016

I think this patch would be sufficient for now, and then we can provide a patch for the condition @Yonezawa-T2 describes, if it's the only problem. @miri64 what other stuff you think will be broken with this PR?

I'd like to have this into the release because it's somehow "essential" for IoT-Lab, the aim is to base the tutorials for this platform on this release.

@kYc0o
Copy link
Contributor

kYc0o commented Nov 6, 2016

Another option can be to apply the patch proposed by @Yonezawa-T2 and then fix the FCS problem in another PR... I don't know, I'm just thinking on a temporal fixing to make the thing work and then address the specific problems in next PRs...

@miri64
Copy link
Member

miri64 commented Nov 7, 2016

@kYc0o have you read the thread? This PR breaks more then it fixes! The problem is IPHC, not IPv6. So a fix should go in there. Merging a half-baked PR figuratively 5 min before release sounds like a very bad idea to me. We can however always backport a real fix afterwards

@miri64
Copy link
Member

miri64 commented Nov 7, 2016

I added it as a known issue. If someone can provide a non-breaking fix for the release: great! But I don't think patching a bug by creating another is a good solution.

@aabadie
Copy link
Contributor

aabadie commented Nov 7, 2016

Maybe someone knownledgeable on this network related things, and with maintainer rights could push in this PR directly ? (this is a recently added feature in GitHub)

@miri64
Copy link
Member

miri64 commented Nov 7, 2016

It might also be a good idea to formulate a test for the release specifications for this for next release, so this is definitely fixed in 2017.01 ;-).

@kYc0o
Copy link
Contributor

kYc0o commented Nov 7, 2016

I agree with you, what was in my mind is that this PR fixes a expected functionality and I don't really get why it works for some setups and not for others. As @smlng pointed out the alternative proposed patch works for his setup, then I understand that the frames get modified and the FCS gets wrong (sorry I didn't catch the IPHC stuff here :( ) thus I thought we could find a solution for this afterwards... But in overall of course I agree with you on not merge patches which breaks other things (though it happened with the PR which broke this multi-hop through a BR functionality).

@miri64
Copy link
Member

miri64 commented Nov 7, 2016

(though it happened with the PR which broke this multi-hop through a BR functionality).

This is why I aim now at more at caution than eagerness to fix that.

@mattiantonini
Copy link
Contributor Author

Hi all,
I have just tested the solution proposed by @Yonezawa-T2 in a single hop scenario (BR and udp server) and it works (I'm busy for other things). But, I do not understand if the Yonezawa-T2's solution breaks the multi-hop or not... if not, could it be the solution?

@miri64
Copy link
Member

miri64 commented Nov 7, 2016

Maybe someone knownledgeable on this network related things, and with maintainer rights could push in this PR directly ? (this is a recently added feature in GitHub)

If (and only if) I find the time between testing and release note writing I can do that, but I would be happy if someone else looks into this (because e.g. #5729 is also on my plate and I did not find time for that either yet).

@miri64
Copy link
Member

miri64 commented Nov 7, 2016

Okay now we have a failing test most likely due to this issue in the Release Specs, now we are talking so now I we need to resolve this issue fixed anyways for the release.

@aabadie
Copy link
Contributor

aabadie commented Nov 7, 2016

Okay now we have a failing test most likely due to this issue in the Release Specs, now we are talking so we need to resolve this issue fixed anyways for the release.

Thanks a lot @miri64 :)

@miri64
Copy link
Member

miri64 commented Nov 7, 2016

Nope, was just wrong config on my part ^^"

@miri64
Copy link
Member

miri64 commented Nov 7, 2016

But will look into it regardless for backporting.

@miri64
Copy link
Member

miri64 commented Nov 7, 2016

(we always can make a 2016.10.1 ;-))

@miri64
Copy link
Member

miri64 commented Nov 7, 2016

Okay, now I definitely hit it with the interop tests.... Seems like we need to postpone the release because of this issue :-/

@miri64
Copy link
Member

miri64 commented Nov 7, 2016

I now opened a PR with @Yonezawa-T2's proposal. Let's see if UDP multihop still works with it Intra-net (not tested by myself, but put into question by @smlng's comments) and over the BR (already tested and working).

@miri64
Copy link
Member

miri64 commented Nov 9, 2016

Alternative provided in #6064 was merged.

@miri64 miri64 closed this Nov 9, 2016
@miri64 miri64 removed this from the Release 2016.10 milestone Nov 9, 2016
@mattiantonini mattiantonini deleted the gnrc_ipv6/pr/udp-fix branch December 6, 2016 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: network Area: Networking Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants