ng_nativenet crashes when hammered #3222

kaspar030 · 2015-06-19T07:53:56Z

Using "ping6 -f %tapX" from two threads on Linux against RIOT at the same time, I can reliably crash native:

kernel_init(): This is RIOT! (Version: 2014.12-1938-g910a-booze)
kernel_init(): jumping into first task...
UART0 thread started.
uart0_init() [OK]
RIOT network stack example application
All up, running the shell now
> /home/kaspar/src/riot.8/examples/ng_networking/bin/native/ng_networking.elf: lpm_set(): select(): Resource temporarily unavailable
[kaspar@booze ng_networking (master)]$

Weird thing is that it's crashing in lpm_set() with "Resource temporarily unavailable", which points to EAGAIN in errno, which should have been handled before. Looks like a race.

@LudwigOrtmann, does native share errno between all threads, like "normal" RIOT?


PeterKietzmann · 2015-07-06T11:52:28Z

Currently I'm running into the same problem with RIOT-RIOT pings after a while (just for the record)

PeterKietzmann · 2015-07-07T09:18:12Z

@kaspar030 do you have any further suggestions regarding this issue?

kaspar030 · 2015-07-07T09:51:45Z

Well, my gut feeling is that errno is being used in a non-thread-safe way. Other than completely refactoring native, we could try to ignore all errors from select.
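To illustrate the idea of tolerating transient select() errors, here is a minimal sketch. It is not the code from native or from any PR; `wait_readable` is a hypothetical helper name. It snapshots errno immediately after select() returns and retries on EINTR/EAGAIN instead of aborting. Note that this only narrows the window: if a signal handler runs between select() returning and the snapshot and changes errno, the value can still be wrong, so handlers would also need to save and restore errno.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper, not RIOT code: wait until fd becomes readable.
 * Returns >0 if readable, 0 on timeout, -1 on a non-transient error. */
int wait_readable(int fd, struct timeval *timeout)
{
    for (;;) {
        fd_set rfds;

        /* select() modifies the set, so rebuild it on every attempt */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        int res = select(fd + 1, &rfds, NULL, NULL, timeout);
        /* snapshot errno right away, before other code can overwrite it */
        int saved_errno = errno;

        if (res >= 0) {
            return res;
        }
        if (saved_errno == EINTR || saved_errno == EAGAIN) {
            continue;   /* transient condition: retry instead of aborting */
        }
        errno = saved_errno;
        return -1;
    }
}
```

A caller would pass the descriptor it sleeps on (for example a tap fd) and treat only a return value of -1 as fatal.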

kaspar030 · 2015-07-08T22:36:16Z

@PeterKietzmann Could you try #3340? It makes hammer-pings a lot more stable, and native crashes differently when applied.

PeterKietzmann · 2015-07-09T07:28:30Z

I did. It seems to fix the bug described above. However, when hammering one tap device with pings (RIOT-RIOT in my case), I often get SIGTRAPs after isr_cpu_switch_context_exit() and SIGSEGVs in __isr_stack[]. Any ideas?

jnohlgard · 2015-07-09T07:54:40Z

@PeterKietzmann From what you describe, I would guess a thread control block gets overwritten by a buffer overflow or stack overflow somewhere. Are you close to filling the stack on any thread?
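One common way to check how close a thread is to filling its stack is to paint the stack with a known pattern before the thread starts and later count how many bytes are still untouched. The sketch below shows that general technique. The names `stack_paint`, `stack_free` and `STACK_FILL` are hypothetical, it assumes a stack that grows downward from the high end of the buffer, and it is not necessarily how RIOT computes the "used" column.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL 0xA5 /* hypothetical fill pattern */

/* Paint the whole stack buffer before the thread is started. */
void stack_paint(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL, size);
}

/* Count untouched bytes at the low end of the buffer.
 * Assumes the stack grows downward from stack + size, so the
 * lowest addresses are the last ones to be overwritten. */
size_t stack_free(const uint8_t *stack, size_t size)
{
    size_t n = 0;

    while (n < size && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

A result of 0 means the pattern at the very bottom of the buffer is gone, which suggests the thread used (or overran) its entire stack.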

PeterKietzmann · 2015-07-09T08:40:01Z

@gebart Regarding buffer overflows, I had hoped the address-sanitizer tool would show warnings, but it has reported nothing for a long time. Regarding the thread stacks, I remember @haukepetersen once observed strange behaviour where the stack of the idle thread grew over time. I'm currently looking into this, but so far I haven't found anything. See the output and backtrace below. Strangely enough, idle_stack() was one of the last frames before the SIGSEGV.

Ping number: 280000
    pid | name                 | state    Q | pri | stack ( used) | location   
      1 | idle                 | pending  Q |  15 |  8192 ( 1100) | 0x806e1c0 
      2 | main                 | running  Q |   7 | 16384 ( 3044) | 0x806a1c0 
      3 | uart0                | bl rx    _ |   6 |  8192 (  960) | 0x80841c0 
      4 | pktdump              | bl rx    _ |   6 | 16384 (  960) | 0x807d680 
      5 | ipv6                 | bl rx    _ |   4 |  8192 ( 1632) | 0x807b440 
      6 | udp                  | bl rx    _ |   5 |  8192 (  960) | 0x80816c0 
      7 | tapnet               | bl rx    _ |   4 |  8192 ( 1596) | 0x8079400 
        | SUM                  |            |     | 73728 (10252)
Ping number: 290000
    pid | name                 | state    Q | pri | stack ( used) | location   
      1 | idle                 | pending  Q |  15 |  8192 ( 1100) | 0x806e1c0 
      2 | main                 | running  Q |   7 | 16384 ( 3044) | 0x806a1c0 
      3 | uart0                | bl rx    _ |   6 |  8192 (  960) | 0x80841c0 
      4 | pktdump              | bl rx    _ |   6 | 16384 (  960) | 0x807d680 
      5 | ipv6                 | bl rx    _ |   4 |  8192 ( 1632) | 0x807b440 
      6 | udp                  | bl rx    _ |   5 |  8192 (  960) | 0x80816c0 
      7 | tapnet               | bl rx    _ |   4 |  8192 ( 1596) | 0x8079400 
        | SUM                  |            |     | 73728 (10252)

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()
#1  0x08070098 in idle_stack ()
#2  0x555e5bcb in setcontext () at ../sysdeps/unix/sysv/linux/i386/setcontext.S:39
#3  0x0804c78f in isr_thread_yield () at /home/kietzmann/Dokumente/RIOT_repo_devel/RIOT/cpu/native/native_cpu.c:196
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

PeterKietzmann · 2015-07-09T08:42:52Z

PS: What you see is the output of a modified version of the ng_networking example when pinging many times from tap0 to tap1 and printing the stack sizes every 10000 pings.

kaspar030 added the "Type: bug" and "Platform: native" labels Jun 19, 2015

kaspar030 assigned LudwigKnuepfer Jun 19, 2015

kaspar030 mentioned this issue Jul 8, 2015

cpu: native: work around shared errno in _native_lpm_sleep #3340

Merged

kaspar030 closed this as completed in #3340 Jul 9, 2015

kaspar030 mentioned this issue Jul 9, 2015

netdev2_tap: crashes when hammered #3341

Closed

OlegHahm modified the milestone: Release 2015.09 Sep 2, 2015
