ng_nativenet crashes when hammered #3222

Closed
kaspar030 opened this issue Jun 19, 2015 · 8 comments · Fixed by #3340

@kaspar030
Contributor

Using "ping6 -f %tapX" from two threads on Linux against RIOT at the same time, I can reliably crash native:

kernel_init(): This is RIOT! (Version: 2014.12-1938-g910a-booze)
kernel_init(): jumping into first task...
UART0 thread started.
uart0_init() [OK]
RIOT network stack example application
All up, running the shell now
> /home/kaspar/src/riot.8/examples/ng_networking/bin/native/ng_networking.elf: lpm_set(): select(): Resource temporarily unavailable
[kaspar@booze ng_networking (master)]$

Weird thing is that it's crashing in lpm_set() with "Resource temporarily unavailable", which points to EAGAIN in errno, which should have been handled before. Looks like a race.

@LudwigOrtmann, does native share errno between all threads, like "normal" RIOT?
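
For illustration of the suspected race: if native delivers interrupts as POSIX signals inside a single Linux process (an assumption here, not something confirmed in this thread), a handler that makes a failing system call can overwrite errno between the interrupted code's own failing call and its errno check. Below is a minimal sketch of the usual guard, which saves and restores errno inside the handler. `irq_handler` and `errno_survives_signal` are hypothetical names, not RIOT functions.

```c
#include <errno.h>
#include <signal.h>

static volatile sig_atomic_t irq_count;

/* Hypothetical handler standing in for an emulated interrupt. */
static void irq_handler(int sig)
{
    int saved_errno = errno;   /* remember what the interrupted code saw */
    (void)sig;
    irq_count++;
    errno = EAGAIN;            /* simulate a failing call inside the handler */
    errno = saved_errno;       /* restore before returning */
}

/* Returns 1 if errno is unchanged after the handler has run, else 0. */
int errno_survives_signal(void)
{
    if (signal(SIGUSR1, irq_handler) == SIG_ERR) {
        return 0;
    }
    errno = 0;
    if (raise(SIGUSR1) != 0) {
        return 0;
    }
    return errno == 0;
}
```

Without the save/restore pair, the interrupted code would see EAGAIN even though its own call never produced it.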

@kaspar030 kaspar030 added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Platform: native Platform: This PR/issue effects the native platform labels Jun 19, 2015
@PeterKietzmann
Member

Just for the record: I'm currently running into the same problem with RIOT-RIOT pings after a while.

@PeterKietzmann
Member

@kaspar030 do you have any further suggestions regarding this issue?

@kaspar030
Contributor Author

Well, my gut feeling is that errno is being used in a non-thread-safe way. Other than completely refactoring native, we could try to ignore all errors from select.
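
A narrower variant of "ignore errors from select" could be sketched as follows: treat only EINTR and EAGAIN as transient and retry, while still reporting anything else. This is a sketch under assumptions, not what native or #3340 actually does; `select_tolerant` and `SELECT_MAX_RETRIES` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>

#define SELECT_MAX_RETRIES 8   /* arbitrary bound so the loop cannot spin forever */

/* Hypothetical helper: wait until one of the descriptors in readfds is
 * readable. EINTR and EAGAIN are treated as transient and retried.
 * Returns what select() returns, or -1 with errno set on a hard error. */
int select_tolerant(int nfds, fd_set *readfds, struct timeval *timeout)
{
    assert(readfds != NULL);
    fd_set wanted = *readfds;  /* the set is undefined after a failed select() */

    for (int tries = 0; tries < SELECT_MAX_RETRIES; tries++) {
        int res = select(nfds, readfds, NULL, NULL, timeout);
        if (res >= 0) {
            return res;        /* timeout (0) or number of ready descriptors */
        }
        if (errno != EINTR && errno != EAGAIN) {
            return -1;         /* real error: let the caller decide */
        }
        *readfds = wanted;     /* restore the set before retrying */
    }
    return -1;                 /* still failing; errno holds the last cause */
}
```

Note that this only helps if errno really reflects the select() call; if something else overwrites errno in between, the check itself is unreliable.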

@kaspar030
Contributor Author

@PeterKietzmann Could you try #3340? It makes hammer-pings a lot more stable, and native crashes differently when applied.

@PeterKietzmann
Member

I did. It seems to fix the bug described above. Often, when hammering one tap device with pings (RIOT-RIOT in my case), I get SIGTRAPs after isr_cpu_switch_context_exit() and SIGSEGVs in __isr_stack[]. Any ideas?

@jnohlgard
Member

@PeterKietzmann From what you describe, I would guess a thread control block gets overwritten by a buffer overflow or stack overflow somewhere. Are you close to filling the stack on any thread?
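
One common way to answer the "how full is the stack" question is to paint the stack with a known byte before the thread starts and later count how much of the pattern is still intact. The sketch below assumes a stack that grows downward from the end of the buffer. `stack_paint`, `stack_unused` and `STACK_FILL` are hypothetical names and not RIOT's actual implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL 0xA5u   /* arbitrary pattern byte */

/* Fill the whole stack buffer with the pattern before the thread runs. */
void stack_paint(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL, size);
}

/* Count untouched bytes, starting at the lowest address. With a stack that
 * grows downward, the first non-pattern byte marks the deepest usage so far. */
size_t stack_unused(const uint8_t *stack, size_t size)
{
    size_t n = 0;
    while (n < size && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

The used amount is then `size - stack_unused(stack, size)`. A value close to `size` suggests the thread is at risk of overflowing into neighbouring memory.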

@PeterKietzmann
Member

@gebart Regarding buffer overflows, I hoped that the address-sanitizer tool would have shown me warnings, but there have been none for a long time. Regarding the thread stacks, I remember that @haukepetersen once observed strange behaviour where the idle thread's stack usage grew over time. I'm currently looking into this, but so far I haven't found anything. See the output and backtrace below. Strangely enough, idle_stack() was one of the last frames before the SIGSEGV.

Ping number: 280000
    pid | name                 | state    Q | pri | stack ( used) | location   
      1 | idle                 | pending  Q |  15 |  8192 ( 1100) | 0x806e1c0 
      2 | main                 | running  Q |   7 | 16384 ( 3044) | 0x806a1c0 
      3 | uart0                | bl rx    _ |   6 |  8192 (  960) | 0x80841c0 
      4 | pktdump              | bl rx    _ |   6 | 16384 (  960) | 0x807d680 
      5 | ipv6                 | bl rx    _ |   4 |  8192 ( 1632) | 0x807b440 
      6 | udp                  | bl rx    _ |   5 |  8192 (  960) | 0x80816c0 
      7 | tapnet               | bl rx    _ |   4 |  8192 ( 1596) | 0x8079400 
        | SUM                  |            |     | 73728 (10252)
Ping number: 290000
    pid | name                 | state    Q | pri | stack ( used) | location   
      1 | idle                 | pending  Q |  15 |  8192 ( 1100) | 0x806e1c0 
      2 | main                 | running  Q |   7 | 16384 ( 3044) | 0x806a1c0 
      3 | uart0                | bl rx    _ |   6 |  8192 (  960) | 0x80841c0 
      4 | pktdump              | bl rx    _ |   6 | 16384 (  960) | 0x807d680 
      5 | ipv6                 | bl rx    _ |   4 |  8192 ( 1632) | 0x807b440 
      6 | udp                  | bl rx    _ |   5 |  8192 (  960) | 0x80816c0 
      7 | tapnet               | bl rx    _ |   4 |  8192 ( 1596) | 0x8079400 
        | SUM                  |            |     | 73728 (10252)

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()
#1  0x08070098 in idle_stack ()
#2  0x555e5bcb in setcontext () at ../sysdeps/unix/sysv/linux/i386/setcontext.S:39
#3  0x0804c78f in isr_thread_yield () at /home/kietzmann/Dokumente/RIOT_repo_devel/RIOT/cpu/native/native_cpu.c:196
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

@PeterKietzmann
Member

PS: What you see is the output of a modified version of the ng_networking example when pinging many times from tap0 to tap1 and printing the stack sizes every 10000 pings.
