Stack overflowed after a while #5001

Closed
sreibs opened this issue Mar 8, 2016 · 13 comments
Labels
Platform: ARM Platform: This PR/issue effects ARM-based platforms Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Type: question The issue poses a question regarding usage of RIOT


@sreibs

sreibs commented Mar 8, 2016

Hi all,

I am facing a problem on an STM32F0 MCU. The program runs well but crashes after a random time. So far, it has happened after between 2 and 11 hours of runtime.

In the test setup, the system requests a measurement from an external ADC every 2 seconds.

I call vtimer_now(&timex); and compare timex values to count the 2 seconds. This program is based on RIOT commit 0e285e8.
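Editorial sketch: the post does not show how the timex values are compared, so the following is only an illustration of one way to test a 2-second interval that stays correct when a free-running counter wraps. It assumes a 32-bit microsecond counter, which wraps roughly every 71.6 minutes; `interval_elapsed` and its parameters are hypothetical names, not RIOT APIs, and nothing in the thread indicates that a comparison bug caused the crash.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a RIOT API): returns true once at least
 * `interval_us` microseconds have passed since `start_us`.
 * The unsigned subtraction is carried out modulo 2^32, so the result
 * stays correct when the counter wraps between `start_us` and `now_us`. */
static bool interval_elapsed(uint32_t start_us, uint32_t now_us,
                             uint32_t interval_us)
{
    return (uint32_t)(now_us - start_us) >= interval_us;
}
```

Comparing absolute timestamps instead (for example `now_us >= start_us + interval_us`) gives wrong answers around the wrap point, which is why the difference is computed first.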

Here is the trace (always the same when it crashes):

2016-03-08 18:30:58,490 - INFO # ADE-DRV- Read register 0b
2016-03-08 18:30:58,494 - INFO # ADE-DRV- Read 16bits-reg 0b, answered: 10 40
2016-03-08 18:30:58,496 - INFO # ADE-DRV- Read register 0b
2016-03-08 18:30:58,497 - INFO # ADE-
2016-03-08 18:30:58,499 - INFO # Context before hardfault:
2016-03-08 18:30:58,527 - INFO #    r0: 0x00000100
2016-03-08 18:30:58,527 - INFO #    r1: 0x00000007
2016-03-08 18:30:58,529 - INFO #    r2: 0x00000440
2016-03-08 18:30:58,529 - INFO #    r3: 0x00000008
2016-03-08 18:30:58,532 - INFO #   r12: 0x00000000
2016-03-08 18:30:58,533 - INFO #    lr: 0x08003e05
2016-03-08 18:30:58,533 - INFO #    pc: 0x0800108c
2016-03-08 18:30:58,534 - INFO #   psr: 0x01000016
2016-03-08 18:30:58,534 - INFO # 
2016-03-08 18:30:58,534 - INFO # Misc
2016-03-08 18:30:58,535 - INFO # EXC_RET: 0xfffffff1
2016-03-08 18:30:58,536 - INFO # Attempting to reconstruct state for debugging...
2016-03-08 18:30:58,536 - INFO # In GDB:
2016-03-08 18:30:58,536 - INFO #   set $pc=0x800108c
2016-03-08 18:30:58,536 - INFO #   frame 0
2016-03-08 18:30:58,537 - INFO #   bt
2016-03-08 18:30:58,537 - INFO # 
2016-03-08 18:30:58,537 - INFO # ISR stack overflowed by at least 128 bytes.

What do you think could cause this? How can I track down this bug?

Thank you for your help,

PS: By the way, congrats on this awesome work!

@OlegHahm OlegHahm added Type: question The issue poses a question regarding usage of RIOT Platform: ARM Platform: This PR/issue effects ARM-based platforms labels Mar 8, 2016
@OlegHahm
Member

OlegHahm commented Mar 8, 2016

I am using vtimer_now(&timex);

vtimer is deprecated and one should use xtimer instead. However, vtimer is using xtimer internally now anyway and this shouldn't be related to your problem.

What do you think could cause this? How can I track down this bug?

Have you tried
make debug
along with the lines from the output:

set $pc=0x800108c
frame 0
bt

?

PS: By the way, congrats on this awesome work!

Thanks!

@sreibs
Author

sreibs commented Mar 8, 2016

Thank you for the quick reply.

I am not sure how I should use "make debug". Can you provide details?

So there is no point in replacing all the vtimer calls, then?

@OlegHahm
Member

OlegHahm commented Mar 8, 2016

make debug should work basically the same way you call make flash or make term. You can connect to your board after it crashed with gdb using make debug.

Another option is to run make objdump and check what's happening at 0x0800108c and 0x08003e05.
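Editorial sketch: when looking up these values in a disassembly, it helps to decode them first. The helpers below are hypothetical (not RIOT functions) and rely only on ARM Cortex-M conventions: bit 0 of a code pointer is the Thumb state bit, the low bits of the stacked xPSR hold the active exception number, external interrupts start at exception number 16, and bit 3 of EXC_RETURN is clear when the exception returns to Handler mode. The sketch assumes the `psr` printed in the trace is the stacked xPSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address to search for in the disassembly: the register value with the
 * Thumb state bit (bit 0) cleared. */
static uint32_t code_address(uint32_t reg)
{
    return reg & ~UINT32_C(1);
}

/* Active exception number from the IPSR field of the stacked xPSR
 * (bits [8:0] on ARMv7-M; ARMv6-M uses only bits [5:0]). */
static uint32_t exception_number(uint32_t xpsr)
{
    return xpsr & UINT32_C(0x1FF);
}

/* External interrupt number, or a negative value for system exceptions. */
static int irq_number(uint32_t xpsr)
{
    return (int)exception_number(xpsr) - 16;
}

/* True if the faulting code was itself running in Handler mode,
 * i.e. inside an exception or interrupt handler. */
static bool fault_in_handler_mode(uint32_t exc_return)
{
    return (exc_return & UINT32_C(0x8)) == 0;
}
```

Applied to the trace above, `lr` 0x08003e05 corresponds to code address 0x08003e04, `psr` 0x01000016 gives exception number 22 (external IRQ 6), and `EXC_RET` 0xfffffff1 indicates the fault was taken while a handler was running, which is consistent with the "ISR stack overflowed" message.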

@sreibs
Author

sreibs commented Mar 8, 2016

Here is the result of objdump:

void gpio_clear(gpio_t pin)
{
    _port(pin)->BRR = (1 << _pin_num(pin));
 8001088:       2001            movs    r0, #1
 800108a:       4098            lsls    r0, r3
 800108c:       6290            str     r0, [r2, #40]   ; 0x28
}
 800108e:       4770            bx      lr

08001090 <gpio_toggle>:

I can't find 0x08003e05, but I suppose it is somewhere around:

int nrf24l01p_set_tx_address(nrf24l01p_t *dev, char *saddr, unsigned int length)
{
 8003e02:       1c16            adds    r6, r2, #0
 8003e04:       1c0d            adds    r5, r1, #0
    int status;

    /* Acquire exclusive access to the bus. */
    spi_acquire(dev->spi);
 8003e06:       f000 fad9       bl      80043bc <spi_acquire>
    gpio_clear(dev->cs);
 8003e0a:       68a0            ldr     r0, [r4, #8]
 8003e0c:       f7fd f938       bl      8001080 <gpio_clear>
    xtimer_spin(DELAY_CS_TOGGLE_TICKS);
 8003e10:       2002            movs    r0, #2
 8003e12:       f7ff fec1       bl      8003b98 <xtimer_spin>
    status = spi_transfer_regs(dev->spi, (CMD_W_REGISTER | (REGISTER_MASK & REG_TX_ADDR)), saddr, NULL, length); /* address width is 5 byte */
 8003e16:       1c2a            adds    r2, r5, #0
 8003e18:       2130            movs    r1, #48 ; 0x30
 8003e1a:       2300            movs    r3, #0

Anyway, this function should not be in use at the time of the crash...

Does this give any hint?

@OlegHahm
Member

OlegHahm commented Mar 8, 2016

Hm, since you've said that the problem happens only after some hours of runtime, I would assume that both dev->cs and _pin_num(pin) are initially correct. A common cause would be an overflow somewhere. Maybe you can check by calling the ps() command periodically, or just increase the stack sizes for this MCU in https://github.com/RIOT-OS/RIOT/blob/master/cpu/cortexm_common/include/cpu_conf_common.h by some arbitrary amount and see if this changes anything.
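Editorial sketch: one common way to check how close a stack comes to overflowing is to fill it with a known pattern at start-up and later count how much of the pattern is still intact. The code below illustrates that technique on a plain byte buffer. `STACK_FILL`, `stack_paint` and `stack_free_bytes` are hypothetical names, not RIOT APIs, and the sketch assumes a descending stack, so unused memory sits at the lowest addresses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fill pattern, chosen only because it is unlikely to be
 * written by accident. */
#define STACK_FILL 0xA5u

/* Paint the whole stack area before it is used. */
static void stack_paint(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL, size);
}

/* Count the bytes at the low end that still hold the pattern. On a
 * descending stack this is the headroom that was never touched. */
static size_t stack_free_bytes(const uint8_t *stack, size_t size)
{
    size_t free_bytes = 0;
    while (free_bytes < size && stack[free_bytes] == STACK_FILL) {
        free_bytes++;
    }
    return free_bytes;
}
```

Logging this value periodically shows whether the headroom shrinks over the hours before a crash. A result of zero means the stack was used completely at some point, so an overflow is likely.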

@immesys
Contributor

immesys commented Mar 9, 2016

Also, I get crashes with my xtimer code that (I think) go away with #4903. Perhaps give it a try.

EDIT: they don't really look related, but it won't hurt to try.

@sreibs
Author

sreibs commented Mar 9, 2016

OK. I'll try this tonight or tomorrow and let it run for a while. I'll let you know.


@sreibs
Author

sreibs commented Mar 9, 2016

I rebased on master. Let's go for a test overnight.


@OlegHahm OlegHahm added the Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) label Mar 9, 2016
@sreibs
Author

sreibs commented Mar 10, 2016

The device has been running since 7:30 AM without any problem. That is a record!

I'll let it run overnight and close the issue if it is still running.

Thank you for your help

@OlegHahm
Member

I'll keep my fingers crossed!

@sreibs
Author

sreibs commented Mar 11, 2016

Still working after more than 24 hours of running. I am closing this bug.

Reminder: I rebased on commit d15bc43.

Thank you all

@sreibs sreibs closed this as completed Mar 11, 2016
@OlegHahm
Member

Glad to hear! :)

To clarify, you didn't have to include #4903 as well?

@sreibs
Author

sreibs commented Mar 11, 2016

No

