
Logger stopped completely during flight? #12565

Closed
JMshare opened this issue Jul 26, 2019 · 10 comments · Fixed by #12679

JMshare commented Jul 26, 2019

Describe the bug
We did 4 flight tests on Tuesday; the first two logged fine, but the last two stopped logging mid-flight. The rest of the system kept working as usual, so I didn't notice anything until I read the logs and found that most of the data was missing.
I am logging everything at the highest rates, so the frequent dropouts don't surprise me, but why did logging stop completely?

Now I am trying to increase the buffer size by multiplying it on line 283: https://github.com/PX4/Firmware/blob/36534d5bab3d6a73f146e671df1b8f40772174b7/src/modules/logger/logger.cpp#L283
Multiplying it by 4 makes the dropouts less frequent, but the buffer still saturates sometimes. If I multiply it by 5, the logger won't start at all. Why is there a limit?
The stack size seems unrelated, but I tried increasing it to 5000 anyway, which made no difference.
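One plausible reason the logger "won't start at all" past a certain factor is simply a failed heap allocation: if the ring buffer is allocated at startup and the request exceeds free RAM on the flight controller, the allocation returns null and startup is aborted. The sketch below is hypothetical (the function name and fallback policy are not from the PX4 source), just to illustrate that failure mode:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch, not the actual PX4 logger code: the log buffer is
// allocated on the heap once at startup. On a memory-constrained autopilot,
// multiplying the requested size eventually exceeds free RAM, malloc()
// returns nullptr, and the logger refuses to start.
static uint8_t *allocate_log_buffer(size_t requested, size_t *granted)
{
	uint8_t *buf = static_cast<uint8_t *>(std::malloc(requested));

	if (buf == nullptr) {
		// Real firmware could retry with a smaller size instead of
		// failing outright; this sketch just reports the failure.
		*granted = 0;
		return nullptr;
	}

	*granted = requested;
	return buf;
}
```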

Expected behavior
I would expect logging to resume once the buffer drains, not stop completely.
It is also possible that I am simply not increasing the buffer the right way.
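The expected behavior described above is essentially a drop-and-resume buffer: when full, incoming data is counted as a dropout and discarded, but writing resumes as soon as the writer thread frees space. A minimal sketch of that policy (illustrative only, not the PX4 implementation):

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Minimal sketch of drop-and-resume buffering: a full buffer costs data
// (a "dropout") but never stops the logger permanently.
class DropoutBuffer {
public:
	explicit DropoutBuffer(size_t capacity) : _buf(capacity) {}

	// Returns true if the message fit; otherwise records a dropout
	// and discards the data, leaving the logger running.
	bool push(const void *data, size_t len)
	{
		if (_used + len > _buf.size()) {
			++_dropouts;
			return false;
		}

		std::memcpy(_buf.data() + _used, data, len);
		_used += len;
		return true;
	}

	// Called after the writer thread flushes `len` bytes to the SD card;
	// pushing succeeds again once space is available.
	void consume(size_t len) { _used = (len > _used) ? 0 : _used - len; }

	size_t dropouts() const { return _dropouts; }

private:
	std::vector<unsigned char> _buf;
	size_t _used{0};
	size_t _dropouts{0};
};
```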

Log Files and Screenshots
Here the logging stopped even before takeoff; 10+ minutes of flight are missing.
https://review.px4.io/plot_app?log=0fd05bfa-c092-49f5-8cf6-c89c3d950f51

This is the last flight; another 5+ minutes should have been recorded.
https://review.px4.io/plot_app?log=fdb9b50f-cc70-48d0-92d1-04215f50c64c

System
Pixhawk 4

dagar (Member) commented Jul 28, 2019

Just to confirm: you're only seeing the logger stop with the buffer increased and everything being logged?

What's the closest upstream commit to my_branch00? Any changes other than the buffer size in that branch?

dagar (Member) commented Jul 28, 2019

FYI @bkueng

JMshare (Author) commented Jul 28, 2019

Thanks for the reply.
To be clear, I see this when logging everything at high rates (the sdlog param set to 27), without increasing the buffer size. I increased the buffer later to see whether it was related; I'm not sure about the answer yet. It probably helps, but what I was trying to say is that I can't make it big enough to avoid dropouts entirely.

The closest commit was shortly after the 1.9.0 stable release: https://github.com/PX4/Firmware/commit/bef7a9ba8e9382a1497cf35cfe2f9a4b5ff97a28.
There are a number of changes, including a custom module, airframe, mixer, and uORB messages.

This is a log from testing it at home. You can see the logger stop at the 30th minute. The vehicle stopped responding to MAVLink shell commands (top, free, logger status) from QGC, so I disarmed. I suspect that if I hadn't disarmed, it would have kept logging nothing indefinitely. Fortunately, the other controls seem to stay functional.
https://review.px4.io/plot_app?log=9d67fa57-54f2-49c5-a1d3-94798ad9850e

bkueng (Member) commented Jul 29, 2019

The vehicle stopped responding to the mavlink shell commands (top, free, logger status) from the QGC so I disarmed.

Did you still have a QGC connection?

https://review.px4.io/plot_app?log=9d67fa57-54f2-49c5-a1d3-94798ad9850e

This log contains a 38-second dropout at the end. Which SD card do you use, and can you try another one?

JMshare (Author) commented Jul 29, 2019

To make it easier to see, I reproduced the problem with current master and the default quad_x airframe: calibrate, set the sdlog_profile to 255, then arm and wait.
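For reference, the reproduction steps above correspond roughly to the following commands in the PX4 system console / MAVLink shell (SDLOG_PROFILE is the parameter behind the "sdlog" profile setting; exact command names are from memory and should be checked against your firmware version):

```
param set SDLOG_PROFILE 255   # enable all logging profile bits
param save
commander arm                 # or arm via RC / QGC, then wait ~30 min
logger status                 # watch whether the "Wrote" counter still advances
```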

https://review.px4.io/plot_app?log=d289f799-ca81-4b81-8824-c5e723dd495f

I armed at 9:33 am and the logger started. You can see that after around 26 minutes the logger stops logging data. I disarmed only at around 11:00 am, so even though the log shows "only" a 30-second dropout at the end, it is in fact around an hour, and it would go on indefinitely.

I am using the SD card that comes with the Pixhawk 4, a SanDisk Edge 8GB. I know it's not a great SD card, hence the frequent dropouts; nevertheless, unless the card is full I wouldn't expect the logger to jam completely.

Btw, I had full communication with QGC this time. logger status reported the logger as active, but the "Wrote" line stayed constant at 173 MiB over the last hour, and the last line showed all zeroes.

bkueng (Member) commented Aug 8, 2019

Thanks for the additional testing. I tried to reproduce (Pixhawk 4, I think with a stock SD card) several times for several hours each, but without any issues. The logging rate was even higher than in your case (~200 KB/s).

I suspect a faulty SD card, possibly combined with incorrect handling in NuttX, since from what you observe it looks like the write call to the file never returns. Did you try another SD card?

JMshare (Author) commented Aug 8, 2019

Very strange; it happens very consistently for me. Did you also get as many dropouts as I did when you tested?

I tried with the recommended SanDisk Extreme for 2 hours without issues.

The problem is that this happened on two independent devices (different Pixhawk 4 units with their stock SD cards), so I don't think the SD card is simply faulty. Maybe I should format it first and try again?

Is there any way to prevent this completely, e.g. by resetting the blocked write call if it doesn't return within a set time period?
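One generic way to at least detect the stuck-writer condition described above is a progress watchdog: the writer thread bumps a counter after every successful write, and a monitor checks that the counter keeps advancing. This is not the actual PX4 fix (that landed in #12679), and note that truly "resetting" a blocked write() syscall is generally not possible on NuttX, so a watchdog can report the hang but not safely recover it:

```cpp
#include <atomic>
#include <cstdint>

// Illustrative watchdog sketch (not the PX4 fix): the writer calls kick()
// after each completed write(); a monitor thread calls check() periodically
// and treats "no progress since last check" as a stalled writer.
class WriteWatchdog {
public:
	// Called by the writer thread after each successful write.
	void kick() { _progress.fetch_add(1, std::memory_order_relaxed); }

	// Called by the monitor thread; returns true if the writer made
	// progress since the previous call.
	bool check()
	{
		const uint64_t now = _progress.load(std::memory_order_relaxed);
		const bool progressed = (now != _last_seen);
		_last_seen = now;
		return progressed;
	}

private:
	std::atomic<uint64_t> _progress{0};
	uint64_t _last_seen{0};
};
```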

bkueng (Member) commented Aug 10, 2019

I was able to reproduce this with a brand-new Pixhawk 4 and track down the problem. It's an interesting case: #12679.
I let it run for several hours without any further issues. Maybe you can quickly cross-test?

JMshare (Author) commented Aug 10, 2019

Yes, that seems to fix it! Well done.
The log looks fine, and I don't see any change in performance or memory usage in top or free.

bkueng (Member) commented Aug 12, 2019

Thanks for testing.
