px_uploader.py: write timeout workaround #11719
This is a workaround for the write timeout that we have seen on some host computers when trying to flash the firmware.

We don't know the root cause of the problem, but we did observe the following:

- For blocking writes with a timeout (pyserial `write_timeout=0.5`): `write()` throws `SerialTimeoutException`. In systrace we see that the `select()` call made after the write, which waits for the write to finish, hangs and finally times out.
- For blocking writes without a timeout (pyserial `write_timeout=None`): `write()` hangs indefinitely. In systrace we see that the `select()` call made after the write, which waits for the write to finish, hangs.
- For non-blocking writes (pyserial `write_timeout=0`): `write()` works but `flush()` hangs. In systrace we see that `ioctl(fd, TCSBRK, 1)`, which is (correctly) triggered by termios `tcdrain`, hangs.

Inspecting the USB traffic with usbmon, we can see that the written data does appear to be sent. Judging by the responses from the Pixhawk bootloader and their timing, all the data seems to have arrived.

This workaround uses non-blocking writes without flushing, which has prevented the issue from happening so far.

Debugging was done in collaboration with Beat Küng and David Sidrane.
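As a rough illustration of the idea (not the code in this PR), non-blocking writes without flushing could look like the sketch below. It assumes a pyserial-style port object with a `write_timeout` attribute and a `write()` method that returns the number of bytes accepted, which in non-blocking mode may be fewer than requested. The names `write_without_flush`, `FakePort`, and `deadline_s` are hypothetical and exist only for this example; `FakePort` stands in for a real `serial.Serial` so the snippet runs without hardware.

```python
import time


def write_without_flush(port, data, deadline_s=2.0):
    """Send all of `data` using non-blocking writes and never call flush().

    `port` is assumed to behave like pyserial's serial.Serial.
    """
    port.write_timeout = 0  # pyserial: 0 selects non-blocking writes
    payload = bytes(data)
    sent = 0
    give_up_at = time.monotonic() + deadline_s
    while sent < len(payload):
        # A non-blocking write may accept only part of the remaining bytes.
        accepted = port.write(payload[sent:]) or 0
        sent += accepted
        if accepted == 0:
            if time.monotonic() > give_up_at:
                raise TimeoutError("port accepted no data before the deadline")
            time.sleep(0.001)  # back off briefly before retrying
    # Deliberately no port.flush(): that is the call observed to hang.
    return sent


class FakePort:
    """Hypothetical stand-in for serial.Serial; accepts `chunk` bytes per write."""

    def __init__(self, chunk=4):
        self.write_timeout = None
        self.chunk = chunk
        self.received = bytearray()
        self.flush_calls = 0

    def write(self, data):
        accepted = bytes(data[: self.chunk])
        self.received += accepted
        return len(accepted)

    def flush(self):
        self.flush_calls += 1


if __name__ == "__main__":
    port = FakePort(chunk=4)
    count = write_without_flush(port, b"0123456789")
    print(count, bytes(port.received), port.flush_calls)
```

The retry loop and the deadline are additions for this sketch: they cover the case where the driver accepts only part of a buffer, without ever waiting on `flush()`.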
LGTM: Thank you!
@MaEtUgR could you test this on Windows please? Thanks.
I also want to double-check this on high-speed serial under Linux and a few other targets.
Tested on macOS, although I could not reproduce the issue there.
@julianoes - I tested on a VM and a NUC with master and could not replicate this bug. Then I tested the PR on FMUv5 @ 2 Mbps on the NUC and it is working fine. (The PR did not break high-speed serial; window mode is working.)
@julianoes - good to go from my testing. NUC, CentOS 7 @ 2 Mbps.
After this was mentioned in the dev call I tested on Windows (Cygwin), twice each with Pixracer and Pixhawk 4; all fine.
Phew, great, thanks @MaEtUgR!
Fixes #11704.