px_uploader.py: write timeout workaround #11719

Merged
merged 1 commit into from
Mar 27, 2019
Conversation

julianoes
Contributor

@julianoes julianoes commented Mar 26, 2019

This is a workaround for the write timeout that we have seen on some
host computers when trying to flash the firmware.

We don't know the root cause of the problem, but we did observe the
following:

  • For blocking writes with a timeout (pySerial write_timeout=0.5):
    write() throws SerialTimeoutException. In systrace we see that the
    select() call which follows the write and waits for it to finish
    hangs and finally times out.
  • For blocking writes without a timeout (pySerial write_timeout=None):
    write() hangs indefinitely. In systrace we see that the
    select() call which follows the write and waits for it to finish
    hangs.
  • For non-blocking writes (pySerial write_timeout=0):
    write() works but flush() hangs. In systrace we see that
    ioctl(fd, TCSBRK, 1), which is (correctly) triggered by termios
    tcdrain, hangs.
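
The three cases above correspond to the three kinds of value that pySerial accepts for write_timeout. A minimal sketch of that mapping follows; the helper name `write_mode` and the `OBSERVED` table are hypothetical and only restate the symptoms listed above, they are not part of px_uploader.py.

```python
# Hypothetical helper, for illustration only; not taken from px_uploader.py.
# The symptom strings restate the observations listed in this description.
OBSERVED = {
    "blocking with timeout": "write() raises SerialTimeoutException",
    "blocking": "write() hangs indefinitely",
    "non-blocking": "write() works but flush() hangs",
}


def write_mode(write_timeout):
    """Name the pySerial write mode selected by a write_timeout value."""
    if write_timeout is None:
        return "blocking"
    if write_timeout == 0:
        return "non-blocking"
    return "blocking with timeout"


for value in (0.5, None, 0):
    mode = write_mode(value)
    print(f"write_timeout={value!r}: {mode}: {OBSERVED[mode]}")
```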

Inspecting the USB traffic with usbmon, we can see that the data which
is written does appear to be sent. Judging by the responses from the
Pixhawk bootloader and their timings, it looks like all the data has
arrived.

This workaround uses non-blocking writes without flushing, which has
prevented the issue from happening so far.
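
As a rough sketch of the idea (not the actual diff in this PR), a send helper can write and return without ever calling flush(). `FakePort` and `send_without_flush` are hypothetical names used only for illustration; with pySerial, the port would instead be opened with write_timeout=0, as shown in the comment.

```python
# Illustrative sketch only; names are hypothetical, not from px_uploader.py.
# With pySerial the real port would be opened in non-blocking write mode:
#     port = serial.Serial(portname, baudrate, write_timeout=0)


class FakePort:
    """Stand-in for a serial port so the sketch runs without hardware."""

    def __init__(self):
        self.written = bytearray()
        self.flush_calls = 0

    def write(self, data):
        self.written.extend(data)
        return len(data)

    def flush(self):
        self.flush_calls += 1


def send_without_flush(port, data):
    """Write data and return at once, deliberately skipping port.flush().

    Returns the number of bytes the port reports as written. In
    non-blocking mode this can be fewer than len(data).
    """
    return port.write(bytes(data))


port = FakePort()
sent = send_without_flush(port, b"\x21\x20")
print(sent, port.flush_calls)
```

Skipping flush() avoids the tcdrain call that was seen to hang, at the cost of no longer knowing when the bytes have left the host.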

Debugging was done in collaboration with @bkueng (Beat Küng) and @davids5 (David Sidrane).

Fixes #11704.

@davids5 davids5 self-requested a review March 26, 2019 14:33
Member

@davids5 davids5 left a comment

LGTM: Thank you!

@julianoes
Contributor Author

julianoes commented Mar 26, 2019

@MaEtUgR could you test this on Windows please? Thanks.

@davids5
Member

davids5 commented Mar 26, 2019

I also want to double-check this on high-speed serial under Linux and a few other targets

@julianoes
Contributor Author

Tested on macOS although I could not reproduce the issue there.

@davids5 davids5 self-requested a review March 26, 2019 16:02
@davids5
Member

davids5 commented Mar 26, 2019

@julianoes - I tested on a VM and a NUC with master and could not replicate this bug. Then I tested the PR on FMUv5 @ 2 Mbps on the NUC and it is working fine. (The PR did not break high-speed serial; windowed mode is working.)

Found board id: 50,0 bootloader version: 5 on /dev/serial/by-id/usb-FTDI_TTL232R-3V3_FTHFFGO7-if00-port0
sn: 003400323137510136353937
chip: 10016451
family: STM32F7[6|7]x
revision: Z
flash: 2064384 bytes
Windowed mode: True

Erase  : [====================] 100.0%
Program: [====================] 100.0%
Verify : [====================] 100.0%
Rebooting. Elapsed Time 26.863

Member

@davids5 davids5 left a comment

@julianoes - good to go from my testing: NUC, CentOS 7 @ 2 Mbps.

@julianoes julianoes merged commit be8ad46 into master Mar 27, 2019
@julianoes julianoes deleted the pr-fix-flashing branch March 27, 2019 13:53
@MaEtUgR
Member

MaEtUgR commented Mar 27, 2019

After this was mentioned in the dev call, I tested on Windows (Cygwin), two times each with Pixracer and Pixhawk 4. All fine:

MaEtUgR@Speedy ~/Firmware
$ make px4_fmu-v4_default upload
[1164/1165] uploading px4
Loaded firmware for board id: 11,0 size: 1610916 bytes (77.42%), waiting for the bootloader...

non-standard baudrates are not supported on this platform -> could not check for FTDI device, assuming USB connection
Attempting reboot on /dev/ttyS0 with baudrate=57600...
If the board does not respond, check the connection to the Flight Controller
non-standard baudrates are not supported on this platform -> could not check for FTDI device, assuming USB connection
non-standard baudrates are not supported on this platform -> could not check for FTDI device, assuming USB connection

Found board id: 11,0 bootloader version: 5 on /dev/ttyS5
sn: 003400223236510836363331
chip: 20016419
family: STM32F42x
revision: 3
flash: 2080768 bytes
Windowed mode: False

Erase  : [====================] 100.0%
Program: [====================] 100.0%
Verify : [====================] 100.0%
Rebooting. Elapsed Time 39.719


MaEtUgR@Speedy ~/Firmware
$ make px4_fmu-v5_default upload
[1150/1151] uploading px4
Loaded firmware for board id: 50,0 size: 1638488 bytes (79.37%), waiting for the bootloader...

non-standard baudrates are not supported on this platform -> could not check for FTDI device, assuming USB connection
Attempting reboot on /dev/ttyS0 with baudrate=57600...
If the board does not respond, check the connection to the Flight Controller
non-standard baudrates are not supported on this platform -> could not check for FTDI device, assuming USB connection
non-standard baudrates are not supported on this platform -> could not check for FTDI device, assuming USB connection

Found board id: 50,0 bootloader version: 5 on /dev/ttyS2
sn: 0038001b3137510636353937
chip: 10016451
family: STM32F7[6|7]x
revision: Z
flash: 2064384 bytes
Windowed mode: False

Erase  : [====================] 100.0%
Program: [====================] 100.0%
Verify : [====================] 100.0%
Rebooting. Elapsed Time 40.499

@julianoes
Contributor Author

Phew, great, thanks @MaEtUgR!
