
mavlink send helpers refactor to prevent writing partial messages #12967

Merged
merged 2 commits into master from pr-mavlink_write
Apr 15, 2020

Conversation

dagar
Member

@dagar dagar commented Sep 16, 2019

Possible quick work around for #12957.

This pull request rearranges the mavlink helpers slightly to shift the buffer, should_transmit, and error count updates into the single final mavlink_end_uart_send call rather than each mavlink_send_uart_bytes call (invoked 3 times per mavlink message send).

This is accomplished by unifying some of the uart/udp logic differences. Each write (mavlink_send_uart_bytes) goes into a buffer, then that buffer is written out once (mavlink_end_uart_send). This should prevent broken mavlink messages from going out over the wire, and will also be slightly more efficient by minimizing work for each send.

TODO:

  • move the get_free_tx_buf() check to mavlink_start_uart_send, store the result and skip mavlink_send_uart_bytes if there's not enough buffer space left.
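The buffered approach described above can be sketched roughly as follows. This is a minimal stand-in, not the actual PX4 code: the real helpers write to a UART/UDP device and also update flow control and error counters, while here the "wire" is a std::vector so the behaviour is easy to check in isolation. The Channel struct is invented for the sketch; MAVLINK_MAX_PACKET_LEN is the real MAVLink v2 maximum.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative sketch only, not the actual PX4 implementation.
static constexpr size_t MAVLINK_MAX_PACKET_LEN = 280;

struct Channel {
	uint8_t buf[MAVLINK_MAX_PACKET_LEN];
	size_t buf_fill = 0;
	std::vector<uint8_t> wire; // stand-in for the UART/UDP device
};

// Each partial write (header, payload, checksum) only appends to the
// per-instance buffer; nothing touches the wire yet.
void mavlink_send_uart_bytes(Channel &ch, const uint8_t *bytes, size_t len)
{
	if (ch.buf_fill + len <= sizeof(ch.buf)) {
		std::memcpy(&ch.buf[ch.buf_fill], bytes, len);
		ch.buf_fill += len;
	}
}

// The single final call flushes the complete message in one write, so a
// partially assembled message can never go out over the wire.
void mavlink_end_uart_send(Channel &ch)
{
	ch.wire.insert(ch.wire.end(), ch.buf, ch.buf + ch.buf_fill);
	ch.buf_fill = 0; // ready for the next message
}
```

With this shape the three per-message _mavlink_send_uart calls become three memcpys into the staging buffer, and exactly one device write happens per message.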

@dagar dagar requested review from jkflying, MaEtUgR and a team September 16, 2019 19:14
@dagar dagar added this to the Release v1.10.0 milestone Sep 16, 2019
@dagar dagar changed the title [WIP]: mavlink write hacks mavlink send helpers refactor to prevent writing partial messages Sep 17, 2019
@dagar dagar marked this pull request as ready for review September 17, 2019 00:02
@dagar dagar mentioned this pull request Sep 17, 2019
Contributor

@julianoes julianoes left a comment

Can you also please copy the PR description in to the commit? (Or squash it so it goes into the git history.)

src/modules/mavlink/mavlink_main.cpp (resolved)
src/modules/mavlink/mavlink_main.cpp (outdated, resolved)
src/modules/mavlink/mavlink_main.cpp (outdated, resolved)
src/modules/mavlink/mavlink_main.cpp (outdated, resolved)
src/modules/mavlink/mavlink_main.h (outdated, resolved)
@dagar
Member Author

dagar commented Sep 17, 2019

For context, this is the imposed mavlink library helper structure we're dancing around.

https://github.com/mavlink/c_library_v2/blob/ac40c0329e88b70ae5db4c1467ed5853d305af54/mavlink_helpers.h#L352-L359

	MAVLINK_START_UART_SEND(chan, header_len + 3 + (uint16_t)length + (uint16_t)signature_len);
	_mavlink_send_uart(chan, (const char *)buf, header_len+1);
	_mavlink_send_uart(chan, packet, length);
	_mavlink_send_uart(chan, (const char *)ck, 2);
	if (signature_len != 0) {
		_mavlink_send_uart(chan, (const char *)signature, signature_len);
	}
	MAVLINK_END_UART_SEND(chan, header_len + 3 + (uint16_t)length + (uint16_t)signature_len);

@dagar
Member Author

dagar commented Sep 17, 2019

Can you also please copy the PR description in to the commit? (Or squash it so it goes into the git history.)

One of the reasons I like "Squash and merge" is we get one last chance to clean it up.

@dagar dagar requested a review from julianoes September 17, 2019 15:06
@jorge789

jorge789 commented Sep 17, 2019

Tested on CUAV+ V5:
Modes Tested

Position Mode: Good.
Altitude Mode: Good.
Stabilized Mode: Good.
Mission Plan Mode (Automated): Good.
RTL: Good.

- Procedure
Arm and take off in position mode. After flying for approximately one minute, switch to altitude and then stabilized mode, then proceed to mission plan mode and make sure the vehicle follows all waypoints as shown in QGC. Once all waypoints are completed, activate RTL and observe the landing behaviour.

Notes:
No issues noted, good flight in general.

Log:
https://review.px4.io/plot_app?log=25cc3296-2428-41dc-a6a4-04e1c6251f7c

@dannyfpv

dannyfpv commented Sep 17, 2019

Tested on Pixhawk 4 v5 f-450

Mode tested:
mission mode
altitude mode
no issues, smooth flight

log:
https://review.px4.io/plot_app?log=77ed5ecb-c1fc-4b66-be47-4007dc83640d

Tested on Pixhawk4 mini v5 f-450

Mode tested:
mission mode
altitude mode
no issues, smooth flight

log:
https://review.px4.io/plot_app?log=2f1378df-d18e-4de6-88f2-c45cab4ca07b

@dannyfpv

@dagar we tried to test on PixRacer V4, but got no telemetry connection in QGC. We tried stable and it worked fine.

@dagar
Member Author

dagar commented Sep 19, 2019

Looks like I screwed up the rebase when extracting the other PRs. Attempting to fix.

@dagar dagar force-pushed the pr-mavlink_write branch 2 times, most recently from ae262b5 to 7cbe9c6 on September 19, 2019 03:13
julianoes
julianoes previously approved these changes Sep 19, 2019
Contributor

@julianoes julianoes left a comment

Looks correct now.

Member

@bkueng bkueng left a comment

and will also be slightly more efficient by minimizing work for each send.

It is not more efficient since you have another memcpy. It also requires considerably more memory (I see almost 1KB on pixracer) due to the additional buffer.
Can we please find a better solution? I know the mavlink API is not great for that.

@jkflying
Contributor

The only way I can think of to get around the extra memory for each message, while still using the helper functions, is to allocate a buffer bigger than just the payload data, use the mavlink_msg_XXXXX_out_dynamic_send_buf helper instead with an offset into the buffer, and then let it copy the data back onto itself. 🤢
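The "copy the data back onto itself" trick reads roughly like this. Purely illustrative: send_in_place and HEADER_LEN are hypothetical names, not the real MAVLink helper API. The key point is that the copy's source and destination overlap within one allocation, so it must use memmove rather than memcpy.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: the caller builds the payload at an offset into
// one oversized buffer, and the send step shifts it back onto the front
// of the same buffer so the framing fits in front, avoiding a second
// buffer entirely.
static constexpr size_t HEADER_LEN = 10; // assumed MAVLink v2 header size

// payload was written by the caller at buf + HEADER_LEN
void send_in_place(uint8_t *buf, size_t payload_len)
{
	// source and destination overlap, so memmove (not memcpy) is required
	std::memmove(buf, buf + HEADER_LEN, payload_len);
	// ... real helpers would now frame and transmit buf ...
}
```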

@dagar
Member Author

dagar commented Sep 19, 2019

and will also be slightly more efficient by minimizing work for each send.

It is not more efficient since you have another memcpy.

Yes, but overall it's still slightly faster, although not enough to get excited about.

master (top)

PID COMMAND                   CPU(ms) CPU(%)  USED/STACK PRIO(BASE) STATE FD
167 mavlink_if0                  3900  6.137  1824/ 2532 100 (100)  w:sig  4
168 mavlink_rcv_if0               255  0.327  2424/ 3916 175 (175)  w:sem  4

PR (top)

PID COMMAND                   CPU(ms) CPU(%)  USED/STACK PRIO(BASE) STATE FD
167 mavlink_if0                  2153  5.400  1816/ 2532 100 (100)  w:sig  4
168 mavlink_rcv_if0               178  0.409  2608/ 3916 175 (175)  w:sem  4

master (mavlink status)

instance #0:
        GCS heartbeat:  394441 us ago
        mavlink chan: #0
        type:           USB CDC
        flow control: OFF
        rates:
          tx: 23.249 kB/s
          txerr: 0.000 kB/s
          tx rate mult: 1.000
          tx rate max: 800000 B/s
          rx: 0.527 kB/s
        FTP enabled: YES, TX enabled: YES
        mode: Config
        MAVLink version: 2
        transport protocol: serial (/dev/ttyACM0 @2000000)
        ping statistics:
          last: 0.81 ms
          mean: 0.92 ms
          max: 103.86 ms
          min: 0.27 ms
          dropped packets: 0

PR (mavlink status)

instance #0:
        GCS heartbeat:  8932 us ago
        mavlink chan: #0
        type:           USB CDC
        flow control: OFF
        rates:
          tx: 23.367 kB/s
          txerr: 0.000 kB/s
          tx rate mult: 1.000
          tx rate max: 800000 B/s
          rx: 0.527 kB/s
        FTP enabled: YES, TX enabled: YES
        mode: Config
        MAVLink version: 2
        transport protocol: serial (/dev/ttyACM0 @2000000)
        ping statistics:
          last: 0.51 ms
          mean: 0.94 ms
          max: 27.88 ms
          min: 0.23 ms
          dropped packets: 0

@dagar
Member Author

dagar commented Sep 19, 2019

It also requires considerably more memory (I see almost 1KB on pixracer) due to the additional buffer.
Can we please find a better solution? I know the mavlink API is not great for that.

Unless you have another idea I don't think the memory is that much of a concern. This buffer is 280 bytes (MAVLINK_MAX_PACKET_LEN) per mavlink instance. I'd still much rather fix this in mavlink, and I even considered pulling more of that code PX4 side. We can come back to this later if the mavlink api situation changes/improves.

This doesn't really excuse the waste, but we're still going to be somewhere around 5-10 K ahead of v1.9.2 on baseline configurations, and the cost of each additional module is lower. If we consider the system overall and the annoying constraints like the mavlink api I think I'd rather invest the time in other areas of the system where we can easily save quite a bit more than 1 KB (finish uORB updates, further reduce FDs, drop NuttX HPWORK/LPWORK, move more tasks to WQs, minimize logger subscriptions array, etc).

@dagar dagar force-pushed the pr-mavlink_write branch from f700f81 to 6484ec8 on April 14, 2020 14:53
@stale stale bot removed the stale label Apr 14, 2020
@dagar dagar changed the title [WIP]: mavlink send helpers refactor to prevent writing partial messages mavlink send helpers refactor to prevent writing partial messages Apr 14, 2020
@PX4 PX4 deleted a comment from stale bot Apr 14, 2020
@julianoes
Contributor

It turns out that this PR can prevent PARAM_VALUE messages from getting dropped when the ground station requests them.

I tested this using just FTDI without flow control connected to Telem1 using a small tester based on MAVSDK: mavlink/MAVSDK#1038

For the results, see: https://docs.google.com/spreadsheets/d/1aAT58x1zVd6_5bL9NjCOnc87Eoy4Aa3QMd-cmwZXI8A/edit?usp=sharing

@dagar dagar requested a review from julianoes April 14, 2020 15:41
jkflying
jkflying previously approved these changes Apr 14, 2020
@jkflying
Contributor

@julianoes very cool test you did there.

Just out of curiosity, does anyone know why the loading time doesn't decrease further moving from 115200 to 460800? Is there some kind of rate limiter?

@dagar
Member Author

dagar commented Apr 14, 2020

Just out of curiosity, does anyone know why the loading time doesn't decrease further moving from 115200 to 460800? Is there some kind of rate limiter?

I had the same question. I'm guessing it's the rate limiting here.

https://github.com/PX4/Firmware/blob/e2b8fd7b118a232d5ffa406041acb58cde1c2543/src/modules/mavlink/mavlink_receiver.cpp#L2916

Plenty of room for future optimization...
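A byte-budget limiter of the kind that would produce this plateau can be sketched as follows. This is hypothetical and simplified, not the linked PX4 code; it only illustrates why a fixed byte-per-second cap on replies makes the load time stop improving once the link is faster than the cap.

```cpp
#include <cstdint>

// Hypothetical token-bucket-style limiter: replies are capped at a
// fixed byte rate regardless of the physical baudrate.
struct RateLimiter {
	uint64_t last_us = 0; // timestamp of the last budget update
	int budget = 0;       // bytes currently allowed to be sent
	int max_rate_bps;     // cap in bytes per second

	explicit RateLimiter(int rate) : max_rate_bps(rate) {}

	// returns true if `len` bytes may be sent at time `now_us`
	bool try_send(uint64_t now_us, int len)
	{
		// replenish the budget for the elapsed time
		budget += static_cast<int>(((now_us - last_us)
			* static_cast<uint64_t>(max_rate_bps)) / 1000000);
		// cap the burst size at 100 ms worth of bytes
		if (budget > max_rate_bps / 10) { budget = max_rate_bps / 10; }
		last_us = now_us;

		if (budget >= len) {
			budget -= len;
			return true;
		}
		return false;
	}
};
```

Under such a cap, raising the baudrate from 115200 to 460800 only shortens the time each reply spends on the wire; the limiter still spaces them out at the same byte rate.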

@julianoes
Contributor

julianoes commented Apr 15, 2020

@bkueng some measurements regarding CPU usage, on Pixhawk/fmu-v2:

master : baudrate:  57600, rate: 5000, mode: 0: 4.5%, 0.37%
master : baudrate:  57600, rate: 5000, mode: 2: 6.3%, 0.37%
master : baudrate:  57600, rate: 5000, mode: 8: 6.8%, 0.37%

this PR: baudrate:  57600, rate: 5000, mode: 0: 4.6%, 0.37%
this PR: baudrate:  57600, rate: 5000, mode: 2: 6.1%, 0.37%
this PR: baudrate:  57600, rate: 5000, mode: 8: 6.3%, 0.37%

master : baudrate: 460800, rate: 40000, mode: 0: 4.5%, 0.37%
master : baudrate: 460800, rate: 40000, mode: 2: 6.6%, 0.37%
master : baudrate: 460800, rate: 40000, mode: 8: 8.2%, 0.37%

this PR: baudrate: 460800, rate: 40000, mode: 0: 4.5%, 0.37%
this PR: baudrate: 460800, rate: 40000, mode: 2: 6.3%, 0.37%
this PR: baudrate: 460800, rate: 40000, mode: 8: 7.5%, 0.37%

It actually looks like the PR improves CPU...

Edit: and it's believable because we do 3x memcpy and 1x write instead of 3x write.

Contributor

@julianoes julianoes left a comment

I vote to merge this.

@bkueng
Member

bkueng commented Apr 15, 2020

@bkueng some measurements regarding CPU usage, on Pixhawk/fmu-v2:

It's the additional memory usage I'm concerned with.

@LorenzMeier
Member

@bkueng We can recoup this with smaller UART TX buffers on some boards. I think we have the ~1K free RAM and if not this can be squeezed out where needed. This has been a pretty significant concern overall and a constant issue for a long time.

@LorenzMeier LorenzMeier merged commit b761060 into master Apr 15, 2020
@LorenzMeier LorenzMeier deleted the pr-mavlink_write branch April 15, 2020 14:22
@LorenzMeier
Member

LorenzMeier commented Apr 15, 2020

@dagar I reviewed the diff and I saw the last commit message. I hadn't reviewed the first commit message until now, not expecting anything other than the usual Daniel commit messages. I'm on the fence about whether commit messages should be reviewed, or whether we would rather just make sure that individual commit messages for PRs not flagged as WIP or draft are always clean in wording. I personally don't think that reviewing commit messages adds value; it should be on the proposer to make sure those are in good shape when a PR is not marked as WIP.

Given that I've written a good share of the original code I could tell that your changes were sound.

@dagar
Member Author

dagar commented Apr 15, 2020

@dagar I reviewed the diff and I saw the last commit message. I haven't reviewed the first commit message up until now, not expecting anything other than the usual Daniel commit messages. I'm on the fence if they should be reviewed or if we rather would like to just make sure that individual commit messages for PRs that are not flagged as WIP or draft should always be clean in wording.

My preference is to squash & merge by default unless the pull request is a sprawling change that touches many parts of the system and each incremental change was actually tested. That way we can keep the ugly history ("how the sausage was made") including review and test history and follow up changes. Then with squash & merge we get one last chance to verify the commit message actually reflects what the overall change evolved into and decide if it should reference the PR # or has enough information on its own.

@julianoes
Contributor

@bkueng some measurements regarding CPU usage, on Pixhawk/fmu-v2:

It's the additional memory usage I'm concerned with.

Oops, I remembered that wrongly. But I do agree that this optimization potentially allows making the tx buffers smaller.

9 participants