perf(transport): remove Server::timers #1784

Merged on Apr 4, 2024 (4 commits)

mxinden
Collaborator

@mxinden mxinden commented Apr 1, 2024

The current neqo_transport::server::Server::timers has a large performance overhead, especially when serving a small number of connections. See #1780 for details.

This commit optimizes for the small-number-of-connections case: it keeps only a single callback timestamp and iterates over every connection when there is no other work to be done.
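
To illustrate the idea of iterating over every connection and keeping only the earliest callback, here is a minimal sketch. `Conn`, `Server`, and `Output` below are simplified, hypothetical stand-ins and not neqo's actual types; the real implementation may differ in how it picks the next connection to process.

```rust
use std::time::{Duration, Instant};

/// Hypothetical, simplified result of processing (not neqo's real type).
#[derive(Debug, PartialEq)]
enum Output {
    None,
    Datagram(Vec<u8>),
    Callback(Duration),
}

/// Hypothetical stand-in for a connection.
struct Conn {
    /// Datagrams waiting to be sent.
    pending: Vec<Vec<u8>>,
    /// When this connection next wants to be woken up, if at all.
    next_timeout: Option<Instant>,
}

impl Conn {
    fn process(&mut self, now: Instant) -> Output {
        if let Some(d) = self.pending.pop() {
            return Output::Datagram(d);
        }
        match self.next_timeout {
            Some(t) => Output::Callback(t.saturating_duration_since(now)),
            None => Output::None,
        }
    }
}

/// A server without a timer wheel: it just walks all connections.
struct Server {
    connections: Vec<Conn>,
}

impl Server {
    fn process(&mut self, now: Instant) -> Output {
        // The single callback value: the earliest delay seen so far.
        let mut callback: Option<Duration> = None;
        for c in &mut self.connections {
            match c.process(now) {
                Output::None => {}
                // Something to send takes priority over any timer.
                d @ Output::Datagram(_) => return d,
                Output::Callback(delay) => {
                    callback = Some(callback.map_or(delay, |cur| cur.min(delay)));
                }
            }
        }
        callback.map_or(Output::None, Output::Callback)
    }
}
```

The cost of each call is linear in the number of connections, which is the trade-off this sketch accepts in exchange for dropping the timer bookkeeping.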


Draft for now.

I need more context to know what to optimize for.

Do I understand correctly that the main goal of neqo-server is to be able to test neqo-client, and thus neqo as it is used in Firefox? If so, isn't the 16384-slot timer wheel overkill, and wouldn't a simple std::collections::BTreeMap (or the like) suffice?

See #1780 (comment).
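
For reference, a BTreeMap-based timer along the lines suggested above could look roughly like the following. This is only a sketch: `Timers` and `ConnId` are hypothetical names, and it is not what this PR ends up implementing.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Hypothetical connection identifier.
type ConnId = usize;

/// Timers ordered by deadline; several connections may share a deadline.
#[derive(Default)]
struct Timers {
    by_deadline: BTreeMap<Instant, Vec<ConnId>>,
}

impl Timers {
    /// Register `id` to fire at `deadline`.
    fn add(&mut self, deadline: Instant, id: ConnId) {
        self.by_deadline.entry(deadline).or_default().push(id);
    }

    /// The earliest deadline, if any. BTreeMap iterates keys in ascending order.
    fn next_deadline(&self) -> Option<Instant> {
        self.by_deadline.keys().next().copied()
    }

    /// Remove and return all connections whose deadline is at or before `now`.
    fn take_expired(&mut self, now: Instant) -> Vec<ConnId> {
        let mut expired = Vec::new();
        while let Some(deadline) = self.next_deadline() {
            if deadline > now {
                break;
            }
            if let Some(ids) = self.by_deadline.remove(&deadline) {
                expired.extend(ids);
            }
        }
        expired
    }
}
```

Insertion and removal are O(log n) here, compared with O(1) for a timer wheel, which should not matter for a handful of connections.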

Opening up early to get some performance stats from the CI benchmark machine.

@mxinden mxinden force-pushed the no-timer branch 3 times, most recently from 6c768e7 to cb62ff3, on April 1, 2024 19:13
@larseggert
Collaborator

I'd be OK to rip this out and limit our demo server to only being useful with a small number of active connections.

codecov bot commented Apr 2, 2024

Codecov Report

Attention: Patch coverage is 96.66667%, with 1 line in your changes missing coverage. Please review.

Project coverage is 93.05%. Comparing base (3151adc) to head (1a3b664).
Report is 5 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| neqo-transport/src/server.rs | 96.66% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1784      +/-   ##
==========================================
- Coverage   93.05%   93.05%   -0.01%     
==========================================
  Files         117      116       -1     
  Lines       36368    36101     -267     
==========================================
- Hits        33843    33592     -251     
+ Misses       2525     2509      -16     

github-actions bot commented Apr 2, 2024

Benchmark results

Performance differences relative to 0751429.

  • coalesce_acked_from_zero 1+1 entries
    time: [191.25 ns 191.73 ns 192.24 ns]
    change: [-2.7224% -2.3791% -2.0519%] (p = 0.00 < 0.05)
    💚 Performance has improved.

  • coalesce_acked_from_zero 3+1 entries
    time: [233.36 ns 233.94 ns 234.55 ns]
    change: [-1.4181% -0.8920% -0.4809%] (p = 0.00 < 0.05)
    Change within noise threshold.

  • coalesce_acked_from_zero 10+1 entries
    time: [233.01 ns 233.59 ns 234.33 ns]
    change: [-0.7481% -0.3848% +0.0508%] (p = 0.04 < 0.05)
    Change within noise threshold.

  • coalesce_acked_from_zero 1000+1 entries
    time: [215.52 ns 220.98 ns 233.68 ns]
    change: [-1.1436% +2.4308% +8.7684%] (p = 0.63 > 0.05)
    No change in performance detected.

  • RxStreamOrderer::inbound_frame()
    time: [119.37 ms 119.44 ms 119.52 ms]
    change: [+0.5853% +0.6760% +0.7710%] (p = 0.00 < 0.05)
    Change within noise threshold.

  • transfer/Run multiple transfers with varying seeds
    time: [119.08 ms 119.36 ms 119.64 ms]
    thrpt: [33.433 MiB/s 33.512 MiB/s 33.592 MiB/s]
    change:
    time: [+1.0043% +1.3398% +1.6760%] (p = 0.00 < 0.05)
    thrpt: [-1.6484% -1.3220% -0.9944%]
    Change within noise threshold.

  • transfer/Run multiple transfers with the same seed
    time: [120.06 ms 120.22 ms 120.38 ms]
    thrpt: [33.228 MiB/s 33.272 MiB/s 33.316 MiB/s]
    change:
    time: [+0.9817% +1.1849% +1.3779%] (p = 0.00 < 0.05)
    thrpt: [-1.3592% -1.1711% -0.9722%]
    Change within noise threshold.

  • 1-conn/1-100mb-resp (aka. Download)/client
    time: [1.0788 s 1.1060 s 1.1422 s]
    thrpt: [87.552 MiB/s 90.415 MiB/s 92.696 MiB/s]
    change:
    time: [-3.7705% -1.2259% +1.7163%] (p = 0.47 > 0.05)
    thrpt: [-1.6873% +1.2411% +3.9182%]
    No change in performance detected.

  • 1-conn/10_000-1b-seq-resp (aka. RPS)/client
    time: [383.00 ms 385.60 ms 388.20 ms]
    thrpt: [25.760 Kelem/s 25.933 Kelem/s 26.110 Kelem/s]
    change:
    time: [-10.449% -9.7346% -8.9766%] (p = 0.00 < 0.05)
    thrpt: [+9.8619% +10.784% +11.668%]
    💚 Performance has improved.

  • 100-seq-conn/1-1b-resp (aka. HPS)/client
    time: [3.3786 s 3.3819 s 3.3851 s]
    thrpt: [29.541 elem/s 29.569 elem/s 29.598 elem/s]
    change:
    time: [+0.5547% +0.6769% +0.7943%] (p = 0.00 < 0.05)
    thrpt: [-0.7881% -0.6723% -0.5516%]
    Change within noise threshold.

Client/server transfer results

Transfer of 134217728 bytes over loopback.

| Client | Server | CC | Pacing | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|:---|:---|:---|---:|---:|---:|---:|
| msquic | msquic | | | 622.3 ± 218.5 | 383.7 | 1004.2 | 1.00 |
| neqo | msquic | reno | on | 2021.4 ± 90.6 | 1931.5 | 2226.5 | 1.00 |
| neqo | msquic | reno | | 2147.0 ± 295.2 | 1897.0 | 2653.3 | 1.00 |
| neqo | msquic | cubic | on | 2043.0 ± 165.6 | 1894.8 | 2437.6 | 1.00 |
| neqo | msquic | cubic | | 1992.5 ± 185.8 | 1814.9 | 2463.3 | 1.00 |
| msquic | neqo | reno | on | 3508.2 ± 286.1 | 3231.9 | 3858.6 | 1.00 |
| msquic | neqo | reno | | 3260.7 ± 162.1 | 3177.2 | 3718.7 | 1.00 |
| msquic | neqo | cubic | on | 3251.6 ± 179.6 | 3099.5 | 3734.9 | 1.00 |
| msquic | neqo | cubic | | 3295.6 ± 234.9 | 3160.7 | 3743.7 | 1.00 |
| neqo | neqo | reno | on | 3014.2 ± 256.5 | 2805.7 | 3442.4 | 1.00 |
| neqo | neqo | reno | | 3100.0 ± 360.2 | 2723.3 | 3816.3 | 1.00 |
| neqo | neqo | cubic | on | 3238.8 ± 268.9 | 2979.5 | 3664.5 | 1.00 |
| neqo | neqo | cubic | | 3045.4 ± 73.8 | 2983.5 | 3241.2 | 1.00 |

The current `neqo_transport::server::Server::timers` has a large performance
overhead, especially when serving a small number of connections. See
mozilla#1780 for details.

This commit optimizes for the small-number-of-connections case: it keeps only a
single callback timestamp and iterates over every connection when there is no
other work to be done.
@mxinden
Collaborator Author

mxinden commented Apr 2, 2024

The Server::process* functions now add little to no additional CPU time (Self Time). See, e.g., the profile from the latest benchmark run above, uploaded to profiler.firefox.com (neqo-neqo-cubic-nopacing.server.perf.fx).

https://share.firefox.dev/3xkTbT8

@mxinden mxinden marked this pull request as ready for review April 2, 2024 12:45
@mxinden
Collaborator Author

mxinden commented Apr 3, 2024

Thank you for the review @martinthomson. Can you take another look?

@martinthomson martinthomson added this pull request to the merge queue Apr 4, 2024
Merged via the queue into mozilla:main with commit 61fcd28 Apr 4, 2024
14 of 15 checks passed
KershawChang added a commit to KershawChang/neqo that referenced this pull request Apr 8, 2024
@larseggert larseggert mentioned this pull request Apr 9, 2024
github-merge-queue bot pushed a commit that referenced this pull request Apr 9, 2024
mxinden added a commit to mxinden/neqo that referenced this pull request May 13, 2024
…lla#1800)

This reverts commit 342e4e7.

With mozilla#1878 merged and
https://bugzilla.mozilla.org/show_bug.cgi?id=1895319 available, one can now
reapply the patch removing `Server::timers`.

More specifically, the actual bug fix on mozilla-central side:

``` rust
let output = if self.response_to_send.is_empty() {
    output
} else {
    // In case there are pending responses to send, make sure a reasonable
    // callback is returned.
    const MIN_INTERVAL: Duration = Duration::from_millis(100);

    match output {
        Output::None => Output::Callback(MIN_INTERVAL),
        o @ Output::Datagram(_) => o,
        Output::Callback(d) => Output::Callback(min(d, MIN_INTERVAL)),
    }
};
```

See https://phabricator.services.mozilla.com/D209574.
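
The snippet above depends on surrounding state (`self.response_to_send`, `output`). As a self-contained sketch of the same clamping logic, assuming a simplified, hypothetical `Output` enum and a hypothetical `clamp_callback` helper that takes the "pending responses" condition as a boolean:

```rust
use std::cmp::min;
use std::time::Duration;

/// Simplified stand-in for the output type used in the snippet above.
#[derive(Debug, PartialEq)]
enum Output {
    None,
    Datagram(Vec<u8>),
    Callback(Duration),
}

/// Upper bound on the callback delay while responses are pending.
const MIN_INTERVAL: Duration = Duration::from_millis(100);

/// Hypothetical helper: if responses are still pending, make sure the caller
/// is asked to come back within `MIN_INTERVAL`; otherwise pass `output` on.
fn clamp_callback(output: Output, has_pending_responses: bool) -> Output {
    if !has_pending_responses {
        return output;
    }
    match output {
        // No callback requested: ask for one anyway.
        Output::None => Output::Callback(MIN_INTERVAL),
        // A datagram is sent immediately, so no clamping is needed.
        o @ Output::Datagram(_) => o,
        // A callback was requested: never wait longer than MIN_INTERVAL.
        Output::Callback(d) => Output::Callback(min(d, MIN_INTERVAL)),
    }
}
```

With pending responses, a requested callback of one second is shortened to 100 ms, while a 20 ms callback is left unchanged.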
github-merge-queue bot pushed a commit that referenced this pull request May 15, 2024