[feature] Global performance-optimization overhaul of micro-operations and rotation flow #291

Merged: fako1024 merged 6 commits into main from 284-improve-flow-map-rotation-performance on Apr 6, 2024

@fako1024 (Collaborator) commented Apr 4, 2024

Major steps undertaken:

Fundamental changes:

  • Implemented monomorphic functions / methods for IPv4 / IPv6 handling, so the processing flow is separated once instead of continuously having to branch (if-then-else) on the IP version
  • Removed duplicate information from flows: all hash information is now carried by the key (the value is now basically just a types.Counters) and handed over as a string (which is used for compiler optimizations in map access anyway and can be copied to the hashmap during rotation a little faster)
  • Revamped global packet buffer (used during flow map rotation) to distinguish between IPv4 / IPv6 (and sped up Put / Get operations in the process), allowing it to carry more than twice the flows for the same allocated memory in standard scenarios (i.e. when IPv4 is dominant) and never less than before, even if all traffic were IPv6
  • Removed the concept of MaybeRemains and MaybeReverts for direction detection (was a nice idea, but in practice all rules determine the direction from the first observed packet, so the additional tracking / overhead is simply not required; it could be added again later should we ever come across more complex heuristics)
  • Added heuristic to "guess" if a packet is a "request" or "response" packet and based on that choose the most probable hash map lookup path (to minimize the number of cases where we have to check twice)
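
To illustrate the last two points of this list (string keys and the request / response guess), here is a minimal sketch. Everything in it is an assumption made for illustration: the 13-byte `key4` layout, the `counters` stand-in for `types.Counters`, and the "lower source port means response" rule are hypothetical and not taken from the actual implementation. It relies on the fact that a Go map lookup of the form `m[string(b)]` on a byte slice does not need to allocate a temporary string.

```go
package main

import "fmt"

// key4 is a hypothetical 13-byte IPv4 flow key (NOT the real EPHash layout):
// bytes 0-3 source IP, 4-7 destination IP, 8-9 source port, 10-11 destination
// port (both big-endian), 12 IP protocol.
type key4 [13]byte

func newKey4(src, dst [4]byte, sport, dport uint16, proto byte) key4 {
	var k key4
	copy(k[0:4], src[:])
	copy(k[4:8], dst[:])
	k[8], k[9] = byte(sport>>8), byte(sport)
	k[10], k[11] = byte(dport>>8), byte(dport)
	k[12] = proto
	return k
}

// reverse swaps the two endpoints (IPs and ports), keeping the protocol.
func (k key4) reverse() key4 {
	var r key4
	copy(r[0:4], k[4:8])
	copy(r[4:8], k[0:4])
	copy(r[8:10], k[10:12])
	copy(r[10:12], k[8:10])
	r[12] = k[12]
	return r
}

// looksLikeResponse is an assumed heuristic: a packet whose source port is
// numerically lower than its destination port probably travels from a server
// back to a client. The ports are compared byte by byte, without decoding.
func (k key4) looksLikeResponse() bool {
	if k[8] != k[10] {
		return k[8] < k[10]
	}
	return k[9] < k[11]
}

// counters is a stand-in for the value type stored per flow.
type counters struct{ packets, bytes uint64 }

type flowMap map[string]*counters

// update accounts one packet and returns how many map lookups were needed.
func (m flowMap) update(k key4, size uint64) int {
	first, second := k, k.reverse()
	if k.looksLikeResponse() {
		first, second = second, first // try the reversed key first
	}
	if c, ok := m[string(first[:])]; ok {
		c.packets++
		c.bytes += size
		return 1
	}
	if c, ok := m[string(second[:])]; ok {
		c.packets++
		c.bytes += size
		return 2
	}
	m[string(k[:])] = &counters{packets: 1, bytes: size}
	return 2
}

func main() {
	flows := flowMap{}
	client, server := [4]byte{10, 0, 0, 1}, [4]byte{10, 0, 0, 2}
	request := newKey4(client, server, 50000, 443, 6)
	response := newKey4(server, client, 443, 50000, 6)

	fmt.Println("lookups for new flow:", flows.update(request, 100))
	fmt.Println("lookups for response:", flows.update(response, 1500))
	fmt.Println("flows tracked:", len(flows))
}
```

In this sketch a new flow always costs two lookups, while every later packet of the same flow is found on the first attempt as long as the guess is right.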

Micro-optimizations:

  • (Brute) Force inlined a few methods / functions to reduce call graph depth (but at the same time put some stuff into functions that can be inlined): Call graph now has a maximum depth of 2 beyond the capture loop itself, everything else is inlined.
  • Replaced common port logic with an array-based lookup table to achieve constant-time lookups (makes readability & extension much easier, see below) that are almost as good as the previous best-case scenario and most of the time significantly better
  • Changed memory alignment of EPHash (both new IPv4 and IPv6 versions) to allow copying / transferring / reversing them with fewer operations (due to contiguous memory areas that can be copied in one go)
  • Avoided conversions for ports, operating directly on their least and most significant bytes to speed up operations
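
The lookup-table and port-byte points above can be sketched as follows. This is an illustration under assumptions, not the code from this PR: the table name `commonPorts`, the flag constants, and the sample port list (other than 445/TCP and 8080/TCP, which are mentioned further down) are hypothetical. The table is indexed directly by the most and least significant byte of the port as found in the packet header, so no `uint16` has to be assembled, and an index of type `byte` can never be out of range for a 256-element array.

```go
package main

import "fmt"

const (
	protoTCP = 6
	protoUDP = 17
)

// Bit flags stored per port (hypothetical encoding).
const (
	flagTCP uint8 = 1 << iota
	flagUDP
)

// commonPorts holds one flag byte per port, indexed as [high byte][low byte].
// 256 * 256 entries of one byte each = 64 KiB.
var commonPorts [256][256]uint8

func init() {
	// Sample content only; the real list of common ports is an assumption here.
	for _, p := range []uint16{53, 80, 443, 445, 8080} {
		commonPorts[p>>8][p&0xff] |= flagTCP
	}
	commonPorts[0][53] |= flagUDP
}

// isCommonPort answers in constant time, regardless of how many ports are
// registered, using the two port bytes exactly as they appear on the wire.
func isCommonPort(hi, lo, proto byte) bool {
	entry := commonPorts[hi][lo]
	switch proto {
	case protoTCP:
		return entry&flagTCP != 0
	case protoUDP:
		return entry&flagUDP != 0
	}
	return false
}

func main() {
	fmt.Println(isCommonPort(0x01, 0xBD, protoTCP)) // 445/TCP
	fmt.Println(isCommonPort(0x1F, 0x90, protoTCP)) // 8080/TCP
	fmt.Println(isCommonPort(0x01, 0xBD, protoUDP)) // 445/UDP
	fmt.Println(isCommonPort(0xC3, 0x50, protoTCP)) // 50000/TCP
}
```

Extending the list in such a design means adding one number to the slice in `init`, which is where the readability gain would come from.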

Misc:

  • Added 445/TCP and 8080/TCP to the list of common (destination) ports that are pre-aggregated (based on their abundance in productive environments) - Note: The resulting gain comes on top of the benchmarks posted here and OSAG-internally (didn't want to skew the measurements - reducing the cardinality obviously makes stuff faster) !! 😉
  • Added Prometheus gauge tracking the relative usage of the global packet buffer (per interface) during rotation
  • Improved tests & benchmarks for coverage and meaningfulness
  • Support benchmarks from PCAP file data (will be used to fill non-draining memory buffer up to its capacity, then replay over and over)
  • Updated all upstream dependencies to address CVEs
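
As a rough illustration of why a buffer that distinguishes IPv4 from IPv6 can hold more flows in the same memory, and of what a "relative usage" value could look like, consider the sketch below. All names and sizes are assumptions: `packetBuffer`, the 13 / 37 byte key lengths (a plain 5-tuple for each IP version) and the 5 bytes of per-element overhead are hypothetical, and the actual Prometheus gauge wiring is left out.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Assumed element sizes: 5-tuple keys of 13 bytes (IPv4) and 37 bytes (IPv6),
// plus 1 byte for the key length and 4 bytes for the packet size.
const (
	keyLenV4 = 13
	keyLenV6 = 37
	overhead = 1 + 4
)

// packetBuffer is a hypothetical append-only buffer with variable-size elements.
type packetBuffer struct {
	data []byte
	used int
}

func newPacketBuffer(size int) *packetBuffer {
	return &packetBuffer{data: make([]byte, size)}
}

// put stores one element and reports false if the key length is invalid or
// the buffer is full.
func (b *packetBuffer) put(key []byte, pktLen uint32) bool {
	if len(key) != keyLenV4 && len(key) != keyLenV6 {
		return false
	}
	need := overhead + len(key)
	if b.used+need > len(b.data) {
		return false
	}
	b.data[b.used] = byte(len(key)) // tells a reader which IP version follows
	copy(b.data[b.used+1:], key)
	binary.BigEndian.PutUint32(b.data[b.used+1+len(key):], pktLen)
	b.used += need
	return true
}

// usage returns the fraction of the buffer in use, the kind of value a
// per-interface gauge could export.
func (b *packetBuffer) usage() float64 {
	if len(b.data) == 0 {
		return 0
	}
	return float64(b.used) / float64(len(b.data))
}

// reset marks the buffer as drained.
func (b *packetBuffer) reset() { b.used = 0 }

// fill stores elements with the given key length until the buffer is full.
func fill(b *packetBuffer, keyLen int) int {
	key := make([]byte, keyLen)
	n := 0
	for b.put(key, 64) {
		n++
	}
	return n
}

func main() {
	buf := newPacketBuffer(360)
	fmt.Println("IPv4 elements that fit:", fill(buf, keyLenV4))
	buf.reset()
	fmt.Println("IPv6 elements that fit:", fill(buf, keyLenV6))
	fmt.Printf("usage: %.2f\n", buf.usage())
}
```

With these assumed sizes an IPv4 element takes 18 bytes and an IPv6 element 42 bytes, so a buffer sized for IPv6-only traffic holds more than twice as many IPv4 elements.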

Result summary:

  • Faster read-analyze-store cycle for each packet read from the wire (mostly due to micro-optimizations across the board), anywhere from a few percent to almost 50% depending on the scenario (see benchmarks in Improve flow map / rotation performance #284).
  • More buffer for your bytes 😉 - in real life the buffer can now carry something like twice as many packets during rotation without the need to allocate / provide more memory (in addition to being able to populate & drain the data more quickly)
  • Slightly faster rotation (mostly due to less data being processed slightly more efficiently, a combination of the above fundamental and micro-optimizations)
  • In practice (tested under heavy production load), CPU usage was reduced by around 20% and packet drops were reduced from several 100k (over 1h) to zero (of course that's just a single scenario, but it shows the potential)

Closes #284

@fako1024 linked an issue (4 tasks) that may be closed by this pull request Apr 4, 2024
@fako1024 self-assigned this Apr 4, 2024
@fako1024 added the feature, performance, and observability labels Apr 4, 2024
@fako1024 requested a review from els0r April 4, 2024 15:15
@fako1024 marked this pull request as ready for review April 4, 2024 15:40
@els0r (Owner) left a comment

Absolutely love the PR / effort. Go for it.

Some nitpicks:

  • have a look at never-nesting in some of the functions (see comments)
  • use named constants for the indices used in EPHash
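
Both nitpicks can be illustrated with a small sketch. The key layout and all identifiers below are hypothetical (they are not the real EPHash definition); the point is only the style: named index constants instead of magic numbers, and guard clauses with early returns instead of nested `if` blocks.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical key layout, for illustration only.
const (
	idxSrcIP   = 0  // 4 bytes
	idxDstIP   = 4  // 4 bytes
	idxDstPort = 8  // 2 bytes, big-endian
	idxProto   = 10 // 1 byte
	keyLen     = 11
)

const (
	protoTCP = 6
	protoUDP = 17
)

var (
	errKeyLen  = errors.New("unexpected key length")
	errNoPorts = errors.New("protocol has no ports")
)

// dstPort extracts the destination port. Each precondition is checked and
// rejected up front, so the happy path stays at the lowest nesting level.
func dstPort(k []byte) (uint16, error) {
	if len(k) != keyLen {
		return 0, errKeyLen
	}
	if k[idxProto] != protoTCP && k[idxProto] != protoUDP {
		return 0, errNoPorts
	}
	return uint16(k[idxDstPort])<<8 | uint16(k[idxDstPort+1]), nil
}

func main() {
	k := make([]byte, keyLen)
	copy(k[idxSrcIP:], []byte{10, 0, 0, 1})
	copy(k[idxDstIP:], []byte{10, 0, 0, 2})
	k[idxDstPort], k[idxDstPort+1] = 0x01, 0xBD // 445
	k[idxProto] = protoTCP

	port, err := dstPort(k)
	fmt.Println(port, err)
}
```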

Resolved review threads:

  • pkg/capture/buffer.go (outdated)
  • pkg/capture/capture.go (2 threads)
  • pkg/capture/capturetypes/classify.go (outdated)
  • pkg/capture/capturetypes/packet.go
  • pkg/capture/flow.go (2 threads, 1 outdated)
  • pkg/capture/metrics.go

fako1024 and others added 2 commits April 5, 2024 16:29
Co-authored-by: Lennart Elsen <els0r@users.noreply.github.com>
@fako1024 merged commit d435333 into main Apr 6, 2024 (5 checks passed)
@fako1024 deleted the 284-improve-flow-map-rotation-performance branch April 6, 2024 12:41