Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for extended clientside aggregation #1048

Merged
merged 10 commits into from
Feb 27, 2023
Merged

Conversation

praboud-stripe
Copy link
Contributor

Summary

This adds Veneur serverside support for statsd clients using extended clientside aggregation. This allows clients to pack multiple values into the same metrics packet - see the example from the docs:

Without extended clientside aggregation:

my_distribution_metric:21|d|#all,my,tags
my_distribution_metric:43.2|d|#all,my,tags
my_distribution_metric:1657|d|#all,my,tags

With extended clientside aggregation:

my_distribution_metric:21:43.2:1657|d|#all,my,tags

NB: clients still need to enable this manually; the Veneur server does not eg: negotiate a protocol version its clients to enable this, or anything fancy like that. This PR just adds support for parsing the packets produced by clients using option.

There are three main changes made by this PR:

  1. We're changing the interface of ParseMetric to take a func(*UDPMetric) callback called multiple times, instead of returning a single *UDPMetric value. Why do it this way? The other alternatives I considered were to either return a slice, or to publish the results back to a channel. Returning a slice requires allocating a slice - I'd rather avoid the extra GC churn if we can, since the caller would just immediately discard the slice after iterating over it. Publishing to a channel is tempting, but also has some problems. The caller is just publishing to a channel anyway, so it seems like this should work - but we determine which channel to publish to based on the Digest of the metric. Internalizing that routing logic into the parsing code seems like a layering violation. We could publish to an intermediate channel, but this requires an extra goroutine to shunt the metrics around. Overall, a callback is definitely a bit less ergonomic than the current pattern of directly returning a *UDPMetric, but it seems like the best option available (and we can add some test helpers to improve the ergonomics in our tests).
  2. The ParseMetric code needs to be re-ordered a bit - instead of parsing the value in essentially the order it which it arrives in the packet, now that there are multiple values, it's more convenient to parse them last, immediately before the metric is published via the callback function. This has changed the exact error message produced by some of the tests checking for invalid packets. This doesn't really matter, though, for two reasons: a) the error is just logged internally - there's no API contract around which error is returned and b) since this is a case where there's more than one thing wrong with the same packet, and it's just a question of which error we hit first, both errors are equally correct.
  3. I've also chosen to convert the ParseMetric code to call bytes.IndexByte directly, rather than using the SplitBytes helper. This isn't strictly required for the rest of this change, but it's faster. Without removing SplitBytes, there's a slight performance regression in the parsing code from adding the new multi-value logic. Net of the the SplitBytes change, this PR improves the benchmark performance of that code. I'm not that attached to this part of the PR, and can unwind it if others are opposed to this part, but I wanted to "make room" from a performance standpoint for the rest of these changes. I did do some profiling of SplitBytes itself to see if there's anything obvious we can do to improve the performance of the helper, but didn't have much luck here.

Motivation

Extended clientside aggregation avoids retransmitting the name & tag of the same metric reported multiple times; this can significantly increase the serialization performance on both the client and server, and reduces network traffic. I'd like to experiment with rolling out extended clientside aggregation to some services within Stripe, and this is the first step in doing so.

Test plan

I've added automated tests covering the new multi-value code. I've also stood up a local copy of the Veneur server & client, and verified that metrics published using extended clientside aggregation are correctly ingested by the server.

@praboud-stripe
Copy link
Contributor Author

cc @bobby-stripe

}
nameChunk := pipeSplitter.Chunk()[:startingColon]
valueChunk := pipeSplitter.Chunk()[startingColon+1:]
nameChunk := packet[:valueStart]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can probably use bytes.Cut here to simplify the logic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is a good suggestion. I tried implementing this locally. It does make the code a fair bit easier to read, but it benchmarks worse than the baseline implementation on master. I'm leaning towards sticking with the bytes.IndexByte implementation.

}
typeChunk := pipeSplitter.Chunk()
typeChunk := packet[typeStart+1 : tagsStart]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again bytes.Cut might be cleaner than bytes.IndexByte for this logic (but maybe not! if you don't think so please ignore)

case 'd', 'h': // consider DogStatsD's "distribution" to be a histogram
ret.Type = "histogram"
metric.Type = "histogram"
case 'm': // We can ignore the s in "ms"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol

@praboud-stripe praboud-stripe merged commit b027d67 into master Feb 27, 2023
@praboud-stripe praboud-stripe deleted the praboud-multi branch February 27, 2023 19:36
Oscil8 pushed a commit to draios/veneur that referenced this pull request May 22, 2023
* Report allocs

* Add support for multiple values in metric packet

* Fix error message

* Don't use SplitBytes

* CHANGELOG.md entry

* gofmt

* Rerun tests

* Rerun tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants