[awsemfexporter] Group exported metrics by labels #1891
Conversation
Codecov Report

```
@@           Coverage Diff            @@
##           master    #1891    +/-   ##
==========================================
+ Coverage   89.83%   89.92%   +0.09%
==========================================
  Files         378      380       +2
  Lines       18213    18344     +131
==========================================
+ Hits        16361    16496     +135
+ Misses       1388     1386       -2
+ Partials      464      462       -2
==========================================
```
@kohrapha Thanks for the change! It sounds nice but is pretty huge. Is it possible to split it into smaller PRs?
Thanks for the clean code. Please address the comments; otherwise everything LGTM!
I'm removing myself as assignee for this issue, as I'm unavailable until Jan 11.
Great work and exciting functionality, @kohrapha! The code looks good. LGTM!
Hi @anuraaga, @bogdandrutu, @kohrapha has done his internship on this project. I have spent a good amount of time reviewing this PR, and I see @gramidt also helped with the review. Could we get an approval from you to merge the code? Thanks.
@mxiamxia Sorry for the late review, though it's because of the size of the PR. I understand the tension between final PRs and intern deadlines, but since we can generally expect feedback to be required anyway, I don't think that's really a reason to lump multiple changes into a single huge PR.
I did just a quick skim so far and found a few issues. @mxiamxia, will you be taking ownership of this PR, ideally splitting it up?
```go
type DataPoint struct {
	Value     interface{}
	Labels    map[string]string
	Timestamp int64
}
```
Why not `uint64`, or even `Time`?
I guess not `Time`, since it's millis. Please name the field `TimestampMS`.
I missed this and agree with @anuraaga.
Thanks. I'll work on it and do the PR split.
```go
type rateCalculationMetadata struct {
	needsCalculateRate bool
	rateKeyParams      map[string]string
	timestamp          int64
}
```
Ditto for all timestamps
```go
func (dps IntDataPointSlice) At(i int) DataPoint {
	metric := dps.IntDataPointSlice.At(i)
	labels := createLabels(metric.LabelsMap(), dps.instrumentationLibraryName)
	timestamp := unixNanoToMilliseconds(metric.Timestamp())
```
```diff
- timestamp := unixNanoToMilliseconds(metric.Timestamp())
+ timestampMS := unixNanoToMilliseconds(metric.Timestamp())
```
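For illustration, a minimal sketch of what a `unixNanoToMilliseconds` helper could look like, including the fallback to the current time that the PR description mentions for data points without a timestamp. The signature and the zero-value check are assumptions, not copied from the PR:

```go
package main

import (
	"fmt"
	"time"
)

// unixNanoToMilliseconds converts a Unix-nano timestamp to milliseconds.
// A zero timestamp (i.e. not set on the data point) falls back to the
// current time, matching the fallback behavior described in the PR.
func unixNanoToMilliseconds(ts uint64) int64 {
	if ts == 0 {
		return time.Now().UnixNano() / int64(time.Millisecond)
	}
	return int64(ts) / int64(time.Millisecond)
}

func main() {
	// 1609459200000000000 ns is 2021-01-01T00:00:00Z
	fmt.Println(unixNanoToMilliseconds(1609459200000000000)) // 1609459200000
}
```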
```go
// createMetricKey generates a hashed key from metric labels and additional parameters
func createMetricKey(labels map[string]string, parameters map[string]string) string {
```
We should use `label.Distinct` as we did in the statsd receiver.
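To illustrate the underlying problem that both `label.Distinct` and a hashed key solve: Go map iteration order is randomized, so any key built by iterating the label map must canonicalize first. This stdlib-only sketch (not the PR's actual implementation) shows one deterministic approach:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// createMetricKey builds a deterministic key from labels plus extra
// parameters. Sorting the pairs first is essential: Go map iteration
// order is random, so hashing in iteration order would give the same
// label set a different key on different calls.
func createMetricKey(labels, parameters map[string]string) string {
	pairs := make([]string, 0, len(labels)+len(parameters))
	for k, v := range labels {
		pairs = append(pairs, k+"="+v)
	}
	for k, v := range parameters {
		pairs = append(pairs, k+"="+v)
	}
	sort.Strings(pairs)

	h := fnv.New64a()
	for _, p := range pairs {
		h.Write([]byte(p))
		h.Write([]byte{0}) // separator avoids ambiguous concatenations
	}
	return fmt.Sprintf("%x", h.Sum64())
}

func main() {
	a := createMetricKey(map[string]string{"service": "nginx", "pod": "p1"}, nil)
	b := createMetricKey(map[string]string{"pod": "p1", "service": "nginx"}, nil)
	fmt.Println(a == b) // true: the key is independent of map order
}
```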
```go
rateKeyParams := map[string]string{
```
It looks like this can be a struct instead of a map. There should just be one struct containing all of these fields, along with `label.Distinct`.
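A minimal sketch of the reviewer's suggestion: replacing the `map[string]string` rate-key parameters with one comparable struct. The field names here are illustrative only, not taken from the PR:

```go
package main

import "fmt"

// rateKey replaces a map[string]string of rate-key parameters with a
// single comparable struct, as suggested in review. Field names are
// hypothetical; the real fields would come from rateCalculationMetadata.
type rateKey struct {
	namespace  string
	metricName string
	logGroup   string
	logStream  string
}

func main() {
	// Because every field is comparable, the struct can be used directly
	// as a map key, with no string building or hashing required.
	seen := map[rateKey]int64{}
	k := rateKey{namespace: "AWS/EMF", metricName: "requests", logGroup: "g", logStream: "s"}
	seen[k] = 42
	fmt.Println(seen[rateKey{"AWS/EMF", "requests", "g", "s"}]) // 42
}
```

Using a struct key also lets the compiler catch a missing field, which a string-keyed map cannot do.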
@anuraaga since you already reviewed this, I am assigning the PR to you so that you can facilitate. Thanks.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
@mxiamxia Can we close this PR for now until we get time for it?
Yes, please close this one and I'll split this PR into smaller ones next week.
Closing per @mxiamxia's request.
We sent a large PR #1891 to support batching metrics with the same dimensions into a single AWS EMF log request, reducing customers' billing cost and request throughput. At the same time, there was a fairly large code refactor of the EMF exporter. To make code review easier, I plan to split #1891 into 2 PRs. (This is PR #1.) In this PR, we refactored the EMF exporter without introducing any new features. We defined a `DataPoint` file that wraps `pdata.DataPointSlice` in custom structures for each type of metric data point, and we moved the metric data handling functions (data conversion and rate calculation) into `datapoint`. It also fixes the metric `timestamp` bug.
Description
Currently, each incoming metric is pushed to CloudWatch Logs as a separate log event. However, many metrics share the same labels, so this results in a lot of duplicate data. To solve this, this PR implements batching of metrics by their labels, such that metrics with the same set of labels are exported together.
Additionally, this PR fixes a long-standing bug where the incoming metric's timestamp wasn't used for the exported metric. Now we use the incoming metric's timestamp, and default to the current timestamp if it is not available.
Specifically, metrics are batched together if they have the same labels, namespace, timestamp, log group, and log stream.
The batched metrics are further split up if `metric_declarations` are defined. Currently, the filtered metrics are split up by the metric declaration rules they match. Since they have the same labels, they will have the same dimensions if they match the same metric declaration rules.
Caveat: two groups of filtered metrics can still share the same dimension sets if their metric declarations result in the same dimension set. We currently don't perform this check to merge the two groups.
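A minimal sketch of how such a group key could be built from a data point's labels, namespace, timestamp, log group, and log stream. The function name and key layout are illustrative assumptions, not the exporter's actual code:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupKey derives a batching key from the fields the PR groups by:
// labels, namespace, timestamp (ms), log group, and log stream.
// Labels are sorted so the key is independent of map iteration order.
func groupKey(labels map[string]string, namespace string, timestampMS int64, logGroup, logStream string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var sb strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&sb, "%s=%s;", k, labels[k])
	}
	fmt.Fprintf(&sb, "ns=%s;ts=%d;lg=%s;ls=%s", namespace, timestampMS, logGroup, logStream)
	return sb.String()
}

func main() {
	labels := map[string]string{"pod": "p1"}
	fmt.Println(groupKey(labels, "NGINX", 1609459200000, "grp", "strm"))
}
```

Two data points map to the same group exactly when this key is equal, which is the batching criterion described above.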
Implementation Details
Since this PR includes a lot of refactoring, I will give an overview of how the new metric translation logic works. Given a list of `ResourceMetrics` via `emfExporter.pushMetricsData`:

1. For each `ResourceMetrics` in the list, we add its metrics into `groupedMetrics` (a map consisting of batched metrics).
2. For each metric in the `ResourceMetrics`, we create a `CWMetricMetadata`, which consists of the metadata (i.e. namespace, timestamp, log group, log stream, instrumentation library name) associated with the given metric. This is added to `groupedMetrics` for future processing.
3. We extract the `DataPoints` from each metric. For each `DataPoint`, we define its "group key" using its labels, namespace, timestamp, log group, and log stream, and use this key to add the metric to its corresponding group in `groupedMetrics`.
4. After building `groupedMetrics`, we iterate through each group and translate it into a `CWMetric`. In this stage, we filter out metrics if there are metric declarations defined, and set the dimensions for exported metrics (with rolled-up dimensions).
5. We translate each `CWMetric` into an EMF log and push it to CloudWatch using the appropriate log group and log stream found in the group's `CWMetricMetadata`.
Testing:
Tests were added for new functions and tests for modified functions were updated. Additionally, this PR was tested in a sample environment using an NGINX server on EKS. Given the following config (same as in #2):
we get the following cases: