[awsemfexporter] Group exported metrics by labels #1891

kohrapha · 2020-12-22T19:24:50Z

Description

Currently, each incoming metric is pushed to CloudWatch Logs as a separate log event. However, many metrics share the same labels, so this produces a lot of duplicate data. To solve this, this PR batches metrics by their labels so that metrics with the same set of labels are exported together.

Additionally, this PR fixes a long-standing bug where the incoming metric's timestamp wasn't used for the exported metric. Now we use the incoming metric's timestamp and fall back to the current time if it is not available.

Specifically, metrics are batched together if they have the same:

label names + values
namespace
timestamp
log group name
log stream name
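The grouping criteria above can be sketched in Go as a comparable struct used as a map key. This is a minimal sketch, not the PR's code: `groupKey`, `metricMetadata`, `dataPoint`, `canonicalLabels` and `addToGroups` are hypothetical names, and labels are flattened into a sorted, length-prefixed string so that Go's random map iteration order doesn't matter.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupKey (hypothetical) holds every attribute that must match for two data
// points to be exported in the same EMF log. All fields are comparable, so
// the struct can be used directly as a map key.
type groupKey struct {
	labels      string // canonical encoding of label names and values
	namespace   string
	timestampMS int64
	logGroup    string
	logStream   string
}

// metricMetadata is a hypothetical, trimmed-down CWMetricMetadata.
type metricMetadata struct {
	namespace string
	logGroup  string
	logStream string
}

// dataPoint is a hypothetical, simplified data point.
type dataPoint struct {
	name        string
	value       float64
	labels      map[string]string
	timestampMS int64
}

// canonicalLabels sorts label names before encoding, so equal label sets
// always produce the same string. Names and values are length-prefixed so
// that two different label sets cannot encode to the same string.
func canonicalLabels(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "%d:%s=%d:%s;", len(name), name, len(labels[name]), labels[name])
	}
	return b.String()
}

// addToGroups appends each data point to the bucket identified by its key.
func addToGroups(groups map[groupKey][]dataPoint, md metricMetadata, dps []dataPoint) {
	for _, dp := range dps {
		key := groupKey{
			labels:      canonicalLabels(dp.labels),
			namespace:   md.namespace,
			timestampMS: dp.timestampMS,
			logGroup:    md.logGroup,
			logStream:   md.logStream,
		}
		groups[key] = append(groups[key], dp)
	}
}

func main() {
	md := metricMetadata{"kubernetes-service-endpoints", "awscollector-test", "metric-declarations"}
	groups := make(map[groupKey][]dataPoint)
	addToGroups(groups, md, []dataPoint{
		{"go_threads", 15, map[string]string{"pod_name": "p1", "container_name": "controller"}, 1606931694465},
		{"go_goroutines", 89, map[string]string{"container_name": "controller", "pod_name": "p1"}, 1606931694465},
		{"go_threads", 12, map[string]string{"pod_name": "p2", "container_name": "controller"}, 1606931694465},
	})
	// The first two points share every key field, so they land in one group;
	// the third has a different pod_name and gets its own group.
	fmt.Println(len(groups))
}
```

Keeping the non-label fields as separate struct fields, instead of concatenating everything into one string, avoids separator collisions between, say, a log group name and a namespace.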

If metric_declarations are defined, each batch is further split by the metric declaration rules its metrics match. Because all metrics in a batch share the same labels, metrics that match the same set of rules also end up with the same dimensions.
Caveat: two groups of filtered metrics can still end up with identical dimension sets if their metric declarations resolve to the same dimensions. We currently don't check for this, so such groups are not merged.

Implementation Details

Since this PR includes a lot of refactoring, here is an overview of how the new metric translation logic works, given a list of ResourceMetrics passed to emfExporter.pushMetricsData:

For each ResourceMetrics in the list, we add its metrics to groupedMetrics (a map of batched metrics).
For each metric within the ResourceMetrics, we create a CWMetricMetadata holding the metadata associated with that metric (namespace, timestamp, log group, log stream, and instrumentation library name). This metadata is stored in groupedMetrics for later processing.
We extract the DataPoints from each metric. For each DataPoint, we build a "group key" from its labels, namespace, timestamp, log group, and log stream, and use it to add the metric to the corresponding group in groupedMetrics.
After translating all OTel metrics into groupedMetrics, we iterate through each group and translate it into a CWMetric. At this stage, we filter metrics against the metric declarations (if any are defined) and set the dimensions for the exported metrics (including rolled-up dimensions).
Finally, we translate the CWMetric into an EMF log and push it to CloudWatch using the appropriate log group and log stream found in the group's CWMetricMetadata.

Testing:
Tests were added for new functions, and tests for modified functions were updated. Additionally, this PR was tested in a sample environment using an NGINX server on EKS. Given the following config (same as in #2):

exporters:
  awsemf:
    log_group_name: 'awscollector-test'
    region: 'us-west-2'
    log_stream_name: metric-declarations
    dimension_rollup_option: 'NoDimensionRollup'
    metric_declarations:
    - dimensions: [['Service', 'Namespace'], ['pod_name', 'container_name']]
      metric_name_selectors:
      - '^go_memstats_alloc_bytes_total$'
    - dimensions: [['app_kubernetes_io_component', 'Namespace'], ['app_kubernetes_io_name'], ['Invalid', 'Namespace']]
      metric_name_selectors:
      - '^go_goroutines$'
    - dimensions: [['Namespace', 'app_kubernetes_io_component', 'Namespace']]
      metric_name_selectors:
      - '^go_.+$'

we get the following cases:

batch with matched metrics

{
    "Namespace": "eks-aoc",
    "Service": "my-nginx-ingress-nginx-controller-metrics",
    "_aws": {
        "CloudWatchMetrics": [
            {
                "Namespace": "kubernetes-service-endpoints",
                "Dimensions": [
                    [
                        "Namespace",
                        "app_kubernetes_io_component"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "go_memstats_heap_alloc_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_heap_sys_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_threads",
                        "Unit": ""
                    },
                    {
                        "Name": "go_memstats_alloc_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_gc_cpu_fraction",
                        "Unit": ""
                    },
                    {
                        "Name": "go_memstats_heap_released_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_mcache_inuse_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_sys_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_heap_objects",
                        "Unit": ""
                    },
                    {
                        "Name": "go_memstats_last_gc_time_seconds",
                        "Unit": "s"
                    },
                    {
                        "Name": "go_memstats_mcache_sys_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_frees_total",
                        "Unit": ""
                    },
                    {
                        "Name": "go_memstats_stack_sys_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_buck_hash_sys_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_heap_idle_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_lookups_total",
                        "Unit": ""
                    },
                    {
                        "Name": "go_memstats_mallocs_total",
                        "Unit": ""
                    },
                    {
                        "Name": "go_memstats_mspan_inuse_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_next_gc_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_other_sys_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_gc_sys_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_heap_inuse_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_mspan_sys_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_stack_inuse_bytes",
                        "Unit": "By"
                    }
                ]
            },
            {
                "Namespace": "kubernetes-service-endpoints",
                "Dimensions": [
                    [
                        "Namespace",
                        "app_kubernetes_io_component"
                    ],
                    [
                        "app_kubernetes_io_name"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "go_goroutines",
                        "Unit": ""
                    }
                ]
            },
            {
                "Namespace": "kubernetes-service-endpoints",
                "Dimensions": [
                    [
                        "Namespace",
                        "Service"
                    ],
                    [
                        "container_name",
                        "pod_name"
                    ],
                    [
                        "Namespace",
                        "app_kubernetes_io_component"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "go_memstats_alloc_bytes_total",
                        "Unit": ""
                    }
                ]
            }
        ],
        "Timestamp": 1606931694465
    },
    "app_kubernetes_io_component": "controller",
    "app_kubernetes_io_instance": "my-nginx",
    "app_kubernetes_io_managed_by": "Helm",
    "app_kubernetes_io_name": "ingress-nginx",
    "app_kubernetes_io_version": "0.40.2",
    "container_name": "controller",
    "go_goroutines": 89,
    "go_memstats_alloc_bytes": 8168512,
    "go_memstats_alloc_bytes_total": 78897.33333333333,
    "go_memstats_buck_hash_sys_bytes": 1504910,
    "go_memstats_frees_total": 939.7833333333333,
    "go_memstats_gc_cpu_fraction": 0.000016842131408600387,
    "go_memstats_gc_sys_bytes": 5698672,
    "go_memstats_heap_alloc_bytes": 8168512,
    "go_memstats_heap_idle_bytes": 54452224,
    "go_memstats_heap_inuse_bytes": 10690560,
    "go_memstats_heap_objects": 58592,
    "go_memstats_heap_released_bytes": 51896320,
    "go_memstats_heap_sys_bytes": 65142784,
    "go_memstats_last_gc_time_seconds": 1606931634.4573667,
    "go_memstats_lookups_total": 0,
    "go_memstats_mallocs_total": 866.4166666666666,
    "go_memstats_mcache_inuse_bytes": 3472,
    "go_memstats_mcache_sys_bytes": 16384,
    "go_memstats_mspan_inuse_bytes": 149192,
    "go_memstats_mspan_sys_bytes": 229376,
    "go_memstats_next_gc_bytes": 12224112,
    "go_memstats_other_sys_bytes": 760066,
    "go_memstats_stack_inuse_bytes": 1966080,
    "go_memstats_stack_sys_bytes": 1966080,
    "go_memstats_sys_bytes": 75318272,
    "go_threads": 15,
    "helm_sh_chart": "ingress-nginx-3.7.1",
    "kubernetes_node": "ip-192-168-46-33.us-west-2.compute.internal",
    "pod_name": "my-nginx-ingress-nginx-controller-77d5fd6977-ld9wg",
    "process_cpu_seconds_total": 0.0016666666666666757,
    "process_max_fds": 1048576,
    "process_open_fds": 38,
    "process_resident_memory_bytes": 46612480,
    "process_start_time_seconds": 1606928481.44,
    "process_virtual_memory_bytes": 761430016,
    "process_virtual_memory_max_bytes": -1,
    "promhttp_metric_handler_requests_in_flight": 1
}

batch with no matched metrics

{
    "Namespace": "eks-aoc",
    "Service": "my-nginx-ingress-nginx-controller-metrics",
    "app_kubernetes_io_component": "controller",
    "app_kubernetes_io_instance": "my-nginx",
    "app_kubernetes_io_managed_by": "Helm",
    "app_kubernetes_io_name": "ingress-nginx",
    "app_kubernetes_io_version": "0.40.2",
    "container_name": "controller",
    "controller_class": "nginx",
    "controller_namespace": "eks-aoc",
    "controller_pod": "my-nginx-ingress-nginx-controller-77d5fd6977-ld9wg",
    "helm_sh_chart": "ingress-nginx-3.7.1",
    "host": "a7710ecaa12b540be99c5bfd5ee07a1f-266546424.us-west-2.elb.amazonaws.com",
    "ingress": "ingress-nginx-demo",
    "kubernetes_node": "ip-192-168-46-33.us-west-2.compute.internal",
    "method": "GET",
    "namespace": "eks-traffic",
    "nginx_ingress_controller_bytes_sent": {
        "Max": 10000000,
        "Min": 10,
        "Count": 114,
        "Sum": 21888
    },
    "nginx_ingress_controller_request_duration_seconds": {
        "Max": 10,
        "Min": 0.005,
        "Count": 114,
        "Sum": 0.029000000000000026
    },
    "nginx_ingress_controller_request_size": {
        "Max": 100,
        "Min": 10,
        "Count": 114,
        "Sum": 15960
    },
    "nginx_ingress_controller_response_duration_seconds": {
        "Max": 10,
        "Min": 0.005,
        "Count": 114,
        "Sum": 0.020000000000000018
    },
    "nginx_ingress_controller_response_size": {
        "Max": 10,
        "Min": 0.005,
        "Count": 114,
        "Sum": 21888
    },
    "path": "/banana",
    "pod_name": "my-nginx-ingress-nginx-controller-77d5fd6977-ld9wg",
    "service": "banana-service",
    "status": "200"
}

kohrapha · 2020-12-22T19:25:11Z

cc: @hdj630, @mxiamxia

codecov · 2020-12-22T19:39:47Z

Codecov Report

Merging #1891 (eb1cde8) into master (a20a6f4) will increase coverage by 0.09%.
The diff coverage is 99.46%.

@@            Coverage Diff             @@
##           master    #1891      +/-   ##
==========================================
+ Coverage   89.83%   89.92%   +0.09%     
==========================================
  Files         378      380       +2     
  Lines       18213    18344     +131     
==========================================
+ Hits        16361    16496     +135     
+ Misses       1388     1386       -2     
+ Partials      464      462       -2

Flag	Coverage Δ
integration	`69.77% <ø> (ø)`
unit	`88.63% <99.46%> (+0.10%)`	⬆️


Impacted Files	Coverage Δ
exporter/awsemfexporter/metric_translator.go	`98.42% <98.23%> (+0.32%)`	⬆️
exporter/awsemfexporter/datapoint.go	`100.00% <100.00%> (ø)`
exporter/awsemfexporter/emf_exporter.go	`100.00% <100.00%> (ø)`
exporter/awsemfexporter/groupedmetric.go	`100.00% <100.00%> (ø)`
exporter/awsemfexporter/metric_declaration.go	`100.00% <100.00%> (ø)`
exporter/awsemfexporter/util.go	`100.00% <100.00%> (ø)`
receiver/k8sclusterreceiver/watcher.go	`97.64% <0.00%> (+2.35%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

anuraaga · 2020-12-23T05:05:19Z

@kohrapha Thanks for the change! It sounds nice but is pretty huge - is it possible to split into

Timestamp bugfix
Refactoring without change in behavior
Add batching

?


kohrapha · 2020-12-23T16:15:44Z

@anuraaga Thanks for the suggestion! I can definitely split it up, but as my internship is ending on Thursday, I might not be able to follow up.

@mxiamxia @hdj630 any thoughts?

mxiamxia

Thanks for the clean code. Pls address the comments, otherwise everything LGTM!

jpkrohling · 2020-12-28T15:12:31Z

I'm removing myself as assignee for this issue, as I'm unavailable until Jan 11.

gramidt · 2020-12-28T17:00:47Z

Great work and exciting functionality, @kohrapha!

The code looks good. LGTM!

mxiamxia · 2021-01-05T01:22:14Z

Hi @anuraaga, @bogdandrutu, @kohrapha has finished his internship on this project. I have spent a good amount of time reviewing this PR, and I see @gramidt also helped with the review. Could we get an approval from you to merge the code? Thanks.

anuraaga

@mxiamxia Sorry for the late review, though it's because of the size of the PR. I understand the tension between final PRs and intern deadlines, but since we can generally expect feedback to be required anyways, I don't think it's really a reason to lump up multiple changes into a single huge PR.

I did just a quick skim so far and found a few issues. @mxiamxia will you be taking ownership of this PR, ideally splitting it up?

anuraaga · 2021-01-05T06:44:18Z

exporter/awsemfexporter/datapoint.go

+type DataPoint struct {
+	Value     interface{}
+	Labels    map[string]string
+	Timestamp int64


Why not uint64 or even Time?

I guess not Time since it's millis. Please name the field TimestampMS

I missed this and agree with @anuraaga.

Thanks. I'll work on it and do the PR split.

anuraaga · 2021-01-05T06:44:42Z

exporter/awsemfexporter/datapoint.go

+type rateCalculationMetadata struct {
+	needsCalculateRate bool
+	rateKeyParams      map[string]string
+	timestamp          int64


Ditto for all timestamps

anuraaga · 2021-01-05T06:45:10Z

exporter/awsemfexporter/datapoint.go

+func (dps IntDataPointSlice) At(i int) DataPoint {
+	metric := dps.IntDataPointSlice.At(i)
+	labels := createLabels(metric.LabelsMap(), dps.instrumentationLibraryName)
+	timestamp := unixNanoToMilliseconds(metric.Timestamp())


Suggested change

timestamp := unixNanoToMilliseconds(metric.Timestamp())

timestampMS := unixNanoToMilliseconds(metric.Timestamp())
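Applying the naming suggestion, a sketch of the struct with the unit in the field name could look like this. `unixNanoToMilliseconds` here is a hypothetical stand-in for the exporter helper of the same name and takes a plain `uint64` rather than a pdata timestamp type.

```go
package main

import (
	"fmt"
	"time"
)

// DataPoint carries the unit in the timestamp field name, because the value
// is Unix milliseconds rather than a time.Time.
type DataPoint struct {
	Value       interface{}
	Labels      map[string]string
	TimestampMS int64
}

// unixNanoToMilliseconds is a hypothetical stand-in for the exporter helper.
func unixNanoToMilliseconds(tsUnixNano uint64) int64 {
	return int64(tsUnixNano / uint64(time.Millisecond))
}

func main() {
	timestampMS := unixNanoToMilliseconds(1606931694465123456)
	dp := DataPoint{Value: int64(89), Labels: map[string]string{"pod_name": "p1"}, TimestampMS: timestampMS}
	fmt.Println(dp.TimestampMS)
}
```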

anuraaga · 2021-01-05T07:04:12Z

exporter/awsemfexporter/util.go

+}
+
+// createMetricKey generates a hashed key from metric labels and additional parameters
+func createMetricKey(labels map[string]string, parameters map[string]string) string {


We should use label.Distinct as we did in the statsd receiver.

#1670 (comment)

anuraaga · 2021-01-05T07:05:22Z

exporter/awsemfexporter/datapoint.go

+		return
+	}
+
+	rateKeyParams := map[string]string{


It looks like this can be a struct instead of a map. There should just be one struct containing all of these fields along with label.Distinct

tigrannajaryan · 2021-01-07T17:24:24Z

@anuraaga since you already reviewed this I am assigning the PR to you so that you can facilitate. Thanks.

github-actions · 2021-01-15T06:02:29Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

bogdandrutu · 2021-01-21T17:49:10Z

@kohrapha @anuraaga friendly ping on this

anuraaga · 2021-01-22T06:58:57Z

@mxiamxia Can we close this PR for now until we get time for it?

gramidt · 2021-01-22T16:03:02Z

@kohrapha @anuraaga - I'm happy to help out where needed or even take it to completion. Let me know your thoughts.

mxiamxia · 2021-01-23T06:20:31Z

@mxiamxia Can we close this PR for now until we get time for it?

Yes, please close this one and I'll split this PR to smaller ones next week.

bogdandrutu · 2021-01-25T18:58:07Z

Closing per @mxiamxia request

We sent a large PR, #1891, to support batching metrics that share the same dimensions into a single AWS EMF log request, reducing customers' billing cost and request throughput. That PR also contained a fairly large refactor of EMFExporter. To make review easier, I plan to split #1891 into 2 PRs (this is PR#1). In this PR, we refactored EMFExporter without introducing any new features. For OTel metric data points, we added a `DataPoint` file that wraps `pdata.DataPointSlice` into custom structures for each type of metric data point. We also moved the metric data handling functions (data conversion and rate calculation) to `datapoint`. This PR also fixes the metric `timestamp` bug.


kohrapha added 2 commits December 7, 2020 17:50

Implement batching logic in EMF Exporter

fd1a25b

Use metric timestamp

9cd18ee

kohrapha requested a review from anuraaga as a code owner December 22, 2020 19:24

kohrapha requested a review from a team December 22, 2020 19:24

github-actions bot assigned jpkrohling Dec 22, 2020

mxiamxia reviewed Dec 23, 2020

exporter/awsemfexporter/datapoint.go

exporter/awsemfexporter/datapoint.go (outdated)

mxiamxia reviewed Dec 23, 2020

exporter/awsemfexporter/datapoint.go

mxiamxia reviewed Dec 23, 2020

Remove max and min from histogram datapoint

eb1cde8

mxiamxia approved these changes Dec 23, 2020

kohrapha changed the title from Kohrapha/integration to [awsemfexporter] Group exported metrics by labels Dec 23, 2020

jpkrohling removed their assignment Dec 28, 2020

anuraaga reviewed Jan 5, 2021

tigrannajaryan assigned anuraaga Jan 7, 2021

github-actions bot added the Stale label Jan 15, 2021

github-actions bot removed the Stale label Jan 22, 2021

bogdandrutu closed this Jan 25, 2021

mxiamxia mentioned this pull request Feb 2, 2021

Enhance EMFExporter for Metrics Batching in AWS EMF Logs #2271

Merged

mxiamxia mentioned this pull request Feb 10, 2021

[awsemfexporter] Group exported metrics by labels #2317

Merged

gramidt mentioned this pull request Feb 12, 2021

REQUEST: New membership for @gramidt open-telemetry/community#648

Closed
