[EXPORTER] Prometheus: Add unit to names, convert to word #2213

Merged
merged 17 commits into from
Nov 22, 2023
Changes from 4 commits
@@ -41,6 +41,79 @@ class PrometheusExporterUtils
*/
static std::string SanitizeNames(std::string name);

static std::string MapToPrometheusName(const std::string &name,
const std::string &unit,
::prometheus::MetricType prometheus_type);

/**
* A utility function that returns the equivalent Prometheus name for the provided OTLP metric
* unit.
*
* @param raw_metric_unitName The raw metric unit for which Prometheus metric unit needs to be
* computed.
* @return the computed Prometheus metric unit equivalent of the OTLP metric unit.
*/
static std::string GetEquivalentPrometheusUnit(const std::string &raw_metric_unitName);

/**
* This method retrieves the expanded Prometheus unit name for known abbreviations. OTLP metrics
* use the c/s notation as specified at <a href="https://ucum.org/ucum.html">UCUM</a>. The list of
* mappings is adopted from <a
* href="https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/9a9d4778bbbf242dba233db28e2fbcfda3416959/pkg/translator/prometheus/normalize_name.go#L30">OpenTelemetry
* Collector Contrib</a>.
*
* @param unit_abbreviation The unit name that needs to be expanded/converted to Prometheus
* units.
* @return The expanded/converted unit name if known, otherwise returns the input unit name as-is.
*/
static std::string GetPrometheusUnit(const std::string &unit_abbreviation);

/**
* This method retrieves the expanded Prometheus unit name to be used with "per" units for known
* units. For example: s => per second (singular)
*
* @param per_unit_abbreviation The unit abbreviation used in a 'per' unit.
* @return The expanded unit equivalent to be used in 'per' unit if the input is a known unit,
* otherwise returns the input as-is.
*/
static std::string GetPrometheusPerUnit(const std::string &per_unit_abbreviation);

/**
* Replaces all characters that are not a letter or a digit with '_' to make the resulting string
* Prometheus compliant. This method also removes leading and trailing underscores - this is done
* to keep the resulting unit similar to what is produced from the collector's implementation.
*
* @param str The string input that needs to be made Prometheus compliant.
* @return the cleaned-up Prometheus compliant string.
*/
static std::string CleanUpString(const std::string &str);

/**
* This method is used to convert the units expressed as a rate via '/' symbol in their name to
* their expanded text equivalent. For instance, km/h => km_per_hour. The method operates on the
* input by splitting it in 2 parts - before and after '/' symbol and will attempt to expand any
* known unit abbreviation in both parts. Unknown abbreviations & unsupported characters will
* remain unchanged in the final output of this function.
*
* @param rate_expressed_unit The rate unit input that needs to be converted to its text
* equivalent.
* @return The text equivalent of unit expressed as rate. If the input does not contain '/', the
* function returns it as-is.
*/
static std::string ConvertRateExpressedToPrometheusUnit(const std::string &rate_expressed_unit);

/**
* This method drops all characters enclosed within '{}' (including the curly braces) by replacing
* them with an empty string. Note that this method will not produce the intended effect if there
* are nested curly braces within the outer enclosure of '{}'.
*
* <p>For instance, {packet{s}s} => s}.
*
* @param unit The input unit from which text within curly braces needs to be removed.
* @return The resulting unit after removing the text within '{}'.
*/
static std::string RemoveUnitPortionInBraces(const std::string &unit);

static opentelemetry::sdk::metrics::AggregationType getAggregationType(
const opentelemetry::sdk::metrics::PointType &point_type);
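
The doc comments above describe two string transformations that are easy to misread: the brace removal (including its nested-brace caveat) and the clean-up step. The following is a minimal standalone sketch of that documented behavior, not the PR's code. The free functions `RemoveBraces`, `CleanUp`, and `RunExamples` are hypothetical names chosen for illustration; the regular expressions follow the patterns shown later in this diff.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical stand-in for RemoveUnitPortionInBraces: drops every "{...}"
// group using a non-greedy match, so nested braces are not handled.
std::string RemoveBraces(const std::string &unit)
{
  static const std::regex kBraces("\\{(.*?)\\}");
  return std::regex_replace(unit, kBraces, "");
}

// Hypothetical stand-in for CleanUpString: replaces non-alphanumeric
// characters with '_', collapses runs of '_', then trims both ends.
std::string CleanUp(const std::string &str)
{
  static const std::regex kInvalid("[^a-zA-Z0-9]");
  static const std::regex kRuns("_{2,}");
  static const std::regex kLeading("^_+");
  static const std::regex kTrailing("_+$");
  std::string out = std::regex_replace(str, kInvalid, "_");
  out = std::regex_replace(out, kRuns, "_");
  out = std::regex_replace(out, kTrailing, "");
  return std::regex_replace(out, kLeading, "");
}

// Small checks that mirror the examples given in the doc comments.
void RunExamples()
{
  assert(RemoveBraces("{packets}/s") == "/s");
  assert(RemoveBraces("{packet{s}s}") == "s}");  // nested braces leave residue
  assert(CleanUp("_km/h_") == "km_h");
}
```

Calling `RunExamples()` from a `main` function exercises the cases; the nested-brace case reproduces the `{packet{s}s} => s}` example from the comment.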

168 changes: 159 additions & 9 deletions exporters/prometheus/src/exporter_utils.cc
@@ -1,7 +1,9 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>
#include "prometheus/metric_family.h"
@@ -38,13 +40,7 @@ std::vector<prometheus_client::MetricFamily> PrometheusExporterUtils::TranslateT
{
for (const auto &metric_data : instrumentation_info.metric_data_)
{
- auto origin_name = metric_data.instrument_descriptor.name_;
- auto unit = metric_data.instrument_descriptor.unit_;
- auto sanitized = SanitizeNames(origin_name);
- prometheus_client::MetricFamily metric_family;
- metric_family.name = sanitized + "_" + unit;
- metric_family.help = metric_data.instrument_descriptor.description_;
- auto time = metric_data.end_ts.time_since_epoch();
+ auto time = metric_data.end_ts.time_since_epoch();
for (const auto &point_data_attr : metric_data.point_data_attr_)
{
auto kind = getAggregationType(point_data_attr.point_data);
@@ -55,7 +51,11 @@ std::vector<prometheus_client::MetricFamily> PrometheusExporterUtils::TranslateT
nostd::get<sdk::metrics::SumPointData>(point_data_attr.point_data).is_monotonic_;
}
const prometheus_client::MetricType type = TranslateType(kind, is_monotonic);
- metric_family.type = type;
+ prometheus_client::MetricFamily metric_family;
+ metric_family.type = type;
+ metric_family.name = MapToPrometheusName(metric_data.instrument_descriptor.name_,
+ metric_data.instrument_descriptor.unit_, type);
+ metric_family.help = metric_data.instrument_descriptor.description_;
Member

This code moved inside the loop on data points.

The metric name, type, help, etc. should be constant for all data points, so can this be moved outside?

Even if the type is needed and not available elsewhere, could you peek at the first data point before entering the loop on data points?

Contributor

+1. As long as it is safe to assume all data points have the same type, peeking at the first seems like the right thing to do. I took that approach in https://github.com/open-telemetry/opentelemetry-cpp/pull/2288/files#diff-e92aebc994eed5c134b09ecde972d9db39a9bfc46edc6d93b0b5a4c98e4ff27fR46.

Member Author

Thanks, moved back to where it was.
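
The "peek at the first data point" idea from this thread can be sketched as follows. This is only an illustration under simplifying assumptions: `MetricType`, `Point`, `MetricData`, `Family`, `Translate`, and `RunExamples` are hypothetical stand-ins, not the SDK or prometheus-cpp types, and the sketch assumes every data point of a metric shares one type.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-ins for the SDK and prometheus-cpp types.
enum class MetricType { Counter, Gauge, Histogram };

struct Point { MetricType type; double value; };
struct MetricData { std::string name; std::string help; std::vector<Point> points; };
struct Family { std::string name; std::string help; MetricType type; std::vector<double> samples; };

// Builds one family per metric; name, help, and type are set once, before
// the loop over data points, by peeking at the first point.
std::vector<Family> Translate(const std::vector<MetricData> &metrics)
{
  std::vector<Family> output;
  for (const auto &metric : metrics)
  {
    if (metric.points.empty())
    {
      continue;  // nothing to peek at, so no family is emitted
    }
    Family family;
    family.name = metric.name;
    family.help = metric.help;
    family.type = metric.points.front().type;  // assumes all points share this type
    for (const auto &point : metric.points)
    {
      family.samples.push_back(point.value);
    }
    output.push_back(std::move(family));
  }
  return output;
}

// Two metrics go in; only the one with data points yields a family.
void RunExamples()
{
  std::vector<MetricData> metrics{
      {"requests", "Request count", {{MetricType::Counter, 1.0}, {MetricType::Counter, 2.0}}},
      {"empty", "No points", {}}};
  const auto families = Translate(metrics);
  assert(families.size() == 1);
  assert(families[0].type == MetricType::Counter);
  assert(families[0].samples.size() == 2);
}
```

Whether an empty metric should be skipped or emitted with a default type is a design choice this sketch makes arbitrarily.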

if (type == prometheus_client::MetricType::Histogram) // Histogram
{
auto histogram_point_data =
@@ -114,8 +114,8 @@ std::vector<prometheus_client::MetricFamily> PrometheusExporterUtils::TranslateT
"invalid SumPointData type");
}
}
+ output.emplace_back(metric_family);
Contributor

Does the prometheus C++ library merge metric families? We definitely don't want one metric family for each data point.

Member Author
@esigo, Sep 2, 2023

Thanks, now creating a new family only if it doesn't already exist.
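
One way to read "create a family only if it doesn't already exist" is a find-or-create lookup keyed by family name. The sketch below is an assumption about the general technique, not the code that landed in the PR; `Family`, `FindOrCreate`, and `RunExamples` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a metric family.
struct Family { std::string name; std::vector<double> samples; };

// Returns the family with the given name, appending a new one if none exists.
// The returned reference is invalidated by any later insertion into `families`.
Family &FindOrCreate(std::vector<Family> &families, const std::string &name)
{
  auto it = std::find_if(families.begin(), families.end(),
                         [&name](const Family &f) { return f.name == name; });
  if (it != families.end())
  {
    return *it;
  }
  families.push_back(Family{name, {}});
  return families.back();
}

// Three data points across two names produce two families, not three.
void RunExamples()
{
  std::vector<Family> families;
  FindOrCreate(families, "a").samples.push_back(1.0);
  FindOrCreate(families, "a").samples.push_back(2.0);
  FindOrCreate(families, "b").samples.push_back(3.0);
  assert(families.size() == 2);
  assert(families[0].samples.size() == 2);
}
```

A linear search is used for brevity; with many families, a map from name to index would avoid the repeated scan.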

}
- output.emplace_back(metric_family);
}
}
return output;
@@ -167,6 +167,156 @@ std::string PrometheusExporterUtils::SanitizeNames(std::string name)
return name;
}

std::regex INVALID_CHARACTERS_PATTERN("[^a-zA-Z0-9]");
std::regex CHARACTERS_BETWEEN_BRACES_PATTERN("\\{(.*?)\\}");
std::regex SANITIZE_LEADING_UNDERSCORES("^_+");
std::regex SANITIZE_TRAILING_UNDERSCORES("_+$");
std::regex SANITIZE_CONSECUTIVE_UNDERSCORES("[_]{2,}");

std::string PrometheusExporterUtils::GetEquivalentPrometheusUnit(
const std::string &raw_metric_unit_name)
{
if (raw_metric_unit_name.empty())
{
return raw_metric_unit_name;
}

std::string converted_metric_unit_name = RemoveUnitPortionInBraces(raw_metric_unit_name);
converted_metric_unit_name = ConvertRateExpressedToPrometheusUnit(converted_metric_unit_name);

return CleanUpString(GetPrometheusUnit(converted_metric_unit_name));
}

std::string PrometheusExporterUtils::GetPrometheusUnit(const std::string &unit_abbreviation)
{
static std::map<std::string, std::string> units{// Time
{"d", "days"},
{"h", "hours"},
{"min", "minutes"},
{"s", "seconds"},
{"ms", "milliseconds"},
{"us", "microseconds"},
{"ns", "nanoseconds"},
// Bytes
{"By", "bytes"},
{"KiBy", "kibibytes"},
{"MiBy", "mebibytes"},
{"GiBy", "gibibytes"},
{"TiBy", "tibibytes"},
{"KBy", "kilobytes"},
{"MBy", "megabytes"},
{"GBy", "gigabytes"},
{"TBy", "terabytes"},
{"B", "bytes"},
{"KB", "kilobytes"},
{"MB", "megabytes"},
{"GB", "gigabytes"},
{"TB", "terabytes"},
Contributor

We've removed B and $ from mappings elsewhere, as B means bel in UCUM. See open-telemetry/opentelemetry-java#5719

Member Author

Thanks, removed.

// SI
{"m", "meters"},
{"V", "volts"},
{"A", "amperes"},
{"J", "joules"},
{"W", "watts"},
{"g", "grams"},
// Misc
{"Cel", "celsius"},
{"Hz", "hertz"},
{"1", ""},
{"%", "percent"},
{"$", "dollars"}};
auto res_it = units.find(unit_abbreviation);
if (res_it == units.end())
{
return unit_abbreviation;
}
return res_it->second;
}

std::string PrometheusExporterUtils::GetPrometheusPerUnit(const std::string &per_unit_abbreviation)
{
static std::map<std::string, std::string> per_units{
{"s", "second"}, {"m", "minute"}, {"h", "hour"}, {"d", "day"},
{"w", "week"}, {"mo", "month"}, {"y", "year"}};
auto res_it = per_units.find(per_unit_abbreviation);
if (res_it == per_units.end())
{
return per_unit_abbreviation;
}
return res_it->second;
}

std::string PrometheusExporterUtils::RemoveUnitPortionInBraces(const std::string &unit)
{
return std::regex_replace(unit, CHARACTERS_BETWEEN_BRACES_PATTERN, "");
}

std::string PrometheusExporterUtils::ConvertRateExpressedToPrometheusUnit(
const std::string &rate_expressed_unit)
{
if (rate_expressed_unit.find("/") == std::string::npos)
{
return rate_expressed_unit;
}

std::vector<std::string> rate_entities;
size_t pos = rate_expressed_unit.find("/");
Member

Nit: do not execute the find twice. Save the result in pos and test for pos == std::string::npos at the start of this method.
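
A minimal sketch of the single-`find` variant suggested in this nit, written as a standalone function. `ExpandUnit` and `ExpandPerUnit` are hypothetical stand-ins for `GetPrometheusUnit` and `GetPrometheusPerUnit` with deliberately reduced tables, and `ConvertRate` and `RunExamples` are illustrative names only.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for GetPrometheusUnit, with a reduced table.
std::string ExpandUnit(const std::string &unit)
{
  static const std::map<std::string, std::string> kUnits{
      {"s", "seconds"}, {"By", "bytes"}, {"m", "meters"}};
  auto it = kUnits.find(unit);
  return it == kUnits.end() ? unit : it->second;
}

// Hypothetical stand-in for GetPrometheusPerUnit, with a reduced table.
std::string ExpandPerUnit(const std::string &unit)
{
  static const std::map<std::string, std::string> kPerUnits{
      {"s", "second"}, {"m", "minute"}, {"h", "hour"}};
  auto it = kPerUnits.find(unit);
  return it == kPerUnits.end() ? unit : it->second;
}

// Searches for '/' once and reuses the position for both the early return
// and the split into numerator and denominator.
std::string ConvertRate(const std::string &unit)
{
  const std::size_t pos = unit.find('/');
  if (pos == std::string::npos || pos + 1 == unit.size())
  {
    return unit;  // no '/' at all, or nothing after it
  }
  return ExpandUnit(unit.substr(0, pos)) + "_per_" + ExpandPerUnit(unit.substr(pos + 1));
}

void RunExamples()
{
  assert(ConvertRate("km/h") == "km_per_hour");  // unknown "km" stays as-is
  assert(ConvertRate("By/s") == "bytes_per_second");
  assert(ConvertRate("s") == "s");
  assert(ConvertRate("By/") == "By/");
}
```

The check `pos + 1 == unit.size()` plays the role of the `rate_entities[1].empty()` test in the diff, so the temporary vector is no longer needed.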

rate_entities.push_back(rate_expressed_unit.substr(0, pos));
rate_entities.push_back(rate_expressed_unit.substr(pos + 1));

if (rate_entities[1].empty())
{
return rate_expressed_unit;
}

std::string prometheus_unit = GetPrometheusUnit(rate_entities[0]);
std::string prometheus_per_unit = GetPrometheusPerUnit(rate_entities[1]);

return prometheus_unit + "_per_" + prometheus_per_unit;
}

std::string PrometheusExporterUtils::CleanUpString(const std::string &str)
{
std::string cleaned_string = std::regex_replace(str, INVALID_CHARACTERS_PATTERN, "_");
cleaned_string = std::regex_replace(cleaned_string, SANITIZE_CONSECUTIVE_UNDERSCORES, "_");
cleaned_string = std::regex_replace(cleaned_string, SANITIZE_TRAILING_UNDERSCORES, "");
cleaned_string = std::regex_replace(cleaned_string, SANITIZE_LEADING_UNDERSCORES, "");

return cleaned_string;
}

std::string PrometheusExporterUtils::MapToPrometheusName(
const std::string &name,
const std::string &unit,
prometheus_client::MetricType prometheus_type)
{
auto sanitized_name = SanitizeNames(name);
std::string prometheus_equivalent_unit = GetEquivalentPrometheusUnit(unit);

// Append prometheus unit if not null or empty.
if (!prometheus_equivalent_unit.empty() &&
sanitized_name.find(prometheus_equivalent_unit) == std::string::npos)
{
sanitized_name += "_" + prometheus_equivalent_unit;
}

// Special case - counter
if (prometheus_type == prometheus_client::MetricType::Counter &&
Contributor

Counters MUST end in _total, so we need to be less tolerant here. We can only skip the _total suffix if the name ends in _total. Since we need unit suffixes to come before type suffixes, one approach would be to trim _total suffixes from the name before appending the unit suffix, and finally append the final _total suffix.

Member Author

Thanks, will not add _total if the name already has it.
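
The reviewer's suggested ordering (trim a trailing `_total`, append the unit suffix, then append `_total`) can be sketched as below. This is an illustration of that suggestion only, not the PR's implementation: `EndsWith`, `CounterName`, and `RunExamples` are hypothetical helpers, the name is assumed to be already sanitized, the unit is assumed to be already expanded, and checking the unit with a suffix test rather than a substring search is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Returns true when `s` ends with `suffix`.
bool EndsWith(const std::string &s, const std::string &suffix)
{
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: builds a counter name so that the unit suffix comes
// before the type suffix and the result always ends in "_total".
std::string CounterName(std::string name, const std::string &unit)
{
  const std::string kTotal = "_total";
  if (EndsWith(name, kTotal))
  {
    name.erase(name.size() - kTotal.size());  // trim so the unit can go first
  }
  if (!unit.empty() && !EndsWith(name, "_" + unit))
  {
    name += "_" + unit;
  }
  return name + kTotal;
}

void RunExamples()
{
  assert(CounterName("http_requests", "") == "http_requests_total");
  assert(CounterName("io_total", "bytes") == "io_bytes_total");
  assert(CounterName("io_bytes_total", "bytes") == "io_bytes_total");
  assert(CounterName("subtotal", "") == "subtotal_total");
}
```

The last case shows the difference from a substring check on `"total"`: a name such as `subtotal` contains the word but does not end in `_total`, so the suffix is still appended.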

sanitized_name.find("total") == std::string::npos)
{
sanitized_name += "_total";
}

// Special case - gauge
if (unit == "1" && prometheus_type == prometheus_client::MetricType::Gauge &&
sanitized_name.find("ratio") == std::string::npos)
{
sanitized_name += "_ratio";
}

return CleanUpString(SanitizeNames(sanitized_name));
}

metric_sdk::AggregationType PrometheusExporterUtils::getAggregationType(
const metric_sdk::PointType &point_type)
{
2 changes: 1 addition & 1 deletion exporters/prometheus/test/collector_test.cc
@@ -76,7 +76,7 @@ TEST(PrometheusCollector, BasicTests)

// Collection size should be the same as the size
// of the records collection produced by MetricProducer.
- ASSERT_EQ(data.size(), 1);
+ ASSERT_EQ(data.size(), 2);
delete reader;
delete producer;
}