stats: minor review comments #3279

Merged: 12 commits, May 7, 2018
9 changes: 5 additions & 4 deletions source/common/stats/thread_local_store.cc
@@ -101,8 +101,9 @@ void ThreadLocalStoreImpl::mergeHistograms(PostMergeCb merge_complete_cb) {
   merge_in_progress_ = true;
   tls_->runOnAllThreads(
       [this]() -> void {
-        for (const auto& scopes : tls_->getTyped<TlsCache>().scope_cache_) {
-          for (const auto& name_histogram_pair : scopes.second.histograms_) {
+        for (const auto& scope : tls_->getTyped<TlsCache>().scope_cache_) {
+          const TlsCacheEntry& tls_cache_entry = scope.second;
+          for (const auto& name_histogram_pair : tls_cache_entry.histograms_) {
             const TlsHistogramSharedPtr& tls_hist = name_histogram_pair.second;
             tls_hist->beginMerge();
           }
@@ -135,7 +136,8 @@ void ThreadLocalStoreImpl::releaseScopeCrossThread(ScopeImpl* scope) {
   }
 }

-std::string ThreadLocalStoreImpl::getTagsForName(const std::string& name, std::vector<Tag>& tags) {
+std::string ThreadLocalStoreImpl::getTagsForName(const std::string& name,
+                                                 std::vector<Tag>& tags) const {
Contributor:

This whitespace change makes me think you are running the format-fixing script with the clang setup on your machine rather than via Docker. In particular, it re-wrapped the line to fit in 80 columns, whereas the Envoy standard is 100. You probably want to run the format fixer like this:

./ci/run_envoy_docker.sh './ci/do_ci.sh fix_format'

Contributor Author:

I think it is because of the const addition: the length went up to 106, which is why it wrapped.

Contributor:

Sorry, I stared at that for one second less than needed and missed the 'const' addition :)

   return tag_producer_->produceTags(name, tags);
 }

@@ -297,7 +299,6 @@ Histogram& ThreadLocalStoreImpl::ScopeImpl::tlsHistogram(const std::string& name
     return **tls_ref;
   }

-  std::unique_lock<std::mutex> lock(parent_.lock_);
Member:

I think parent_.tls_ must be non-null above. Can you remove that check? Also, this makes me wonder if we can get in here when we are shutting down. I'm guessing maybe? I kind of wonder if, during shutdown, we should not create a TLS histogram at all and instead return some type of null histogram to write into, to avoid potentially creating a bunch of TLS histograms that we never store anywhere. TBH, I think a similar problem exists for stats/gauges, but it could be much more pronounced with how the histogram code is now written, given that we always do this TLS fetch on every write.

Contributor Author (@ramaraochavali, May 4, 2018):

I have removed the parent_.tls_ nullptr check. Regarding creating the TLS histogram during shutdown, I will need to think that through more, so I am not doing it now.

Contributor Author:

Reverted this check, as the test case at https://github.com/envoyproxy/envoy/blob/master/test/common/stats/thread_local_store_test.cc#L208 fails. Do we need this test at all?

Contributor Author:

I think it is better to merge this PR if we are good with metrics service test changes since it removes the flakiness and can do another PR for this parent_tls_ null check and think about creating TLS histogram in shutdown case.

std::vector<Tag> tags;
std::string tag_extracted_name = parent_.getTagsForName(name, tags);
TlsHistogramSharedPtr hist_tls_ptr = std::make_shared<ThreadLocalHistogramImpl>(
Expand Down
2 changes: 1 addition & 1 deletion source/common/stats/thread_local_store.h
@@ -248,7 +248,7 @@ class ThreadLocalStoreImpl : Logger::Loggable<Logger::Id::stats>, public StoreRo
     RawStatDataAllocator& free_;
   };

-  std::string getTagsForName(const std::string& name, std::vector<Tag>& tags);
+  std::string getTagsForName(const std::string& name, std::vector<Tag>& tags) const;
   void clearScopeFromCaches(uint64_t scope_id);
   void releaseScopeCrossThread(ScopeImpl* scope);
   SafeAllocData safeAlloc(const std::string& name);
@@ -56,40 +56,47 @@ class MetricsServiceIntegrationTest : public HttpIntegrationTest,
   }

   void waitForMetricsRequest() {
-    envoy::service::metrics::v2::StreamMetricsMessage request_msg;
-    metrics_service_request_->waitForGrpcMessage(*dispatcher_, request_msg);
-    EXPECT_STREQ("POST", metrics_service_request_->headers().Method()->value().c_str());
-    EXPECT_STREQ("/envoy.service.metrics.v2.MetricsService/StreamMetrics",
-                 metrics_service_request_->headers().Path()->value().c_str());
-    EXPECT_STREQ("application/grpc",
-                 metrics_service_request_->headers().ContentType()->value().c_str());
-    EXPECT_TRUE(request_msg.envoy_metrics_size() > 0);
-    const Protobuf::RepeatedPtrField<::io::prometheus::client::MetricFamily>& envoy_metrics =
-        request_msg.envoy_metrics();
-    bool known_histogram_exists = false;
     bool known_counter_exists = false;
     bool known_gauge_exists = false;
+    bool known_histogram_exists = false;
-    for (::io::prometheus::client::MetricFamily metrics_family : envoy_metrics) {
-      if (metrics_family.name() == "cluster.cluster_0.membership_change" &&
-          metrics_family.type() == ::io::prometheus::client::MetricType::COUNTER) {
-        known_counter_exists = true;
-        EXPECT_EQ(1, metrics_family.metric(0).counter().value());
-      }
-      if (metrics_family.name() == "cluster.cluster_0.membership_total" &&
-          metrics_family.type() == ::io::prometheus::client::MetricType::GAUGE) {
-        known_gauge_exists = true;
-        EXPECT_EQ(1, metrics_family.metric(0).gauge().value());
-      }
-      if (metrics_family.name() == "cluster.cluster_0.upstream_rq_time" &&
-          metrics_family.type() == ::io::prometheus::client::MetricType::SUMMARY) {
-        known_histogram_exists = true;
-        Stats::HistogramStatisticsImpl empty_statistics;
-        EXPECT_EQ(metrics_family.metric(0).summary().quantile_size(),
-                  empty_statistics.supportedQuantiles().size());
-      }
-      ASSERT(metrics_family.metric(0).has_timestamp_ms());
-      if (known_counter_exists && known_gauge_exists && known_histogram_exists) {
-        break;
-      }
-    }
+    // Sometimes stats do not come in the first flush cycle, this loop ensures that we wait till
+    // required stats are flushed.
+    // TODO(ramaraochavali): Figure out a more robust way to find out all required stats have been
+    // flushed.
+    while (!(known_counter_exists && known_gauge_exists && known_histogram_exists)) {
+      envoy::service::metrics::v2::StreamMetricsMessage request_msg;
+      metrics_service_request_->waitForGrpcMessage(*dispatcher_, request_msg);
+      EXPECT_STREQ("POST", metrics_service_request_->headers().Method()->value().c_str());
+      EXPECT_STREQ("/envoy.service.metrics.v2.MetricsService/StreamMetrics",
+                   metrics_service_request_->headers().Path()->value().c_str());
+      EXPECT_STREQ("application/grpc",
+                   metrics_service_request_->headers().ContentType()->value().c_str());
+      EXPECT_TRUE(request_msg.envoy_metrics_size() > 0);
+      const Protobuf::RepeatedPtrField<::io::prometheus::client::MetricFamily>& envoy_metrics =
+          request_msg.envoy_metrics();
+
+      for (::io::prometheus::client::MetricFamily metrics_family : envoy_metrics) {
+        if (metrics_family.name() == "cluster.cluster_0.membership_change" &&
+            metrics_family.type() == ::io::prometheus::client::MetricType::COUNTER) {
+          known_counter_exists = true;
+          EXPECT_EQ(1, metrics_family.metric(0).counter().value());
+        }
+        if (metrics_family.name() == "cluster.cluster_0.membership_total" &&
+            metrics_family.type() == ::io::prometheus::client::MetricType::GAUGE) {
+          known_gauge_exists = true;
+          EXPECT_EQ(1, metrics_family.metric(0).gauge().value());
+        }
+        if (metrics_family.name() == "cluster.cluster_0.upstream_rq_time" &&
+            metrics_family.type() == ::io::prometheus::client::MetricType::SUMMARY) {
+          known_histogram_exists = true;
+          Stats::HistogramStatisticsImpl empty_statistics;
+          EXPECT_EQ(metrics_family.metric(0).summary().quantile_size(),
+                    empty_statistics.supportedQuantiles().size());
+        }
+        ASSERT(metrics_family.metric(0).has_timestamp_ms());
+        if (known_counter_exists && known_gauge_exists && known_histogram_exists) {
+          break;
+        }
+      }
+    }
     EXPECT_TRUE(known_counter_exists);