Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: coredump when table name contains '_' and prometheus is enabled #828

Merged
merged 5 commits into from
Oct 19, 2021

Conversation

padmejin
Copy link
Collaborator

What problem does this PR solve?

After we enable prometheus(set "perf_counter_sink" to "prometheus"), pegasus ended up core dumped after restart.
We debuged the core, found out it was cored while analysing table name that contains "_".
This PR fixes the bug by changing the order of formating table names.

What is changed and how does it work?

Prometheus does not support some special charactor in metric name.
This original code first formated the table name by replacing special charactor to "" or "@".
Then, it try to parse gpid as prometheus labels by spliting the foramted name by ":" or "
".
Unfortunately, tables that already contains "_" may results in unexpected group result, which eventually ended up core.

Checklist

Tests
  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code
Code changes
  • Has exported function/method change
  • Has exported variable/fields change
  • Has interface methods change
  • Has persistent data change
Side effects
  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility
Related changes
  • Need to cherry-pick to the release branch
  • Need to update the documentation
  • Need to be included in the release note

@levy5307 levy5307 changed the title fix coredump when table name contains '_' and prometheus is enabled fix: coredump when table name contains '_' and prometheus is enabled Oct 18, 2021
@padmejin
Copy link
Collaborator Author

As this picture shows below, when core dumps it was trying to analyze metric naming replica_econ_replica_table_level_RPC_RRDB_RRDB_REMOVE_latency_ns_:impala_historical_profile_p999, where impala_historical_profile is a table we created earlier that contains multiple underscore.

wecom-temp-979f68e2dd409e3c69aadc62fab16057

As the code shown below on line 256, it tries to split metrics_name by : first, then split the last part by _ and put the splited tokens in an array that length is 3. Because this table already contains underscore, there are more than 3 tokens, so it got core dumped after at line 259 trying to set the fourth element of the array.
wecom-temp-b491dd1192d580d28b13da889d23196e

This patch fix it by switching these format order:
Firstly, it try to parse gpid as prometheus labels by spliting the foramted name by : or @.
Then, it formated the table name by replacing special charactor to _ or "@".

In this order tables that contains _ can be parsed expectedly in the first step.

levy5307
levy5307 previously approved these changes Oct 19, 2021
src/reporter/pegasus_counter_reporter.cpp Outdated Show resolved Hide resolved
src/reporter/pegasus_counter_reporter.cpp Outdated Show resolved Hide resolved
src/reporter/pegasus_counter_reporter.cpp Outdated Show resolved Hide resolved
src/reporter/pegasus_counter_reporter.cpp Outdated Show resolved Hide resolved
Copy link
Member

@acelyc111 acelyc111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for your contribution @padme-jin .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants