Add more metrics about heartbeat #2800

Closed
nolouch opened this issue Aug 19, 2020 · 7 comments
Labels

  • component/metrics: Metrics.
  • good first issue: Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • type/enhancement: The issue or PR belongs to an enhancement.

Comments

@nolouch
Contributor

nolouch commented Aug 19, 2020

Feature Request

Describe your feature request related problem

We only have metrics that show the ops for cache updates and KV updates, but we cannot see the overall heartbeat handling duration or the duration of each update step. In a large cluster with about 2 million regions, we hit problems like #2783, and the only way to investigate is to profile and compare flame graphs.

Describe the feature you'd like

  • Add a histogram about handle region heartbeat
  • Add a histogram about handle store heartbeat
  • Add a histogram about the update region cache.
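A rough illustration of what timing a handler into a histogram could look like. This is only a sketch: `durationHistogram`, `timed`, and the bucket bounds are hypothetical names and values chosen for the example, and the standard library stands in for the metrics client so the snippet is self-contained. It is not PD code.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// durationHistogram is a hypothetical, stdlib-only stand-in for a metrics
// histogram. counts[i] holds observations <= bounds[i]; the last slot holds
// everything larger than the final bound.
type durationHistogram struct {
	bounds []float64 // upper bounds in seconds, ascending
	counts []uint64  // len(bounds)+1 entries
}

func newDurationHistogram(bounds []float64) *durationHistogram {
	return &durationHistogram{bounds: bounds, counts: make([]uint64, len(bounds)+1)}
}

// Observe records one duration (in seconds) in the first bucket that fits.
func (h *durationHistogram) Observe(seconds float64) {
	i := 0
	for i < len(h.bounds) && seconds > h.bounds[i] {
		i++
	}
	atomic.AddUint64(&h.counts[i], 1)
}

// Count returns the number of observations recorded in bucket i.
func (h *durationHistogram) Count(i int) uint64 {
	return atomic.LoadUint64(&h.counts[i])
}

// timed runs fn and records its wall-clock duration in h.
func timed(h *durationHistogram, fn func()) {
	start := time.Now()
	defer func() { h.Observe(time.Since(start).Seconds()) }()
	fn()
}

func main() {
	// Bucket bounds are illustrative assumptions, not tuned values.
	regionHeartbeat := newDurationHistogram([]float64{0.0005, 0.001, 0.002, 0.004})
	timed(regionHeartbeat, func() {
		// Placeholder for handling one region heartbeat.
	})
	for i := range regionHeartbeat.counts {
		fmt.Println("bucket", i, "count", regionHeartbeat.Count(i))
	}
}
```

The same wrapper could be reused with one histogram each for the store heartbeat and the region cache update.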

Describe alternatives you've considered

We should consider the performance cost: these metrics sit on the hot path, especially for the region heartbeat.

Teachability, Documentation, Adoption, Migration Strategy

  • Go
@nolouch added the type/enhancement, component/metrics, good first issue, and help wanted labels Aug 19, 2020
@lizhemingi
Contributor

I would like to give it a try if you don't mind.

@nolouch
Contributor Author

nolouch commented Aug 24, 2020

@duduainankai Thanks, please go ahead and give it a try.

@lizhemingi
Contributor

@nolouch Would you mind giving me some tips on where to start? I am quite new to PD.

@nolouch
Contributor Author

nolouch commented Aug 24, 2020

You can start by reading the function RegionHeartbeat in server/grpc_server.go.

@nolouch
Contributor Author

nolouch commented Aug 26, 2020

Hi, @duduainankai. Have you had a chance to try it?

@lizhemingi
Contributor

@nolouch

Yep, I was trying. But I have some questions.

  • As I understand it, I should add a metric that records the time cost, for example in HandleRegionHeartbeat (and maybe also add a panel in Grafana?). Correct me if I am wrong.

  • Is there an easy way for me to test this locally (using tiup, perhaps)?

PS. Sorry for the slow response.

@lizhemingi
Contributor

lizhemingi commented Aug 28, 2020

In the function RegionHeartbeat, when rc.HandleRegionHeartbeat returns an error, sendErr increments "err" in regionHeartbeatCounter.

But the for loop in RegionHeartbeat then also increments "ok" for the same heartbeat.

I think this is a bug?
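To make the suspected double counting concrete, here is a minimal stand-alone sketch. It is not the PD source: `counters`, `handle`, `processBuggy`, and `processFixed` are hypothetical names, and only the "ok" and "err" labels come from the comment above. It contrasts a loop that falls through after an error with one that skips the "ok" increment.

```go
package main

import (
	"errors"
	"fmt"
)

// counters stands in for a labelled counter such as regionHeartbeatCounter.
type counters map[string]int

// handle is a hypothetical handler that rejects negative region IDs.
func handle(id int) error {
	if id < 0 {
		return errors.New("invalid region")
	}
	return nil
}

// processBuggy mirrors the behaviour described above: a failed heartbeat
// increments "err" and then falls through to increment "ok" as well.
func processBuggy(ids []int) counters {
	c := counters{}
	for _, id := range ids {
		if err := handle(id); err != nil {
			c["err"]++
		}
		c["ok"]++
	}
	return c
}

// processFixed skips the "ok" increment once an error has been recorded.
func processFixed(ids []int) counters {
	c := counters{}
	for _, id := range ids {
		if err := handle(id); err != nil {
			c["err"]++
			continue
		}
		c["ok"]++
	}
	return c
}

func main() {
	ids := []int{1, -1, 2}
	fmt.Println(processBuggy(ids)) // map[err:1 ok:3]
	fmt.Println(processFixed(ids)) // map[err:1 ok:2]
}
```

Whether the real loop needs a `continue` or a different restructuring depends on the actual control flow in RegionHeartbeat, which this sketch does not reproduce.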
