Some worker metrics are not correct #16229

Open
dbw9580 opened this issue Sep 23, 2022 · 7 comments
Labels
stale The PR/Issue does not have recent activities and will be closed automatically type-bug This issue is about a bug

Comments

@dbw9580
Contributor

dbw9580 commented Sep 23, 2022

Alluxio Version:
2.8

Describe the bug

  1. Worker.ActiveClients reports more clients than there actually are.
  2. Cache hit/miss ratios are wrong, with reports showing nonsense values like 64400% (Cache Miss Percentage and Cache Hit Remote Percentage are wrong #16945)
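For reference, a percentage derived from hit and miss counts should always fall between 0 and 100, so a value like 64400% suggests that the numerator and denominator are not drawn from the same pair of counters. The following is only a minimal sketch of that invariant; `CacheRatio` and `hitPercentage` are hypothetical names, not Alluxio code, and this is not necessarily how the affected metrics are computed.

```java
/** Hypothetical helper illustrating a bounded hit percentage; not Alluxio code. */
final class CacheRatio {
  private CacheRatio() {}

  /**
   * Returns the share of hits among all accesses, as a value in [0, 100].
   * Both counts must describe the same set of accesses.
   */
  static double hitPercentage(long hits, long misses) {
    if (hits < 0 || misses < 0) {
      throw new IllegalArgumentException("counts must be non-negative");
    }
    long total = hits + misses;
    if (total == 0) {
      // No accesses yet: report 0 instead of dividing by zero.
      return 0.0;
    }
    return 100.0 * hits / total;
  }
}
```

Because the denominator is the sum of both counts, the result cannot exceed 100 as long as the inputs are non-negative.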

To Reproduce
Perform I/O through Alluxio worker, and observe worker metrics.

Expected behavior
Worker.ActiveClients should reflect the number of clients that are currently reading or writing data on the worker.

Urgency
Normal

Are you planning to fix it
yes


@dbw9580 dbw9580 added the type-bug This issue is about a bug label Sep 23, 2022
@dbw9580
Contributor Author

dbw9580 commented Sep 23, 2022

The metric is incremented twice when a block reader is created:

DefaultBlockWorker.Metrics.WORKER_ACTIVE_CLIENTS.inc();
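One way to keep a gauge like this balanced is to tie each increment to a handle that decrements exactly once when closed. The sketch below is an illustration under that assumption, not the Alluxio implementation: `ActiveClientGauge` and `Handle` are hypothetical names standing in for the real counter.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a gauge such as Worker.ActiveClients; not Alluxio code. */
final class ActiveClientGauge {
  private final AtomicLong mCount = new AtomicLong();

  /** Registers one client and returns a handle that unregisters it exactly once. */
  Handle acquire() {
    mCount.incrementAndGet();
    return new Handle();
  }

  /** Returns the current number of registered clients. */
  long get() {
    return mCount.get();
  }

  /** Pairs a single increment with at most one decrement. */
  final class Handle implements AutoCloseable {
    private final AtomicBoolean mClosed = new AtomicBoolean(false);

    @Override
    public void close() {
      // compareAndSet makes repeated close() calls harmless.
      if (mClosed.compareAndSet(false, true)) {
        mCount.decrementAndGet();
      }
    }
  }
}
```

With this shape, a reader would call `acquire()` once when it is created and close the handle when it is released, for example in a try-with-resources block, so a second increment on the same code path has no place to hide.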

@jiacheliu3 jiacheliu3 changed the title Metric Worker.ActiveClients is not correct Some worker metris are not correct Nov 4, 2022
@jiacheliu3
Contributor

Two other metrics that are potentially relevant:

    "Cluster.ActiveRpcReadCount" : {
      "count" : -1
    },

    "Cluster.ActiveRpcWriteCount" : {
      "count" : -559
    },
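A count of active RPCs should never be negative, so values like these point to a decrement without a matching increment somewhere. As a small diagnostic aid, a check along the following lines could flag such counters in a metrics snapshot. This is a sketch only: `MetricSanityCheck` and `negativeCounters` are hypothetical names, and the snapshot is assumed to have already been parsed into a map of metric name to count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical diagnostic helper; not part of Alluxio. */
final class MetricSanityCheck {
  private MetricSanityCheck() {}

  /** Returns the names of all counters with a negative value, sorted by name. */
  static List<String> negativeCounters(Map<String, Long> counts) {
    List<String> negative = new ArrayList<>();
    // Copy into a TreeMap so the result order does not depend on the input map.
    for (Map.Entry<String, Long> entry : new TreeMap<>(counts).entrySet()) {
      if (entry.getValue() < 0) {
        negative.add(entry.getKey());
      }
    }
    return negative;
  }
}
```

Applied to the two values above, the check would report both `Cluster.ActiveRpcReadCount` and `Cluster.ActiveRpcWriteCount`.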

@secfree
Contributor

secfree commented Nov 8, 2022

@dbw9580 May I know if there is an ETA for the PR to resolve this issue? Our clusters report very large values of "Worker.ActiveClients", and we have already backported #14088.

@dbw9580
Contributor Author

dbw9580 commented Nov 8, 2022

@secfree I'm not actively working on this, so feel free to come up with a PR to fix any metric you find incorrect.

For Worker.ActiveClients specifically, I did a brief investigation here, but I suspect there are more subtle places where this metric is not updated correctly.

@secfree
Contributor

secfree commented Nov 8, 2022

@dbw9580 Thanks for the information.

@github-actions

github-actions bot commented Feb 5, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The PR/Issue does not have recent activities and will be closed automatically label Feb 5, 2023
@dbw9580 dbw9580 removed the stale The PR/Issue does not have recent activities and will be closed automatically label Feb 6, 2023
@dbw9580 dbw9580 changed the title Some worker metris are not correct Some worker metrics are not correct Feb 27, 2023
@github-actions

github-actions bot commented Apr 6, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The PR/Issue does not have recent activities and will be closed automatically label Apr 6, 2023

3 participants