Add a guide to metrics for monitoring Teleport #46645
🤖 Vercel preview here: https://docs-hwwmn4qux-goteleport.vercel.app/docs/ver/preview
7f35bb1 to 7443fd8
🤖 Vercel preview here: https://docs-3vr59qbq9-goteleport.vercel.app/docs/ver/preview
The backend throughput metrics discussed in the previous section map on to
latency metrics. Whenever the Auth Service increments one of the throughput
metrics, it reports one of the corresponding latency metrics. See the table
below for which throughput metrics miap to which latency metrics. Each metric
- below for which throughput metrics miap to which latency metrics. Each metric
+ below for which throughput metrics map to which latency metrics. Each metric
Fixed in b6b8d4f
7443fd8 to b6b8d4f
b6b8d4f to 6a52832
🤖 Vercel preview here: https://docs-et5246prw-goteleport.vercel.app/docs/ver/preview
6a52832 to 6f602be
🤖 Vercel preview here: https://docs-civ0o1ngr-goteleport.vercel.app/docs/ver/preview
6f602be to 4d69a4f
🤖 Vercel preview here: https://docs-g60stgyxr-goteleport.vercel.app/docs/ver/preview
@evanfreed I've added new information based on your feedback. Checking to make sure it's accurate. Thanks!
Looks good, one spelling find
c888cf2 to dd0d4f9
Closes #40664

This change turns the Metrics guide in `admin-guides` into a conceptual guide to the most important metrics for monitoring a Teleport cluster.

Since Agent metrics have inconsistent comprehensiveness across Teleport services--and to reduce the scope of this change--this guide focuses on self-hosted clusters.

To make this a conceptual guide instead of a reference, this change removes the reference table from the `admin-guides` metrics page. There is already a table in the dedicated metrics reference guide.

Note that, while the new metrics guide is specific to self-hosted clusters, this change does not move the guide to the subsection of Admin Guides related to self-hosting Teleport. Doing this would mean having one subsection of Admin Guides for diagnostics-related guides and one subsection for self-hosted-specific diagnostics, which is potentially confusing. We may also want to add Agent-specific metrics eventually.

Finally, this change does not include alert thresholds for the metrics it describes. We can define these in a subsequent change.
Respond to evanfreed feedback

- Describe `backend_write_requests_failed_precondition_total`
- Include the precondition metric in the write availability formula.
- Turn the `registered_servers` discussion into a discussion of Teleport instance version, since it's not possible to group this metric by service and subtract the count of Auth Service/Proxy Service instances from the count of all registered services.
dd0d4f9 to da46d40
🤖 Vercel preview here: https://docs-qjwtm0d7k-goteleport.vercel.app/docs/ver/preview
🤖 Vercel preview here: https://docs-jj68b6zag-goteleport.vercel.app/docs/ver/preview