Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add time and memory related db operation metrics #1203

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 22 additions & 0 deletions .chloggen/add-db-operation-time-and-memory-metrics.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# Use this changelog template to create an entry for release notes.
#
# If your change doesn't affect end users you should instead start
# your pull request title with [chore] or use the "Skip Changelog" label.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db)
component: db

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add databse operation metrics related to memory and time spent
dnaik-sfx marked this conversation as resolved.
Show resolved Hide resolved

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
# The values here must be integers.
issues: [ 723 ]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
1 change: 1 addition & 0 deletions docs/database/database-metrics.md
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,7 @@ of `[ 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10 ]`.
| [`db.operation.name`](/docs/attributes-registry/db.md) | string | The name of the operation or command being executed. [5] | `findAndModify`; `HMSET`; `SELECT` | `Conditionally Required` [6] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [7] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [8] | `80`; `8080`; `443` | `Conditionally Required` [9] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`thread.id`](/docs/attributes-registry/thread.md) | int | Current "managed" thread ID (as opposed to OS thread ID). | `42` | `Conditionally Required` If available | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`network.peer.address`](/docs/attributes-registry/network.md) | string | Peer address of the database node where the operation was performed. [10] | `10.1.2.80`; `/tmp/my.sock` | `Recommended` If applicable for this database system. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`network.peer.port`](/docs/attributes-registry/network.md) | int | Peer port number of the network connection. | `65123` | `Recommended` If and only if `network.peer.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`server.address`](/docs/attributes-registry/server.md) | string | Name of the database host. [11] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
Expand Down
148 changes: 148 additions & 0 deletions model/metrics/database-metrics.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,154 @@ groups:
- ref: db.namespace
requirement_level:
conditionally_required: If available.
- ref: thread.id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious about the addition of the thread.id on all of these. Do you have a use case in mind with this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My colleague mentioned the rationale in his comment here

In short, for the case of single-threaded operations, the thread id could be useful for troubleshooting slow/blocked queries, and it might also be helpful to aggregate stats on a certain thread for diagnosing performance issues. But we're open to discussion on the attribute and dropping it from the spec if the consensus is that these use cases are too niche and/or not really practical.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does thread id add to the cardinality in an uncontrolled way here?

requirement_level:
conditionally_required: If available

- id: metric.db.client.operation.wait_time
type: metric
metric_name: db.client.operation.wait_time
brief: "Time spent waiting by database client operations, by category (if available)."
instrument: histogram
unit: "s"
stability: experimental
extends: attributes.db.client.minimal
attributes:
- ref: db.collection.name
requirement_level:
conditionally_required: >
If readily available. The collection name MAY be parsed from the query text,
in which case it SHOULD be the first collection name in the query.
- ref: db.system
# TODO: Not adding to the minimal because of https://github.com/open-telemetry/build-tools/issues/192
requirement_level: required
- ref: network.peer.address
brief: Peer address of the database node where the operation was performed.
requirement_level:
recommended: If applicable for this database system.
note: >
Semantic conventions for individual database systems SHOULD document whether `network.peer.*` attributes are applicable.
Network peer address and port are useful when the application interacts with individual database nodes directly.

If a database operation involved multiple network calls (for example retries), the address of the last contacted node SHOULD be used.
- ref: network.peer.port
requirement_level:
recommended: If and only if `network.peer.address` is set.
- ref: db.namespace
requirement_level:
conditionally_required: If available.
- ref: thread.id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe say "if available and stable for the lifetime of the operation".

IF using, e.g. async IO and the handler of the response is different than the calling thread - you don't want to create new timeseries because thread.id changes. Can we make that more explicit?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, done

requirement_level:
conditionally_required: If available

- id: metric.db.client.operation.cpu_time
type: metric
metric_name: db.client.operation.cpu_time
brief: "CPU time spent for database client operations."
instrument: histogram
unit: "s"
stability: experimental
extends: attributes.db.client.minimal
attributes:
- ref: db.collection.name
requirement_level:
conditionally_required: >
If readily available. The collection name MAY be parsed from the query text,
in which case it SHOULD be the first collection name in the query.
- ref: db.system
# TODO: Not adding to the minimal because of https://github.com/open-telemetry/build-tools/issues/192
requirement_level: required
- ref: network.peer.address
brief: Peer address of the database node where the operation was performed.
requirement_level:
recommended: If applicable for this database system.
note: >
Semantic conventions for individual database systems SHOULD document whether `network.peer.*` attributes are applicable.
Network peer address and port are useful when the application interacts with individual database nodes directly.

If a database operation involved multiple network calls (for example retries), the address of the last contacted node SHOULD be used.
- ref: network.peer.port
requirement_level:
recommended: If and only if `network.peer.address` is set.
- ref: db.namespace
requirement_level:
conditionally_required: If available.
- ref: thread.id
requirement_level:
conditionally_required: If available

- id: metric.db.client.operation.pages_scanned
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since not all databases use pages, I don't think it makes sense to have this on the "general" view of the database. If there are things that only make sense for a specific group, is worth having those split from here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely get the reasoning there and don't necessarily disagree, but if the concept of "pages" (or another example) is shared by multiple databases, either now or in the future, would it make more sense to have a common metric name for them or multiple db-specific names? And could it be more confusing to have some metrics have common names while others would fall under db-specific namespaces? Happy to move these metrics elsewhere if these topics have already been discussed/answered though.

type: metric
metric_name: db.client.operation.pages_scanned
brief: "Pages scanned for database client operations."
instrument: histogram
unit: "{pages}"
stability: experimental
extends: attributes.db.client.minimal
attributes:
- ref: db.collection.name
requirement_level:
conditionally_required: >
If readily available. The collection name MAY be parsed from the query text,
in which case it SHOULD be the first collection name in the query.
- ref: db.system
# TODO: Not adding to the minimal because of https://github.com/open-telemetry/build-tools/issues/192
requirement_level: required
- ref: network.peer.address
brief: Peer address of the database node where the operation was performed.
requirement_level:
recommended: If applicable for this database system.
note: >
Semantic conventions for individual database systems SHOULD document whether `network.peer.*` attributes are applicable.
Network peer address and port are useful when the application interacts with individual database nodes directly.

If a database operation involved multiple network calls (for example retries), the address of the last contacted node SHOULD be used.
- ref: network.peer.port
requirement_level:
recommended: If and only if `network.peer.address` is set.
- ref: db.namespace
requirement_level:
conditionally_required: If available.
- ref: thread.id
requirement_level:
conditionally_required: If available

- id: metric.db.client.operation.used_memory
type: metric
metric_name: db.client.operation.used_memory
brief: "Memory used for database client operations."
instrument: histogram
unit: "By"
stability: experimental
extends: attributes.db.client.minimal
attributes:
- ref: db.collection.name
requirement_level:
conditionally_required: >
If readily available. The collection name MAY be parsed from the query text,
in which case it SHOULD be the first collection name in the query.
- ref: db.system
# TODO: Not adding to the minimal because of https://github.com/open-telemetry/build-tools/issues/192
requirement_level: required
- ref: network.peer.address
brief: Peer address of the database node where the operation was performed.
requirement_level:
recommended: If applicable for this database system.
note: >
Semantic conventions for individual database systems SHOULD document whether `network.peer.*` attributes are applicable.
Network peer address and port are useful when the application interacts with individual database nodes directly.

If a database operation involved multiple network calls (for example retries), the address of the last contacted node SHOULD be used.
- ref: network.peer.port
requirement_level:
recommended: If and only if `network.peer.address` is set.
- ref: db.namespace
requirement_level:
conditionally_required: If available.
- ref: thread.id
requirement_level:
conditionally_required: If available

- id: metric.db.client.connection.count
type: metric
metric_name: db.client.connection.count
Expand Down
Loading