This repository has been archived by the owner on Nov 2, 2021. It is now read-only.
Running dcgm-exporter 2.1.8, connecting to nv-hostengine via `DCGM_REMOTE_HOSTENGINE_INFO=localhost:5555`.
If nv-hostengine is restarted, dcgm-exporter starts logging the message below every 30 seconds. Meanwhile, it keeps serving the old metrics from the last collection before the restart, and the `/health` endpoint reports that everything is fine.
time="2021-05-12T17:31:12Z" level=error msg="Failed to collect metrics with error: Failed to collect metrics with error: Error getting the latest value for fields: Host engine connection invalid/disconnected"
time="2021-05-12T17:31:42Z" level=error msg="Failed to collect metrics with error: Failed to collect metrics with error: Error getting the latest value for fields: Host engine connection invalid/disconnected"
dcgm-exporter should either crash hard in response to this error, or re-connect to nv-hostengine. It should not continue to report stale metrics.