Troubleshooting
SNMPCollector writes a complete set of log files that show what happens while it gathers SNMP data from the infrastructure. All logs are located in the same directory, LOG_DIR.
The default LOG_DIR is /var/log/snmpcollector when snmpcollector has been installed from the Debian or Red Hat based packages, and /opt/snmpcollector/log in Docker.
It can always be set with the -log option passed to the snmpcollector binary.
If installed from the Debian/Red Hat packages, you can also tune this parameter in these files:
- rpm /etc/sysconfig/snmpcollector
- deb /etc/default/snmpcollector
$LOG_DIR/snmpcollector.log
Shows the basic initialisation process and the results of runtime administration from the web UI.
- Default Level: set in the main config.toml file under the general section
- Supported Levels: panic,fatal,error,warn,info,debug
- Can be changed online?: no
$LOG_DIR/http_access.log
Shows every HTTP access request with its result and response time. This log has no level support.
$LOG_DIR/<device_id>.log
This is the main log to check when you have problems with only a set of devices and only under certain conditions.
- Default Level: set in the device configuration section of the configuration database
- Supported Levels: panic,fatal,error,warn,info,debug
- Can be changed online?: yes, in the runtime web UI
$LOG_DIR/snmpdebug_<device_id>_<measurement_id>.log
This log is disabled by default and can be enabled online in the web UI. When establishing SNMP links with remote devices, snmpcollector opens one link per measurement. When the snmpdebug log is enabled, each measurement on the device creates a new file with SNMP protocol related debug output. This output helps to review connection or SNMP protocol related problems.
$LOG_DIR/sql.log
This log is disabled by default.
- Default Level: set in the main config.toml file under the general section
- Supported Levels: on/off (debug = true / debug = false)
- Can be changed online?: no
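As a quick reference, the file naming scheme described above can be sketched in Go. The helper name `expectedLogFiles` and the sample device and measurement IDs are hypothetical and not part of SNMPCollector; only the file name patterns come from this page.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// expectedLogFiles is a hypothetical helper (not part of SNMPCollector).
// It lists the log files described on this page for one device, using the
// documented naming patterns under logDir.
func expectedLogFiles(logDir, deviceID string, measurementIDs []string) []string {
	files := []string{
		filepath.Join(logDir, "snmpcollector.log"),
		filepath.Join(logDir, "http_access.log"),
		filepath.Join(logDir, "sql.log"),
		filepath.Join(logDir, deviceID+".log"),
	}
	// One snmpdebug file per measurement, only present when snmpdebug is enabled.
	for _, m := range measurementIDs {
		name := fmt.Sprintf("snmpdebug_%s_%s.log", deviceID, m)
		files = append(files, filepath.Join(logDir, name))
	}
	return files
}

func main() {
	// "router01" and "ifstats" are made-up IDs used only for illustration.
	for _, f := range expectedLogFiles("/var/log/snmpcollector", "router01", []string{"ifstats"}) {
		fmt.Println(f)
	}
}
```

Some of these files may not exist on a given installation, since sql.log and the snmpdebug logs are disabled by default.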
When snmpcollector has self-monitoring active, it can send data about itself to the "default" backend (you need both selfmon active and one influx backend configured with id = "default").
You can activate it in the [selfmon] section of the main config file, config.toml.
[selfmon]
#enable true/false enable/disable self monitoring
enabled = true
#send data Frequency
freq = 60
#prefix for measurement naming
prefix = ""
#inherit device tags (only apply to the selfmon_device_stats measurements)
inheritdevicetags = true
#adds extra tags to the measurement config should be set as a csv - tag=value1,tag2=value2,...,tagN=valN
extratags = [ "instance=snmpcollector01" ]
When active, it sends the self-monitoring measurements described on this page.
The following dashboards show the internal statistics of SNMPCollector, so the user can check the status of the platform.
Dashboard | Description | Required version |
---|---|---|
snmpcollector_platform_instance | Overview metrics showing the status of the SNMPCollector instance | |
snmpcollector_platform_device | Detailed device view showing the device stats on a SNMPCollector instance | |
snmpcollector_platform_measurement | Detailed measurement view showing the device stats on a SNMPCollector instance | |
These are the defined measurements. A prefix can be added to their names in config.toml if needed.
measurement | description |
---|---|
selfmon_gvm | statistics about the Go Virtual Machine (the Go runtime) |
selfmon_device_stats | statistics from each gathering device |
selfmon_outdb_stats | statistics for each output db |
FieldName | Source | Unit | Description |
---|---|---|---|
runtime_goroutines | runtime.NumGoroutine() | number | Number of currently running goroutines |
mem.alloc | runtime.ReadMemStats.Alloc | bytes | Bytes of allocated heap objects (same value as HeapAlloc) |
mem.mallocs | runtime.ReadMemStats.Mallocs | mallocs per second | Number of Mallocs issued to the system |
mem.frees | runtime.ReadMemStats.Frees | frees per second | Number of frees issued to the system |
mem.heapAlloc | runtime.ReadMemStats.HeapAlloc | bytes | Bytes of allocated heap objects |
mem.stackInuse | runtime.ReadMemStats.StackInuse | bytes | Bytes in stack spans. In-use stack spans have at least one stack in them. These spans can only be used for other stacks of the same size. There is no StackIdle because unused stack spans are returned to the heap (and hence counted toward HeapIdle). |
gc.total_pause_ns | memStats.PauseTotalNs | ms | Accumulated GC pause time in ms |
gc.pause_per_interval | memStats.PauseTotalNs | ms/interval | GC pause time in ms accumulated since the last gathered statistic |
gc.pause_per_second | memStats.PauseTotalNs | ms/second | GC pause time in ms per second (normalized) |
gc.gc_per_interval | memStats.NumGC | #gc/interval | Number of GCs since the last gathered statistic |
gc.gc_per_second | memStats.NumGC | #gc/second | Number of GCs per second (normalized) |
Since 0.12.0, statistics are taken for each measurement. A field can apply only to the measurement (M), or also to the device with some kind of consolidation over the device period (M/D).
FieldName | Type | Apply on (M/D) | Description in Measurement Context | Description in Device Context |
---|---|---|---|---|
active_value | integer | M/D | 0 not active / 1 active, where activation depends on the device config | active for configuration |
connected_value | integer | M/D | 0 not connected / 1 connected | 0 (not connected) if all measurements in the last gather period appear as not connected |
snmp_oid_get_all | integer | M/D | All gathered snmp metrics (sum of snmpget oid's and all received oid's in snmpwalk queries) in this measurement | sum of gathered snmp metrics for all measurements in the device period |
snmp_oid_get_processed | integer | M/D | Gathered and processed snmp metrics after filters are applied (not always sent to the backend, it depends on the report flag) in this measurement | sum of processed metrics for all measurements in the device period |
snmp_oid_get_errors | integer | M/D | number of oid's with errors for this measurement | sum of oid's with errors for all measurements in the device period |
cycle_gather_start_time | integer | M/D | Last gathered time in unix timestamp | minimum timestamp for all gathered measurements in the device period |
cycle_gather_duration | float | M/D | elapsed time taken to get this measurement data ( in seconds) | maximum elapsed time for all measurements finished in the device period |
filter_start_time | integer | M/D | Last Applied Filter time in unix timestamp | minimum timestamp for all filtered measurements done in the device period |
filter_duration | float | M/D | elapsed time taken to compute all applicable filters on the measurement in seconds | Sum of elapsed time for all filters applied on all measurements in the device period |
backend_sent_start_time | integer | M/D | Last sent time to the internal output buffer ( as UNIX TIMESTAMP) | minimum timestamp for all sent done in the device period |
backend_sent_duration | float | M/D | elapsed time taken to send data to the internal output buffer backend (in seconds) | Sum of the send durations for all measurements in the device period |
metric_sent | integer | M/D | number of metrics sent (taken as fields) for the measurement | Sum of metrics for each measurement in device period |
metric_sent_errors | integer | M/D | number of metrics (taken as fields) with errors for all measurements | Sum of metrics with errors for each measurement in device period |
measurement_sent | integer | M/D | number of series built to send as a single request to the backend | Sum of series for all measurements in the device period |
measurement_sent_errors | integer | M/D | number of series with errors for this measurement | Sum of series with errors in the device period |
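The device column of the table describes a consolidation of the per-measurement values: some fields are summed, some take the minimum or maximum, and connected_value is 0 only when every measurement is disconnected. A minimal Go sketch of those rules follows. The types `measStats` and `deviceStats` and the function `consolidate` are hypothetical names covering only a subset of the fields; they illustrate the rules in the table and are not the collector's implementation.

```go
package main

import "fmt"

// measStats is a hypothetical struct with a subset of the per-measurement fields.
type measStats struct {
	Connected           bool
	SnmpOidGetAll       int64
	SnmpOidGetErrors    int64
	CycleGatherStart    int64   // unix timestamp
	CycleGatherDuration float64 // seconds
	FilterDuration      float64 // seconds
}

// deviceStats is the hypothetical device-level consolidation of measStats.
type deviceStats struct {
	ConnectedValue      int64
	SnmpOidGetAll       int64
	SnmpOidGetErrors    int64
	CycleGatherStart    int64
	CycleGatherDuration float64
	FilterDuration      float64
}

// consolidate applies the device-context rules from the table above:
// counters and filter durations are summed, the gather start time is the
// minimum, the gather duration is the maximum, and the device counts as
// connected if at least one measurement is connected.
func consolidate(ms []measStats) deviceStats {
	var d deviceStats
	for i, m := range ms {
		if m.Connected {
			d.ConnectedValue = 1
		}
		d.SnmpOidGetAll += m.SnmpOidGetAll
		d.SnmpOidGetErrors += m.SnmpOidGetErrors
		if i == 0 || m.CycleGatherStart < d.CycleGatherStart {
			d.CycleGatherStart = m.CycleGatherStart
		}
		if m.CycleGatherDuration > d.CycleGatherDuration {
			d.CycleGatherDuration = m.CycleGatherDuration
		}
		d.FilterDuration += m.FilterDuration
	}
	return d
}

func main() {
	// Sample values are made up for illustration.
	d := consolidate([]measStats{
		{Connected: false, SnmpOidGetAll: 10, SnmpOidGetErrors: 1, CycleGatherStart: 200, CycleGatherDuration: 1.5, FilterDuration: 0.5},
		{Connected: true, SnmpOidGetAll: 5, CycleGatherStart: 100, CycleGatherDuration: 2.5, FilterDuration: 0.25},
	})
	fmt.Printf("%+v\n", d)
}
```

Because the device-level gather duration is a maximum while the filter duration is a sum, the two values are not directly comparable when a device has many measurements.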
TagName | Description |
---|---|
active | "true" (1 in the active_value field) or "false" (0 in the active_value field), as a tag for fast filtering |
connected | "true" (1 in the connected_value field) or "false" (0 in the connected_value field), as a tag for fast filtering |
device | device name where this statistic will apply |
meas_name | the name of the measurement where this statistic will apply (only for measurement) |
type | either measurement (applied in the measurement context) or device (applied in the device context as a consolidation of the measurement stats) |
NOTE: each statistic can carry more tags from the extratags setting in the config file, and from the device itself if inheritdevicetags = true.
FieldName | Description |
---|---|
snmp_oid_get_all | All Gathered snmp metrics ( sum of snmpget oid's and all received oid's in snmpwalk queries) |
snmp_oid_get_processed | Gathered and processed snmp metrics after filters are applied ( not always sent to the backend it depends on the report flag) |
snmp_oid_get_errors | number of oid's with errors for all measurements |
cycle_gather_start_time | Last gathered time in unix timestamp |
cycle_gather_duration | elapsed time taken to get all measurement info in seconds |
filter_start_time | Last Applied Filter time in unix timestamp |
filter_duration | elapsed time taken to compute all applicable filters on the device in seconds |
backend_sent_start_time | Last sent time to the internal output buffer |
backend_sent_duration | elapsed time taken to send data to the internal output buffer backend |
metric_sent | number of metrics sent (taken as fields) for all measurements |
metric_sent_errors | number of metrics (taken as fields) with errors for all measurements |
measurement_sent | number of series built to send as a single request to the backend |
measurement_sent_errors | number of series built to send as a single request that had errors, for all measurements |
field | description |
---|---|
write_sent | number of HTTP writes sent to the DB (each write sends a batchPoint) on the last period |
write_error | number of HTTP write errors on the period |
points_sent | number of Points sent on each Write (on each BatchPoint) on the last period |
points_sent_max | max number of points sent on all writes on the last period |
points_sent_avg | (only if write_sent > 0) averaged points sent for all writes on the last period |
write_time | sum of all HTTP response times on all writes on the last period |
write_time_max | max HTTP response time in all writes on the last period |
write_time_avg | (only if write_sent > 0) average response time for all writes on the last period |
fields_sent | number of fields sent to the DB on the last period |
fields_sent_max | max number of fields sent to the DB on the last period |
buffer_percent_used | percentage of the total buffer in use for each output db |