Troubleshooting

Logs

SNMPCollector writes a complete set of log files that let you review everything that happens while it gathers SNMP data from your infrastructure. All logs are located in the same directory, LOG_DIR.

The default LOG_DIR is /var/log/snmpcollector when SNMPCollector is installed from the Debian or Red Hat based packages, and /opt/snmpcollector/log in Docker. It can always be set with the -log option passed to the snmpcollector binary.

If SNMPCollector was installed from the Debian or Red Hat packages, you can also tune these parameters in the following files:

rpm: /etc/sysconfig/snmpcollector
deb: /etc/default/snmpcollector

Main agent logs

$LOG_DIR/snmpcollector.log

Shows the basic initialisation process and the result of runtime administration actions from the web UI.

Default Level: set in the main config.toml file, under the general section
Supported Levels: panic, fatal, error, warn, info, debug
Can be changed online?: no
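To make the level list concrete, here is a minimal Go sketch of how a configured level filters messages. It assumes the usual ordering of Go logging libraries, where a level also emits every more severe level (so info also writes warn and error messages). The helpers isLogged and levelIndex are hypothetical and are not SNMPCollector code.

```go
package main

import (
	"fmt"
	"strings"
)

// supportedLevels lists the documented log levels, from the most severe
// (panic) to the most verbose (debug).
var supportedLevels = []string{"panic", "fatal", "error", "warn", "info", "debug"}

// levelIndex returns the position of level in supportedLevels, or -1 if the
// level is not one of the supported values.
func levelIndex(level string) int {
	level = strings.ToLower(strings.TrimSpace(level))
	for i, l := range supportedLevels {
		if l == level {
			return i
		}
	}
	return -1
}

// isLogged reports whether a message at msgLevel is written to a log
// configured at configured. Assumption: each level also includes every more
// severe level, as in common Go logging libraries.
func isLogged(configured, msgLevel string) (bool, error) {
	c, m := levelIndex(configured), levelIndex(msgLevel)
	if c < 0 {
		return false, fmt.Errorf("unsupported configured level %q", configured)
	}
	if m < 0 {
		return false, fmt.Errorf("unsupported message level %q", msgLevel)
	}
	return m <= c, nil
}

func main() {
	for _, msg := range []string{"error", "info", "debug"} {
		ok, _ := isLogged("info", msg)
		fmt.Printf("configured=info message=%s logged=%v\n", msg, ok)
	}
}
```

In practice this means that raising a log to debug is the way to see the most detail, at the cost of much larger files.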

HTTP access logs

$LOG_DIR/http_access.log

Shows every HTTP access request, with its result and response time. This log has no level support.

Device specific logs

$LOG_DIR/<device_id>.log

This is the main log to check when you have problems with only a subset of devices, or only under certain conditions.

Default Level: set in the device configuration section of the configuration database
Supported Levels: panic, fatal, error, warn, info, debug
Can be changed online?: yes, from the runtime web UI

SNMP debug logs

$LOG_DIR/snmpdebug_<device_id>_<measurement_id>.log

This log is disabled by default and can be enabled online from the web UI. When establishing SNMP links with remote devices, SNMPCollector opens one link per measurement, so when the SNMP debug log is enabled, each measurement on the device creates its own file with SNMP protocol debug output. This output helps to review connection and/or SNMP protocol problems.

SQL debug log

$LOG_DIR/sql.log

This log is disabled by default.

Default Level: set in the main config.toml file, under the general section
Supported Levels: on/off (debug = true / debug = false)
Can be changed online?: no
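When troubleshooting a single device it helps to know exactly which files to open. The Go sketch below builds that list from the naming scheme described in this section. The function logFiles and the example device and measurement IDs (router01, ifstats) are hypothetical; the snmpdebug and sql.log files only exist when those debug logs are enabled.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// logFiles returns the log files worth inspecting for one device, following
// the documented naming scheme under LOG_DIR.
func logFiles(logDir, deviceID string, measurementIDs []string) []string {
	files := []string{
		filepath.Join(logDir, "snmpcollector.log"), // main agent log
		filepath.Join(logDir, "http_access.log"),   // HTTP access log
		filepath.Join(logDir, deviceID+".log"),     // device-specific log
	}
	// One SNMP debug file per measurement (only when snmpdebug is enabled).
	for _, m := range measurementIDs {
		name := fmt.Sprintf("snmpdebug_%s_%s.log", deviceID, m)
		files = append(files, filepath.Join(logDir, name))
	}
	// SQL debug log (only when SQL debug is enabled).
	files = append(files, filepath.Join(logDir, "sql.log"))
	return files
}

func main() {
	for _, f := range logFiles("/var/log/snmpcollector", "router01", []string{"ifstats"}) {
		fmt.Println(f)
	}
}
```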

Self Monitoring

When self-monitoring is active, SNMPCollector sends data about itself to the "default" backend (you need both selfmon enabled and an InfluxDB backend configured with id = "default").

You can activate it in the [selfmon] section of the main config file, config.toml.

[selfmon]
 #enable true/false: enable/disable self-monitoring
 enabled = true
 #data send frequency
 freq = 60
 #prefix added to the measurement names
 prefix = ""
 #inherit device tags (applies only to the selfmon_device_stats measurement)
 inheritdevicetags = true
 #extra tags added to the measurements, each written as tag=value: [ "tag1=value1", "tag2=value2", ..., "tagN=valueN" ]
 extratags = [ "instance=snmpcollector01" ]
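The extratags entries are plain tag=value strings. As an illustration of how such entries can be turned into tags, here is a Go sketch that parses them into a map. It is not SNMPCollector's own parser: parseExtraTags is a hypothetical helper, and because the original config comment mentions a CSV form, it also accepts several comma-separated pairs inside one entry.

```go
package main

import (
	"fmt"
	"strings"
)

// parseExtraTags turns extratags entries such as "instance=snmpcollector01"
// into a tag map. An entry may also hold several comma-separated pairs.
// Empty keys or values are rejected, since tags need both.
func parseExtraTags(entries []string) (map[string]string, error) {
	tags := make(map[string]string)
	for _, entry := range entries {
		for _, pair := range strings.Split(entry, ",") {
			key, value, found := strings.Cut(pair, "=")
			key, value = strings.TrimSpace(key), strings.TrimSpace(value)
			if !found || key == "" || value == "" {
				return nil, fmt.Errorf("invalid extratag %q: expected tag=value", pair)
			}
			tags[key] = value
		}
	}
	return tags, nil
}

func main() {
	tags, err := parseExtraTags([]string{"instance=snmpcollector01", "dc=mad1,env=prod"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(tags)
}
```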

When active, it sends three measurements, described under Defined Measurements.

Dashboards

The following Grafana dashboards show the internal statistics of SNMPCollector, so you can check the status of the platform.

Dashboard	Description	Required version
snmpcollector_platform_instance	Overview metrics to know the SNMPCollector instance status	SNMPCollector: 0.12.0+ Grafana: 7.5.5+
snmpcollector_platform_device	Detailed device view to show the device stats on a SNMPCollector instance	SNMPCollector: 0.12.0+ Grafana: 7.5.5+
snmpcollector_platform_measurement	Detailed measurement view to show the measurement stats on a SNMPCollector instance	SNMPCollector: 0.12.0+ Grafana: 7.5.5+

Defined Measurements

These are the defined measurements. Their names can be prefixed with the prefix option in config.toml if needed.

measurement	description
selfmon_gvm	statistics about the Go runtime
selfmon_device_stats	statistics for each device being gathered
selfmon_outdb_stats	statistics for each output DB

selfmon_gvm

FieldName	Source	Unit	Description
runtime_goroutines	runtime.NumGoroutine()	number	Number of currently running goroutines
mem.alloc	runtime.ReadMemStats.Alloc	bytes	Bytes of allocated heap objects
mem.mallocs	runtime.ReadMemStats.Mallocs	mallocs per second	Number of mallocs issued to the system
mem.frees	runtime.ReadMemStats.Frees	frees per second	Number of frees issued to the system
mem.heapAlloc	runtime.ReadMemStats.HeapAlloc	bytes	Bytes of allocated heap objects
mem.stackInuse	runtime.ReadMemStats.StackInuse	bytes	Bytes in stack spans. In-use stack spans have at least one stack in them. These spans can only be used for other stacks of the same size. There is no StackIdle because unused stack spans are returned to the heap (and hence counted toward HeapIdle).
gc.total_pause_ns	memStats.PauseTotalNs	ms	Cumulative GC pause time, in ms
gc.pause_per_interval	memStats.PauseTotalNs	ms/interval	GC pause time accumulated since the last gathered statistic, in ms
gc.pause_per_second	memStats.PauseTotalNs	ms/second	GC pause time per second (normalized), in ms
gc.gc_per_interval	memStats.NumGC	#gc/interval	Number of GCs since the last gathered statistic
gc.gc_per_second	memStats.NumGC	#gc/second	Number of GCs per second (normalized)

selfmon_device_stats [>= 0.12.0]

From 0.12.0 onwards, statistics are taken from each measurement. Each statistic applies either to the measurement only (M), or also to the device (M/D), in which case the measurement values are consolidated over the device period as described in the last column.

FieldName	Type	Applies to (M/D)	Description in measurement context	Description in device context
active_value	integer	M/D	0 not active / 1 active, where activation depends on the device config	active according to the device configuration
connected_value	integer	M/D	0 not connected / 1 connected	0 (not connected) only if all measurements in the last gather period appear as not connected
snmp_oid_get_all	integer	M/D	all gathered SNMP metrics in this measurement (sum of snmpget OIDs and all OIDs received in snmpwalk queries)	sum of gathered SNMP metrics for all measurements in the device period
snmp_oid_get_processed	integer	M/D	gathered SNMP metrics processed after filters are applied in this measurement (not always sent to the backend; it depends on the report flag)	sum of processed metrics for all measurements in the device period
snmp_oid_get_errors	integer	M/D	number of OIDs with errors in this measurement	sum of OIDs with errors for all measurements in the device period
cycle_gather_start_time	integer	M/D	last gather time as a Unix timestamp	minimum timestamp of all gathered measurements in the device period
cycle_gather_duration	float	M/D	elapsed time taken to get this measurement's data (in seconds)	maximum elapsed time of all measurements finished in the device period
filter_start_time	integer	M/D	last filter application time as a Unix timestamp	minimum timestamp of all filtered measurements in the device period
filter_duration	float	M/D	elapsed time taken to compute all applicable filters on the measurement (in seconds)	sum of elapsed filter time over all measurements in the device period
backend_sent_start_time	integer	M/D	last time data was sent to the internal output buffer (as a Unix timestamp)	minimum timestamp of all sends in the device period
backend_sent_duration	float	M/D	elapsed time taken to send data to the internal output buffer backend (in seconds)	sum of elapsed send time over all measurements in the device period
metric_sent	integer	M/D	number of metrics sent (counted as fields) for this measurement	sum of metrics sent for all measurements in the device period
metric_sent_errors	integer	M/D	number of metrics (counted as fields) with errors for this measurement	sum of metrics with errors for all measurements in the device period
measurement_sent	integer	M/D	number of series built to be sent as a single request to the backend	sum of series for all measurements in the device period
measurement_sent_errors	integer	M/D	number of series with errors for this measurement	sum of series with errors in the device period
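The device-context column describes a small set of consolidation rules: counters and filter durations are summed, start times take the minimum, the gather duration takes the maximum, and the device counts as connected unless every measurement was not connected. The Go sketch below applies those rules to a subset of the fields, assuming each measurement reports once per device period. The measStats struct and the consolidate function are hypothetical.

```go
package main

import "fmt"

// measStats holds a subset of the per-measurement fields from the table above.
type measStats struct {
	Connected       bool
	OidGetAll       int64
	OidGetErrors    int64
	GatherStartTime int64   // Unix timestamp
	GatherDuration  float64 // seconds
	FilterDuration  float64 // seconds
	MetricSent      int64
}

// consolidate builds the device-context values for one device period.
func consolidate(ms []measStats) measStats {
	var d measStats
	for i, m := range ms {
		// Connected unless every measurement reported "not connected".
		d.Connected = d.Connected || m.Connected
		// Counters and filter durations are summed.
		d.OidGetAll += m.OidGetAll
		d.OidGetErrors += m.OidGetErrors
		d.FilterDuration += m.FilterDuration
		d.MetricSent += m.MetricSent
		// Start time: minimum; gather duration: maximum.
		if i == 0 || m.GatherStartTime < d.GatherStartTime {
			d.GatherStartTime = m.GatherStartTime
		}
		if m.GatherDuration > d.GatherDuration {
			d.GatherDuration = m.GatherDuration
		}
	}
	return d
}

func main() {
	dev := consolidate([]measStats{
		{OidGetAll: 10, OidGetErrors: 1, GatherStartTime: 1700000060, GatherDuration: 1.5, FilterDuration: 0.25, MetricSent: 8},
		{Connected: true, OidGetAll: 5, GatherStartTime: 1700000000, GatherDuration: 0.5, FilterDuration: 0.5, MetricSent: 4},
	})
	fmt.Printf("%+v\n", dev)
}
```

Taking the maximum for cycle_gather_duration (rather than the sum) reflects that measurements on a device run over their own links, so the slowest one bounds the device period.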

TagName	Description
active	"true" (1 in the active_value field) or "false" (0 in the active_value field), as a tag for fast filtering
connected	"true" (1 in the connected_value field) or "false" (0 in the connected_value field), as a tag for fast filtering
device	name of the device this statistic applies to
meas_name	name of the measurement this statistic applies to (measurement context only)
type	either measurement (statistic in the measurement context) or device (statistic in the device context, consolidated from the measurement stats)

NOTE: each statistic can carry additional tags from the extratags option in the config file, and from its device if inheritdevicetags = true.
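Putting the tag rules together, the Go sketch below assembles the tag set of one selfmon_device_stats point. The tag names come from the table above; the precedence between device tags, extratags and built-in tags is an assumption chosen for illustration, and buildTags is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
)

// buildTags assembles the tag set of one selfmon_device_stats point.
// Illustrative precedence (assumption): device tags, only when
// inheritDeviceTags is true, are overridden by extratags, and both are
// overridden by the built-in tags.
func buildTags(device, measName string, active, connected bool,
	extraTags, deviceTags map[string]string, inheritDeviceTags bool) map[string]string {
	tags := make(map[string]string)
	if inheritDeviceTags {
		for k, v := range deviceTags {
			tags[k] = v
		}
	}
	for k, v := range extraTags {
		tags[k] = v
	}
	tags["device"] = device
	tags["active"] = strconv.FormatBool(active)
	tags["connected"] = strconv.FormatBool(connected)
	if measName != "" {
		// Measurement context: meas_name is only set here.
		tags["type"] = "measurement"
		tags["meas_name"] = measName
	} else {
		// Device context: consolidated statistics for the whole device.
		tags["type"] = "device"
	}
	return tags
}

func main() {
	fmt.Println(buildTags("router01", "ifstats", true, true,
		map[string]string{"instance": "snmpcollector01"},
		map[string]string{"site": "mad1"}, true))
}
```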

selfmon_device_stats [< 0.12.0]

FieldName	Description
snmp_oid_get_all	All gathered SNMP metrics (sum of snmpget OIDs and all OIDs received in snmpwalk queries)
snmp_oid_get_processed	Gathered SNMP metrics processed after filters are applied (not always sent to the backend; it depends on the report flag)
snmp_oid_get_errors	number of OIDs with errors for all measurements
cycle_gather_start_time	Last gather time as a Unix timestamp
cycle_gather_duration	elapsed time taken to get all measurement data, in seconds
filter_start_time	Last filter application time as a Unix timestamp
filter_duration	elapsed time taken to compute all applicable filters on the device, in seconds
backend_sent_start_time	Last time data was sent to the internal output buffer
backend_sent_Duration	elapsed time taken to send data to the internal output buffer backend
metric_sent	number of metrics sent (counted as fields) for all measurements
metric_sent_errors	number of metrics (counted as fields) with errors for all measurements
measurement_sent	number of series built to be sent as a single request to the backend
measurement_sent_errors	number of series built to be sent as a single request with errors, for all measurements

selfmon_outdb_stats

field	description
write_sent	number of HTTP writes sent to the DB (each write sends a batchPoint) in the last period
write_error	number of HTTP write errors in the last period
points_sent	number of points sent in the writes (batchPoints) of the last period
points_sent_max	maximum number of points sent in a single write in the last period
points_sent_avg	(only if write_sent > 0) average number of points sent per write in the last period
write_time	sum of the HTTP response times of all writes in the last period
write_time_max	maximum HTTP response time of all writes in the last period
write_time_avg	(only if write_sent > 0) average response time of all writes in the last period
fields_sent	number of fields sent to the DB in the last period
fields_sent_max	maximum number of fields sent to the DB in the last period
buffer_percent_used	percent of the usage of the total buffer used for each.
