Basicstats counter not working as expected #5595

Closed
toni-moreno opened this issue Mar 17, 2019 · 14 comments · Fixed by #5644
Assignees: danielnelson
Labels: area/agent, bug (unexpected problem or unintended behavior)
Milestone: 1.10.2

Comments

@toni-moreno (Contributor)

Description

I've been trying to set up an easy way to get Apache statistics as described in #3991, but while testing with tail / logparser I noticed something is not working for me. Perhaps some config is missing? If not, could this be a bug in the aggregator (perhaps some points are filtered out before they are sent to the aggregation process)?

Relevant telegraf.conf:

This is the config to put in /etc/telegraf/telegraf.d/testlog.conf to reproduce:

[[inputs.tail]]
  name_override = "apachelog2"
  files = ["/var/log/apache2/test.log"]
  from_beginning = false
  pipe = false
  data_format = "grok"
  grok_patterns = ["%{NOTSPACE:time:drop} %{NOTSPACE:request:tag} %{NUMBER:resp_time:int}"]

[[aggregators.basicstats]]
  namepass = ["apachelog2"]
  period = "60s"
  drop_original = true
  fieldpass = ["resp_time"]
  stats = ["count","max","min","mean"]
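
For reference, each matching log line should turn into a single point of roughly this shape (illustrative line protocol; the exact tag set and timestamp will differ):

apachelog2,request=/200_1sec.php resp_time=1000000i 1552845000000000000
apachelog2,request=/200_2sec.php resp_time=2000000i 1552845000000000000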

System info:

root@ipasdev01:~# telegraf --version
Telegraf 1.10.0 (git: HEAD fe33ee89)
root@ipasdev01:~# cat /etc/debian_version 
buster/sid
root@ipasdev01:~# uname -a
Linux ipasdev01 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Steps to reproduce:

1. Create the config file above (testlog.conf), put it in the telegraf.d directory, and restart Telegraf.

2. Create this script to simulate a log file (always with the same input):

#!/bin/bash
while true
do
	D=`date +"%D-%T"`
	echo "$D /200_1sec.php 1000000">>/var/log/apache2/test.log
	echo "$D /200_1sec.php 1000000">>/var/log/apache2/test.log
	echo "$D /200_2sec.php 2000000">>/var/log/apache2/test.log
	sleep 2
done

Execute it in the background:

nohup ~/telegraf_log_test.sh &

Expected behavior:

I expect to see:

  • the request 200_1sec.php 60 times/minute
  • the request 200_2sec.php 30 times/minute

since the script appends two 200_1sec.php lines and one 200_2sec.php line every 2 seconds (90 lines per minute), as counted with grep and wc -l:

root@ipasdev01:$ grep "19:29:"  /var/log/apache2/test.log  | grep "200_1sec" | wc -l
60
root@ipasdev01:$ grep "19:29:"  /var/log/apache2/test.log  | grep "200_2sec" | wc -l
30

Actual behavior:

The basicstats aggregator is only counting:

  • 8 times/min the request "200_1sec.php"
  • 4 times/min the request "200_2sec.php"
> select sum(resp_time_count) from apachelog2 where time > now()  - 2m group by time(1m),request;
name: apachelog2
tags: request=/200_1sec.php
time                 sum
----                 ---
2019-03-17T18:28:00Z 
2019-03-17T18:29:00Z 8
2019-03-17T18:30:00Z 8

name: apachelog2
tags: request=/200_2sec.php
time                 sum
----                 ---
2019-03-17T18:28:00Z 
2019-03-17T18:29:00Z 4
2019-03-17T18:30:00Z 4
@toni-moreno (Contributor, Author)

I forgot to include the complete config and log.

Full Telegraf config

/etc/telegraf/telegraf.conf

[global_tags]
  l_tag1 = "a"
  l_tag2 = "b"
  l_tag3 = "c"

[agent]
  interval = "60s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = "10s"
  debug = true
  quiet = false
  logfile = "/var/log/telegraf/telegraf.log"
  omit_hostname = false

[[inputs.internal]]
  # collect_memstats = true
  [inputs.internal.tags]
    ifx_db = "internal_telegraf"

[[outputs.influxdb]]
  urls = ["http://influxdb.mydomain.org:8086"]
  database = "_telegraf"
  username = "xxxxxxxxx"
  password = "xxxxxxxxx"

  # Filter influx output
  tagexclude = ["ifx_db"]
  [outputs.influxdb.tagpass]
    ifx_db = ["internal_telegraf"]

/etc/telegraf/telegraf.d/input_apache.conf

[[inputs.tail]]
  name_override = "apachelog2"
  files = ["/var/log/apache2/test.log"]
  from_beginning = false 
  pipe = false
  data_format = "grok"
  grok_patterns = ["%{NOTSPACE:time:drop} %{NOTSPACE:request:tag} %{NUMBER:resp_time:int}"]
  [inputs.tail.tags]
    instance="instance1"
    ifx_db="ws_apache"
[[aggregators.basicstats]]
  namepass = ["apachelog2"]
  period = "60s"
  drop_original = true
  fieldpass = ["resp_time"]
  stats = ["count","max","min","mean"]

/etc/telegraf/telegraf.d/output_apache.conf

[[outputs.influxdb]]
  urls = ["http://influxdb.mydomain.org:8086"]
  database = "apache_metrics"
  username = "xxxxxxx"
  password = "xxxxxxx"
  # Filter metrics
  tagexclude = ["ifx_db"]
  [outputs.influxdb.tagpass]
    ifx_db = ["ws_apache"]

Output Log

2019-03-18T05:18:09Z I! Loaded inputs: internal tail
2019-03-18T05:18:09Z I! Loaded aggregators: basicstats
2019-03-18T05:18:09Z I! Loaded processors: 
2019-03-18T05:18:09Z I! Loaded outputs: influxdb influxdb
2019-03-18T05:18:09Z I! Tags enabled: host=ipasdev01 l_tag1=a l_tag2=b l_tag3=c
2019-03-18T05:18:09Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ipasdev01", Flush Interval:10s
2019-03-18T05:18:09Z D! [agent] Connecting outputs
2019-03-18T05:18:09Z D! [agent] Attempting connection to output: influxdb
2019-03-18T05:18:09Z D! [agent] Successfully connected to output: influxdb
2019-03-18T05:18:09Z D! [agent] Attempting connection to output: influxdb
2019-03-18T05:18:09Z D! [agent] Successfully connected to output: influxdb
2019-03-18T05:18:09Z D! [agent] Starting service inputs
2019-03-18T05:18:09Z D! [inputs.tail] tail added for file: /var/log/apache2/test.log
2019-03-18T05:18:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:18:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:18:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:18:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:18:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:18:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:18:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:18:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:19:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:19:00Z D! [outputs.influxdb] wrote batch of 6 metrics in 17.384325ms
2019-03-18T05:19:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:19:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:19:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:19:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:19:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:19:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:19:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:19:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:19:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:19:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:19:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:20:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:20:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:20:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 11.926215ms
2019-03-18T05:20:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:20:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 17.559029ms
2019-03-18T05:20:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:20:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:20:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:20:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:20:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:20:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:20:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:20:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:20:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:21:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:21:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:21:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 24.70911ms
2019-03-18T05:21:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:21:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 28.525874ms
2019-03-18T05:21:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:21:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:21:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:21:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:21:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:21:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:21:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:21:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:21:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:22:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:22:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:22:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 15.273403ms
2019-03-18T05:22:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:22:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 15.221005ms
2019-03-18T05:22:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:22:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:22:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:22:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:22:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:22:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:22:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:22:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:22:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:23:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:23:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:23:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 11.496482ms
2019-03-18T05:23:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:23:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 17.704723ms
2019-03-18T05:23:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:23:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:23:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:23:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:23:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:23:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:23:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:23:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:23:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:24:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:24:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:24:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 12.81991ms
2019-03-18T05:24:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:24:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 52.024487ms
2019-03-18T05:24:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:24:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:24:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:24:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:24:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:24:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:24:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:24:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:24:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:25:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:25:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:25:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 14.451988ms
2019-03-18T05:25:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:25:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 14.894181ms
2019-03-18T05:25:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:25:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:25:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:25:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:25:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:25:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:25:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:25:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:25:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:26:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:26:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:26:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 15.351588ms
2019-03-18T05:26:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:26:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 20.242445ms
2019-03-18T05:26:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:26:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:26:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:26:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:26:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:26:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:26:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:26:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:26:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:27:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:27:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:27:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 12.704176ms
2019-03-18T05:27:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:27:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 17.389107ms
2019-03-18T05:27:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:27:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:27:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:27:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:27:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:27:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:27:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:27:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:27:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:28:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:28:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:28:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 10.827497ms
2019-03-18T05:28:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:28:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 15.290605ms
2019-03-18T05:28:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:28:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:28:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:28:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:28:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:28:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:28:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:28:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:28:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:29:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:29:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:29:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 14.191643ms
2019-03-18T05:29:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:29:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 18.257915ms
2019-03-18T05:29:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:29:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:29:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 

@toni-moreno (Contributor, Author)

More interesting data...

It seems like the count changes over time.

In the morning:

> select sum(resp_time_count) from apachelog2 where time > now()  - 2m group by time(1m),request;
name: apachelog2
tags: request=/200_1sec.php
time                 sum
----                 ---
2019-03-18T05:29:00Z 
2019-03-18T05:30:00Z 10
2019-03-18T05:31:00Z 10

name: apachelog2
tags: request=/200_2sec.php
time                 sum
----                 ---
2019-03-18T05:29:00Z 
2019-03-18T05:30:00Z 5
2019-03-18T05:31:00Z 5

Some hours later:

> select sum(resp_time_count) from apachelog2 where time > now()  - 2m group by time(1m),request;
name: apachelog2
tags: request=/200_1sec.php
time                 sum
----                 ---
2019-03-18T09:35:00Z 
2019-03-18T09:36:00Z 36
2019-03-18T09:37:00Z 36

name: apachelog2
tags: request=/200_2sec.php
time                 sum
----                 ---
2019-03-18T09:35:00Z 
2019-03-18T09:36:00Z 18
2019-03-18T09:37:00Z 18


@danielnelson (Contributor)

@toni-moreno I'm not able to replicate. Can you try removing the tag routing to see if it helps? This is what I see:

select sum(resp_time_count) from apachelog2 where time > now()  - 4m group by time(1m),request;
name: apachelog2
tags: request=/200_1sec.php
time                 sum
----                 ---
2019-03-19T00:00:00Z 
2019-03-19T00:01:00Z 52
2019-03-19T00:02:00Z 60
2019-03-19T00:03:00Z 60
2019-03-19T00:04:00Z 60

name: apachelog2
tags: request=/200_2sec.php
time                 sum
----                 ---
2019-03-19T00:00:00Z 
2019-03-19T00:01:00Z 26
2019-03-19T00:02:00Z 30
2019-03-19T00:03:00Z 30
2019-03-19T00:04:00Z 30

Output with a file output:

apachelog2,host=loaner,path=test.log,request=/200_2sec.php resp_time_count=30,resp_time_max=2000000,resp_time_mean=2000000,resp_time_min=2000000 1552954029000000000
apachelog2,host=loaner,path=test.log,request=/200_1sec.php resp_time_count=60,resp_time_max=1000000,resp_time_mean=1000000,resp_time_min=1000000 1552954029000000000
apachelog2,host=loaner,path=test.log,request=/200_2sec.php resp_time_count=30,resp_time_max=2000000,resp_time_mean=2000000,resp_time_min=2000000 1552954089000000000
apachelog2,host=loaner,path=test.log,request=/200_1sec.php resp_time_count=60,resp_time_max=1000000,resp_time_mean=1000000,resp_time_min=1000000 1552954089000000000
apachelog2,host=loaner,path=test.log,request=/200_2sec.php resp_time_count=30,resp_time_max=2000000,resp_time_mean=2000000,resp_time_min=2000000 1552954149000000000
apachelog2,host=loaner,path=test.log,request=/200_1sec.php resp_time_count=60,resp_time_max=1000000,resp_time_mean=1000000,resp_time_min=1000000 1552954149000000000
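
(For anyone reproducing this, a minimal file output of roughly this form can be used to dump the points; this config is illustrative, not necessarily the exact one used above:)

[[outputs.file]]
  files = ["stdout"]
  data_format = "influx"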

@sbengo commented Mar 19, 2019

Hi @danielnelson, I'm working with @toni-moreno and after some tests I have seen the same wrong results he described in this issue.

Version: Telegraf 1.10.0 (git: HEAD fe33ee8)
Config: Same
Data: Same script
Routing tag: disabled/enabled, no difference seen

Test:

  • I have done the tests on 2 different hosts, with invalid results on both.
  • After some restarts the values changed, but they were still invalid.

A: Debian VM on VirtualBox:

  • Linux snmpcoldev01 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux
  • debian_version: 8.4

Result:

> select sum(resp_time_count) from apachelog2 where time > now() - 4m group by time(1m), request;
name: apachelog2
tags: request=/200_1sec.php
time                sum
----                ---
1552992300000000000 
1552992360000000000 24
1552992420000000000 24
1552992480000000000 24
1552992540000000000 24

name: apachelog2
tags: request=/200_2sec.php
time                sum
----                ---
1552992300000000000 
1552992360000000000 12
1552992420000000000 12
1552992480000000000 12
1552992540000000000 12

B: Red Hat 7.5 VM on VMware vSphere:

  • Red Hat Enterprise Linux Server release 7.5 (Maipo)

Result:

> select sum(resp_time_count) from apachelog2 where time > now() - 4m group by time(1m), request;
name: apachelog2
tags: request=/200_1sec.php
time                sum
----                ---
1552992060000000000
1552992120000000000 26
1552992180000000000 26
1552992240000000000 26
1552992300000000000 26

name: apachelog2
tags: request=/200_2sec.php
time                sum
----                ---
1552992060000000000
1552992120000000000 13
1552992180000000000 13
1552992240000000000 13
1552992300000000000 13

@danielnelson (Contributor)

Can you try using the watch_method = "poll" option on the tail input and see if it has any effect?
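
For example (placement is illustrative; the rest of the tail config stays unchanged):

[[inputs.tail]]
  ## poll the file for changes instead of using inotify
  watch_method = "poll"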

If nothing changes, can you place the 3 log lines into a file and substitute the tail input with the file input:

[[inputs.file]]
  interval = "2s"
  name_override = "apachelog2"
  files = ["test.log"]
  data_format = "grok"
  grok_patterns = ["%{NOTSPACE:time:drop} %{NOTSPACE:request:tag} %{NUMBER:resp_time:int}"]

test.log:

03/18/19-17:00:11 /200_1sec.php 1000000
03/18/19-17:00:11 /200_1sec.php 1000000
03/18/19-17:00:11 /200_2sec.php 2000000

@sbengo commented Mar 20, 2019

Hi @danielnelson, thanks for the answer!

First of all, in each test we checked that, with drop_original set to false, the log parser retrieves all the lines and they are written to the DB correctly.
The problem seems to be related to the aggregator.

Following your recommendations:

Setting watch_method = "poll"

Same wrong results as described in the issue.

Test with the inputs.file plugin

Result:

The results still seem wrong; we expect counts of 60 (200_1sec.php) and 30 (200_2sec.php).

> select sum(resp_time_count) from apachelog2 where time > now() - 4m group by time(1m),request
name: apachelog2
tags: request=/200_1sec.php
time                sum
----                ---
1553079840000000000
1553079900000000000 20
1553079960000000000 20
1553080020000000000 20
1553080080000000000 20

name: apachelog2
tags: request=/200_2sec.php
time                sum
----                ---
1553079840000000000
1553079900000000000 10
1553079960000000000 10
1553080020000000000 10
1553080080000000000 10

More info:

With drop_original = false and the following query we can see that the points are being read from the file correctly.
Note: in order to keep all the points, we changed precision = "10s" to precision = "s".
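
(For reference, that setting lives in the [agent] section; the sketch below uses "1s" as the duration string, which is one accepted form:)

[agent]
  ## finer timestamp rounding so each 2-second test point keeps its own timestamp
  precision = "1s"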

> select count(resp_time), sum(resp_time_count) from apachelog2 where time > now() - 3m group by time(1m),request
name: apachelog2
tags: request=/200_1sec.php
time                count sum
----                ----- ---
1553079900000000000 6
1553079960000000000 30    20
1553080020000000000 30    20
1553080080000000000 20    20

name: apachelog2
tags: request=/200_2sec.php
time                count sum
----                ----- ---
1553079900000000000 6
1553079960000000000 30    10
1553080020000000000 30    10
1553080080000000000 20    10

Raw:

time                request       resp_time resp_time_count resp_time_max resp_time_mean resp_time_min
----                -------       --------- --------------- ------------- -------------- -------------
1553080082000000000 /200_1sec.php 1000000
1553080082000000000 /200_2sec.php 2000000
1553080084000000000 /200_1sec.php 1000000
1553080084000000000 /200_2sec.php 2000000
1553080086000000000 /200_1sec.php 1000000
1553080086000000000 /200_2sec.php 2000000
1553080088000000000 /200_1sec.php 1000000
1553080088000000000 /200_2sec.php 2000000
1553080090000000000 /200_1sec.php 1000000
1553080090000000000 /200_2sec.php 2000000
1553080092000000000 /200_1sec.php 1000000
1553080092000000000 /200_2sec.php 2000000
1553080094000000000 /200_1sec.php 1000000
1553080094000000000 /200_2sec.php 2000000
1553080096000000000 /200_1sec.php 1000000
1553080096000000000 /200_2sec.php 2000000
1553080098000000000 /200_1sec.php 1000000
1553080098000000000 /200_2sec.php 2000000
1553080100000000000 /200_1sec.php 1000000
1553080100000000000 /200_2sec.php 2000000
1553080102000000000 /200_1sec.php 1000000
1553080102000000000 /200_2sec.php 2000000
1553080104000000000 /200_1sec.php 1000000
1553080104000000000 /200_2sec.php 2000000
1553080106000000000 /200_1sec.php 1000000
1553080106000000000 /200_2sec.php 2000000
1553080108000000000 /200_1sec.php 1000000
1553080108000000000 /200_2sec.php 2000000
1553080110000000000 /200_1sec.php 1000000
1553080110000000000 /200_2sec.php 2000000
1553080112000000000 /200_1sec.php 1000000
1553080112000000000 /200_2sec.php 2000000
1553080114000000000 /200_1sec.php 1000000
1553080114000000000 /200_2sec.php 2000000
1553080116000000000 /200_1sec.php 1000000
1553080116000000000 /200_2sec.php 2000000
1553080118000000000 /200_1sec.php 1000000
1553080118000000000 /200_2sec.php 2000000
1553080120000000000 /200_1sec.php 1000000
1553080120000000000 /200_2sec.php 2000000
1553080122000000000 /200_1sec.php 1000000
1553080122000000000 /200_2sec.php 2000000
1553080124000000000 /200_1sec.php 1000000
1553080124000000000 /200_2sec.php 2000000
1553080126000000000 /200_1sec.php 1000000
1553080126000000000 /200_2sec.php 2000000
1553080128000000000 /200_1sec.php 1000000
1553080128000000000 /200_2sec.php 2000000
1553080130000000000 /200_1sec.php 1000000
1553080130000000000 /200_2sec.php 2000000
1553080132000000000 /200_1sec.php 1000000
1553080132000000000 /200_2sec.php 2000000
1553080134000000000 /200_1sec.php 1000000
1553080134000000000 /200_2sec.php 2000000
1553080136000000000 /200_1sec.php 1000000
1553080136000000000 /200_2sec.php 2000000
1553080138000000000 /200_1sec.php 1000000
1553080138000000000 /200_2sec.php 2000000
1553080140000000000 /200_1sec.php 1000000   20              1000000       1000000        1000000
1553080140000000000 /200_2sec.php 2000000   10              2000000       2000000        2000000

@danielnelson (Contributor)

If you enable the internal plugin, what is the metrics_dropped count on internal_aggregate?

internal_aggregate,aggregator=basicstats metrics_dropped=0i

@sbengo commented Mar 21, 2019

Hi @danielnelson,

We enabled the internal plugin and, as you can see in the following results, it seems that:

<metrics_dropped> = <expected_value> - <sum(resp_time_count)>

Test with internal

  • We did a restart to see if metrics_dropped would change:
      Expected: 90
      T1: 39 = 90 - (34 + 17)
      T2: 78 = 90 - (8 + 4)
  • We did the test with the file input, using the same config as in the comment above.

Result

select non_negative_derivative(last(metrics_dropped),1m) from "internal_aggregate" where time  > now() - 20m group by time(1m);
select sum(resp_time_count) from "apachelog2" where time > now() - 20m group by time(1m),request

name: internal_aggregate
time                non_negative_derivative
----                -----------------------
1553154840000000000 39
1553154900000000000 39
1553154960000000000 39
1553155020000000000 39
1553155080000000000 39
1553155140000000000 39
1553155200000000000 39
1553155260000000000 39
1553155320000000000 39
<telegraf_restart>
1553155500000000000 78
1553155560000000000 78
1553155620000000000 78
1553155680000000000 78
1553155740000000000 78
1553155800000000000 78
1553155860000000000 78

name: apachelog2
tags: request=/200_1sec.php
time                sum
----                ---
1553154840000000000 34
1553154900000000000 34
1553154960000000000 34
1553155020000000000 34
1553155080000000000 34
1553155140000000000 34
1553155200000000000 34
1553155260000000000 34
1553155320000000000 34
<telegraf_restart>
1553155500000000000 8
1553155560000000000 8
1553155620000000000 8
1553155680000000000 8
1553155740000000000 8
1553155800000000000 8
1553155860000000000 8

name: apachelog2
tags: request=/200_2sec.php
time                sum
----                ---
1553154840000000000 17
1553154900000000000 17
1553154960000000000 17
1553155020000000000 17
1553155080000000000 17
1553155140000000000 17
1553155200000000000 17
1553155260000000000 17
1553155320000000000 17
<telegraf_restart>
1553155500000000000 4
1553155560000000000 4
1553155620000000000 4
1553155680000000000 4
1553155740000000000 4
1553155800000000000 4
1553155860000000000 4

@toni-moreno (Contributor, Author)

Hi @danielnelson, I'm still working with the tail plugin.

(screenshot showing 21 dropped metrics)

This is my count while having 21 dropped metrics:

> select sum(resp_time_count) from apachelog2 where time > now()  - 5m group by time(1m),request;
name: apachelog2
tags: request=/200_1sec.php
time                 sum
----                 ---
2019-03-21T10:19:00Z
2019-03-21T10:20:00Z 46
2019-03-21T10:21:00Z 46
2019-03-21T10:22:00Z 46
2019-03-21T10:23:00Z 46
2019-03-21T10:24:00Z 48

name: apachelog2
tags: request=/200_2sec.php
time                 sum
----                 ---
2019-03-21T10:19:00Z
2019-03-21T10:20:00Z 23
2019-03-21T10:21:00Z 23
2019-03-21T10:22:00Z 23
2019-03-21T10:23:00Z 23
2019-03-21T10:24:00Z 24

As in the previous example from @sbengo with the file input plugin, the dropped metrics added to the counted ones give the correct total number of lines in the file per minute:

23 + 46 + 21 = 90 metrics/min

Could you explain to us why the aggregator is dropping metrics, and how we can fix it?

@danielnelson (Contributor)

There are 2 possibilities:

  • The selected metric does not have any fields because fieldpass has removed all of them. I think this is actually a bug and such metrics should be counted as metrics_filtered, but right now they are counted under metrics_dropped.
  • The metric does not have a timestamp within the current aggregation window (running_aggregator.go#L125-L128).

I'm not sure which case we are running into, though the code setting the aggregation window has been problematic, and it could possibly have something to do with the agent precision rounding. I can fix the bug where we report metrics as dropped when they should be filtered, and that will help narrow it down.
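
For readers following along, here is a simplified sketch of those two drop conditions (illustrative only: the type and field names are made up, and this is not the real Telegraf code in running_aggregator.go):

package main

import (
	"fmt"
	"time"
)

// Minimal stand-ins for illustration; these are NOT the real Telegraf types.
type metric struct {
	fields map[string]interface{}
	ts     time.Time
}

type runningAggregator struct {
	periodStart, periodEnd time.Time // current aggregation window
	metricsDropped         int64
}

// add returns true if the metric would be handed to the aggregator plugin,
// false if it would be dropped for one of the two reasons described above.
func (r *runningAggregator) add(m metric) bool {
	// Case 1: fieldpass stripped every field, so there is nothing to aggregate.
	if len(m.fields) == 0 {
		r.metricsDropped++ // arguably should be counted as "filtered" instead
		return false
	}
	// Case 2: the metric's timestamp falls outside [periodStart, periodEnd).
	if m.ts.Before(r.periodStart) || !m.ts.Before(r.periodEnd) {
		r.metricsDropped++
		return false
	}
	// ...otherwise the metric is passed on to the aggregator (omitted here)...
	return true
}

func main() {
	start := time.Now().Truncate(time.Minute)
	r := runningAggregator{periodStart: start, periodEnd: start.Add(time.Minute)}

	inWindow := metric{fields: map[string]interface{}{"resp_time": int64(1000000)}, ts: start.Add(10 * time.Second)}
	late := metric{fields: map[string]interface{}{"resp_time": int64(1000000)}, ts: start.Add(2 * time.Minute)}

	fmt.Println(r.add(inWindow)) // true: accepted into the window
	fmt.Println(r.add(late))     // false: outside the window, counted as dropped
}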

@danielnelson (Contributor)

I see what the issue is, working on a fix now.

@toni-moreno (Contributor, Author)

Hi @danielnelson, any news about the fix? Do you need any other tests from us?

When it's ready, we would like to test and validate the fix if you want.

Thank you again!

@ushuz (Contributor) commented Mar 26, 2019

I'm also testing aggregators and found that drop_original doesn't work well with max_undelivered_messages. It feels like drop_original = true doesn't reset some internal counters that track how many messages were processed. Maybe these problems share the same root cause?

ushuz mentioned this issue Mar 26, 2019
danielnelson self-assigned this Mar 26, 2019
danielnelson added the area/agent and bug (unexpected problem or unintended behavior) labels Mar 26, 2019
danielnelson added this to the 1.10.2 milestone Mar 26, 2019
@danielnelson (Contributor)

@toni-moreno I will definitely take you up on that offer to test, but I need a bit more time. Should have the fix ready on Wednesday.
