Basicstats counter not working as expected #5595
Comments
I forgot to include the complete config and log.
Telegraf full config
/etc/telegraf/telegraf.conf
[global_tags]
l_tag1 = "a"
l_tag2 = "b"
l_tag3 = "c"
[agent]
interval = "60s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = "10s"
debug = true
quiet = false
logfile = "/var/log/telegraf/telegraf.log"
omit_hostname = false
[[inputs.internal]]
# collect_memstats = true
[inputs.internal.tags]
ifx_db = "internal_telegraf"
[[outputs.influxdb]]
urls = ["http://influxdb.mydomain.org:8086"]
database = "_telegraf"
username = "xxxxxxxxx"
password = "xxxxxxxxx"
#Filter influx output
tagexclude = ["ifx_db"]
[outputs.influxdb.tagpass]
ifx_db = ["internal_telegraf"]
/etc/telegraf/telegraf.d/input_apache.conf
[[inputs.tail]]
name_override = "apachelog2"
files = ["/var/log/apache2/test.log"]
from_beginning = false
pipe = false
data_format = "grok"
grok_patterns = ["%{NOTSPACE:time:drop} %{NOTSPACE:request:tag} %{NUMBER:resp_time:int}"]
[inputs.tail.tags]
instance="instance1"
ifx_db="ws_apache"
[[aggregators.basicstats]]
namepass = ["apachelog2"]
period = "60s"
drop_original = true
fieldpass = ["resp_time"]
stats = ["count","max","min","mean"]
/etc/telegraf/telegraf.d/output_apache.conf
[[outputs.influxdb]]
urls = ["http://influxdb.mydomain.org:8086"]
database = "apache_metrics"
username = "xxxxxxx"
password = "xxxxxxx"
#Filter metrics
tagexclude = ["ifx_db"]
[outputs.influxdb.tagpass]
ifx_db = ["ws_apache"]
Output Log
|
More interesting data: it seems the count changes over time. In the morning...
Some hours after
|
@toni-moreno I'm not able to replicate this. Can you try removing the tag routing to see if it helps?
Output with a file output:
|
Hi @danielnelson, I'm working with @toni-moreno and after some tests I have seen the same wrong results that he described in this issue. Version: Telegraf 1.10.0 (git: HEAD fe33ee8) Test:
A: Debian VM on VirtualBox:
Result:
B: Redhat 7.5 VM on VMWare vSphere:
Result:
|
Can you try using the If nothing changes, can you place the 3 log lines into a file and substitute the tail input with the file input:
test.log:
|
Hi @danielnelson, thanks for the answer! First of all, in each of the tests we ran, we checked that setting up Following your recommendations: Set up
|
If you enable the
|
Hi @danielnelson, we enabled the internal plugin and, as you can see in the following results, it seems that: <metrics_dropped> = <expected_value> - <sum(resp_time_count)> Test with
|
Hi @danielnelson, I'm still working with the tail plugin. This is my count while having 21 dropped metrics
As in @sbengo's previous example with the file input plugin, the dropped metrics plus the counted metrics add up to the correct total number of lines written to the file per minute: 23 + 46 + 21 = 90 metrics/min. Could you explain why the aggregator is dropping metrics, and how I can fix it? |
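The arithmetic in this comment can be checked with a short Python sketch. It only restates the numbers quoted above (counts of 23 and 46, 21 dropped, 90 lines expected per minute). The function name `missing_metrics` is made up for illustration and is not part of Telegraf.

```python
# Numbers quoted in the comment above.
counted = [23, 46]          # resp_time_count values reported by basicstats
dropped = 21                # dropped metrics reported by the internal plugin
expected_per_minute = 90    # lines written to the log file per minute


def missing_metrics(expected, counted_values):
    """Return how many metrics were not counted by the aggregator."""
    return expected - sum(counted_values)


gap = missing_metrics(expected_per_minute, counted)
print(f"missing={gap} dropped={dropped} match={gap == dropped}")
```

If the gap equals the dropped counter, as it does here, then every line that was not counted is accounted for by the aggregator's dropped metrics rather than lost in the input or the output.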
There are 2 possibilities:
I'm not sure which case we are running into, though the code setting the aggregation window has been problematic, and it could possibly have something to do with the agent precision rounding. I can fix the bug where we report metrics as dropped when they should be filtered, and that will help narrow it down. |
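To make the precision-rounding hypothesis concrete, here is a toy model in Python. It is not Telegraf's code: the half-open window, the round-down rule, and the names `Window`, `round_to_precision` and `aggregate` are all assumptions made for illustration. It only shows how rounding timestamps to the agent precision could push some metrics outside an aggregation window that is not aligned to that precision.

```python
from dataclasses import dataclass


@dataclass
class Window:
    start: int  # inclusive, in seconds
    end: int    # exclusive, in seconds


def round_to_precision(ts, precision):
    """Round a timestamp down to a multiple of the precision (assumed rule)."""
    return ts - (ts % precision)


def aggregate(timestamps, window, precision):
    """Count metrics whose rounded timestamp falls inside the window."""
    counted = dropped = 0
    for ts in timestamps:
        rounded = round_to_precision(ts, precision)
        if window.start <= rounded < window.end:
            counted += 1
        else:
            dropped += 1
    return counted, dropped


# One metric per second for 60 s, precision = 10 s as in the config above.
print(aggregate(range(60, 120), Window(60, 120), 10))  # window aligned to precision
print(aggregate(range(65, 125), Window(65, 125), 10))  # window offset by 5 s
```

In this model the aligned window counts every metric, while the offset window drops the metrics whose rounded timestamp lands before the window start. Whether the real agent behaves like this is exactly what is still open in this thread.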
I see what the issue is, working on a fix now. |
Hi @danielnelson, any news about the fix? Do you need any other tests from us? When it is ready we would like to test the fix and validate it, if you want. Thank you again! |
I'm also testing aggregators and find that |
@toni-moreno I will definitely take you up on that offer to test, but I need a bit more time. Should have the fix ready on Wednesday. |
Description
I've been trying to configure an easy way to get Apache statistics as described in #3991, but while doing some testing with tail / logparser I noticed something is not working for me. Perhaps some config is missing? If not, could it be because of a bug in the aggregator (perhaps some points are filtered out before they are sent to the aggregation process)?
Relevant telegraf.conf:
This is the config to put in /etc/telegraf/telegraf.d/testlog.conf to reproduce
System info:
Steps to reproduce:
1.- Create the previous config file (testlog.conf), put it in the telegraf.d directory and restart telegraf
2.- Create this script to simulate a log file (always with the same input)
Execute in background
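The original script is not preserved in this thread. As a stand-in, a log generator along these lines would produce lines that match the grok pattern from the config (`time request resp_time`, separated by single spaces). The request paths, response times, timestamp format and function names below are hypothetical and are not taken from the original script.

```python
import time
from datetime import datetime

# Hypothetical fixed workload: the same (request, resp_time) pairs every tick.
SAMPLE_REQUESTS = [("/index.html", 120), ("/api/status", 35), ("/login", 250)]


def make_line(now, request, resp_time):
    """Build one log line: a timestamp without spaces, the request, the response time."""
    return f"{now.strftime('%Y-%m-%dT%H:%M:%S')} {request} {resp_time}\n"


def write_batch(path, now=None, requests=SAMPLE_REQUESTS):
    """Append one line per sample request to the log file and return the line count."""
    now = now or datetime.now()
    with open(path, "a", encoding="utf-8") as log:
        for request, resp_time in requests:
            log.write(make_line(now, request, resp_time))
    return len(requests)


def run(path, ticks, delay=1.0):
    """Write `ticks` batches, sleeping `delay` seconds after each one."""
    total = 0
    for _ in range(ticks):
        total += write_batch(path)
        time.sleep(delay)
    return total
```

Calling `run("/var/log/apache2/test.log", ticks=600)` would feed the file watched by the tail input for about ten minutes. Because the input is always the same, the number of lines per minute is known in advance and can be compared with what basicstats reports.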
Expected behavior:
I expect to have:
As counted with grep and wc -l
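As a cross-check independent of Telegraf, the expected values can also be computed straight from the log lines. The Python sketch below uses a regular expression that only approximates the grok pattern from the config (three space-separated tokens, the last an integer) and computes the same four stats requested from basicstats, grouped by the `request` tag. The names `LINE_RE` and `expected_stats` are illustrative.

```python
import re
from collections import defaultdict

# Rough stand-in for: %{NOTSPACE:time:drop} %{NOTSPACE:request:tag} %{NUMBER:resp_time:int}
LINE_RE = re.compile(r"^(?P<time>\S+) (?P<request>\S+) (?P<resp_time>-?\d+)$")


def expected_stats(lines):
    """Return count, max, min and mean of resp_time for each request."""
    values = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines the pattern would not parse
        values[match["request"]].append(int(match["resp_time"]))
    return {
        request: {
            "count": len(v),
            "max": max(v),
            "min": min(v),
            "mean": sum(v) / len(v),
        }
        for request, v in values.items()
    }


sample = [
    "2019-03-18T09:00:00 /login 100",
    "2019-03-18T09:00:01 /login 200",
    "2019-03-18T09:00:01 /index.html 50",
]
print(expected_stats(sample))
```

Feeding it the lines written during one aggregation period gives the numbers that `resp_time_count` and the other basicstats fields should match for each request.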
Actual behavior:
The basicstats aggregator is counting only: