Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing metrics {type="Database size"} for inputs.sqlserver when output to prometheus_client when running telegraf on windows. #3349

Closed
mplisov opened this issue Oct 17, 2017 · 12 comments
Labels
area/prometheus bug unexpected problem or unintended behavior

Comments

@mplisov
Copy link

mplisov commented Oct 17, 2017

Seeing strange behaviour with sqlserver metrics on telegraf 1.4.2 on Windows.
Prometheus output misses {type="Database size"} metrics block completely.
I can see those metrics in --test run of telegraf though.
Second note: Issue doesn't appear when running same version of telegraf on linux.

SQL servers and connection credentials are the same (checked with sql server 2008,2012,2014)

@mplisov mplisov changed the title Missing metrics for inputs.sqlserver when output to prometheus_client when running telegraf on windows. Missing metrics {type="Database size"} for inputs.sqlserver when output to prometheus_client when running telegraf on windows. Oct 17, 2017
@danielnelson
Copy link
Contributor

Can you show what the output looks like on Linux vs Windows? Can you also show the relevant output when using --test?

@danielnelson danielnelson added area/prometheus bug unexpected problem or unintended behavior labels Oct 17, 2017
@mplisov
Copy link
Author

mplisov commented Oct 18, 2017

Output is quite big.
Example of missing data
Here is the sample from --test run (its identical on windows and linux):

> Log\ size\ (bytes),type=Database\ size,servername=SQL 5nine_monitor=1064960i,smg_pmc=72679424i,tempdb=18677760i,enterprise20=33751040i,agroplan_prod=789250048i,master=1835008i,model=1048576i,conversion3=78577664i,smg_itil=1538260992i,conversion2=445382656i,msdb=2621440i,ReportServerTempDB=1064960i,kmszarauto=15162540032i,SQLSentry=158334976i,agroplan_copy=789250048i,smg_docmngcorp=27262976i,smg_hrm_82_regl=838402048i,AgroPlan=789250048i,smg_modeling=251330560i,ReportServer=9633792i,MdxStepByStep=40239104i,psp_dw=174325760i,smg_82=20578304i,smg_buh_20=37158912i,Sqlbi_ManyToMany_Multidimensional=5242880i,courseconstructor=2097152i 1508285358000000000
> Rows\ size\ (bytes),servername=SQL,type=Database\ size SQLSentry=367001600i,agroplan_copy=78708736i,model=4259840i,ReportServerTempDB=4259840i,conversion3=153092096i,smg_modeling=233897984i,smg_hrm_82_regl=319881216i,smg_82=1016070144i,smg_itil=1988165632i,master=5111808i,enterprise20=5460852736i,conversion2=2602631168i,courseconstructor=58785792i,agroplan_prod=133234688i,ReportServer=5308416i,Sqlbi_ManyToMany_Multidimensional=10485760i,tempdb=180682752i,5nine_monitor=4259840i,AgroPlan=133234688i,smg_docmngcorp=375586816i,MdxStepByStep=106561536i,msdb=225640448i,kmszarauto=6821773312i,smg_pmc=29425664i,psp_dw=104857600i,smg_buh_20=7089881088i 1508285358000000000

@mplisov
Copy link
Author

mplisov commented Oct 18, 2017

Figured out than metrics of type="Database IO" also missing on Windows instance. It seems that bug somewhere in parsing data to Prometheus format, because in --test output they are present.
Example of missing data

> Log\ reads\ (bytes/sec),servername=SQL2012,type=Database\ IO,host=sql2012 model=0i,tempdb=0i,breedfarm=0i,newvysokovskoe=0i,crm_demo=0i,trade.standard=0i,erp2_demo=0i,enterprise20_copy=0i,trade11.standard=0i,newzalmilk=0i,erp2_agr_demo=0i,master=0i,erp2_demo1=0i,enterprise20_test=0i,erp_eurocond=0i,nmsztrade=0i,zalfoods=0i,docmngcorp_demo=0i,Total=0i,OpManagerDB=0i,consolidationprof=0i,learn_expert1c=0i,crm3_demo=0i,docmngcorp_test=0i,enterprise20_copy2=0i,msdb=0i,nmsztrade.sample=0i,enterprise20_demo=0i,enterprise_standard=0i 1508282208000000000

@danielnelson
Copy link
Contributor

My access to a Windows machine is limited, but I will try to reproduce this Friday.

@danielnelson
Copy link
Contributor

I created a powershell script that echos some of the lines from the --test output, and ran that as an exec plugin, but I wasn't missing any data from the prometheus output:

# HELP Log_reads__bytes_sec__enterprise20_demo Telegraf collected metric
# TYPE Log_reads__bytes_sec__enterprise20_demo untyped
Log_reads__bytes_sec__enterprise20_demo{host="sql2012",servername="SQL2012",type="Database IO"} 0

Can you enable a file output and check if the missing data is written?

@mplisov
Copy link
Author

mplisov commented Nov 21, 2017

I think I've found the bug. For some strange reason if I specify connection host as fqdn/hostname, I get missing metrics, but when I specify IPv4 address like 127.0.0.1 it works as expected. This explains why metrics collected from telegraf on linux were ok, because linux machine was connecting using IPv4 address. Good question why this happens...

@danielnelson
Copy link
Contributor

Wow, thats odd. Can you duplicate this just using curl or similar and diffing the output?

@mplisov
Copy link
Author

mplisov commented Nov 21, 2017

Well, it was too quick decision. I was wrong on ipv6 thing, sorry.
Attempt #2. I can reproduce the error-behaviour when telegraf is running as windows service as LocalSystem account. When change LocalSystem account to my domain account or run it from comandline everything is ok. I will check it on couple other servers in two days and report back.

@allangood
Copy link

allangood commented Apr 5, 2018

Hi Daniel,

I'm getting this exactly behavior, but after upgrade from 1.5.3 to 1.6.X or 1.7.X for both, Windows or Linux boxes.

Some outputs like inputs.mem and inputs.sqlserver just disappears from the prometheus exposed list (http://myserver/metrics) but works perfectly with --test switch on command line.

I've tried to run as different user (on windows) and using IP address instead of name as well, and none worked.
There's no relevant logs on log file or running with "debug" flag as true.

I will be glad to provide more information with you need.

Thank you.

My configuration:

[global_tags]
  collector = "telegraf"
  fqdn = "myserver.mydomain"
  location = "mylocation"
  ostype = "windows"
[[outputs.prometheus_client]]
  expiration_interval = "60s"
  listen = "0.0.0.0:9273"
[[inputs.mem]]
[[inputs.sqlserver]]
  query_version = 2
  servers = [ "Server=localhost\\instance;User Id=telegraf;Password=telegraf;app name=telegraf;encrypt=disable;log=0"]	

Output from command line:

c:\Program Files\telegraf>telegraf.exe -config "c:\Program Files\telegraf\telegraf.conf" --config-directory "c:\Program Files\telegraf\telegraf.d" --test --input-filter mem

> mem,collector=telegraf,fqdn=myserver,host=myserver,location=mylocation,ostype=windows active=0i,available=8255950848i,available_percent=48.057264930773286,buffered=0i,cached=0i,free=i,inactive=0i,slab=0i,total=17179402240i,used=8923451392i,used_percent=51.942735069226714,wire=0i 1522960875000000000

Output from curl "http://myserver/metrics":

# TYPE mem_available_percent counter
mem_available_percent{collector="telegraf",fqdn="myserver",host="myserver",location="mylocation",ostype="windows"} 48.13911625367473
# HELP mem_used_percent Telegraf collected metric
# TYPE mem_used_percent counter
mem_used_percent{collector="telegraf",fqdn="myserver",host="myserver",location="mylocation",ostype="windows"} 51.86088374632527

Output from version 1.5.3:

curl http://myserver02/metrics

# HELP mem_active Telegraf collected metric
# TYPE mem_active counter
mem_active{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 0
# HELP mem_available Telegraf collected metric
# TYPE mem_available counter
mem_available{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 5.204361216e+09
# HELP mem_available_percent Telegraf collected metric
# TYPE mem_available_percent counter
mem_available_percent{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 17.31076795555045
# HELP mem_buffered Telegraf collected metric
# TYPE mem_buffered counter
mem_buffered{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 0
# HELP mem_cached Telegraf collected metric
# TYPE mem_cached counter
mem_cached{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 0
# HELP mem_free Telegraf collected metric
# TYPE mem_free counter
mem_free{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 0
# HELP mem_inactive Telegraf collected metric
# TYPE mem_inactive counter
mem_inactive{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 0
# HELP mem_slab Telegraf collected metric
# TYPE mem_slab counter
mem_slab{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 0
# HELP mem_total Telegraf collected metric
# TYPE mem_total counter
mem_total{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 3.0064300032e+10
# HELP mem_used Telegraf collected metric
# TYPE mem_used counter
mem_used{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 2.4859938816e+10
# HELP mem_used_percent Telegraf collected metric
# TYPE mem_used_percent counter
mem_used_percent{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 82.68923204444955

@danielnelson
Copy link
Contributor

@allangood Thanks for the help testing, I can reproduce this and will work on a fix. I think this is different from the original issue though, can you open this as a new issue?

@danielnelson
Copy link
Contributor

@mplisov I hope you were able to resolve the original problem, I'm going to assume no new is good news and close this issue, let me know if it should be reopened.

@allangood
Copy link

allangood commented Apr 5, 2018

Hi @danielnelson,

I opened issue number #3977

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/prometheus bug unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

3 participants