Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

386fix #4076

Closed
wants to merge 9 commits into from
Closed

386fix #4076

wants to merge 9 commits into from

Conversation

srclosson
Copy link

Required for all PRs:

  • [*] Signed CLA.
  • [*] Associated README.md updated.
  • [*] Has appropriate unit tests.

TECKCOMINCO\sclosso3 and others added 3 commits April 18, 2018 13:10
…re out how to separate a 32bit build from a 64 bit build.

Enhancement: Allow collecting of perfmon data from a remote node
Enhancement: Allow collecting data using the timestamp from the source and don't re-timestamp
Fix: 32 and 64 bit versions split out into separate files
Copy link
Contributor

@danielnelson danielnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @srclosson for taking the time to look at this code, getting these types right is quite tricky. Below are some comments, but also make sure take a look at the unit test failures on appveyor, they run on a 64-bit Windows system.

if ret == ERROR_SUCCESS {
ret = PdhGetFormattedCounterArrayDouble(metric.counterHandle, &bufSize,
&bufCount, &emptyBuf[0]) // uses null ptr here according to MSDN.
ret = PdhGetFormattedCounterArrayDouble(metric.counterHandle, &bufSize, &bufCount, &emptyBuf[0]) // uses null ptr here according to MSDN.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this line since no change and had better wrapping, we try to wrap at about 80chars.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roger doger

if ret == PDH_MORE_DATA {
filledBuf := make([]PDH_FMT_COUNTERVALUE_ITEM_DOUBLE, bufCount*size)
if len(filledBuf) == 0 {
continue
}
ret = PdhGetFormattedCounterArrayDouble(metric.counterHandle,
&bufSize, &bufCount, &filledBuf[0])
ret = PdhGetFormattedCounterArrayDouble(metric.counterHandle, &bufSize, &bufCount, &filledBuf[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this line

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted

@@ -90,6 +97,7 @@ type item struct {
counter string
instance string
measurement string
nodename string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call this computer to match the PDH documentation: https://msdn.microsoft.com/en-us/library/windows/desktop/aa373193(v=vs.85).aspx

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Good idea.


if instance == "------" {
query = "\\" + objectname + "\\" + counter
query = "\\\\" + nodename + "\\" + objectname + "\\" + counter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the documentation this doesn't seem like a valid path if there is no computer name, you would get:

\\\Process(*)\% Processor Time

Can you verify if this works?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can tell you now. The node name is required to be specified. I personally like this, because the measurement gets tagged with the node name, which is now different from the collecting host, but it's not backwards compatible.

Let me check a different idea in which would be backwards compatible.

[[inputs.win_perf_counters.object]]
# Processor usage, alternative to native, reports on a per core.
# Processor usage, alternative to native, reports on a per core.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use 2 space indention in the sample config

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I must have used a tab when I meant a space.

@@ -38,6 +38,12 @@ It is recommended NOT to use this on OSes starting with Vista and newer because
Example for Windows Server 2003, this would be set to true:
`PreVistaSupport=true`

#### UseWinTimestamps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a more descriptive name would be UsePerfCounterTime?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. I like it. Changed

}

func PdhUseWinTimestamps() {
pdh_CollectQueryDataWithTime = libpdhDll.MustFindProc("PdhCollectQueryDataWithTime")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do like above on AddEnglishCounterW?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I learn something new everyday. Much cleaner. Thank-you

pdh_CollectQueryDataWithTime = libpdhDll.MustFindProc("PdhCollectQueryDataWithTime")
}

func PdhConnectMachine(szMachineName string) uint32 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't seem to find where this is called from, is it used?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, and I neglected to write a comment header stating this. In experimenting I had added the call, but ended up not requiring it (it was actually orders of magnitude slower doing the connect call and then querying for the counters. I've noted this in a comment header.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you instead remove all the calls that are not used?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure. I can. I just seems a shame, the call works, and may be useful in the future. Perhaps there is an alternative place to put it? I'll remove it for now. At least it will be in my git history if I need to access it again.

// }
// fmt.Println("Converted bytes: ", n)
//
func WideCharToMultiByte(codePage uint32, dwFlags uint32, wchar *uint16, nwchar int32, str *byte, nstr int32) (nwrite int32, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this called from?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not. It was one of those things I used to try and fix the win32 bug without padding the structures. The call itself works good and I thought it might be useful in the future. I've noted this in a comment, but I can remove it as well.

}

var (
// Library
libpdhDll *syscall.DLL
libkrnDll *syscall.DLL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move all the kernel32 code to a new file: kernel32.go

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Done

@@ -50,6 +56,16 @@ beneath the main win_perf_counters entry, `[[inputs.win_perf_counters]]`.

Following this is 3 required key/value pairs and the three optional parameters and their usage.

#### Computer
**Required**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mark this as optional

- Adding a computer name is optional
- Removed unused code
- Fixed another few instances of uint32 with [4]byte
@srclosson
Copy link
Author

Can this be re-reviewed?

@srclosson srclosson closed this Apr 29, 2018
@srclosson srclosson reopened this Apr 29, 2018
@danielnelson
Copy link
Contributor

Can you take another pass and remove all code that is no actively being used right now. If we need to bring back in this code we can grab it either from your tree or from https://github.com/lxn/win (who authored most of the Windows code).

Also, it looks like #4036 conflicted with the changes here so we will need to sort that out.

@danielnelson
Copy link
Contributor

closes #1899

@danielnelson danielnelson added the area/windows Related to windows plugins (win_eventlog, win_perf_counters, win_services) label May 8, 2018
@danielnelson
Copy link
Contributor

closes #2468

@park-youngjoon
Copy link

Are you in progress now?

func UTF16PtrToString(s *uint16) string {
if s == nil {
return ""
}
return syscall.UTF16ToString((*[1 << 29]uint16)(unsafe.Pointer(s))[0:])

return syscall.UTF16ToString((*[20]uint16)(unsafe.Pointer(s))[:])
Copy link
Contributor

@vlastahajek vlastahajek May 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, this is not right. You have to cast pointer to large enough array, otherwise you could get out of bounds.
There is 100% probability that strings with more than 20 characters go though this function .
Keeping original will not harm 32-bit support, as array size is type int, which is at least 32 bits, so [1<<29] is OK

@danielnelson
Copy link
Contributor

@srclosson I believe the 32-bit path is fixed by #4189, but I believe you will need to base the remote computer and counter times fixes on the new code. I'm going to close this pull request since I think it will be easier to open a new one.

@vlastahajek
Copy link
Contributor

vlastahajek commented May 25, 2018

@danielnelson, #4189 didn't contain fix for 386 issue. It doesn't work in 32bit build, I tested it.
I deliberately skipped 32bit support, to later merge it with this PR.
As #4189 is merged now, I can:

  • either help @srclosson with merging of current master to 386fix branch
  • or create separated fix for 32bit support in different PR.

Maybe it would be more convenient to keep 386 support separated from other features (collecting from remote computers and adding timestamp to metrics)

@srclosson
Copy link
Author

Okay... So what's the status here? @danielnelson would you comment on how it's best to proceed?

@danielnelson
Copy link
Contributor

I'm not getting the crash anymore when I compile with GOARCH=386 and run on a 64-bit system, without #4189 it crashes immediately. That said I haven't tested anything more than running --test on the default Windows config.

@srclosson Depends on how much time you have available, I'm hoping to release 1.7.0-rc1 at the end of next week, and it would be great if we could have the 32-bit fix in. If your schedule is full until then it might be best for @vlastahajek to adapt your work so we can get it in for 1.7.

Ideally we would have separate pull requests for 32-bit fix and the new features.

@vlastahajek
Copy link
Contributor

@danielnelson: Yes, it's not crashing, at least for several tens minutes, but it writes invalid values for all the stuff, measurement names, fieldnames, fieldvalues.

As discussed in the original issue, there is problem with aligning data within structures with different field types between Go and Wind 32 API.

Crash before occurred fast because there were, IMHO, wrong allocation of the buffer for PDH_FMT_COUNTERVALUE_ITEM_DOUBLE array.

I've prepared separated fix for 386problem, cause I primarily needed to fix my code for 32bit support.
It took me some time to figure out where to place padding in PDH_COUNTER_INFO structure.
It's now ready and tested and all I need to do is create a PR.

@srclosson: Now it depends on you. Did you have a time to look at changes? I did quite a large refactor.

@danielnelson
Copy link
Contributor

@vlastahajek Can you go ahead and open a PR for the 32-bit fix?

@vlastahajek
Copy link
Contributor

Ok, #4206

@vlastahajek
Copy link
Contributor

vlastahajek commented May 29, 2018

I merged @srclosson's branch with master. The question is how to push it.

@srclosson, As those features are your work, I would prefer if you could add me as a contributor to your forked telegraf repo and will commit it there. We can then continue with this PR (rename it) and with comments.

My merges is purely melding my refactor and fixes with @srclosson's enhacements, except one change, that I removed the Computer param from example config, because the default value localhost doesn't work on Windows 10. You have to specify local computer name to have it working.

I would also propose a few improvements:

  • Rename the Computer parameter to RemoteHost, or so.
  • Allow to monitor multiples hosts - if we change monitoring to remote host, we loose ability to monitor host where telegraf is running easily. And if we have to monitor more such nodes? Mutiple instances of telegraf doesn't seem to be nice solution.

Both should be easy to implement. Multiple remote host can be easily added as stacking performance counters for multiple hosts, although solution using more workers would be prettier.

@danielnelson
Copy link
Contributor

@vlastahajek In the meantime if you have time to prepare a branch with your changes that would be nice.

I think the RemoteHost change is a good one, but for multiple hosts doesn't it work with a mix of local and remote hosts?

[[inputs.win_perf_counters.object]]
  RemoteHost = "Foo"
  ObjectName = "Processor"
  Instances = ["*"]
  Counters = ["% Idle Time"]
  Measurement = "win_cpu"

[[inputs.win_perf_counters.object]]
  RemoteHost = "Bar"
  ObjectName = "Processor"
  Instances = ["*"]
  Counters = ["% Idle Time"]
  Measurement = "win_cpu"

[[inputs.win_perf_counters.object]]
  ObjectName = "LogicalDisk"
  Instances = ["*"]
  Counters = ["% Idle Time"]
  Measurement = "win_disk"

@DmitryMaletin
Copy link

DmitryMaletin commented Jun 1, 2018

Hi,
I play with forked version of telegraf

My config looks like

[[inputs.win_perf_counters.object]]
Servers = ["$COMPUTERNAME", "RemoteHost","RemoteHost"]
ObjectName = "Processor"
Instances = ["*"]
Counters = ["% Idle Time"]
Measurement = "win_cpu"

So it's work with mixed set of hosts
Generated path looks like
counterPath = "\\" + server + "\" + objectname + "\" + counter

@vlastahajek
Copy link
Contributor

vlastahajek commented Jun 1, 2018 via email

@DmitryMaletin
Copy link

@vlastahajek
If I understood correctly you propose to have a counter per host config.
In my case it's not very convenient cuz I can have a dozen of host to get exactly the same counters.

My current fork: https://github.com/DmitryMaletin/telegraf/blob/dma/plugins/inputs/win_perf_counters/win_perf_counters.go

@vlastahajek
Copy link
Contributor

Host per counter object is as it is now.

I would vote the same as you, having global remote host(s) for all counters

@danielnelson
Copy link
Contributor

Would plugin level be reasonable? Remember you can always have multiple plugins defined. Obviously not as fine grained as per object but in many cases this will be easier since you don't have to continually redefine the same remote hosts:

[[inputs.win_perf_counters]]
  # If empty, collect only from localhost
  Servers = ["$COMPUTERNAME", "RemoteHost","RemoteHost"]

  [[inputs.win_perf_counters.object]]
    ObjectName = "Processor"
    Instances = ["*"]
    Counters = ["% Idle Time"]
    Measurement = "win_cpu"

@DmitryMaletin
Copy link

yes. for me it sounds reasonable.

@vlastahajek
Copy link
Contributor

So, how to proceed further?

@srclosson, Are going to look at it?

@srclosson
Copy link
Author

@danielnelson/@vlastahajek, I can certianly look at it. @danielnelson, could we get on the phone for 30 minutes? I've got a lot of stuff to contribute, and I think it would make sense to bring you up to speed on how to best go about merging some of it. There is a lot... Everything from a GPS Nmea plugin, to generic alarm parsing, windows event logs, webdav/sftp data transfer etc... I just don't want to hold up the line and would like to just show you through the ones I've developed, get your feedback, and some ideas on how I could/should go about merging them...

@srclosson
Copy link
Author

Another idea I had for the remote nodes is to do something like, (pseudocode)

for _, entry := range PerfCounterItems {
wg.Add(1)
go GetAndProcessData(entry)
}
wg.Wait()

This way, all of the requests get sent off at the same time. Probably more performant than doing each one individually...

@danielnelson
Copy link
Contributor

Yes, this would be a good thing, ideally network operations are done concurrently with a connection limit and with timeouts. Support for cancellation using a Context is also a nice touch but Telegraf doesn't have a way to do this yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/windows Related to windows plugins (win_eventlog, win_perf_counters, win_services)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants