Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metricbeat] Filesystem Metricset: Inconsistent system.filesystem.type values on Windows #22501

Closed
andrewstucki opened this issue Nov 9, 2020 · 20 comments · Fixed by #22531 or #24502
Closed
Assignees
Labels
Metricbeat Metricbeat Team:Services (Deprecated) Label for the former Integrations-Services team

Comments

@andrewstucki
Copy link

The filesystem metricset in Metricbeat on Windows appears to have bad type information.

From the docs:

  • system.filesystem.type - The disk type. For example: ext4

What the docs are pointing to is the file system disk format. On Windows, however, this gets filled in by TypeName field from the underlying gosigar library.

In the gosigar library this appears to be referencing the actual drive type (removable, fixed, ramdisk, etc.) rather than the filesystem format. It looks like it's been that way for the past 3 years, starting with this PR: #4717 -- but there's not enough context in the PR to figure out why we're using that field on Windows hosts.

From a quick perusal of the gosigar code, it appears like the filesystem format (SysTypeName) might never get set in the FileSystem structure on Windows, which, coupled with a desire to fill this in on Windows in beats and not realizing the difference between TypeName/SysTypeName might explain the conditional in metricbeat.

I'm pretty sure that getting the right information would require extending gosigar to fill in the SysTypeName information based off of an underlying WinAPI call to GetVolumeInformationW.

@andresrc andresrc added the Team:Services (Deprecated) Label for the former Integrations-Services team label Nov 10, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-services (Team:Services)

@fearful-symmetry
Copy link
Contributor

@narph can you look at this? This isn't the first time we've had issues with the API calls used in gosigar and it's probably a sign that we need to look at deprecating that library.

@andrewstucki
Copy link
Author

FYI, in the meantime elastic/gosigar#146 extends gosigar to include the filesystem type support, so think we should be able to cut a release and fix the conditional to get this functional prior to any larger work of deprecating the library.

@fearful-symmetry
Copy link
Contributor

Ah, didn't even notice that PR! We'll see if we can cut a release and go from there.

@narph
Copy link
Contributor

narph commented Nov 11, 2020

@narph can you look at this? This isn't the first time we've had issues with the API calls used in gosigar and it's probably a sign that we need to look at deprecating that library.

@fearful-symmetry , I am not sure why we used GetDriveTypeW for the filesystem type, I do see we are not using SysTypeName and instead we map the drive type to TypeName and use that.
I have tested the PR @andrewstucki merged and it looks good , it returned the ntfs type on my windows machine.

I agree with re-evaluating the gosigar library, we have some open issues and concerns raised as well in the 152 excel doc.

@anderssonjohan
Copy link

@andrewstucki @narph @fearful-symmetry This change broke metricbeat on Windows systems, I posted a script here with some sample output: https://discuss.elastic.co/t/metricbeat-7-11-1-fail-in-system-fsstat-and-filesystem-the-device-is-not-ready/265613/3

Don't know how to fix this, but let me know if I can be to any help. 🙏
We're currently rolling back to 7.10 which still works. We discovered this bug this week when upgrading to 7.11 when we stopped getting file system usage stats (system.filesystem.used.pct) on every windows system in Kibana.

@fearful-symmetry
Copy link
Contributor

The change adds a call to Kernel32.GetVolumeInformationW to all drives on the system, but this will fail for certain drive types.

For example, on our system we get the drives A:\ C:\ D:\ L:\ where A is a floppy and D is a CD-ROM.
Both these two returns false when calling GetVolumeInformationW and that will stop metricbeat from working.

Well, I didn't see that coming. I think @andrewstucki is going to have to take a look at that, since it's a little too windows-heavy for me to be confident, but it seems like we can just filter drive types before we get additional information on them?

@leandroscardua
Copy link

Hi Elastic Team,

Could you please re-open this ticket?
The issue still present on the version 7.11.1

@fearful-symmetry
Copy link
Contributor

@leandroscardua Can you elaborate and post your configuration and related events? The actual type issues present originally, or the failures mentioned in the previous few comments?

@EricDavisX EricDavisX reopened this Mar 1, 2021
@mgaruccio
Copy link

Just a note that we're seeing the same problem that @anderssonjohan is seeing with the 7.11.1 version of metricbeat embedded in the elastic agent. Tracking the issue with Elastic support on case 00688041 if any additional environment-specific details are needed.

@leandroscardua
Copy link

HI @fearful-symmetry,

Same issue reported by @anderssonjohan
#22501 (comment)

@nhasanit
Copy link

nhasanit commented Mar 2, 2021

This issue also exists in Elastic Agent 7.11.1

@mitchelldavis44
Copy link

Bump, any update on this?

@fearful-symmetry
Copy link
Contributor

I'm waiting for some help from @narph, since she's the resident windows expert.

@narph
Copy link
Contributor

narph commented Mar 10, 2021

I was able to reproduce the issue using a portable DVD writer, looks like the GetVolumeInformationW win32 api doesn't handle well these types of ephemeral devices. When the CD-ROM is empty then the function returns Device not ready error.
I've created an initial PR to make sure we ignore these devices if this error is encountered elastic/gosigar#159.

@anderssonjohan
Copy link

Whohoo, that's awesome @narph! Great work! 🙏

@yevgenytrcloudzone
Copy link

Hi, the problem still exist in 7.12.0 Metricbeat for Windows.
I have also tried the filesystem.ignore_types: [cdrom] method for the system module YAML configuration, but it won't help.
Is it suggested to use an older version of Metricbeat or a fix is coming soon (it's a VM on ESX where I can simply disconnect the virtual CDROM)?

@narph
Copy link
Contributor

narph commented Apr 19, 2021

@yevgenytrcloudzone

Is it suggested to use an older version of Metricbeat or a fix is coming soon (it's a VM on ESX where I can simply disconnect the virtual CDROM)?

Unfortunately, there is no workaround for the current version so, indeed, best solution for now is using an earlier version of Metricbeat instead.

@eedugon
Copy link
Contributor

eedugon commented Jun 29, 2021

For reference purposes I believe this issue is fixed on 7.12.1 onwards, and not on 7.12.0.

@scottdfedorov
Copy link

For reference purposes I believe this issue is fixed on 7.12.1 onwards, and not on 7.12.0.

Can confirm. Working in 7.12.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Metricbeat Metricbeat Team:Services (Deprecated) Label for the former Integrations-Services team
Projects
None yet