Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metricbeat: Remove duplicated filesystems in system module #6819

Merged
merged 6 commits into from
Apr 16, 2018

Conversation

jsoriano
Copy link
Member

@jsoriano jsoriano commented Apr 10, 2018

Mountpoints whose device name is the same absolute path are considered to be the same and are counted only once. The directory name used is the shorter between them. If the device of a filesystem is a directory, it is considered some kind of bind mount and is ignored. This avoids duplication of filesystems as can happen with bind mounts in Linux (#3384). This mimics the behaviour of df command.

There are still filesystems that can be counted twice if union filesystems are used (as aufs or overlay), but this can be controlled with the filesystem.ignore_types option. Maybe we should blacklist them by default. Update: For that, if the option is left empty, we fill the list of ignored types with all nodev devices in /proc/filesystems where this file exists.


// If a block device is mounted multiple times (e.g. with bind mounts),
// store it only once, and use the shorter mount point path.
if seen, found := devices[fs.DevName]; found {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not completely accurate, there can be different device paths that are really pointing to the same block device, but I don't believe this is common at all. To control this we should check duplicates by device identifier (Major:Minor) but this is not supported in gosigar yet. Does it worth to implement this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave it what you have for now but potentially follow up with a change in gosigar to make it possible if it is a case that can happen (more then once :-)). @andrewkroh WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it will help make Metricbeat more accurate then I'm +1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take a look to this in gosigar, meanwhile I think we are good enough with the checks based on device name.

@jsoriano jsoriano force-pushed the fsstat-filter-bind-mounts branch 3 times, most recently from b304e19 to 9c8173b Compare April 10, 2018 16:22
@jsoriano jsoriano added the in progress Pull request is currently in progress. label Apr 10, 2018
@jsoriano
Copy link
Member Author

Please don't merge this, I have just seen that gosigar obtains list of filesystem from mtab, what doesn't include the device name for bind mounts.

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of having some defaults for filesystem.ignore_types. Someone could still set it to filesystem.ignore_types: [ ] if we wants all types.


// If a block device is mounted multiple times (e.g. with bind mounts),
// store it only once, and use the shorter mount point path.
if seen, found := devices[fs.DevName]; found {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave it what you have for now but potentially follow up with a change in gosigar to make it possible if it is a case that can happen (more then once :-)). @andrewkroh WDYT?

},
},
{
description: "Don't repeat devices, sortest of dir names should be used",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: shortest, also in all the ones below

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:D

@jsoriano
Copy link
Member Author

On linux we could maybe ignore by default all types marked as nodev in /proc/filesystems.

@ruflin
Copy link
Contributor

ruflin commented Apr 11, 2018

@jsoriano I'm ok with excluding nodev type by default. Your comment reminded me that as part of this PR we should extend the docs for filesystem and explain in detail what we include and exclude. This question will pop up and it's good to have a place to link to and look it up. This should also make it easier to revisit the decision later again.

@jsoriano jsoriano force-pushed the fsstat-filter-bind-mounts branch from 9c8173b to cf2673f Compare April 11, 2018 11:13
@jsoriano jsoriano removed the in progress Pull request is currently in progress. label Apr 11, 2018
@jsoriano
Copy link
Member Author

jsoriano commented Apr 11, 2018

Two more changes added:

  • Filesystems whose device name is a directory are not collected, it is assumed to be some kind of bind mount.
  • If no filesystem.ignore_types is set, it is set to the list of known virtual filesystems in the system, it is by now the list of types marked as nodev in systems where /proc/filesystems is available. Documentation has been updated for this.

@jsoriano
Copy link
Member Author

Testing it in a linux machine it looks quite better now with the default settings.

@@ -32,6 +34,13 @@ func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
return nil, err
}

if len(config.IgnoreTypes) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behaviour I would have expected that if a user sets filesystem.ignore_types: [] that then no file systems would be ignore. But checking for 0 indicates that it also applies in this case. Could we instead check if the config is set?

If we go with the above suggestion, the reference config file should be adjusted to contain the two values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right, checking nil instead of zero lenght. I have added a comment also in reference config.

@jsoriano jsoriano force-pushed the fsstat-filter-bind-mounts branch from cf2673f to fe6c17c Compare April 11, 2018 12:14

// If the device name is a directory, this is a bind mount or nullfs,
// don't count it as it'd be counting again its parent filesystem.
devFileInfo, err := os.Stat(fs.DevName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the original reasons we added ignore_types was to avoid doing a statfs on autofs filesystems. Metricbeat was causing autofs mounts to never be unmounted because its continuous metric gathering was making autofs think the mount was being accessed.

Will this cause the same problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.
Here the stat is done against the device name, on #4823 the mentioned issued seemed to appear when doing statfs on the mount point (DirName). So I don't think it can cause the same problem.

@jsoriano jsoriano force-pushed the fsstat-filter-bind-mounts branch from fe6c17c to 5427e7a Compare April 11, 2018 16:30
Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@andrewkroh
Copy link
Member

jenkins, test it

@jsoriano jsoriano force-pushed the fsstat-filter-bind-mounts branch from 5427e7a to a0acdd5 Compare April 12, 2018 15:32
@ruflin
Copy link
Contributor

ruflin commented Apr 13, 2018

@jsoriano I think the failing windows tests are related to this change.

@jsoriano jsoriano force-pushed the fsstat-filter-bind-mounts branch from a0acdd5 to e608ad9 Compare April 13, 2018 07:59
@jsoriano
Copy link
Member Author

Removing filtering on Windows

@ruflin
Copy link
Contributor

ruflin commented Apr 13, 2018

Something went wrong in the helper:

darwin/beat/metricbeat/label/macosx/src/github.com/elastic/beats/metricbeat/module/system/filesystem/helper.go:41:3: not enough arguments to return
08:29:51 	have ([]gosigar.FileSystem)
08:29:51 	want ([]gosigar.FileSystem, error)

@jsoriano jsoriano force-pushed the fsstat-filter-bind-mounts branch from e608ad9 to fb6a7bd Compare April 13, 2018 15:16
@jsoriano
Copy link
Member Author

jenkins, test again please

Mountpoints whose device name is the same absolute path are considered
to be the same and are counted only once. The directory name used is
the shorter between them. This avoids duplication of filesystems as can
happen with bind mounts in Linux (elastic#3384).
In system module, when no filesystem type is set to be ignored, it
tries to make a sane guess. In systems with /proc/filesystems file it
ignores all devices marked as `nodev`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants