filebeat wildcard for directories #2084

Closed
runningman84 opened this issue Jul 22, 2016 · 25 comments

Comments

@runningman84

It looks like filebeat does not support wildcards in directories like this:

/opt/codedeploy-agent/deployment-root/*/*/logs/scripts.log
@ruflin
Member

ruflin commented Jul 25, 2016

This is related to https://github.com/elastic/filebeat/issues/68. Wildcards for directories should already be supported. What behaviour were you expecting? Are you looking for **, which crawls all subdirectories (and is not supported at the moment)?

@runningman84
Author

My folders look like this

/opt/codedeploy-agent/deployment-root/4b74562c-cee3-40f5-8d36-6f588eeed802/d-KYVV5AYKF/logs/scripts.log
/opt/codedeploy-agent/deployment-root/b41f722c-80e8-41e9-866e-1f11228f5ab3/d-UIIXR9K9F/logs/scripts.log

This config does not seem to work:

'/opt/codedeploy-agent/deployment-root/*/*/logs/scripts.log'

Do I need to change the config to this?

'/opt/codedeploy-agent/deployment-root/**/logs/scripts.log'

@ruflin
Member

ruflin commented Jul 25, 2016

I would expect it to work, but TBH so far I have only tested examples like /opt/*/scripts.log, meaning only one directory with a * pattern. Could you briefly test whether /opt/codedeploy-agent/deployment-root/4b74562c-cee3-40f5-8d36-6f588eeed802/*/logs/scripts.log works for you? Which filebeat version are you using?

@runningman84
Author

yes
/opt/codedeploy-agent/deployment-root/e8be8394-bb4f-403e-abd8-03045480217d/*/logs/scripts.log
works

@ruflin
Member

ruflin commented Jul 26, 2016

OK, so it seems to work with one wildcard directory but not with two nested ones. We use Glob from Golang directly for the pattern matching: https://golang.org/pkg/path/filepath/#Glob. Currently, all patterns supported by Glob are supported by filebeat.

@runningman84
Author

do you know a pattern which would work here?

@andrewkroh
Member

@ruflin I'm thinking we should list the patterns supported (https://golang.org/pkg/path/filepath/#Match) in our documentation or link to it. Also we should put some info in there to clarify that only one wildcard is supported (assuming I understand correctly)?

If you agree I can open a new issue to add this to the docs.

@ruflin
Member

ruflin commented Jul 27, 2016

@andrewkroh Agree. TBH so far it wasn't clear to me that multiple * do not work. It also doesn't seem to be stated in the Golang docs (or I haven't found it yet).

@wjoel

wjoel commented Sep 12, 2016

The lack of multiple wildcards means it's not possible to have a setup as described in Log Management with ELK for Mesos, where paths have the format /var/lib/mesos/slave/slaves/*/frameworks/*/executors/*/runs/latest/stdout

From golang/go#11862 it seems Golang will not support this any time soon, and the discussion ends with a reference to go-zglob. Would it be possible to use that for the glob patterns in filebeat? According to this commit it looks like a simple change.

@ruflin
Member

ruflin commented Sep 13, 2016

It's definitely worth a discussion. But it seems to me we are discussing two things here:

  • Support for ** which can go into multiple sub directories
  • Support for *, which replaces just one directory level, used multiple times in one path

I'm not so much worried about the implementation work itself, more about the potential side effects we are not yet aware of, some of which are also discussed in the Golang issue.

I especially see the need to support multiple *, but I'm a little bit sceptical about ** support. I assume zglob does both? To keep the default setup stable and reliable, I would suggest that instead of replacing the current implementation with zglob or something similar, we make it a config option that has to be turned on explicitly. This will make it possible for us to better identify problems that could be related to it, and we could test it first.

Are there alternatives to zglob? Could this be done in a few lines in filebeat itself to not have an additional dependency and be able to fix bugs directly?

@runningman84
Author

runningman84 commented Sep 13, 2016

** is not needed for my use case; this would be enough:

'/opt/codedeploy-agent/deployment-root/*/*/logs/scripts.log'

@catjopdx

catjopdx commented Oct 6, 2016

Potential workaround:

Currently, an unknown depth of subdirectories is not supported.

However, if the depth is known, for example exactly 4, you could use wildcards to do something like:
['path/*/*/*/*']

If the depth can be 2, 3 or 4, use ['path/*/*', 'path/*/*/*', 'path/*/*/*/*']

You can also create a shared variable to point to the root path, so your config might look like:
['${fb.watch.path}/*/*', '${fb.watch.path}/*/*/*', '${fb.watch.path}/*/*/*/*']

@tsg
Contributor

tsg commented Oct 9, 2016

Multiple * are already supported. I re-confirmed this with 5.0.0-rc1, but it should also work with 1.3. ** is not supported.

@runningman84 perhaps something else was wrong in your test, can you re-try it, please?

@runningman84
Author

Yes it does work for me now.

@nostrebor

nostrebor commented Oct 28, 2016

This is not working for me either. A single * works, but a pattern of the format:
- \\network-share\subdir\*\*\*\*.log
yields no output. The same behavior is seen in logstash. If I replace it with only a single wildcard:
- \\network-share\1\2\3\4\*.log
the files are discovered by both logstash and filebeat.

@raiusa

raiusa commented Jan 18, 2017

Any update on support of * * in filebeat?
I have logs in the following directory structure:

/opt/cloudera/yarn/container-logs/application_*/container_*/*.log

@raiusa

raiusa commented Jan 18, 2017

Sorry, somehow it's stripping the * from the directories application_* and container_*:
/opt/cloudera/yarn/container-logs/application_/container_/*.log

@ruflin
Member

ruflin commented Jan 19, 2017

@raiusa I updated your post to have it posted as code. There is no update yet on this.

@kzhangworks

kzhangworks commented Jan 25, 2017

My log files are under paths like /data/*/_data/*/*/*/*/*/. Is it possible for filebeat to get the real values behind the * wildcards in the path? Each * represents important information. For instance, for the log file /data/containers/_data/container_1/serverid_123/*/*/*/serverid_123.log, can filebeat extract a value like serverid_123? Thanks a lot.
The following is a snippet of my filebeat.yml for these log files:

filebeat.prospectors:
- input_type: log
  paths:
    - /data/*/_data/*/*/*/*/*/*.log

@ruflin
Member

ruflin commented Jan 26, 2017

@zkf9971 As far as I understand, you want to add the path names as fields to the event? If yes, this would be a different feature request than this one.

@7AC 7AC self-assigned this Mar 16, 2017
@7AC 7AC removed their assignment Apr 5, 2017
ruflin pushed a commit that referenced this issue Apr 26, 2017
Expand double wildcards into standard glob patterns, up to a maximum
depth of 8 levels after the wildcard.

Resolves #2084
@ruflin
Member

ruflin commented Apr 26, 2017

@nostrebor @zkf9971 @raiusa @runningman84 #3980 was just merged into master. It would be great if you could check whether it works for your use case. The snapshot builds can be found here: https://beats-nightlies.s3.amazonaws.com/index.html?prefix=filebeat/

@Subtalime

Am I understanding correctly that this is not intended to work (Filebeat 5.1.1-1.x86_64)?
Prospector Pattern:
logs/app/archive/**/*.log
Dir:
logs/app/archive/dir1/test.log
logs/app/archive/dir1/dir2/another.log

@ruflin
Member

ruflin commented Nov 6, 2017

@Subtalime This change is only in the 6.x releases. For further questions please use discuss.

@hsluoyz

hsluoyz commented Sep 27, 2018

Hi @ruflin

This is related to elastic/filebeat#68 Wildcards for directories should already be supported. What is the behaviour you were expecting? Are you looking for ** which crawls all subdirectories (and is not supported at the moment)?

https://github.com/elastic/filebeat/issues/68 is 404 now. What is the latest link?

@andrewkroh
Member

That repo no longer exists, but you can read about glob support in the Filebeat documentation.

https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html#input-paths
