Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

default filebeat.prospectors.paths to be commented out #3442

Closed
djschny opened this issue Jan 23, 2017 · 8 comments · Fixed by #4105
Closed

default filebeat.prospectors.paths to be commented out #3442

djschny opened this issue Jan 23, 2017 · 8 comments · Fixed by #4105
Labels
discuss Issue needs further discussion. Filebeat Filebeat

Comments

@djschny
Copy link

djschny commented Jan 23, 2017

Currently in filebeat, a fresh out of the box distribution has a default path of /var/log/*.log enabled:

filebeat.prospectors:

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.

- input_type: log

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/*.log
    #- c:\programdata\elasticsearch\logs\*

While this has the benefit/intention of helping making an out-of-the-box good experience for some logs, I would say the any benefits of this are outweighed by the negatives and would instead like to advocate for paths being all commented out by default. The rationale behind this is as follows:

  • Folks new to filebeat that want to try it out and start indexing data are likely to have tons of logs from their laptop indexed into Elasticsearch (hopefully their local instance only); the affect is greater if the user starts filebeat as root.
  • If a user is required to pick a location to read logs from, it helps make sure they actively think about where to read logs from
@andrewkroh andrewkroh added the Filebeat Filebeat label Jan 23, 2017
@ruflin ruflin added the discuss Issue needs further discussion. label Jan 25, 2017
@ruflin
Copy link
Member

ruflin commented Jan 25, 2017

As you mentioned we try to make the beats work out of the box, mean directly shipping data without having to modify the configuration first which also means finding the configuration when installed as package. The default output is elasticsearch at localhost:9200. I would not expect data to be sent to the wrong cluster.

I agree we should find ways to give the user some "incentive" to think about which logs he wants to harvest and configure it accordingly. Perhaps an other default could help?

@djschny
Copy link
Author

djschny commented Jan 25, 2017

I would not expect data to be sent to the wrong cluster.

Correct, that is not the issue/point, sorry if I misrepresented.

The "out of the box" experience we target with the stack in general is when a person sets up the stack on their laptop or development environment correct? In this situation ingesting the OS logs of your personal laptop probably not expected and can have negative side affects. For example in training classes we see this happen with students all the time.

As mentioned, since the default output is localhost:9200 there is an expectation that running in something other than locally, would require modification of the config regardless.

Not sure if my above comments make a difference in the viewpoint on the issue?

@ruflin
Copy link
Member

ruflin commented Jan 26, 2017

I agree that we should improve the behaviour here. The thing I worry about when we just comment it out is that people will start filebeat and say: Hm, nothing happens. The nice part now when doing demo's for example is that lots of data directly shows up in Kibana.

Perhaps this problem solves itself with the addition of the filebeat modules. So if you run filebeat and you want to work it out of the box ./filebeat -module=system and you get the same affect as now and we could comment it out by default. This makes is possible to have a working example from just using the command line without having to modify the config file. Changing output is already possible form the command line with -E output.elasticsearch.hosts=[...].

The above would require that we adapt our getting started guide to focus more on modules.

@djschny
Copy link
Author

djschny commented Jan 26, 2017

The thing I worry about when we just comment it out is that people will start filebeat and say: Hm, nothing happens.

Well that would only be true if no other changes are made. For example if no prospector is defined (ie no paths listed) then as a user I would expect either a warning, error, or message to be reported by filebeat and a message suggesting what I might do.

@ruflin
Copy link
Member

ruflin commented Jan 30, 2017

Currently filebeat will stop and print out and error. But a good error is one of the parts I were struggling with pre existence of modules. It's quite tricky to tell users in a short error message what to exactly do in a yaml file and people will get it right (without messing with the indentation) and people would have to touch the config file already for the first time use (which I don't like). But we have that kind of solved now as we can have something like `No prospector defined, use "./filebeat --module=system" to load your system log files or have the system module as defaults.

@djschny
Copy link
Author

djschny commented Jan 31, 2017

It's quite tricky to tell users in a short error message what to exactly do in a yaml file and people will get it right (without messing with the indentation) and people would have to touch the config file already for the first time use (which I don't like).

Sorry if I'm over trivializing, but isn't this really simply as you put all the details in comments in the yaml file and then in the error from command just point to the appropriate unique test to look at in the yaml file?

@ruflin
Copy link
Member

ruflin commented Feb 1, 2017

Theoretically yes, in practice unfortunately no because of YAML and indentation ... Also it does not solve the problem that I want users to be able to get started without having to touch the config file.

@pbkdf3
Copy link

pbkdf3 commented Feb 20, 2017

I found the current behavior unpleasant as a first time user of filebeat, to the point where I immediately stopped experimenting in fear of other unexpected behavior, until I had time to more carefully read documentation. The potential side effects in this case negate any value this default could provide.

Zero required config is a worthy goal, but filebeat is something that, uh, reads files. Users expect to have to tell it what files. As a user of filebeat or most programs I execute on the command line, I expect it to do nothing, maybe throw an error, if I don't tell it which files to process and what to do with them. Exceptions exist, but are very well known and thus expected (ie: "ls" with no arguments) and generally have no side effects. mv with no arguments does nothing but print it's usage.

The error can be concise and ideally with some nice FAQ in an obvious place for search engines to find and use to help your users understand what to do.

"FileBeat: Nothing to harvest/prospect/whatever, provide a valid harvester/prospector/whatever configuration."

I'm sure it could be phrased better.

ps
feel your pain re: yaml. hate it myself, especially for config files. Perhaps the root of this issue is that it's not easy to concisely explain to your users how to configure properly w/yaml.

tsg pushed a commit to tsg/beats that referenced this issue Apr 25, 2017
This does two changes:

* Adds `enabled: false` in the default prospector in the short and long configs.
  (which fixes elastic#3442)
* Updates the short config files of the modules to include the path definitions.
  I think this is better for a "module first" experience, where it gives a bit of
  context on how they work.

This means that the default configuration starts and gives no errors, but also doesn't
publish anything. IMO, this behaviour is fine considering configuration reloading.
ruflin pushed a commit that referenced this issue Apr 26, 2017
This does two changes:

* Adds `enabled: false` in the default prospector in the short and long configs.
  (which fixes #3442)
* Updates the short config files of the modules to include the path definitions.
  I think this is better for a "module first" experience, where it gives a bit of
  context on how they work.

This means that the default configuration starts and gives no errors, but also doesn't
publish anything. IMO, this behaviour is fine considering configuration reloading.

* Print error in case all prospectors are disabled and config reloading is off

Also, add the `auth` fileset to the config samples, it was missing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issue needs further discussion. Filebeat Filebeat
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants