Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic configuration reloading for modules in Metricbeat #3281

Merged
merged 1 commit into from
Jan 6, 2017

Conversation

ruflin
Copy link
Contributor

@ruflin ruflin commented Jan 4, 2017

Currently if new module configs have to be added or changed, Metricbeat needs to be restarted. This change allows to define a configuration directory where new files can be added, removed or modified and updates will automatically processed by Metricbeat. New modules will started / stopped accordingly. This is especially useful in container environments where 1 container is used to monitor all services in other containers on the same host. New containers appear and disappear dynamically which requires changes to which modules are needed and which hosts must be monitored.

Configuration

The configuration in the main metricbeat.yml config file looks as following:

metricbeat.reload.modules:
  enabled: true
  path: configs/*.yml
  period: 10s

A path with a glob most be defined on which files should be checked for changes. A period is set on how often the files are checked for changes.

The configuration inside the files which are found by the glob look as following:

- module: system
  metricsets: ["cpu"]
  enabled: false
  perdiod: 1s

- module: system
  metricsets: ["network"]
  enabled: true
  period: 10s

Each file directly contains a list of modules. Each file can contain one or multiple module definitions.

How does it work

When Metricbeat is started a process is started that monitors the defined glob files for changes. If a change is detected, all config files are reloaded and module wrappers are created. A hash is created for the module wrapper to identify if it is a module that is already running or a new module. After loading all modules, all modules which are not in the list anymore will be stopped and removed. All new ones will be started.

The effect of the above is that if 2 identical module configuration exists, only 1 is started. It is not expected to have 2 modules with the exact same configuration.

The dynamic modules have no effect on the modules defined in the global configuration file. These modules will never change and keep running. Also if there is a dynamic module with the exact same config as a global module, these two will not be correlated and both will be run.

Changes

  • Introduce configuration variables for config reloading
  • Change metricbeat behaviour that if metricbeat.config.enabled: true, on startup also 0 modules can be defined. Also it is possible that at some point in time no modules are running.
  • Change shutdown to also shut down dynamic modules
  • Expvar stats for reloading added
  • Add common configuration files to metricbeat to have additional configs except from modules
  • Add basic system tests

@ruflin ruflin added in progress Pull request is currently in progress. Metricbeat Metricbeat labels Jan 4, 2017
@urso
Copy link

urso commented Jan 4, 2017

The dynamic modules have no effect on the modules defined in the global configuration file. These modules will never change and keep running.

What if a dynamic module resembles a global module?

Is it an issue to call Connect() for each new module?

No. This is what Connect is for, connecting multiple producers to the publisher pipeline. Just close the client once you're done.

The current hash only takes module level configs into account. Metricset specific options are no added to the hash. Any idea how we could fix this? The problem is that the config for the metricset is only unloaded when the metricset is created (which is good). It would be nice to be able to calculate a "correct" hash based on common.Config or similar.

Hmmm... One solution might be to unpack the full module config into a map[string]interface{} and compute the hash from the map.

go func() {
// Start Config monitoring
configReloader.Start()
}()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

== go configReloader.Start()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

mutex sync.Mutex
}

func NewConfigReloader(config ReloaderConfig, p publisher.Publisher) ConfigReloader {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why no pointer? Using NewXXX implies a pointer being returned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

}
}

func (r *ConfigReloader) Start() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Start/Run? This method implements the run-loop itself. Using Start implies a go-routine being started, but returns immediately, Run implies the method blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

close(r.done)
r.wg.Wait()
// Stop all running modules
r.StopModules(r.registry.CopyList())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively StopModules can be done via defer in run loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. moved.

continue
}

lastScan = lastScanTemp
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why set lastScan only if any files have been changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even thought that was lost during refactoring and initially the purpose was to only update lastScan if no error, I think it is still correct like this. If there are no file updates, it does not matter if the comparison of the files is to the most recent lastScan or one of the previous where no files were found. The resulting outcome should be the same and it keeps the code simpler. I will add a comment to the code.

@ruflin
Copy link
Contributor Author

ruflin commented Jan 4, 2017

@urso Thanks for feedback:

  • dynamic modules resemblance. Both will be run. No correlation is done. I will update the PR description accordingly
  • Connect: Great
  • Hash: Will try if the above will work for comparison.

@@ -0,0 +1,252 @@
package beater
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the code in this file could be generalized into a libbeat package. It seems to me that it contains relatively little Metricbeat specific logic, except for the StartModules and StopModules functions which can be easily injected.

I'm thinking that we'll want to make it easy to extend this type of reloading functionality to other beats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fully agree, that we probably should extend this to other beats, especially filebeat. Means this functionality will move to libbeat. I prefer to keep it currently in metricbeat to keep the changes as minimal as possible and do the abstraction in a second step when we add it to other beats.

@@ -10,41 +10,13 @@
#========================== Modules configuration ============================
metricbeat.modules:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think metricbeat.modules needs to be removed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@monicasarbu This one is WIP. I will restore the old metricbeat.yml as soon as I added automated tests. This file should not change.

@ruflin ruflin force-pushed the module-reloading branch 5 times, most recently from 3cc2565 to 802860d Compare January 4, 2017 19:08
@ruflin ruflin changed the title Dynamic configuration reloading in Metricbeat Dynamic configuration reloading for modules in Metricbeat Jan 4, 2017
@ruflin ruflin force-pushed the module-reloading branch 2 times, most recently from e94bb57 to aad532c Compare January 5, 2017 09:04
@ruflin ruflin added review and removed in progress Pull request is currently in progress. labels Jan 5, 2017
metricbeat.reload:

# Glob pattern for configuration reloading
path: config/*.yml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go with conf.d/*.yml. I think that name might provide a hint about the purpose of the directory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could create this directory automatically in /etc/metricbeat/conf.d when we install the rpm/deb. And do the same for Filebeat.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to conf.d. I like the idea of creating this automatically. Reminds me to check to what the path is relative. Should the default be ${path.conf}/conf.d/*.yml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW: Creation of the directory I would suggest not to do in this PR and look into later.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! My comments are all just minor issues.


// Config is the root of the Metricbeat configuration hierarchy.
type Config struct {
// Modules is a list of module specific configuration data.
Modules []*common.Config `config:"modules" validate:"required"`
Modules []*common.Config `config:"modules"`
ReloadModules ModulesReloadConfig `config:"reload.modules"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent with other parts of the config that are disabled by default should this be a pointer along with the Enabled *bool? It's a subtle difference, but it means that setting

metricbeat.reload.modules.path: conf.d/*.yml

would implicitly enable the reloading without having to also specify

metricbeat.reload.modules.enabled: true

An example of this is the tls config in the outputs which have an IsEnabled helper method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to the same logic as the output. +1 on consistency here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that didn't work as I expected. Because I have a default config, which I already load on the metricbeat level, the config is never nil. I plan to extract the reload part anyways. Ok if I do it in a second step?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's fine. With using the pointer you'll need handle defaults when reading the individual values (haven't found a better way).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to make ReloadModules a pointer too. But with the default present it won't be nil.

continue
}

// Directories are symlinks are skipped
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Directories and symlinks"?

// Check if one of the files was changed recently
// File modification time can be in seconds. -1 is to cover for files which
// were created during this second.
if info.ModTime().Sub(gcd.lastScan) >= -1*time.Second {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use if info.ModTime().After(gcd.lastScan) here? That seems more readable. Then to account for the one second resolution you could Truncate the last scan time using if info.ModTime().After(gcd.lastScan.Truncate(time.Second)).

What systems still store the modification time in seconds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to your suggestion. OS X seems to store mtime in seconds. At least I only see seconds when I use stat.


hash, err := hashstructure.Hash(files, nil)
if err != nil {
logp.Err("Error hashing files list: %s", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this method continue if hashing fails? I didn't look to see what can cause an error here, but if it does fail maybe it should return the error OR not return an error and base its decision solely on the updatedFails flag.

Copy link
Contributor Author

@ruflin ruflin Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially ignored the error as I didn't see an option how we ever get here an error. But to be sure we catch it in case it happens, I added the log line.

If there is an error, the hash is 0. So in case hashing fails one time because of reason x, scan would still return changed which is the save way to go. I prefer to return changed once too many. If it continously fails, it will in the future depend on updatedFiles only. The not so nice part about this is that it does not detect removed files with updatedFiles.

Thinking about this again, probably the most consistent behaviour would be directly return files, true, error in case of an error, to make sure in an error case changed is true. Will change it to this.


// Create random temp directory
id := strconv.Itoa(rand.New(rand.NewSource(int64(time.Now().Nanosecond()))).Int())
dir := os.TempDir() + "/" + id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use ioutil.TempDir("", id) for this. https://golang.org/pkg/io/ioutil/#TempDir

And you should cleanup the dir with a defer os.RemoveAll(dir).

wg sync.WaitGroup
}

type RunningRegistry struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is used internally only by the ConfigReloader. Can this be unexported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, changed


type RunningRegistry struct {
List map[uint64]ModuleRunner
mutex sync.Mutex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the mutex is required to access List then they should have the same visibility. But I like embedding the mutex in the struct then you can call runningReg.Lock().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

return c, nil
}

func NewRunningRegistry() RunningRegistry {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this struct contains a sync.Mutex, returning a *RunningRegistry would better indicate to the caller that this object needs to be passed by reference than by value (as you should not copy a sync.Mutex).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed


debugr("Number of module wrappers created: %v", len(s))

startList := []*ModuleWrapper{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use var startList []*ModuleWrapper to avoid allocating an empty map if it doesn't get used. ref

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

metricbeat.reload:

# Glob pattern for configuration reloading
path: config/*.yml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could create this directory automatically in /etc/metricbeat/conf.d when we install the rpm/deb. And do the same for Filebeat.

@ruflin
Copy link
Contributor Author

ruflin commented Jan 6, 2017

@andrewkroh I pushed 2 new commits. One for the general refactoring and one for the config changes to make it relative. Could you have a look?

period: 10s
------------------------------------------------------------------------------

A path with a glob most be defined on which files should be checked for changes. A period is set on how often
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/s/most/must

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed


# Config reloading allows to dynamically load modules. Each file which is
# monitored must contain one or multiple modules as a list.
metricbeat.reload:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be metricbeat.reload.modules here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitively, thanks for spotting. Fixed.

metricbeat.reload:

# Glob pattern for configuration reloading
path: conf.d/*.yml
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about setting this to:

path: '${path.config}/conf.d/*.yml'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

"github.com/mitchellh/hashstructure"
)

type GlobChangeDetector struct {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GlobWatcher, Wacher, or FileWatcher ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like GlobWatcher. Changing ...

// Directories and symlinks are skipped
if !info.Mode().IsRegular() {
continue
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm... having symlink support would be nice to have. This way op can enable modules selectively via ln -s... (kinda common practice in init systems, even systemd).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, enabled symlinks

@ruflin
Copy link
Contributor Author

ruflin commented Jan 6, 2017

@andrewkroh @monicasarbu @urso I applied the changes. If ok for all of you I will squash and rebase to update the commit message.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good.


// Config is the root of the Metricbeat configuration hierarchy.
type Config struct {
// Modules is a list of module specific configuration data.
Modules []*common.Config `config:"modules" validate:"required"`
Modules []*common.Config `config:"modules"`
ReloadModules ModulesReloadConfig `config:"reload.modules"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to make ReloadModules a pointer too. But with the default present it won't be nil.

"fmt"
"sync"
"time"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra line here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About Reload: Yeah, tried different versions, but didn't work. Extracting it and having a separate unpack should work (in a later PR). Not sure why I can't comment above -> Github fail.

Will remove the extra line when rebasing.

Currently if new module configs have to be added or changed, Metricbeat needs to be restarted. This change allows to define a configuration directory where new files can be added, removed or modified and updates will automatically processed by Metricbeat. New modules will started / stopped accordingly. This is especially useful in container environments where 1 container is used to monitor all services in other containers on the same host. New containers appear and disappear dynamically which requires changes to which modules are needed and which hosts must be monitored.

**Configuration**

The configuration in the main metricbeat.yml config file looks as following:
```
metricbeat.reload.modules:
  enabled: true
  path: configs/*.yml
  period: 10s
```

A path with a glob most be defined on which files should be checked for changes. A period is set on how often the files are checked for changes.

The configuration inside the files which are found by the glob look as following:
```
- module: system
  metricsets: ["cpu"]
  enabled: false
  perdiod: 1s

- module: system
  metricsets: ["network"]
  enabled: true
  period: 10s
```

Each file directly contains a list of modules. Each file can contain one or multiple module definitions.

**How does it work**

When Metricbeat is started a process is started that monitors the defined glob files for changes. If a change is detected, all config files are reloaded and module wrappers are created. A hash is created for the module wrapper to identify if it is a module that is already running or a new module. After loading all modules, all modules which are not in the list anymore will be stopped and removed. All new ones will be started.

The effect of the above is that if 2 identical module configuration exists, only 1 is started. It is not expected to have 2 modules with the exact same configuration.

The dynamic modules have no effect on the modules defined in the global configuration file. These modules will never change and keep running. Also if there is a dynamic module with the exact same config as a global module, these two will not be correlated and both will be run.

**Changes**

* Introduce configuration variables for config reloading
* Change metricbeat behaviour that if `metricbeat.config.enabled: true`, on startup also 0 modules can be defined. Also it is possible that at some point in time no modules are running.
* Change shutdown to also shut down dynamic modules
* Expvar stats for reloading added
* Add common configuration files to metricbeat to have additional configs except from modules
* Add basic system tests
@andrewkroh
Copy link
Member

jenkins, test it

@andrewkroh andrewkroh merged commit 77cdb8f into elastic:master Jan 6, 2017
@dedemorton dedemorton mentioned this pull request Jan 10, 2017
6 tasks
@dedemorton dedemorton mentioned this pull request Feb 14, 2017
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants