Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auditbeat: initial_scan action for new paths #7954

Merged
merged 1 commit into from
Aug 16, 2018

Conversation

cwurm
Copy link
Contributor

@cwurm cwurm commented Aug 13, 2018

Introduces a new found action that is used by the scanner on startup (when scan_at_start: true) to mark files found in configured paths that it has not seen before.

Addresses #7821

@cwurm
Copy link
Contributor Author

cwurm commented Aug 13, 2018

A bit more detail: The new logic follows Option 2 laid out in a comment in the original issue. Before the scanner is called, the Metricset checks which paths from the config are not contained in the db and are therefore assumed to be new. It passes this list to the scanner which allows it to correctly mark new dirs and files it sees as either found (when they are in a new path) or created (when they are in a previously known path).

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to take a deeper look at this today. Thanks for opening a PR!

@@ -177,6 +179,32 @@ func (ms *MetricSet) init(reporter mb.PushReporterV2) bool {
return true
}

// Determine which - if any - paths have been newly added to the config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments are great. I just want to encourage you to use the godoc format (even for private methods) which is to start the sentence with the name of the method (e.g. findNewPaths finds the new paths...).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer, I fixed it here and will try to remember going forward.

Copy link
Contributor

@adriansr adriansr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor comment, otherwise LGTM.

}

// NewFileSystemScanner creates a new EventProducer instance that scans the
// configured file paths.
func NewFileSystemScanner(c Config) (EventProducer, error) {
func NewFileSystemScanner(c Config, newP map[string]bool) (EventProducer, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the function's comment to reflect the meaning of this parameter and maybe give it a more descriptive name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@adriansr
Copy link
Contributor

There's an issue with the CLA related to the e-mail address you used for this commits. I guess it can be fixed by associating this e-mail address with your Github account

@cwurm
Copy link
Contributor Author

cwurm commented Aug 14, 2018

There's an issue with the CLA related to the e-mail address you used for this commits. I guess it can be fixed by associating this e-mail address with your Github account

Thanks, fixed it.

Test failed in CI that didn't fail locally, I'm not sure why. Looking into it.

@cwurm
Copy link
Contributor Author

cwurm commented Aug 14, 2018

Test failed in CI that didn't fail locally, I'm not sure why. Looking into it.

Found it: When the test runs, there's a race about whether the changes are found by the scanner or by filesystem notification (both are started at the same time). The scanner usually wins, and the actions it sets is what the test was originally written for. The filesystem reports slightly different actions (e.g. created|updated instead of created|attributes_modified). I've changed the test to only test for the common actions (e.g. created).

It's a bit of a shame though that the reported actions are not deterministic, and maybe this is something that we can normalize in the future.

I've pushed a commit, let's hope the CI is good with it. It's a non-deterministic failure, so it's a bit tricky. I tested locally with -run 1000 to be sure.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the solution you reached here. I left a few comments.

@@ -177,6 +179,32 @@ func (ms *MetricSet) init(reporter mb.PushReporterV2) bool {
return true
}

// findNewPaths determines which - if any - paths have been newly added to the config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a double space between paths have. And it should have a period to end the sentence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

config Config
log *logp.Logger
config Config
newPaths map[string]bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using a map as a set data structure you can use map[string]struct{}. To me this is more clear because it's saying that you don't care about the value and that when you use map you will only be checking whether the key exists (rather than that the key exists and its value is true). Also it may save a few bytes.

m := map[string]struct{}{} // Init.

m[name] = struct{} // Add.

if _, exists := m[name]; exists {  // Contains.
  // Exists!
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice, I like that it takes 0 bytes - done.

continue
}

lastEvent, err := load(ms.bucket, evalPath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like nice clear solution to the problem. 👍

@@ -47,6 +48,7 @@ var actionNames = map[Action]string{
Updated: "updated",
Moved: "moved",
ConfigChange: "config_change",
Found: "found",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this flag indicates that the event was generated due to the initial scan? If so how about we call it initial_scan rather than found?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense - changed it.

@cwurm cwurm force-pushed the auditbeat_found_status branch from 9db5594 to 2cb655c Compare August 16, 2018 14:07
@cwurm cwurm changed the title Auditbeat: found action for new paths Auditbeat: initial_scan action for new paths Aug 16, 2018
@cwurm
Copy link
Contributor Author

cwurm commented Aug 16, 2018

Thanks @andrewkroh - all implemented. Travis is unhappy, but it doesn't look related.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@cwurm cwurm force-pushed the auditbeat_found_status branch from 9a647ef to a1b5e8a Compare August 16, 2018 16:10
@andrewkroh
Copy link
Member

@cwurm A few notes on your commit message:

  • The first line should be a short summary.

  • Follow that by a blank line.

  • Then do your detailed description.

  • If you want the related issue to auto-close on merge (your choice) then use one of the keywords recognized by GH.

When `scan_at_start: true` the filesystem scanner will look through
all paths and record any new or modified files and directories it finds.
This commit introduces a new action `initial_scan` that the scanner uses
when it scans a path for the first time. Previously, the scanner would
report everything under a new path as `created`, making it impossible
to distinguish between files and directories that are truly new (e.g.
have been added during a restart of Auditbeat) and those that are merely
new because they are under a newly added path in the configuration.

Resolves elastic#7821
@cwurm cwurm force-pushed the auditbeat_found_status branch from a1b5e8a to de677c8 Compare August 16, 2018 16:56
@cwurm
Copy link
Contributor Author

cwurm commented Aug 16, 2018

Thanks @andrewkroh - I've changed it to follow the format and made it more descriptive. Let me know if anything is still missing.

@andrewkroh andrewkroh merged commit 95e53bc into elastic:master Aug 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants