Add ID processor #14524
@urso I'm torn about the name of this processor. Originally I had named it [...] Do you have any suggestions about this?
a26fc08 to 1822ca5
Suggested a couple of minor changes plus one that will fix the build.
Would you mind adding fingerprint to the link list? It'll save you having to rebase.
I need to rethink how we organize the lists because it means a different order if you organize by processor name vs topic title.
[[uuid]]
=== Generate UUID for an event

The `uuid` processor generates a random but roughly ordered UUID for an event.
Might be worth adding a sentence that explains what a UUID is (for novice users).
I'm going to hold off on this change until we've finalized the name of this processor. See #14524 (comment).
I think this is moot now, since we renamed the processor to the Add ID (`add_id`) processor.
Re: your question about conditional coding: we only need to add conditions if the processor isn't available to all Beats. If it is, then no extra coding is required.
41ae295 to e4f9fe9
@urso I believe I've addressed all your feedback from the last round of review now. Please re-review when you get a chance. In particular, I'd like your thoughts on #14524 (comment) since my change there differs from the implementation you had proposed but I believe solves the underlying problem nevertheless. I do need to update/add tests in this PR but I will wait to do that based on your feedback, to avoid churn.
Travis CI is green. Jenkins CI failures are unrelated. Merging.
LGTM
* WIP: Flake ID processor
* Fleshing out implementation of generator
* Rename package
* Unexport const
* Use increment operator
* Adding processor scaffolding
* Fixing default field
* Adding CHANGELOG entry
* Fixing compile errors
* WIP: unit tests
* Fixing byte copy
* Fixing up tests
* Adding test TODOs
* Adding non-default target field unit test
* Adding one more test TODO
* Adding TODO for post-benchmarking
* Introduce type
* Adding unit test for factory
* Adding unit test for mac
* Adding unit test for mac
* Fleshing out remaining mac unit tests
* Adding tests for ES ID generator
* Remove TODO after experimenting with IIFE (perf was worse)
* Moving doc
* Adding UUID processor to list in docs
* Apply suggestions from docs code review
  Co-Authored-By: DeDe Morton <dede.morton@elastic.co>
* Adding godoc
* Rename generator function type
* Exporting and adding godoc
* Adding godoc
* Updating godoc
* Adding Unwrap error methods
* Moving ES ID generator into generators package + singleton construction
* Addressing Hound feedback
* Renaming processor to `add_id`
* Updating processor name in CHANGELOG entry
* More refactoring updates
* Fixing more vet errors
* Unexport config struct as it's only used within this package
* Fixing doc anchor
* Moving generator construction to processor constructor; simplifying factory
* Fixing compile error
* Validate ID generator type in config
* Finer-grained locking to reduce mutex contention
* Initialize package global variables that depend on randomness, later
* Compute last timestamp while accounting for system time going backwards
* Simpler and testable timestamp() function
* Adding unit test for timestamp function
* Re-implementing ES timestamp algorithm
* Removing unused variable
@ycombinator This may be a dumb question (and the wrong place to ask it), but with a time-based UUID generator combined with high volume and multiple Filebeat servers, what are the chances of duplicate UUIDs? I'm only asking because the 20-character ID that `add_id` generates is considerably shorter than what the fingerprint Logstash filter plugin generates (36 chars), which `add_id` replaced in our environment.
This PR introduces a new `add_id` processor that generates unique IDs for events. The processor will take the following configuration options:
* `target_field`: `@metadata.id`
* `type`: `elasticsearch`
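The two options above could be modeled roughly as follows. This is a minimal sketch, not the code from this PR: the `config` struct, `defaultConfig`, and `validate` are hypothetical names, and it assumes that the values listed above are the defaults and that `elasticsearch` is the only accepted `type`.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the processor's settings.
type config struct {
	TargetField string // event field that receives the generated ID
	Type        string // name of the ID generator to use
}

// defaultConfig returns the values listed in the PR description,
// assumed here to be the defaults.
func defaultConfig() config {
	return config{TargetField: "@metadata.id", Type: "elasticsearch"}
}

// validate rejects an empty target field and any unknown ID type.
func (c config) validate() error {
	if c.TargetField == "" {
		return fmt.Errorf("target_field must not be empty")
	}
	switch c.Type {
	case "elasticsearch":
		return nil
	default:
		return fmt.Errorf("unknown ID type %q", c.Type)
	}
}

func main() {
	c := defaultConfig()
	fmt.Println(c.validate() == nil) // defaults pass validation

	c.Type = "uuid"
	fmt.Println(c.validate()) // unsupported type is rejected
}
```

Rejecting an unknown type when the configuration is loaded, rather than when the first event is processed, means a typo surfaces at startup.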
Currently the only type of ID that can be generated using this processor is `elasticsearch`. IDs of this type are generated using the same algorithm that Elasticsearch uses for its auto-generated document IDs. These IDs are conceptually similar to Flake IDs in that the ID generation algorithm generates IDs that are roughly ordered as time progresses. However, there are some optimizations done with choosing the ordering of the bytes in the ID to give Elasticsearch a better chance of compressing the IDs.

Related: #14363.