Add support for multiple inputs per dataset #346
Conversation
util/dataset.go
Outdated
@@ -44,6 +44,7 @@ type Stream struct {
 	Input   string     `config:"input" json:"input" validate:"required"`
 	Vars    []Variable `config:"vars" json:"vars,omitempty" yaml:"vars,omitempty"`
 	Dataset string     `config:"dataset" json:"dataset,omitempty" yaml:"dataset,omitempty"`
+	File    string     `config:"file" json:"file,omitempty" yaml:"file,omitempty"`
I'm not sure if `File` is the best name here? Also I was undecided whether I should add the `.yml` to it or not. Feedback welcome.
Does `File` contain the same content as `Template`, but inside the file?
Yes, file is basically the path (should we call it path?) and template is the content of the file.
Is there a chance that the same Stream has both `File` and `Template` defined?
If not, then I suggest considering the following structure:
...
template: {
type: "text"
source: "foo: bar"
}
...
...
template: {
type: "file"
source: "/foo/bar/baz.yml"
}
...
The user must define the file path here as `File`; the `Template` part will then automatically be read in through the code: #335. So yes, both exist.
I might have missed something, but does it mean that all conditions like the following are not supported?

{{#if input == "httpjson"}}
input: httpjson
api_key: {{this.api_key}}
authentication_scheme: {{this.authentication_scheme}}
http_client_timeout: {{this.http_client_timeout}}
http_method: {{this.http_method}}
http_headers: {{this.http_headers}}
http_request_body: {{this.http_request_body}}
no_http_body: {{this.no_http_body}}
interval: {{this.interval}}
json_objects_array: {{this.json_objects_array}}
pagination: {{this.pagination}}
rate_limit: {{this.rate_limit}}
url: {{this.url}}
ssl: {{this.ssl}}
{{else if input == "file"}}
input: log
paths:
{{#each paths}}
  - {{this}}
{{/each}}
exclude_files: [".gz$"]
{{/if}}

If so, then I need to adjust the
@mtojek Yes, it would change the above. But I'm still debating if it is the right approach. Initially I wanted to have both in one file, as I was thinking it would simplify building the stream template, and we already have it this way. The change was initially triggered by
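For illustration only: with a per-stream `file` option, the single conditional template discussed above could be split into one template file per input, roughly like this (the file names and the trimmed-down variable set are hypothetical, not from this PR):

```yaml
# httpjson.yml -- hypothetical template file for the httpjson stream
input: httpjson
api_key: {{this.api_key}}
interval: {{this.interval}}
url: {{this.url}}

# log.yml -- hypothetical template file for the log stream
input: log
paths:
{{#each paths}}
  - {{this}}
{{/each}}
exclude_files: [".gz$"]
```

Each stream in the dataset manifest would then point at its own file, and the `{{#if input == ...}}` switch would disappear.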
@@ -265,14 +275,25 @@
     }
   ],
   "dataset": "datasources.examplelog2",
-  "template": "foo: bar\n",
+  "template.path": "stream.yml",
@mtojek Got stuck on this PR as I realised the `.` in the output here will probably cause some trouble for Kibana when reading it :-( I still like the naming, but need to find a way to make sure we expand it here while still having a short version in the yaml for devs.
I'm fine with leaving this as is if it brings more problems than it helps ;)
BTW I see that the CI is failing; you can convert this PR to draft and let us know when it's ready for review.
This approach could potentially explode into having all the inputs available per stream, couldn't it? For example, I could also be consuming nginx logs from Amazon S3. We had this discussion about having a routing pipeline matching some field (for instance it could be using
@exekias Yes, there are many options for how to consume the same data. My assumption is that each config will look quite different for the different inputs. For the routing pipeline: I think there are 2 parts here. One is the same log data coming from different inputs and having slightly different content. Because of this it needs different processing. In this case I expect a dataset to have multiple ingest pipelines, with a routing pipeline that decides on some fields / data how to process it. This is supported already today, for example with json and non-json logs. The other part is around having lots of data coming from a single input (s3 as an example) that contains data for nginx access and error logs in one stream. The above will not solve this. Is that what you are referring to @exekias ?
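The first case described above — one dataset with multiple ingest pipelines plus a routing pipeline that picks between them — could look roughly like the following sketch. The pipeline names and the routing condition are hypothetical, not taken from this PR; only the `pipeline` processor with `name` and `if` is standard Elasticsearch ingest functionality.

```yaml
# Hypothetical routing pipeline for an nginx access dataset.
# The `pipeline` processor hands the event off to a format-specific pipeline.
description: Route nginx access events to a format-specific pipeline
processors:
  - pipeline:
      # assumption: JSON-formatted access logs start with '{'
      if: "ctx.message != null && ctx.message.startsWith('{')"
      name: nginx-access-json
  - pipeline:
      if: "ctx.message != null && !ctx.message.startsWith('{')"
      name: nginx-access-plaintext
```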
Yes, I'm referring to this later case. Just wondering if we should go deeper in this configuration of input and integration together, or work on disconnect this concerns, where the input should make sure that the data reaches ingestion (with the proper fields), and then it gets routed to the integration pipelines |
My current thinking: for the specific service packages like nginx we are in the first case. But there will probably be more generic packages like S3 that basically configure the input, and there we potentially need some routing. Had a quick chat about this with @exekias; we need to do some more brainstorming here and will then get back to this issue with an update.
A dataset can support more than one input. For example, an nginx access log can come from file or syslog. To support this, each stream definition inside a dataset must be able to reference its own config template file. Because of this, a config option `file` was added to the dataset manifest. It only has to be specified if there is more than one input; otherwise it falls back to `stream.yml` as the default. I don't expect this to have any effect on the data source creation at the moment, as this is only additional information and we don't support multiple inputs in the Ingest Manager yet.
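For illustration, a dataset manifest using the new option could look like the following sketch. The syslog input and the `syslog.yml` file name are assumptions for the example, not taken from an actual package; only the `file` key and its `stream.yml` fallback come from this PR.

```yaml
# Hypothetical dataset manifest excerpt: two streams, one per input.
streams:
  - input: log
    # `file` omitted: falls back to the default stream.yml
  - input: syslog
    # this stream references its own config template file
    file: syslog.yml
```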
Force-pushed from d4decfc to 4e3bace.
@mtojek I slightly modified this PR to, for now, stick to
Sure, let's move on. Thank you for the heads up. I assume that the next step would be adjusting the import-beats script to support multiple stream templates?
I wonder if it is worth the effort and complexity to support this in the import script. In how many cases do we have this? 2-3? I'll leave it up to you to decide if the effort is worth it. I will also follow up with either an issue or a PR to make sure we don't forget about renaming the config options later. Could you "approve" the PR if it's ok to merge?