-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce datasources in package to configure inputs and streams #212
Conversation
In elastic/beats#15940 datasources, inputs and streams are introduced into the agent config. To make it possible to configure these in the UI and through the API, some changes to the manifest definitions of a package and datasets are needed. **Package manifest** Each package must specify the datasources it supports with the supported inputs inside. So far all the packages only support one datasource but I want to keep the door open for this to potentially change in the future. It also makes it possible to have the manifest config of a datasource be identical to the config which ends up in the agent config. The package manifest datasource definition looks as following (nginx example): ``` datasources: - # Do we need a name for the data source? name: nginx # List of inputs this datasource supports inputs: - # An id can be given, in case the type used here is not unique # This is for selection in the stream # id: nginx type: metrics/nginx # Common configuration options for this input vars: - name: hosts description: Nginx hosts default: ["http://127.0.0.1"] # All the config options that are required should be shown in the UI required: true - name: period description: "Collection period. Valid values: 10s, 5m, 2h" default: "10s" - name: username type: text - name: password # This is the html input type? type: password - type: logs # Common configuration options for this input vars: - type: syslog # Common configuration options for this input vars: ``` Inside the datasource, the supported inputs are specified with the common variables across all streams which use a certain input. In the UI I expect that we show the `required` configs by default and all the others are under "Advanced" or similar. **Dataset manifest** With the datasources and inputs defined on the package level, each dataset can specify which inputs it supports. Most datasets will only support one input for now. For the nginx metrics this looks as following: ``` inputs: - type: "metric/nginx" # Only the variables have to be repeated that are not specified as part of the input vars: # All variables are specified in the input already ``` As an example with supporting multiple inputs, we have the nginx error logs: ``` inputs: - type: log vars: - name: paths required: true default: - /var/log/nginx/error.log* os.darwin: - /usr/local/var/log/nginx/error.log* os.windows: - c:/programdata/nginx/logs/error.log* - type: syslog vars: # Are udp and tcp syslog input two different inputs? - name: protocol.udp.host required: true default: - "localhost:9000" ``` The log and syslog input are supported (not the case today, just an example). One the dataset level also all additional variables for this dataset are specified. The ones already specified on the input level in the package don't have to be specified again. **Stream definition** Now that the dataset has its supported inputs and variables defined, the stream can be defined. The stream defines which input it uses from the dataset and its configuration variables. Here an example for nginx metrics: ``` input: metrics/nginx metricsets: ["stubstatus"] period: {{period}} enabled: true hosts: {{hosts}} {{#if username}} username: "{{username}}" {{/if}} {{#if password}} password: "{{password}}" {{/if}} ``` During creation time of the stream config the variables from the datasource inputs and local variables from the dataset are filled in. A stream definition could also support multiple inputs as seen in the following example: ``` {{#if input == log}} input: log {{#each paths}} paths: "{{this}}" {{/each}} exclude_files: [".gz$"] processors: - add_locale: ~ {{/if}} {{#if input == syslog}} input: syslog {{/if}} ``` **Further changes** * Rename `agent/input` to `agent/stream` as a stream is configured there.
@skh Could you give some feedback if the above info the the manifests and stream is enough to build a full config? |
Where can I find out what a full, correct agent config must look like? |
@ruflin format sound good, a few things:
|
# Should we define this as array? How will the UI best make sense of it? | ||
type: textarea | ||
default: | ||
- /var/log/nginx/access.log* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we define this as array?
Does this mean that users can have multiple paths defined? If so, how about adding a boolean field called multi
that tells the UI that there can be multiple of the specified type
:
type: text
multi: true
For example, if we want to have multiple paths as default value, we can do like:
default:
- /var/log/nginx/access.log*
- /some/path/log/nginx/access.log*
or
default: ["/var/log/nginx/access.log*", "/some/path/log/nginx/access.log*"]
(not sure which yaml format we prefer but would be great to have it consistent across these examples 🙂)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there can be multiple default paths. Changed it do your suggestion.
os.darwin: | ||
- /usr/local/var/log/nginx/access.log* | ||
os.windows: | ||
- c:/programdata/nginx/logs/*access.log* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bit confused about the format here, are these default values, but platform-specific? If so, should they be nested under default
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point. They are platform specific defaults. As we should not mix array and object I wonder how the yaml should look like:
default:
all:
- /var/log/nginx/access.log*
# I suggest to use ECS fields for this config options here: https://github.com/elastic/ecs/blob/master/schemas/os.yml
# This would need to be based on a predefined definition on what can be filtered on
os.darwin:
- /usr/local/var/log/nginx/access.log*
os.windows:
- c:/programdata/nginx/logs/*access.log*
I don't like the all. Perhaps we find a better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not fond of all
either, and it seems silly to do default.default
🙂 Thinking about this again, what if we just nest default
under each platform? I don't know if we'll want to add other platform-specific configuration for vars, but this schema could support that too since it essentially mirrors all the top-level vars properties. As an example, maybe paths
is not required for windows
for whatever reason:
- name: paths
required: true
default:
- /var/log/nginx/access.log*
os.darwin:
default:
- /usr/local/var/log/nginx/access.log*
os.windows:
required: false
default:
- c:/programdata/nginx/logs/*access.log*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. So in summary: All configs from the top level could also be nested under each platform. Lets try it.
What I suggest for now, lets ignore platform at least on the UI side.
datasources: | ||
- | ||
# Do we need a name for the data source? | ||
name: nginx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think name would be useful to populate the id
in agent config, but we can just use the package name too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean prefixing the id? Because the id needs to be unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I meant prefixing
- name: hosts | ||
description: Nginx hosts | ||
default: | ||
["http://127.0.0.1"] | ||
# All the config options that are required should be shown in the UI | ||
required: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wold like to see type
defined explicitly for all vars. And if we like the multi
suggestion in my previous comment, these should be added:
type: text
multi: true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Adding.
- name: period | ||
description: "Collection period. Valid values: 10s, 5m, 2h" | ||
default: "10s" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add type: duration
or type: text
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for a duration type, we used integer priviously and it was not that great. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like duration if that works for the UI.
type: text | ||
- name: password | ||
# This is the html input type? | ||
type: password |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to decide all possible values for type
field. They do not need to be HTML types, though. Here are the possible types I currently see (or can foresee being useful):
text -> maps to UI text input
number -> maps to UI number input
boolean -> maps to UI checkbox
duration -> maps to UI text input
this would let us do client-side validation but maybe overkill for now?
url -> maps to UI text input
this would let us do client-side validation but maybe overkill for now?
In the case of password vars, I would use text
for them as they'll show up as simple text inputs in the UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the above proposed types. I was thinking of password instead of text to not have to show clear text passwords directly on the screen when users put them in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW: If we have these predefined values for type we can also validate it already on the package side to make sure nothing odd shows up in the package itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the URL types we have to be careful when we do this and we do it, in some modules we do use compound scheme (not sure if this is the exact term?) Like "http+npipe://./pipe/custom" see https://github.com/elastic/beats/blob/d75261469b5c2e675d7d331b86bfc8599d797ddc/metricbeat/mb/parse/url_test.go for a few examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see url support as a stretch goal. In case of doubt we can always fall back to just string. Lets keep it simple for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add granular text types (duration, url, etc) but it doesn't mean that we need to do validation right now. Validation can be added later on from all fronts (UI, agent, package) once we align on rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also ++ on keeping password type. This give the UI information to decide when to show clear text inputs and when to obfuscate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As soon as we started the UI implementation and are happy with the format, we should open a PR here with the exact specs.
- c:/programdata/nginx/logs/*access.log* | ||
# List of supported inputs | ||
inputs: | ||
- type: log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hbharding suggested today that it would be great to have descriptions in the UI. Could we add description
prop for every input, under its type
? I think these descriptions should only be a sentence or two, max. Not sure how we will localize these though..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Added it. But did not add it here but on the package level where the overall inputs are defined. As a single input is shown for multiple datasets, I guess it makes more sense there.
inputs: | ||
- type: log | ||
vars: | ||
- name: paths |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to my previous comment, it would be great to have description
prop per var too. This would be really short descriptions that give the user info about what each var is used for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jen-huang @ph I did one more push with the most recent reviews. I suggest we get this PR in and then I adjust all the other packages we have + making it fully available in the our API. Having the format should already allow the UI work to get started.
- c:/programdata/nginx/logs/*access.log* | ||
# List of supported inputs | ||
inputs: | ||
- type: log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Added it. But did not add it here but on the package level where the overall inputs are defined. As a single input is shown for multiple datasets, I guess it makes more sense there.
inputs: | ||
- type: log | ||
vars: | ||
- name: paths |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
type: text | ||
- name: password | ||
# This is the html input type? | ||
type: password |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As soon as we started the UI implementation and are happy with the format, we should open a PR here with the exact specs.
@ruflin I am ok to merge this and iterate. |
I merged this change as it seems we are mostly happy with what we have here and will follow up with more changes for further discussions. |
In elastic#212 the datasource structure was introduced in the manifests but not exposed through the APIs yet. This PR changes this exposing all the fields. So far it only exposes the fields and no validation is done yet. This needs to be added in the future. A test package was added to see the output of datasource configs and potential changes to it.
In #212 the datasource structure was introduced in the manifests but not exposed through the APIs yet. This PR changes this exposing all the fields. So far it only exposes the fields and no validation is done yet. This needs to be added in the future. A test package was added to see the output of datasource configs and potential changes to it.
In elastic/beats#15940 datasources, inputs and streams are introduced into the agent config. To make it possible to configure these in the UI and through the API, some changes to the manifest definitions of a package and datasets are needed.
Package manifest
Each package must specify the datasources it supports with the supported inputs inside. So far all the packages only support one datasource but I want to keep the door open for this to potentially change in the future. It also makes it possible to have the manifest config of a datasource be identical to the config which ends up in the agent config.
The package manifest datasource definition looks as following (nginx example):
Inside the datasource, the supported inputs are specified with the common variables across all streams which use a certain input. In the UI I expect that we show the
required
configs by default and all the others are under "Advanced" or similar.Dataset manifest
With the datasources and inputs defined on the package level, each dataset can specify which inputs it supports. Most datasets will only support one input for now. For the nginx metrics this looks as following:
As an example with supporting multiple inputs, we have the nginx error logs:
The log and syslog input are supported (not the case today, just an example). One the dataset level also all additional variables for this dataset are specified. The ones already specified on the input level in the package don't have to be specified again.
Stream definition
Now that the dataset has its supported inputs and variables defined, the stream can be defined. The stream defines which input it uses from the dataset and its configuration variables. Here an example for nginx metrics:
During creation time of the stream config the variables from the datasource inputs and local variables from the dataset are filled in.
A stream definition could also support multiple inputs as seen in the following example:
Further changes
agent/input
toagent/stream
as a stream is configured there.