-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support stored scripts #5339
Support stored scripts #5339
Conversation
@kvch I think it's enough if we support painless for now. Is there a reason we would need others? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's really great to have the scripts now in separate files. It makes them much more readable.
For loading the script I think there are two ways:
- Either we take the scripts in the files, remove all new lines and still load it with the pipeline
- We load them separately and reference the id
The first case would be more similar to what we have now and would attach one script to a specific pipeline. This makes sure a script does not conflict with existing script and we don't have to deal with overwriting scripts etc.
If we load the script separately, we should make sure to namespace the script ids so they don't conflict with other scripts.
I could see us in the long term even supporting both cases. My preference for now is on option as I think it creates less movable parts in ES.
filebeat/fileset/modules.go
Outdated
s := make(map[string]string) | ||
s["lang"] = lang | ||
s["source"] = source | ||
payload := make(map[string]interface{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we use common.MapStr
here mainly because we use it everywhere else where we have to create json. It has the same base interface.
It is true that right now we only have painless scripts. But to be more future-proof and leave more room to play with FB modules it would be ideal to have all languages which is supported by ES. I tried to come up with names that would not conflict with other scripts in users' ES. However, in the end I think it is the users' responsibility to make sure their script ids are not conflicting with FB's. |
In 6.0 the only general purpose language left is painless: https://www.elastic.co/guide/en/elasticsearch/reference/6.0/modules-scripting.html For the script naming if we go with approach 2 we should have at least a prefix with the beat name, something like |
Great! How would you handle situations if a user wants to have a special-purpose language? Would you return an error or just ignore it and tell the user to rewrite it in painless? Yes, you are right. First I named the files "filebeat-stored-is-private-ip", etc. But I seemed weird, so I deleted it. But I will reintroduce it. |
7b589b7
to
ae2d428
Compare
@kvch What do you think about shipping the scripts with the processor instead of uploading them as separate script? Are there any downside of uploading it as part of the processor? For the special purpose language: I would wait until this actually happens and just mention in the docs painless is the supported language. I assume already know if someone defines the pipeline and doesn't use an id, he can use whatever language he wants? |
ae2d428
to
1757f7d
Compare
@ruflin I understand that you would like to eliminate the possibility of collusion. However, if we add scripts to pipelines, we take away the possibility from users to view, edit or debug their scripts after feeding it into ES. Furthermore, there is a Filebeat module for Kafka GC logs based on @urso's Kafka blogpost, but right now ES can't process its pipeline, because it includes too many scripts. But I rebased the branch on top of this one, and now it works smoothly. I thinks prefixing names with "filebeat-stored-" should be enough to avoid collisions with users' stored scripts. I am okay with documenting this behaviour. But I still need to implement graceful degradation in case users' would like to upload special-purpose languages. |
One more question for the script loading: In case X-Pack is enabled, are the same permission required to load a pipeline like are needed to load a script? For the user modification of scripts: In general I like that idea to offer this flexibility to the user. But do we expect the user to do this modifications in ES or on the Beats side in the script file? In case someone changes the script, I assume he would also have to modify other things from the module? |
@kvch Can you elaborate on |
@ruflin There is one privilege for pipeline handling called It would be nice if both scenarios were supported. But currently, if someone changes the script in ES, it should be changed in the FB module, too. It would be a nice addition to have a script/tool which exports scripts from ES and puts it under the right FB module. For now changing script in FB seems to be more comfortable. The ingesting the pipeline of Kafka GC logs are not valid reason anymore, because the pipeline can be loaded using ES 6.0.0-rc1. The way script compilation is limited was changes since I last tested in August. |
filebeat/fileset/modules.go
Outdated
@@ -296,6 +296,11 @@ func (reg *ModuleRegistry) LoadPipelines(esClient PipelineLoader) error { | |||
|
|||
func loadScript(name, source string, esClient PipelineLoader) error { | |||
parts := strings.Split(name, ".") | |||
|
|||
if parts[1] != "painless" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should probably check that len(parts) > 1
266273d
to
5c15275
Compare
@kvch I'm thinking that in order to reduce confusion, it would be good for the scripts to follow the same practice as the pipelines, which is to encode the version in the pipeline, e.g. |
I would consider explicitly listing the scripts in the |
d53ce61
to
d2363dc
Compare
filebeat/fileset/fileset.go
Outdated
} | ||
|
||
// getScriptIdTemplate returns the Ingest Node pipeline ID | ||
func (fs *Fileset) getScriptIdTemplate(beatVersion string) *template.Template { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func getScriptIdTemplate should be getScriptIDTemplate
filebeat/fileset/fileset.go
Outdated
manifest *manifest | ||
vars map[string]interface{} | ||
pipelineID string | ||
scriptIdTemplate *template.Template |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
struct field scriptIdTemplate should be scriptIDTemplate
filebeat/fileset/modules.go
Outdated
return nil, "", fmt.Errorf("Only painless scripts can be stored for pipelines") | ||
} | ||
|
||
scriptId := bytes.NewBufferString("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var scriptId should be scriptID
filebeat/fileset/fileset.go
Outdated
} | ||
|
||
// getScriptIdTemplate returns the Ingest Node pipeline ID | ||
func (fs *Fileset) getScriptIdTemplate(beatVersion string) *template.Template { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func getScriptIdTemplate should be getScriptIDTemplate
filebeat/fileset/fileset.go
Outdated
manifest *manifest | ||
vars map[string]interface{} | ||
pipelineID string | ||
scriptIdTemplate *template.Template |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
struct field scriptIdTemplate should be scriptIDTemplate
filebeat/fileset/modules.go
Outdated
return nil, "", fmt.Errorf("Only painless scripts can be stored for pipelines") | ||
} | ||
|
||
scriptId := bytes.NewBufferString("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var scriptId should be scriptID
2d673db
to
d05c616
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really looking forward to have this feature in FB as it makes writing and reading scripts so much nicer.
Some tests seem to break at the moment because of this change.
We probably should verify if this still works with the packaging as we add new files to make sure also these new script files are shipped.
As a follow up PR it would be good nice to have a system tests that checks the setup and starting of filebeat to verify that the scripts are actually loaded.
filebeat/fileset/fileset.go
Outdated
return fs.pipelineID, content, nil | ||
} | ||
|
||
// formatScriptIDTemplate generates the ID to be used for the pipeline ID in Elasticsearch | ||
func formatScriptIDTemplate(module, fileset, beatVersion string) string { | ||
return fmt.Sprintf("filebeat-%s-%s-%s-{{.}}", beatVersion, module, fileset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is {{.}}
doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this is replaced by the file / script name?
@@ -12,3 +12,7 @@ var: | |||
|
|||
ingest_pipeline: ingest/pipeline.json | |||
prospector: config/log.yml | |||
|
|||
pipeline_script: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should rename this to scripts
instead of pipeline_script(s)
as in ES these are just scripts and not related to a pipeline as far as I know?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK no. I added pipeline_
prefix so it is unambiguous that is a pipeline setting, because manifest includes other types of settings, too. If it's not needed I am happy to remove the prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove it. @tsg please object if not the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we can do without the prefix.
c157614
to
7265dd6
Compare
filebeat/fileset/fileset.go
Outdated
|
||
} | ||
|
||
// formatScriptIDTemplate generates the ID to be used for the pipeline ID in Elasticsearch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment should probably say "used for the pipline script IDs".
filebeat/fileset/fileset.go
Outdated
return scripts, nil | ||
} | ||
|
||
// getScriptIDTemplate returns the Ingest Node pipeline ID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment seems wrong, I guess it is about the script IDs.
} | ||
|
||
scriptElemFull := fmt.Sprintf(scriptPipelinePattern, scriptID.String()) | ||
jsonString = strings.Replace(jsonString, scriptElem, scriptElemFull, -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using string replacement for this seems brittle. Alternatively we could go through the JSON keys and look for the script tags. I think we do something like that for the ML job ID replacement. This gets more complicated if we look into anything but the top level processors, but I think that's all we need for the moment? What do you think?
7265dd6
to
e06abcd
Compare
e06abcd
to
85703d2
Compare
@@ -257,6 +258,105 @@ func (reg *ModuleRegistry) GetInputConfigs() ([]*common.Config, error) { | |||
return result, nil | |||
} | |||
|
|||
// PipelineLoader factory builds and returns a PipelineLoader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on exported type PipelineLoaderFactory should be of the form "PipelineLoaderFactory ..." (with optional leading article)
func substituteScriptIDs(jsonString, name string, t *template.Template) (string, error) { | ||
p := strings.Split(name, ".") | ||
if len(p) != 2 { | ||
return "", fmt.Errorf("Error substituting script ids: invalid script name.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error strings should not be capitalized or end with punctuation or a newline
o7 |
This pull request is now in conflicts. Could you fix it? 🙏
|
@kvch I think we can close it. |
Previously, scripts of pipelines were inlined in JSON files. By moving scripts into separate files it makes them readable and testable. Now scripts are placed under
ingest/script
in every fileset which has scripts in its pipeline. Of course, it is still possible to use inline scripts in pipelines.Now only painless scripts are supported.
TODO