Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support stored scripts #5339

Closed

Conversation

kvch
Copy link
Contributor

@kvch kvch commented Oct 5, 2017

Previously, scripts of pipelines were inlined in JSON files. By moving scripts into separate files it makes them readable and testable. Now scripts are placed under ingest/script in every fileset which has scripts in its pipeline. Of course, it is still possible to use inline scripts in pipelines.

Now only painless scripts are supported.

TODO

  • fail gracefully if non-general purpose language is used

@kvch kvch added Filebeat Filebeat in progress Pull request is currently in progress. review labels Oct 5, 2017
@ruflin
Copy link
Contributor

ruflin commented Oct 6, 2017

@kvch I think it's enough if we support painless for now. Is there a reason we would need others?

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really great to have the scripts now in separate files. It makes them much more readable.

For loading the script I think there are two ways:

  • Either we take the scripts in the files, remove all new lines and still load it with the pipeline
  • We load them separately and reference the id

The first case would be more similar to what we have now and would attach one script to a specific pipeline. This makes sure a script does not conflict with existing script and we don't have to deal with overwriting scripts etc.

If we load the script separately, we should make sure to namespace the script ids so they don't conflict with other scripts.

I could see us in the long term even supporting both cases. My preference for now is on option as I think it creates less movable parts in ES.

s := make(map[string]string)
s["lang"] = lang
s["source"] = source
payload := make(map[string]interface{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we use common.MapStr here mainly because we use it everywhere else where we have to create json. It has the same base interface.

@kvch
Copy link
Contributor Author

kvch commented Oct 6, 2017

It is true that right now we only have painless scripts. But to be more future-proof and leave more room to play with FB modules it would be ideal to have all languages which is supported by ES.

I tried to come up with names that would not conflict with other scripts in users' ES. However, in the end I think it is the users' responsibility to make sure their script ids are not conflicting with FB's.

@ruflin
Copy link
Contributor

ruflin commented Oct 6, 2017

In 6.0 the only general purpose language left is painless: https://www.elastic.co/guide/en/elasticsearch/reference/6.0/modules-scripting.html

For the script naming if we go with approach 2 we should have at least a prefix with the beat name, something like filebeat-is-private-ip. I think it's our responsibility to heavily reduce the chance of name conflicts. I don't expect a user that load filebeat modules to know what the name of our scripts are and to check if they don't conflict with his.

@kvch
Copy link
Contributor Author

kvch commented Oct 6, 2017

Great! How would you handle situations if a user wants to have a special-purpose language? Would you return an error or just ignore it and tell the user to rewrite it in painless?

Yes, you are right. First I named the files "filebeat-stored-is-private-ip", etc. But I seemed weird, so I deleted it. But I will reintroduce it.

@kvch kvch force-pushed the feature/filebeat/pipeline-stored-script branch from 7b589b7 to ae2d428 Compare October 6, 2017 15:25
@ruflin
Copy link
Contributor

ruflin commented Oct 9, 2017

@kvch What do you think about shipping the scripts with the processor instead of uploading them as separate script? Are there any downside of uploading it as part of the processor?

For the special purpose language: I would wait until this actually happens and just mention in the docs painless is the supported language. I assume already know if someone defines the pipeline and doesn't use an id, he can use whatever language he wants?

@kvch kvch force-pushed the feature/filebeat/pipeline-stored-script branch from ae2d428 to 1757f7d Compare October 9, 2017 15:47
@kvch
Copy link
Contributor Author

kvch commented Oct 9, 2017

@ruflin I understand that you would like to eliminate the possibility of collusion. However, if we add scripts to pipelines, we take away the possibility from users to view, edit or debug their scripts after feeding it into ES.

Furthermore, there is a Filebeat module for Kafka GC logs based on @urso's Kafka blogpost, but right now ES can't process its pipeline, because it includes too many scripts. But I rebased the branch on top of this one, and now it works smoothly.

I thinks prefixing names with "filebeat-stored-" should be enough to avoid collisions with users' stored scripts.

I am okay with documenting this behaviour. But I still need to implement graceful degradation in case users' would like to upload special-purpose languages.

@ruflin
Copy link
Contributor

ruflin commented Oct 10, 2017

One more question for the script loading: In case X-Pack is enabled, are the same permission required to load a pipeline like are needed to load a script?

For the user modification of scripts: In general I like that idea to offer this flexibility to the user. But do we expect the user to do this modifications in ES or on the Beats side in the script file? In case someone changes the script, I assume he would also have to modify other things from the module?

@ruflin
Copy link
Contributor

ruflin commented Oct 10, 2017

@kvch Can you elaborate on ES can't process its pipeline, because it includes too many scripts.? Means if a pipeline contains scripts that are too long directly, it does not work?

@kvch
Copy link
Contributor Author

kvch commented Oct 11, 2017

@ruflin There is one privilege for pipeline handling called manage_pipeline. This includes all pipeline operations. As both inline and stored scripts are pipeline processors, it means that if someone has permission to load a pipeline, he/she can load scripts, too.

It would be nice if both scenarios were supported. But currently, if someone changes the script in ES, it should be changed in the FB module, too. It would be a nice addition to have a script/tool which exports scripts from ES and puts it under the right FB module. For now changing script in FB seems to be more comfortable.

The ingesting the pipeline of Kafka GC logs are not valid reason anymore, because the pipeline can be loaded using ES 6.0.0-rc1. The way script compilation is limited was changes since I last tested in August.

@kvch kvch removed the in progress Pull request is currently in progress. label Oct 12, 2017
@@ -296,6 +296,11 @@ func (reg *ModuleRegistry) LoadPipelines(esClient PipelineLoader) error {

func loadScript(name, source string, esClient PipelineLoader) error {
parts := strings.Split(name, ".")

if parts[1] != "painless" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should probably check that len(parts) > 1

@kvch kvch force-pushed the feature/filebeat/pipeline-stored-script branch from 266273d to 5c15275 Compare October 13, 2017 14:44
@tsg
Copy link
Contributor

tsg commented Nov 2, 2017

@kvch I'm thinking that in order to reduce confusion, it would be good for the scripts to follow the same practice as the pipelines, which is to encode the version in the pipeline, e.g. filebeat-7.0.0-alpha1-nginx-error-pipeline. Otherwise, if you have two versions of Filebeat running, and the script was updated between them, you get the older version use the newer version of the script.

@tsg
Copy link
Contributor

tsg commented Nov 2, 2017

I would consider explicitly listing the scripts in the manifest.yml file, instead of loading them all from the filesystem. This is what we do for the other Filebeat modules artifacts (e.g. ML jobs). I don't have a reason for which necessarily need this today, but I'd be more comfortable if we follow the same pattern for every such thing.

@kvch kvch force-pushed the feature/filebeat/pipeline-stored-script branch from d53ce61 to d2363dc Compare December 6, 2017 22:03
}

// getScriptIdTemplate returns the Ingest Node pipeline ID
func (fs *Fileset) getScriptIdTemplate(beatVersion string) *template.Template {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func getScriptIdTemplate should be getScriptIDTemplate

manifest *manifest
vars map[string]interface{}
pipelineID string
scriptIdTemplate *template.Template

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct field scriptIdTemplate should be scriptIDTemplate

return nil, "", fmt.Errorf("Only painless scripts can be stored for pipelines")
}

scriptId := bytes.NewBufferString("")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var scriptId should be scriptID

}

// getScriptIdTemplate returns the Ingest Node pipeline ID
func (fs *Fileset) getScriptIdTemplate(beatVersion string) *template.Template {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func getScriptIdTemplate should be getScriptIDTemplate

manifest *manifest
vars map[string]interface{}
pipelineID string
scriptIdTemplate *template.Template

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct field scriptIdTemplate should be scriptIDTemplate

return nil, "", fmt.Errorf("Only painless scripts can be stored for pipelines")
}

scriptId := bytes.NewBufferString("")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var scriptId should be scriptID

@kvch kvch force-pushed the feature/filebeat/pipeline-stored-script branch 2 times, most recently from 2d673db to d05c616 Compare December 11, 2017 17:23
Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really looking forward to have this feature in FB as it makes writing and reading scripts so much nicer.

Some tests seem to break at the moment because of this change.

We probably should verify if this still works with the packaging as we add new files to make sure also these new script files are shipped.

As a follow up PR it would be good nice to have a system tests that checks the setup and starting of filebeat to verify that the scripts are actually loaded.

return fs.pipelineID, content, nil
}

// formatScriptIDTemplate generates the ID to be used for the pipeline ID in Elasticsearch
func formatScriptIDTemplate(module, fileset, beatVersion string) string {
return fmt.Sprintf("filebeat-%s-%s-%s-{{.}}", beatVersion, module, fileset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is {{.}} doing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this is replaced by the file / script name?

@@ -12,3 +12,7 @@ var:

ingest_pipeline: ingest/pipeline.json
prospector: config/log.yml

pipeline_script:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should rename this to scripts instead of pipeline_script(s) as in ES these are just scripts and not related to a pipeline as far as I know?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK no. I added pipeline_ prefix so it is unambiguous that is a pipeline setting, because manifest includes other types of settings, too. If it's not needed I am happy to remove the prefix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove it. @tsg please object if not the case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we can do without the prefix.

@kvch kvch force-pushed the feature/filebeat/pipeline-stored-script branch 3 times, most recently from c157614 to 7265dd6 Compare December 12, 2017 19:54

}

// formatScriptIDTemplate generates the ID to be used for the pipeline ID in Elasticsearch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment should probably say "used for the pipline script IDs".

return scripts, nil
}

// getScriptIDTemplate returns the Ingest Node pipeline ID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment seems wrong, I guess it is about the script IDs.

}

scriptElemFull := fmt.Sprintf(scriptPipelinePattern, scriptID.String())
jsonString = strings.Replace(jsonString, scriptElem, scriptElemFull, -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using string replacement for this seems brittle. Alternatively we could go through the JSON keys and look for the script tags. I think we do something like that for the ML job ID replacement. This gets more complicated if we look into anything but the top level processors, but I think that's all we need for the moment? What do you think?

@kvch kvch added the in progress Pull request is currently in progress. label Dec 26, 2017
@kvch kvch force-pushed the feature/filebeat/pipeline-stored-script branch from 7265dd6 to e06abcd Compare January 11, 2018 20:06
@kvch kvch removed the review label May 14, 2018
@kvch kvch force-pushed the feature/filebeat/pipeline-stored-script branch from e06abcd to 85703d2 Compare May 22, 2018 08:16
@@ -257,6 +258,105 @@ func (reg *ModuleRegistry) GetInputConfigs() ([]*common.Config, error) {
return result, nil
}

// PipelineLoader factory builds and returns a PipelineLoader

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment on exported type PipelineLoaderFactory should be of the form "PipelineLoaderFactory ..." (with optional leading article)

func substituteScriptIDs(jsonString, name string, t *template.Template) (string, error) {
p := strings.Split(name, ".")
if len(p) != 2 {
return "", fmt.Errorf("Error substituting script ids: invalid script name.")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error strings should not be capitalized or end with punctuation or a newline

@ruflin
Copy link
Contributor

ruflin commented Nov 23, 2018

@kvch I would really like to reactive this PR as I think it would make creating scripts much nicer.

@jsoriano @sayden FYI

@kvch
Copy link
Contributor Author

kvch commented Nov 23, 2018

o7

@mergify
Copy link
Contributor

mergify bot commented Apr 7, 2021

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b feature/filebeat/pipeline-stored-script upstream/feature/filebeat/pipeline-stored-script
git merge upstream/master
git push upstream feature/filebeat/pipeline-stored-script

@urso
Copy link

urso commented Apr 7, 2021

@kvch I think we can close it.

@kvch kvch closed this Apr 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Filebeat Filebeat in progress Pull request is currently in progress. module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants