Support stored scripts #5339

kvch · 2017-10-05T18:27:48Z

Previously, pipeline scripts were inlined in JSON files. Moving them into separate files makes them readable and testable. Scripts are now placed under ingest/script in every fileset whose pipeline uses scripts. It is still possible to use inline scripts in pipelines.

For now, only painless scripts are supported.

TODO

fail gracefully if a non-general-purpose language is used

ruflin · 2017-10-06T12:01:16Z

@kvch I think it's enough if we support painless for now. Is there a reason we would need others?

ruflin

It's really great to have the scripts now in separate files. It makes them much more readable.

For loading the script I think there are two ways:

1. We take the scripts from the files, remove all newlines, and still load them with the pipeline.
2. We load them separately and reference the id.

The first case would be more similar to what we have now and would attach one script to a specific pipeline. This makes sure a script does not conflict with existing scripts and we don't have to deal with overwriting scripts etc.

If we load the script separately, we should make sure to namespace the script ids so they don't conflict with other scripts.

I could see us in the long term even supporting both cases. My preference for now is the first option, as I think it creates fewer moving parts in ES.
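A minimal sketch of the first option, for illustration only: the script file is collapsed into a single line and embedded as an inline script processor. The helper names inlineScript and scriptProcessor and the ctx.level field are hypothetical and not part of this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// inlineScript collapses a multi-line script into a single line by trimming
// every line and joining the non-empty ones with a single space.
// Hypothetical helper, not taken from the PR.
func inlineScript(source string) string {
	var parts []string
	for _, line := range strings.Split(source, "\n") {
		if trimmed := strings.TrimSpace(line); trimmed != "" {
			parts = append(parts, trimmed)
		}
	}
	return strings.Join(parts, " ")
}

// scriptProcessor builds an inline script processor that can be placed in the
// "processors" array of an ingest pipeline.
func scriptProcessor(lang, source string) map[string]interface{} {
	return map[string]interface{}{
		"script": map[string]interface{}{
			"lang":   lang,
			"source": inlineScript(source),
		},
	}
}

func main() {
	// ctx.level is an assumed field name used only for this example.
	src := "if (ctx.level == null) {\n    ctx.level = 'info';\n}\n"
	out, err := json.Marshal(scriptProcessor("painless", src))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Joining lines this way is naive: a script that contains // line comments would break once the newlines are gone, so a real implementation would need to handle that case.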

ruflin · 2017-10-06T12:02:44Z

filebeat/fileset/modules.go

+	s := make(map[string]string)
+	s["lang"] = lang
+	s["source"] = source
+	payload := make(map[string]interface{})


I suggest we use common.MapStr here mainly because we use it everywhere else where we have to create json. It has the same base interface.
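A rough sketch of what the suggestion could look like. MapStr below is a local stand-in for libbeat's common.MapStr so the example compiles on its own, and the assumption is that the payload wraps the script under a "script" key, as the Elasticsearch stored scripts API expects. The function name storedScriptBody is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MapStr is a local stand-in for libbeat's common.MapStr, declared here only
// to keep the example self-contained.
type MapStr map[string]interface{}

// storedScriptBody builds the request body for storing a script.
// Hypothetical helper; the PR builds the payload with plain maps instead.
func storedScriptBody(lang, source string) MapStr {
	return MapStr{
		"script": MapStr{
			"lang":   lang,
			"source": source,
		},
	}
}

func main() {
	body, err := json.Marshal(storedScriptBody("painless", "ctx.seen = true"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```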

kvch · 2017-10-06T14:16:26Z

It is true that right now we only have painless scripts. But to be more future-proof and leave more room to play with FB modules, it would be ideal to support all languages that are supported by ES.

I tried to come up with names that would not conflict with other scripts in users' ES. However, in the end I think it is the users' responsibility to make sure their script ids are not conflicting with FB's.

ruflin · 2017-10-06T14:24:10Z

In 6.0 the only general purpose language left is painless: https://www.elastic.co/guide/en/elasticsearch/reference/6.0/modules-scripting.html

For the script naming, if we go with approach 2 we should have at least a prefix with the beat name, something like filebeat-is-private-ip. I think it's our responsibility to heavily reduce the chance of name conflicts. I don't expect a user who loads Filebeat modules to know the names of our scripts or to check that they don't conflict with their own.

kvch · 2017-10-06T14:57:43Z

Great! How would you handle situations where a user wants to use a special-purpose language? Would you return an error or just ignore it and tell the user to rewrite it in painless?

Yes, you are right. At first I named the files "filebeat-stored-is-private-ip", etc. But it seemed weird, so I removed the prefix. I will reintroduce it.

ruflin · 2017-10-09T06:38:03Z

@kvch What do you think about shipping the scripts with the processor instead of uploading them as separate scripts? Are there any downsides to uploading them as part of the processor?

For the special-purpose language: I would wait until this actually happens and just mention in the docs that painless is the supported language. I assume that already now, if someone defines the pipeline and doesn't use an id, they can use whatever language they want?

kvch · 2017-10-09T16:40:09Z

@ruflin I understand that you would like to eliminate the possibility of collisions. However, if we add scripts to pipelines, we take away the possibility for users to view, edit or debug their scripts after feeding them into ES.

Furthermore, there is a Filebeat module for Kafka GC logs based on @urso's Kafka blogpost, but right now ES can't process its pipeline, because it includes too many scripts. But I rebased the branch on top of this one, and now it works smoothly.

I think prefixing names with "filebeat-stored-" should be enough to avoid collisions with users' stored scripts.

I am okay with documenting this behaviour. But I still need to implement graceful degradation in case users would like to upload scripts in special-purpose languages.

ruflin · 2017-10-10T11:09:37Z

One more question on script loading: when X-Pack is enabled, are the same permissions required to load a pipeline as to load a script?

For the user modification of scripts: In general I like the idea of offering this flexibility to the user. But do we expect the user to make these modifications in ES or on the Beats side in the script file? If someone changes the script, I assume they would also have to modify other things in the module?

ruflin · 2017-10-10T11:10:32Z

@kvch Can you elaborate on "ES can't process its pipeline, because it includes too many scripts"? Does that mean that if a pipeline directly contains scripts that are too long, it does not work?

kvch · 2017-10-11T11:28:34Z

@ruflin There is one privilege for pipeline handling called manage_pipeline. This includes all pipeline operations. As both inline and stored scripts are pipeline processors, it means that if someone has permission to load a pipeline, he/she can load scripts, too.

It would be nice if both scenarios were supported. But currently, if someone changes the script in ES, it should be changed in the FB module, too. It would be a nice addition to have a script/tool that exports scripts from ES and puts them under the right FB module. For now, changing the script in FB seems more convenient.

Ingesting the Kafka GC logs pipeline is no longer a valid reason, because the pipeline can be loaded using ES 6.0.0-rc1. The way script compilation is limited has changed since I last tested in August.

ruflin · 2017-10-13T08:17:38Z

filebeat/fileset/modules.go

@@ -296,6 +296,11 @@ func (reg *ModuleRegistry) LoadPipelines(esClient PipelineLoader) error {

 func loadScript(name, source string, esClient PipelineLoader) error {
 	parts := strings.Split(name, ".")
+
+	if parts[1] != "painless" {


you should probably check that len(parts) > 1
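One way the guard could look, as a sketch only: the function name scriptLang and the error wording are assumptions, and only the split on "." and the painless check come from the diff above.

```go
package main

import (
	"fmt"
	"strings"
)

// scriptLang extracts the language from a script file name such as
// "is-private-ip.painless". It checks the number of parts before indexing,
// so a name without an extension returns an error instead of panicking.
// Hypothetical helper, not the PR's loadScript.
func scriptLang(name string) (string, error) {
	parts := strings.Split(name, ".")
	if len(parts) < 2 {
		return "", fmt.Errorf("script name %q has no language extension", name)
	}
	if parts[1] != "painless" {
		return "", fmt.Errorf("only painless scripts can be stored, got %q", parts[1])
	}
	return parts[1], nil
}

func main() {
	for _, name := range []string{"is-private-ip.painless", "is-private-ip", "calc.expression"} {
		lang, err := scriptLang(name)
		fmt.Println(name, lang, err)
	}
}
```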

tsg · 2017-11-02T11:29:13Z

@kvch I'm thinking that, to reduce confusion, it would be good for the scripts to follow the same practice as the pipelines, which is to encode the version in the pipeline ID, e.g. filebeat-7.0.0-alpha1-nginx-error-pipeline. Otherwise, if you have two versions of Filebeat running and the script was updated between them, the older version ends up using the newer version of the script.

tsg · 2017-11-02T11:33:11Z

I would consider explicitly listing the scripts in the manifest.yml file, instead of loading them all from the filesystem. This is what we do for the other Filebeat module artifacts (e.g. ML jobs). I don't have a reason why we would necessarily need this today, but I'd be more comfortable if we followed the same pattern for every such thing.

houndci-bot · 2017-12-08T14:32:49Z

filebeat/fileset/fileset.go

+}
+
+// getScriptIdTemplate returns the Ingest Node pipeline ID
+func (fs *Fileset) getScriptIdTemplate(beatVersion string) *template.Template {


func getScriptIdTemplate should be getScriptIDTemplate

houndci-bot · 2017-12-08T14:32:49Z

filebeat/fileset/fileset.go

+	manifest         *manifest
+	vars             map[string]interface{}
+	pipelineID       string
+	scriptIdTemplate *template.Template


struct field scriptIdTemplate should be scriptIDTemplate

houndci-bot · 2017-12-08T14:32:49Z

filebeat/fileset/modules.go

+		return nil, "", fmt.Errorf("Only painless scripts can be stored for pipelines")
+	}
+
+	scriptId := bytes.NewBufferString("")


var scriptId should be scriptID

ruflin

I'm really looking forward to having this feature in FB as it makes writing and reading scripts so much nicer.

Some tests seem to break at the moment because of this change.

We should probably verify that this still works with packaging: since we add new files, we need to make sure the new script files are shipped as well.

As a follow-up PR, it would be nice to have a system test that checks the setup and starting of Filebeat to verify that the scripts are actually loaded.

ruflin · 2017-12-12T04:01:52Z

filebeat/fileset/fileset.go

 	return fs.pipelineID, content, nil
 }

+// formatScriptIDTemplate generates the ID to be used for the pipeline ID in Elasticsearch
+func formatScriptIDTemplate(module, fileset, beatVersion string) string {
+	return fmt.Sprintf("filebeat-%s-%s-%s-{{.}}", beatVersion, module, fileset)


What is {{.}} doing?

It seems this is replaced by the file / script name?
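The thread does not confirm how the placeholder is filled. Assuming it is executed with text/template and the script name as data, which the *template.Template field in the diff suggests, the substitution could be sketched like this. The helper scriptID and the script name is-private-ip are illustrative only.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// formatScriptIDTemplate mirrors the function under review: it fixes the
// version, module and fileset and leaves {{.}} as a template placeholder.
func formatScriptIDTemplate(module, fileset, beatVersion string) string {
	return fmt.Sprintf("filebeat-%s-%s-%s-{{.}}", beatVersion, module, fileset)
}

// scriptID parses the template text and fills the {{.}} placeholder with the
// script name. Hypothetical helper, not taken from the PR.
func scriptID(tmplText, scriptName string) (string, error) {
	t, err := template.New("script_id").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, scriptName); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	tmpl := formatScriptIDTemplate("redis", "log", "7.0.0-alpha1")
	id, err := scriptID(tmpl, "is-private-ip")
	if err != nil {
		panic(err)
	}
	fmt.Println(id) // prints filebeat-7.0.0-alpha1-redis-log-is-private-ip
}
```

In text/template, {{.}} renders the value passed to Execute, so one template per fileset can produce the ID for each of its scripts.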

ruflin · 2017-12-12T04:10:34Z

filebeat/module/redis/log/manifest.yml

@@ -12,3 +12,7 @@ var:

 ingest_pipeline: ingest/pipeline.json
 prospector: config/log.yml
+
+pipeline_script:


I wonder if we should rename this to scripts instead of pipeline_script(s) as in ES these are just scripts and not related to a pipeline as far as I know?

AFAIK no. I added the pipeline_ prefix so it is unambiguous that it is a pipeline setting, because the manifest includes other types of settings, too. If it's not needed I am happy to remove the prefix.

I think we can remove it. @tsg please object if not the case.

I agree we can do without the prefix.

tsg · 2017-12-25T11:16:25Z

filebeat/fileset/fileset.go

+
+}
+
+// formatScriptIDTemplate generates the ID to be used for the pipeline ID in Elasticsearch


The comment should probably say "used for the pipeline script IDs".

tsg · 2017-12-25T11:17:29Z

filebeat/fileset/fileset.go

+	return scripts, nil
+}
+
+// getScriptIDTemplate returns the Ingest Node pipeline ID


Comment seems wrong, I guess it is about the script IDs.

tsg · 2017-12-25T11:37:02Z

filebeat/fileset/fileset.go

+	}
+
+	scriptElemFull := fmt.Sprintf(scriptPipelinePattern, scriptID.String())
+	jsonString = strings.Replace(jsonString, scriptElem, scriptElemFull, -1)


Using string replacement for this seems brittle. Alternatively we could go through the JSON keys and look for the script tags. I think we do something like that for the ML job ID replacement. This gets more complicated if we look into anything but the top level processors, but I think that's all we need for the moment? What do you think?
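A sketch of the alternative described here, limited to top-level processors: decode the pipeline, look for script processors that reference a stored script by "id", and rewrite that id. The function name rewriteScriptIDs and the example pipeline are assumptions made for illustration, not code from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rewriteScriptIDs walks the top-level processors of a pipeline and replaces
// the id of every stored-script processor using the given mapping function.
// Processors nested inside other processors are deliberately not visited.
func rewriteScriptIDs(pipelineJSON []byte, toID func(name string) string) ([]byte, error) {
	var pipeline map[string]interface{}
	if err := json.Unmarshal(pipelineJSON, &pipeline); err != nil {
		return nil, fmt.Errorf("error decoding pipeline: %v", err)
	}
	processors, _ := pipeline["processors"].([]interface{})
	for _, p := range processors {
		proc, ok := p.(map[string]interface{})
		if !ok {
			continue
		}
		script, ok := proc["script"].(map[string]interface{})
		if !ok {
			continue // not a script processor
		}
		if name, ok := script["id"].(string); ok {
			script["id"] = toID(name)
		}
	}
	return json.Marshal(pipeline)
}

func main() {
	in := []byte(`{"processors":[{"script":{"id":"is-private-ip"}},{"remove":{"field":"tmp"}}]}`)
	out, err := rewriteScriptIDs(in, func(name string) string {
		return "filebeat-7.0.0-alpha1-redis-log-" + name
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Compared with string replacement, this only touches values that are actually script ids. The trade-off is that round-tripping through a generic map does not preserve key order or the original formatting of the pipeline file.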

houndci-bot · 2018-05-22T08:16:53Z

filebeat/fileset/modules.go

@@ -257,6 +258,105 @@ func (reg *ModuleRegistry) GetInputConfigs() ([]*common.Config, error) {
 	return result, nil
 }

+// PipelineLoader factory builds and returns a PipelineLoader


comment on exported type PipelineLoaderFactory should be of the form "PipelineLoaderFactory ..." (with optional leading article)

houndci-bot · 2018-05-22T08:16:53Z

filebeat/fileset/fileset.go

+func substituteScriptIDs(jsonString, name string, t *template.Template) (string, error) {
+	p := strings.Split(name, ".")
+	if len(p) != 2 {
+		return "", fmt.Errorf("Error substituting script ids: invalid script name.")


error strings should not be capitalized or end with punctuation or a newline

ruflin · 2018-11-23T07:52:36Z

@kvch I would really like to reactivate this PR as I think it would make creating scripts much nicer.

@jsoriano @sayden FYI

kvch · 2018-11-23T13:09:54Z

o7

mergify · 2021-04-07T16:33:03Z

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b feature/filebeat/pipeline-stored-script upstream/feature/filebeat/pipeline-stored-script
git merge upstream/master
git push upstream feature/filebeat/pipeline-stored-script

urso · 2021-04-07T18:16:25Z

@kvch I think we can close it.

kvch added the Filebeat, in progress, and review labels Oct 5, 2017

ruflin reviewed Oct 6, 2017

kvch force-pushed the feature/filebeat/pipeline-stored-script branch from 7b589b7 to ae2d428 Compare October 6, 2017 15:25

kvch force-pushed the feature/filebeat/pipeline-stored-script branch from ae2d428 to 1757f7d Compare October 9, 2017 15:47

kvch removed the in progress label Oct 12, 2017

ruflin reviewed Oct 13, 2017

kvch force-pushed the feature/filebeat/pipeline-stored-script branch from 266273d to 5c15275 Compare October 13, 2017 14:44

ruflin mentioned this pull request Nov 28, 2017

Eliminate deprecation warnings on scripts #4865

Closed

kvch force-pushed the feature/filebeat/pipeline-stored-script branch from d53ce61 to d2363dc Compare December 6, 2017 22:03

houndci-bot reviewed Dec 8, 2017

kvch force-pushed the feature/filebeat/pipeline-stored-script branch 2 times, most recently from 2d673db to d05c616 Compare December 11, 2017 17:23

ruflin reviewed Dec 12, 2017

kvch force-pushed the feature/filebeat/pipeline-stored-script branch 3 times, most recently from c157614 to 7265dd6 Compare December 12, 2017 19:54

tsg reviewed Dec 25, 2017

kvch added the in progress label Dec 26, 2017

kvch force-pushed the feature/filebeat/pipeline-stored-script branch from 7265dd6 to e06abcd Compare January 11, 2018 20:06

kvch removed the review label May 14, 2018

kvch added 7 commits May 18, 2018 19:15

filebeat: support stored scripts in module pipelines

2ca362f

filebeat: fail gracefully if stored script is not painless

11f4e31

add pr to changelog

ebd4c29

filebeat: follow pipeline naming conventions in loading scripts

d53f1e9

filebeat: support pipelines with generated script ids

420f6bb

minor fixes

168ce06

minor fixes

85703d2

kvch force-pushed the feature/filebeat/pipeline-stored-script branch from e06abcd to 85703d2 Compare May 22, 2018 08:16

houndci-bot reviewed May 22, 2018

jsoriano mentioned this pull request Dec 18, 2018

Painless / module failing with: "Too many dynamic script compilations within, max" #9600

Open

This was referenced Dec 24, 2018

Parameterizing Painless script literals #9770

Merged

Parameterizing Painless script #9821

Closed

ruflin added the Team:Integrations and module labels Dec 28, 2018

exekias removed the Team:Integrations label Nov 25, 2019

kvch closed this Apr 8, 2021

