Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(new transform): Initial javascript transform implementation #721

Closed
wants to merge 56 commits into from
Closed

feat(new transform): Initial javascript transform implementation #721

wants to merge 56 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Aug 7, 2019

Description

Closes #667

This PR adds basic JavaScript transform implemented using QuickJS engine. See docs in the committed files for details.

Limitations

  • As JavaScript Date type supports only millisecond precision, Vector's timestamps lose precision after applying any JavaScript transform. An alternative would be to not convert timestamps to JavaScript dates, but it would probably make user experience less streamlined.
  • Nested objects are sent to JavaScript as is, without unflattening. I'm not sure should unflattening happen or not here, but implementing it would probably require making internals of Unflatten struct public or creating new specialized interfaces for it.

Possible improvements that require changes in quick-js crate

  • Construct JsValues directly, without intermediate serialization to JSON, to improve performance.
  • Enable ES6 modules.
  • Enable standard library.

In addition, if Vector supported asynchronous transformations that returned Futures instead of transformed values, and also quick-js crate implemented support for returning Promises to native code, it could have been possible to implement fetch API and allow the transformations to connect to external HTTP services to enrich the events.

Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Copy link
Contributor

@LucioFranco LucioFranco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This at a quick glance looks pretty good! I'd be curious to see benchmarks to compare it to the lua runtime, what do you think?

docker-compose.yml Outdated Show resolved Hide resolved
a-rodin added 2 commits August 7, 2019 20:29
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
@ghost
Copy link
Author

ghost commented Aug 7, 2019

@LucioFranco I would be interested in doing the benchmarking too. However, I expect it to be less performant than Lua because of the different model of computation: here event is copied entirely to JavaScript and then the result is copied entirely back, while in Lua fields are read and written on-demand. But I think the functional approach to processing events is more flexible for the user.

@LucioFranco
Copy link
Contributor

@a-rodin agreed, I think mostly it would be helpful for documentation to express how much slower it might be.

@binarylogic
Copy link
Contributor

👍 This is awesome! Thanks for doing this.

Nested objects are sent to JavaScript as is, without unflattening. I'm not sure should unflattening happen or not here, but implementing it would probably require making internals of Unflatten struct public or creating new specialized interfaces for it.

This is perfectly fine for now. I added #706 for further discussion around this exact topic. If we keep the flattened structure (pending on #704), I'm not sure I would unflatten the event anyway. A transform, by nature, is operating on internal data structures.

In addition, if Vector supported asynchronous transformations that returned Futures instead of transformed values, and also quick-js crate implemented support of returning Promises to native code, it could have been possible to implement fetch API and allow the transformations to connect to external HTTP services to enrich the events.

That's super interesting. I'm curious what you think about the @LucioFranco?

Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
@LucioFranco
Copy link
Contributor

@a-rodin looks like we need to track the changes on master to get the tests to pass.

a-rodin added 3 commits August 8, 2019 17:38
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
@ghost
Copy link
Author

ghost commented Aug 8, 2019

@LucioFranco I've added some benchmarks, similar to the ones for Lua.

The results:

javascript_add_fields/native
                        time:   [17.396 ms 17.457 ms 17.595 ms]

javascript_add_fields/javascript_with_copying
                        time:   [687.13 ms 690.43 ms 694.58 ms]

javascript_add_fields/javascript_without_copying
                        time:   [675.23 ms 676.63 ms 679.59 ms]

javascript_field_filter/native
                        time:   [26.127 ms 26.257 ms 26.346 ms]

javascript_field_filter/javascript
                        time:   [693.96 ms 697.47 ms 699.38 ms]

The numbers for Lua:

lua_add_fields/lua      time:   [208.87 ms 213.03 ms 215.53 ms]
lua_field_filter/lua    time:   [232.33 ms 236.09 ms 239.50 ms]

From disabling certain parts of the code it seems like serialization to JSON in Rust + just running JS eval with the output string in place of the code takes half of the time. It means that another half of the time is spent on executing the actual JavaScript code, which consist of calling JSON.parse/JSON.stringify with custom replacers and invoking the handler function.

@ghost
Copy link
Author

ghost commented Aug 8, 2019

Some really weird things happen: edefa8f breaks one of the tests, causing stack overflow in JavaScript code.

I'm not sure how can it be, but probably removing format! causes the memory allocated to the strings data to be freed before being used by QuickJS, so that QuickJS gets garbage instead of JSON. I was not able to reproduce it in a simpler example that uses just QuickJS.

It is also interesting that the problem shows itself only on debug builds, as cargo test --release passes all tests.

@ghost
Copy link
Author

ghost commented Aug 9, 2019

I think the problem might be caused by unsafe implementation of Send, as moving creation of the context into process function fixes the tests

However, it is better to keep the initialization of the context at the creation of the transform, both for performance reasons and to allow keeping state between subsequent events. So probably a way to create a new context for each thread is needed.

Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
@ghost
Copy link
Author

ghost commented Aug 9, 2019

The problem was caused by an issue in QuickJS code that checks for stack overflow exception. I've sent the patch with the fix to QuickJS mailing list and created PR theduke/quickjs-rs#19 to Rust quickjs-rs crate, but for now I've changed the Vector's Cargo.toml to use my fork of quickjs-rs with the fix.

Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
@ghost
Copy link
Author

ghost commented Aug 9, 2019

Replacing eval_as to call_function in the transform code increased the performance twofold:

javascript_add_fields/native
                        time:   [17.675 ms 17.729 ms 17.784 ms]

javascript_add_fields/javascript_with_copying
                        time:   [325.56 ms 328.03 ms 330.03 ms]

javascript_add_fields/javascript_without_copying
                        time:   [313.64 ms 315.36 ms 317.29 ms]

javascript_field_filter/native
                        time:   [26.023 ms 26.217 ms 26.366 ms]

javascript_field_filter/javascript
                        time:   [317.15 ms 319.31 ms 321.24 ms]

@binarylogic
Copy link
Contributor

Nice! That looks comparable to the lua transform.

@binarylogic
Copy link
Contributor

This looks good. @a-rodin @LucioFranco is this ready to merge?

@ghost
Copy link
Author

ghost commented Aug 12, 2019

@binarylogic I think maybe we can wait for the next release of quick-js crate with updated QuickJS and the added patch.

Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
@ghost
Copy link
Author

ghost commented Aug 13, 2019

With native creation of JsValue implemented using recently added quick-js features, simple javascript transforms become faster than lua:

javascript_add_fields/native
                        time:   [16.633 ms 16.744 ms 16.823 ms]
javascript_add_fields/javascript_with_copying
                        time:   [133.17 ms 133.69 ms 134.09 ms]
javascript_add_fields/javascript_without_copying
                        time:   [121.78 ms 122.46 ms 123.04 ms]
javascript_field_filter/native
                        time:   [24.665 ms 24.749 ms 24.831 ms]
javascript_field_filter/javascript
                        time:   [117.05 ms 117.24 ms 117.54 ms]

Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
@LucioFranco
Copy link
Contributor

@a-rodin those benchmarks look quite good! I see that a lot of the js tests are failing on CI, any idea why?

Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
@ghost
Copy link
Author

ghost commented Aug 13, 2019

@LucioFranco The tests should be passing now.

Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
a-rodin added 15 commits January 8, 2020 08:55
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
This reverts commit 652fcd9.

Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
@binarylogic
Copy link
Contributor

Noting, we can merge this if it's ready and not register it as a transform until we're ready to expose the API. We'll need to remove the docs before doing so.

@binarylogic binarylogic assigned ghost Jan 27, 2020
a-rodin added 3 commits March 30, 2020 20:15
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
@binarylogic binarylogic assigned Hoverbear and unassigned ghost Jul 21, 2020
@binarylogic
Copy link
Contributor

@Hoverbear what are your thoughts on this? Should we close it in favor of the wasm transform? Or is this worth re-exploring?

@theduke
Copy link

theduke commented Jul 21, 2020

Related: running QuickJS inside a wasm runtime requires quite a lot of plumbing.
I plan on having a WASM executor for quickjs-rs, but no guarantees when I'll get to that.

theduke/quickjs-rs#11

@binarylogic
Copy link
Contributor

@Hoverbear bump.

@binarylogic
Copy link
Contributor

binarylogic commented Sep 15, 2020

Closing this PR since it has fallen out of sync, the author is no longer working on it, and we want to understand better how it fits in with Vector. A lot has changed since this PR was opened:

  1. We have a new remap transform that provides a light, performant, and safe syntax for mapping data. This is likely to cover most use cases and is much faster and safer.
  2. We have a wasm transform that lets users write transform with any language that will compile to WASM, which includes Javascript. We have a guide here.
  3. Finally, we're working on a config macros feature that will allow users to build components with lower-level components. This will also help avoid the need to reach for a full language like Javascript.

All of that said, we'd like to see how the community adopts the above features before introducing more languages into Vector. Javascript is significantly worse in the performance and safety aspects than all of the offerings above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New javascript transform
5 participants