Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE-5093] Data Transforms SDK: Schema Registry Support for JavaScript #21491

Merged
merged 10 commits into from
Jul 23, 2024

Conversation

oleiman
Copy link
Member

@oleiman oleiman commented Jul 17, 2024

This PR introduces @redpanda-data/sr, a C++-backed node module offering a client and various related types for accessing Redpanda Schema Registry from JavaScript or TypeScript transforms.

Includes TypeScript type descriptions, primarily for developer QoL (LSP support).

Also Closes CORE-5577

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

Features

  • Adds Schema Registry support for the JavaScript Data Transforms SDK

@oleiman oleiman self-assigned this Jul 17, 2024
@github-actions github-actions bot added area/build area/wasm WASM Data Transforms labels Jul 17, 2024
@oleiman oleiman force-pushed the xform/sdk/core-5093/sr-js branch 13 times, most recently from 8accf27 to ee358eb Compare July 18, 2024 17:49
src/transform-sdk/js/sr-module/index.d.ts Outdated Show resolved Hide resolved
src/transform-sdk/js/sr-module/index.d.ts Outdated Show resolved Hide resolved
src/transform-sdk/js/sr-module/index.d.ts Outdated Show resolved Hide resolved
@oleiman oleiman force-pushed the xform/sdk/core-5093/sr-js branch 2 times, most recently from fa6b583 to 058b065 Compare July 18, 2024 23:33
@oleiman oleiman changed the title Xform/sdk/core 5093/sr js Data Transforms SDK: Schema Registry Support for JavaScript Jul 18, 2024
@oleiman oleiman changed the title Data Transforms SDK: Schema Registry Support for JavaScript [CORE-5093] Data Transforms SDK: Schema Registry Support for JavaScript Jul 19, 2024
@oleiman oleiman force-pushed the xform/sdk/core-5093/sr-js branch 5 times, most recently from 92ba82f to f777ce2 Compare July 19, 2024 05:18
@oleiman oleiman force-pushed the xform/sdk/core-5093/sr-js branch 2 times, most recently from 6c0ddec to c218201 Compare July 19, 2024 18:41
@oleiman oleiman marked this pull request as ready for review July 19, 2024 19:40
@oleiman
Copy link
Member Author

oleiman commented Jul 19, 2024

@rockwotj - marking this ready for review in case you have time. going to be working on adding encodeSchemaID this afternoon.

@oleiman oleiman requested review from rockwotj, a team and michael-redpanda and removed request for a team July 19, 2024 19:41
Copy link
Contributor

@rockwotj rockwotj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First look is good - I will look more in-depth Monday.

src/transform-sdk/js/sr-module/index.d.ts Outdated Show resolved Hide resolved
src/transform-sdk/js/main.cc Outdated Show resolved Hide resolved
src/transform-sdk/js/main.cc Outdated Show resolved Hide resolved
src/transform-sdk/js/sr-module/index.d.ts Outdated Show resolved Hide resolved
@oleiman oleiman force-pushed the xform/sdk/core-5093/sr-js branch 2 times, most recently from 70b0b70 to 52a3364 Compare July 21, 2024 09:58
oleiman added 4 commits July 21, 2024 14:57
Otherwise any nontrivial JS transform will blow our stack allocation
pretty consistently.

Since the project is not very large, build time is not much of an
issue. Qualitatively, any difference is not noticeable.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Adds `is_error` to distinguish between JavaScript errors
and exceptions.

Also adds a stack trace to the debug string for exceptions and
errors both. Note that traversing the stack trace is a recursive
operation, but generally we're going to be hitting this code when
an exception escapes a transform function, killing the VM.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
As an alias for 'global' itself. This usage is (anectdotally) pretty
common for code that expects to find itself in a nodeJS env and is
not covered by our usual polyfills.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Take double representation of the value, round to the nearest
long, and cast to int32.
@oleiman oleiman force-pushed the xform/sdk/core-5093/sr-js branch from 52a3364 to 7d1cbde Compare July 21, 2024 22:00
@oleiman oleiman requested a review from rockwotj July 22, 2024 18:29
Copy link
Contributor

@rockwotj rockwotj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - a few smaller things, then we can ship this

@@ -132,12 +133,26 @@ value value::uint8_array(JSContext* ctx, std::span<uint8_t> data) {
JSFreeArrayBufferDataFunc* func = [](JSRuntime*, void* opaque, void* ptr) {
// Nothing to do
};

// https://github.com/quickjs-ng/quickjs/blob/da5b95dcaf372dcc206019e171a0b08983683bf5/quickjs.c#L51945-L51953
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some context here would be nice :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I had a comment inline that I must have deleted when I separated the two functions...

src/transform-sdk/js/js_vm.cc Outdated Show resolved Hide resolved
src/transform-sdk/js/sr-module/index.d.ts Outdated Show resolved Hide resolved
* ]
* };
*
* const subj_schema = sr_client.createSchema(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JS usually uses camelCase.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think i got them all

auto orig = value_as_bytes_view(buf);
redpanda::bytes encoded = redpanda::sr::encode_schema_id(
redpanda::sr::schema_id{sid.as_integer()}, orig);
return qjs::value::uint8_array_copy(ctx, encoded);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why make a full copy? Can't we pass the original bytes and have a dtor cleanup the bytes when the JS object is GC'd? I guess this is fine now. Let's leave a TODO(perf) and not prematurely optimize.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we pass the original bytes

not sure how this would work mechanically. we get a vector back from encode_schema_id, so I'm fairly certain you have to copy those out into an unmanaged buffer anyway.

could return a unique_ptr<uint8_t*> from encode_schema_id or something else that allows releasing the underlying memory. am I missing something? maybe overload encode_schema_id to take a view into memory allocated at the call site?

generally agree that we can punt on this though. will add the TODO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would have to "move" the vector to a heap allocated pointer (no copies of the data) and make that the opaque value in the JSRuntime that is deleted when it's done. No need to do now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i tried that... though you know what, i may have been going through the wrong pointer param 🤦

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably fast follow if I have time

Comment on lines 670 to 671
// NOTE(oren): returning a view back into the target buffer might
// be a terrible idea. Needs a good think.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make a copy for now and optimize later. We should be able to keep a reference to the old data and make a subview: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray/subarray

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

src/transform-sdk/js/main.cc Outdated Show resolved Hide resolved
oleiman added 2 commits July 22, 2024 17:41
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Introduces '@redpanda-data/transform-sdk-sr' module and export for creating
a client instance

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman oleiman force-pushed the xform/sdk/core-5093/sr-js branch from 7d1cbde to e4e646e Compare July 23, 2024 00:42
value::uint8_array_copy:

Use Uint8Array constructor that copies input buffer into
the object. Useful for returning arbitrary buffers to user
code (e.g. the result of `transform-sdk-sr.encodeSchemaID`).

value::array_buffer_copy: similar reasoning
@oleiman oleiman force-pushed the xform/sdk/core-5093/sr-js branch from e4e646e to 0d02c00 Compare July 23, 2024 16:49
@oleiman
Copy link
Member Author

oleiman commented Jul 23, 2024

force pushes: CR comments and fix mangled push from yesterday

oleiman added 3 commits July 23, 2024 11:40
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Returns an owning string/ArrayBuffer/Uint8Array depending on the input
buffer type.
Includes TypeScript typings in sr-module/index.d.ts

CORE-5577

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman oleiman force-pushed the xform/sdk/core-5093/sr-js branch from 0d02c00 to 80a858a Compare July 23, 2024 18:41
@oleiman
Copy link
Member Author

oleiman commented Jul 23, 2024

force push adds perf TODOs

@oleiman oleiman requested a review from rockwotj July 23, 2024 19:33
@oleiman oleiman merged commit f19f266 into redpanda-data:dev Jul 23, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/build area/wasm WASM Data Transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants