Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic when trying to validate STAC schema #2

Closed
blacha opened this issue Oct 28, 2021 · 11 comments
Closed

Panic when trying to validate STAC schema #2

blacha opened this issue Oct 28, 2021 · 11 comments

Comments

@blacha
Copy link

blacha commented Oct 28, 2021

Using @node-rs/jsonschema@^0.1.2"

I receive a rust panic when trying to validate documents against https://schemas.stacspec.org/v1.0.0/item-spec/json-schema/item.json

thread '<unnamed>' panicked at 'trying to resolve an http(s), but reqwest support has not been included', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/jsonschema-0.12.2/src/resolver.rs:71:29
@Brooooooklyn
Copy link

Blocked by Stranger6667/jsonschema#306

@ahungrynoob
Copy link
Owner

Hmmm... After I redesigned the api and did the benchmark, ajv is recommended to use instead of this lib.
N-API take much huger overhead then calculate if json meet schema.

@Stranger6667
Copy link

Stranger6667 commented Oct 28, 2021

@ahungrynoob

As a possible solution to this overhead problem, I think we can take a look at this issue. If the input will implement the Json trait (e.g. around some newtype wrapper), then no copying / serde parsing will be needed at all. I am not sure about the napi side, but it should be feasible.

Another point is that the compilation phase takes quite a lot on the jsonschema side. The best usage flow performance-wise is to create a JSONSchema object once and then reuse it for validation.

What do you think?

cc @Brooooooklyn

@ahungrynoob
Copy link
Owner

ahungrynoob commented Oct 28, 2021

Another point is that the compilation phase takes quite a lot on the jsonschema side. The best usage flow performance-wise is to create a JSONSchema object once and then reuse it for validation.

In the new version, I compile to a JSONSchema first , then call is_valid, however the bench result is still bad.
#4

If the input will implement the Json trait (e.g. around some newtype wrapper),

How could you let the JsObject in v8 to implement Json trait in rust?
@Stranger6667

@Stranger6667
Copy link

How could you let the JsObject in v8 to implement Json trait in rust?

The trick is commonly known as a New Type idiom. It exists for sake of compile-time guarantees and is compiled away in the final artifacts.

struct JsObjectWrapper(JsObject);

impl jsonschema::Json for JsObjectWrapper {
    // Here goes the access to internals, etc.
}

fn validate(input: JsObject) -> bool {
    let compiled = ...;
    compiled.is_valid(JsObjectWrapper(input))
}

In the bindings code, you'll need to wrap the input JsObject into JsObjectWrapper before passing it to is_valid.
In the final result, there will be just details of your implementation of the Json trait placed inside the jsonschema code, due to monomorphization. No serde-json parsing will be needed at all.

The design of the Json trait is not finalized yet, but the mainly it will require checking the right type and extracting the values like serde-json does (e.g. is_string() or as_u64()).

@Stranger6667
Copy link

Though I'll take a deeper look into the existing implementation and I am very curious to check some profiling results as well. So we can be sure where do we have a bottleneck.

@ahungrynoob
Copy link
Owner

Though I'll take a deeper look into the existing implementation and I am very curious to check some profiling results as well. So we can be sure where do we have a bottleneck.

Here are methods of JsObject, https://docs.rs/napi/1.7.7/napi/struct.JsObject.html#method.coerce_to_string.
As you can see, these methods are from napi which cause huge overhead. Too many napi methods calls will result in bad performance.

@Stranger6667
Copy link

Stranger6667 commented Oct 28, 2021

As far as I see, napi implements serde support directly, so there should be no need to coerce to a string & parse it later.

@Stranger6667
Copy link

FYI, changing that block gives only 2x speedup :(

#[js_function(1)]
fn compile(ctx: CallContext) -> Result<JsFunction> {
  let arg0 = ctx.get::<JsString>(0)?.into_utf8()?;
  let schema = from_str(arg0.as_str()?)?;
  let compiled = JSONSchema::options()
    .with_draft(Draft::Draft7)
    .compile(&schema)
    .map_err(|e| Error::new(Status::InvalidArg, format!("{}", e)))?;
  ctx.env.create_function_from_closure("isValid", move |ctx| {
    let arg = ctx.get::<JsUnknown>(0)?;
    let input = ctx.env.from_js_value(arg)?;
    ctx.env.get_boolean(compiled.is_valid(&input))
  })
}

@Stranger6667
Copy link

Stranger6667 commented Oct 28, 2021

But, I don't think that ajv expects a string either. In this case it requires only checking the input type and immediately rejecting it. So, the benchmark code requires some adjustments.

UPDATE: Missed it - it accepts the original input

@ahungrynoob
Copy link
Owner

FYI, changing that block gives only 2x speedup :(

#[js_function(1)]

fn compile(ctx: CallContext) -> Result<JsFunction> {

  let arg0 = ctx.get::<JsString>(0)?.into_utf8()?;

  let schema = from_str(arg0.as_str()?)?;

  let compiled = JSONSchema::options()

    .with_draft(Draft::Draft7)

    .compile(&schema)

    .map_err(|e| Error::new(Status::InvalidArg, format!("{}", e)))?;

  ctx.env.create_function_from_closure("isValid", move |ctx| {

    let arg = ctx.get::<JsUnknown>(0)?;

    let input = ctx.env.from_js_value(arg)?;

    ctx.env.get_boolean(compiled.is_valid(&input))

  })

}

The only solution for this kind of bottleneck is using napi methods as little as possible, which is conflict with jsonschema validating scene as node-addon...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants