json.match_schema performance #7011

lcarva · 2024-09-10T19:14:56Z

Short description

The json.match_schema function is much slower when the JSON schema is large.

I created a simple reproducer here: https://github.com/lcarva/opa-json-schema-perf
(The schema is too large for the Rego Playground.)

The reproducer validates a small object against the CycloneDX SBOM JSON Schema (about 5k lines long).

$ opa eval --data . 'data.main.results' --profile --format=pretty
[
  [
    true,
    []
  ],
  [
    true,
    []
  ]
]
+------------------------------+-----------+
|            METRIC            |   VALUE   |
+------------------------------+-----------+
| timer_rego_load_files_ns     | 26615260  |
| timer_rego_module_compile_ns | 24751477  |
| timer_rego_module_parse_ns   | 26171100  |
| timer_rego_query_compile_ns  | 38743     |
| timer_rego_query_eval_ns     | 288952134 |
| timer_rego_query_parse_ns    | 37035     |
+------------------------------+-----------+
+--------------+----------+----------+--------------+-------------------+
|     TIME     | NUM EVAL | NUM REDO | NUM GEN EXPR |     LOCATION      |
+--------------+----------+----------+--------------+-------------------+
| 288.827434ms | 4        | 4        | 4            | main.rego:10      |
| 63.714µs     | 4        | 4        | 4            | main.rego:12      |
| 32.994µs     | 4        | 4        | 4            | main.rego:14      |
| 15.366µs     | 1        | 1        | 1            | data.main.results |
| 3.809µs      | 1        | 1        | 1            | main.rego:5       |
| 3.181µs      | 1        | 1        | 1            | schema.rego:3     |
| 2.385µs      | 1        | 1        | 1            | schema.rego:36    |
+--------------+----------+----------+--------------+-------------------+

main.rego:10 is the json.match_schema call that uses the CycloneDX schema; main.rego:12 is the same call with a much smaller schema. That's about 288,827 µs vs about 64 µs.

Version: 0.68.0
Build Commit: db53d77c482676fadd53bc67a10cf75b3d0ce00b
Build Timestamp: 2024-08-29T15:23:19Z
Build Hostname: 3aae2b82a15f
Go Version: go1.22.5
Platform: linux/amd64
WebAssembly: available

Steps To Reproduce

See description.

Expected behavior

Validating the object should not take longer than 1 ms.


anderseknert · 2024-09-10T20:34:55Z

Hi there! And thanks for filing this issue.

I looked into this briefly, and almost all of that time is spent loading the JSON schema, not actually validating against it.
The loaded schema isn't cached either, so each call repeats the load. Using a cached schema makes things... faster, to say the least. Notice the use of gojsonschema.NewSchema(sl) below, where the returned schema is reused:

package main

import (
	"fmt"
	"os"
	"time"

	"github.com/xeipuuv/gojsonschema"
)

func main() {
	// First timer: covers reading the file, loading the schema, and one validation.
	now := time.Now()

	bs, err := os.ReadFile("schema.json")
	if err != nil {
		panic(err)
	}

	sl := gojsonschema.NewBytesLoader(bs)

	// Load (compile) the schema once; this is the expensive step.
	schema, err := gojsonschema.NewSchema(sl)
	if err != nil {
		panic(err)
	}

	dl := gojsonschema.NewStringLoader(`{"name": "John", "age": 30}`)

	result, err := schema.Validate(dl)
	if err != nil {
		panic(err)
	}

	fmt.Println(result.Valid())
	fmt.Println(time.Since(now))

	// Second timer: validation only, reusing the already loaded schema.
	now = time.Now()

	dl = gojsonschema.NewStringLoader(`{"another": "object", "x": 1}`)

	result, err = schema.Validate(dl)
	if err != nil {
		panic(err)
	}

	fmt.Println(result.Valid())
	fmt.Println(time.Since(now))
}

Output

false
637.524583ms
false
14.709µs

I guess the way to go would be to use the inter-query cache for this built-in, storing loaded schemas across decisions.

It wouldn't make your single opa eval call any faster, though, as the first invocation would still need to load the schema.
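
To illustrate the caching idea, here is a minimal sketch of memoizing loaded schemas by a hash of their raw bytes. This is not OPA's inter-query cache or its API: schemaCache, newSchemaCache, and get are hypothetical names, and the compile callback is a stand-in for an expensive load such as gojsonschema.NewSchema. It uses only the standard library and assumes Go 1.18+ for generics.

```go
package main

import (
	"crypto/sha256"
	"sync"
)

// schemaCache memoizes compiled schemas, keyed by the SHA-256 of the raw
// schema bytes. Hypothetical type for illustration; not part of OPA.
type schemaCache[T any] struct {
	mu      sync.Mutex
	entries map[[sha256.Size]byte]T
	// compile stands in for the expensive load step
	// (for example, a wrapper around gojsonschema.NewSchema).
	compile func(raw []byte) (T, error)
}

// newSchemaCache returns an empty cache that uses compile on a miss.
func newSchemaCache[T any](compile func(raw []byte) (T, error)) *schemaCache[T] {
	return &schemaCache[T]{
		entries: make(map[[sha256.Size]byte]T),
		compile: compile,
	}
}

// get returns the compiled schema for raw, calling compile only when no
// entry exists for that exact content. Failed compilations are not cached.
// The lock is held during compile, so concurrent misses are serialized.
func (c *schemaCache[T]) get(raw []byte) (T, error) {
	key := sha256.Sum256(raw)

	c.mu.Lock()
	defer c.mu.Unlock()

	if s, ok := c.entries[key]; ok {
		return s, nil
	}

	s, err := c.compile(raw)
	if err != nil {
		var zero T
		return zero, err
	}
	c.entries[key] = s
	return s, nil
}
```

With something like this in place, only the first get for a given schema pays the load cost; later calls with the same bytes are a hash plus a map lookup. The sketch has no eviction or size limit, which a real shared cache would presumably need.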

lcarva added the bug label Sep 10, 2024
