Configure TypeBox Compiler In Benchmarks #574

Closed
sinclairzx81 opened this issue Apr 6, 2023 · 15 comments
Assignees: samchon
Labels: documentation, enhancement, good first issue

Comments

@sinclairzx81
Contributor

sinclairzx81 commented Apr 6, 2023

Could you please configure the TypeBox compiler for each is() benchmark using the AllowNaN and AllowArrayObjects compiler settings?

The following shows updates for the TypeBoxObjectSimple benchmark.

import { Type } from "@sinclair/typebox";
import { TypeCompiler } from "@sinclair/typebox/compiler";
import { TypeSystem } from '@sinclair/typebox/system'; // added

const Point3D = Type.Object({
    x: Type.Number(),
    y: Type.Number(),
    z: Type.Number(),
});

const Box3D = Type.Object({
    scale: Point3D,
    position: Point3D,
    rotate: Point3D,
    pivot: Point3D,
});

// ensure configuration before compilation
TypeSystem.AllowArrayObjects = true; // added
TypeSystem.AllowNaN = true; // added
export const __TypeBoxObjectSimple = Box3D;
export const TypeBoxObjectSimple = TypeCompiler.Compile(__TypeBoxObjectSimple);

This configuration aligns TypeBox with the assertion policies used by Typia by omitting the critical numeric (NaN and Infinity) and array-as-object assertion checks. It only applies to the is() benchmarks. Formal documentation for these policy overrides can be found at the link below.

https://github.com/sinclairzx81/typebox/tree/literal#policies
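To make the two policies concrete, here is a minimal sketch in plain TypeScript of the checks they toggle. It assumes, per the policy documentation linked above, that the default number check rejects NaN and ±Infinity and the default object check rejects arrays. The helper names are hypothetical and this is not the code TypeBox actually generates.

```typescript
// Hypothetical helpers illustrating the policies; not TypeBox's generated code.

// Default policy (assumed): numbers must be finite, so NaN and ±Infinity fail.
function isNumberDefault(value: unknown): boolean {
  return typeof value === "number" && Number.isFinite(value);
}

// AllowNaN = true (assumed): any value of type "number" passes.
function isNumberAllowNaN(value: unknown): boolean {
  return typeof value === "number";
}

// Default policy (assumed): objects must be non-null and must not be arrays.
function isObjectDefault(value: unknown): boolean {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// AllowArrayObjects = true (assumed): arrays are accepted where objects are expected.
function isObjectAllowArray(value: unknown): boolean {
  return typeof value === "object" && value !== null;
}

console.log(isNumberDefault(NaN), isNumberAllowNaN(NaN)); // false true
console.log(isObjectDefault([1, 2]), isObjectAllowArray([1, 2])); // false true
```

The relaxed variants simply drop the Number.isFinite and Array.isArray calls, which is the work the policy overrides remove from each property check.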

Cheers

@sinclairzx81
Contributor Author

Also, could you investigate the performance degradation on the Object (Simple) benchmark in general? The image in your README shows TypeBox (TB) degrading by around 30%, but running locally I see TypeBox roughly 15% faster than Ajv.

┌────────────────────────────┬────────────┬──────────────┬──────────────┬──────────────┬──────────────┐
│          (index)           │ Iterations │  ValueCheck  │     Ajv      │ TypeCompiler │ Performance  │
├────────────────────────────┼────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Object_Box3D               │  1000000   │ '   1765 ms' │ '     61 ms' │ '     53 ms' │ '    1.15 x' │
└────────────────────────────┴────────────┴──────────────┴──────────────┴──────────────┴──────────────┘

If I disable the NaN and Array Object checks, I get the following.

┌────────────────────────────┬────────────┬──────────────┬──────────────┬──────────────┬──────────────┐
│          (index)           │ Iterations │  ValueCheck  │     Ajv      │ TypeCompiler │ Performance  │
├────────────────────────────┼────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Object_Box3D               │  1000000   │ '   2002 ms' │ '     60 ms' │ '     29 ms' │ '    2.07 x' │
└────────────────────────────┴────────────┴──────────────┴──────────────┴──────────────┴──────────────┘

The TB benchmarks do not split runs across separate Node processes, so the degradation may be the result of one validator breaking runtime optimizations. However, the dedicated benchmark system I put together late last year does split runs across distinct Node processes, and it shows results in line with the configured compiler. You can review these results at the link below (search for Object_Simple).

https://sinclairzx81.github.io/runtime-type-benchmarks/


I would like to understand why the Typia benchmarks report such low numbers for TypeBox for these relatively simple checks.

samchon self-assigned this Apr 7, 2023
samchon added the enhancement, good first issue, and documentation labels Apr 7, 2023
@samchon
Owner

samchon commented Apr 7, 2023

Now, you can optimize the Fastify wrapper provider (request DTO validation) the same way. It seems good.

@sinclairzx81
Contributor Author

sinclairzx81 commented Apr 7, 2023

> Now, you can optimize the Fastify wrapper provider (request DTO validation) the same way. It seems good.

I cannot configure the compiler in the Fastify Type Provider to ignore these checks by default, as it's unsafe to do so. However, users can configure the compiler to disable these checks through the TypeSystem module (as shown in the benchmark example).

Note that Fastify recently took TypeBox as a direct dependency in fastify-type-provider-typebox@3.0.0. If you install the latest major revision, it will pull down @sinclair/typebox@0.26.x, which includes additional policy overrides.
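The ordering in the benchmark example (configure, then compile) matters because a JIT compiler decides which checks to emit when it compiles. The sketch below assumes a hypothetical policy object and a hypothetical compileNumberCheck function to show the idea; it is not the TypeBox compiler, only an illustration of a policy being read once at compile time and baked into the generated function.

```typescript
// Hypothetical stand-in for a global policy module such as TypeSystem.
const policy = { allowNaN: false };

// Hypothetical JIT-style compiler: the policy is read once, while the
// source string is generated, and is baked into the returned function.
function compileNumberCheck(): (value: unknown) => boolean {
  const body = policy.allowNaN
    ? `return typeof value === "number";`
    : `return typeof value === "number" && Number.isFinite(value);`;
  return new Function("value", body) as (value: unknown) => boolean;
}

const strict = compileNumberCheck(); // compiled under the default policy
policy.allowNaN = true; // configured too late to affect `strict`
const relaxed = compileNumberCheck(); // compiled after configuration

console.log(strict(NaN), relaxed(NaN)); // false true
```

Under that assumption, an application opting in would need to set the policy before any of its schemas are compiled.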

@samchon
Owner

samchon commented Apr 7, 2023

As JSON.parse() would be called before request DTO validation, I think those configurations are safe.

Anyway, separating each benchmark feature into an independent Node process is not simple work for me.

Therefore, please wait for a while (maybe next week?).

@sinclairzx81
Contributor Author

sinclairzx81 commented Apr 7, 2023

> As JSON.parse() would be called before request DTO validation, I think those configurations are safe.

Message encoding is often configurable. The following codecs are typical in both HTTP and WebSocket usage, as these encodings support Uint8Array:

import * as msgpack from '@msgpack/msgpack'
{
    const encoded = msgpack.encode({ nan: NaN, infinity: Infinity })
    const decoded = msgpack.decode(encoded)
    console.log(decoded) // { nan: NaN, infinity: Infinity } - unsafe
}
import * as cbor from 'cbor'
{
    const encoded = cbor.encode({ nan: NaN, infinity: Infinity })
    const decoded = cbor.decode(encoded)
    console.log(decoded) // { nan: NaN, infinity: Infinity } - unsafe
}
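For contrast with the msgpack and CBOR examples, JSON has no representation for NaN or Infinity: JSON.stringify writes them as null, and JSON.parse rejects a bare NaN token. A short check in plain TypeScript:

```typescript
// JSON cannot carry non-finite numbers: stringify replaces them with null.
const encoded = JSON.stringify({ nan: NaN, infinity: Infinity });
console.log(encoded); // {"nan":null,"infinity":null}

const decoded = JSON.parse(encoded);
console.log(decoded); // { nan: null, infinity: null }

// A bare NaN token is not valid JSON, so parsing it throws a SyntaxError.
let rejected = false;
try {
  JSON.parse('{"nan": NaN}');
} catch (error) {
  rejected = error instanceof SyntaxError;
}
console.log(rejected); // true
```

This only covers the NaN policy; a JSON payload can still contain an array where an object is expected.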

> Anyway, separating each benchmark feature into an independent Node process is not simple work for me.

This is not necessary. I just want to see the TypeBox compiler aligned with the assertion policies used by Typia to get an accurate performance measurement. The goal is to compare JIT against AOT for equivalent assertion logic under the current benchmarking infrastructure.

@samchon
Copy link
Owner

samchon commented Apr 7, 2023

@sinclairzx81

I think TypeBox would be faster than Typia, because Typia has the extra cost of calling an internal function, as shown below:

(input: any): input is ObjectAlias => {
    const $io0 = (input: any): boolean =>
        (null === input.id || "string" === typeof input.id) &&
        "string" === typeof input.email &&
        "string" === typeof input.name &&
        (null === input.sex ||
            1 === input.sex ||
            2 === input.sex ||
            "male" === input.sex ||
            "female" === input.sex) &&
        (null === input.age ||
            ("number" === typeof input.age &&
                Number.isFinite(input.age))) &&
        (null === input.dead || "boolean" === typeof input.dead);
    return (
        Array.isArray(input) &&
        input.every(
            (elem: any) =>
                "object" === typeof elem && null !== elem && $io0(elem),
        )
    );
}
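For comparison, a hand-written sketch of the same check with the per-element predicate inlined into a plain loop, removing the every() callback and the $io0 call that the extra cost refers to. This is a hypothetical rewrite (the function name is illustrative), not code emitted by either Typia or TypeBox, and whether it is measurably faster is an assumption that would need benchmarking.

```typescript
// Hypothetical hand-inlined equivalent of the generated check above.
// Note: unlike every(), this loop visits holes in sparse arrays, so it is
// only equivalent for dense arrays.
function isObjectAliasInlined(input: any): boolean {
  if (!Array.isArray(input)) return false;
  for (let i = 0; i < input.length; i++) {
    const elem = input[i];
    if (
      !("object" === typeof elem && null !== elem) ||
      !(null === elem.id || "string" === typeof elem.id) ||
      "string" !== typeof elem.email ||
      "string" !== typeof elem.name ||
      !(
        null === elem.sex ||
        1 === elem.sex ||
        2 === elem.sex ||
        "male" === elem.sex ||
        "female" === elem.sex
      ) ||
      !(null === elem.age || ("number" === typeof elem.age && Number.isFinite(elem.age))) ||
      !(null === elem.dead || "boolean" === typeof elem.dead)
    ) {
      return false;
    }
  }
  return true;
}

const sample = [{ id: null, email: "a@b.c", name: "A", sex: "male", age: 30, dead: false }];
console.log(isObjectAliasInlined(sample)); // true
console.log(isObjectAliasInlined([{ ...sample[0], age: NaN }])); // false
```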

Anyway, while upgrading the benchmark program, I want to ask you something. Can you give me some advice?

The current is() benchmark program repeats the is() call 3 to 4 times per iteration. If I remove this repetition, Typia and TypeBox become 100,000x faster than class-validator. In the object (simple) case, if I remove the simple flag and let it repeat 3 times, Typia and TypeBox become 200,000x faster.

I added the repetition because I suspected over-fitting optimizations like return value caching. However, I'm not sure whether my assumption (return value caching) is right or not. As you know, such repeated function calls can add extra cost and therefore distort the benchmark measurement. Do you think such repetition is required, or is it better to remove it entirely?

Can you guide me on that?

// TO ANTICIPATE OVER-FITTING OPTIMIZATION
// BY typia AND TYPEBOX
const simple: boolean = category === "object (simple)";
const a: T = generator();
const b: T = generator();
const c: T = generator();
const d: T = generator();
const suite: benchmark.Suite = new benchmark.Suite();
for (const key of components) {
    const is = parameters[key];
    if (is === null) continue;
    const task = simple
        ? () => {
              is(a);
              is(b);
              is(c);
              is(d);
          }
        : () => {
              is(a);
              is(b);
              is(c);
          };

@sinclairzx81
Contributor Author

> I think TypeBox would be faster than Typia, because Typia has the extra cost of calling an internal function, as shown below:

I wouldn't expect the additional function call to have much impact on the results. But if it is impacting results, it's better to highlight that impact and optimize in subsequent revisions.

> I added the repetition because I suspected over-fitting optimizations like return value caching. However, I'm not sure whether my assumption (return value caching) is right or not. As you know, such repeated function calls can add extra cost and therefore distort the benchmark measurement. Do you think such repetition is required, or is it better to remove it entirely?

Compute benchmarks should be extremely simple and only measure the elapsed time it takes to complete N iterations.

function benchmark_run(iter: number, op: Function) {
  const start = performance.now()
  for(let i = 0; i < iter; i++) op()
  return performance.now() - start
}

There will be variability in the elapsed time between individual runs (due to V8 internals or other system tasks). To smooth this out, you can take the average across multiple runs to yield a more stable and accurate result.

function benchmark_average(runs: number, iter: number, op: Function) {
  const elapsed: number[] = []
  for(let i = 0; i < runs; i++) elapsed.push(benchmark_run(iter, op))
  return elapsed.reduce((acc, c) => acc + c, 0) / runs
}

The following shows the usage:

const average = benchmark_average(10, 10_000_000, () => {
    const [A, B] = [1, 2]
    const _ = A + B
})
console.log(average) // 10 runs, 10 million iterations per run

If running across distinct Node processes, benchmark_run would be executed inside each child process, and benchmark_average would be run from the host process.
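A minimal sketch of that split, assuming Node.js and its built-in child_process module: each run starts a fresh Node process that times the loop and prints the elapsed milliseconds, and the host averages the printed values. The child script, the BENCH_ITER environment variable, and the runInFreshProcess / averageAcrossProcesses names are hypothetical; a real harness would load the validator under test inside the child instead of the placeholder addition.

```typescript
import { spawnSync } from "node:child_process";

// Child-side script (plain JavaScript): the benchmark_run loop from above,
// with a placeholder workload standing in for the validator under test.
const childScript = `
  const perf = require("perf_hooks").performance;
  const iter = Number(process.env.BENCH_ITER);
  const start = perf.now();
  for (let i = 0; i < iter; i++) {
    const [A, B] = [1, 2];
    const _ = A + B;
  }
  process.stdout.write(String(perf.now() - start));
`;

// Runs one timed batch in a brand new Node process and returns elapsed ms.
function runInFreshProcess(iter: number): number {
  const result = spawnSync(process.execPath, ["-e", childScript], {
    encoding: "utf8",
    env: { ...process.env, BENCH_ITER: String(iter) },
  });
  if (result.status !== 0) throw new Error(result.stderr);
  return Number(result.stdout);
}

// Host-side average across runs, each run isolated in its own process.
function averageAcrossProcesses(runs: number, iter: number): number {
  let total = 0;
  for (let i = 0; i < runs; i++) total += runInFreshProcess(iter);
  return total / runs;
}

console.log(averageAcrossProcesses(3, 1_000_000));
```

Because every run starts from a cold process, no validator can inherit JIT state (or deoptimizations) left behind by another.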

@samchon
Owner

samchon commented Apr 8, 2023

@sinclairzx81 https://github.com/samchon/typia/tree/features/benchmark/benchmark/results/AMD%20Ryzen%207%206800HS%20with%20Radeon%20Graphics

Benchmark results after separating each measurement into an independent process.

Also, the TypeBox configuration has been changed.

Many categories have not been revived yet, as they are hard to migrate. They'll be revived someday.

@sinclairzx81
Contributor Author

That's interesting; Moltar's benchmark system also saw similar balancing improvements when it moved to distinct processes. The 20,000x delta is in line with the results I was seeing for Typia locally when investigating this last year. Contrasted with the comparative benchmarks, they seem to line up correctly.


Refer here for the static datasets used in the above benchmark, if it helps to resolve the failing TB and Ajv tests (these are probably best expressed as templates if randomizing in Typia).

samchon added a commit that referenced this issue Apr 8, 2023
@sinclairzx81
Contributor Author

sinclairzx81 commented Apr 8, 2023

Could you please also update the assert and validate logic to call Check() before calling Errors()?

https://github.com/samchon/typia/blob/features/benchmark/benchmark/programs/assert/typebox/createAssertTypeboxBenchmarkProgram.ts#L1-L13

import { TSchema } from "@sinclair/typebox";
import { TypeCheck } from "@sinclair/typebox/compiler";

import { createAssertBenchmarkProgram } from "../createAssertBenchmarkProgram";

export const createAssertTypeboxBenchmarkProgram = <S extends TSchema>(
    schema: TypeCheck<S>,
) =>
    createAssertBenchmarkProgram((input) => {
        if (schema.Check(input)) return input; // added
        const first = schema.Errors(input).First();
        if (first) throw first;
        return input;
    });

This is documented on the TypeBox project here with the following description.

> Use the Errors(...) function to produce diagnostic errors for a value. The Errors(...) function will return an iterator that if enumerated; will perform an exhaustive check across the entire value and yield any error found. For performance, this function should only be called after failed Check(...). Applications may also choose to yield only the first value to avoid exhaustive error generation.
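Yielding only the first error pays off because the iterator is lazy. A minimal sketch with a generator, using a hypothetical errors function over a flat record of numbers (TypeBox's own Errors(...) returns its own iterator type with a First() method, which this sketch does not reproduce): taking the first result stops after the first failing field, while exhaustive enumeration visits every field.

```typescript
interface ValueError {
  path: string;
  message: string;
}

let visited = 0; // counts how many fields the generator has inspected

// Hypothetical lazy error generator: yields one error per non-finite field.
function* errors(value: Record<string, unknown>): Generator<ValueError> {
  for (const key of Object.keys(value)) {
    visited++;
    const field = value[key];
    if (typeof field !== "number" || !Number.isFinite(field)) {
      yield { path: `/${key}`, message: "Expected finite number" };
    }
  }
}

// Returns only the first error, leaving the rest of the value unchecked.
function first<T>(iterator: Iterator<T>): T | undefined {
  const next = iterator.next();
  return next.done ? undefined : next.value;
}

const bad = { x: "a", y: "b", z: "c" };

visited = 0;
console.log(first(errors(bad))?.path, visited); // /x 1

visited = 0;
console.log([...errors(bad)].length, visited); // 3 3
```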

Remember, TypeBox does not have a built-in Assert function. This was previously discussed in #268 (comment), where users are expected to implement Assert themselves efficiently. As TypeBox will never implement an Assert function, there are two possible paths to take.

Options:

  • A) Implement the Check() before Errors() as per documentation.
  • B) Remove TypeBox from Assert and Validate benchmarks

My preference would be option (A), as it compares Check()-before-Errors() performance against inline Assert() (as implemented in Typia and Ajv). If the benchmark data varies (50% correct, 50% incorrect), I'd expect TypeBox to report about 50% of Typia's performance due to the dynamic checks performed during diagnostic gathering.

samchon added a commit that referenced this issue Apr 8, 2023
samchon closed this as completed in b7d2644 Apr 8, 2023
samchon added a commit that referenced this issue Apr 8, 2023
Prepare #574 - benchmark in each process
@sinclairzx81
Contributor Author

@samchon Were you going to implement either option A or B?

@samchon
Owner

samchon commented Apr 8, 2023

I will go with option A, maybe tomorrow.

@sinclairzx81
Contributor Author

Cool :)

samchon added a commit that referenced this issue Apr 9, 2023
samchon added a commit that referenced this issue Apr 9, 2023
samchon added a commit that referenced this issue Apr 9, 2023
samchon added a commit that referenced this issue Apr 9, 2023
Complement #574 and #578 - new benchmark program
@samchon
Owner

samchon commented Apr 9, 2023

@sinclairzx81 Changed as you requested, but I'm not sure whether this is right.

Typia can do it in only one line, but TypeBox needs...

@sinclairzx81
Contributor Author

> Changed as you requested, but I'm not sure whether this is right. ...

@samchon Thanks. And yes, it's correct as far as implementing a standard Assert in TypeBox goes. I'm actually very curious to compare "Check then Report" (TypeBox) against inline "Check & Report" (Typia, Ajv). I'm hoping to see good variance between correct and incorrect data there.

> Typia can do it in only one line, but TypeBox needs...

It's actually more flexible to keep Check and Errors separate. Instead of TypeBox providing a one-line Assert, users can implement it themselves in two lines. The intent is to reduce the API surface area (two functions instead of three) and to make testing and optimization work easier.
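A minimal sketch of such a user-implemented Assert, written against a hypothetical Checker interface that mirrors the shape of a compiled checker's Check(...) and Errors(...) (simplified: Errors here is any iterable of errors rather than TypeBox's own iterator type). The hand-written numberChecker and the assertType name are also hypothetical; in practice the checker would come from the compiler.

```typescript
interface CheckError {
  path: string;
  message: string;
}

// Hypothetical shape mirroring a compiled checker with Check and Errors.
interface Checker<T> {
  Check(value: unknown): value is T;
  Errors(value: unknown): Iterable<CheckError>;
}

// The fast Check runs first; Errors is only enumerated after a failure,
// and only its first error is taken.
function assertType<T>(checker: Checker<T>, value: unknown): T {
  if (checker.Check(value)) return value;
  for (const error of checker.Errors(value)) throw error;
  throw new Error("Check failed but Errors yielded nothing");
}

// Hypothetical hand-written checker for finite numbers.
const numberChecker: Checker<number> = {
  Check: (value: unknown): value is number =>
    typeof value === "number" && Number.isFinite(value),
  *Errors(value: unknown): Generator<CheckError> {
    if (typeof value !== "number" || !Number.isFinite(value)) {
      yield { path: "", message: "Expected finite number" };
    }
  },
};

console.log(assertType(numberChecker, 42)); // 42
try {
  assertType(numberChecker, NaN);
} catch (error) {
  console.log((error as CheckError).message); // Expected finite number
}
```

The success path costs exactly one Check call, so diagnostics are only paid for on invalid input.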

It's just a design principle TypeBox tries to follow.
Thanks again!
S
