More representative code for benchmarks #1884

overlookmotel · 2024-01-03T18:23:22Z

The current benchmarks for the parser are based on 3 inputs:

TypeScript compiler's checker
PDF.js
Ant Design

The common factor is that they're all library code.

I wonder if it'd be helpful to add a benchmark which is more representative of application code? (which I imagine is more commonly what OXC is run on)

I'd guess that the composition of app code might be quite different from library code. For example, React app code would (obviously) contain JSX syntax. It'd also likely contain a lot more string literals for text that gets displayed to the user in the UI, whereas library code is more likely to be pure logic.

Is there some source for a pseudo-typical React app which can be downloaded from GitHub or somewhere? I guess the ideal would be a whole app bundled into a single file, but not minified or transpiled - but why would someone publish such a thing?

Boshen · 2024-01-04T04:34:53Z

We can manually stitch together a bunch of .tsx files. For example, concatenate all files from ant-design and upload the result to a gist for consumption.

overlookmotel · 2024-01-04T13:57:07Z

ant-design would be good, but it'd still be library code.

What I was thinking is that a typical website built with e.g. React would contain both rendering logic ("if product is in basket, display tick component") and also content ("These shoes are made from the finest Italian leather blah blah blah").

Library code is generic so will tend to just have the logic part but not much content.

So what I had in mind is to find something like a demo e-commerce React site which would have a mix of product pages which pull data from an API and also some content-heavy pages e.g. landing page, "about us". If it's a demo, probably the original source (complete with untranspiled JSX) would be available.

That wouldn't be totally representative of typical application code (what's typical?) but would be a closer approximation than pure library code.

Does that make sense? Can you think of such a demo site off the top of your head?

camc314 · 2024-01-04T14:17:13Z

cal.com is probably a good one >> https://github.com/calcom/cal.com

Boshen · 2024-01-05T02:59:02Z

> cal.com is probably a good one >> https://github.com/calcom/cal.com

Nice. For a benchmark, we need to avoid all file I/O. Who wants to write a script that walks the repo, extracts all the relevant files, and concatenates them together?

We can set up a repo for this, so our benchmark can get the file from raw.github.com.
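
A minimal sketch of such a script, assuming a local checkout of the repo and Node.js. The function names `collectSourceFiles` and `concatSources`, the skipped directory names, and the separator comment are all illustrative choices, not anything agreed in this thread:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Recursively collect files under `root` whose extension is in `exts`,
// skipping directories that should never end up in a benchmark input.
// (The skip list is an assumption; adjust it for the repo being scraped.)
function collectSourceFiles(root: string, exts: string[] = [".tsx"]): string[] {
  const skip = new Set(["node_modules", ".git", "dist", "build"]);
  const found: string[] = [];
  const walk = (dir: string): void => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        if (!skip.has(entry.name)) walk(full);
      } else if (entry.isFile() && exts.includes(path.extname(entry.name))) {
        found.push(full);
      }
    }
  };
  walk(root);
  // Sort so the concatenated output is stable across runs.
  return found.sort();
}

// Join the files into one string, with a comment naming each source file.
// Nothing is rewritten, so duplicate top-level bindings are left as they are.
function concatSources(root: string, files: string[]): string {
  return files
    .map((file) => {
      const rel = path.relative(root, file).split(path.sep).join("/");
      return `// ---- ${rel} ----\n${fs.readFileSync(file, "utf8")}`;
    })
    .join("\n");
}
```

Usage would be something like `fs.writeFileSync("cal.com.tsx", concatSources(dir, collectSourceFiles(dir)))`, where `dir` is the path to the checkout.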

overlookmotel · 2024-01-05T12:00:22Z

Cool. That is a good example app.

When concatenating, do we need to make sure the result is valid JS? e.g. if 2 files both include import React from 'react'; then concatenating them would produce:

import React from 'react';
// blah blah
import React from 'react';
// blah blah

That'd be a syntax error (redefinition of var React). I'm unclear at what point OXC checks for such syntax errors (not in the parser, as far as I can see).
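
To get a feel for how often this would happen, here is a rough sketch that lists default-import names appearing in more than one file. It is regex-based, so it is only an approximation (it ignores named and namespace imports, and is not how OXC or any parser would detect the clash); `duplicateDefaultImports` is a made-up helper name:

```typescript
// Return default-import names (e.g. `React` in `import React from 'react'`)
// that occur in more than one of the given source texts.
function duplicateDefaultImports(sources: string[]): string[] {
  const fileCounts = new Map<string, number>();
  for (const source of sources) {
    // Matches `import Name from ...` and `import Name, { ... } from ...`.
    const pattern = /^\s*import\s+([A-Za-z_$][\w$]*)\s*(?:,|\s+from\b)/gm;
    const seenInFile = new Set<string>();
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(source)) !== null) {
      seenInFile.add(match[1]);
    }
    // Count each name once per file, however often it appears in that file.
    seenInFile.forEach((name) => {
      fileCounts.set(name, (fileCounts.get(name) ?? 0) + 1);
    });
  }
  return Array.from(fileCounts.entries())
    .filter(([, count]) => count > 1)
    .map(([name]) => name)
    .sort();
}
```

Any name this returns would end up declared twice at the top level of the concatenated file.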

camc314 · 2024-01-05T13:34:14Z

use declare module '<random id>' ?

we wouldn't be able to do the import plugin. but i think it's ok
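
A sketch of what that wrapping could look like, assuming the file contents are already in memory. `SourceFile`, `wrapInModules`, and the `bench_<index>_<path>` module IDs (used here in place of a random ID so the output is reproducible) are hypothetical names:

```typescript
interface SourceFile {
  path: string;
  text: string;
}

// Wrap each file's text in its own `declare module` block, so that the
// top-level bindings of different files no longer share a single scope.
function wrapInModules(files: SourceFile[]): string {
  return files
    .map((file, index) => {
      // JSON.stringify produces a correctly quoted and escaped string literal.
      const id = JSON.stringify(`bench_${index}_${file.path}`);
      return `declare module ${id} {\n${file.text}\n}`;
    })
    .join("\n");
}
```

This only aims at producing input the parser can get through. A type checker may still object to ordinary statements inside an ambient module, and as noted above, rules that depend on real module imports would not see them.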

drwpow · 2024-01-19T19:30:10Z

If you needed another example beyond cal, Radix’s website is a great example of a React app. Not too big, and contains lots of practical, generic React code (a lot of their code is good reference, actually)

overlookmotel · 2024-01-20T04:23:41Z

Thanks. That's a nice example of a content-heavy site, to complement cal.com which is more app logic.

overlookmotel · 2024-01-21T15:31:57Z

OK, I'm going to get on with doing this. I can see 2 ways to approach this:

1. The easy way

Just concatenate the files without alteration. The result will be invalid JS, as it'll have duplicate bindings in the top-level scope.

This would be fine for benchmarking the parser, but would not be suitable for oxc_semantic, oxc_linter, or anything else which depends on semantic analysis.

If going this route, I would propose writing the scraper as a Node.js script, as it'll be faster than coding it in Rust.

2. The better way

Even if concatenated sources were "legalised" by renaming top-level bindings, I imagine these huge concatenated files would not be an ideal benchmark for oxc_semantic, or anything else which depends on it. Presumably, having a very large number of top-level bindings would slow down searching for the binding for a ReferenceIdentifier, due to increased rate of hash table collisions.

It'd be more representative to run semantic/linter/etc on a series of smaller files.

NB: The existing benchmarks for semantic, linter etc probably already have this problem, as they're using pre-bundled sources. It'd be much more typical in the "real world" for linter to be run on smaller files prior to bundling. The current benchmarks probably are underestimating OXC's real-world performance a bit.

So... alter the benchmark framework to handle running a task on multiple smaller files.

TestFiles, I guess, would download all the files via GitHub's API on each run. That should be reasonably fast on CI, as data transfer is all within GitHub's internal network. I doubt there's a risk of hitting API rate limits, but if that does turn out to be a problem, maybe CI could be configured to cache the directory of downloaded files. For local dev, downloaded files are cached already.
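
As an illustration of the file-listing step (in TypeScript rather than inside the Rust `TestFiles` code), here is a sketch that turns the `tree` array from GitHub's "Get a tree" REST endpoint (`GET /repos/{owner}/{repo}/git/trees/{tree_sha}?recursive=1`) into raw download URLs. The function name `rawUrlsForSources`, the default extension list, and the `main` ref in the usage below are assumptions:

```typescript
// One entry of the `tree` array in the API response (only the fields used here).
interface TreeEntry {
  path: string;
  type: string; // "blob" for files, "tree" for directories
}

// Build raw.githubusercontent.com URLs for every source file in the tree.
function rawUrlsForSources(
  owner: string,
  repo: string,
  ref: string,
  tree: TreeEntry[],
  exts: string[] = [".ts", ".tsx"],
): string[] {
  return tree
    .filter((entry) => entry.type === "blob")
    .filter((entry) => exts.some((ext) => entry.path.endsWith(ext)))
    .map((entry) => {
      // Encode each segment separately so `/` separators survive but
      // characters such as `[` and `]` in file names are escaped.
      const encodedPath = entry.path.split("/").map(encodeURIComponent).join("/");
      return `https://raw.githubusercontent.com/${owner}/${repo}/${ref}/${encodedPath}`;
    });
}
```

For example, `rawUrlsForSources("calcom", "cal.com", "main", response.tree)` would give the list of files to fetch, each of which could then be run through the benchmark task as a separate small input.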

Questions

Which way should we go?

I actually don't think doing it "properly" (option 2) would be so hard to implement, so my preference is to do that.

Boshen · 2024-02-04T10:50:11Z

I'm going to implement "The easy way" today because we need this urgently.


camc314 mentioned this issue Jan 6, 2024

feat(linter): no-unknown-property for eslint-plugin-react #1875

Merged

Boshen added a commit that referenced this issue Feb 4, 2024

chore(benchmark): add cal.com.tsx as a realword tsx example

adaffe3

closes #1884

Boshen mentioned this issue Feb 4, 2024

chore(benchmark): add cal.com.tsx as a realword tsx example #2297

Merged

Boshen closed this as completed in #2297 Feb 4, 2024

Boshen added a commit that referenced this issue Feb 4, 2024

chore(benchmark): add cal.com.tsx as a realword tsx example (#2297)

018674c

closes #1884

IWANABETHATGUY pushed a commit to IWANABETHATGUY/oxc that referenced this issue May 29, 2024

chore(benchmark): add cal.com.tsx as a realword tsx example (oxc-proj…

8024eff

…ect#2297) closes oxc-project#1884
