Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More representative code for benchmarks #1884

Closed
overlookmotel opened this issue Jan 3, 2024 · 10 comments · Fixed by #2297
Closed

More representative code for benchmarks #1884

overlookmotel opened this issue Jan 3, 2024 · 10 comments · Fixed by #2297

Comments

@overlookmotel
Copy link
Contributor

overlookmotel commented Jan 3, 2024

The current benchmarks for parser are based on 3 inputs:

  • TypeScript compiler's checker
  • PDF.js
  • Ant Design

The common factor is that they're all library code.

I wonder if it'd be helpful to add a benchmark which is more representative of application code? (which I imagine is more commonly what OXC is run on)

I'd guess that the composition of app code might be quite different from library code. For example, React app code would (obviously) contain JSX syntax. It'd also likely contain a lot more string literals for text that gets displayed to the user in the UI, where as library code is more likely to be pure logic.

Is there some source for a pseudo-typical React app which can be downloaded from Github or somewhere? I guess the ideal would be a whole app bundled into a single file, but not minified or transpiled - but why would someone publish such a thing?

@Boshen
Copy link
Member

Boshen commented Jan 4, 2024

We can manually stitch together a bunch of .tsx files. For example concatenate all files from ant-design and upload load it to a gist for consumption.

@overlookmotel
Copy link
Contributor Author

ant-design would be good, but it'd still be library code.

What I was thinking is that a typical website built with e.g. React would contain both rendering logic ("if product is in basket, display tick component") and also content ("These shoes are made from the finest Italian leather blah blah blah").

Library code is generic so will tend to just have the logic part but not much content.

So what I had in mind is to find something like a demo e-commerce React site which would have a mix of product pages which pull data from an API and also some content-heavy pages e.g. landing page, "about us". If it's a demo, probably the original source (complete with untranspiled JSX) would be available.

That wouldn't be totally representative of typical application code (what's typical?) but would be a closer approximation than pure library code.

Does that make sense? Can you think of such a demo site off top of your head?

@camc314
Copy link
Contributor

camc314 commented Jan 4, 2024

cal.com is probably a good one >> https://github.com/calcom/cal.com

@Boshen
Copy link
Member

Boshen commented Jan 5, 2024

cal.com is probably a good one >> https://github.com/calcom/cal.com

Nice. For a benchmark, we need to avoid all file IOs, who want's to write a script where we walk the repo, extract all the relevant files, and concat them together?

We can setup a repo for this, so our benchmark can get the file from raw.github.com.

@overlookmotel
Copy link
Contributor Author

Cool. That is a good example app.

When concatenating, do we need to make sure the result is valid JS? e.g. if 2 files both include import React from 'react'; then concatenating them would produce:

import React from 'react';
// blah blah
import React from 'react';
// blah blah

That'd be a syntax error (redefinition of var React). I'm unclear at what point OXC checks for such syntax errors (not in the parser, as far as I can see).

@camc314
Copy link
Contributor

camc314 commented Jan 5, 2024

use declare module '<random id>' ?

we wouldn't be able to do the import plugin. but i think it's ok

@drwpow
Copy link

drwpow commented Jan 19, 2024

If you needed another example beyond cal, Radix’s website is a great example of a React app. Not too big, and contains lots of practical, generic React code (a lot of their code is good reference, actually)

@overlookmotel
Copy link
Contributor Author

Thanks. That's a nice example of a content-heavy site, to complement cal.com which is more app logic.

@overlookmotel
Copy link
Contributor Author

overlookmotel commented Jan 21, 2024

OK, I'm going to get on with doing this. I can see 2 ways to approach this:

1. The easy way

Just concatenate the files without alteration. So they'll be invalid JS, as will have duplicate bindings in top-level scope.

This would be fine for benchmarking the parser, but would not be suitable for oxc_semantic, oxc_linter, or anything else which depends on semantic analysis.

If going this route, I would propose writing the scraper as a NodeJS script, as it'll be faster than coding it in Rust.

2. The better way

Even if concatenated sources were "legalised" by renaming top-level bindings, I imagine these huge concatenated files would not be an ideal benchmark for oxc_semantic, or anything else which depends on it. Presumably, having a very large number of top-level bindings would slow down searching for the binding for a ReferenceIdentifier, due to increased rate of hash table collisions.

It'd be more representative to run semantic/linter/etc on a series of smaller files.

NB: The existing benchmarks for semantic, linter etc probably already have this problem, as they're using pre-bundled sources. It'd be much more typical in the "real world" for linter to be run on smaller files prior to bundling. The current benchmarks probably are underestimating OXC's real-world performance a bit.

So... alter the benchmark framework to handle running a task on multiple smaller files.

TestFiles I guess would download all the files via GitHub's API on each run. Should be reasonably fast on CI, as data transfer is all within Github's internal network. I doubt there's a risk of hitting API rate limits, but if that does turn out to be a problem, maybe CI could be configured to cache the directory of downloaded files. For local dev, downloaded files are cached already.

Questions

Which way should we go?

I actually don't think doing it "properly" (option 2) would be so hard to implement, so my preference is to do that.

@Boshen
Copy link
Member

Boshen commented Feb 4, 2024

I'm going to implement "The easy way" today because we need this urgently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants