perf(codegen): reduce size of LineOffsetTable #4643

Merged

@overlookmotel overlookmotel commented Aug 5, 2024

`LineOffsetTable` records mappings from byte offset to line and column numbers (with the column number counted in UTF-16 code units).

Most lines contain only ASCII characters, and for these lines the number of bytes from the start of the line equals the UTF-16 column number, so no column lookup table is required.
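
To illustrate the correspondence described above, here is a minimal sketch. The helper names `utf16_column` and `columns_match_bytes` are hypothetical and are not taken from the oxc codebase. It shows that the byte offset and the UTF-16 column agree on an ASCII-only line and diverge once a multi-byte character appears.

```rust
/// UTF-16 column of the character that starts at `byte_offset` within `line`.
/// Hypothetical helper for illustration only.
/// Panics if `byte_offset` is not on a character boundary.
fn utf16_column(line: &str, byte_offset: usize) -> usize {
    // Count the UTF-16 code units in everything before the offset.
    line[..byte_offset].encode_utf16().count()
}

/// True when every byte offset in `line` equals its UTF-16 column,
/// which holds exactly when the line is pure ASCII.
fn columns_match_bytes(line: &str) -> bool {
    line.is_ascii()
}
```

For `"let x = 1;"` the column at byte offset 4 is 4. For `"let é = 1;"` the space after `é` sits at byte offset 6 but UTF-16 column 5, because `é` takes two bytes in UTF-8 and one code unit in UTF-16.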

Reduce the data stored for each line from 32 bytes to 8 bytes by storing the column offset lookup tables for the rare lines which do contain non-ASCII characters separately.

Additionally, store column lookup tables as a `Box<[u32]>` instead of `Vec<u32>` to reduce the size of `ColumnOffsets` by 8 bytes.
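
The layout change can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual oxc implementation: the field names, the `OldLine` struct, the use of `NonZeroU32` as a side-table index, and the `\n`-only line splitting are all assumptions made for the example. Only `LineOffsetTable`, `ColumnOffsets`, `Box<[u32]>` and `Vec<u32>` are named in the description. The sizes quoted in the comments assume a 64-bit target.

```rust
use std::num::NonZeroU32;

/// Assumed "before" layout: each line carries an optional table inline (32 bytes on 64-bit).
#[allow(dead_code)]
struct OldLine {
    byte_offset_to_start_of_line: u32,
    byte_offset_to_first: u32,
    columns: Option<Vec<u32>>,
}

/// Assumed "after" layout: start offset plus an optional side-table index (8 bytes).
struct Line {
    byte_offset_to_start_of_line: u32,
    /// 1-based index into `column_offsets`; `None` for pure-ASCII lines.
    column_offsets_id: Option<NonZeroU32>,
}

/// Column table for one non-ASCII line. `Box<[u32]>` is pointer + length,
/// with no capacity field, so it is one word smaller than `Vec<u32>`.
struct ColumnOffsets {
    byte_offset_to_first: u32,
    columns: Box<[u32]>,
}

struct LineOffsetTable {
    lines: Vec<Line>,
    column_offsets: Vec<ColumnOffsets>,
}

impl LineOffsetTable {
    /// Builds the table. Simplification: only `\n` is treated as a line break.
    fn new(source: &str) -> Self {
        let mut lines = Vec::new();
        let mut column_offsets = Vec::new();
        let mut line_start = 0usize;
        for line in source.split('\n') {
            let id = line.bytes().position(|b| !b.is_ascii()).map(|first| {
                let mut columns = Vec::with_capacity(line.len() - first + 1);
                // In the ASCII prefix, bytes and UTF-16 code units agree.
                let mut column = first as u32;
                for ch in line[first..].chars() {
                    // Every byte of a char maps to the column where the char starts.
                    for _ in 0..ch.len_utf8() {
                        columns.push(column);
                    }
                    column += ch.len_utf16() as u32;
                }
                columns.push(column); // entry for the end-of-line position
                column_offsets.push(ColumnOffsets {
                    byte_offset_to_first: first as u32,
                    columns: columns.into_boxed_slice(),
                });
                NonZeroU32::new(column_offsets.len() as u32).unwrap()
            });
            lines.push(Line { byte_offset_to_start_of_line: line_start as u32, column_offsets_id: id });
            line_start += line.len() + 1;
        }
        Self { lines, column_offsets }
    }

    /// Returns the zero-based (line, UTF-16 column) for a byte offset in the source.
    fn line_and_column(&self, byte_offset: u32) -> (u32, u32) {
        // Index of the last line whose start is <= byte_offset.
        let idx = self.lines.partition_point(|l| l.byte_offset_to_start_of_line <= byte_offset) - 1;
        let line = &self.lines[idx];
        let in_line = byte_offset - line.byte_offset_to_start_of_line;
        let column = match line.column_offsets_id {
            None => in_line, // pure-ASCII line: no table needed
            Some(id) => {
                let offsets = &self.column_offsets[id.get() as usize - 1];
                if in_line < offsets.byte_offset_to_first {
                    in_line
                } else {
                    offsets.columns[(in_line - offsets.byte_offset_to_first) as usize]
                }
            }
        };
        (idx as u32, column)
    }
}
```

In this sketch the per-line record stays at 8 bytes however many lines contain non-ASCII characters, and only those lines pay for a `ColumnOffsets` entry in the side table.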

codspeed-hq bot commented Aug 5, 2024

CodSpeed Performance Report

Merging #4643 will improve performance by 4.02%

Comparing 08-05-perf_codegen_reduce_size_of_lineoffsettable_ (8dd76e4) with main (9f8f299)

Summary

⚡ 2 improvements
✅ 30 untouched benchmarks

Benchmarks breakdown

| Benchmark | `main` | `08-05-perf_codegen_reduce_size_of_lineoffsettable_` | Change |
|---|---|---|---|
| `codegen_sourcemap[checker.ts]` | 75.6 ms | 72.9 ms | +3.7% |
| `sourcemap[cal.com.tsx]` | 67.2 ms | 64.7 ms | +4.02% |

@Boshen Boshen force-pushed the 08-05-perf_codegen_u32_indexes_in_lineoffsettable_for_source_maps branch from f36184f to b8e6753 Compare August 5, 2024 02:28
@Boshen Boshen changed the base branch from 08-05-perf_codegen_u32_indexes_in_lineoffsettable_for_source_maps to main August 5, 2024 02:32
@Boshen Boshen force-pushed the 08-05-perf_codegen_reduce_size_of_lineoffsettable_ branch from 55dee64 to fcfaaa6 Compare August 5, 2024 02:32
@overlookmotel overlookmotel force-pushed the 08-05-perf_codegen_reduce_size_of_lineoffsettable_ branch from fcfaaa6 to 150ccb0 Compare August 5, 2024 03:02
@overlookmotel overlookmotel force-pushed the 08-05-perf_codegen_reduce_size_of_lineoffsettable_ branch from 150ccb0 to f961b5c Compare August 5, 2024 23:49
@overlookmotel overlookmotel marked this pull request as ready for review August 5, 2024 23:58
@Boshen Boshen requested a review from underfin August 6, 2024 01:03
@Boshen Boshen added the 0-merge Merge with Graphite Merge Queue label Aug 6, 2024
graphite-app bot commented Aug 6, 2024

Merge activity

  • Aug 5, 9:04 PM EDT: The merge label 'merge' was detected. This PR will be added to the Graphite merge queue once it meets the requirements.
  • Aug 5, 9:04 PM EDT: Boshen added this pull request to the Graphite merge queue.
  • Aug 5, 9:13 PM EDT: Boshen merged this pull request with the Graphite merge queue.

@Boshen Boshen force-pushed the 08-05-perf_codegen_reduce_size_of_lineoffsettable_ branch from f961b5c to 8dd76e4 Compare August 6, 2024 01:08
@graphite-app graphite-app bot merged commit 8dd76e4 into main Aug 6, 2024
24 checks passed
@graphite-app graphite-app bot deleted the 08-05-perf_codegen_reduce_size_of_lineoffsettable_ branch August 6, 2024 01:13
@oxc-bot oxc-bot mentioned this pull request Aug 6, 2024
Boshen added a commit that referenced this pull request Aug 6, 2024
## [0.23.1] - 2024-08-06

### Features

- fd2d9da ast: Improve `AstKind::debug_name` (#4553) (DonIsaac)
- b3b7028 ast: Implement missing Clone, Hash, and Display traits for
literals (#4552) (DonIsaac)
- 54047e0 ast: `GetSpanMut` trait (#4609) (overlookmotel)
- eae401c ast, ast_macros: Apply stable repr to all `#[ast]` enums
(#4373) (rzvxa)
- 0c52c0d ast_codegen: Add alignment and size data to the schema.
(#4615) (rzvxa)
- 229a0e9 minifier: Implement dot define for member expressions (#3959)
(camc314)
- e42ac3a sourcemap: Add `ConcatSourceMapBuilder::from_sourcemaps`
(#4639) (overlookmotel)

### Bug Fixes

- 4a56954 codegen: Print raw if value is number is Infinity (#4676)
(Boshen)
- bf48c7f minifier: Fix `keep_var` keeping vars from arrow functions
(#4680) (Boshen)
- 9be29af minifier: Temporarily fix shadowed `undefined` variable
(#4678) (Boshen)
- e8b662a minifier: Various fixes to pass minifier conformance (#4667)
(Boshen)
- a40a217 parser: Parse `assert` keyword in `TSImportAttributes` (#4610)
(Boshen)
- 03c643a semantic: Incorrect `scope_id` for catch parameter symbols
(#4659) (Dunqing)
- 6c612d1 semantic/jsdoc: Handle whitespace absence (#4642) (leaysgur)
- 0d2c41a semantic/jsdoc: Panic on parsing `type_name_comment`. (#4632)
(rzvxa)
- 9f8f299 syntax: Prevent creating invalid u32 IDs (#4675)
(overlookmotel)
- 5327acd transformer/react: The `require` IdentifierReference does not
have a `reference_id` (#4658) (Dunqing)
- 3987665 transformer/typescript: Incorrect enum-related
`symbol_id`/`reference_id` (#4660) (Dunqing)
- 4efd54b transformer/typescript: Incorrect `SymbolFlags` for jsx
imports (#4549) (Dunqing)

### Performance

- 8dd76e4 codegen: Reduce size of `LineOffsetTable` (#4643)
(overlookmotel)
- b8e6753 codegen: `u32` indexes in `LineOffsetTable` for source maps
(#4641) (overlookmotel)
- 6ff200d linter: Change react rules and utils to use `Cow` and
`CompactStr` instead of `String` (#4603) (DonIsaac)
- 0f5e982 minifier: Only visit arrow expression after dropping
`console.log` (#4677) (Boshen)
- ff43dff sourcemap: Speed up VLQ encoding (#4633) (overlookmotel)
- a330773 sourcemap: Reduce string copying in `ConcatSourceMapBuilder`
(#4638) (overlookmotel)
- 372316b sourcemap: `ConcatSourceMapBuilder` extend `source_contents`
in separate loop (#4634) (overlookmotel)
- c7f1d48 sourcemap: Keep local copy of previous token in VLQ encode
(#4596) (overlookmotel)
- 590d795 sourcemap: Shorten main loop encoding VLQ (#4586)
(overlookmotel)

### Documentation

- c69ada4 ast: Improve AST node documentation (#4051) (Rintaro Itokawa)

### Refactor

- ba70001 ast: Put `assert_layouts.rs` behind `debug_assertions` (#4621)
(rzvxa)
- 3f53b6f ast: Make AST structs `repr(C)`. (#4614) (rzvxa)
- 452e0ee ast: Remove defunct `visit_as` + `visit_args` attrs from
`#[ast]` macro (#4599) (overlookmotel)
- e78cba6 minifier: Ast passes infrastructure (#4625) (Boshen)
- d25dea7 parser: Use `ast_builder` in more places. (#4612) (rzvxa)
- 09d9822 semantic: Simplify setting scope flags (#4674) (overlookmotel)
- 6e453db semantic: Simplify inherit scope flags from parent scope
(#4664) (Dunqing)
- 9b51e04 Overhaul napi transformer package (#4592)
(DonIsaac)

### Testing

- 49d5196 ast: Fix `assert_layouts.rs` offset tests on 32bit platforms.
(#4620) (rzvxa)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>
@oxc-bot oxc-bot mentioned this pull request Aug 8, 2024
Boshen added a commit that referenced this pull request Aug 8, 2024
## [0.24.0] - 2024-08-08

- 75f2207 traverse: [**BREAKING**] Replace `find_scope` with
`ancestor_scopes` returning iterator (#4693) (overlookmotel)

- 506709f traverse: [**BREAKING**] Replace `find_ancestor` with
`ancestors` returning iterator (#4692) (overlookmotel)

### Features

- 23b0040 allocator: Introduce `CloneIn` trait. (#4726) (rzvxa)
- 51c1ca0 ast: Derive `CloneIn` for AST types, using `generate_derive`.
(#4732) (rzvxa)
- e12bd1e ast: Allow conversion from TSAccessibility into &'static str
(#4711) (DonIsaac)
- fd2d9da ast: Improve `AstKind::debug_name` (#4553) (DonIsaac)
- b3b7028 ast: Implement missing Clone, Hash, and Display traits for
literals (#4552) (DonIsaac)
- 54047e0 ast: `GetSpanMut` trait (#4609) (overlookmotel)
- eae401c ast, ast_macros: Apply stable repr to all `#[ast]` enums
(#4373) (rzvxa)
- ec0b4cb ast_codegen: Add `derive_clone_in` generator. (#4731) (rzvxa)
- 2e91ad6 ast_codegen: Support for `generate_derive` marker. (#4728)
(rzvxa)
- 82e2f6b ast_codegen: Process AST-related `syntax` types. (#4694)
(rzvxa)
- 0c52c0d ast_codegen: Add alignment and size data to the schema.
(#4615) (rzvxa)
- 07607d3 ast_codegen, span: Process `Span` through ast_codegen (#4703)
(overlookmotel)
- 125c5fd ast_codegen, span: Process `SourceType` through ast_codegen.
(#4696) (rzvxa)
- eaddc8f linter: Add fixer for eslint/func_names (#4714) (DonIsaac)
- 229a0e9 minifier: Implement dot define for member expressions (#3959)
(camc314)
- 33f1312 semantic: Impl GetSpan for AstNode (#4717) (DonIsaac)
- e42ac3a sourcemap: Add `ConcatSourceMapBuilder::from_sourcemaps`
(#4639) (overlookmotel)
- 2e63618 span: Implement `CloneIn` for the AST-related items. (#4729)
(rzvxa)
- 6a36616 syntax: Derive `CloneIn` for the AST-related items. (#4730)
(rzvxa)

### Bug Fixes

- 4a56954 codegen: Print raw if value is number is Infinity (#4676)
(Boshen)
- 94d3c31 minifier: Avoid removing function declaration from `KeepVar`
(#4722) (Boshen)
- bf43148 minifier: Do not `remove_syntax` in dead_code_elimination
(Boshen)
- bf48c7f minifier: Fix `keep_var` keeping vars from arrow functions
(#4680) (Boshen)
- 9be29af minifier: Temporarily fix shadowed `undefined` variable
(#4678) (Boshen)
- e8b662a minifier: Various fixes to pass minifier conformance (#4667)
(Boshen)
- 01d85de napi/transform: Update napi files (Boshen)
- f290191 oxc_ast_macros: Fix `syn` lacking features to build (Boshen)
- a40a217 parser: Parse `assert` keyword in `TSImportAttributes` (#4610)
(Boshen)
- 03c643a semantic: Incorrect `scope_id` for catch parameter symbols
(#4659) (Dunqing)
- 6c612d1 semantic/jsdoc: Handle whitespace absence (#4642) (leaysgur)
- 0d2c41a semantic/jsdoc: Panic on parsing `type_name_comment`. (#4632)
(rzvxa)
- 9f8f299 syntax: Prevent creating invalid u32 IDs (#4675)
(overlookmotel)
- 4797eaa transformer: Strip TS statements from for in/of statement
bodies (#4686) (overlookmotel)
- 5327acd transformer/react: The `require` IdentifierReference does not
have a `reference_id` (#4658) (Dunqing)
- 3987665 transformer/typescript: Incorrect enum-related
`symbol_id`/`reference_id` (#4660) (Dunqing)
- 4efd54b transformer/typescript: Incorrect `SymbolFlags` for jsx
imports (#4549) (Dunqing)

### Performance

- 8dd76e4 codegen: Reduce size of `LineOffsetTable` (#4643)
(overlookmotel)
- b8e6753 codegen: `u32` indexes in `LineOffsetTable` for source maps
(#4641) (overlookmotel)
- 6ff200d linter: Change react rules and utils to use `Cow` and
`CompactStr` instead of `String` (#4603) (DonIsaac)
- 0f5e982 minifier: Only visit arrow expression after dropping
`console.log` (#4677) (Boshen)
- ff43dff sourcemap: Speed up VLQ encoding (#4633) (overlookmotel)
- a330773 sourcemap: Reduce string copying in `ConcatSourceMapBuilder`
(#4638) (overlookmotel)
- 372316b sourcemap: `ConcatSourceMapBuilder` extend `source_contents`
in separate loop (#4634) (overlookmotel)
- c7f1d48 sourcemap: Keep local copy of previous token in VLQ encode
(#4596) (overlookmotel)
- 590d795 sourcemap: Shorten main loop encoding VLQ (#4586)
(overlookmotel)

### Documentation

- c69ada4 ast: Improve AST node documentation (#4051) (Rintaro Itokawa)

### Refactor

- 579b797 ast: Use type identifier instead of `CloneIn::Cloned` GAT.
(#4738) (rzvxa)
- 475266d ast: Use correct lifetimes for name-related methods (#4712)
(DonIsaac)
- 83b6ca9 ast: Add explicit enum discriminants. (#4689) (rzvxa)
- ba70001 ast: Put `assert_layouts.rs` behind `debug_assertions` (#4621)
(rzvxa)
- 3f53b6f ast: Make AST structs `repr(C)`. (#4614) (rzvxa)
- 452e0ee ast: Remove defunct `visit_as` + `visit_args` attrs from
`#[ast]` macro (#4599) (overlookmotel)
- 2218340 ast, ast_codegen: Use `generate_derive` for implementing
`GetSpan` and `GetSpanMut` traits. (#4735) (rzvxa)
- fbfd852 minifier: Add `NodeUtil` trait for accessing symbols on ast
nodes (#4734) (Boshen)
- e0832f8 minifier: Use `oxc_traverse` for AST passes (#4725) (Boshen)
- 17602db minifier: Move tests and files around (Boshen)
- 3289477 minifier: Clean up tests (#4724) (Boshen)
- e78cba6 minifier: Ast passes infrastructure (#4625) (Boshen)
- d25dea7 parser: Use `ast_builder` in more places. (#4612) (rzvxa)
- 09d9822 semantic: Simplify setting scope flags (#4674) (overlookmotel)
- 6e453db semantic: Simplify inherit scope flags from parent scope
(#4664) (Dunqing)
- e1429e5 span: Reduce #[cfg_attr] boilerplate in type defs (#4702)
(overlookmotel)
- e24fb5b syntax: Add explicit enum discriminants to AST related types.
(#4691) (rzvxa)
- 3f3cb62 syntax, span: Reduce #[cfg_attr] boilerplate in type defs
(#4698) (overlookmotel)
- 54f9897 traverse: Simpler code for entering/exiting unconditional
scopes (#4685) (overlookmotel)
- 83546d3 traverse: Enter node before entering scope (#4684)
(overlookmotel)
- 9b51e04 Overhaul napi transformer package (#4592)
(DonIsaac)

### Testing

- 49d5196 ast: Fix `assert_layouts.rs` offset tests on 32bit platforms.
(#4620) (rzvxa)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>