Improve performance of HTML parser on JVM #67

Merged 1 commit on Oct 3, 2020

Commits on Oct 3, 2020

  1. HtmlParser: Improve performance on JVM

    The HTML parser incurs a significant slowdown as the nesting level
    increases:
    
    ```shell
    $ bloop run pine-bench-jvm -- slow
    [...]
    Benchmark: Parse HTML w/o attributes
    - depth=2:
      units: 7
      iterations: 591733
      run time: 3384 μs/it ± 7
    - depth=6:
      units: 127
      iterations: 11610
      run time: 171586 μs/it ± 504
    - depth=10:
      units: 2047
      iterations: 56
      run time: 36148809 μs/it ± 74820
    - depth=14:
      units: 32767
      iterations: 1
      run time: 9353666666 μs/it ± 174704194
    
    Summary:
      Unit growth: 18.1x, 16.1x, 16.0x
      Run time growth: 50.7x, 210.7x, 258.8x
    ```
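
    The summary lines compare how fast the input grows with how fast the
    run time grows. A parser that scales linearly would show roughly equal
    ratios; here the run time ratio is far larger. As a minimal sketch, the
    ratios can be recomputed from the figures above (`GrowthSketch` and
    `growth` are hypothetical names, not part of the benchmark suite):

    ```scala
    object GrowthSketch {
      // Ratio between each measurement and the one before it,
      // e.g. units 7 -> 127 -> 2047 -> 32767.
      def growth(values: Seq[Double]): Seq[Double] =
        values.zip(values.drop(1)).map { case (prev, next) => next / prev }

      def main(args: Array[String]): Unit = {
        // Figures copied from the benchmark output above.
        val units    = Seq(7.0, 127.0, 2047.0, 32767.0)
        val runTimes = Seq(3384.0, 171586.0, 36148809.0, 9353666666.0)

        println("Unit growth: " + growth(units).map(r => f"$r%.1fx").mkString(", "))
        println("Run time growth: " + growth(runTimes).map(r => f"$r%.1fx").mkString(", "))
      }
    }
    ```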
    
    This slowdown can be attributed to the `rest()` function in `Reader`.
    It calls `data.drop()`, which on the JVM copies the remaining characters
    into a new string rather than pointing to the same memory, so the cost
    of each call grows with the length of the remaining input.
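
    To illustrate the difference, here is a minimal sketch of two readers.
    It is not the actual `Reader` from this commit: `CopyingReader`,
    `OffsetReader` and `countAngleBrackets` are hypothetical names, and the
    offset-based approach is one common way to avoid the copy, assumed here
    for illustration.

    ```scala
    object ReaderSketch {
      // Hypothetical reader that re-slices its input on every step.
      // On the JVM, `drop` allocates a new String with the remaining
      // characters, so each step costs time proportional to what is left.
      final class CopyingReader(private var data: String) {
        def isEmpty: Boolean = data.isEmpty
        def current: Char = data.head
        def advance(): Unit = { data = data.drop(1) }
        def rest(): String = data
      }

      // Hypothetical alternative: the input stays untouched and only an
      // offset moves, so advancing is a constant-time operation.
      final class OffsetReader(data: String) {
        private var offset = 0
        def isEmpty: Boolean = offset >= data.length
        def current: Char = data.charAt(offset)
        def advance(): Unit = { offset += 1 }
        // Copies only when the remaining input is explicitly requested.
        def rest(): String = data.substring(offset)
      }

      // Walks the whole input once without creating intermediate strings.
      def countAngleBrackets(html: String): Int = {
        val reader = new OffsetReader(html)
        var count = 0
        while (!reader.isEmpty) {
          if (reader.current == '<') count += 1
          reader.advance()
        }
        count
      }
    }
    ```

    Both readers expose the same operations, so a parser written against
    one could be switched to the other without changing its logic.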
    
    Scala.js' `drop()` implementation has the expected semantics, so the
    run time is roughly linear in the number of nodes in the tree:
    
    ```shell
    $ bloop run pine-bench-js -- slow
    [...]
    Benchmark: Parse HTML w/o attributes
    - depth=2:
      units: 7
      iterations: 92592
      run time: 21624 μs/it ± 128
    - depth=6:
      units: 127
      iterations: 5229
      run time: 382531 μs/it ± 182
    - depth=10:
      units: 2047
      iterations: 312
      run time: 6479611 μs/it ± 134542
    - depth=14:
      units: 32767
      iterations: 17
      run time: 119013071 μs/it ± 1455066
    
    Summary:
      Unit growth: 18.1x, 16.1x, 16.0x
      Run time growth: 17.7x, 16.9x, 18.4x
    ```
    
    With the optimisations applied, the parser scales similarly on the
    JVM:
    
    ```shell
    $ bloop run pine-bench-jvm -- slow
    [...]
    Benchmark: Parse HTML w/o attributes
    - depth=2:
      units: 7
      iterations: 991955
      run time: 2048 μs/it ± 51
    - depth=6:
      units: 127
      iterations: 45471
      run time: 43403 μs/it ± 412
    - depth=10:
      units: 2047
      iterations: 2523
      run time: 777550 μs/it ± 10750
    - depth=14:
      units: 32767
      iterations: 147
      run time: 13759510 μs/it ± 160258
    
    Summary:
      Unit growth: 18.1x, 16.1x, 16.0x
      Run time growth: 21.2x, 17.9x, 17.7x
    ```
    tindzk committed Oct 3, 2020
    Commit 8e70cc9