Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: drop support for non-UTF8 paths #7570

Merged
merged 4 commits into from
Aug 20, 2024
Merged

refactor: drop support for non-UTF8 paths #7570

merged 4 commits into from
Aug 20, 2024

Conversation

h-a-n-a
Copy link
Collaborator

@h-a-n-a h-a-n-a commented Aug 14, 2024

Summary

Enforce paths passed to rspack to UTF-8.

According to OsString:

  • On Unix systems, strings are often arbitrary sequences of non-zero bytes, in many cases interpreted as UTF-8.
  • On Windows, strings are often arbitrary sequences of non-zero 16-bit values, interpreted as UTF-16 when it is valid to do so.
  • In Rust, strings are always valid UTF-8, which may contain zeros.

Two paths types are received by rspack:

  • Paths passed as String from JavaScript runtime (UTF-8 encoded)
  • Paths passed from system APIs are platform specific (could be non UTF-8 encoded). This might happen on unix system and windows (WTF-8 encoding) if the path contains a surrogate byte sequence. Most of the time it's UTF-8 encoded.
Performance

Ad-hoc test : String reference -> PathBuf / Utf8PathBuf -> String

  1. PathBuf::from takes inputs without UTF-8 validations, to_string_lossy validates if PathBuf is a valid UTF-8 string.
  2. Utf8PathBuf::from takes inputs without UTF-8 validations, to_string creates a formatter and call the as_str to validate(not if rustc version is >= 1.74 stable, perf: cost-free conversion from paths to &str camino-rs/camino#93). (It's even faster to call as_str().to_string() instead)
let mut group = c.benchmark_group("match");
    for i in [10, 100, 1000, 10000] {
        let p = "i".repeat(i);
        group.bench_with_input(BenchmarkId::new("utf8 lossy", i), &p, |b, i| {
            b.iter(|| {
                let a = PathBuf::from(i);
                a.to_string_lossy().to_string();
            })
        });

        group.bench_with_input(BenchmarkId::new("utf8 camino", i), &p, |b, i| {
            b.iter(|| {
                let a = Utf8PathBuf::from(i);
                a.to_string();
            })
        });
    }
match/utf8 lossy/10     time:   [33.520 ns 33.636 ns 33.757 ns]
match/utf8 camino/10    time:   [39.122 ns 39.266 ns 39.405 ns]

match/utf8 lossy/100    time:   [67.696 ns 68.091 ns 68.521 ns]
match/utf8 camino/100   time:   [42.497 ns 42.864 ns 43.411 ns]

match/utf8 lossy/1000   time:   [428.14 ns 428.97 ns 429.80 ns]
match/utf8 camino/1000  time:   [146.31 ns 146.57 ns 146.82 ns]

match/utf8 lossy/10000  time:   [3.1279 µs 3.1323 µs 3.1368 µs]
match/utf8 camino/10000 time:   [617.73 ns 623.08 ns 628.75 ns]

Comparison: PathBuf -> Utf8PathBuf -> String vs. PathBuf -> String

This is the performance generated with bench command as shown below.
Always choose not to call assert_utf8 if these paths are only read once. UTF-8 validation is costy.
For example: file_dependencies, etc. Check out this commit for the changes I made: 58b8b95

Checklist

  • Tests updated (or not required).
  • Documentation updated (or not required).

@github-actions github-actions bot added the team The issue/pr is created by the member of Rspack. label Aug 14, 2024
Copy link

netlify bot commented Aug 14, 2024

Deploy Preview for rspack ready!

Name Link
🔨 Latest commit 5c6b5d7
🔍 Latest deploy log https://app.netlify.com/sites/rspack/deploys/66c416c8f8647f0008d45cd0
😎 Deploy Preview https://deploy-preview-7570--rspack.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@h-a-n-a

This comment was marked as outdated.

@rspack-bot

This comment was marked as outdated.

@h-a-n-a

This comment was marked as outdated.

@rspack-bot

This comment was marked as outdated.

@h-a-n-a

This comment was marked as outdated.

@rspack-bot

This comment was marked as outdated.

@h-a-n-a h-a-n-a changed the title refactor: utf8 paths refactor!: drop support for non-UTF8 paths Aug 14, 2024
@h-a-n-a h-a-n-a added the release: breaking change release: breaking change related release(mr only) label Aug 14, 2024
@h-a-n-a h-a-n-a changed the title refactor!: drop support for non-UTF8 paths refactor: drop support for non-UTF8 paths Aug 14, 2024
@h-a-n-a h-a-n-a removed the release: breaking change release: breaking change related release(mr only) label Aug 14, 2024
@h-a-n-a
Copy link
Collaborator Author

h-a-n-a commented Aug 14, 2024

This will not be considered as a breaking change. As we does not support Rust API in the first place and strings passed from NAPI runtime are always UTF-8 encoded.

@h-a-n-a h-a-n-a force-pushed the refactor-path branch 2 times, most recently from 5bf0ade to c88403e Compare August 15, 2024 08:10
@h-a-n-a

This comment was marked as outdated.

@rspack-bot

This comment was marked as outdated.

@h-a-n-a

This comment was marked as outdated.

@rspack-bot

This comment was marked as outdated.

@h-a-n-a

This comment was marked as outdated.

@h-a-n-a
Copy link
Collaborator Author

h-a-n-a commented Aug 16, 2024

!bench

@rspack-bot
Copy link

rspack-bot commented Aug 16, 2024

📝 Benchmark detail: Open

Name Base (2024-08-16 218bda0) Current Change
10000_development-mode + exec 2.35 s ± 21 ms 2.33 s ± 28 ms -0.98 %
10000_development-mode_hmr + exec 713 ms ± 13 ms 728 ms ± 10 ms +2.11 %
10000_production-mode + exec 3.02 s ± 38 ms 3.03 s ± 44 ms +0.41 %
arco-pro_development-mode + exec 1.91 s ± 68 ms 1.89 s ± 89 ms -1.22 %
arco-pro_development-mode_hmr + exec 439 ms ± 2 ms 439 ms ± 1.5 ms -0.00 %
arco-pro_production-mode + exec 3.48 s ± 110 ms 3.46 s ± 77 ms -0.35 %
arco-pro_production-mode_generate-package-json-webpack-plugin + exec 3.52 s ± 98 ms 3.53 s ± 83 ms +0.19 %
threejs_development-mode_10x + exec 1.71 s ± 14 ms 1.69 s ± 16 ms -1.07 %
threejs_development-mode_10x_hmr + exec 823 ms ± 9.3 ms 804 ms ± 5.2 ms -2.31 %
threejs_production-mode_10x + exec 5.53 s ± 39 ms 5.54 s ± 41 ms +0.18 %

@h-a-n-a h-a-n-a marked this pull request as ready for review August 20, 2024 03:03
@h-a-n-a h-a-n-a enabled auto-merge (squash) August 20, 2024 03:03
@h-a-n-a
Copy link
Collaborator Author

h-a-n-a commented Aug 20, 2024

bumped camino to support: camino-rs/camino#93

@h-a-n-a h-a-n-a requested a review from JSerFeng August 20, 2024 04:44
@h-a-n-a h-a-n-a merged commit d97d69f into main Aug 20, 2024
32 checks passed
@h-a-n-a h-a-n-a deleted the refactor-path branch August 20, 2024 06:20
@chenjiahan chenjiahan mentioned this pull request Aug 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team The issue/pr is created by the member of Rspack.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants