Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve pathUtils.absoluteToNormal correctness and performance #1210

Closed
wants to merge 3 commits into from

Conversation

robhogan
Copy link
Contributor

@robhogan robhogan commented Feb 5, 2024

Summary:

Correctness

Fix a correctness issue in pathUtils.absoluteToNormal where the returned paths would not be fully normalised relative to the root - e.g. an input of /project/root/../root/foo would return ../root/foo instead of foo.

Performance

This rewrite also increases the set of paths absoluteToNormal is able to optimise for, beyond just those beginning with the full project root - absolute paths outside the project root are typical in monorepo setups.

Now, given a mix of absolute paths within and outside the project root, absoluteToNormal is >8x faster than the equivalent path.relative and 4x faster than before this diff.

This is called whenever TreeFS checks the existence of a file by absolute path (the typical case) during resolution, and accounts for more than half the time taken to return whether the file exists in the file map.

(Some of the code here is a little arcane for low-level performance - apologies!)

Benchmarks

$ node bench3
Correctness check passed
Comparing absoluteToNormal with fastPath across 2116232 existence checks performed by js1 build buckfiles resolution
┌─────────┬────────────────────────────────┬─────────┬────────────────────┬──────────┬─────────┬─────────────┐
│ (index) │ Task Name                      │ ops/sec │ Average Time (ns)  │ Margin   │ Samples │ Comparison  │
├─────────┼────────────────────────────────┼─────────┼────────────────────┼──────────┼─────────┼─────────────┤
│ 0       │ 'path.relative()'              │ '0'     │ 2879838924.896717  │ '±1.92%' │ 10      │ '1x (base)' │
│ 1       │ 'fastPath.relative()'          │ '0'     │ 1595209724.8077393 │ '±1.26%' │ 10      │ '1.8x'      │
│ 2       │ 'pathUtils.absoluteToNormal()' │ '2'     │ 348316470.78990936 │ '±1.60%' │ 10      │ '8.3x'      │
└─────────┴────────────────────────────────┴─────────┴────────────────────┴──────────┴─────────┴─────────────┘

Changelog:

**[Performance]:** Improve resolution performance for watched files outside `projectRoot`.

Reviewed By: motiz88

Differential Revision: D52513808

Summary:


## Correctness
Fixes a bug in `fastPath.resolve` where paths ending in `/..` would not collapse preceding segments, eg `resolve('/project/root', '../..')` would return `/project/..` rather than `/`.

This came up when implementing incremental resolution support - we end up resolving canonical paths like this during lookup of first missing directory nodes. The correctness fix is changelog-internal because Metro doesn't currently call `resolve` with paths like these, and this API isn't public.

## Performance
In implementing this, I've also squeezed out a bit more performance so that `fast_path.resolve` is now about 1.6x faster than before this diff and 20x faster than `path.resolve` for repeated calls with the same root. This is significant because it actually dominates `getRealPath`, which is called hundreds of thousands of times during module resolution on a typical graph. Zoomed out benchmarks to follow in the stack.

Changelog: [Performance] Improve resolution performance by optimising normal->absolute path conversion

## Benchmarks
Calling `resolve` 2m times on different, representative (FB internal) paths per sample.
```
┌─────────┬───────────┬─────────┬───────────────────┬──────────┬─────────┐
│ (index) │ Task Name │ ops/sec │ Average Time (ns) │ Margin   │ Samples │
├─────────┼───────────┼─────────┼───────────────────┼──────────┼─────────┤
│ 0       │ 'new'     │ '16'    │ 61555575.15459959 │ '±0.26%' │ 488     │
│ 1       │ 'old'     │ '10'    │ 99607323.81785941 │ '±0.32%' │ 302     │
│ 2       │ 'system'  │ '0'     │ 1472189198.380425 │ '±0.73%' │ 21      │
└─────────┴───────────┴─────────┴───────────────────┴──────────┴─────────┘
```

Reviewed By: motiz88

Differential Revision: D52464135
Summary:
`fast_path` is an important collection of functions for Metro startup and resolution performance, much faster than `node:path` "equivalents" when we can make certain assumptions about their use (such as consistent path separators).

However, they're not 1-1 replacements for `path`, and rather than leave the differences to missable code comments I think it'd be better for this collection to have its own, more explicitly named API.

Additionally, by changing the API so that a root is specified up-front rather than with each call, we can better cater for Metro's typical use case (`root` is almost always the configured `projectRoot`, and is invariant), allowing for better performance (indeed `resolve`/`normalToAbsolute` becomes about 10% faster in this diff).

This rearranges the API into a `RootPathUtils(rootDir: string)` class exposing `normalToAbsolute` and `absoluteToNormal` methods, and replaces all call sites.

Changelog: Internal

Reviewed By: motiz88

Differential Revision: D52480112
Summary:
## Correctness
Fix a correctness issue in `pathUtils.absoluteToNormal` where the returned paths would not be fully normalised relative to the root - e.g. an input of `/project/root/../root/foo` would return `../root/foo` instead of `foo`.

## Performance
This rewrite also increases the set of paths `absoluteToNormal` is able to optimise for, beyond just those beginning with the full project root - absolute paths outside the project root are typical in monorepo setups.

Now, given a mix of absolute paths within and outside the project root, `absoluteToNormal` is >8x faster than the equivalent `path.relative` and 4x faster than before this diff.

This is called whenever `TreeFS` checks the existence of a file by absolute path (the typical case) during resolution, and accounts for more than half the time taken to return whether the file exists in the file map.

(Some of the code here is a little arcane for low-level performance - apologies!)

## Benchmarks

```
$ node bench3
Correctness check passed
Comparing absoluteToNormal with fastPath across 2116232 existence checks performed by js1 build buckfiles resolution
┌─────────┬────────────────────────────────┬─────────┬────────────────────┬──────────┬─────────┬─────────────┐
│ (index) │ Task Name                      │ ops/sec │ Average Time (ns)  │ Margin   │ Samples │ Comparison  │
├─────────┼────────────────────────────────┼─────────┼────────────────────┼──────────┼─────────┼─────────────┤
│ 0       │ 'path.relative()'              │ '0'     │ 2879838924.896717  │ '±1.92%' │ 10      │ '1x (base)' │
│ 1       │ 'fastPath.relative()'          │ '0'     │ 1595209724.8077393 │ '±1.26%' │ 10      │ '1.8x'      │
│ 2       │ 'pathUtils.absoluteToNormal()' │ '2'     │ 348316470.78990936 │ '±1.60%' │ 10      │ '8.3x'      │
└─────────┴────────────────────────────────┴─────────┴────────────────────┴──────────┴─────────┴─────────────┘
```

Changelog: 
```
**[Performance]:** Improve resolution performance for watched files outside `projectRoot`.
```

Reviewed By: motiz88

Differential Revision: D52513808
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 5, 2024
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52513808

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 39de3d4.

@robhogan robhogan deleted the export-D52513808 branch February 6, 2024 19:17
facebook-github-bot pushed a commit that referenced this pull request Mar 7, 2024
Summary:
Pull Request resolved: #1231

D52513808 / #1210 introduced a startup regression in Watchman crawling on Windows where a clock key was incorrectly calculated due to more strict behaviour around path separators.

This meant the clock was not matched with cached state, and the crawler would fall back to a "glob" query, rather than the (typically much cheaper) incremental "since" query.

The bug isn't in the new `RootPathUtils` itself, which already states its assumptions (including `"All input path separators must be system-native."`), but rather in the failure to properly normalise a particular response from Watchman (which returns posix separators, even on Windows) to Metro's system-native paths, before using `pathUtils.absoluteToNormal`. Note that we already normalise other responses from Watchman to system separators.

Before D52513808 / #1210 we would've deemed the path too complex to handle and fallen back to `path.relative`, which is much slower but more forgiving.

Changelog
```
* **[Fix]**: Fix startup time regression when using Watchman on Windows with a warm cache.
```

Reviewed By: motiz88

Differential Revision: D54592646

fbshipit-source-id: de91b1a724fcec470e93650ec9c6adce8c148aac
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants