-
Notifications
You must be signed in to change notification settings - Fork 12.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize path mapping lookups #59048
Conversation
…uickly. This also adds a naive cache for each call, though it is not resilient to being passed the same array of patterns twice with different contents, and could be defeated if the patterns frequently differ.
I don't know if our perf suites have repos with lots of path mappings, but... @typescript-bot perf test this |
Set
and a binary search to find longest-prefix matches more quicklySet
and a binary search to search for patterns on paths
@typescript-bot pack this |
Hey @DanielRosenwasser, I've packed this into an installable tgz. You can install it for testing by referencing it in your
and then running There is also a playground for this build and an npm module you can use via |
@DanielRosenwasser Here they are:
tscComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
tsserverComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
startupComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
Developer Information: |
…ping/pong situation.
In some experiments on internal codebases, this seems to shave off maybe 200-300ms depending on the project (they are using pretty big lists of path mappings), but it looks like it ends up in a situation where the function gets called with two different arrays in an interleaving way. So instead of a single-slot cache, I added a @typescript-bot test top400 |
Hey @DanielRosenwasser, I've packed this into an installable tgz. You can install it for testing by referencing it in your
and then running There is also a playground for this build and an npm module you can use via |
src/compiler/core.ts
Outdated
let matchedValue: T | undefined; | ||
// use length of prefix as betterness criteria | ||
let longestMatchPrefixLength = -1; | ||
|
||
for (let i = 0; i < values.length; i++) { | ||
for (let i = 0; i < endIndex; i++) { | ||
const v = values[i]; | ||
const pattern = getPattern(v); | ||
if (isPatternMatch(pattern, candidate) && pattern.prefix.length > longestMatchPrefixLength) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to run these conditions in the reverse order? No point in running the cost of pattern matching if the pattern to be matched is a worse fit than the current best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice, what will probably happen is we'll have an empty prefix, that'll match, and then everything after will still need to be checked. It probably makes sense to walk the list backwards looking for a match in case a long match is found, and then bail out as soon as an entry with a shorter prefix length is encountered.
src/compiler/utilities.ts
Outdated
return undefined; | ||
} | ||
|
||
let index = binarySearchKey(sortedPatterns, candidate, getPatternPrefix, compareStringsCaseSensitive); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Rush
we have to efficiently match a large array of Git file paths to project prefixes as part of hashing our input states; for that we use this class which you might find useful for this application: https://github.com/microsoft/rushstack/blob/main/libraries/rush-lib/src/logic/LookupByPath.ts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some numbers, that tool matches 142936 unique file paths to 1279 unique path prefixes in 62 milliseconds on my codespace machine (single threaded). This is without taking any advantage of that the list of paths returned by Git are technically sorted.
@DanielRosenwasser Here they are:
tscComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
tsserverComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
startupComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
Developer Information: |
I think the latest change helped a bit; however, it seems like frequently this function ends up with an (effectively) identical list every single time with different reference equality. So I've tried adding an earlier cache. I'll see if that helps. @typescript-bot pack this |
Hey @DanielRosenwasser, I've packed this into an installable tgz. You can install it for testing by referencing it in your
and then running There is also a playground for this build and an npm module you can use via |
For whatever reason, I'm still seeing at least one occurrence of an identical path array being passed in; but I think this is pretty ready for review. |
@typescript-bot perf test this |
@DanielRosenwasser Here they are:
tscComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
tsserverComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
startupComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
Developer Information: |
@typescript-bot pack this |
Hey @DanielRosenwasser, I've packed this into an installable tgz. You can install it for testing by referencing it in your
and then running There is also a playground for this build and an npm module you can use via |
@DanielRosenwasser Here they are:
tscComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
tsserverComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
startupComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
Developer Information: |
…Specs`." This reverts commit 8919029.
Set
and a binary search to search for patterns on paths
@typescript-bot pack this |
Hey @DanielRosenwasser, I've packed this into an installable tgz. You can install it for testing by referencing it in your
and then running There is also a playground for this build and an npm module you can use via |
@DanielRosenwasser Here they are:
tscComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
tsserverComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
startupComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
Developer Information: |
The current state of this just always caches the objects and divides the entries into exact specifiers (as a I'm open to re-introducing the binary search. It probably still makes sense to sort the array and iterate backwards to find a match and bail earlier when possible. But apparently we don't seem to have a test where suffix length matters in the case of |
@typescript-bot perf test this |
@DanielRosenwasser Here they are:
tscComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
tsserverComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
startupComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
Developer Information: |
src/compiler/utilities.ts
Outdated
return candidate; | ||
} | ||
export function matchPatternOrExact(patternOrStrings: ParsedPatterns, candidate: string): string | Pattern | undefined { | ||
const { matchableStringSet, patterns } = patternOrStrings; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const { matchableStringSet, patterns } = patternOrStrings; | |
const { matchableStringSet, patterns } = parsedPatterns; | |
just because, well, it's not an array of pattern or string anymore. The comment above this function also needs updating.
When checking if a specifier satisfies a pattern (e.g. for a path mapping entry in
paths
), it seems like TypeScript needs to check every single available pattern to see if the specifier is matched.Here, I've slightly modified the code to create a
Set
of exact specifiers, and an array of all remainingPattern
s. TheSet
becomes a fast look-up, and the fallback becomes a walk across the array. A new difference is that once we've found a match, we don't bother checking any other patterns unless they have a longer prefix than the current match. That can help quickly dismiss irrelevant matches.This also adds a caches so that we don't end up re-creating these objects over and over.
In earlier iterations of this PR, the array of patterns was sorted and lookups on patterns could optimistically be done via a binary search; however, there's some nuance there with respect to how
paths
differs fromexports
, and we don't have enough test coverage there (nothing fails if you only sort by prefix specifier length!). So for now, I've stuck with a linear search.