rustdoc: use a trie for name-based search #133005
Preview and profiler results
----------------------------

Here's some quick profiling in Firefox, done on the Rust compiler docs:

- Before: https://share.firefox.dev/3UPm3M8
- After: https://share.firefox.dev/40LXvYb

Here are the results for the node.js profiler:

- https://notriddle.com/rustdoc-html-demo-15/trie-perf/index.html

Here's a copy that you can use to try it out. Compare it with [the nightly]. Try typing `typecheckercontext` one character at a time, slowly.

- https://notriddle.com/rustdoc-html-demo-15/compiler-doc-trie/index.html

[the nightly]: https://doc.rust-lang.org/nightly/nightly-rustc/

The fuzzy match algorithm is based on [Fast String Correction with Levenshtein-Automata] and the corresponding implementation code in [moman] and [Lucene]. The bit-packing representation comes from Lucene, but the actual matcher is based more on `fsc.py`. As suggested in the paper, a trie is used to represent the FSA dictionary.

The same trie is used for prefix matching. Substring matching is done with a side table of three-character[^1] windows that point into the trie.

[Fast String Correction with Levenshtein-Automata]: https://github.com/tpn/pdfs/blob/master/Fast%20String%20Correction%20with%20Levenshtein-Automata%20(2002)%20(10.1.1.16.652).pdf
[Lucene]: https://fossies.org/linux/lucene/lucene/core/src/java/org/apache/lucene/util/automaton/Lev1TParametricDescription.java
[moman]: https://gitlab.com/notriddle/moman-rustdoc

User-visible changes
--------------------

I don't expect anybody to notice anything, but it does cause two changes:

- Substring matches in the middle of a name only apply if there are three or more characters in the search query.
- The Levenshtein distance limit now maxes out at two. In the old version, the limit was `w/3` (w being the query length), so you could get looser matches for queries with 9 or more characters[^1] in them.

[^1]: technically UTF-16 code units
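The trie, prefix lookup, and three-character window table described above can be sketched in a few dozen lines. This is a minimal sketch, not rustdoc's `search.js`: the names `TrieNode`, `NameIndex`, `prefixSearch` and `substringSearch` are hypothetical, and it assumes each window entry points at the trie node reached right after the window's last code unit. The Levenshtein-automaton matcher and the Lucene-style bit packing are left out entirely.

```javascript
// Hypothetical sketch: a trie over UTF-16 code units plus a side table of
// three-code-unit windows for substring lookup. Not rustdoc's search.js.
class TrieNode {
    constructor() {
        // Sparse array indexed by charCodeAt(); most slots stay empty.
        this.children = [];
        // Ids of the names that end exactly at this node.
        this.ids = [];
    }
}

class NameIndex {
    constructor() {
        this.root = new TrieNode();
        // Window string (e.g. "typ") -> trie nodes reached right after it.
        this.windows = new Map();
    }

    insert(name, id) {
        let node = this.root;
        for (let i = 0; i < name.length; i++) {
            const c = name.charCodeAt(i);
            if (node.children[c] === undefined) {
                node.children[c] = new TrieNode();
                // A node's prefix never changes, so register the window
                // ending at it once, when the node is created.
                if (i >= 2) {
                    const key = name.slice(i - 2, i + 1);
                    if (!this.windows.has(key)) this.windows.set(key, []);
                    this.windows.get(key).push(node.children[c]);
                }
            }
            node = node.children[c];
        }
        node.ids.push(id);
    }

    // Follow `s` from `start`; returns undefined if the path breaks off.
    walk(start, s) {
        let node = start;
        for (let i = 0; i < s.length && node !== undefined; i++) {
            node = node.children[s.charCodeAt(i)];
        }
        return node;
    }

    // Add every id stored in the subtree rooted at `node` to `out`.
    collect(node, out) {
        for (const id of node.ids) out.add(id);
        // forEach skips the holes of the sparse children array.
        node.children.forEach((child) => this.collect(child, out));
    }

    // Names starting with `query` (works for any query length).
    prefixSearch(query) {
        const out = new Set();
        const node = this.walk(this.root, query);
        if (node !== undefined) this.collect(node, out);
        return out;
    }

    // Names containing `query` anywhere; needs at least three code units.
    substringSearch(query) {
        const out = new Set();
        if (query.length < 3) return out;
        for (const start of this.windows.get(query.slice(0, 3)) ?? []) {
            const node = this.walk(start, query.slice(3));
            if (node !== undefined) this.collect(node, out);
        }
        return out;
    }
}

const index = new NameIndex();
["tyctxt", "typeck_results", "type_of", "check_type"].forEach((n, i) => index.insert(n, i));
console.log([...index.substringSearch("type")].sort((a, b) => a - b)); // [ 1, 2, 3 ]
```

In this sketch a substring query costs one map lookup on its first three code units plus a short walk from each candidate node, rather than a scan over every name, which is also why queries shorter than three code units cannot use the side table.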
    } else {
        const sb = name.charCodeAt(substart);
        let child;
        if (this.children[sb] !== undefined) {
Wouldn't it be better to check `this.children.length < sb`?
This is a sparse array.
Add this explanation and link on the field definition please. :)
Okay, it's added.
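The reviewer's question turns on what `length` means for a JavaScript array. A minimal illustration, assuming `children` is a plain array indexed by `charCodeAt` as in the reviewed snippet; the variable names here are only for the example:

```javascript
// Children indexed by UTF-16 code unit, as in the reviewed snippet.
const children = [];
children["z".charCodeAt(0)] = "node for z"; // index 122

// `length` is one past the highest assigned index, not a count of
// occupied slots, so every index below 122 is in range but empty.
const a = "a".charCodeAt(0); // 97
console.log(children.length); // 123
console.log(a < children.length); // true, although there is no child for "a"
console.log(children[a] !== undefined); // false: index 97 is a hole
console.log(a in children); // false
```

A length comparison only rules out indices past the end; in a sparse array most in-range slots are holes, so the `!== undefined` test is what actually detects a missing child.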
Apart from small nits, looks good to me. Performance improvement is really impressive!
Some changes occurred in HTML/CSS/JS. cc @GuillaumeGomez, @jsha
Thanks! @bors r+
…llaumeGomez

Rollup of 5 pull requests

Successful merges:

- rust-lang#132172 (borrowck diagnostics: suggest borrowing function inputs in generic positions)
- rust-lang#132649 (add ./x clippy ci)
- rust-lang#133005 (rustdoc: use a trie for name-based search)
- rust-lang#133034 (update download-rustc comments and default)
- rust-lang#133036 (add myself into `users_on_vacation` on triagebot)

r? `@ghost`
`@rustbot` modify labels: rollup
Rollup merge of rust-lang#133005 - notriddle:notriddle/trie-search, r=GuillaumeGomez

rustdoc: use a trie for name-based search

Potentially rust-lang#131156; need to try reproducing the problem with `windows`.

Preview and profiler results
----------------------------

Here's some quick profiling in Firefox, done on the Rust compiler docs:

- Before: https://share.firefox.dev/3UPm3M8
- After: https://share.firefox.dev/40LXvYb

Here are the results for the node.js profiler:

- https://notriddle.com/rustdoc-html-demo-15/trie-perf/index.html

Here's a copy that you can use to try it out. Compare it with [the nightly]. Try typing `typecheckercontext` one character at a time, slowly.

- https://notriddle.com/rustdoc-html-demo-15/compiler-doc-trie/index.html

[the nightly]: https://doc.rust-lang.org/nightly/nightly-rustc/

The fuzzy match algorithm is based on [Fast String Correction with Levenshtein-Automata] and the corresponding implementation code in [moman] and [Lucene]. The bit-packing representation comes from Lucene, but the actual matcher is based more on `fsc.py`. As suggested in the paper, a trie is used to represent the FSA dictionary.

The same trie is used for prefix matching. Substring matching is done with a side table of three-character[^1] windows that point into the trie.

[Fast String Correction with Levenshtein-Automata]: https://github.com/tpn/pdfs/blob/master/Fast%20String%20Correction%20with%20Levenshtein-Automata%20(2002)%20(10.1.1.16.652).pdf
[Lucene]: https://fossies.org/linux/lucene/lucene/core/src/java/org/apache/lucene/util/automaton/Lev1TParametricDescription.java
[moman]: https://gitlab.com/notriddle/moman-rustdoc

User-visible changes
--------------------

I don't expect anybody to notice anything, but it does cause two changes:

- Substring matches in the middle of a name only apply if there are three or more characters in the search query.
- The Levenshtein distance limit now maxes out at two. In the old version, the limit was `w/3` (w being the query length), so you could get looser matches for queries with 9 or more characters[^1] in them.
- It uses more RAM.
- It's faster (assuming you don't swap thrash).

[^1]: technically UTF-16 code units
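The change to the Levenshtein distance limit can be made concrete with a small sketch. It assumes the old limit was `floor(w / 3)` (the text says `w/3` and that looser matches began at 9 characters, which fits integer division) and that the new limit is that value capped at two; `oldLimit`, `newLimit` and `withinDistance` are hypothetical names. The distance check is ordinary dynamic-programming edit distance, not the Levenshtein-automaton matcher this PR is based on.

```javascript
// Old limit as described in the PR: w/3 with w the query length in UTF-16
// code units, taken as integer division (looser matches start at w = 9).
function oldLimit(query) {
    return Math.floor(query.length / 3);
}

// ASSUMPTION: the new limit is the old one capped at two.
function newLimit(query) {
    return Math.min(oldLimit(query), 2);
}

// Textbook dynamic-programming edit distance with an early exit: row minima
// never decrease, so once a whole row exceeds `limit` the answer is no.
// This is not the Levenshtein-automaton matcher the PR is built on.
function withinDistance(a, b, limit) {
    let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
    for (let i = 1; i <= a.length; i++) {
        const cur = [i];
        let rowMin = i;
        for (let j = 1; j <= b.length; j++) {
            const cost = a.charCodeAt(i - 1) === b.charCodeAt(j - 1) ? 0 : 1;
            cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
            rowMin = Math.min(rowMin, cur[j]);
        }
        if (rowMin > limit) return false;
        prev = cur;
    }
    return prev[b.length] <= limit;
}

// "typecheck" is 9 code units long: the old limit allows 3 edits, the new one 2.
console.log(withinDistance("typecheck", "typecheckers", oldLimit("typecheck"))); // true
console.log(withinDistance("typecheck", "typecheckers", newLimit("typecheck"))); // false
```

Both `String.prototype.length` and `charCodeAt` count UTF-16 code units, which matches the footnote's caveat about what a "character" means here.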