Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unneeded extra chars to reduce search-index size #56709

Merged
merged 2 commits into from
Dec 14, 2018

Conversation

GuillaumeGomez
Copy link
Member

Before:

2013782 Dec 11 10:16 build/x86_64-unknown-linux-gnu/doc/search-index.js

After:

1736597 Dec 11 10:50 build/x86_64-unknown-linux-gnu/doc/search-index.js

No changes in the output of the search.

r? @QuietMisdreavus

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 11, 2018
@QuietMisdreavus
Copy link
Member

The plain_summary_line function is not just used for the search index. It's also used for item listings on module pages, so you're effectively truncating those listings as well.

You're also not accounting for combined characters, so this will cut between a letter and a combined accent if it happens to land on that boundary.

Since this is going to be used for user-displayed text, i'd prefer it if this used a smarter means of collecting characters (maybe use split_whitespace and count codepoints as you add words to the final string?) and ended with a "..." if we're cutting it off.

@GuillaumeGomez
Copy link
Member Author

Arf, I thought combined accent was accounted for one char in Rust (which is why I did it this way...). I'll add a parameter to only do it when required.

@QuietMisdreavus
Copy link
Member

Nah, chars() only splits at codepoints, it doesn't account for what a printable "character" is. So something with combining characters, where multiple codepoints converge on the same screen location, will still have multiple items in that iterator. (If you want to step up to printable "graphemes", that's a whole other crate, unicode-segmentation - but since this is mainly a file-size optimization and not a screen-size one, i'm not sure it will be worth bringing in the new crate. It is worth considering that this is a sort of penalty for languages with lots combining characters, though.)

@GuillaumeGomez
Copy link
Member Author

Then I can just split on whitespaces and as long as we're under the limit, we'll add the word.

@GuillaumeGomez
Copy link
Member Author

Updated.

@GuillaumeGomez GuillaumeGomez force-pushed the reduce-search-index branch 2 times, most recently from be9a4db to 50a274a Compare December 11, 2018 22:58
@GuillaumeGomez
Copy link
Member Author

Updated.

Copy link
Member

@QuietMisdreavus QuietMisdreavus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good now, thanks so much!

@QuietMisdreavus
Copy link
Member

r=me when travis is green.

@GuillaumeGomez
Copy link
Member Author

@bors: r=QuietMisdreavus

@bors
Copy link
Contributor

bors commented Dec 13, 2018

📌 Commit 987bf2e has been approved by QuietMisdreavus

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 13, 2018
kennytm added a commit to kennytm/rust that referenced this pull request Dec 14, 2018
… r=QuietMisdreavus

Remove unneeded extra chars to reduce search-index size

Before:

```
2013782 Dec 11 10:16 build/x86_64-unknown-linux-gnu/doc/search-index.js
```

After:

```
1736597 Dec 11 10:50 build/x86_64-unknown-linux-gnu/doc/search-index.js
```

No changes in the output of the search.

r? @QuietMisdreavus
kennytm added a commit to kennytm/rust that referenced this pull request Dec 14, 2018
… r=QuietMisdreavus

Remove unneeded extra chars to reduce search-index size

Before:

```
2013782 Dec 11 10:16 build/x86_64-unknown-linux-gnu/doc/search-index.js
```

After:

```
1736597 Dec 11 10:50 build/x86_64-unknown-linux-gnu/doc/search-index.js
```

No changes in the output of the search.

r? @QuietMisdreavus
bors added a commit that referenced this pull request Dec 14, 2018
Rollup of 14 pull requests (first batch)

Successful merges:

 - #56562 (Update libc version required by rustc)
 - #56609 (Unconditionally emit the target-cpu LLVM attribute.)
 - #56637 (rustdoc: Fix local reexports of proc macros)
 - #56658 (Add non-panicking `maybe_new_parser_from_file` variant)
 - #56695 (Fix irrefutable matches on integer ranges)
 - #56699 (Use a `newtype_index!` within `Symbol`.)
 - #56702 ([self-profiler] Add column for percent of total time)
 - #56708 (Remove some unnecessary feature gates)
 - #56709 (Remove unneeded extra chars to reduce search-index size)
 - #56744 (specialize: remove Boxes used by Children::insert)
 - #56748 (Update panic message to be clearer about env-vars)
 - #56749 (x86: Add the `adx` target feature to whitelist)
 - #56756 (Disable btree pretty-printers on older gdbs)
 - #56789 (rustc: Add an unstable `simd_select_bitmask` intrinsic)

r? @ghost
@bors bors merged commit 987bf2e into rust-lang:master Dec 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants