-
Notifications
You must be signed in to change notification settings - Fork 770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[READY] Implement partial sorting #825
Conversation
Reviewed 12 of 12 files at r1. cpp/ycm/benchmarks/IdentifierCompleter_bench.cpp, line 55 at r1 (raw file):
The cpp/ycm/benchmarks/PythonSupport_bench.cpp, line 56 at r1 (raw file):
What does Comments from Reviewable |
Reviewed 2 of 12 files at r1. cpp/ycm/benchmarks/IdentifierCompleter_bench.cpp, line 55 at r1 (raw file): Previously, bstaletic (Boris Staletic) wrote…
cpp/ycm/benchmarks/PythonSupport_bench.cpp, line 56 at r1 (raw file): Previously, bstaletic (Boris Staletic) wrote…
Comments from Reviewable |
Codecov Report
@@ Coverage Diff @@
## master #825 +/- ##
==========================================
- Coverage 94.82% 94.82% -0.01%
==========================================
Files 79 79
Lines 5374 5389 +15
Branches 168 169 +1
==========================================
+ Hits 5096 5110 +14
Misses 231 231
- Partials 47 48 +1 |
I'm sorry, but the PR description isn't at all clear to me. I've read it twice now and I still have no clue what this PR is trying to accomplish. :D It might have just been a long day at work, but still, could this be clarified? Perhaps with some examples. Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks failed. Comments from Reviewable |
@Valloric Let me try to explain. Current situationIdentifier completer - returns 10 candidates. Sorting - This pull requestIdentifier completer - returns 10 sorted candidates. Sorting - Partially sort those 50 (by default) candidates following this algorithm.
This results in much faster sorting of candidates. Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks failed. Comments from Reviewable |
Sorry for being unclear. The idea is that, since the maximum number of identifiers returned to the user is (by default) 10, we don't need to sort all of them but only the 10 smallest. This is significantly faster when the number of identifiers is big. The same can be done for non-identifier candidates if we limit them too. The rational in limiting them is that, in addition to improving performance, it's not particularly useful to return thousands of candidates. Users will never go through all of them. @bstaletic did a good summary except that the PR doesn't actually set the maximum number of non-identifier candidates returned to the user to 50 but give the possibility to limit this number by adding an optional parameter to the Reviewed 9 of 12 files at r1. Comments from Reviewable |
@micbou Taking another look at the code, the optional argument determining the max number of candidates is initialised defaulting to 0. with 0 having a special meaning of returning everything without any limitation on the number of returned candidates. I really think we shuld default to some other value. Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks failed. Comments from Reviewable |
062d3c1
to
2a071f8
Compare
The default is Reviewed 1 of 12 files at r1, 1 of 1 files at r2. Comments from Reviewable |
Fair point. Now why is codecov still reporting the same coverage after adding another test in r2 of this PR? Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed. Comments from Reviewable |
Because the test only sorts the 2 smallest identifiers while it should sort at least 1024 if we want the other part of Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed. Comments from Reviewable |
2a071f8
to
0270c33
Compare
With the updated test this is . Reviewed 1 of 1 files at r2, 1 of 1 files at r3. Comments from Reviewable |
0270c33
to
9b5a88e
Compare
New test improved coverage. This is ready. Reviewed 1 of 1 files at r3. Comments from Reviewable |
9b5a88e
to
51c7331
Compare
Thanks for the extra clarification folks! I get what this is trying to do now. Seems like a sensible change. Minor inline comments, but otherwise . Review status: 12 of 13 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed. cpp/ycm/Utils.h, line 191 at r4 (raw file):
"or more" -> "or larger" cpp/ycm/Utils.h, line 210 at r4 (raw file):
IMO the right shift is too clever. Just use Comments from Reviewable |
Add an option to limit the number of candidates to sort and filter. Partially sort the list of candidates according to this limit.
51c7331
to
1e00ddd
Compare
Reviewed 1 of 1 files at r4, 1 of 1 files at r5. cpp/ycm/Utils.h, line 191 at r4 (raw file): Previously, Valloric (Val Markovic) wrote…
Done. cpp/ycm/Utils.h, line 210 at r4 (raw file): Previously, Valloric (Val Markovic) wrote…
Done. Comments from Reviewable |
Thanks! @zzbot r=bstaletic Review status: all files reviewed at latest revision, all discussions resolved. Comments from Reviewable |
📌 Commit 1e00ddd has been approved by |
[READY] Implement partial sorting This PR adds an optional parameter to the `CandidatesForQueryAndType` and `FilterAndSortCandidates` functions that limits the number of candidates returned by these functions. The candidates are partially sorted according to this limit (i.e. only the `n` smallest candidates are sorted). The current behavior is unchanged: identifiers are still limited to `10` by default and there is no limit for other kind of candidates. The plan is to add a new setting (e.g. `g:ycm_max_num_candidates`) to limit the number of non-identifier candidates. `50` seems a reasonable choice. The `PartialSort` function implements the following strategy: - if the number of elements to sort is less than `1024` or represents less than `1/64` the total number of elements, use `std::partial_sort`; - otherwise, use `std::nth_element` then `std::sort` on the `n`th smallest elements. This heuristic was obtained by comparing the performance of these 3 algorithms: - [`std::sort`](http://en.cppreference.com/w/cpp/algorithm/sort); - [`std::partial_sort`](http://en.cppreference.com/w/cpp/algorithm/partial_sort); - [`std::nth_element`](http://en.cppreference.com/w/cpp/algorithm/nth_element) + `std::sort`. They were tested on randomly generated strings of 20 characters with different values for the total number of strings and the number of strings to sort. Here is the graph of an experiment with 2<sup>10</sup> random strings: ![partial-sorting-comparison](https://user-images.githubusercontent.com/10026824/29936611-2c05ceaa-8e83-11e7-8303-a08c39afcae0.png) The observation is that, when the number of elements to sort is small, `std::partial_sort` outperforms other algorithms while `std::nth_element` + `std::sort` always beats `std::sort` alone in other cases. This experiment can be reproduced by checking out [this branch](https://github.com/micbou/ycmd/tree/partial-sorting-bench) and running the script `plot_bench.py` (the `matplotlib` package must be installed to draw the graph): ``` ./plot_bench.py --bench bench.log ``` The number of elements can be modified by editing [this variable](https://github.com/micbou/ycmd/blob/41d0b5e1735fa917c071421831b48111a8cb6621/cpp/ycm/benchmarks/PartialSorting_bench.cpp#L205). Finally, [here](https://gist.github.com/micbou/424944dafd8150aa11dc6d8cfe49f624) are the benchmark numbers obtained with these changes on my config. Conclusions are: - no performance loss when all candidates are returned; - a ~36% speedup in sorting and filtering identifiers with `max_min_identifier_candidates` default value. - when limiting the number of non-identifier candidates to `50`, sorting and filtering non-identifier candidates is ~5% faster if candidates are not already stored in the repository and ~24% if they are. <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/valloric/ycmd/825) <!-- Reviewable:end -->
💔 Test failed - status-travis |
☀️ Test successful - status-travis |
[READY] Add max_num_candidates option Add the `max_num_candidates` option to limit the number of non-identifier candidates returned by ycmd. Its default value is set to 50 as this seems a good compromise between performance and the amount of candidates shown to the user (10 candidates like for identifiers would be too low). Limiting the number of candidates does not only improve performance when sorting them (see PR #825) but also when sending the response to the client as the response is smaller and when clients are populating the completion menu since there are less candidates. <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/valloric/ycmd/830) <!-- Reviewable:end -->
[READY] Add max_num_candidates option Add the `max_num_candidates` option to limit the number of non-identifier candidates returned by ycmd. Its default value is set to 50 as this seems a good compromise between performance and the amount of candidates shown to the user (10 candidates like for identifiers would be too low). Limiting the number of candidates does not only improve performance when sorting them (see PR #825) but also when sending the response to the client as the response is smaller and when clients are populating the completion menu since there are less candidates. <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/valloric/ycmd/830) <!-- Reviewable:end -->
[READY] Update ycmd This new version of ycmd includes the following changes: - PR ycm-core/ycmd#795: add option to make relative paths in flags from extra conf absolute; - PR ycm-core/ycmd#802: fix compilation on Haiku; - PR ycm-core/ycmd#804: add libclang detection on FreeBSD; - PR ycm-core/ycmd#808: write python used during build before installing completers; - PR ycm-core/ycmd#810: support unknown languages from tags; - PR ycm-core/ycmd#811: update Universal Ctags languages list; - PR ycm-core/ycmd#814: resolve symlinks in extra conf glob patterns; - PR ycm-core/ycmd#815: update JediHTTP; - PR ycm-core/ycmd#816: update Boost to 1.65.0; - PR ycm-core/ycmd#819: filter and sort candidates when query is empty; - PR ycm-core/ycmd#820: improve LLVM root path search for prebuilt binaries; - PR ycm-core/ycmd#822: inline critical utility functions; - PR ycm-core/ycmd#824: do not sort header paths in filename completer; - PR ycm-core/ycmd#825: implement partial sorting; - PR ycm-core/ycmd#830: add max_num_candidates option; - PR ycm-core/ycmd#831: fix multiline comments and strings issues; - PR ycm-core/ycmd#832: update Clang to 5.0.0. The `g:ycm_max_num_candidates` and `g:ycm_max_num_identifier_candidates` options are added to the documentation. The link to ycmd extra conf is updated. Fixes #2562. <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/valloric/youcompleteme/2768) <!-- Reviewable:end -->
[READY] Update ycmd This new version of ycmd includes the following changes: - PR ycm-core/ycmd#795: add option to make relative paths in flags from extra conf absolute; - PR ycm-core/ycmd#802: fix compilation on Haiku; - PR ycm-core/ycmd#804: add libclang detection on FreeBSD; - PR ycm-core/ycmd#808: write python used during build before installing completers; - PR ycm-core/ycmd#810: support unknown languages from tags; - PR ycm-core/ycmd#811: update Universal Ctags languages list; - PR ycm-core/ycmd#814: resolve symlinks in extra conf glob patterns; - PR ycm-core/ycmd#815: update JediHTTP; - PR ycm-core/ycmd#816: update Boost to 1.65.0; - PR ycm-core/ycmd#819: filter and sort candidates when query is empty; - PR ycm-core/ycmd#820: improve LLVM root path search for prebuilt binaries; - PR ycm-core/ycmd#822: inline critical utility functions; - PR ycm-core/ycmd#824: do not sort header paths in filename completer; - PR ycm-core/ycmd#825: implement partial sorting; - PR ycm-core/ycmd#830: add max_num_candidates option; - PR ycm-core/ycmd#831: fix multiline comments and strings issues; - PR ycm-core/ycmd#832: update Clang to 5.0.0. The `g:ycm_max_num_candidates` and `g:ycm_max_num_identifier_candidates` options are added to the documentation. The link to ycmd extra conf is updated. Fixes #2562. <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/valloric/youcompleteme/2768) <!-- Reviewable:end -->
This PR adds an optional parameter to the
CandidatesForQueryAndType
andFilterAndSortCandidates
functions that limits the number of candidates returned by these functions. The candidates are partially sorted according to this limit (i.e. only then
smallest candidates are sorted).The current behavior is unchanged: identifiers are still limited to
10
by default and there is no limit for other kind of candidates. The plan is to add a new setting (e.g.g:ycm_max_num_candidates
) to limit the number of non-identifier candidates.50
seems a reasonable choice.The
PartialSort
function implements the following strategy:1024
or represents less than1/64
the total number of elements, usestd::partial_sort
;std::nth_element
thenstd::sort
on then
th smallest elements.This heuristic was obtained by comparing the performance of these 3 algorithms:
std::sort
;std::partial_sort
;std::nth_element
+std::sort
.They were tested on randomly generated strings of 20 characters with different values for the total number of strings and the number of strings to sort. Here is the graph of an experiment with 210 random strings:
The observation is that, when the number of elements to sort is small,
std::partial_sort
outperforms other algorithms whilestd::nth_element
+std::sort
always beatsstd::sort
alone in other cases.This experiment can be reproduced by checking out this branch and running the script
plot_bench.py
(thematplotlib
package must be installed to draw the graph):The number of elements can be modified by editing this variable.
Finally, here are the benchmark numbers obtained with these changes on my config. Conclusions are:
max_min_identifier_candidates
default value.50
, sorting and filtering non-identifier candidates is ~5% faster if candidates are not already stored in the repository and ~24% if they are.This change is