
[READY] Add benchmark infrastructure #769

Merged: 1 commit merged into ycm-core:master on Jun 5, 2017

Conversation

micbou (Collaborator) commented Jun 2, 2017

This PR sets up the infrastructure for adding benchmarks through the Google benchmark library and for automatically running them on Travis and AppVeyor. They can also be run locally with the benchmark.py script. The library is included in the repository to make compilation easier. Benchmarks are run on all platforms because optimizations may be platform-dependent.

For now, there is only one benchmark, based on the output of Program 2 in ycm-core/YouCompleteMe#2668. It measures the filter-and-sort algorithm in a worst-case scenario: all identifiers share a common prefix and the query is part of that prefix. In that case, no identifiers are filtered out and, since they all have the same weight, the algorithm falls back to lexicographic sorting. This scenario is not uncommon in practice. For instance, C libraries often use a common prefix when naming variables and functions to simulate namespaces.
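For context, here is a minimal, self-contained sketch of what such a worst-case benchmark can look like with the Google benchmark library. It is illustrative only, not the benchmark added by this PR: the generated identifiers and plain std::sort stand in for ycmd's real filter-and-sort code, and the benchmark name and ranges are assumptions borrowed from the output below.

```cpp
// Illustrative sketch only; not the benchmark added by this PR.
#include <benchmark/benchmark.h>

#include <algorithm>
#include <string>
#include <vector>

static void CandidatesWithCommonPrefix_bench( benchmark::State& state ) {
  // All candidates share the prefix "a_a_a_", so a query contained in the
  // prefix filters nothing and sorting degenerates to lexicographic order.
  std::vector< std::string > candidates;
  for ( int i = 0; i < state.range( 0 ); ++i )
    candidates.push_back( "a_a_a_" + std::to_string( i ) );

  // Only the body of this loop is measured.
  while ( state.KeepRunning() ) {
    std::vector< std::string > sorted( candidates );
    std::sort( sorted.begin(), sorted.end() );
    benchmark::DoNotOptimize( sorted.data() );
  }

  // Tell the library what N is so it can fit the complexity curve reported
  // as the _BigO and _RMS lines in the output below.
  state.SetComplexityN( state.range( 0 ) );
}

BENCHMARK( CandidatesWithCommonPrefix_bench )
  ->RangeMultiplier( 2 )
  ->Range( 1, 1 << 16 )
  ->Complexity( benchmark::oNLogN );

BENCHMARK_MAIN();
```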

Here's the output of the benchmark on my configuration:

```
------------------------------------------------------------------------------------------
Benchmark                                       Time           CPU Iterations
------------------------------------------------------------------------------------------
CandidatesWithCommonPrefix_bench/1           1955 ns       1898 ns     345165
CandidatesWithCommonPrefix_bench/2          11563 ns      11681 ns      64102
CandidatesWithCommonPrefix_bench/4          30761 ns      30594 ns      22436
CandidatesWithCommonPrefix_bench/8          69551 ns      69532 ns      11218
CandidatesWithCommonPrefix_bench/16        143963 ns     143924 ns       4986
CandidatesWithCommonPrefix_bench/32        292668 ns     290603 ns       2362
CandidatesWithCommonPrefix_bench/64        862766 ns     869571 ns        897
CandidatesWithCommonPrefix_bench/128      2205099 ns    2191318 ns        299
CandidatesWithCommonPrefix_bench/256      8895499 ns    8840057 ns         90
CandidatesWithCommonPrefix_bench/512     17704787 ns   17680113 ns         45
CandidatesWithCommonPrefix_bench/1024    45564517 ns   45760293 ns         15
CandidatesWithCommonPrefix_bench/2048    96960893 ns   98057771 ns          7
CandidatesWithCommonPrefix_bench/4096   217881085 ns  218401400 ns          3
CandidatesWithCommonPrefix_bench/8192   481444392 ns  483603100 ns          2
CandidatesWithCommonPrefix_bench/16384 1005462405 ns  982806300 ns          1
CandidatesWithCommonPrefix_bench/32768 1805209871 ns 1809611600 ns          1
CandidatesWithCommonPrefix_bench/65536 4215533125 ns 4212027000 ns          1
CandidatesWithCommonPrefix_bench_BigO     3979.06 NlgN    3974.50 NlgN
CandidatesWithCommonPrefix_bench_RMS           10 %          9 %
```

As you can see, performance becomes unacceptable starting from around 16,000 identifiers, which is not a lot. A great feature of Google benchmark is that it can estimate the algorithmic complexity. As expected, we have O(n log n) complexity, where n is the number of candidates (we are using std::sort to sort our candidates).

Thanks to this benchmark, I was able to improve the performance on this particular case by a factor of 60. I'll send the changes once this PR is merged.



codecov-io commented Jun 2, 2017

Codecov Report

Merging #769 into master will increase coverage by <.01%.
The diff coverage is 100%.

```diff
@@            Coverage Diff             @@
##           master     #769      +/-   ##
==========================================
+ Coverage   94.52%   94.52%   +<.01%     
==========================================
  Files          79       79              
  Lines        5276     5278       +2     
  Branches      158      158              
==========================================
+ Hits         4987     4989       +2     
  Misses        246      246              
  Partials       43       43
```

bstaletic (Collaborator) commented:

I tested this last night and confirmed that it works as expected. I did leave a minor comment, but this is :lgtm: anyway.


Reviewed 9 of 59 files at r1.
Review status: 9 of 59 files reviewed at latest revision, 1 unresolved discussion.


cpp/ycm/CMakeLists.txt, line 231 at r1 (raw file):

                             benchmarks/*.h benchmarks/*.cpp
                             CMakeFiles/*.cpp
                             *client* )

*client* ) could be joined with the line above. Or every glob in this file() statement can be on a different line.

No strong opinion, feel free to disregard this comment.



Valloric (Member) commented Jun 2, 2017

I've used Google's C++ benchmarking library extensively when I worked there; the API seems to have changed a bit, but it's largely the same thing. The library was great back then and I don't expect that to have changed.

@micbou Thanks for setting this up! I never really benchmarked the filter/sort part; it was plenty fast for all the use-cases I could come up with years ago. But if others have found cases where it's slow and we could make it faster, great!

The complexity of the algo being O(NlogN) is indeed expected. Back in 2011 when I was still thinking about starting YCM, I remember coming up with a proof that it's not possible to perform subsequence matching on a list of strings (the strings being known ahead of time, but not which subset of the strings or their order) faster than O(NlogN). That's just Big Oh, so plenty of constant factors to optimize.

:lgtm:


Review status: 9 of 59 files reviewed at latest revision, 2 unresolved discussions.


cpp/ycm/benchmarks/IdentifierCompleter_bench.cpp, line 28 at r1 (raw file):

    benchmark::State& state ) {

  // Generate a list of candidates of the form a_a_a_[a-z]{5}.

You might want to put state.PauseTiming(); before this block and state.ResumeTiming(); right after it. Otherwise the setup time will be included in your benchmark run. It will get spread over all the iterations, but still.
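For reference, here is a hedged, self-contained sketch of that pattern in general (an illustrative example, not this PR's code). In the library, PauseTiming()/ResumeTiming() are called from inside the KeepRunning() loop to exclude per-iteration setup from the measured time, which is relevant to the question below about where measurement actually happens.

```cpp
// Hypothetical example of the PauseTiming()/ResumeTiming() pattern:
// per-iteration setup is excluded from the measured time.
#include <benchmark/benchmark.h>

#include <algorithm>
#include <numeric>
#include <vector>

static void SortWithUntimedSetup_bench( benchmark::State& state ) {
  while ( state.KeepRunning() ) {
    state.PauseTiming();
    // Untimed setup: build a reverse-sorted vector for this iteration.
    std::vector< int > data( static_cast< size_t >( state.range( 0 ) ) );
    std::iota( data.rbegin(), data.rend(), 0 );
    state.ResumeTiming();

    // Timed work.
    std::sort( data.begin(), data.end() );
    benchmark::DoNotOptimize( data.data() );
  }
}

BENCHMARK( SortWithUntimedSetup_bench )->Range( 1 << 10, 1 << 16 );

BENCHMARK_MAIN();
```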



micbou (Collaborator, Author) commented Jun 3, 2017

I added a method to CandidateRepository for clearing the candidates. This is needed to isolate the benchmarks (and the CandidateRepository tests). If we don't clear the candidates, measurements are influenced by previous runs.
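To illustrate the isolation point, here is a hedged sketch using a hypothetical process-wide singleton; Repository, Add() and Clear() below are placeholders for illustration, not ycmd's CandidateRepository API.

```cpp
// Hypothetical sketch of benchmark isolation with a process-wide singleton.
// Repository, Add() and Clear() are placeholders, not ycmd's real API.
#include <benchmark/benchmark.h>

#include <set>
#include <string>

struct Repository {
  static Repository& Instance() {
    static Repository repo;
    return repo;
  }
  void Add( const std::string& candidate ) { candidates_.insert( candidate ); }
  void Clear() { candidates_.clear(); }

 private:
  std::set< std::string > candidates_;
};

static void RepositoryInsert_bench( benchmark::State& state ) {
  while ( state.KeepRunning() ) {
    // Clearing first isolates each run: without it, candidates left behind
    // by previous runs (or other benchmarks) would skew the measurement.
    Repository::Instance().Clear();
    for ( int i = 0; i < state.range( 0 ); ++i )
      Repository::Instance().Add( "a_a_a_" + std::to_string( i ) );
  }
}

BENCHMARK( RepositoryInsert_bench )->Range( 1, 1 << 10 );

BENCHMARK_MAIN();
```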

Also, I changed the benchmark to use a query containing both a lowercase and an uppercase letter. This makes the benchmark more relevant: if we want to improve performance on it, we need a solution that deals with both cases, not just lowercase.


Reviewed 59 of 59 files at r1, 1 of 1 files at r2, 3 of 3 files at r3, 1 of 1 files at r4, 3 of 3 files at r5.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


cpp/ycm/CMakeLists.txt, line 231 at r1 (raw file):

Previously, bstaletic wrote…

*client* ) could be joined with the line above. Or every glob in this file() statement can be on a different line.

No strong opinion, feel free to disregard this comment.

Done.


cpp/ycm/benchmarks/IdentifierCompleter_bench.cpp, line 28 at r1 (raw file):

Previously, Valloric (Val Markovic) wrote…

You might want to put state.PauseTiming(); before this block and state.ResumeTiming(); right after it. Otherwise the setup time will be included in your benchmark run. It will get spread over all the iterations, but still.

Isn't the measurement only occurring in the KeepRunning loop?


cpp/ycm/benchmarks/IdentifierCompleter_bench.cpp, line 43 at r4 (raw file):

  }

  IdentifierCompleter completer( candidates );

Moved this out of the main loop because we don't want to measure the addition of candidates to the repository.



vheon (Contributor) commented Jun 4, 2017

I know it is easier for us, but are we sure we want to keep the benchmark library in the source tree? I'm only saying this because people already complain about the size of the project when they have to download it, and this only increases the burden 😕 I understand that it would be more fragile, but we could clone the library "on demand" on the first invocation of the benchmark.py script if it is not already there.


Reviewed 6 of 59 files at r1, 1 of 1 files at r2, 1 of 3 files at r3, 1 of 1 files at r4, 1 of 3 files at r5.
Review status: all files reviewed at latest revision, 2 unresolved discussions.



micbou (Collaborator, Author) commented Jun 4, 2017

The benchmark folder only takes 336 KB. That's negligible compared to the ~370 MB that ycmd takes after installation. In addition, like the gmock folder, we will never update it, so it will only add to the Git history once.


Review status: all files reviewed at latest revision, 2 unresolved discussions.



Valloric (Member) commented Jun 4, 2017

@vheon Complaints about YCM on-disk size have always been silly to me. It's a few hundred MB, who cares. Disks are both huge and cheap.


Review status: all files reviewed at latest revision, 2 unresolved discussions.


cpp/ycm/benchmarks/IdentifierCompleter_bench.cpp, line 28 at r1 (raw file):

Previously, micbou wrote…

Isn't the measurement only occurring in the KeepRunning loop?

Why would it be? The API has changed since the last time I used it, but unless the first call to KeepRunning mutates some internal state so that measurement starts (quite possible), the whole benchmark time will be measured from the start of the function.

That's the way it worked before at least.



bstaletic (Collaborator) commented:

I measured YCM/ycmd disk usage today. It's a bit over 200MB. You decide if that is big.


Reviewed 1 of 1 files at r2, 2 of 3 files at r3, 1 of 1 files at r4.
Review status: all files reviewed at latest revision, 2 unresolved discussions.



Valloric (Member) commented Jun 4, 2017

@bstaletic It's not something we should worry about at all IMO. If it's less than 1GB, I don't care.


Review status: all files reviewed at latest revision, 2 unresolved discussions.



puremourning (Member) commented:

Reviewed 1 of 59 files at r1.
Review status: all files reviewed at latest revision, 2 unresolved discussions.



puremourning (Member) commented:

:lgtm:


Reviewed 56 of 59 files at r1, 1 of 1 files at r2, 1 of 3 files at r3, 1 of 1 files at r4, 3 of 3 files at r5.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


benchmark.py, line 25 at r5 (raw file):

  parser.add_argument( '--msvc', type = int, choices = [ 12, 14, 15 ],
                       help = 'Choose the Microsoft Visual '
                       'Studio version. (default: 15).' )

Maybe I'm being thick, but what ensures the default is 15? It might be implicit, but I think I would prefer default=15 if that's a thing.


cpp/ycm/benchmarks/IdentifierCompleter_bench.cpp, line 28 at r1 (raw file):

Previously, Valloric (Val Markovic) wrote…

Why would it be? The API has changed since the last time I used it, but unless the first call to KeepRunning mutates some internal state so that measurement starts (quite possible), the whole benchmark time will be measured from the start of the function.

That's the way it worked before at least.

As I recall from using it before, the first call to KeepRunning starts the timing for an iteration, and subsequent calls end/restart. Seems to be backed up here: https://github.com/google/benchmark/blob/93bfabc8b89270c00816c3cafb601475eaf85364/include/benchmark/benchmark_api.h#L334



micbou (Collaborator, Author) commented Jun 4, 2017

Reviewed 2 of 3 files at r6, 1 of 3 files at r8.
Review status: 62 of 64 files reviewed at latest revision, 3 unresolved discussions.


benchmark.py, line 25 at r5 (raw file):

Previously, puremourning (Ben Jackson) wrote…

Maybe I'm being thick, but what ensures the default is 15? It might be implicit, but I think I would prefer default=15 if that's a thing.

Copy-paste from the run_tests.py script. I've changed it in both scripts.



vheon (Contributor) commented Jun 4, 2017

:lgtm:


Reviewed 1 of 59 files at r1, 3 of 3 files at r8.
Review status: all files reviewed at latest revision, 3 unresolved discussions.



micbou (Collaborator, Author) commented Jun 4, 2017

I've squashed the commits. PR is ready.


Reviewed 1 of 3 files at r8.
Review status: all files reviewed at latest revision, 2 unresolved discussions.



micbou changed the title from [RFC] Add benchmark infrastructure to [READY] Add benchmark infrastructure on Jun 4, 2017
bstaletic (Collaborator) commented:

@zzbot r+


Reviewed 1 of 3 files at r5, 2 of 3 files at r6, 2 of 3 files at r8.
Review status: all files reviewed at latest revision, 2 unresolved discussions.



zzbot (Contributor) commented Jun 5, 2017

📌 Commit dff0884 has been approved by bstaletic

zzbot (Contributor) commented Jun 5, 2017

⌛ Testing commit dff0884 with merge 64ddce4...

zzbot added a commit that referenced this pull request Jun 5, 2017
[READY] Add benchmark infrastructure

zzbot (Contributor) commented Jun 5, 2017

☀️ Test successful - status-travis
Approved by: bstaletic
Pushing 64ddce4 to master...

zzbot merged commit dff0884 into ycm-core:master on Jun 5, 2017
micbou deleted the google-benchmark branch on June 14, 2017 at 08:49