-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add C scanner infrastructure. #253
Conversation
autoload :FindFileScanner, 'command-t/scanner/file_scanner/find_file_scanner' | ||
autoload :GitFileScanner, 'command-t/scanner/file_scanner/git_file_scanner' | ||
autoload :RubyFileScanner, 'command-t/scanner/file_scanner/ruby_file_scanner' | ||
autoload :WatchmanFileScanner, 'command-t/scanner/file_scanner/watchman_file_scanner' | ||
|
||
def self.for_string s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you want this to be called new
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine. We could use a more descriptive word than "string" though. Maybe for_scanner_type
or something like that.
] | ||
Dir.chdir @path do | ||
CommandT::Paths.from_string `#{command}`, terminator, | ||
drop: drop, # Drop leading ./ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO(kevincox) move this comment to the drop
method.
if @scan_dot_directories | ||
'' | ||
else | ||
'-not -path "*/.*/*" ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO(kevincox) make sure there are tests for this.
spec/command-t/matcher_spec.rb
Outdated
@@ -6,7 +6,7 @@ | |||
|
|||
describe CommandT::Matcher do | |||
def matcher(*paths) | |||
scanner = OpenStruct.new(:paths => paths) | |||
scanner = OpenStruct.new(:c_paths => CommandT::Paths.from_array(paths)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO(kevincox) revert this (c_paths
now does this conversion).
1cb57a4
to
a51f3ce
Compare
Ok I like the way this looks. It relies on virtual memory to prevent reallocations and copies but I think that it will work on all modern OSes. That will need to be tested. As for performance it is about 6x faster on my system. The most exciting part is that it is now "faster" then Let me know if you have any quelms and I will try and address them. |
d1e7f4c
to
cc6610f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just gave this a superficial look over. Looks promising!
I'll give this a deeper look and start testing it locally, but in the meantime I've added some minor nit-ish feedback.
ruby/command-t/ext/command-t/match.h
Outdated
// Struct for representing a collection of matches. | ||
typedef struct { | ||
int len; | ||
match_t matches[]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indent by 4 spaces.
|
||
thread_count = NIL_P(threads_option) ? 1 : NUM2LONG(threads_option); | ||
if (use_heap) { | ||
heap_matches = malloc(thread_count * limit * sizeof(match_t)); | ||
heap_matches = malloc(sizeof(matches_t) + (thread_count*limit+1) * sizeof(match_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: spaces around the *
and +
operators
|
||
long len = RARRAY_LEN(source); | ||
long bufsize = sizeof(paths_t) + len * sizeof(match_t); | ||
paths_t *paths = mmap(NULL, bufsize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It relies on virtual memory to prevent reallocations and copies but I think that it will work on all modern OSes. That will need to be tested.
If we want to pre-check for the existence of mmap
at build time we can add something to extconf.rb
(it's going to be something like have_func
or similar). But yes, we should have some fallback in place if it turns out that this isn't going to work on one of the major platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can spin up a windows VM today and try to compile vim and see if this works. If you could do the mac test it would be appreciated. This will work on the BSDs and Linux for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have been a bit ambitious. I'm having trouble compiling even master, I probably have a borked gem development setup but I can compile json
so I don't know what I'm doing wrong. Is command-t known to work on windows? If I feel brave I might give it another try tonight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has worked historically, but I personally haven't tested it on Windows for a long, long time (years). It's one of those things where I make a best effort to not break things, but ultimately we need a Windows-using maintainer to step forward if we want to really be on top of the Windows side, and in the absence of that then we just have to rely on user reports of stuff working (or not).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to get some CI set up. There are free CI services for the big three platforms. Building and running the specs would help a lot even if not running vim.
|
||
def filter | ||
if @wildignore | ||
lambda {|p| !path_excluded? p } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: space after the {
end | ||
|
||
def update | ||
lambda {|count| progress_reporter.update count } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: space after the {
# Same in submodules | ||
git submodule --quiet foreach --recursive ' | ||
git ls-files --exclude-standard -z | | ||
awk "-vp=$path" "BEGIN{RS=\\"\\\\0\\";ORS =\\"\\\\0\\"}\\$0=p \\"#{File::SEPARATOR}\\" \\$0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to add a comment somewhere near here saying what this awk
incantation does.
end | ||
|
||
def scanner_failed status, result | ||
return super if @nogit | ||
puts status.inspect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover debugging code?
autoload :FindFileScanner, 'command-t/scanner/file_scanner/find_file_scanner' | ||
autoload :GitFileScanner, 'command-t/scanner/file_scanner/git_file_scanner' | ||
autoload :RubyFileScanner, 'command-t/scanner/file_scanner/ruby_file_scanner' | ||
autoload :WatchmanFileScanner, 'command-t/scanner/file_scanner/watchman_file_scanner' | ||
|
||
def self.for_string s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine. We could use a more descriptive word than "string" though. Maybe for_scanner_type
or something like that.
// order matters; we want this to be evaluated only after ruby.h | ||
#ifdef HAVE_PTHREAD_H | ||
#include <pthread.h> /* for pthread_create, pthread_join etc */ | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this is copy-pasta? No use of pthread_*
, qsort
, strncmp
in this file. Haven't checked all the headers included here, but I bet many of these include
statements can be deleted (heap.h
, matcher.h
etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, forgot to clean that up. FIxed.
// Licensed under the terms of the BSD 2-clause license. | ||
|
||
#include <ruby.h> | ||
#include <assert.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused?
Thanks for the look over, I have fixed all your comments. |
493d853
to
3249756
Compare
FWIW, I'd love to see this feature added. I work in a very large code base (approx. 1GB of source) and launching command-t usually takes around 10 seconds to build the cache from |
Thanks @sstallion, I kept forgetting to ping this. I would be very interested if you can time the next branch as well as this patch (also let me know if it works for you :) ) as well as just whatever scanner you use piped into /dev/null. If you aren't using the next branch I would expect most of the improvement to be from switching to it. But the should also provide some improvement over that. |
Sure thing. What's the best way to test? Normally I just interact with On Wed, Nov 16, 2016 at 3:45 PM, Kevin Cox notifications@github.com wrote:
|
Just running from vim would be useful. If you want there is also a script
to benchmark the scanners in the repo, the options are set by environment
variables which you can see in the source (I should probably open a ticket
to add some docs). This has the advantage of running a couple of times
automatically and not counting the matching.
As for timing to /dev/null I just meant running the commands in a shell.
(for example find or git ls-files)
|
Just an FYI, I haven't forgotten this, but it is a fairly invasive change so wanted it to "bake" a while, with at least @kevincox running it locally, and hopefully some others. Still interested in getting this into |
Totally reasonable to let this settle for a bit. For an update on my usage I'm using it at work every day on a modified Ubuntu 14.04 and at home most days on Arch Linux and haven't noticed any problems. @sstallion If you could try running it as your daily driver the testing would be much appreciated :) |
Sounds like a plan. I've been meaning to move over to next and retire my On Thu, Nov 17, 2016 at 4:45 AM, Kevin Cox notifications@github.com wrote:
|
All looks well so far, but I have noticed a compiler warning:
|
In general, I'm seeing a 2-3x increase in scanning speed over next. Great work! I am however seeing some warning messages in Vim that I need to dig into. Would you be opposed to having me submit a PR to your branch to fix these issues? |
Almost forgot:
There's still a bit of overhead. Displaying the file list takes around 2 seconds with this PR. It's much snappier and having the spinner is a nice touch - it beats staring at my terminal for 5-10 seconds at a time waiting for a refresh 👍 |
@sstallion Thanks for catching that warning. I might have some work to do on a 32bit system since mmap takes a size_t so it does seem like the right type to use. I'll take a look at it or if you can submit a PR that would be great. It does sound like a real error. Awesome, that speed improvement does look about right. Also consider that some of the overhead is from the first match (aka filter) (which is somewhat slower the subsequent matches since it does some precomputation which makes the next ones faster). Although as I say that it might be an interesting idea to move the mask generation into the scanner because then it can be done in parallel with the scanner executable. Since we are generally now speed limited by the external process this could completely hide that latency. I'll probably look into that in a future PR. Also what OS are you running? |
Great! This was tested on OS X El Capitan. I'll start digging in - so far I've only seen a couple of trivial problems. |
This is strange. While looking into some odd behavior, I swapped back to the next branch and I'm now seeing this branch outperform this PR. What was taking 10 seconds before is now less than one. If I switch back to this PR, it goes up to about 2 seconds. Nothing in my environment has changed, which leads me to think that my initial test may not have been completely correct. |
Hmm, that's interesting. Were your original tests with a hot cache? And what about the background CPU and RAM untilization? Also are you still using the same scanner? The Ruby scanner will be much slower then this version but shouldn't have a 10x difference between here and next. Git also does some clever things with the modified timestamps of files to speed up If you can run Another think is to make sure you have recompiled the library when switching branches. I'm very interested in why your performance just jumped for the next branch "without changing anything" |
This scanner allows low overhead reading of files. It can currently read paths from an array of strings (good for ruby-based scanners) and a terminated string (great for command executor scanners). This also converts the find scanner and the git scanner to use the string based infrastructure.
Takes the idea from this 2016 PR that never got merged: #253 of creating a large `mmap()`-backed allocation and having the scanner read into it from an external process. This is by no means finished yet, but I wanted to get it to the point where it runs, and it does run. Key changes to the scanner: - New `scanner_new_command()` function that takes a script command string and runs it, capturing the results. The output is stored in an `mmap()`-ed buffer, which is retained for the lifetime of the scanner. Associated `str_t` candidates also live in an `mmap()`-ed region. Their `contents` pointers point directly into the buffer, so there is no additional string copying. - The existing `scanner_new_copy()` is updated to use an `mmap()`-backed allocation as well, just for simplicity (so I don't have to maintain book-keeping data-structures for `free()`-ing `malloc()`-ed memory blocks and `munmap()`-ing `mmap()`-ed ones). I will include benchmarks here to "prove" that this doesn't seem to have had any unanticipated performance effects. - Deleted `scanner_new()` because it is was unused. As the only user of the `capacity` feature, I got rid of the `capacity` field. Because scanners use `mmap()` now, they can act as though they didn't have a capacity limit (technically, they do, but it is absurdly large and I can't see it being an issue in real workloads). - Switched from `xmalloc()` to `xcalloc()` in a couple of places to ensure the `scanner_t` struct fields all get initialized to zero. This is important so that we don't end up trying to `munmap()` uninitialized fields like `buffer` when using the `scanner_new_copy()` form of scanner. - The Watchman scanner continues to use `scanner_new_str()`, which is the non-copying version of the scanner. We don't want to call `munmap()` on those candidates, because they were allocated with `malloc(); I considered flagging this by setting `candidates_size` to zero, but given that there was no performance penalty detected by the benchmark run (see below), I opted to instead teach the Watchman scanner to use `xmap()` too. Perf delta info follows. Nothing to see here. The benchmarks use `scanner_new_copy()`, so we are indeed exercising the `mmap()`-ed code path: best avg sd +/- p (best) (avg) (sd) +/- p pathological 0.28685 0.29972 0.11067 [-0.6%] (0.28685) (0.29980) (0.11089) [-1.4%] command-t 0.23675 0.24726 0.08370 [-0.1%] (0.23675) (0.24759) (0.08559) [-1.2%] chromium (subset) 1.63047 1.64562 0.10958 [+0.7%] 0.05 (0.42029) (0.43740) (0.19917) [+2.4%] chromium (whole) 1.88771 1.90492 0.09161 [+0.1%] (0.30267) (0.31403) (0.07257) [+1.6%] big (400k) 2.91391 2.95510 0.14608 [+0.2%] (0.45750) (0.47420) (0.07242) [+0.8%] total 6.96067 7.05262 0.29309 [+0.2%] (1.70912) (1.77301) (0.28325) [+0.7%] Differences from the PR: - That one leans heavily on Ruby-level stuff (ie. both in the calling code, and in the C layer) whereas this one is "pure" C. For example, the `popen` call in the PR is made from Ruby, and lots of Ruby values are passed back and forth. There are calls back into the Ruby layer to do filtering that are not implemented here (and I rather like the idea of keep things in "pure" C, so they never be). - There are lots of features in the PR that aren't implemented yet (and again, I haven't decided when/if I will implement them): things like control over the max files count, ability to pass in a `drop` value to skip over entries, ability to specify the field separator character etc.
To reflect indirect contribution referenced in the parent commit. Cross-referencing again: #253
Just to "close" the loop here after many years 🤦, over on the
The basic idea is to reset the whole thing onto some more modern underpinnings. Compatibility with Windows or Vim, for example, aren't on the priority list at this stage — just getting the thing to work for simple cases is the focus (without necessarily even many configuration options; I just want to pick some "reasonable defaults" and see them work first). My C is super rusty at this point (although it is becoming less so as I proceed with the rewrite), and while it is far from finished, it is showing a nice performance boost already. I started by porting the core matching/scoring code, then I ported the Watchman scanner. The numbers showed a roughly 2x speed-up on some of the metrics, even though the performance critical sections were still only going from C to C. Last night, I took the initial steps towards implementing the "C scanner infrastructure" proposed in this PR; I haven't tested it rigorously yet, and it doesn't have all the features or configuration options, but it works well enough to be considered a "blinking light" demo at least. This PR obviously won't apply to Here are some matcher numbers (I haven't set up scanner benchmarks yet, but I will). As always, CPU time is rendered without parens, wall-clock time (the actually interesting time) is rendered with parens. So, for example, we see average wall clock time for the "big" dataset go from 0.66s in Ruby to 0.47s in Lua. A smaller dataset like the "command-t" one goes from 0.48s in Ruby to 0.24s in Lua. The "pathological" dataset goes from 0.54s to 0.29s etc. Ruby (
|
That is exciting 🎉 As for this patch I no longer work with a large repo so have stopped using it myself. So I'll close it, of course the code will be archived in case it is interesting to anyone. If you want a second pair of eyes on the new patch feel free to ask for a review. But I'm afraid my C is growing rusty as well these days. |
Takes the idea from this 2016 PR that never got merged: wincent/command-t#253 of creating a large `mmap()`-backed allocation and having the scanner read into it from an external process. This is by no means finished yet, but I wanted to get it to the point where it runs, and it does run. Key changes to the scanner: - New `scanner_new_command()` function that takes a script command string and runs it, capturing the results. The output is stored in an `mmap()`-ed buffer, which is retained for the lifetime of the scanner. Associated `str_t` candidates also live in an `mmap()`-ed region. Their `contents` pointers point directly into the buffer, so there is no additional string copying. - The existing `scanner_new_copy()` is updated to use an `mmap()`-backed allocation as well, just for simplicity (so I don't have to maintain book-keeping data-structures for `free()`-ing `malloc()`-ed memory blocks and `munmap()`-ing `mmap()`-ed ones). I will include benchmarks here to "prove" that this doesn't seem to have had any unanticipated performance effects. - Deleted `scanner_new()` because it is was unused. As the only user of the `capacity` feature, I got rid of the `capacity` field. Because scanners use `mmap()` now, they can act as though they didn't have a capacity limit (technically, they do, but it is absurdly large and I can't see it being an issue in real workloads). - Switched from `xmalloc()` to `xcalloc()` in a couple of places to ensure the `scanner_t` struct fields all get initialized to zero. This is important so that we don't end up trying to `munmap()` uninitialized fields like `buffer` when using the `scanner_new_copy()` form of scanner. - The Watchman scanner continues to use `scanner_new_str()`, which is the non-copying version of the scanner. We don't want to call `munmap()` on those candidates, because they were allocated with `malloc(); I considered flagging this by setting `candidates_size` to zero, but given that there was no performance penalty detected by the benchmark run (see below), I opted to instead teach the Watchman scanner to use `xmap()` too. Perf delta info follows. Nothing to see here. The benchmarks use `scanner_new_copy()`, so we are indeed exercising the `mmap()`-ed code path: best avg sd +/- p (best) (avg) (sd) +/- p pathological 0.28685 0.29972 0.11067 [-0.6%] (0.28685) (0.29980) (0.11089) [-1.4%] command-t 0.23675 0.24726 0.08370 [-0.1%] (0.23675) (0.24759) (0.08559) [-1.2%] chromium (subset) 1.63047 1.64562 0.10958 [+0.7%] 0.05 (0.42029) (0.43740) (0.19917) [+2.4%] chromium (whole) 1.88771 1.90492 0.09161 [+0.1%] (0.30267) (0.31403) (0.07257) [+1.6%] big (400k) 2.91391 2.95510 0.14608 [+0.2%] (0.45750) (0.47420) (0.07242) [+0.8%] total 6.96067 7.05262 0.29309 [+0.2%] (1.70912) (1.77301) (0.28325) [+0.7%] Differences from the PR: - That one leans heavily on Ruby-level stuff (ie. both in the calling code, and in the C layer) whereas this one is "pure" C. For example, the `popen` call in the PR is made from Ruby, and lots of Ruby values are passed back and forth. There are calls back into the Ruby layer to do filtering that are not implemented here (and I rather like the idea of keep things in "pure" C, so they never be). - There are lots of features in the PR that aren't implemented yet (and again, I haven't decided when/if I will implement them): things like control over the max files count, ability to pass in a `drop` value to skip over entries, ability to specify the field separator character etc.
To reflect indirect contribution referenced in the parent commit. Cross-referencing again: wincent/command-t#253
I'm probably going to convert the
string -> Paths
constructor to an FD based one. The string input is quite fast but it doesn't allow progress information or file limiting. File limiting could be done by the scanner withhead
but progress information would be difficult to do. It would also be faster as it can be done in parallel with the scanner itself.If you wouldn't mind giving a general overview it would be appreciated. I'm running this right now and it appears to be working. I have tested the
ruby
,find
andgit
scanners.Highlights:
TODO:
find
(if it helps, likely not)Fixes #249