Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An example of dynamic results provider, using stdout #353

Conversation

ImmemorConsultrixContrarie
Copy link
Contributor

Something like this could be used for #302. I can't test the code, because I'm too lazy to run it without vim, and I can't run it with vim. But I tried to keep code simple enough.

I don't think that this PR could be simply merged, because I just pulled it together to the point where cargo check stopped producing errors. And thus I'm making this PR a draft.

Dynamic filtering function moved into `vim-clap/src/cmd/filter.rs`.
@ImmemorConsultrixContrarie
Copy link
Contributor Author

I still need to know what you want to do with RPC on higher level, so can't connect that function with your rpc module, and still can't test it, so I don't know if it works at all.

@liuchengxu
Copy link
Owner

liuchengxu commented Mar 21, 2020

@ImmemorConsultrixContrarie Can we make another PR for these improvements that not related the dynamic update part, that would make the review easier.

Here is the dynamic update in my mind: given the query and list of candidates, do the filtering and ranking, this could take a while, so we need to update the top N scored candidates and print the top N to stdout periodically, e.g., print new top N every 50ms if they changes. For the test purpose at the moment, I believe you could write a test to mock this process? I can take the rest of work if this core part is done.

Or we could use a channel to notify the top N update? It's just an idea, I don't know if it's really suitable for our case.

…lts.

Added an interactive test for dynamic filter.

Moved some tests into `mod tests` behind a [cfg(test)].
@ImmemorConsultrixContrarie
Copy link
Contributor Author

Can we make another PR for these improvements that not related the dynamic update part, that would make the review easier.

Welp, theoretically, yes. But I'm too lazy to do it, so no. You can git rebase it as you like, though.

…g and testing easier.

Changed some tests in new cargo-workspace to run on Windows, and some tests to run at all.
@ImmemorConsultrixContrarie
Copy link
Contributor Author

ImmemorConsultrixContrarie commented Mar 23, 2020

Okay, the last commit breaks something, as I can't compile maple now with "error: couldn't read maple\build.rs".
@liuchengxu , does it compile for you?

Update: okay, that's 'cuz I forgot to move "vim-clap/build.rs", after that everything's fine. Will write a fixup right now.

@liuchengxu
Copy link
Owner

@ImmemorConsultrixContrarie mv build.rs crates/maple should fix that problem. But I don't do recommend removing all the maple code to crates/maple as that also breasks the vimscript code. I prefer to make crates/maple a lib, keep the current src/main.rs to call the run() funnction from crates/maple, that will not break any thing.

@ImmemorConsultrixContrarie
Copy link
Contributor Author

@ImmemorConsultrixContrarie mv build.rs crates/maple should fix that problem. But I don't do recommend removing all the maple code to crates/maple as that also breasks the vimscript code. I prefer to make crates/maple a lib, keep the current src/main.rs to call the run() funnction from crates/maple, that will not break any thing.

So, three files: "vim-clap/build.rs" with "vim-clap/src/main.rs" and "vim-clap/Cargo.toml". Should it look like this?

@liuchengxu
Copy link
Owner

liuchengxu commented Mar 23, 2020

So, three files: "vim-clap/build.rs" with "vim-clap/src/main.rs" and "vim-clap/Cargo.toml". Should it look like this?

Yes! And we could name the lib crates/maple to crates/maple_cli or other something. Then nothing will be broken I think.

Now there's "crates/maple_cli/lib.rs" with most code living there.
@ImmemorConsultrixContrarie ImmemorConsultrixContrarie marked this pull request as ready for review March 23, 2020 20:00
@ImmemorConsultrixContrarie
Copy link
Contributor Author

Resolved.
@liuchengxu

@liuchengxu
Copy link
Owner

I have cherry-picked the last several commits unrelated to the dynamic update part into the master, you can merge the latest master again to resolve the confilicts, and then I think we could focus on the core part of this PR. @ImmemorConsultrixContrarie

@ImmemorConsultrixContrarie
Copy link
Contributor Author

cc @liuchengxu

let mut past = std::time::Instant::now();
iter.for_each(|(text, score, indices)| {
// Best results are stored in front.
//XXX I can't say, if bigger score is better or not. Let's assume the bigger the better.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can assure you that bigger score is better in our secerior.

@liuchengxu
Copy link
Owner

Hmm, I have looked at the doc a bit, but I have not found a clear way to use it in clap as it looks like a different workflow, the old way is to feed the output of fd/rg command to the filter.

The dyn_fuzzy_filter_and_rank works great, I'll firstly intergrate this feature into clap.

@ImmemorConsultrixContrarie
Copy link
Contributor Author

ImmemorConsultrixContrarie commented Mar 30, 2020

the old way is to feed the output of fd/rg command to the filter

And what is the info you give fd/rg? Directory to search in with optional ignore rules, or nothing at all and it gets directory with std::env:current_dir()?

@liuchengxu

@liuchengxu
Copy link
Owner

what is the info you give fd/rg?

I merely pass the command string, e.g, fd --type f, to build the std::process::Command, optionally specify the command working directory by calling command.current_dir(), if no explicit working directory specified, use std::env::current_dir() then.

I realize that with the new dynamic filter, if we can build an iterator of output from the command string, then we might almost reach the final goal, possibly like this example in rust-subprocess https://github.com/hniksic/rust-subprocess/blob/master/examples/sample.rs.

Current flow without dyn filter support:

  1. Build std::process::Command C from string, e.g., fd --type f.
  2. Wait for the complete output of C.
  3. Feed the output of C into our dyn filter.

New flow with dyn filter support:

  1. Build an iterator from the string, e.g., fd --type f.
  2. Feed the iterator above into our dyn filter.

@ImmemorConsultrixContrarie
Copy link
Contributor Author

what is the info you give fd/rg?

I merely pass the command string, e.g, fd --type f, to build the std::process::Command, optionally specify the command working directory by calling command.current_dir(), if no explicit working directory specified, use std::env::current_dir() then.

I realize that with the new dynamic filter, if we can build an iterator of output from the command string, then we might almost reach the final goal, possibly like this example in rust-subprocess https://github.com/hniksic/rust-subprocess/blob/master/examples/sample.rs.

Current flow without dyn filter support:

1. Build `std::process::Command` C from string, e.g., `fd --type f`.

2. Wait for the complete output of C.

3. Feed the output of C into our dyn filter.

New flow with dyn filter support:

1. Build an iterator from the string, e.g., `fd --type f`.

2. Feed the iterator above into our dyn filter.

Just all lines from all files? Really? @liuchengxu

@liuchengxu
Copy link
Owner

liuchengxu commented Mar 30, 2020

@ImmemorConsultrixContrarie Hmm, I didn't make myself more clear, not all the lines of all files. For grep, I simply dispatch the rg command given the query and returns the raw output of rg, so it won't be all the lines of all files. But for fd to list the files, I do save all the lines it produces, it's accpetable since that normally has at most a few millions lines.

I believe fzf and skim both use the stream for reading data source and then do the filtering on them, which is missing in clap sadly at present :(.

Using pipe like (cd / && fd --type f) | maple --number 10 filter foo also benefits from dyn filter, I'll try this using std::process::Command later.

Update: I have tried rust-subprocess in #364 , it fits clap well, see the new Source::Exec.

use std::time::{Duration, SystemTime};

#[test]
fn basic_functionality_test() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I reworked that thing, now it's a bit different, and should work, because this basic test works.
I tested dynamic results on Rust src, and it works too.
Now I just need you to test the usability and speed of this new thing.

A bit or renaming will be done later, and there's no docs right now, but it works and could be tested.
Look in the test to get the idea of what sort_and_print function should look like and what to pass into (poorly named) very_simple function.

@liuchengxu

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit that I have not understand this new lib, which part it can replace and how it can be used in current flow. I'll spend more time after I merge your dyn filter feature into clap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit that I have not understand this new lib, which part it can replace and how it can be used in current flow. I'll spend more time after I merge your dyn filter feature into clap.

Welp, everything, I guess. It uses the same recursive directory searcher as fd/rg, and when it finds a file, it reads the file, then iterates over lines of that file and applies matcher and scorer onto that line. When file ends, it searches for the next file.
With the last commit I changed setter function, so now it can take any algorithm.

Edit: okay, not everything, because it's only crafted for fuzzy-search.
Edit 2: for effectively multithreaded fuzzy-search that sometimes prints a part of what's already found if the search takes too much time and provides too many results.
Both serialization and IO ops are quite expensive, so I try to avoid doing it too much.

This could happen only with a very frequent search query,
something like `a` or ` `, that could be found in almost any line.
This change was in a previous commit, actually,
this commit just lowers bound number a little and makes two other little changes in module and function visibility.
@ImmemorConsultrixContrarie
Copy link
Contributor Author

Oh, wait. This is used to find files, not text in the files, yeah? Ahahahahahahaahahahahahahahahah. Sorry, that's nervous laughter.

@liuchengxu
Please, tell me, that I didn't end up with thing that fuzzy-searches matches in files instead of files' names and pathes and even though it's finely multithreaded and should work fast, it's useless.

@liuchengxu
Copy link
Owner

@ImmemorConsultrixContrarie It's totally ok for searching filenames. I'm nearly done with #364, then I can take time to interagte your new lib.

@ImmemorConsultrixContrarie
Copy link
Contributor Author

@ImmemorConsultrixContrarie It's totally ok for searching filenames. I'm nearly done with #364, then I can take time to interagte your new lib.

No, it's not. Filename searching is a whole other thing, and it's almost impossible to get multithreaded filename searcher with dynamic results . FFFFFFFFFFFFFFFFUUUUUUUUUU...

To get real improvement in searching filenames, the algorithm should be changed, because right now it re-scores the whole path each time, even though many things share the same directory and could get some prescore.

And directory iterator should be changed if you want fzf-like search syntax, to stop searching if the directory got some !fire in it. Of course, you can just keep going in the directory, but what's the point if all results will be discarded?

Funny enough: the first thing I've done (wasn't even trying to get something cool, lol) was just what you needed, and all things I did after were useless. Hehe. Hehehe.

liuchengxu added a commit that referenced this pull request Apr 4, 2020
* Pick dyn filter code

* Replaced f64 scoring with i32.

* Make it compile

* Intergate with vim initially

* Change ITEMS_TO_SHOW to 100

* Split filter sync/async

* Move to filter/async/dyn.vim

* Split out print_top_items()

* .

* Split out select_top_items_to_show()

* Extract find_bast_score_idx() and try_notify_top_results()

* Fix vint

* Pass enable_icon to try_notify_top_results()

* Fix sorting glitch in top scores

Ref #364 (comment)

* Pass --enable-icon in vimscript

* Update CHANGELOG

* Support neovim too

* Process truncated_map in vim

* Introduce Source::Exec

* Commit state.vim

* Fix vint

* Split out clap#state#handle_message()

* Pass cmd into clap#filter#async#start(cmd)

* Try using dyn filter on grep provider

* .

* Nits

* Use vim job's native api

* Add enable_dyn  to default-features in fuzzy_filter

* Nits

* Use systemlist()

* Only when using maple

* Split out job/stdio.vim

* Clean up dyn.vim

* Remove rpc.vim

* Add job/stdio.vim

* Fix switching between files and history provider

source_type are different

* Check if has g:__clap_forerunner_tempfile

The forerunnner job should also be moved into maple.

* Clear g:__clap_provider_cwd on exit

* Add clap#state#refresh_matches_count_on_forerunner_done()

* Fix vint

* .

* Make maple.vim run filter --sync

* .

* .

* Reset g:__clap_provider_cwd on start

* .

* Put s:clear_state() before setting provider source_type

* Extract clap#state#clear_pre() and clap#state#clear_post()

* .

Co-authored-by: ImmConCon <somewhat.fluffy@gmail.com>
@liuchengxu
Copy link
Owner

liuchengxu commented Apr 4, 2020

@ImmemorConsultrixContrari 🎉 Your dyn filter works in clap now! Really apprecicate your work.

I do agree with that the sorting/filtering approach for file names can be improved due to the current fuzzy filter algorithems are general, it's not optimal for the file names filtering. Taking the job of fd/rg in clap does not make much sense as they already the best tools for the time being IMHO :(.

The next optimization for clap would be making use of the cache of command that has a monster of results, e.g., cd / && fd --type f, find the files under the root directory, we can reuse the cached results for such jobs. That's my next move. fzf-like search syntax is a long-term goal, I actually haven't had a clear idea about that..

Another minor improvement is to send the top lines only when they change, otherwise just notify the total number, this could reduce the work on vim side as higlighting the matched items are not that cheap, especially for vim compared to neovim.

print_json_with_length!(total, lines, indices);

Update: Just tried and it doesn't help much :( , decrease ITEMS_TO_SHOW is the most effective way, the current value is 100, I'll decrease it to 30 or even smaller.

@ImmemorConsultrixContrarie
Copy link
Contributor Author

Update: Just tried and it doesn't help much :( , decrease ITEMS_TO_SHOW is the most effective way, the current value is 100, I'll decrease it to 30 or even smaller.

I can see three ways: less number of things to print, bigger delay between prints, or just don't print anything at all if top results didn't change.

@ImmemorConsultrixContrarie
Copy link
Contributor Author

Hello, @liuchengxu !
I've had some rough times recently (I'm on linux now 'cuz my Windows 7 died with all the files it had, and a bunch of other problems), but now I'm back, probably.

I skimmed through the changelog, but haven't read the code yet, and to get into it fast I'll just ask some questions right there, okay?

So, the first question is about cache-feature. What do you cache actually? And how do you deal with ever-changing file tree (new folders, new files, deleted or moved things)?

I have some ideas about caching paths, as that will help computing next queries without going top-down through the whole system tree again, if invoked in the same folder or its child folder, and some other optimizations. But there are some really rough edges due to ever-changing nature of the file tree, thus I guess you are caching some other thing instead of paths?

The second question is simple: do you still want fzf-like syntax? I haven't seen anything about this in changelog.

@liuchengxu
Copy link
Owner

@ImmemorConsultrixContrarie Definitely okay to ask here!

Let me answer the first cache question. I actually write the entire stdout of some command to a file when the stdout exceeds the threshold https://github.com/liuchengxu/vim-clap/blob/2d761834ef/crates/maple_cli/src/light_command.rs#L128. Then next time I could read from that cache file and no need to run the command again. The cache file is created by the formula vim.clap/cmd_string/cmd_working_dir/cache_file. For the same command running in the same directory, which happens a lot in vim-clap, this strategy just works.

With ever-changing file tree? No, I don't do that, I just provide a way for user to refresh the cache manually :(.

For the second question, yes, I still want fzf-like searching syntax, but I want to work out another problem first. Currently for each query string, maple process will be killed and rerun for the new query. I want to keep maple running for the whole session using the stdio-based RPC. The RPC framework is ready, the problem is to I need to access the whole result list during the session as I have to rerun the filtering job on the result list when new query comes, it's not hard if I already have the whole result list, I can just keep them in memory and access them at any time. But for some command that takes a while to complete, the whole result list is still a streamed stdout, I was wondering if it's good to write the stdout to a file when command is still streaming. But feel free to start ahead if you already have some ideas about fzf-like searching syntax.

@ImmemorConsultrixContrarie
Copy link
Contributor Author

But for some command that takes a while to complete, the whole result list is still a streamed stdout

If I understood it right, you have a problem, where Maple program locks the stdout handle with println!() for too long, stopping other parts of the program from using it?
Have you tried using stderr handle (with eprintln!() macro) for the short and fast messages? It's not the best way to do things, it just looks like the easiest one.
The best way is to not use stdout at all, sending formatted strings as a raw pointer plus length plus capacity, but that would require some unsafe code with manual memory handling, and I don't know if non-Rust side can work with raw pointers.

Btw, I often have problems understanding your answers to my questions, but using github to say "sorry, I didn't understand what you said" doesn't feel right.
I'd rather clarify those misunderstandings via some chat or mail. Tell me, if sending those "re-questions" to the mail in your profile is okay.

@liuchengxu
Copy link
Owner

@ImmemorConsultrixContrarie Email is good for me, here is my email address: xuliuchengxlc@gmail.com.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants