Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort search results when using -X option if results are not many. Solve Issue #441 #513

Closed
wants to merge 4 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 47 additions & 4 deletions src/exec/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,10 @@ use self::input::{basename, dirname, remove_extension};
pub use self::job::{batch, job};
use self::token::Token;

/// When `ExecutionMode` is `Batch` and number of results is less
/// than this value, results will be sorted
const SORT_THRESHOLD: usize = 100;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be similar to MAX_BUFFER_LENGTH from src/internal.mod. Could we reuse that constant?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle I'm slightly in favour of a specific constant, possibly with the same value. Anyway, you have a better picture of the whole code, so if you think that the semantic is similar enough then I'm OK with reusing MAX_BUFFER_LENGTH.


/// Execution mode of the command
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ExecutionMode {
Expand Down Expand Up @@ -176,10 +180,22 @@ impl CommandTemplate {
for arg in &self.args[1..] {
if arg.has_tokens() {
// A single `Tokens` is expected
// So we can directy consume the iterator once and for all
for path in &mut paths {
cmd.arg(arg.generate(&path).as_ref());
has_path = true;
// So we can directly consume the iterator once and for all

// It would be too expensive to collect all the paths at once, so we take
// only the first ones.
let mut first_paths = take(&mut paths, SORT_THRESHOLD);
has_path = !first_paths.is_empty();

if first_paths.len() < SORT_THRESHOLD {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be <= SORT_THRESHOLD?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, as it is now first_paths.len() <= SORT_THRESHOLD is always true. Maybe it's clearer if we refactor as

let mut first_paths = take(&mut paths, SORT_THRESHOLD + 1);

if first_paths.len() <= SORT_THRESHOLD {

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what the intention is?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate that you were concerned about performance and introduced the SORT_THRESHOLD, but I am not sure if this is really required. I would expect the sort time to be much smaller than the actual filesystem-search. I would therefore remove the limit and simplify the logic here. There is a limit to the number of paths for -X anyway (see #410), so I don't think this will cause any problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our motivating reason was this. If sorting time is not an issue, please close this pull request and we'll submit another with less noise.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry. I forgot about my "If we would not be restricted, it might make sense to only sort the results if there are less than some fixed number." comment.

I would like to start with an easy solution where we always sort the paths. If this turns out to be a problem (performance wise), we can go back to an approach like this.

// sort the paths because they are not too many
first_paths.sort();
CommandTemplate::add_to_cmd(&mut cmd, arg, &mut first_paths.into_iter());
} else {
CommandTemplate::add_to_cmd(&mut cmd, arg, &mut first_paths.into_iter());

// add remaining paths
CommandTemplate::add_to_cmd(&mut cmd, arg, &mut paths);
}
} else {
cmd.arg(arg.generate("").as_ref());
Expand All @@ -192,6 +208,33 @@ impl CommandTemplate {
ExitCode::Success
}
}

/// Helper function to add found paths as arguments to `cmd`.
fn add_to_cmd<I>(cmd: &mut Command, arg: &ArgumentTemplate, paths: &mut I)
where
I: Iterator<Item = String>,
{
for path in paths {
cmd.arg(arg.generate(&path).as_ref());
}
}
}

/// This function behaves exactly like the standard library `take` for iterators,
/// except that the original iterator is not consumed in the process.
Comment on lines +223 to +224
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that what Iterator::peekable() is for?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know there is no equivalent of this function in the standard library: could you elaborate on how you wanted to use peek?

fn take<I,J>(iterator: &mut I, num: usize) -> Vec<J>
where I: Iterator<Item = J>
{
let mut first_elements = Vec::with_capacity(num);
let mut current_size = 0;
for element in iterator {
first_elements.push(element);
current_size += 1;
if current_size >= num {
break;
}
}
first_elements
}

/// Represents a template for a single command argument.
Expand Down