Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Command Palette search algorithm needs to prioritize "longest substring" match, may need multiple passes #6693

Open
DHowett opened this issue Jun 26, 2020 · 5 comments
Assignees
Labels
Area-CmdPal Command Palette issues and features Area-User Interface Issues pertaining to the user interface of the Console or Terminal Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-2 A description (P2) Product-Terminal The new Windows Terminal.
Milestone

Comments

@DHowett
Copy link
Member

DHowett commented Jun 26, 2020

I have two commands:

  1. Split Pane, split: horizontal
  2. Split Pane, split: horizontal, profile: SSH: Antares

If I search for sp anta, I expect 2 to weight higher than 1. I'm getting the opposite, though:

image

We only run one pass over each command for the whole search term; [SP]lit p[AN]e, spli[T]: horizont[A]l is a complete match; since it's a subset of the Antares version, we never get to the more specific ... profile: SSH:... part.

@ghost ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Jun 26, 2020
@DHowett DHowett added Area-User Interface Issues pertaining to the user interface of the Console or Terminal Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-2 A description (P2) Product-Terminal The new Windows Terminal. labels Jun 26, 2020
@ghost ghost removed the Needs-Tag-Fix Doesn't match tag requirements label Jun 26, 2020
@DHowett DHowett removed the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Jun 26, 2020
@DHowett DHowett added this to the Terminal v2.0 milestone Jun 26, 2020
@DHowett DHowett assigned DHowett and unassigned zadjii-msft Jul 23, 2020
@DHowett
Copy link
Member Author

DHowett commented Jul 23, 2020

@zadjii-msft more modest proposal, and one I'd like to work on for Hackathon (!)

It looks like VSCode has three matchers. The first one to return a result just straight-up wins, and there is no weighting. This is important.

  1. Full prefix matching ("needle" will match "[Needle]ss to say")
  2. Word-by-word matching ("needle" will match "[Ne]ed [Ed]ward to [Le]t Alphonse go")
  3. Contiguous substring matching only ("space" will match "sub[space]", but not match "sub[s] must [pace] themselves")

All other matches are sorted strictly alphabetically. This continues to support the case where users memorize commands by their first initial, and doesn't break the case where users might want "p" to match "S[p]lit Pane" for some reason. It'd sort below anything that started with a P because S>P.

I'm guessing that they found users really want a constrained search, not a generic fuzzy search, for their commands. We might be in the same boat because our users will remember key things about the commands (profile name, index, split type, action, etc.).

VSC also returns, as a result of fuzzy match, a list of ranges. If we adopt that, it will make it easier to get the range of text we are supposed to embolden.

@zadjii-msft
Copy link
Member

Wait but why no weighting?

@DHowett
Copy link
Member Author

DHowett commented Jul 23, 2020

So, I think that with "first match wins" and running matchers in that order we'll get the vast majority of things users want without having to get into relative ordering discussions like the one that made me file this bug 😄

This is unsubstantiated and, perhaps, unknowable, but I think that most commands are going to be matched by either full-prefix or word-by-word-prefix. There's a higher likelihood that somebody is going to disambiguate an ambiguous match by typing either successive word-start letters or by simply choosing the right one from the very reduced list.

Unfold for a bunch of matching cases. It was longer than I expected.

As an example, take the command names from the OP (plus a couple more)

  • Split Pane, split: horizontal
  • Split Pane, split: horizontal, profile: SSH: Antares
  • Split Pane, split: horizontal, profile: SSH: Vaudeville
  • Split Pane, split: vertical

and the needles sp anta, span and spv and spve

weights

If we do weighted matching, we get (approximately) the following.

sp anta

 33     1   12         1            1 = 12
[Sp]lit[ ]P[an]e, spli[t]: horizont[a]l
 33     1   12         1            1 = 12
[Sp]lit[ ]P[an]e, spli[t]: horizont[a]l, profile: SSH: Antares
 33     1   12         1            1 = 12
[Sp]lit[ ]P[an]e, spli[t]: horizont[a]l, profile: SSH: Vaudeville
 33     1   12         1          1 = 12
[Sp]lit[ ]P[an]e, spli[t]: vertic[a]l

span

weight <span> in <Split Pane, split: horizontal>: 9
weight <span> in <Split Pane, split: horizontal, profile: SSH: Antares>: 9
weight <span> in <Split Pane, split: horizontal, profile: SSH: Vaudeville>: 9
weight <span> in <Split Pane, split: vertical>: 9

spv

weight <spv> in <Split Pane, split: horizontal, profile: SSH: Vaudeville>: 8
weight <spv> in <Split Pane, split: vertical>: 8

it's a toss-up, which is fine - the user may disambiguate later

spve

weight <spve> in <Split Pane, split: vertical>: 11
weight <spve> in <Split Pane, split: horizontal, profile: SSH: Vaudeville>: 9
(others 0)

spva flips them around (which is fine)

first-past-the-post

sp anta

Split Pane, split: horizontal
* prefix: not literal `sp anta` = FALSE
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `sp anta` = FALSE

Split Pane, split: horizontal, profile: SSH: Antares
* prefix: not literal `sp anta`
* wordwise: [Sp]lit ... [Anta]res = TRUE

Split Pane, split: horizontal, profile: SSH: Vaudeville
* prefix: not literal `sp anta`
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `sp anta` = FALSE

Split Pane, split: vertical
* prefix: not literal `sp anta`
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `sp anta` = FALSE

Final list:
- Split Pane, split: horizontal, profile: SSH: Antares

span

Same as above, actually.

spv

Split Pane, split: horizontal
* prefix: not literal `spv` = FALSE
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `spv` = FALSE

Split Pane, split: horizontal, profile: SSH: Antares
* prefix: not literal `spv`
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `spv` = FALSE

Split Pane, split: horizontal, profile: SSH: Vaudeville
* prefix: not literal `spv`
* wordwise: [Sp]lit ... [V]audeville = TRUE

Split Pane, split: vertical
* prefix: not literal `spve`
* wordwise: [Sp]lit ... [v]ertical = TRUE

Final list:
- Split Pane, split: horizontal, profile: SSH: Vaudeville
- Split Pane, split: vertical

user disambiguates w/ a or e

spve

Split Pane, split: horizontal
* prefix: not literal `spve` = FALSE
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `spve` = FALSE

Split Pane, split: horizontal, profile: SSH: Antares
* prefix: not literal `spve`
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `spve` = FALSE

Split Pane, split: horizontal, profile: SSH: Vaudeville
* prefix: not literal `spve`
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `spve` = FALSE

Split Pane, split: vertical
* prefix: not literal `spve`
* wordwise: [Sp]lit ... [ve]rtical = TRUE

Final list:
- Split Pane, split: vertical

Anyway, I'm not sure fuzzy plus weighting is better. If we want to render the results to the user we are going to bold a bunch of weird letters or we're going to have to do a runoff search to re-weight the matches and optimize for longer substring runs.

If we're going to do that, and reprioritize longer substring runs, we may want to just ... try to guess at the substrings people are going to use (my theory is that they're the beginnings of their action-words) ... which eventually takes us to "why not just match word prefixes?"

@DHowett
Copy link
Member Author

DHowett commented Jul 23, 2020

Further discussion: maybe we should do weighting by algorithm -- full prefix is better than wordwise prefix which is further better than contiguous substring.

@lhecker
Copy link
Member

lhecker commented Aug 4, 2020

For reference and further research I'd like to mention the relatively popular project fzf and its algorithm.

@zadjii-msft zadjii-msft added the Area-CmdPal Command Palette issues and features label Dec 1, 2020
@zadjii-msft zadjii-msft modified the milestones: Terminal Backlog, Backlog Jan 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-CmdPal Command Palette issues and features Area-User Interface Issues pertaining to the user interface of the Console or Terminal Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-2 A description (P2) Product-Terminal The new Windows Terminal.
Projects
None yet
Development

No branches or pull requests

3 participants