Command Palette search algorithm needs to prioritize "longest substring" match, may need multiple passes #6693

DHowett · 2020-06-26T17:20:25Z

I have two commands:

Split Pane, split: horizontal
Split Pane, split: horizontal, profile: SSH: Antares

If I search for sp anta, I expect 2 to weight higher than 1. I'm getting the opposite, though:

We only run one pass over each command for the whole search term; [SP]lit p[AN]e, spli[T]: horizont[A]l is a complete match; since it's a subset of the Antares version, we never get to the more specific ... profile: SSH:... part.

The text was updated successfully, but these errors were encountered:

DHowett · 2020-07-23T01:43:21Z

@zadjii-msft more modest proposal, and one I'd like to work on for Hackathon (!)

It looks like VSCode has three matchers. The first one to return a result just straight-up wins, and there is no weighting. This is important.

Full prefix matching ("needle" will match "[Needle]ss to say")
Word-by-word matching ("needle" will match "[Ne]ed [Ed]ward to [Le]t Alphonse go")
Contiguous substring matching only ("space" will match "sub[space]", but not match "sub[s] must [pace] themselves")

All other matches are sorted strictly alphabetically. This continues to support the case where users memorize commands by their first initial, and doesn't break the case where users might want "p" to match "S[p]lit Pane" for some reason. It'd sort below anything that started with a P because S>P.

I'm guessing that they found users really want a constrained search, not a generic fuzzy search, for their commands. We might be in the same boat because our users will remember key things about the commands (profile name, index, split type, action, etc.).

VSC also returns, as a result of fuzzy match, a list of ranges. If we adopt that, it will make it easier to get the range of text we are supposed to embolden.

zadjii-msft · 2020-07-23T13:29:06Z

Wait but why no weighting?

DHowett · 2020-07-23T17:08:50Z

So, I think that with "first match wins" and running matchers in that order we'll get the vast majority of things users want without having to get into relative ordering discussions like the one that made me file this bug 😄

This is unsubstantiated and, perhaps, unknowable, but I think that most commands are going to be matched by either full-prefix or word-by-word-prefix. There's a higher likelihood that somebody is going to disambiguate an ambiguous match by typing either successive word-start letters or by simply choosing the right one from the very reduced list.

Unfold for a bunch of matching cases. It was longer than I expected.

As an example, take the command names from the OP (plus a couple more)

Split Pane, split: horizontal
Split Pane, split: horizontal, profile: SSH: Antares
Split Pane, split: horizontal, profile: SSH: Vaudeville
Split Pane, split: vertical

and the needles sp anta, span and spv and spve

weights

If we do weighted matching, we get (approximately) the following.

`sp anta`

 33     1   12         1            1 = 12
[Sp]lit[ ]P[an]e, spli[t]: horizont[a]l
 33     1   12         1            1 = 12
[Sp]lit[ ]P[an]e, spli[t]: horizont[a]l, profile: SSH: Antares
 33     1   12         1            1 = 12
[Sp]lit[ ]P[an]e, spli[t]: horizont[a]l, profile: SSH: Vaudeville
 33     1   12         1          1 = 12
[Sp]lit[ ]P[an]e, spli[t]: vertic[a]l

`span`

weight <span> in <Split Pane, split: horizontal>: 9
weight <span> in <Split Pane, split: horizontal, profile: SSH: Antares>: 9
weight <span> in <Split Pane, split: horizontal, profile: SSH: Vaudeville>: 9
weight <span> in <Split Pane, split: vertical>: 9

`spv`

weight <spv> in <Split Pane, split: horizontal, profile: SSH: Vaudeville>: 8
weight <spv> in <Split Pane, split: vertical>: 8

it's a toss-up, which is fine - the user may disambiguate later

`spve`

weight <spve> in <Split Pane, split: vertical>: 11
weight <spve> in <Split Pane, split: horizontal, profile: SSH: Vaudeville>: 9
(others 0)

spva flips them around (which is fine)

first-past-the-post

`sp anta`

Split Pane, split: horizontal
* prefix: not literal `sp anta` = FALSE
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `sp anta` = FALSE

Split Pane, split: horizontal, profile: SSH: Antares
* prefix: not literal `sp anta`
* wordwise: [Sp]lit ... [Anta]res = TRUE

Split Pane, split: horizontal, profile: SSH: Vaudeville
* prefix: not literal `sp anta`
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `sp anta` = FALSE

Split Pane, split: vertical
* prefix: not literal `sp anta`
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `sp anta` = FALSE

Final list:
- Split Pane, split: horizontal, profile: SSH: Antares

`span`

Same as above, actually.

`spv`

Split Pane, split: horizontal
* prefix: not literal `spv` = FALSE
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `spv` = FALSE

Split Pane, split: horizontal, profile: SSH: Antares
* prefix: not literal `spv`
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `spv` = FALSE

Split Pane, split: horizontal, profile: SSH: Vaudeville
* prefix: not literal `spv`
* wordwise: [Sp]lit ... [V]audeville = TRUE

Split Pane, split: vertical
* prefix: not literal `spve`
* wordwise: [Sp]lit ... [v]ertical = TRUE

Final list:
- Split Pane, split: horizontal, profile: SSH: Vaudeville
- Split Pane, split: vertical

user disambiguates w/ a or e

`spve`

Split Pane, split: horizontal
* prefix: not literal `spve` = FALSE
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `spve` = FALSE

Split Pane, split: horizontal, profile: SSH: Antares
* prefix: not literal `spve`
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `spve` = FALSE

Split Pane, split: horizontal, profile: SSH: Vaudeville
* prefix: not literal `spve`
* wordwise: `Sp` at start, no further words = FALSE
* substring: no literal `spve` = FALSE

Split Pane, split: vertical
* prefix: not literal `spve`
* wordwise: [Sp]lit ... [ve]rtical = TRUE

Final list:
- Split Pane, split: vertical

Anyway, I'm not sure fuzzy plus weighting is better. If we want to render the results to the user we are going to bold a bunch of weird letters or we're going to have to do a runoff search to re-weight the matches and optimize for longer substring runs.

If we're going to do that, and reprioritize longer substring runs, we may want to just ... try to guess at the substrings people are going to use (my theory is that they're the beginnings of their action-words) ... which eventually takes us to "why not just match word prefixes?"

DHowett · 2020-07-23T18:06:55Z

Further discussion: maybe we should do weighting by algorithm -- full prefix is better than wordwise prefix which is further better than contiguous substring.

lhecker · 2020-08-04T23:12:57Z

For reference and further research I'd like to mention the relatively popular project fzf and its algorithm.

ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Jun 26, 2020

DHowett added Area-User Interface Issues pertaining to the user interface of the Console or Terminal Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-2 A description (P2) Product-Terminal The new Windows Terminal. labels Jun 26, 2020

ghost removed the Needs-Tag-Fix Doesn't match tag requirements label Jun 26, 2020

DHowett removed the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Jun 26, 2020

DHowett added this to the Terminal v2.0 milestone Jun 26, 2020

DHowett assigned zadjii-msft Jun 26, 2020

DHowett assigned DHowett and unassigned zadjii-msft Jul 23, 2020

zadjii-msft mentioned this issue Oct 1, 2020

Scenario: The Command Palette #5400

Open

52 tasks

zadjii-msft added the Area-CmdPal Command Palette issues and features label Dec 1, 2020

zadjii-msft modified the milestones: Terminal v2.0, Terminal Backlog Mar 25, 2021

zadjii-msft modified the milestones: Terminal Backlog, Backlog Jan 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Command Palette search algorithm needs to prioritize "longest substring" match, may need multiple passes #6693

Command Palette search algorithm needs to prioritize "longest substring" match, may need multiple passes #6693

DHowett commented Jun 26, 2020

DHowett commented Jul 23, 2020 •

edited

Loading

zadjii-msft commented Jul 23, 2020

DHowett commented Jul 23, 2020 •

edited

Loading

weights

`sp anta`

`span`

`spv`

`spve`

first-past-the-post

`sp anta`

`span`

`spv`

`spve`

DHowett commented Jul 23, 2020

lhecker commented Aug 4, 2020

Command Palette search algorithm needs to prioritize "longest substring" match, may need multiple passes #6693

Command Palette search algorithm needs to prioritize "longest substring" match, may need multiple passes #6693

Comments

DHowett commented Jun 26, 2020

DHowett commented Jul 23, 2020 • edited Loading

zadjii-msft commented Jul 23, 2020

DHowett commented Jul 23, 2020 • edited Loading

weights

sp anta

span

spv

spve

first-past-the-post

sp anta

span

spv

spve

DHowett commented Jul 23, 2020

lhecker commented Aug 4, 2020

DHowett commented Jul 23, 2020 •

edited

Loading

DHowett commented Jul 23, 2020 •

edited

Loading

`sp anta`

`span`

`spv`

`spve`

`sp anta`

`span`

`spv`

`spve`