-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ComboboxControl: Added less restrictive search matching. #41952
ComboboxControl: Added less restrictive search matching. #41952
Conversation
👋 Thanks for your first Pull Request and for helping build the future of Gutenberg and WordPress, @markbiek! In case you missed it, we'd love to have you join us in our Slack community, where we hold regularly weekly meetings open to anyone to coordinate with each other. If you want to learn more about WordPress development in general, check out the Core Handbook full of helpful information. |
922d8ae
to
eda9e32
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given minor comments.
Thank you for your contribution 👏🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the improvement suggestion! It really helps us to see limitations in real-life use cases like this.
API-wise, I'd prefer we figure something out that is either a usability enhancement for the general case, or a customizable extension for any consumer.
For example, I might agree that normalizing em dashes and en dashes down to a hyphen is a usability enhancement for the general case, similar to how deburr is.
However, I think that removing a predefined set of characters from the string is too restrictive and arbitrary to warrant a boolean prop like looseSearch
. Consumers will reasonably want a different set of characters to remove from their strings. If we want to enhance the search matching functionality in this direction, it would make sense to add a more flexible API. For example, a property in the option object that allows the consumer to provide a separate string just for matching. Or a prop that allows the consumer to bring their own string match logic.
What do folks think? If we agree that normalizing en/em dashes to hyphens is a generally good idea and is a non-breaking change, we can just implement that without adding a new prop. I feel like that would be the easiest way forward in this particular case, given the complexity of updating the logic that adds underlines to the matched string range.
84a2cba
to
8124440
Compare
LGTM 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally agree with the points made by @mirka:
API-wise, I'd prefer we figure something out that is either a usability enhancement for the general case, or a customizable extension for any consumer.
With respect to the latest approach taken in the PR (adding an optional match
key to the array of objects passed as the options
prop), I'd like to point out that we're considering deprecating the options
props from CustomSelectControl
and related components like ComboboxControl
in favour of a more declarative approach (see #41466 for more details).
Therefore, I'd prefer to look at alternatives approaches that will allows us to keep this feature even in case we deprecate the options
in the future. Two alternatives come to mind:
- if we believe that "normalising" all hypens matches is a feature that will benefit all users of the component, we may just add this further normalisation in the private implementation of the component. We could even go a step forward and look at "normalising" all punctuation?
- if we want to make this feature more flexible, we could add a new prop that allows a user to add an intermediate layer of "string normalisation" that gets applied before trying to match the search string to the combobox items
In case this fix is urgent, I suggest we look at implementing option 1 for now.
Regarding option 2, we'd need to make sure that, in case we switch to using ariakit
under the hood, we'll have a way to provide the same "custom string normalization" prop to ariakit
's combobox component — @diegohaz , do you have any pointers on this, so that we can future-proof this feature and have it working in case we switch to using ariakit
?
While Ariakit has a basic filtering API built-in, that's optional, and you have full control over the filtering logic as long as you render the matching items using import {
useComboboxState,
Combobox,
ComboboxPopover,
ComboboxItem,
} from "ariakit/combobox";
function ComboboxControl() {
const combobox = useComboboxState();
const matchingSuggestions = useMemo(/* filtering logic here */);
return (
<>
<Combobox state={combobox} />
<ComboboxPopover state={combobox}>
{matchingSuggestions.map((item) => (
<ComboboxItem key={item.value} value={item.value}>{item.label}</ComboboxItem>
))}
</ComboboxPopover>
</>
);
} For reference, here's a video where I use an external API to filter the items: https://www.youtube.com/watch?v=dcm8fjBfro8
I understand that's about deprecating Since But I can think of some APIs to make it more flexible: // Filtering and rendering logic is controlled by the user:
<ComboboxControl>
<ComboboxControlOption value="a" label="A" />
<ComboboxControlOption value="b" label="B" />
<ComboboxControlOption value="c" label="C" />
</ComboboxControl>
// Filtering and rendering logic is controlled by the library:
const options = [
{ value: "a", label: "A" },
{ value: "b", label: "B" },
{ value: "c", label: "C" },
]
<ComboboxControl options={options} />
// Filtering logic is controlled by the library. Rendering logic, by the user:
<ComboboxControl
options={options}
renderOption={(option) => <ComboboxControlOption {...option} />}
/> |
Thanks everyone! This is all really great feedback and I think the direction the control is heading is a good one, especially in terms of making things more declarative and moving to ariakit under the hood.
That said, I would love to implement the above and implement the removal of all punctuation when matching to resolve this particular issue. Is that something that everyone would approve of in the short-term? |
Hey @markbiek, I'd be ok with that. What we could do is to:
This way, we're just making the search slightly looser, but without omitting any search results that would have been previously displayed to the end user. What do you think? I will also add that, for the specific scenario highlighted in Automattic/wp-calypso#64677, the correct solution would be to use option groups for Countries, and options only for state/provinces. |
@diegohaz , thank you so much for your quick reply — it's super useful!
Thank you for the reference, I'll take a closer look at the examples and at the YouTube video.
Yes, that's correct
Thank you again for your suggestion. Probably the best thing to do at this point is go ahead with the shorter term fix as drafted in #41952 (comment), and discuss about the mid-long term approach separately from this PR (maybe in #41466) |
Thanks again everyone for the feedback. I'll move forward with the short term fix described in #41952 (comment) |
c88b161
to
8a95217
Compare
@ciampo I pushed an updated implementation based on our discussion above. Let me know if you have any feedback! |
Sorry, I was AFK for a couple days and wasn't following this thread! What's our plan for the "matched string underline" logic in this scenario? One reason I recommended normalization (as opposed to character removal) as the quickest fix is so we don't have to spend time reworking the underline logic 😆 It will work as before as long as the character count remains the same. Not saying the underline is a must-have feature, but whatever UX regression we'd be introducing by potentially removing it altogether, or the complexity we'd need to add to make it work for fuzzy searches, should be a part of the cost/benefit analysis. |
@mirka has definitely got a point (which I didn't consider in my previous message) — in fact, it's currently possible to reproduce the underlining inconsistency described above in the new Storybook example:
This is very true. At this point, an internal string normalization (which only replaces characters, without changing the length of the string) seems like the most conservative option, since it won't introduce changes to the component's APIs and it won't cause regressions to the match highlighting logic. How do folks feel about this approach? Is there any better alternative, in terms of tradeoffs? (@markbiek , I'm sorry if this PR is taking longer than expected and if you feel pulled in many directions! We need to ensure that we make the best overall decision, factoring in many use-cases 🙇 ) Finally, just noting that this PR needs a rebase after #42169 got merged. |
@ciampo Can you give me more detail about this new approach? I'm not entirely sure I'm understanding. 🙏 |
const LooseMatchComboboxControl = () => { | ||
const [ value, setValue ] = useState( null ); | ||
|
||
const countryStateOptions = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The match
key is not needed anymore in this sample data object.
Also, I've added a couple of different dashes to showcase the new "dash normalization" feature
Click to expand
diff --git a/packages/components/src/combobox-control/stories/index.js b/packages/components/src/combobox-control/stories/index.js
index 5f1100a59d..11111ae609 100644
--- a/packages/components/src/combobox-control/stories/index.js
+++ b/packages/components/src/combobox-control/stories/index.js
@@ -291,56 +291,24 @@ const LooseMatchComboboxControl = () => {
const [ value, setValue ] = useState( null );
const countryStateOptions = [
- { value: 'AO:BGO', label: 'Angola — Bengo', match: 'angola bengo' },
- {
- value: 'AO:BLU',
- label: 'Angola — Benguela',
- match: 'angola — benguela',
- },
- { value: 'AO:BIE', label: 'Angola — Bié', match: 'angola bié' },
- {
- value: 'AO:CAB',
- label: 'Angola — Cabinda',
- match: 'angola — cabinda',
- },
- { value: 'AO:CNN', label: 'Angola — Cunene', match: 'angola cunene' },
- { value: 'AO:HUA', label: 'Angola — Huambo', match: 'angola huambo' },
- { value: 'AO:HUI', label: 'Angola — Huíla', match: 'angola huíla' },
- {
- value: 'AO:CCU',
- label: 'Angola — Kuando Kubango',
- match: 'angola kuando kubango',
- },
- {
- value: 'AO:CNO',
- label: 'Angola — Kwanza-Norte',
- match: 'angola kwanza norte',
- },
- {
- value: 'AO:CUS',
- label: 'Angola — Kwanza-Sul',
- match: 'angola kwanza sul',
- },
- { value: 'AO:LUA', label: 'Angola — Luanda', match: 'angola luanda' },
- {
- value: 'AO:LNO',
- label: 'Angola — Lunda-Norte',
- match: 'angola lunda norte',
- },
- {
- value: 'AO:LSU',
- label: 'Angola — Lunda-Sul',
- match: 'angola lunda sul',
- },
- {
- value: 'AO:MAL',
- label: 'Angola — Malanje',
- match: 'angola malanje',
- },
- { value: 'AO:MOX', label: 'Angola — Moxico', match: 'angola moxico' },
- { value: 'AO:NAM', label: 'Angola — Namibe', match: 'angola namibe' },
- { value: 'AO:UIG', label: 'Angola — Uíge', match: 'angola uíge' },
- { value: 'AO:ZAI', label: 'Angola — Zaire', match: 'angola zaire' },
+ { value: 'AO:BGO', label: 'Angola ~ Bengo' },
+ { value: 'AO:BLU', label: 'Angola - Benguela' },
+ { value: 'AO:BIE', label: 'Angola – Bié' },
+ { value: 'AO:CAB', label: 'Angola — Cabinda' },
+ { value: 'AO:CNN', label: 'Angola — Cunene' },
+ { value: 'AO:HUA', label: 'Angola — Huambo' },
+ { value: 'AO:HUI', label: 'Angola — Huíla' },
+ { value: 'AO:CCU', label: 'Angola — Kuando Kubango' },
+ { value: 'AO:CNO', label: 'Angola — Kwanza-Norte' },
+ { value: 'AO:CUS', label: 'Angola — Kwanza-Sul' },
+ { value: 'AO:LUA', label: 'Angola — Luanda' },
+ { value: 'AO:LNO', label: 'Angola — Lunda-Norte' },
+ { value: 'AO:LSU', label: 'Angola — Lunda-Sul' },
+ { value: 'AO:MAL', label: 'Angola — Malanje' },
+ { value: 'AO:MOX', label: 'Angola — Moxico' },
+ { value: 'AO:NAM', label: 'Angola — Namibe' },
+ { value: 'AO:UIG', label: 'Angola — Uíge' },
+ { value: 'AO:ZAI', label: 'Angola — Zaire' },
];
return (
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ciampo Can you give me more detail about this new approach? I'm not entirely sure I'm understanding. 🙏
Sure, what I meant is something like this:
Click to expand
diff --git a/packages/components/src/combobox-control/index.js b/packages/components/src/combobox-control/index.js
index 3b86561f1b..7daad5d2d3 100644
--- a/packages/components/src/combobox-control/index.js
+++ b/packages/components/src/combobox-control/index.js
@@ -31,6 +31,76 @@ import withFocusOutside from '../higher-order/with-focus-outside';
const noop = () => {};
+// Inspiration taken from https://jkorpela.fi/dashes.html
+const ALL_UNICODE_DASH_CHARACTERS = new RegExp(
+ `[${ [
+ // - (hyphen-minus)
+ '\u002d',
+ // ~ (tilde)
+ '\u007e',
+ // (soft hyphen)
+ '\u00ad',
+ // ֊ (armenian hyphen)
+ '\u058a',
+ // ־ (hebrew punctuation maqaf)
+ '\u05be',
+ // ᐀ (canadian syllabics hyphen)
+ '\u1400',
+ // ᠆ (mongolian todo soft hyphen)
+ '\u1806',
+ // ‐ (hyphen)
+ '\u2010',
+ // non-breaking hyphen)
+ '\u2011',
+ // ‒ (figure dash)
+ '\u2012',
+ // – (en dash)
+ '\u2013',
+ // — (em dash)
+ '\u2014',
+ // ― (horizontal bar)
+ '\u2015',
+ // ⁓ (swung dash)
+ '\u2053',
+ // superscript minus)
+ '\u207b',
+ // subscript minus)
+ '\u208b',
+ // − (minus sign)
+ '\u2212',
+ // ⸗ (double oblique hyphen)
+ '\u2e17',
+ // ⸺ (two-em dash)
+ '\u2e3a',
+ // ⸻ (three-em dash)
+ '\u2e3b',
+ // 〜 (wave dash)
+ '\u301c',
+ // 〰 (wavy dash)
+ '\u3030',
+ // ゠ (katakana-hiragana double hyphen)
+ '\u30a0',
+ // ︱ (presentation form for vertical em dash)
+ '\ufe31',
+ // ︲ (presentation form for vertical en dash)
+ '\ufe32',
+ // ﹘ (small em dash)
+ '\ufe58',
+ // ﹣ (small hyphen-minus)
+ '\ufe63',
+ // - (fullwidth hyphen-minus)
+ '\uff0d',
+ ].join( '' ) }]`,
+ 'g'
+);
+
+const normalizeTextString = ( value ) => {
+ // TODO: swap deburr with `removeAccents` after rebasing
+ return deburr( value )
+ .toLocaleLowerCase()
+ .replace( ALL_UNICODE_DASH_CHARACTERS, '-' );
+};
+
const DetectOutside = withFocusOutside(
class extends Component {
handleFocusOutside( event ) {
@@ -71,31 +141,18 @@ function ComboboxControl( {
const matchingSuggestions = useMemo( () => {
const startsWithMatch = [];
const containsMatch = [];
- const looseMatch = [];
- const match = deburr( inputValue.toLocaleLowerCase() );
+ const match = normalizeTextString( inputValue );
options.forEach( ( option ) => {
- const optionLabel = deburr( option.label ).toLocaleLowerCase();
- // For loose search matching, remove ASCII characters in the following ranges:
- // ! through @
- // [ through `
- const looseOptionLabel = optionLabel
- // First remove the characters described above.
- .replace( /[!-@—]|[\[-`]/g, '' )
- // Then make sure we don't have any double spaces.
- .replace( /\s{2,}/g, ' ' );
-
- const index = optionLabel.indexOf( match );
+ const index = normalizeTextString( option.label ).indexOf( match );
if ( index === 0 ) {
startsWithMatch.push( option );
} else if ( index > 0 ) {
containsMatch.push( option );
- } else if ( looseOptionLabel.includes( match ) ) {
- looseMatch.push( option );
}
} );
- return startsWithMatch.concat( containsMatch, looseMatch );
+ return startsWithMatch.concat( containsMatch );
}, [ inputValue, options, value ] );
const onSuggestionSelected = ( newSelectedSuggestion ) => {
We may want to be less comprehensive on the list of dash-like characters that we normalize (for discussion).
@ciampo Thanks for the clarification, that's very helpful! My only concern is that it doesn't solve the original issue we were attempting to fix. Even with all of the hyphen-like characters being normalized to an ASCII Let's say have an option and the label is When someone types |
As explained in details in previous messages, any matching logic that changes the length of the string that is being matched (i.e. by ignoring punctuation or collapsing whitespace) will inherently break the text-highliting functionality of the component. We could implement this change (ignore punctuation and repeated whitespace) at the cost of removing the text-highlighting functionality — what do you think, @WordPress/gutenberg-design ? This change would also need @mtias 's seal of approval, since it could be seen like a potential UX regression. |
In terms of the original issue, if you're searching for the Cabinda province specifically, isn't it more likely that you'd just type that rather than 'Angola ca...'? Could there also be situations where the punctuation is critical to the search? It seems a shame to lose the text highlighting, and I suspect that might be an a11y regression. |
@jameskoster I definitely agree that we don't want any regressions! That said, we opened this PR due to a reported issue with Calypso, albeit a minor way. I think the best thing to do here is to implement the normalization described above and we'll take a different approach to fixing Automattic/wp-calypso#64677 on Calypso. Does that seem fair to everyone? |
If it won't break the text highlighting that seems fair to me. |
If you're referring to this suggestion, we can go ahead as that won't introduce regressions with the text highlighting logic (with respect to my suggested changes, we may just want to understand how "comprehensive" we'd like to be with respect to the list "dash-like" characters to normalize). |
8a95217
to
a5ca435
Compare
This was automatically closed and I can't reopen it. The work normalizing hyphens can be found in #42942 |
What?
Added less restrictive search matching to the ComboboxControl
This is related to Calypso issue Automattic/wp-calypso#64677
Why?
A number of places in Calypso use the
ComboboxControl
to display a combination of countries and states using the label formatCOUNTRY -- STATE
. This makes matching difficult unless you type out the entire label.How?
If the "starts-with" and "contains" searches fail, we attempt to match by removing all puncuation from the option. Any results are appended to the other results so we don't omit anything that would have previously been displayed.
Testing Instructions
There is a new Storybook story demonstrating the new prop:
http://localhost:50240/?path=/story/components-comboboxcontrol--loose-match
Screenshots or screencast
Current behavior
CleanShot.2022-06-24.at.13.13.52.mp4
Custom search behavior
CleanShot.2022-06-24.at.13.13.18.mp4