diff --git a/open-api.yaml b/open-api.yaml index 37cd6ee3..1e14a9a6 100644 --- a/open-api.yaml +++ b/open-api.yaml @@ -720,6 +720,22 @@ components: default: - '' example: 'title,description' + highlightPreTag: + name: highlightPreTag + in: query + required: false + description: Specify the tag to put before the highlighted query terms. + schema: + type: string + default: '' + highlightPostTag: + name: highlightPostTag + in: query + required: false + description: Specify the tag to put after the highlighted query terms. + schema: + type: string + default: '' attributesToCrop: name: attributesToCrop in: query @@ -728,6 +744,14 @@ components: type: string example: 'overview:10' description: Comma-separated list of attributes whose values have to be cropped. Cropped attributes are returned in `_formatted` response object. + cropMarker: + name: cropMarker + in: query + description: Sets the crop marker to apply before and/or after cropped part selected within an attribute defined in `attributesToCrop` parameter. + required: false + schema: + type: string + default: '…' cropLength: name: cropLength in: query @@ -735,8 +759,8 @@ components: schema: type: integer example: 5 - default: 200 - description: Length used to crop field values. + default: 10 + description: Sets the total number of words to keep around the matched part of an attribute specified in the `attributesToCrop` parameter. 
facetsDistribution: name: facetsDistribution in: query @@ -1431,7 +1455,10 @@ paths: - $ref: '#/components/parameters/q' - $ref: '#/components/parameters/attributesToRetrieve' - $ref: '#/components/parameters/attributesToHighlight' + - $ref: '#/components/parameters/highlightPreTag' + - $ref: '#/components/parameters/highlightPostTag' - $ref: '#/components/parameters/attributesToCrop' + - $ref: '#/components/parameters/cropMarker' - $ref: '#/components/parameters/cropLength' - $ref: '#/components/parameters/facetsDistribution' - $ref: '#/components/parameters/filter' @@ -1515,6 +1542,16 @@ paths: type: string example: '["title", "overview"]' default: '[]' + highlightPreTag: + type: string + description: Specify the tag to put before the highlighted query terms. + example: '' + default: '' + highlightPostTag: + type: string + description: Specify the tag to put after the highlighted query terms. + example: '' + default: '' attributesToCrop: type: array description: Array of attributes whose values have to be cropped. Cropped attributes are returned in `_formatted` response object. @@ -1522,10 +1559,14 @@ paths: type: string example: '["overview", "author"]' default: '[]' + cropMarker: + type: string + description: Sets the crop marker to apply before and/or after cropped part selected within an attribute defined in `attributesToCrop` parameter. + default: '…' cropLength: type: number - description: Length used to crop field values. - default: 200 + description: Sets the total number of **words** to keep for the cropped part of an attribute specified in the `attributesToCrop` parameter. + default: 10 matches: type: boolean description: Defines whether an `_matchesInfo` object that contains information about the matches should be returned or not. 
diff --git a/text/0027-filter-and-facet-behavior.md b/text/0027-filter-and-facet-behavior.md deleted file mode 100644 index be65c82a..00000000 --- a/text/0027-filter-and-facet-behavior.md +++ /dev/null @@ -1,263 +0,0 @@ -- Title: Filter and Facet Behavior -- Specification PR: [#27](https://github.com/meilisearch/specifications/pull/27) -- MeiliSearch Tracking-Issues: [milli/#152](https://github.com/meilisearch/milli/issues/152), [transplant/#140](https://github.com/meilisearch/transplant/issues/140), [transplant/#70](https://github.com/meilisearch/transplant/issues/70), [transplant/#81](https://github.com/meilisearch/transplant/issues/81) - -# Facet and Filter Behavior - -## 1. Functional Specification - -### I. Summary - -With v0.21.0, we are trying to erase the distinction between facets and filtering. `facetFilters` is removed as a query parameter. Instead, all filters are performed with the `filter` parameter. In addition, any attribute you wish to use with `filter` must first be added to attributesForFaceting. - -### II. Motivation - -Because the users need to set the attributes to `attributesForFaceting` no matter what they are using (`filters` or `facetFilters`) during the search, we need to re-define the usage of the filters/facets and stay as backwards compatible as possible with the MeiliSearch v0.20.0. - -### III. Additional Materials - -N/A - -### IV. Explanation - -#### Remove `facetFilters`, Rename `filters` to `filter` - -The usage of `facetFilters` is not needed anymore since everything is doable by only using the `filters` parameter. -We rename `filters` to `filter` mainly because the parameter's value will only ever be a single filter string, array or mixed syntax. Even if the value can be nested allowing for complexity, it's still just a logical expression. This name implicitly describes the action of filtering the results using a single filter expression made up of several logical operators. 
- -```json -// Settings -{ - "attributesForFaceting": ["author"] -} -// Search -{ - "q": "", - "filters": "price < 20", - "facetFilters": ["author:'JK Rowling'"] -} -``` - -becomes - -```json -// Settings -{ - "attributesForFaceting": ["author", "price"] -} -// Search -{ - "q": "", - "filter": "price < 20 AND author = 'JK Rowling'" -} -``` - -We decided to replace the `:` operator in favor of `=`. - -#### Operator behavior during search - -During search, logical operators should behave as they already do in MeiliSearch v0.20.0. - -Here is a quick reminder of the v0.20.0 operators behavior: - -- `<`, `=<`, `>`, `=>` => only operate on number values. MeiliSearch returns only documents that have numbers in this field. -ex: `price > 19` does not return `"price": "20"` but returns `"price": 20` -> no type conversions are done. -- `=`, `!=`/`NOT` => operate on string and number values. MeiliSearch returns only documents that have numbers, strings, or arrays of strings in this field. - -#### Known limitations - -- cannot filter on `null`, objects, arrays of "undefined elements" (ex: array of `null`) - -#### Accepted syntaxes for `filter` - -Three syntaxes will be accepted for the `filter` parameter during search. `String syntax`, `Array syntax` and `Mixed syntax`. - -##### String syntax - -The string syntax uses the `AND`/`OR`/`NOT` operators combined with parentheses to express a search filter. - -Example: -```json -{ - "filter": "(genres = Comedy OR genres = Romance) AND director = 'Mati Diop'" -} -``` - -##### Array syntax - -The array syntax uses dimensional array to express logical connectives. - -- Inner arrays elements are connected by an OR operator (e.g. [["genres:Comedy", "genres:Romance"]]). -- Outer arrays elements are connected by an AND operator (e.g. ["genres:Romance", "director:Mati Diop"]). 
- -Example: -```json -{ - "filter": [["genres = Comedy", "genres = Romance"], "director = 'Mati Diop'"] -} -``` - -##### Mixed syntax - -The mixed syntax can mix string and array syntaxes. - -Let's say that we want to translate -```json -{ - "filter": "((genres = Comedy AND genres = Romance) OR genres = Action) AND director != 'Mati Diop'" -} -``` -Example: -```json -{ - "filter": [["genres = Comedy AND genres = Romance", "genres = Action"], "NOT director = comedy"] -} -``` -> Note that string values that are longer than a single word need to be enclosed by quote. `"director = Mati Diop"` will lead to a parsing error. The valid syntax is `"director = 'Mati Diop'"`. - -#### `filters` and `facetsDistribution` behavior - -##### MeiliSearch v0.20.0 with `filters` - -In MeiliSeach v0.20.0, with the following documents - -```json -[ - { "id": 456, "genre": "adventure", "price": 12 }, - { "id": 1, "genre": "fantasy", "price": 456 } -] -``` - -...and the following search - -```json -{ - "q": "", - "filters": "price = 12", - "facetsDistribution": ["genre"] -} -``` - -...we get the following results: - -```json -{ - "hits": [{ "id": 456, "genre": "adventure", "price": 12 }], - "facetsDistribution": { - "genre": { - "fantasy": 1, - "adventure": 1 - } - } -} -``` - -=> `fantasy` is set to `1` despite the fantasy book does not have a `price` equals to `12` as required in the `filters`. -=> the `filters` and the `facetsDistribution` are not related: the `facetsDistribution` is applied before the filters. 
- -##### MeiliSearch v0.20.0 with `facetFilters` - -In MeiliSeach v0.20.0, with the following documents - -```json -[ - { "id": 456, "genre": "adventure", "price": "12" }, - { "id": 1, "genre": "fantasy", "price": "456" } -] -``` -(the test is done with `price` as strings because MeiliSearch v0.20.0 cannot facet on numbers) - -...and the following search - -```json -{ - "q": "", - "facetFilters": ["price:12"], - "facetsDistribution": ["genre"] -} -``` - -...we get the following results: - -```json -{ - "hits": [{ "id": 456, "genre": "adventure", "price": "12" }], - "facetsDistribution": { - "genre": { - "fantasy": 0, - "adventure": 1 - } - } -} -``` - -=> `fantasy` is set to `0` because the fantasy book does not have a `price` equals to `12` as required in the `facetFilters`. -=> the `filters` and the `facetsDistribution` are related: the `facetsDistribution` is applied after the facet filters. - -##### Final decision for v0.21.0 - -In MeiliSearch v0.21.0, `facetsDistribution` will behave with `filter` the same way it currently does with `facetFilters`: the `facetsDistribution` will be applied after the filters. - -#### TLDR; all the breaking changes - -Here is the summary of all the breaking changes (that are detailed in the paragraphs above): - -- The `facetFilters` parameter during the search is removed. Only `filter` can be used. -- The `filters` parameter is renamed `filter`. -- The users need to set the attributes to `attributesForFaceting` to use the filters during the search via the `filter` parameters. -- The users can now pass an attribute containing numbers (float or integer) in `attributesForFaceting`. It means they can use `filter` and `facetsDistribution` on this numeric field. -- The `filter` parameter can accept three syntaxes: string (with `OR`/`AND`/`NOT`), array and a mixed with string and array. -- The `:` operator does not exist anymore (was previously present in `facetFilters` in v0.20.0) and is replaced by `=`. 
-- The `facetsDistribution` is now applied after the `filter` parameter. This point is currently not documented, not sure this is useful to add it to the docs. -- All the integer in the user documents are converted into float. So integers with high values lose precision. However, integers from −2^53 to 2^53 (−9007199254740992 to 9007199254740992) can be exactly represented, which is enough in 99% of cases. Not sure this is important to documented it either. - -### V. Impact on Documentation - -See the previous part. - -The documentation should present a complex filter query with multiple levels to precise that MeiliSearch is not limited to deepness level. - -Example: - -```json -{ - "filter": "((genres = Comedy OR genres = Romance) AND (director = 'Mati Diop' OR director = 'Wong Kar-wai')) OR genres = 'Fantasy'" -} -``` - -### VI. Impact on SDKs - -- Remove `facetFilters` from their payload. -- Rename `filters` to `filter`. - -## 2. Technical Aspects - -### I. Abstract - -Internal MeiliSearch engine uses an index data structure for each type encountered in an attribute. Doing this eliminate the need for the user to strictly type the attribute and permits to deal with loosely typed document at indexing time. Thus, facilitating the developper experience. - -As for example, let’s imagine that we want to index two documents with a price attribute. One containing `"price": "20"` and the second `"price": 20` as value. The two attribute values will be stored in two different indexes. - -Using `=` or `!=` operator on `price` attribute will lead the engine to query the two indexes and get matching document ids with an `UNION` operation. So it will return documents matching `"20"` or `20` as values for the `price` attribute. - -#### Implementation Details - -We have a database for the facets, the keys are prefixed by the field_id (u8), a level (u8) and, the facet value (f64). The facet values don't have a level when the type is a string. 
The data stored under those keys is the document ids that are faceted under those facets values. - -The type of the facet (i.e. f64 or string) is stored in another data structure and this is by using it that we know how to read the facet value. If the facet type is a number we are able to use more operators like greater than or lower than (e.g. <, <=, >, >=, =, !=). - -##### Indexing phase - -When documents come in and fields are declared as facets, we start storing the facet values in the previously described database, the key becomes the facet value (as a globally ordered byte slice) and, the entry data now contains the document id that contains this facet value. Note that if the facet value is a number we store it like `[field id][level][left facet value][right facet value]` where the level is 0 and if it is a string then we don't store the level. - -Once the facet values that are numbers are stored we got a list of facet values prefixed with the field id and the base level (i.e. 0). We use this base level to generate more levels, each level contains groups of 4 groups of the level below, so level 1 aggregates the ids of the documents of each group of 4 facet values of level 0. The left and right facet values are the inclusive bounds of the group, the level 0 group have equal left and right bounds. - -##### Querying phase - -Those levels are used to reduce the number of entries to run through, reducing the time it takes to answer too wide range filter queries, like duration > 0 where 80% of the entries will match. We go through each of the levels going from the higher one, the one which describes the biggest amount of facet values and, we go deeper in the levels to find better fitting bounds. - -## 3. Future Possibilities - -- Provide facet statistics like `min` `max` `sum` `average` -- Rename `attributesForFaceting` for clarity. -- Rename `facetsDistribution` for clarity. 
diff --git a/text/0034-telemetry-policies.md b/text/0034-telemetry-policies.md index d628a7ab..38206365 100644 --- a/text/0034-telemetry-policies.md +++ b/text/0034-telemetry-policies.md @@ -109,6 +109,11 @@ The collected data is sent to [Segment](https://segment.com/). Segment is a plat | `q.max_terms_number` | Highest number of terms given for the `q` parameter in this batch | 5 | `Documents Searched POST`, `Documents Searched GET` | | `pagination.max_limit` | Highest value given for the `limit` parameter in this batch | 60 | `Documents Searched POST`, `Documents Searched GET` | | `pagination.max_offset` | Highest value given for the `offset` parameter in this batch | 1000 | `Documents Searched POST`, `Documents Searched GET` | +| `formatting.highlight_pre_tag` | `true` if `highlightPreTag` was used in this batch, otherwise `false` | false | `Documents Searched POST`, `Documents Searched GET` | +| `formatting.highlight_post_tag` | `true` if `highlightPostTag` was used in this batch, otherwise `false` | false | `Documents Searched POST`, `Documents Searched GET` | +| `formatting.crop_length` | `true` if `cropLength` was used in this batch, otherwise `false` | false | `Documents Searched POST`, `Documents Searched GET` | +| `formatting.crop_marker` | `true` if `cropMarker` was used in this batch, otherwise `false` | false | `Documents Searched POST`, `Documents Searched GET` | +| `formatting.matches` | `true` if `matches` was used in this batch, otherwise `false` | false | `Documents Searched POST`, `Documents Searched GET` | | `primary_key` | Value given for the `primaryKey` parameter if used, otherwise `null` | id | `Index Created`, `Index Updated`, `Documents Added`, `Documents Updated`| | `payload_type` | All `payload_type` encountered in this batch | ["application/json", "text/plain", "application/x-ndjson"] | `Documents Added`, `Documents Updated` | | `index_creation` | `true` if a document addition or update request triggered index creation in this batch, 
otherwise `false` | true | `Documents Added`, `Documents Updated` | @@ -206,6 +211,11 @@ This property allows us to gather essential information to better understand on | q.max_terms_number | The maximum number of terms for the `q` parameter among all requests in the aggregated event. | `5` | | pagination.max_limit | The maximum limit encountered among all requests in the aggregated event. | `20` | | pagination.max_offset | The maxium offset encountered among all requests in the aggregated event. | `1000` | +| formatting.highlight_pre_tag | Has `highlightPreTag` been used in the aggregated event? If yes, `true` otherwise `false` | `false` | +| formatting.highlight_post_tag | Has `highlightPostTag` been used in the aggregated event? If yes, `true` otherwise `false` | `false` | +| formatting.crop_length | Has `cropLength` been used in the aggregated event? If yes, `true` otherwise `false` | `false` | +| formatting.crop_marker | Has `cropMarker` been used in the aggregated event? If yes, `true` otherwise `false` | `false` | +| formatting.matches | Has `matches` been used in the aggregated event? If yes, `true` otherwise `false` | `false` | --- @@ -228,6 +238,11 @@ This property allows us to gather essential information to better understand on | q.max_terms_number | The maximum number of terms for the `q` parameter among all requests in the aggregated event. | `5` | | pagination.max_limit | The maximum limit encountered among all requests in the aggregated event. | `20` | | pagination.max_offset | The maxium offset encountered among all requests in the aggregated event. | `1000` | +| formatting.highlight_pre_tag | Has `highlightPreTag` been used in the aggregated event? If yes, `true` otherwise `false` | `false` | +| formatting.highlight_post_tag | Has `highlightPostTag` been used in the aggregated event? If yes, `true` otherwise `false` | `false` | +| formatting.crop_length | Has `cropLength` been used in the aggregated event?
If yes, `true` otherwise `false` | `false` | +| formatting.crop_marker | Has `cropMarker` been used in the aggregated event? If yes, `true` otherwise `false` | `false` | +| formatting.matches | Has `matches` been used in the aggregated event? If yes, `true` otherwise `false` | `false` | --- diff --git a/text/0039-_formatted-field-behavior.md b/text/0039-_formatted-field-behavior.md deleted file mode 100644 index d14f8754..00000000 --- a/text/0039-_formatted-field-behavior.md +++ /dev/null @@ -1,474 +0,0 @@ -- Title: _formatted Field Behavior -- Start Date: 2021-05-05 -- Specification PR: [#39](https://github.com/meilisearch/specifications/pull/39) -- MeiliSearch Tracking-Issues: [transplant/#203](https://github.com/meilisearch/transplant/issues/203) - -# _formatted Field Behavior - -## 1. Functional Specification - -### I. Summary - -`_formatted` is used in conjunction with raw search results to highlight and/or crop around the query term in the attributes of the document. `_formatted` main goal is to enhance the UI and the UX by providing a nice way to catch the user's eyes on the front-end side while searching. - -### II. Motivation - -The goal of this specification is to clarify the behavior of `attributesToRetrieve`, `attributesToHighlight` and `attributesToCrop` on `_formatted` response parameter content. - -### III. Additional Materials - -#### Algolia - -By default, Algolia returns `_hightlightResult` even if no `attributesToHighlight` are set at query time. So, by default the value is `*`. -Setting a specific attribute in `attributesToHighlight` will only give this specific attribute in `_highlightResults`. - -Unlike highlighting, snippeting must be proactively enabled for each attribute to snippet, however Algolia authorize the usage of `*` to snippet all attributes. -Setting a specific attribute in `attributesToSnippet` will only give this specific attribute `_snippetResult`. - -### IV.
Explanation - -#### Current MeiliSearch Behavior (0.20) - -Given a document made of three fields `title`, `actor`, and `poster`. -``` -{ - "title": "Prince Avalanche", - "actor": "Prince", - "poster": "https://image.tmdb.org/t/p/w1280/3KHiQt54usbHyIjLIMzaDAoIJNK.jpg" -} -``` - -**Example 1** - -Given these search parameters: -``` -{ - "q": "prince", - "attributesToRetrieve": ["*"] -} -``` - -As a user I get: - -``` -{ - "hits": [ - { - "title": "Prince Avalanche", - "actor": "Prince", - "poster": "https://image.tmdb.org/t/p/w1280/3KHiQt54usbHyIjLIMzaDAoIJNK.jpg" - } - ] -} -``` - -Since `attributesToHighlight` and `attributesToCrop` are not set, `_formatted` is not computed. - - -**Example 2** - -Given these search parameters: -``` -{ - "q": "prince", - "attributesToRetrieve": ["*"], - "attributesToHighlight": ["wrongFieldName"] -} -``` - -As a user I get: - -``` -{ - "hits": [ - { - "title": "Prince Avalanche", - "actor": "Prince", - "poster": "https://image.tmdb.org/t/p/w1280/3KHiQt54usbHyIjLIMzaDAoIJNK.jpg", - "_formatted": { - "title": "Prince Avalanche", - "actor": "Prince", - "poster": "https://image.tmdb.org/t/p/w1280/3KHiQt54usbHyIjLIMzaDAoIJNK.jpg", - } - } - ] -} -``` - -The `_formatted` field appears in the search response as soon as the `attributesToHighlight` and/or `attributesToCrop` parameters are sent as a query parameter and are filled with a value representing an existent or inexistent field. Using an inexistent field is similar to setting `attribuesToHighlight`/`attributesToCrop` to `"*"` but the highlight/cropping is not compute in the `_formatted` field. The fields are returned in a raw format. 
- -**Example 3** - -Given these search parameters: - -``` -{ - "q": "Prince", - "attributesToRetrieve": ["title"], - "attributesToHighlight": ["actor"] -} -``` - -As a user I get: - -``` -{ - "hits": [ - { - "title": "Prince Avalanche", - "_formatted": { - "actor": "Prince" - } - } - ] -} -``` - -`_formatted` is only filled with the fields set in `attributesToHighlight` despite the fact that the user only ask for `title` in `attributesToRetrieve`. - -**Example 4** - -Given these search parameters: - -``` -{ - "q": "Prince", - "attributesToRetrieve": ["actor", "title"], - "attributesToHighlight": ["actor"] -} -``` - -As a user I get: - -``` -{ - "hits": [ - { - "title": "Prince Avalanche", - "actor": "Prince", - "_formatted": { - "actor": "Prince" - } - } - ] -} -``` - -`_formatted` is only filled with the fields in `attributesToHighlight` despite the fact that the user ask for `actor` and `title` in `attributesToRetrieve`. - -**Example 5** - -Given these search parameters: -``` -{ - "q": "prince", - "attributesToRetrieve": ["*"], - "attributesToHighlight": ["title"] -} -``` - -As a user I get: - -``` -{ - "hits": [ - { - "title": "Prince Avalanche", - "actor": "Prince", - "poster": "https://image.tmdb.org/t/p/w1280/3KHiQt54usbHyIjLIMzaDAoIJNK.jpg", - "_formatted": { - "title": "Prince Avalanche", - "actor": "Prince", - "poster": "https://image.tmdb.org/t/p/w1280/3KHiQt54usbHyIjLIMzaDAoIJNK.jpg" - } - } - ] -} -``` - -`_formatted` field behavior that is supposed to be controlled by `attributesToHighlight` and `attributesToCrop` is dependent of `attributesToRetrieve`. - -`_formatted` is filled with all the `attributesToRetrieve` despite the fact that the user only ask for one specific field in `attributesToHighlight` or `attributesToCrop`. The highlighing/cropping is only computed on the targeted field in `attributesToHighlight`. Other fields are returned as raw result in `_formatted`. 
- -**Example 6** - -Given these search parameters: -``` -{ - "q": "prince", - "attributesToRetrieve": ["title"], - "attributesToHighlight": ["*"] -} -``` - -As a user i get: -``` -{ - "hits": [ - { - "title": "Prince Avalanche", - "_formatted": { - "title": "Prince Avalanche" - } - } - ] -} -``` - -`_formatted` is only filled with the `attributesToRetrieve` fields despite the fact that the user may wants all fields to be in `_formatted` and be highlighted or cropped given `attributesToHighlight` or `attributesToCrop` values. - -#### Current Transplant/Milli Behavior (0.21) - -**Example 1** - -Given these search parameters: -``` -{ - "q": "prince", - "attributesToRetrieve": ["*"] -} -``` - -As a user I get: - -``` -{ - "hits": [ - { - "title": "Prince Avalanche", - "actor": "Prince", - "poster": "https://image.tmdb.org/t/p/w1280/3KHiQt54usbHyIjLIMzaDAoIJNK.jpg" - } - ] -} -``` - -Since `attributesToHighlight` and `attributesToCrop` are not set, `_formatted` is not computed. - -Same behavior as v0.20 release. - -**Example 2** - -Given these search parameters: -``` -{ - "q": "prince", - "attributesToRetrieve": ["*"], - "attributesToHighlight": ["wrongFieldName"] -} -``` - -As a user I get: - -``` -{ - "hits": [ - { - "title": "Prince Avalanche", - "poster": "https://image.tmdb.org/t/p/w1280/3KHiQt54usbHyIjLIMzaDAoIJNK.jpg", - } - ] -} -``` - -Unlike v0.20, if no field are matched to be formatted, `_formatted` is not computed nor returned in response. - -**Example 3** - -Given these search parameters: - -``` -{ - "q": "Prince", - "attributesToRetrieve": ["title"], - "attributesToHighlight": ["actor"] -} -``` - -As a user I get: - -``` -{ - "hits": [ - { - "title": "Prince Avalanche", - "_formatted": { - "actor": "Prince" - } - } - ] -} -``` - -`_formatted` is only filled with the fields set in `attributesToHighlight` despite the fact that the user only ask for `title` in `attributesToRetrieve`. - -Same behavior as v0.20 release. 
- -**Example 4** - -Given these search paramters: - -``` -{ - "q": "Prince", - "attributesToRetrieve": ["actor", "title"], - "attributesToHighlight": ["actor"] -} -``` - -As a user i get: - -``` -{ - "hits": [ - { - "title": "Prince Avalanche", - "actor": "Prince", - "_formatted": { - "actor": "Prince" - } - } - ] -} -``` - -Same behavior as v0.20 release. - -**Example 5** - -Given these search parameters: -``` -{ - "q": "prince", - "attributesToRetrieve": ["*"], - "attributesToHighlight": ["title"] -} -``` - -As a user i get: - -``` -{ - "hits": [ - { - "title": "Prince Avalanche", - "actor": "Prince", - "poster": "https://image.tmdb.org/t/p/w1280/3KHiQt54usbHyIjLIMzaDAoIJNK.jpg", - "_formatted": { - "title": "Prince Avalanche" - } - } - ] -} -``` - -Unlike v0.20, `_formatted` is only containing fields set in `attributesToHighlight` and `attributesToCrop`. Independently from `attributesToRetrieve` value. - -**Example 6** - -Given these search parameters: -``` -{ - "q": "prince", - "attributesToRetrieve": ["title"], - "attributesToHighlight": ["*"] -} -``` - -As a user i get: - -``` -{ - "hits": [ - { - "title": "Prince Avalanche", - "_formatted": { - "title": "Prince Avalanche", - "actor": "Prince, - "poster": "https://image.tmdb.org/t/p/w1280/3KHiQt54usbHyIjLIMzaDAoIJNK.jpg" - } - } - ] -} -``` - -Unlike v0.20, `attributesToHighlight` set fields to be in `_formatted` independently from `attributesToRetrieve`. - -#### Expected MeiliSearch Behavior (0.21) - -✅ If `attributesToRetrieve` is not set as a parameter, the expected behavior is the same as if `attributesToRetrieve` is equal to `*`. -✅ If `attributesToHighlight` and `attributesToCrop` are not set, do not return `_formatted` and don't compute highlights and crops. -✅ If cumulated fields in `attributesToHighlight` and `attributesToCrop` resolve to only having non-existent fields, do not return `_formatted`. 
-✅ If `attributesToRetrieve` is equal to `*` and `attributesToHighlight` or `attributesToCrop` are equals to `*`, return `_formatted` and compute highlights and crops on each fields. -✅ If `attributesToRetrieve` is equal to `*` and `attributesToHighlight` or `attributesToCrop` contains a set of fields, return `_formatted` containing fields declared in `attributesToRetrieve` and compute highlights and crops on fields declared in `attributesToHighlight` or `attributesTocCrop`. - -**Edge cases** - -Given these search parameters: - -``` -{ - "q": "Prince", - "attributesToRetrieve": ["title"], - "attributesToHighlight": ["actor"] -} -``` - -I want to get: - -``` -{ - "hits": [ - { - "title": "Prince Avalanche", - "_formatted": { - "title": "Prince Avalanche", - "actor": "Prince" - } - } - ] -} -``` - -✅ Stay consistent with the fact that every `attributesToRetrieve` are set in `_formatted` result but do not need to be necessary computed for highlighting and cropping until they are declared in `attributesToHighlight` and `attributesToCrop`. - -Given these search parameters: -``` -{ - "q": "prince", - "attributesToRetrieve": ["title"], - "attributesToHighlight": ["*"] -} -``` - -I want to get: -``` -{ - "hits": [ - { - "title": "Prince Avalanche", - "_formatted": { - "title": "Prince Avalanche", - "actor": "Prince, - "poster": "https://image.tmdb.org/t/p/w1280/3KHiQt54usbHyIjLIMzaDAoIJNK.jpg" - } - } - ] -} -``` - -✅ If `attributesToHighlight` or `attributesToCrop` contains a field that is not declared in `attributesToRetrieve`, it is added to `_formatted` and it is highlighted and/or cropped. - -### V. Impact on Documentation -N/A - -### VI. Impact on SDKs -N/A - -## 2. Technical Aspects -N/A - -## 3. Future Possibilities - -- Add support of `-name_of_attribute` to remove a specific attribute when using `*`. E.g. `"attributesToHighlight": "*, -title"`. All except title. This is really useful when having a lot of attributes in a document. 
It could work at least for `attributesToRetrieve`, `attributesToHighlight` and, `attributesToCrop`. The usage of these operator will be disjunctive between `attributesToHighlight` and `attributesToCrop` parameters. -- Rename `attributesToRetrieve` diff --git a/text/0089-tenant-tokens.md b/text/0089-tenant-tokens.md index d053674b..8e53977f 100644 --- a/text/0089-tenant-tokens.md +++ b/text/0089-tenant-tokens.md @@ -225,7 +225,7 @@ is equivalent to ``` --- -> The `filter` field accepts an array, a string, and the mixed syntax as described in the [filter and facet specification](0027-filter-and-facet-behavior.md). +> The `filter` field accepts an array, a string, and the mixed syntax as described in the [Search Endpoints Specification](0118-search-api.md#312-filter). ##### 3.2.2.3. Payload example diff --git a/text/0118-search-api.md b/text/0118-search-api.md index cfbb8b2b..3a255138 100644 --- a/text/0118-search-api.md +++ b/text/0118-search-api.md @@ -1,27 +1,25 @@ -- Title: Search API -- Start Date: 2022-02-27 - # Search API -## 1. Functional Specification +## 1. Summary -### 1.1. Summary +The search endpoints retrieve documents from an index. Their returned documents are considered relevant based on the settings of the index and the provided search parameters. -The search endpoints permit to retrieve documents within an index that are the most relevant given a set of parameters forming a search query. +## 2. Motivation +N/A -### 1.2. Explanation +## 3. Functional Specification -Meilisearch exposes 2 routes to perform searches: +Meilisearch exposes 2 routes to perform search requests: - GET `indexes/:index_uid/search` - POST `indexes/:index_uid/search` - 🔴 If the index does not exist, the API returns an [index_not_found](0061-error-format-and-definitions.md#index_not_found) error. 
-If the instance is secured by a master-key, the auth layer returns the following errors: +If a master key is used to secure a Meilisearch instance, the auth layer returns the following errors: - 🔴 Accessing these routes without the `Authorization` header returns a [missing_authorization_header](0061-error-format-and-definitions.md#missing_authorization_header) error. -- 🔴 Accessing this route with a key that does not have permissions (i.e. other than the master-key) returns an [invalid_api_key](0061-error-format-and-definitions.md#invalid_api_key) error. +- 🔴 Accessing these routes with a key that does not have permissions (i.e. other than the master key) returns an [invalid_api_key](0061-error-format-and-definitions.md#invalid_api_key) error. `POST` HTTP verb errors: @@ -31,23 +29,26 @@ If the instance is secured by a master-key, the auth layer returns the following - 🔴 Sending an empty payload returns a [missing_payload](0061-error-format-and-definitions.md#missing_payload) error. - 🔴 Sending an invalid JSON payload returns a [malformed_payload](0061-error-format-and-definitions.md#malformed_payload) error. -#### 1.2.1. Search payload parameters - -| Field | Type | Required | -|-------------------------|---------------------------|----------| -| q | String | False | -| filter | Array of String - String | False | -| sort | Array of String - String | False | -| facetsDistribution | Array of String - String | False | -| limit | Integer | False | -| offset | Integer | False | -| attributesToRetrieve | Array of String - String | False | -| attributesToHighlight | Array of String - String | False | -| attributesToCrop | Array of String - String | False | -| cropLength | Integer | False | -| matches | Boolean | False | - -##### 1.2.1.1 `q` +### 3.1. 
Search Payload Parameters + +| Field | Type | Required | +|-------------------------------------------------------|---------------------------|----------| +| [`q`](#311-q) | String | False | +| [`filter`](#312-filter) | Array of String - String | False | +| [`sort`](#313-sort) | Array of String - String | False | +| [`facetsDistribution`](#314-facetsdistribution) | Array of String - String | False | +| [`limit`](#315-limit) | Integer | False | +| [`offset`](#316-offset) | Integer | False | +| [`attributesToRetrieve`](#317-attributestoretrieve) | Array of String - String | False | +| [`attributesToHighlight`](#318-attributestohighlight) | Array of String - String | False | +| [`highlightPreTag`](#319-highlightpretag) | String | False | +| [`highlightPostTag`](#3110-highlightposttag) | String | False | +| [`attributesToCrop`](#3111-attributestocrop) | Array of String - String | False | +| [`cropLength`](#3112-croplength) | Integer | False | +| [`cropMarker`](#3113-cropmarker) | String | False | +| [`matches`](#3114-matches) | Boolean | False | + +#### 3.1.1. `q` - Type: String - Required: False @@ -57,13 +58,15 @@ If the instance is secured by a master-key, the auth layer returns the following - 🔴 Sending a value with a different type than `String` or `null` for `q` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error. -> When q isn't specified, Meilisearch performs a **placeholder search**. A placeholder search returns all searchable documents in an index, modified by any search parameters used and sorted by that index's custom ranking rules. If the index has no sort or custom ranking rules, the results are returned in the order of their internal database position. +> When q isn't specified, Meilisearch performs a **placeholder search**. A placeholder search returns all searchable documents in an index, modified by any search parameters used and sorted by that index's custom ranking rules. 
If the index has no sort search parameter or custom ranking rules, the results are returned in the order of their internal database position. > Meilisearch only considers the first ten words of any given search query to deliver a fast search-as-you-type experience. -> `q` supports [Phrase Query](0043-phrase-query.md) expression. +> `q` supports the [Phrase Query](0043-phrase-query.md) expression. -##### 1.2.1.2 `filter` +- 🔴 Sending a value with a different type than `String` or `null` for `q` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error. + +#### 3.1.2. `filter` - Type: Array of String (POST) | String (POST/GET) - Required: False @@ -77,15 +80,70 @@ Attributes used as filter criteria must be added to the `filterableAttributes` l - 🔴 Sending an invalid syntax for `filter` returns an [invalid_filter](0061-error-format-and-definitions.md#invalid_filter) error. - 🔴 Sending a field not defined as a `filterableAttributes` for `filter` returns an [invalid_filter](0061-error-format-and-definitions.md#invalid_filter) error. -> See [Filter And Facet Behavior](0027-filter-and-facet-behavior.md) +##### 3.1.2.1. Supported Operators And Syntaxes + +###### 3.1.2.1.1. Supported Operators + +- `<`, `<=`, `>`, `>=`, `TO`: only operate on number values. Meilisearch returns only documents that have numbers in this field. +- `=`, `!=`/`NOT`: operate on string and number values. Meilisearch returns only documents that have numbers, strings, or arrays of strings in this field. +- `AND`/`OR`: combine several operations. + +`filter` cannot operate on `null` or on arrays of "undefined elements" (e.g. an array of `null`). + +###### 3.1.2.1.2. Supported Syntaxes + +Three syntaxes are accepted for the `filter` parameter during search: `String syntax`, `Array syntax` and `Mixed syntax`. + +**String syntax** + +The string syntax uses operators combined with parentheses to express a search filter.
+ +Example: +```json +{ + "filter": "(genres = Comedy OR genres = Romance) AND director = 'Mati Diop'" +} +``` + +**Array syntax** + +The array syntax uses nested arrays to express logical connectives. + +- Inner array elements are connected by an `OR` operator (e.g. [["genres:Comedy", "genres:Romance"]]). +- Outer array elements are connected by an `AND` operator (e.g. ["genres:Romance", "director:Mati Diop"]). + +Example: +```json +{ + "filter": [["genres = Comedy", "genres = Romance"], "director = 'Mati Diop'"] +} +``` + +**Mixed syntax** + +The mixed syntax can mix string and array syntaxes. + +Let's say that we want to translate: +```json +{ + "filter": "((genres = Comedy AND genres = Romance) OR genres = Action) AND director != 'Mati Diop'" +} +``` +Example: +```json +{ + "filter": [["genres = Comedy AND genres = Romance", "genres = Action"], "NOT director = 'Mati Diop'"] +} +``` +> Note that string values that are longer than a single word need to be enclosed in quotes. `"director = Mati Diop"` will lead to a parsing error. The valid syntax is `"director = 'Mati Diop'"`. -##### 1.2.1.3 `sort` +#### 3.1.3. `sort` - Type: Array of String (POST) | String (GET) - Required: False - Default: `[]|null` -`sort` contains a sort expression written as a string or an array of strings. It permits to sorts search results at query time according to the specified attributes and indicated order. +`sort` contains a sort expression written as a string or an array of strings. It sorts the search results at query time according to the specified attributes and indicated order. Attributes used as sort criteria must be added to the `sortableAttributes list of an index settings. See [Sortable Attributes Setting API](0123-sortable-attributes-setting-api.md). @@ -95,7 +153,7 @@ Attributes used as sort criteria must be added to the `sortableAttributes list o > See [Sort](0055-sort.md) -##### 1.2.1.4 `facetsDistribution` +#### 3.1.4.
`facetsDistribution` - Type: Array of String (POST) | String (GET) - Required: False @@ -108,26 +166,24 @@ It returns the number of documents matching the current search query for each sp This parameter can take two values: - An array of attributes: `facetsDistribution=["attributeA", "attributeB", …]` -- An asterisk `"*"` — this returns a count for all facets present in `filterableAttributes` +- A wildcard `"*"` — this returns a count for all facets present in `filterableAttributes` Attributes used in `facetsDistribution` must be added to the `filterableAttributes` list of an index settings. See [Filterable Attributes Setting API](0123-filterable-attributes-setting-api.md). - 🔴 Sending a value with a different type than `Array of String`(POST), `String`(GET) or `null` for `facetsDistribution` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error. - 🔴 Sending a field not defined as a `filterableAttributes` for `facetsDistribution` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error. -> See [Filter And Facet Behavior](0027-filter-and-facet-behavior.md) - -##### 1.2.1.5 `limit` +#### 3.1.5. `limit` - Type: Integer - Required: False - Default: `20` -Sets the maximum number of documents to be returned by the current search query. +Sets the maximum number of documents to be returned for the search query. -- 🔴 Sending a value with a different type than `Integer` or `null` for `limit` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error. +- 🔴 Sending a value with a different type than `Integer` for `limit` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error. -##### 1.2.1.6 `offset` +#### 3.1.6. `offset` - Type: Integer - Required: False @@ -135,129 +191,494 @@ Sets the maximum number of documents to be returned by the current search query. Sets the starting point in the search results, effectively skipping over a given number of documents. 
-- 🔴 Sending a value with a different type than `Integer` or `null` for `offset` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error. +- 🔴 Sending a value with a different type than `Integer` for `offset` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error. -##### 1.2.1.7 `attributesToRetrieve` +#### 3.1.7. `attributesToRetrieve` - Type: Array of String (POST) | String (GET) - Required: False -- Default: `[]|null` +- Default: `["*"]`, meaning all the attributes Configures which attributes will be retrieved in the returned documents. -If no value is specified, `attributesToRetrieve` uses the `displayedAttributes` list, which by default contains all attributes found in the documents. +If no value is specified, the default value of `attributesToRetrieve` is used (`["*"]`). This corresponds to the `displayedAttributes` index setting, which by default contains all attributes found in the documents. -> If an attribute has been removed from `displayedAttributes` index settings, `attributesToRetrieve` will silently ignore it and the field will not appear in the returned documents. +> If an attribute is missing from `displayedAttributes` index setting, `attributesToRetrieve` silently ignore it, and the field doesn't appear in the returned search results. - 🔴 Sending a value with a different type than `Array of String`(POST), `String`(GET) or `null` for `attributesToRetrieve` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error. -##### 1.2.1.8 `attributesToHighlight` +#### 3.1.8. `attributesToHighlight` -- Type: Array[String](POST)|String(GET) +- Type: Array of String (POST) | String(GET) - Required: False - Default: `[]|null` -Highlights matching query terms in the specified attributes by enclosing them in `` tags. +Configures which fields may have highlighted parts, given that they match the requested query terms (i.e. the terms in the [`q`](#311-q) search parameter). 
Pre/post highlighting tags are applied around each word corresponding to a query term. + +If `attributesToHighlight` is present in the search query, the search results will include a `_formatted` object containing the attributes and their highlighted parts. For more details regarding the `_formatted` behavior, see the [3.2.1.1.2. `_formatted`](#32112-formatted) section. -When this parameter is set, returned documents include a `_formatted` object containing the highlighted terms. +If `"*"` is provided as a value (`attributesToHighlight=["*"]`), all the attributes present in the `displayedAttributes` setting will be highlighted. -If `"*"` is provided as a value: `attributesToHighlight=["*"]` all the attributes present in `attributesToRetrieve` will be assigned to `attributesToHighlight`. +Highlighted parts are surrounded by the values of the [`highlightPreTag`](#319-highlightpretag) and [`highlightPostTag`](#3110-highlightposttag) parameters. + +`attributesToHighlight` only works on values of the following types: `string`, `number`, `array`, `object`. When highlighted, number attributes are converted to strings. - 🔴 Sending a value with a different type than `Array[String]`(POST), `String`(GET) or `null` for `attributesToHighlight` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error. -> See [_Formatted Field Behavior](0039-_formatted-field-behavior_.md) +##### 3.1.8.1. searchableAttributes + +Attributes not defined in the `searchableAttributes` index setting are also highlighted if assigned to `attributesToHighlight`. + +##### 3.1.8.2. stopWords + +Terms defined in the `stopWords` index setting are also highlighted if matched. + +##### 3.1.8.3. Tokenizer Separators + +Tokenizer separators are not highlighted. + +##### 3.1.8.4. synonyms + +Synonyms are also highlighted. + +#### 3.1.9. `highlightPreTag` + +- Type: String +- Required: False +- Default: `""` + +Specifies the string to put **before** every highlighted query term.
+ +This parameter is applied to the fields configured in `attributesToHighlight`. If there are none, this parameter has no effect. See [3.1.8. `attributesToHighlight`](#318-attributestohighlight) section. + +- 🔴 Sending a value with a different type than `String` for `highlightPreTag` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error. -##### 1.2.1.9 `attributesToCrop` +If `attributesToHighlight` is omitted while `highlightPreTag` is specified, there is no error. + +#### 3.1.10. `highlightPostTag` + +- Type: String +- Required: False +- Default: `""` + +Specifies the string to put **after** every highlighted query term. + +This parameter is applied to the fields configured in `attributesToHighlight`. If there are none, this parameter has no effect. See [3.1.8. `attributesToHighlight`](#318-attributestohighlight) section. + +- 🔴 Sending a value with a different type than `String` for `highlightPostTag` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error. + +If `attributesToHighlight` is omitted while `highlightPostTag` is specified, there is no error. + +#### 3.1.11. `attributesToCrop` - Type: Array[String]|String - Required: False - Default: `[]|null` -Crops the selected attributes' values in the returned results to the length indicated by the `cropLength` parameter. +Defines document attributes to be cropped. Cropped attributes have their values shortened around query terms. + +If `attributesToCrop` is present in the search query, the search results will include a `_formatted` object containing the attributes and their cropped parts. For more details regarding the `_formatted` behavior, see the [3.2.1.1.2. `_formatted`](#32112-formatted) section. -When this parameter is set, returned documents include a `_formatted` object containing the cropped terms. +If `"*"` is provided as a value (`attributesToCrop=["*"]`), all the attributes present in the `displayedAttributes` setting will be cropped.
-Optionally, indicating a custom crop length for any of the listed attributes is possible: `attributesToCrop=["attributeNameA:25", "attributeNameB:150"]`. The custom crop length set in this way has priority over the `cropLength` parameter. +The number of words contained in the cropped value is defined by the `cropLength` parameter. See [3.1.12. `cropLength`](#3112-croplength) section. -Instead of supplying individual attributes, it is possible to provide `["*"]` as a value: `attributesToCrop=["*"]`. This will crop the values of all attributes present in `attributesToRetrieve`. +The value of `cropLength` can be customized per attribute. See [3.1.12.1. Custom `cropLength` Defined Per Attribute](#31121-custom-croplength-defined-per-attribute) section. + +The engine adds a marker by default in front of and/or behind the part selected by the cropper. This marker is customizable. See [3.1.13. `cropMarker`](#3113-cropmarker) section. - 🔴 Sending a value with a different type than `Array[String]`(POST), `String`(GET) or `null` for `attributesToCrop` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error. -> See [_Formatted Field Behavior](0039-_formatted-field-behavior_.md) +##### 3.1.11.1. searchableAttributes + +Attributes configured in `attributesToCrop` are cropped even if not present in the `searchableAttributes` index setting. + +##### 3.1.11.2. stopWords + +Terms defined in the `stopWords` index setting are counted as words regarding `cropLength`. + +##### 3.1.11.3. Tokenizer Separators + +Tokenizer separators aren't counted as words regarding `cropLength`. + +#### 3.1.12. `cropLength` - Type: Integer - Required: False -- Default: `200` +- Default: `10` + +Sets the total number of **words** to keep for the cropped part of an attribute specified in the `attributesToCrop` parameter. It means that if `10` is set for `cropLength`, the cropped part returned in `_formatted` will only be 10 words long.
+ +This parameter is applied to the fields from `attributesToCrop`. If there are none, this parameter has no effect. See [3.1.11. `attributesToCrop`](#3111-attributestocrop) section. + +Sending a `0` value deactivates the cropping unless a custom crop length is defined for an attribute inside `attributesToCrop`. + +- 🔴 Sending a value with a different type than `Integer` for `cropLength` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error. + +##### 3.1.12.1. Custom `cropLength` Defined Per Attribute + +Optionally, indicating a custom crop length for any of the listed attributes is possible: + +`"attributesToCrop":["attributeNameA:15", "attributeNameB:30"]` + +A custom crop length set in this way has priority over the `cropLength` parameter. + +##### 3.1.12.2. Examples + +###### 3.1.12.2.1. Extending around + +Given an attribute defined in `attributesToCrop` containing: + +`"In his ravenous hatred he found no peace, and with boiling blood he scoured the umbral plains, seeking vengeance against the dark lords who had robbed him."` + +With `cropLength` defined as `5` and `q` defined as `boiling blood`, the cropped value will be: + +`"…and with boiling blood he…"` + +Each cropped query term counts as a word regarding `cropLength`. + +Sending more query terms than the `cropLength` value has no impact. The cropped part will contain the `cropLength` number of words. + +###### 3.1.12.2.2. Keeping a phrase context + +After Meilisearch has chosen the best possible match window (some number of words < `cropLength`), it will add words from before or after the match window until the total number is equal to `cropLength`. In doing so, it will attempt to add context to the match window by choosing words from the same sentence(s) where the match window occurs. + +For instance, for the matching word `Split` the text: + +`"Natalie risk her future. Split The World is a book written by Emily Henry.
I never read it."` + +will be cropped like: -Configures the number of characters to keep on each side of the matching query term when using the `attributesToCrop` parameter. +`…Split The World is a book written by Emily Henry…` -If `attributesToCrop` is not configured, `cropLength` has no effect on the returned results. +and not like: -- 🔴 Sending a value with a different type than `Integer` or `null` for `cropLength` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error. +`Natalie risk her future. Split The World is a book…` -##### 1.2.1.11 `matches` +#### 3.1.13. `cropMarker` + +- Type: String +- Required: False +- Default: `"…"` (U+2026) + +Sets which string to add before and/or after the cropped text. See [3.1.11. `attributesToCrop`](#3111-attributestocrop) section. + +The specified crop marker is applied by following the rules outlined in section [3.1.13.1. Applying `cropMarker`](#31131-applying-cropmarker). + +Setting `cropMarker` to `""` or `null` means that no marker is applied to the cropped part. + +This parameter is applied to the fields configured in `attributesToCrop`. If there are none, this parameter has no effect. See [3.1.11. `attributesToCrop`](#3111-attributestocrop) section. + +- 🔴 Sending a value with a different type than `String` or `null` for `cropMarker` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error. + +##### 3.1.13.1. Applying `cropMarker` + +###### 3.1.13.1.1. Matched Part To Be Cropped + +The cropping algorithm tries to match the window with the highest density of query terms within the `cropLength` limit. + +The cropping algorithm tries to find the crop window that contains the most relevant matches. + +1.
That has the highest count of unique matches + +For example, for the query terms `split the world`, the interval `the split the split the` has `5` matches but only `2` unique matches (`1` for `split` and `1` for `the`), whereas the interval `split of the world` has `3` matches and `3` unique matches. So the interval `split of the world` is considered better. + +2. That has the minimum distance between matches + +For example, for the query terms `split the world`, the interval `split of the world` has a distance of `3` (`2` between `split` and `the`, and `1` between `the` and `world`), whereas the interval `split the world` has a distance of `2`. So the interval `split the world` is considered better. + +3. That has the highest count of ordered matches + +For example, for the query terms `split the world`, the interval `the world split` has `2` ordered words, whereas the interval `split the world` has `3`. So the interval `split the world` is considered better. + +Only one cropped part from an attribute is returned. + +If no part is found when selecting a part to be cropped, the returned value in `_formatted` will start at the beginning of the attribute and include a number of words equal to `cropLength`. + +###### 3.1.13.1.2. Positioning Markers + +If the cropped part has been matched against query terms and contains the beginning of the attribute to be cropped, the `cropMarker` is not placed to the left of the cropped part. + +If the cropped part has been matched against query terms and contains the end of the attribute to be cropped, the `cropMarker` is not placed to the right of the cropped part. + +#### 3.1.14. `matches` - Type: Boolean - Required: False - Default: `false` -Adds a `_matchesInfo` object to the search response that contains the location of each occurrence of queried terms across all fields. This is useful when more control is needed than offered by the built-in highlighting/cropping features.
+Adds a `_matchesInfo` object to the search response that contains the location of each occurrence of queried terms across all fields. The given positions are in bytes. + +It's useful when more control is needed than offered by the built-in highlighting/cropping features. - 🔴 Sending a value with a different type than `Boolean` or `null` for `matches` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error. -#### 1.2.2. Search response +### 3.2. Search Response Properties -| Field | Type | Required | -|-------------------------|------------------------------|----------| -| hits | Array[Hit] | True | -| limit | Integer | True | -| offset | Integer | True | -| nbHits | Integer | True | -| exhaustiveNbHits | Boolean | True | -| facetsDistribution | Object | False | -| exhaustiveFacetsCount | Boolean | False | -| processingTimeMs | Integer | True | -| query | String | True | +| Field | Type | Required | +|-------------------------------------------------------|------------------------------|----------| +| [`hits`](#321-hits) | Array[Hit] | True | +| [`limit`](#322-limit) | Integer | True | +| [`offset`](#323-offset) | Integer | True | +| [`nbHits`](#324-nbhits) | Integer | True | +| [`exhaustiveNbHits`](#325-exhaustivenbhits) | Boolean | True | +| [`facetsDistribution`](#326-facetsdistribution) | Object | False | +| [`exhaustiveFacetsCount`](#327-exhaustivefacetscount) | Boolean | False | +| [`processingTimeMs`](#328-processingtimems) | Integer | True | +| [`query`](#329-query) | String | True | -##### 1.2.2.1 `hits` +#### 3.2.1. `hits` - Type: Array[Hit] - Required: True -Results of the query as an array of documents. +Results of the search query as an array of documents. + +> Hit object represents a matched document as a search result. + +> The search parameters `attributesToRetrieve` influence the returned payload for a hit. See [3.1.7. `attributesToRetrieve`](#317-attributestoretrieve) section. 
+ +A search result can contain special properties. See [3.2.1.1. `hit` Special Properties](#3211-hit-special-properties) section. -> The search parameters `attributesToRetrieve` influence the returned payload for a document as a search result. See 1.2.1.7 `attributesToRetrieve` section. +##### 3.2.1.1. `hit` Special Properties -> A Hit object that represents a document within the search results can host special attributes. See 1.2.2.9 `hits` special fields section. +| Field | Type | Required | +|--------------------------------------|-------------|----------| +| [`_geoDistance`](#32111-geodistance) | Integer | False | +| [`_formatted`](#32112-formatted) | Object | False | +| [`_matchesInfo`](#32113-matchesinfo) | Object | False | -##### 1.2.2.2 `limit` +###### 3.2.1.1.1. `_geoDistance` + +- Type: Integer +- Required: False + +Search queries using `_geoPoint` return a `_geoDistance` field containing the distance in meters between the document `_geo` coordinates and the specified `_geoPoint`. + +> See [GeoSearch](0059-geo-search.md) + +###### 3.2.1.1.2. `_formatted` + +- Type: Object +- Required: False + +`_formatted` is an object returned in the search response only if at least one of the following parameters has been set in the search query: +- `attributesToHighlight` +- `attributesToCrop` + +If `attributesToHighlight` and `attributesToCrop` are not set, `_formatted` is not returned. + +This `_formatted` object will be present in each returned document in the `hits` field. + +Example: + +```json +{ + "attributesToCrop": ["title"] +} +``` + +```json +{ + "hits": [ + { + "id": 2, + "title": "Pride and Prejudice", + "_formatted": { + "id": "2", + "title": "Pride and Prejudice" + } + }, + { + "id": 456, + "title": "Le Petit Prince", + "_formatted": { + "id": "456", + "title": "Le Petit Prince" + } + } + ], + ... +} +``` + +Which attributes are present in `_formatted`?
+ +*Remember the main rule: `_formatted` is only present if `attributesToHighlight` or `attributesToCrop` is set.* + +The `_formatted` object contains attributes coming from the original document, depending on the parameters the users set during the search query. Indeed, **`_formatted` contains all the attributes present in `attributesToRetrieve`, `attributesToHighlight`, and `attributesToCrop` combined**. + +Knowing the default value of `attributesToRetrieve` is `["*"]` (so all the attributes present in `displayedAttributes`), if no `attributesToRetrieve` are set in the search query, `_formatted` will return all the `displayedAttributes`. + +Returning attributes in the `_formatted` object does not mean these attributes will be necessarily highlighted or cropped, see the next point. + +Which attributes are highlighted or cropped in `_formatted`? + +No matter which attributes are retrieved in `_formatted` (according to the previous section "Which attributes are present in `_formatted`?"): +- Only the attributes present in `attributesToHighlight` are highlighted. +- Only the attributes present in `attributesToCrop` are cropped. +- Attributes present in both are cropped and highlighted at the same time. + +Some edge cases: +- If cumulated fields in `attributesToHighlight` and `attributesToCrop` resolve to only having non-existent fields, `_formatted` is not returned. + +Some examples: +*The examples work the same with `attributesToCrop`* + +Example 1: + +```json +{ + "q": "t", + "attributesToHighlight": ["title"] +} +``` + +```json +{ + "hits": [ + { + "id": 1, + "title": "The Hobbit", + "author": "J. R. R. Tolkien", + "_formatted": { + "id": "1", + "title": "The Hobbit", + "author": "J. R. R. Tolkien" + } + } + ], + ... +} +``` +-> All the attributes (so `id`, `title` and `author`) are returned in `_formatted` because by default `attributesToRetrieve` is set to `["*"]`. +-> Only `title` is highlighted. 
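The presence rule stated in this section (the attributes in `_formatted` are the union of `attributesToRetrieve`, `attributesToHighlight`, and `attributesToCrop`) can be modelled with a small Python sketch. This is an illustrative client-side model only, not Meilisearch's implementation: the function name `formatted_attributes` and its parameters are hypothetical, and treating "non-existent fields" as "fields absent from `displayedAttributes`" is an assumption made for the example.

```python
from typing import Iterable, List, Optional


def formatted_attributes(
    displayed: Iterable[str],
    to_retrieve: Iterable[str] = ("*",),
    to_highlight: Iterable[str] = (),
    to_crop: Iterable[str] = (),
) -> Optional[List[str]]:
    """Return the attribute names expected in `_formatted`, or None if it is absent."""
    displayed_set = set(displayed)

    def expand(attrs: Iterable[str]) -> set:
        # "*" expands to every displayed attribute; unknown names are dropped
        # (assumption: "non-existent" means "not in displayedAttributes").
        names = set(attrs)
        return set(displayed_set) if "*" in names else names & displayed_set

    formatting = expand(to_highlight) | expand(to_crop)
    if not formatting:
        # Neither parameter is set, or they only name non-existent fields.
        return None
    return sorted(expand(to_retrieve) | formatting)
```

With `displayed = ["id", "title", "author"]`, calling `formatted_attributes(displayed, to_retrieve=["author"], to_highlight=["title"])` yields `["author", "title"]`, which mirrors the combination rule; which of those attributes are actually highlighted or cropped is a separate question covered by the text.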
+ +Example 2: + +```json +{ + "q": "t", + "attributesToHighlight": ["*"] +} +``` + +```json +{ + "hits": [ + { + "id": 1, + "title": "The Hobbit", + "author": "J. R. R. Tolkien", + "_formatted": { + "id": "1", + "title": "The Hobbit", + "author": "J. R. R. Tolkien" + } + } + ], + ... +} +``` +-> `id`, `title` and `author` are returned in `_formatted` because `attributesToHighlight` is set to `["*"]` (and `attributesToRetrieve` defaults to `["*"]` as well). +-> Both `title` and `author` are highlighted because `attributesToHighlight` is set to `["*"]`. + +Example 3: + +```json +{ + "q": "t", + "attributesToRetrieve": ["author"], + "attributesToHighlight": ["title"] +} +``` + +```json +{ + "hits": [ + { + "author": "J. R. R. Tolkien", + "_formatted": { + "title": "The Hobbit", + "author": "J. R. R. Tolkien" + } + } + ], + ... +} +``` +-> Only `author` is returned at the root of the document because it is the only attribute defined in `attributesToRetrieve`. +-> Only `author` and `title` are returned in `_formatted` because they are the union of `attributesToRetrieve` and `attributesToHighlight`. +-> Only `title` is highlighted because it is the only attribute defined in `attributesToHighlight`. + +Example 4: + +```json +{ + "q": "t", + "attributesToRetrieve": [], + "attributesToHighlight": ["*"] +} +``` + +```json +{ + "hits": [ + { + "_formatted": { + "id": "1", + "title": "The Hobbit", + "author": "J. R. R. Tolkien" + } + } + ], + ... +} +``` +-> No attributes are returned at the root of the document because `attributesToRetrieve` is set to `[]`. +-> All the attributes are returned in `_formatted` because `attributesToHighlight` is set to `["*"]`. +-> All the attributes are highlighted because `attributesToHighlight` is set to `["*"]`. + + +###### 3.2.1.1.3. `_matchesInfo` + +- Type: Object +- Required: False + +Contains the location of each occurrence of queried terms across all fields. The `_matchesInfo` object is added to a search result when the `matches` search parameter is set to `true`.
+ +The beginning of a matching term within a field is indicated by `start`, and its length by `length`. + +`start` and `length` are measured in bytes and not in number of characters. For example, `ü` takes two bytes but is a single character. + +> See [3.1.14. `matches`](#3114-matches) section. + +#### 3.2.2. `limit` - Type: Integer - Required: True -Gives the `limit` search parameter used for the query. +Returns the `limit` search parameter used for the query. -> See 1.2.1.5 `limit` section. +> See [3.1.5. `limit`](#315-limit) section. -##### 1.2.2.3 `offset` +#### 3.2.3. `offset` - Type: Integer - Required: True -Gives the `offset` search parameter used for the query. +Returns the `offset` search parameter used for the query. -> See 1.2.1.6 `offset` section. +> See [3.1.6. `offset`](#316-offset) section. -##### 1.2.2.4 `nbHits` +#### 3.2.4. `nbHits` - Type: Integer - Required: True Returns the total number of candidates for the search query. -##### 1.2.2.5 `exhaustiveNbHits` +#### 3.2.5. `exhaustiveNbHits` - Type: Boolean - Required: True @@ -266,17 +687,18 @@ Whether `nbHits` is exhaustive. > Always return `false`. -##### 1.2.2.6 `facetsDistribution` +#### 3.2.6. `facetsDistribution` - Type: Object - Required: False Added to the search response when `facetsDistribution` is set for a search query. It contains the number of remaining candidates for each specified facet in the `facetsDistribution` search parameter. -> See 1.2.1.4 `facetsDistribution` section. -> See [Filter And Facet Behavior](0027-filter-and-facet-behavior.md) +If a field distributed as a facet contains no value, it is returned as a `facetsDistribution` field with an empty object as value. + +> See [3.1.4. `facetsDistribution`](#314-facetsdistribution) section. -##### 1.2.2.7 `exhaustiveFacetsCount` +#### 3.2.7. `exhaustiveFacetsCount` - Type: Boolean - Required: False @@ -285,14 +707,14 @@ Whether `facetsDistribution` count is exhaustive.
The field `exhaustiveFacetsCou > Always returns `false`. -##### 1.2.2.7 `processingTimeMs` +#### 3.2.8. `processingTimeMs` - Type: Integer - Required: True -Processing time of the search query in milliseconds. +Processing time of the search query in **milliseconds**. -##### 1.2.2.8 `query` +#### 3.2.9. `query` - Type: String - Required: True @@ -300,47 +722,19 @@ Processing time of the search query in milliseconds. Query originating the response. Equals to the `q` search parameter. -> See 1.2.1.1 `q` section. - -##### 1.2.2.9 `hits` special fields - -| Field | Type | Required | -|-------------------------|-------------|----------| -| _geoDistance | Integer | False | -| _formatted | Object | False | -| _matchesInfo | Object | False | - -###### 1.2.2.9.1 `_geoDistance` - -- Type: Integer -- Required: False - -Search queries using `_geoPoint` will always include a `_geoDistance` field containing the distance in meters between the document location and the `_geoPoint`. - -> See [GeoSearch](0059-geo-search.md) - -###### 1.2.2.9.2 `_formatted` - -- Type: Object -- Required: False - -Object containing the cropped/highlighted values of the fields specified in `attributesToHighlight` or/and `attributesToCrop`. - -> See 1.2.1.8 `attributesToHighlight` section and 1.2.1.9 `attributesToCrop` section. - -###### 1.2.2.9.3 `_matchesInfo` - -- Type: Object -- Required: False - -Contains the location of each occurrence of queried terms across all fields. The `_matchesInfo` object is added to a document when `matches` search parameter is specified to true. - -The beginning of a matching term within a field is indicated by start, and its length by length. - -> See 1.2.1.11 `matches` section. +> See [3.1.1. `q`](#311-q) section. ## 2. Technical Details n/a ## 3. Future Possibilities -- Add dedicated errors to replace `bad_request` error. \ No newline at end of file + +- Add dedicated errors to replace `bad_request` error. + +### 3.1. 
Formatting Search Results + +- Replace the byte positions in `_matchesInfo` with character positions. It could also be a `mode` to choose `byte` or `char`. +- Move `attributesToHighlight`, `highlightPreTag`, `highlightPostTag`, `attributesToCrop`, `cropLength` and `cropMarker` into a `formatter` object. +- Add an option to only highlight complete query terms. +- Expose the `formatter` resource as an index setting. +- Highlight a phrase search as a single highlighted section.
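
Until `_matchesInfo` exposes character positions, a client can convert the byte-based `start` and `length` values itself. The following Python sketch shows one way to do so; the helper name `byte_span_to_char_span` is hypothetical, and it assumes the field value is UTF-8 encoded and that both offsets fall on character boundaries.

```python
from typing import Tuple


def byte_span_to_char_span(text: str, start: int, length: int) -> Tuple[int, int]:
    """Convert a byte-based (start, length) pair into a character-based one."""
    encoded = text.encode("utf-8")
    # Decode the prefix and the matched slice separately. This raises
    # UnicodeDecodeError if an offset splits a multi-byte character.
    char_start = len(encoded[:start].decode("utf-8"))
    char_length = len(encoded[start:start + length].decode("utf-8"))
    return char_start, char_length
```

For a field containing `Müller shop`, the word `shop` starts at byte 8 because `ü` takes two bytes, while its character position is 7.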