This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Improve formatted spec #146

Merged
merged 7 commits into from
May 13, 2022
Changes from 3 commits
198 changes: 183 additions & 15 deletions text/0118-search-api.md
@@ -197,11 +197,11 @@ Sets the starting point in the search results, effectively skipping over a given

- Type: Array of String (POST) | String (GET)
- Required: False
- Default: `[]|null`
- Default: `["*"]`, meaning all the attributes

Configures which attributes will be retrieved in the returned documents.

If no value is specified, `attributesToRetrieve` uses the `displayedAttributes` index setting, which by default contains all attributes found in the documents.
If no value is specified, the default value of `attributesToRetrieve` is used (`["*"]`). This corresponds to the `displayedAttributes` index setting, which by default contains all attributes found in the documents.

> If an attribute is missing from the `displayedAttributes` index setting, `attributesToRetrieve` silently ignores it, and the field doesn't appear in the returned search results.

@@ -215,9 +215,9 @@ If no value is specified, `attributesToRetrieve` uses the `displayedAttributes`

Configures which fields may have highlighted parts, given that they match the requested query terms (i.e. the terms in the [`q`](#311-q) search parameter). Pre/post highlighting tags are applied around each word corresponding to a query term.

Search results include a `_formatted` object containing the highlighted parts when this parameter is defined. See [3.2.1.1.2. `_formatted`](#32112-formatted) section.
If `attributesToHighlight` is present in the search query, the search results will include a `_formatted` object containing the attributes and their highlighted parts. For more details regarding the `_formatted` behavior, see the [3.2.1.1.2. `_formatted`](#32112-formatted) section.

If `"*"` is provided as a value: `attributesToHighlight=["*"]` all the attributes present in `displayedAttributes` setting will be automatically assigned to `_formatted`.
If `"*"` is provided as a value (`attributesToHighlight=["*"]`), all the attributes present in the `displayedAttributes` setting will be highlighted.
Member Author

@curquiza curquiza May 2, 2022

I removed every mention/explanation regarding the _formatted behavior from this part, except to redirect to the dedicated section, to avoid any confusion


Highlighted parts are surrounded by the [`highlightPreTag`](#319-highlightpretag) and [`highlightPostTag`](#3110-highlightposttag) parameters.
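
To illustrate how the pre/post tags wrap matched query terms, here is a minimal Python sketch. The `highlight` function, its parameter names, and the word-prefix matching rule are assumptions made for illustration (the prefix rule is inferred from the `<em>T</em>he Hobbit` example later in this spec); the engine's real tokenization and matching are more involved.

```python
import re


def highlight(text, query_terms, pre_tag="<em>", post_tag="</em>"):
    """Wrap word prefixes matching a query term with pre/post tags.

    Hypothetical helper: a simplified stand-in for the engine's highlighter.
    """
    # Longer terms first so that "the" wins over "t" when both are given.
    terms = sorted((t for t in query_terms if t), key=len, reverse=True)
    if not terms:
        return text
    # \b anchors the match at the start of a word; matching is case-insensitive.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + ")", re.IGNORECASE
    )
    return pattern.sub(lambda m: pre_tag + m.group(1) + post_tag, text)


title = highlight("The Hobbit", ["t"])
author = highlight("J. R. R. Tolkien", ["t"], pre_tag="<b>", post_tag="</b>")
```

With the tags left at `<em>`/`</em>`, `title` reproduces the highlighted value shown in the `_formatted` examples; `author` shows the effect of passing custom tags.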

@@ -275,16 +275,16 @@ This parameter is applied to the fields from `attributesToHighlight`. If there a

Defines document attributes to be cropped. Cropped attributes have their values shortened around query terms.

If `attributesToCrop` is present in the search query, the search results will include a `_formatted` object containing the attributes and their cropped parts. For more details regarding the `_formatted` behavior, see the [3.2.1.1.2. `_formatted`](#32112-formatted) section.

If `"*"` is provided as a value (`attributesToCrop=["*"]`), all the attributes present in the `displayedAttributes` setting will be cropped.

Comment on lines +278 to +281
Member Author

I followed the same "format" as for attributesToHighlight. First, mentioning and redirecting to the formatted section, then the * value behavior, and finally the available parameters.

The number of words contained in the cropped value is defined by the `cropLength` parameter. See [3.1.1.12. `cropLength`](#3112-croplength) section.

The value of `cropLength` can be customized per attribute. See [3.1.12.1. Custom `cropLength` Defined Per Cropped Attribute](#31121-custom-croplength-defined-per-attribute) section.

The engine adds a marker by default in front of and/or behind the part selected by the cropper. This marker is customizable. See [3.1.1.13. `cropMarker`](#31113-cropmarker) section.
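
The interplay between the crop window and the marker can be sketched as follows. This is a simplified illustration, not the engine's algorithm: the `crop` function, the whitespace-based word splitting, the centering of the window on the first matching word, and the default values `10` and `"…"` are all assumptions.

```python
def crop(text, query_term, crop_length=10, crop_marker="…"):
    """Keep at most `crop_length` words around the first matching word.

    Hypothetical helper: words are split on whitespace and a word matches
    when it starts with the query term (case-insensitive).
    """
    words = text.split()
    if len(words) <= crop_length:
        return text  # nothing to crop, so no marker is added
    term = query_term.lower()
    # Index of the first matching word; fall back to the start of the text.
    match = next(
        (i for i, w in enumerate(words) if term and w.lower().startswith(term)),
        0,
    )
    start = max(0, match - crop_length // 2)
    end = min(len(words), start + crop_length)
    start = max(0, end - crop_length)  # keep the window full near the end
    cropped = " ".join(words[start:end])
    # The marker goes in front of and/or behind the selected part.
    if start > 0:
        cropped = crop_marker + cropped
    if end < len(words):
        cropped = cropped + crop_marker
    return cropped


sentence = (
    "It is a truth universally acknowledged that a single man in possession "
    "of a good fortune must be in want of a wife"
)
middle = crop(sentence, "fortune", crop_length=5)
beginning = crop(sentence, "it", crop_length=4)
```

Here `middle` carries a marker on both sides because text was dropped before and after the window, while `beginning` only carries a trailing marker.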

Search results include a `_formatted` object containing the cropped attributes representation when this parameter is defined. See [3.2.1.1.2. `_formatted`](#32112-formatted) section.

If `"*"` is provided as a value: `attributesToCrop=["*"]` all the attributes present in the `displayedAttributes` setting will be automatically assigned to `_formatted`.

- 🔴 Sending a value with a different type than `Array[String]`(POST), `String`(GET) or `null` for `attributesToCrop` returns a [bad_request](0061-error-format-and-definitions.md#bad_request) error.

##### 3.1.11.2. searchableAttributes
@@ -460,14 +460,182 @@ Search queries using `_geoPoint` returns a `_geoDistance` field containing the d
- Type: Object
- Required: False

`_formatted` returns highlighted and cropped attributes specified in `attributesToHighlight` and/or `attributesToCrop` of a search result.
`_formatted` is an object returned in the search response, only if at least one of the following parameters has been set in the search query:
- `attributesToHighlight`
- `attributesToCrop`

If `attributesToHighlight` and `attributesToCrop` are not set, `_formatted` is not returned.

This `_formatted` object will be present in each returned document in the `hits` field.

Example:

```json
{
    "q": "",
    "attributesToCrop": ["title"]
}
```

```json
{
    "hits": [
        {
            "id": 2,
            "title": "Pride and Prejudice",
            "_formatted": {
                "id": "2",
                "title": "Pride and Prejudice"
            }
        },
        {
            "id": 456,
            "title": "Le Petit Prince",
            "_formatted": {
                "id": "456",
                "title": "Le Petit Prince"
            }
        }
    ],
    ...
}
```

Which attributes are present in `_formatted`?

The `_formatted` object contains attributes coming from the original document, depending on the parameters set in the search query: **the attributes present in `_formatted` are the union of the attributes present in `attributesToRetrieve`, `attributesToHighlight`, and `attributesToCrop`**.

Knowing that the default value of `attributesToRetrieve` is `["*"]` (i.e. all the attributes present in `displayedAttributes`), if `attributesToRetrieve` is not set in the search query, `_formatted` will return all the `displayedAttributes`.
Member

But that’s not the case 🤔
If you don’t send any attributesToRetrieve then your _formatted field should be empty no?

{ "q": "hello" }

Should not return any _formatted.

Member

@gmourier gmourier May 3, 2022

It's a clarification about the _formatted behavior when not sending attributesToRetrieve but sending attributesToHighlight or attributesToCrop (the previous sentence explains that _formatted will be present only if attributesToHighlight or attributesToCrop are specified).

Member Author

Will try to clarify this!

Member Author

I did an update, tell me if it's better


Returning attributes in the `_formatted` object does not mean these attributes will necessarily be highlighted or cropped; see the next point.
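
The rule above can be sketched in Python as a set union. The function name `formatted_attribute_names` and its parameters are hypothetical, `displayed` stands for an explicit list of `displayedAttributes`, and unknown attributes are dropped silently as an assumption mirroring the `attributesToRetrieve` behavior.

```python
def formatted_attribute_names(displayed, to_retrieve=None, to_highlight=None, to_crop=None):
    """Return the attribute names expected in `_formatted`, or None when
    `_formatted` is not returned at all. Hypothetical helper for illustration.
    """
    # `_formatted` only exists if highlighting or cropping was requested.
    if to_highlight is None and to_crop is None:
        return None

    def expand(attrs):
        # "*" stands for every displayed attribute; unknown names are ignored.
        if "*" in attrs:
            return set(displayed)
        return {a for a in attrs if a in displayed}

    retrieve = ["*"] if to_retrieve is None else to_retrieve  # default is ["*"]
    names = expand(retrieve) | expand(to_highlight or []) | expand(to_crop or [])
    # Keep the order of the displayed attributes for readability.
    return [a for a in displayed if a in names]


displayed = ["id", "title", "author"]
default_retrieve = formatted_attribute_names(displayed, to_highlight=["title"])
explicit_retrieve = formatted_attribute_names(
    displayed, to_retrieve=["author"], to_highlight=["title"]
)
```

`default_retrieve` contains every displayed attribute because `attributesToRetrieve` falls back to `["*"]`, whereas `explicit_retrieve` only contains `title` and `author`.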

- If `attributesToHighlight` and `attributesToCrop` are not set, `_formatted` is not returned.
Which attributes are highlighted or cropped in `_formatted`?

No matter how many attributes are retrieved in `_formatted` following the previous rule:
- Only the attributes present in `attributesToHighlight` are highlighted.
- Only the attributes present in `attributesToCrop` are cropped.
- Attributes present in both are cropped and highlighted at the same time.
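
As a rough sketch of the three rules above, the following Python snippet computes, for each attribute already selected for `_formatted`, whether it should be highlighted and/or cropped. The name `formatting_plan` and the shape of its result are assumptions made for illustration only.

```python
def formatting_plan(formatted_names, to_highlight=None, to_crop=None):
    """Map each `_formatted` attribute to the operations applied to it.

    Hypothetical helper: presence in `_formatted` does not imply formatting.
    """

    def selects(attrs, name):
        # An unset parameter selects nothing; "*" selects every attribute.
        return attrs is not None and ("*" in attrs or name in attrs)

    return {
        name: {
            "highlight": selects(to_highlight, name),
            "crop": selects(to_crop, name),
        }
        for name in formatted_names
    }


plan = formatting_plan(
    ["id", "title", "author"], to_highlight=["title"], to_crop=["*"]
)
```

In this `plan`, `title` is both cropped and highlighted, while `id` and `author` are only cropped.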

Some edge cases:
- If the cumulated fields in `attributesToHighlight` and `attributesToCrop` resolve to only non-existent fields, `_formatted` is not returned.
- If `attributesToRetrieve` is equal to `*` and `attributesToHighlight` or `attributesToCrop` is equal to `*`, `_formatted` is returned and contains the fields of the `displayedAttributes` setting, with highlights and crops computed on each received field.
- If `attributesToRetrieve` is equal to `*` and `attributesToHighlight` or `attributesToCrop` contains a set of fields, `_formatted` is returned and contains the fields of the `displayedAttributes` setting, but highlights and crops are computed only on the fields declared in `attributesToHighlight` or `attributesToCrop`.
- If a list of fields is defined for `attributesToRetrieve` and `attributesToHighlight` / `attributesToCrop` are equal to `*`, `_formatted` is returned and contains the fields of the `displayedAttributes` setting, with highlights and crops computed on each received field.
- If a list of fields is defined for `attributesToRetrieve` and `attributesToHighlight` / `attributesToCrop` contain a list of fields, `_formatted` is returned and contains the `attributesToRetrieve` fields plus the fields set in `attributesToHighlight` or `attributesToCrop`, with highlights and crops computed only for the fields defined in the `attributesToHighlight` / `attributesToCrop` parameters.

Some examples:
Member Author

@curquiza curquiza May 2, 2022

The examples are here to help, but do NOT replace the explanation: they are just here to illustrate the sentences above.
The sentences above should replace each point you wrote in the previous list (that I erased). Ensure this is the case 😇

Member

@gmourier gmourier May 3, 2022

IMO It's much clearer that way, I think it's inevitable to have examples even if I'd like to avoid them to make the specifications lighter. Thank you for this!

*The examples work the same with `attributesToCrop`*

Example 1:

```json
{
    "q": "t",
    "attributesToHighlight": ["title"]
}
```

```json
{
    "hits": [
        {
            "id": 1,
            "title": "The Hobbit",
            "author": "J. R. R. Tolkien",
            "_formatted": {
                "id": "1",
                "title": "<em>T</em>he Hobbit",
                "author": "J. R. R. Tolkien"
            }
        }
    ],
    ...
}
```
-> All the attributes (so `id`, `title` and `author`) are returned in `_formatted` because by default `attributesToRetrieve` is set to `["*"]`.
-> Only `title` is highlighted.

Example 2:

```json
{
    "q": "t",
    "attributesToHighlight": ["*"]
}
```

```json
{
    "hits": [
        {
            "id": 1,
            "title": "The Hobbit",
            "author": "J. R. R. Tolkien",
            "_formatted": {
                "id": "1",
                "title": "<em>T</em>he Hobbit",
                "author": "J. R. R. <em>T</em>olkien"
            }
        }
    ],
    ...
}
```
-> `id`, `title` and `author` are returned in `_formatted` because `attributesToHighlight` is set to `["*"]` (and `attributesToRetrieve` is also `["*"]` by default).
-> Both `title` and `author` are highlighted because `attributesToHighlight` is set to `["*"]`.

Example 3:

```json
{
    "q": "t",
    "attributesToRetrieve": ["author"],
    "attributesToHighlight": ["title"]
}
```

```json
{
    "hits": [
        {
            "author": "J. R. R. Tolkien",
            "_formatted": {
                "title": "<em>T</em>he Hobbit",
                "author": "J. R. R. Tolkien"
            }
        }
    ],
    ...
}
```
-> Only `author` is returned at the root of the document because it is the only attribute defined in `attributesToRetrieve`.
-> Only `author` and `title` are returned in `_formatted` because it contains the union of `attributesToRetrieve` and `attributesToHighlight`.
-> Only `title` is highlighted because it is the only attribute defined in `attributesToHighlight`.

Example 4:

```json
{
    "q": "t",
    "attributesToRetrieve": [],
    "attributesToHighlight": ["*"]
}
```

```json
{
    "hits": [
        {
            "_formatted": {
                "id": "1",
                "title": "<em>T</em>he Hobbit",
                "author": "J. R. R. <em>T</em>olkien"
            }
        }
    ],
    ...
}
```
-> No attributes are returned at the root of the document because `attributesToRetrieve` is set to `[]`.
-> All the attributes are returned in `_formatted` because `attributesToHighlight` is set to `["*"]`.
-> All the attributes are highlighted because `attributesToHighlight` is set to `["*"]`.


###### 3.2.1.1.3. `_matchesInfo`

@@ -566,4 +734,4 @@ n/a
- Move `attributesToHighlight`, `highlightPreTag`, `highlightPostTag`, `attributesToCrop`, `cropLength` and `cropMarker` into a `formatter` object.
- Add an option to only highlight complete query term.
- Expose the `formatter` resource as an index setting.
- Highlight a phrase search as a single highlighted section.