Index config
#
# Index config file for receiving logs in OpenTelemetry format.
# Link: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model.md
#
version: 0
index_id: otel-logs
doc_mapping:
  field_mappings:
    - name: timestamp
      type: i64
      fast: true
    - name: name
      type: text
      tokenizer: default
    - name: severity
      type: text
      tokenizer: raw
    - name: body
      type: text
      tokenizer: default
      record: position
    - name: attributes
      type: json
indexing_settings:
  timestamp_field: timestamp
search_settings:
  default_search_fields: [severity, body]
Create & Ingest some data
cargo r index create --index-config config/tutorials/otel-logs/index-config.yaml --config config/quickwit.yaml
cargo r index ingest --index otel-logs --config config/quickwit.yaml --input-path documents.json
with documents.json:
{"attributes":{"syslog":{"facility":"daemon","procid":7816,"version":2}},"body":"<26>2 2022-05-10T17:59:56.706Z for.us benefritz 7816 ID715 - A bug was encountered but not in Vector, which doesn't have bugs","name":"ID715","resource":{"host":{"hostname":"for.us"},"service":{"name":"benefritz"},"source_type":"demo_logs"},"severity":"ERROR","timestamp":1652205596}
{"attributes":{"syslog":{"facility":"daemon","procid":7816,"version":2}},"body":"<26>2 2022-05-10T17:59:56.706Z for.us benefritz 7816 ID715 - A bug was encountered but not in Vector, which doesn't have bugs","name":"ID715","resource":{"host":{"hostname":"for.us"},"service":{"name":"benefritz"},"source_type":"demo_logs"},"severity":"ERROR","timestamp":1652205596}
{"attributes":{"syslog":{"facility":"daemon","procid":7816,"version":2}},"body":"<26>2 2022-05-10T17:59:56.706Z for.us benefritz 7816 ID715 - A bug was encountered but not in Vector, which doesn't have bugs","name":"ID715","resource":{"host":{"hostname":"for.us"},"service":{"name":"benefritz"},"source_type":"demo_logs"},"severity":"ERROR","timestamp":1652205596}
Failed to search some docs
cargo r index search --index otel-logs --config ./config/quickwit.yaml --query "ERROR"
...
2022-05-10T20:09:59.490Z ERROR quickwit_search::fetch_docs: Error when fetching docs in splits. split_ids=["01G2QRHZ9AHCE25BYCZ8EAN2KW"] error=searcher-doc-async
Caused by:
0: An IO error occurred: 'trailing characters at line 1 column 59'
1: trailing characters at line 1 column 59
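The column in the error can be checked against the ingested data. The sketch below assumes the `attributes` value was stored as compact JSON with the same key order as in documents.json (an assumption, not something the logs confirm). `ATTRIBUTES_JSON` and `first_trailing_column` are hypothetical names introduced only for this check.

```rust
/// Compact JSON of the `attributes` field, copied from documents.json.
/// Assumption: the stored bytes match this text exactly.
const ATTRIBUTES_JSON: &str =
    r#"{"syslog":{"facility":"daemon","procid":7816,"version":2}}"#;

/// 1-based column of the first byte that would follow `json` on the same line.
/// Valid for single-line ASCII input, where bytes and columns coincide.
fn first_trailing_column(json: &str) -> usize {
    json.len() + 1
}
```

Under that assumption the value is 58 bytes long, so `first_trailing_column(ATTRIBUTES_JSON)` is 59, the column reported in the error. This would be consistent with the parser finishing the `attributes` object and then finding more bytes right after it.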
Investigation
Where is the error raised?
The error happens when tantivy tries to deserialize the JSON value.
It does not happen when the schema has only one field and that field is of JSON type.
I tried to simplify the index a bit, and it works, for example, with a simplified doc mapping.
Strangely, it keeps working if you add the field "name", but it breaks if you add the field "severity".
Something I don't understand about the serialization/deserialization of tantivy's FieldValue
I had a look at how tantivy serializes and deserializes the document FieldValue. One strange thing is that we don't write the byte length of the JSON. This means we expect serde_json to read bytes until the last byte... This works if the JSON field is the last field to read, but if there are other fields to read after the JSON, it breaks.
I'm almost sure my rationale is wrong somewhere. @fulmicoton where is the flaw in my rationale? :)
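To make the hypothesis above concrete, here is a toy sketch of the difference between writing a payload without its length and writing it with a length prefix. It is not tantivy's actual encoding: the field ids, the layout, and all function names are hypothetical, and it only illustrates the suspected failure mode, which may itself be wrong.

```rust
/// Toy field ids, hypothetical and unrelated to tantivy's real encoding.
const JSON_FIELD: u8 = 1;
const TEXT_FIELD: u8 = 2;

/// Writes `[id][payload]` with no length, so a reader cannot tell
/// where the payload ends.
fn write_unframed(out: &mut Vec<u8>, id: u8, payload: &[u8]) {
    out.push(id);
    out.extend_from_slice(payload);
}

/// Writes `[id][len: u32 little-endian][payload]`.
fn write_framed(out: &mut Vec<u8>, id: u8, payload: &[u8]) {
    out.push(id);
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
}

/// Reads the first unframed field: the payload is everything after the id,
/// including the bytes of any field written after it.
fn read_first_unframed(buf: &[u8]) -> (u8, &[u8]) {
    (buf[0], &buf[1..])
}

/// Reads the first framed field and returns (id, payload, remaining bytes).
fn read_first_framed(buf: &[u8]) -> (u8, &[u8], &[u8]) {
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&buf[1..5]);
    let len = u32::from_le_bytes(len_bytes) as usize;
    (buf[0], &buf[5..5 + len], &buf[5 + len..])
}
```

If a JSON payload is written with `write_unframed` and a text field is written after it, `read_first_unframed` hands the JSON parser the JSON bytes followed by the next field's bytes, which is the kind of input that produces a "trailing characters" error. With `write_framed`, `read_first_framed` returns exactly the JSON bytes and leaves the next field intact in the remainder.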