
Aborting aggregation because memory limit was exceeded #3837

Closed
yangshike opened this issue Sep 15, 2023 · 18 comments · Fixed by quickwit-oss/tantivy#2183
Labels
bug Something isn't working enhancement New feature or request high-priority

Comments

@yangshike
Contributor

yangshike commented Sep 15, 2023

The Quickwit Grafana plugin looks great, but there are some issues with its use:

Through the Grafana plugin I searched only one hour of data, around a few million documents, but the log volume query reported an error:

Failed to load log volume for this query
Internal error: Aborting aggregation because memory limit was exceeded. Limit: 500.00 MB, Current: 1.36 PB.

This happens because I added a condition to the Lucene query. Without the condition it works fine.

Other questions:
1. Can the level field and its values in the data source configuration be customized, so that I can distinguish other kinds of data rather than just generic log levels?

2. Can I use Quickwit as a data source to create reports such as line and pie charts?

3. Could Grafana support aggregations such as count and group by,
as well as comparison operators: >, >=, <, != and so on?

@yangshike yangshike added the enhancement New feature or request label Sep 15, 2023
@PSeitz
Contributor

PSeitz commented Sep 15, 2023

Can you share the aggregation query that is sent to the backend?

What condition did you add?

@yangshike
Contributor Author

yangshike commented Sep 15, 2023

In addition to the general logs, we also have some other data without these fields (error, info).
Another thing: the current plugin has a Message field name setting that can only be configured once. Could it support configuring more than one?

@yangshike
Contributor Author

yangshike commented Sep 15, 2023

A certain index has been running for some time, and I want to change one field to fast: true. Do I need to delete the index and rebuild it?

I ask because I got this error and then wanted to change this field to fast: true:
Internal error: (Internal error: An invalid argument was passed: 'Field "name" is not configured as fast field'.

@PSeitz
Contributor

PSeitz commented Sep 15, 2023

A certain index has been running for some time, and I want to change one field to fast: true. Do I need to delete the index and rebuild it?

I ask because I got this error and then wanted to change this field to fast: true: Internal error: (Internal error: An invalid argument was passed: 'Field "name" is not configured as fast field'.

Yes, currently you need to re-index everything.

2. Can I use Quickwit as a data source to create reports such as line and pie charts?
Yes, that's possible with the aggregations API.

On the error Aborting aggregation because memory limit was exceeded. Limit: 500.00 MB, Current: 1.36 PB, can you check the logs to see what query is sent to the backend (or check what the UI is sending)?

@yangshike
Contributor Author

yangshike commented Sep 18, 2023

On the error (log volume):
This query is sent to the backend:
"POST /api/v1/_elastic/_msearch?max_concurrent_shard_requests=256"

Aborting aggregation because memory limit was exceeded. Limit: 500.00 MB, Current: 1.36 PB
This error only occurs when using Grafana; in the Quickwit UI it is fine!

@PSeitz
Contributor

PSeitz commented Sep 19, 2023

On the error (log volume): this query is sent to the backend: "POST /api/v1/_elastic/_msearch?max_concurrent_shard_requests=256"

Can you provide the payload of the POST request and your index configuration?

@yangshike
Contributor Author

[19/Sep/2023:14:35:39 +0800] "POST /api/v1/_elastic/_msearch?max_concurrent_shard_requests=256 HTTP/1.1" 200 302 "" "Go-http-client/1.1" "" {"ignore_unavailable":true,"index":"qiniu_crm","search_type":"query_then_fetch"}\n{"aggs":{"2":{"aggs":{"3":{"date_histogram":{"field":"msg_time","fixed_interval":"60000ms","min_doc_count":0,"extended_bounds":{"min":1695094538615,"max":1695105338615}}}},"terms":{"field":"level","size":100,"order":{"_count":"desc"},"min_doc_count":0}}},"query":{"bool":{"filter":{"range":{"msg_time":{"gte":"2023-09-19T03:35:38.615Z","lte":"2023-09-19T06:35:38.615Z"}}}}},"size":0}\n

@yangshike
Contributor Author

doc_mapping:
  field_mappings:
    - name: msg_time
      type: datetime
      input_formats:
        - unix_timestamp
      output_format: unix_timestamp_millis
      stored: true
      indexed: true
      fast: true
      precision: milliseconds
    - name: content
      type: text
      tokenizer: chinese_compatible
      record: position
      stored: true
      indexed: true
      fast: true
    - name: level
      type: text
      stored: true
      indexed: true
      fast: true
    - name: server_ip
      type: text
      stored: true
      indexed: true
      fast: true
      tokenizer: raw
    - name: service_name
      type: text
      stored: true
      indexed: true
      fast: true
      tokenizer: raw
    - name: host_name
      type: text
      stored: true
      indexed: true
      fast: true
      tokenizer: raw
    - name: time
      type: datetime
      input_formats:
        - rfc3339
        - "%Y-%m-%d %H:%M:%S.%f"
      output_format: "%Y-%m-%d %H:%M:%S.%f"
  tag_fields: ["service_name"]
  timestamp_field: msg_time

search_settings:
  default_search_fields: [content]

indexing_settings:
  commit_timeout_secs: 10

retention:
  period: 180 days
  schedule: daily

@yangshike
Contributor Author

yangshike commented Sep 19, 2023

Searching only 5 minutes of data also triggers this error!

[19/Sep/2023:14:39:26 +0800] "POST /api/v1/_elastic/_msearch?max_concurrent_shard_requests=256 HTTP/1.1" 200 303 "" "Go-http-client/1.1" "" {"ignore_unavailable":true,"index":"qiniu_crm","search_type":"query_then_fetch"}\n{"aggs":{"2":{"aggs":{"3":{"date_histogram":{"field":"msg_time","fixed_interval":"1000ms","min_doc_count":0,"extended_bounds":{"min":1695105266331,"max":1695105566331}}}},"terms":{"field":"level","size":100,"order":{"_count":"desc"},"min_doc_count":0}}},"query":{"bool":{"filter":{"range":{"msg_time":{"gte":"2023-09-19T06:34:26.331Z","lte":"2023-09-19T06:39:26.331Z"}}}}},"size":0}\n

Failed to load log volume for this query
Internal error: Aborting aggregation because memory limit was exceeded. Limit: 500.00 MB, Current: 81.36 PB.
It's an occasional error, not every time.

Sometimes I get this error:

Internal error: Aborting aggregation because bucket limit was exceeded. Limit: 65000, Current: 102714.

@fulmicoton
Contributor

Here is the second one, indented:

{
  "aggs": {
    "2": {
      "aggs": {
        "3": {
          "date_histogram": {
            "field": "msg_time",
            "fixed_interval": "1000ms",
            "min_doc_count": 0,
            "extended_bounds": {
              "min": 1695105266331,
              "max": 1695105566331
            }
          }
        }
      },
      "terms": {
        "field": "level",
        "size": 100,
        "order": {
          "_count": "desc"
        },
        "min_doc_count": 0
      }
    }
  },
  "query": {
    "bool": {
      "filter": {
        "range": {
          "msg_time": {
            "gte": "2023-09-19T06:34:26.331Z",
            "lte": "2023-09-19T06:39:26.331Z"
          }
        }
      }
    }
  },
  "size": 0
}

@fulmicoton fulmicoton changed the title about grafana plugins Aborting aggregation because memory limit was exceeded Sep 19, 2023
@fulmicoton
Contributor

That should be 300 buckets, in a terms subaggregation...
This sounds like a bug? @PSeitz

@PSeitz
Contributor

PSeitz commented Sep 19, 2023

Searching only 5 minutes of data also triggers this error!

[19/Sep/2023:14:39:26 +0800] "POST /api/v1/_elastic/_msearch?max_concurrent_shard_requests=256 HTTP/1.1" 200 303 "" "Go-http-client/1.1" "" {"ignore_unavailable":true,"index":"qiniu_crm","search_type":"query_then_fetch"}\n{"aggs":{"2":{"aggs":{"3":{"date_histogram":{"field":"msg_time","fixed_interval":"1000ms","min_doc_count":0,"extended_bounds":{"min":1695105266331,"max":1695105566331}}}},"terms":{"field":"level","size":100,"order":{"_count":"desc"},"min_doc_count":0}}},"query":{"bool":{"filter":{"range":{"msg_time":{"gte":"2023-09-19T06:34:26.331Z","lte":"2023-09-19T06:39:26.331Z"}}}}},"size":0}\n

Failed to load log volume for this query: Internal error: Aborting aggregation because memory limit was exceeded. Limit: 500.00 MB, Current: 81.36 PB. It's an occasional error, not every time.

Thanks, that's helpful. I couldn't reproduce it locally so far. Can you edit the JSON payload and resend the request, setting "min_doc_count": 1 in the date_histogram, and post what's returned?

If you get bucket limit was exceeded, you can set size: 3 in terms.

"date_histogram": {
	"field": "msg_time",
	"fixed_interval": "1000ms",
	"min_doc_count": 1,
	"extended_bounds": {
		"min": 1695105266331,
		"max": 1695105566331
	}
}
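For reference, both suggested edits can be applied to the payload programmatically before resending it (a sketch in Python; the JSON below is the aggregation body from the earlier comment, and the key names "2" and "3" are the panel IDs Grafana generated):

```python
import json

# The aggregation part of the _msearch body posted above.
body = json.loads(
    '{"aggs":{"2":{"aggs":{"3":{"date_histogram":{"field":"msg_time",'
    '"fixed_interval":"1000ms","min_doc_count":0,'
    '"extended_bounds":{"min":1695105266331,"max":1695105566331}}}},'
    '"terms":{"field":"level","size":100,"order":{"_count":"desc"},'
    '"min_doc_count":0}}},"size":0}'
)

# Skip empty histogram buckets, as suggested above.
body["aggs"]["2"]["aggs"]["3"]["date_histogram"]["min_doc_count"] = 1

# If "bucket limit was exceeded" shows up, shrink the terms aggregation too.
body["aggs"]["2"]["terms"]["size"] = 3

print(json.dumps(body, indent=2))
```

The printed JSON can then be pasted back into the _msearch request body.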

Sometimes I get this error:

Internal error: Aborting aggregation because bucket limit was exceeded. Limit: 65000, Current: 102714.

It seems like the intermediate result after merging has too many buckets; we probably need to prune before counting when converting to the final result. I created a separate issue here: quickwit-oss/tantivy#2182

@fulmicoton fulmicoton added bug Something isn't working high-priority labels Sep 19, 2023
@PSeitz
Contributor

PSeitz commented Sep 20, 2023

Sometimes I get this error:

Internal error: Aborting aggregation because bucket limit was exceeded. Limit: 65000, Current: 102714.

Do you get this for the same request you posted?

@yangshike
Contributor Author

Yes, same query. Both of these occur occasionally.

@yangshike
Contributor Author

yangshike commented Sep 20, 2023

These are Grafana's logs.

I reinstalled Grafana and captured the logs again. When the error occurred, the status of this request was 400:

[20/Sep/2023:15:19:39 +0800] "POST /api/ds/query?ds_type=quickwit-quickwit-datasource&requestId=explore_left_logs_volume_0 HTTP/1.1" 400 176 "http://test123.cn/explore?left=%7B%22datasource%22:%22eb289c78-48e7-453b-9458-faadd67f9157%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22quickwit-quickwit-datasource%22,%22uid%22:%22eb289c78-48e7-453b-9458-faadd67f9157%22%7D,%22query%22:%22%22,%22alias%22:%22%22,%22metrics%22:%5B%7B%22id%22:%223%22,%22type%22:%22logs%22,%22settings%22:%7B%22limit%22:%22100%22%7D%7D%5D,%22bucketAggs%22:%5B%5D,%22timeField%22:%22msg_time%22%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D&orgId=1" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36" "" {"queries":[{"refId":"log-volume-A","query":"","metrics":[{"type":"count","id":"1"}],"timeField":"msg_time","bucketAggs":[{"id":"2","type":"terms","settings":{"min_doc_count":"0","size":"0","order":"desc","orderBy":"_count"},"field":"level"},{"id":"3","type":"date_histogram","settings":{"interval":"auto","min_doc_count":"0","trimEdges":"0"},"field":"msg_time"}],"datasource":{"type":"quickwit-quickwit-datasource","uid":"eb289c78-48e7-453b-9458-faadd67f9157"},"datasourceId":1,"intervalMs":60000,"maxDataPoints":1512}],"range":{"from":"2023-09-20T06:19:38.241Z","to":"2023-09-20T07:19:38.241Z","raw":{"from":"now-1h","to":"now"}},"from":"1695190778241","to":"1695194378241"}

@yangshike
Contributor Author

When no error is reported, the status is 200:

[20/Sep/2023:15:23:31 +0800] "POST /api/ds/query?ds_type=quickwit-quickwit-datasource&requestId=explore_left_logs_volume_0 HTTP/1.1" 200 137649 "http://test123.cn/explore?left=%7B%22datasource%22:%22eb289c78-48e7-453b-9458-faadd67f9157%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22quickwit-quickwit-datasource%22,%22uid%22:%22eb289c78-48e7-453b-9458-faadd67f9157%22%7D,%22query%22:%22%22,%22alias%22:%22%22,%22metrics%22:%5B%7B%22id%22:%223%22,%22type%22:%22logs%22,%22settings%22:%7B%22limit%22:%22100%22%7D%7D%5D,%22bucketAggs%22:%5B%5D,%22timeField%22:%22msg_time%22%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D&orgId=1" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36" "" {"queries":[{"refId":"log-volume-A","query":"","metrics":[{"type":"count","id":"1"}],"timeField":"msg_time","bucketAggs":[{"id":"2","type":"terms","settings":{"min_doc_count":"0","size":"0","order":"desc","orderBy":"_count"},"field":"level"},{"id":"3","type":"date_histogram","settings":{"interval":"auto","min_doc_count":"0","trimEdges":"0"},"field":"msg_time"}],"datasource":{"type":"quickwit-quickwit-datasource","uid":"eb289c78-48e7-453b-9458-faadd67f9157"},"datasourceId":1,"intervalMs":60000,"maxDataPoints":1512}],"range":{"from":"2023-09-20T06:23:30.565Z","to":"2023-09-20T07:23:30.565Z","raw":{"from":"now-1h","to":"now"}},"from":"1695191010565","to":"1695194610565"}

@yangshike
Contributor Author

By the way, a feature I'd very much like: highlighting of matches in log search results.

@PSeitz
Contributor

PSeitz commented Sep 20, 2023

Thanks, I could reproduce it locally.

The issue is a missing normalization from request values (ms) to fast field values (ns) when converting an intermediate result to the final result. This results in a computation that is wrong by a factor of 1_000_000.
The Histogram normalizes values to nanoseconds, to make user input like extended_bounds (ms precision) compatible with the values from the fast field (ns precision for the date type).
This normalization happens only for date type fields, as other field types don't have precision settings. In the query above the column_type parameter was empty, therefore normalization did not happen.

One field can have multiple field types associated with it, therefore we delay setting the column_type until working on the tantivy segment level.
In the case of empty results, as in the terms aggregation with min_doc_count: 0, there may be no column type set.
The actual root cause is a missing propagation of the column_type when merging two intermediate results where one has no column_type and the other has one.
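The factor-of-1_000_000 blowup can be sketched numerically (a standalone illustration in Python, not tantivy code; the bounds and interval are the values from the payload above):

```python
# Illustration: why a missing ms -> ns normalization inflates the
# histogram's bucket estimate by a factor of 1_000_000.

MS_TO_NS = 1_000_000

def bucket_count(min_val, max_val, interval):
    """Number of histogram buckets covering [min_val, max_val]."""
    return (max_val - min_val) // interval + 1

# Request values from the query above (milliseconds).
min_ms, max_ms, interval_ms = 1695105266331, 1695105566331, 1000

# Correct: bounds, fast-field values, and interval all normalized to ns.
ok = bucket_count(min_ms * MS_TO_NS, max_ms * MS_TO_NS, interval_ms * MS_TO_NS)

# Buggy: fast-field values stay in ns while the interval stays in ms,
# so the range looks a million times wider than it is.
bad = bucket_count(min_ms * MS_TO_NS, max_ms * MS_TO_NS, interval_ms)

print(ok)   # 301 buckets for the 5-minute window, as expected
print(bad)  # 300_000_001 buckets, tripping the memory/bucket limits
```

This matches the observation above that the 5-minute query should need about 300 buckets per level term, yet occasionally aborts with absurd memory estimates.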

PSeitz added a commit to quickwit-oss/tantivy that referenced this issue Sep 21, 2023
Fixes a computation issue of the number of buckets needed in the
DateHistogram.

This is due to a missing normalization from request values (ms) to fast field
values (ns), when converting an intermediate result to the final result.
This results in a wrong computation by a factor 1_000_000.
The Histogram normalizes values to nanoseconds, to make the user input like
extended_bounds (ms precision) and the values from the fast field (ns precision for date type) compatible.
This normalization happens only for date type fields, as other field types don't have precision settings.
The normalization does not happen due to a missing `column_type`, which is not
correctly passed after merging an empty aggregation (which does not have a `column_type` set) with a regular aggregation.

Another related issue: an empty aggregation, which will not have
`column_type` set, will not convert the result to a human-readable format.

This PR fixes the issue by:
- Limit the allowed field types of DateHistogram to DateType
- Instead of passing the column_type, which is only available on the segment level, we flag the aggregation as `is_date_agg`.
- Fix the merge logic

Add a flag so that normalization runs only once. This is not an issue
currently, but it could easily become one.

closes quickwit-oss/quickwit#3837
PSeitz added a commit to quickwit-oss/tantivy that referenced this issue Sep 21, 2023
* Fix DateHistogram bucket gap

(same commit message as above)

* use older nightly for time crate (breaks build)