Aborting aggregation because memory limit was exceeded #3837
Comments
Can you share the aggregation query that is sent to the backend? What condition did you add?
In addition to the general logs, we also have some other data without these fields (error, info).
A certain index has been running for some time, and I want to change one field to `fast: true`. Do I need to delete the index and rebuild it? Because when I got the error, I wanted to change this field to `fast: true`.
Yes, currently you need to re-index everything.
On the error:
On the error (log volume): Aborting aggregation because memory limit was exceeded. Limit: 500.00 MB, Current: 1.36 PB
Can you provide the payload of the POST request and your index configuration?
[19/Sep/2023:14:35:39 +0800] "POST /api/v1/_elastic/_msearch?max_concurrent_shard_requests=256 HTTP/1.1" 200 302 "" "Go-http-client/1.1" "" {"ignore_unavailable":true,"index":"qiniu_crm","search_type":"query_then_fetch"}\n{"aggs":{"2":{"aggs":{"3":{"date_histogram":{"field":"msg_time","fixed_interval":"60000ms","min_doc_count":0,"extended_bounds":{"min":1695094538615,"max":1695105338615}}}},"terms":{"field":"level","size":100,"order":{"_count":"desc"},"min_doc_count":0}}},"query":{"bool":{"filter":{"range":{"msg_time":{"gte":"2023-09-19T03:35:38.615Z","lte":"2023-09-19T06:35:38.615Z"}}}}},"size":0}\n
doc_mapping:
search_settings:
indexing_settings:
Searching only 5 minutes of data also gives this error!
[19/Sep/2023:14:39:26 +0800] "POST /api/v1/_elastic/_msearch?max_concurrent_shard_requests=256 HTTP/1.1" 200 303 "" "Go-http-client/1.1" "" {"ignore_unavailable":true,"index":"qiniu_crm","search_type":"query_then_fetch"}\n{"aggs":{"2":{"aggs":{"3":{"date_histogram":{"field":"msg_time","fixed_interval":"1000ms","min_doc_count":0,"extended_bounds":{"min":1695105266331,"max":1695105566331}}}},"terms":{"field":"level","size":100,"order":{"_count":"desc"},"min_doc_count":0}}},"query":{"bool":{"filter":{"range":{"msg_time":{"gte":"2023-09-19T06:34:26.331Z","lte":"2023-09-19T06:39:26.331Z"}}}}},"size":0}\n
"Failed to load log volume for this query" appears, and sometimes I get this error: Internal error:
Here is the second one, indented:
{
"aggs": {
"2": {
"aggs": {
"3": {
"date_histogram": {
"field": "msg_time",
"fixed_interval": "1000ms",
"min_doc_count": 0,
"extended_bounds": {
"min": 1695105266331,
"max": 1695105566331
}
}
}
},
"terms": {
"field": "level",
"size": 100,
"order": {
"_count": "desc"
},
"min_doc_count": 0
}
}
}, "query": {
"bool": {
"filter": {
"range": {
"msg_time": {
"gte": "2023-09-19T06:34:26.331Z",
"lte": "2023-09-19T06:39:26.331Z"
}
}
}
}
}, "size": 0
}
That should be 300 buckets (5 minutes at a 1-second interval), inside a terms sub-aggregation...
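As a sanity check, the bucket math can be reproduced from the payload above. The numbers below are the `extended_bounds` and `fixed_interval` from the request; the 1,000,000 factor is an illustration of a millisecond-vs-nanosecond unit mismatch, which is what the maintainer later identifies as the root cause.

```python
# Bucket count for the date_histogram in the payload above.
# extended_bounds are epoch milliseconds, fixed_interval is 1000 ms.
bounds_min_ms = 1695105266331
bounds_max_ms = 1695105566331
interval_ms = 1000

buckets = (bounds_max_ms - bounds_min_ms) // interval_ms
print(buckets)  # 300 (5 minutes at a 1-second interval)

# If the bounds are taken in nanoseconds (the fast field precision for date
# fields) while the interval stays in milliseconds, the count explodes by a
# factor of 1,000,000 -- consistent with the absurd 1.36 PB memory estimate:
NS_PER_MS = 1_000_000
buckets_mismatched = (bounds_max_ms - bounds_min_ms) * NS_PER_MS // interval_ms
print(buckets_mismatched)  # 300000000
```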
Thanks, that's helpful. I couldn't reproduce it locally so far. Can you edit the JSON payload and resend the request, setting "min_doc_count": 1 in the date_histogram? If you get the error again, please let me know.
"date_histogram": {
"field": "msg_time",
"fixed_interval": "1000ms",
"min_doc_count": 1,
"extended_bounds": {
"min": 1695105266331,
"max": 1695105566331
}
}
It seems like the intermediate result after merging has too many buckets; we probably need to prune before counting when converting to the final result. I created a separate issue here: quickwit-oss/tantivy#2182
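The pruning idea can be sketched as follows. This is a minimal illustration, not tantivy's actual internals: the function name and dict shapes are hypothetical. The point is that with `min_doc_count >= 1` only populated buckets need to exist, while `min_doc_count: 0` forces gap-filling, so the bucket count must be bounded before anything is allocated.

```python
def final_buckets(doc_counts, bounds_min, bounds_max, interval, min_doc_count):
    """doc_counts maps bucket key -> count for non-empty buckets only.

    Hypothetical sketch of converting an intermediate histogram result
    to a final one, pruning before materializing.
    """
    if min_doc_count > 0:
        # Only emit buckets that actually have documents: memory stays
        # proportional to the data, not to the requested time range.
        return {k: c for k, c in doc_counts.items() if c >= min_doc_count}
    # min_doc_count == 0 requires filling the gaps; the bucket count must be
    # checked *before* allocating, since (max - min) / interval can be huge
    # when units are mismatched.
    n = (bounds_max - bounds_min) // interval
    assert n <= 10_000, "too many buckets"  # illustrative guard, not the real limit
    return {bounds_min + i * interval: doc_counts.get(bounds_min + i * interval, 0)
            for i in range(n)}

counts = {0: 5, 2000: 3}
print(final_buckets(counts, 0, 5000, 1000, 1))  # {0: 5, 2000: 3}
print(final_buckets(counts, 0, 5000, 1000, 0))  # {0: 5, 1000: 0, 2000: 3, 3000: 0, 4000: 0}
```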
Do you get this for the same request you posted? |
Yes, same query. Both of these occur only occasionally.
These are Grafana's logs. I reinstalled Grafana and captured the logs again. When I hit the error, the status of this request was 400:
[20/Sep/2023:15:19:39 +0800] "POST /api/ds/query?ds_type=quickwit-quickwit-datasource&requestId=explore_left_logs_volume_0 HTTP/1.1" 400 176 "http://test123.cn/explore?left=%7B%22datasource%22:%22eb289c78-48e7-453b-9458-faadd67f9157%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22quickwit-quickwit-datasource%22,%22uid%22:%22eb289c78-48e7-453b-9458-faadd67f9157%22%7D,%22query%22:%22%22,%22alias%22:%22%22,%22metrics%22:%5B%7B%22id%22:%223%22,%22type%22:%22logs%22,%22settings%22:%7B%22limit%22:%22100%22%7D%7D%5D,%22bucketAggs%22:%5B%5D,%22timeField%22:%22msg_time%22%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D&orgId=1" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36" "" {"queries":[{"refId":"log-volume-A","query":"","metrics":[{"type":"count","id":"1"}],"timeField":"msg_time","bucketAggs":[{"id":"2","type":"terms","settings":{"min_doc_count":"0","size":"0","order":"desc","orderBy":"_count"},"field":"level"},{"id":"3","type":"date_histogram","settings":{"interval":"auto","min_doc_count":"0","trimEdges":"0"},"field":"msg_time"}],"datasource":{"type":"quickwit-quickwit-datasource","uid":"eb289c78-48e7-453b-9458-faadd67f9157"},"datasourceId":1,"intervalMs":60000,"maxDataPoints":1512}],"range":{"from":"2023-09-20T06:19:38.241Z","to":"2023-09-20T07:19:38.241Z","raw":{"from":"now-1h","to":"now"}},"from":"1695190778241","to":"1695194378241"}
When no error is reported, the status is 200:
[20/Sep/2023:15:23:31 +0800] "POST /api/ds/query?ds_type=quickwit-quickwit-datasource&requestId=explore_left_logs_volume_0 HTTP/1.1" 200 137649 "http://test123.cn/explore?left=%7B%22datasource%22:%22eb289c78-48e7-453b-9458-faadd67f9157%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22quickwit-quickwit-datasource%22,%22uid%22:%22eb289c78-48e7-453b-9458-faadd67f9157%22%7D,%22query%22:%22%22,%22alias%22:%22%22,%22metrics%22:%5B%7B%22id%22:%223%22,%22type%22:%22logs%22,%22settings%22:%7B%22limit%22:%22100%22%7D%7D%5D,%22bucketAggs%22:%5B%5D,%22timeField%22:%22msg_time%22%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D&orgId=1" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36" "" {"queries":[{"refId":"log-volume-A","query":"","metrics":[{"type":"count","id":"1"}],"timeField":"msg_time","bucketAggs":[{"id":"2","type":"terms","settings":{"min_doc_count":"0","size":"0","order":"desc","orderBy":"_count"},"field":"level"},{"id":"3","type":"date_histogram","settings":{"interval":"auto","min_doc_count":"0","trimEdges":"0"},"field":"msg_time"}],"datasource":{"type":"quickwit-quickwit-datasource","uid":"eb289c78-48e7-453b-9458-faadd67f9157"},"datasourceId":1,"intervalMs":60000,"maxDataPoints":1512}],"range":{"from":"2023-09-20T06:23:30.565Z","to":"2023-09-20T07:23:30.565Z","raw":{"from":"now-1h","to":"now"}},"from":"1695191010565","to":"1695194610565"}
By the way, a feature we would very much like: highlighting of the matches in log search results.
Thanks, I could reproduce it locally. The issue is a missing normalization from request values (ms) to fast field values (ns) when converting an intermediate result to the final result. This results in a computation that is wrong by a factor of 1,000,000. One field can have multiple field types associated with it, therefore we delay setting the `column_type`.
Fixes a computation issue in the number of buckets needed in the DateHistogram. This is due to a missing normalization from request values (ms) to fast field values (ns) when converting an intermediate result to the final result, which leads to a computation that is wrong by a factor of 1,000,000.

The Histogram normalizes values to nanoseconds to make user input like `extended_bounds` (ms precision) compatible with the values from the fast field (ns precision for the date type). This normalization happens only for date type fields, as other field types don't have precision settings. The normalization does not happen because of a missing `column_type`, which is not correctly passed after merging an empty aggregation (which does not have a `column_type` set) with a regular aggregation. A related issue: an empty aggregation, which will not have `column_type` set, will not convert the result to a human-readable format.

This PR fixes the issue by:
- Limiting the allowed field types of DateHistogram to DateType
- Flagging the aggregation as `is_date_agg` instead of passing the `column_type`, which is only available at the segment level
- Fixing the merge logic

It also adds a flag so that normalization happens only once. This is not an issue currently, but it could easily become one.

closes quickwit-oss/quickwit#3837
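The logic of the fix can be sketched as follows. All names here are hypothetical stand-ins for tantivy's internal code: the point is that the date flag must survive merging with an empty partial result, and that bounds and interval must be normalized to the same unit (ns) before the bucket count is computed.

```python
NS_PER_MS = 1_000_000  # date fast fields store nanoseconds, requests use ms

def merge_is_date_flag(left_is_date, left_empty, right_is_date, right_empty):
    # An empty partial aggregation carries no column information; the merged
    # result must keep the flag from the non-empty side instead of dropping it
    # (dropping it is what disabled normalization in the bug).
    if left_empty:
        return right_is_date
    if right_empty:
        return left_is_date
    return left_is_date or right_is_date

def bucket_count(bounds_min_ms, bounds_max_ms, interval_ms, is_date_agg):
    if is_date_agg:
        # Normalize the ms request values to ns so that the range and the
        # interval are expressed in the same unit.
        lo, hi = bounds_min_ms * NS_PER_MS, bounds_max_ms * NS_PER_MS
        interval = interval_ms * NS_PER_MS
    else:
        lo, hi, interval = bounds_min_ms, bounds_max_ms, interval_ms
    return (hi - lo) // interval

# Merging an empty aggregation with a regular one must not lose the flag:
flag = merge_is_date_flag(False, True, True, False)
print(bucket_count(1695105266331, 1695105566331, 1000, flag))  # 300
```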
* Fix DateHistogram bucket gap
* use older nightly for time crate (breaks build)
The Quickwit Grafana plugin looks great, but there are also some issues with its use:
Through the Grafana plugin, I searched only one hour of data (around a few million documents), but the log volume query reported an error:
Failed to load log volume for this query
Internal error:
Aborting aggregation because memory limit was exceeded. Limit: 500.00 MB, Current: 1.36 PB
This happens because I added a condition to the Lucene query; without the condition, it works fine.
Other requests:
1. Can the level fields and their values in the data source configuration be customized, so that I can distinguish other types of data rather than just the general log levels?
2. Can I use Quickwit as a data source to create reports such as line charts and pie charts?
3. Could Grafana-side aggregations such as count, group by, etc. be supported, along with comparison operators such as >, >=, <, != and so on?