Thanos query responds too slow #6845
Comments
Can you share some details about your setup, please? For example, which downstream Store APIs are you using? Every bit of information helps!
Our Thanos deployment consists of Receive, Query, Ruler and Store, with object storage (MinIO). Each component runs on a virtual machine. We use Thanos Store for long-term data and Ruler for recording rules, but the Store side can be ignored to simplify the issue. We have two kinds of problems.
These patterns repeated on every query. As a result, we sometimes failed to compute recording rules (the recording rules are evaluated every minute). The second problem is that Query replies to HTTP requests very slowly. Neither problem occurs right after we restart Query; however, some time later, both happen again.
Without knowing how much data you fetched, it is hard for us to understand its performance.
I don't think the query or the data size matters, because even a simple HTTP GET request takes a long time when the problem occurs.
Anyway, we have five metrics; the heaviest has almost 8,000 series and takes about 5 seconds to fetch under normal conditions.
Is the 5 s spent on the server only, or is it the RTT from client to server and back to client? As I said, it's better to set up some tracing and continuous profiling tools to help narrow down the issue and understand what's slow here.
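Before full tracing is in place, one way to tell server time from round-trip time is to time the phases of a single request from the Ruler's host. The sketch below uses only Python's standard library; the helper name time_query is hypothetical, and it assumes Query serves plain HTTP (no TLS). It separates TCP connect time, the wait between sending the request and receiving the response headers (mostly server-side time on a healthy network), and the body transfer.

```python
import http.client
import time
from urllib.parse import urlencode


def time_query(host: str, port: int, promql: str, timeout: float = 120.0) -> dict:
    """Time one /api/v1/query request, split into phases (seconds).

    Hypothetical helper: assumes a plain-HTTP Prometheus-compatible endpoint.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        t0 = time.perf_counter()
        conn.connect()  # TCP handshake only
        t1 = time.perf_counter()
        conn.request("GET", "/api/v1/query?" + urlencode({"query": promql}))
        t2 = time.perf_counter()  # request has been written to the socket
        resp = conn.getresponse()  # blocks until status line and headers arrive
        t3 = time.perf_counter()
        resp.read()  # drain the response body
        t4 = time.perf_counter()
        return {
            "status": resp.status,
            "connect": t1 - t0,
            "wait_for_headers": t3 - t2,
            "read_body": t4 - t3,
        }
    finally:
        conn.close()
```

For example, time_query("2.2.2.2", 10902, "up") would target the Query instance from this issue, assuming it listens on Thanos's default HTTP port 10902. A small connect time combined with a long wait_for_headers would be consistent with TCP being fine while the HTTP response is slow.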
As tcpdump shows, there is no problem at the TCP layer, because the server immediately ACKs the client's request. However, the application-layer (HTTP) response is slow. Here are our configurations:
receive (3.3.3.3)
query (2.2.2.2)
ruler (1.1.1.1)
Thanos, Prometheus and Golang version used:
0.32.4
Object Storage Provider:
minio (not related to this issue)
What happened:
Thanos Query responds to HTTP requests very slowly, including /api/v1/query requests from Thanos Ruler and even the basic / resource. Meanwhile, the CPU and memory usage of Query was almost idle.
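With CPU and memory almost idle while requests hang, the requests may be waiting on something rather than computing. A minimal sketch for checking this, assuming Query exposes Go's standard pprof handlers under /debug/pprof/ on its HTTP port, is to sample the goroutine count right after a restart and again once responses become slow; a count that keeps climbing would suggest requests piling up behind a blocked call. The helper name goroutine_total is hypothetical.

```python
import re
import urllib.request


def goroutine_total(base_url: str, timeout: float = 10.0) -> int:
    """Return the total goroutine count from a Go pprof goroutine dump.

    Assumes base_url serves Go's standard /debug/pprof/ handlers.
    """
    url = base_url.rstrip("/") + "/debug/pprof/goroutine?debug=1"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # With debug=1 the first line reads "goroutine profile: total N".
        first_line = resp.readline().decode("utf-8", "replace")
    match = re.match(r"goroutine profile: total (\d+)", first_line)
    if match is None:
        raise ValueError(f"unexpected pprof output: {first_line!r}")
    return int(match.group(1))
```

Requesting /debug/pprof/goroutine?debug=2 instead returns the full stack of every goroutine, which shows where the blocked ones are waiting.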
What you expected to happen:
It should respond quickly.
How to reproduce it (as minimally and precisely as possible):
We run 10 Query and 3 Ruler instances, each on a VM, along with the other components (Receive and Store). Some time after starting the services, we notice that our recording rule results are empty for some periods, because the Ruler's query timeout is set to 1 minute. When we dig into the tcpdump capture, Query responds only after 2 minutes.
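Because the slowdown only appears some time after a restart, a long-running probe can help pin down when it starts. The sketch below polls a URL at a fixed interval and records every request that takes longer than a threshold, defaulting to 60 s to match the 1-minute Ruler timeout. The function name probe and its parameters are hypothetical; it uses only Python's standard library.

```python
import http.client
import time
import urllib.request
from typing import List, Optional, Tuple


def probe(url: str, threshold: float = 60.0, interval: float = 15.0,
          timeout: float = 180.0,
          iterations: Optional[int] = None) -> List[Tuple[str, float, str]]:
    """Poll url every `interval` s and record requests slower than `threshold` s.

    Runs forever when iterations is None; returns (timestamp, seconds, outcome).
    """
    slow = []
    done = 0
    while iterations is None or done < iterations:
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        t0 = time.perf_counter()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                resp.read()
                outcome = str(resp.status)
        except (OSError, http.client.HTTPException) as exc:
            # Timeouts, refused connections and HTTP error statuses end up here.
            outcome = f"error: {exc}"
        elapsed = time.perf_counter() - t0
        if elapsed > threshold:
            slow.append((stamp, elapsed, outcome))
            print(f"{stamp} SLOW {elapsed:.1f}s {outcome} {url}", flush=True)
        done += 1
        time.sleep(interval)
    return slow
```

For instance, probe("http://2.2.2.2:10902/") would poll the basic / resource of the Query instance, assuming the default HTTP port; the timestamps of the first slow entries can then be lined up with Query's logs and a tcpdump capture.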
Full logs of the relevant components:
ruler: 1.1.1.1
query: 2.2.2.2
Request: No. 129
Response: No. 372
Anything else we need to know: