PromQL query returns unrelated series #865
Comments
@nikunjgit @robskillington Does one of you have an idea what might be going on here? I'm not really familiar with the Prometheus stuff |
I'll look into this, might have something to do with the matchers and special characters (wonder if we're specifying regexp instead of exact match for some of the matchers we're sending to the index). |
With M3DB 0.4.2 I finally managed to query a node directly on its httpClusterListenAddress (port 9003):
I would expect 1 result, not 2:
|
Most likely I am not translating the PromQL query the same way m3coordinator does, but here is another finding. PromQL:
Result:
m3ql query:
Result: {
"results": [
{
"id": "__name__=node_memory_SwapTotal_bytes,instance=alertmanager01:9100,job=node-exporter,",
"tags": [
{
"name": "__name__",
"value": "node_memory_SwapTotal_bytes"
},
{
"name": "instance",
"value": "alertmanager01:9100"
},
{
"name": "job",
"value": "node-exporter"
}
],
"datapoints": null
},
{
"id": "__name__=node_memory_SwapTotal_bytes,instance=m3db-node01:9100,job=node-exporter,",
"tags": [
{
"name": "instance",
"value": "m3db-node01:9100"
},
{
"name": "job",
"value": "node-exporter"
},
{
"name": "__name__",
"value": "node_memory_SwapTotal_bytes"
}
],
"datapoints": null
},
{
"id": "__name__=node_memory_SwapTotal_bytes,instance=m3db-node03:9100,job=node-exporter,",
"tags": [
{
"name": "__name__",
"value": "node_memory_SwapTotal_bytes"
},
{
"name": "instance",
"value": "m3db-node03:9100"
},
{
"name": "job",
"value": "node-exporter"
}
],
"datapoints": null
}
],
"exhaustive": true
}
So, where m3coordinator sometimes also returns unrelated series, m3db itself returns the correct series, but also series for fields it shouldn't match. |
Did some digging around and it seems the order of query terms matters as well. That probably explains the difference in my previous post between the query from Prometheus and my direct m3db query. Query term order instance - __name__:
Result:
Query term order __name__ - instance:
I would expect to get the same results...
|
@sboschman thanks for the detailed report! I'll dig into this today. |
@sboschman #860 addressed the m3coordinator related portion, and #883 addresses the m3db bits. There's more details in the PRs if you're interested in the root cause. I'll land #883 tomorrow and cut a release after. |
@sboschman I cut release 0.4.3. |
Upgraded everything to 0.4.3, and the Grafana dashboard for the Prometheus node-exporter looks a lot better. The multiple series error is gone. 👍 Just noticed that sometimes a Grafana graph does not render and gives an error (after a page refresh it is okay again).
The m3coordinator log contains the following errors (no idea if they are related though):
Don't have time next week to look into this, so it might take a while before I can add some more info. |
@sboschman cheers, thanks for the update. Re the log messages: I'm unsure about the other issues at the moment. Will dig around. Re: "Invalid label name" - @nikunjgit could this be related to the tag translation the coordinator does? |
@prateek About the invalid label name: are you referring to the flow involving the function 'FromM3IdentToMetric' in index.go as the "tag translation"? Did some poor man's debugging, and the raw bytes for the ID of FetchTaggedIDResult_ (rpc.go) look incorrect. The ID should be like '__name__=...', but sometimes it contains gibberish like 'se_size_bytes_bucket,handler=/query,instance=prometheus01,job=no'. So I suspect the issue is more m3db related than coordinator related. Assuming FetchTaggedIDResult_ contains what goes over the wire. |
@sboschman Not exactly. At write time, m3coordinator generates an ID for a given Prometheus metric as the concatenation of all its tag name/value pairs (after sorting), link to the code. So it'd be expected to see IDs of that form. Re: the gibberish ID - mind pasting the entire ID you think is corrupted? |
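That ID scheme matches the IDs visible in the /query output earlier in this thread (note the trailing comma). Below is a minimal Go sketch of the construction as described here, written from this description rather than taken from the m3coordinator source; the Tag type and seriesID function are illustrative names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Tag is a single label name/value pair, e.g. {"instance", "alertmanager01:9100"}.
type Tag struct {
	Name, Value string
}

// seriesID sorts the tags by name and concatenates "name=value," pairs,
// which is the scheme described above and matches the IDs seen in the
// /query results (including the trailing comma).
func seriesID(tags []Tag) string {
	sorted := append([]Tag(nil), tags...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

	var b strings.Builder
	for _, t := range sorted {
		b.WriteString(t.Name)
		b.WriteByte('=')
		b.WriteString(t.Value)
		b.WriteByte(',')
	}
	return b.String()
}

func main() {
	id := seriesID([]Tag{
		{"job", "node-exporter"},
		{"__name__", "node_memory_SwapTotal_bytes"},
		{"instance", "alertmanager01:9100"},
	})
	fmt.Println(id)
	// __name__=node_memory_SwapTotal_bytes,instance=alertmanager01:9100,job=node-exporter,
}
```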
I think I have the same problem regarding invalid metric names. This problem is easiest to reproduce on For example, I have a The error I get for this panel is: The correct query would be: Another example from Errors on the Prometheus side (they were taken later than the errors above, so they are not connected):
Refreshing the dashboard via the button in the upper right corner usually renders all the panels successfully. If not, another 1 or 2 clicks help. |
Did some debugging and this is what I have found so far (after digging through m3coordinator and realizing the data received was faulty): Looks like the document contains a lot of fields (100+ instead of the normal <10) when it gets added to the results (https://github.com/m3db/m3/blob/master/src/dbnode/storage/index/results.go#L73). The field name and value pairs in that case are pretty much 'garbage' and correspond to what you see in m3coordinator and grafana. So the next thing I want to look into is the offset used to read a document from the raw data. If the offset is wrong, I can imagine it ending up with 100+ fields and the name/values being garbage. Disclaimer: Just started with go, so baby steps at a time here. |
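To make the offset hypothesis concrete, here is a toy sketch of a length-prefixed field decoder. This is not m3db's actual encoding (the uint16-length format and the encode/decode helpers are made up for illustration); it only shows how starting at the wrong byte offset makes a decoder return "fields" that are fragments of neighbouring data, much like the gibberish IDs quoted above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encode writes each field as <uint16 length><bytes>.
func encode(fields []string) []byte {
	var out []byte
	for _, f := range fields {
		var l [2]byte
		binary.BigEndian.PutUint16(l[:], uint16(len(f)))
		out = append(out, l[:]...)
		out = append(out, f...)
	}
	return out
}

// decode reads fields back out; if it starts at the wrong offset, the
// "length" prefixes are really bytes from the middle of other data.
func decode(b []byte) []string {
	var fields []string
	for len(b) >= 2 {
		n := int(binary.BigEndian.Uint16(b))
		b = b[2:]
		if n > len(b) {
			n = len(b) // a real decoder would return an error here
		}
		fields = append(fields, string(b[:n]))
		b = b[n:]
	}
	return fields
}

func main() {
	raw := encode([]string{"__name__=up", "instance=node01:9100", "job=node-exporter"})

	fmt.Println("correct offset: ", decode(raw))     // three sensible fields
	fmt.Println("offset off by 1:", decode(raw[1:])) // garbage: fragments of fields fused together
}
```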
h/t to @sboschman for all the debugging, he found a race in the code which would cause this behaviour. Addressed in #938 |
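For readers unfamiliar with this class of bug, here is a deliberately racy toy program, not the actual m3db code paths fixed in #938 (all names here are illustrative). It shows how handing out a view into a reused byte buffer without synchronization lets a reader observe bytes from a different record, i.e. tags from an unrelated series.

```go
package main

import (
	"fmt"
	"time"
)

// The producer reuses one backing buffer for every record and hands out
// slices into it without copying or synchronizing. The consumer can then
// observe bytes belonging to a *different* record. Run with `go run -race`
// to have the race detector flag it.
func main() {
	records := []string{
		"__name__=up,instance=a:9100,",
		"__name__=node_memory_SwapTotal_bytes,instance=b:9100,",
	}

	buf := make([]byte, 0, 64)
	out := make(chan []byte, len(records))

	go func() {
		for _, r := range records {
			buf = append(buf[:0], r...) // reuse the same backing array
			out <- buf                  // hand out a view, no copy
		}
		close(out)
	}()

	for view := range out {
		time.Sleep(time.Millisecond) // give the producer time to overwrite the buffer
		fmt.Printf("consumer saw: %q\n", view)
	}
}
```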
@sboschman @matejzero landed a fix for the issue in #938. Please give it a shot when you get a chance. |
LGTM, been clicking and refreshing for a while now and haven't spotted any failed graphs yet. So closing this issue, thanks for the fixes! |
Good on my end too. |
Glad to hear it! Thanks again @sboschman & @matejzero! |
As mentioned on Gitter, I ran into a strange issue with queries against M3DB returning unrelated series. This results in Grafana throwing errors like "many-to-many matching not allowed" or "Multiple Series Error" on e.g. the default Prometheus node-exporter dashboard.
Prometheus: 2.3.2
M3Coordinator: 0.4.1 (running on the same host as prometheus)
M3DB: 0.4.1 (4-node cluster: 2 replica sets of 2 nodes each, 3 seed nodes)
Prometheus config: read_recent: true
PromQL (latest value):
Result (omitted the values):
PromQL (1 day window):
Result:
Didn't expect to get a 'node_memory_CmaFree_bytes' series; it doesn't even match the instance...
PromQL (2 day window):
Result:
These extra series have values up to 'now', so they are not old series or something.
PromQL (1 week window):
A total of 6 series show up.
PromQL (2 week window):
Result:
All the 'up' series show up (different instance tags ofc), but no other series.
I have two Prometheus instances scraping the same nodes. On the 2nd node I changed the config to read_recent: false. The 2nd node only shows the requested series, as expected.