-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove cont_id tag from docker plugin #1015
Conversation
including some potentially interested parties for review. See description above for an overview of the changes this would bring, some of which would require adjustment to your queries. |
@sparrc can you make this one optional? I know it's an edge case at this point but graphing metrics localized to a single container is based on container_id and if we remove that then we can't track the behavior of the container over time. |
Yeah container id is definitely needed |
@gena01 @jchauncey what if we made it a field instead of a tag? |
@sparrc can you remind me what the difference is? |
In line-protocol terms:
basically, tags are indexed and fields are not. This means that There's a more detailed doc here: https://docs.influxdata.com/influxdb/v0.12/concepts/schema_and_data_layout/#encouraged-schema-design |
What is the reasoning behind removing it? Just trying to get a little context. |
some users complained because when you bring a docker container up and down, the ID will change. This means that for every container restart you will create a new container_id tag value, which creates a new series in InfluxDB. Creating a new series increases the cardinality and too many series can bog down the performance of the database. So the reason behind this is because it can easily create a very poorly performing InfluxDB schema design, since most docker users stop & restart containers frequently. |
Ah ok. My first thought is more philosophical in that I wonder how they are graphing the data? I would want to slice the data along the container_id so when new containers come online or go away I would see that reflected in the graph. Container names are pointless in almost all cases because they are either going to be teh same or some random junk that is hard to parse anyways. If going from tag to field helps with performance than thats fine with me. |
@jchauncey, that should still work, The main difference would be that |
K i doubt people are doing the second (unless they have some templated query they are using to filter a graph) |
cc @tripledes @asdfsx @AdithyaBenny @zstyblik @sporokh including some more people who I think will want to know about this potential change |
@sparrc Thanks a lot for notifying. Much appreciate. |
thx for notifying!!! |
@sparrc turning container ID into field makes sense as you've explained. |
@sparrc would it make sense to make cont_name a field as well? |
I suppose it depends on how the user is using container names in docker. Some people might setup their containers with the same name on restart, others might go with the randomized docker default names..... but since docker randomizes by default, I'm thinking that you're right and that we should make container_name a field as well. Anyone here have other thoughts? Do you ever do |
Knowing how containers are named in kubernetes it would probably be best if container name was a field. |
I'm one of those hit by "high cardinality of docker measurements" issue. From my point, #1017 would fix the problem. You would simply filter-out tag which has randomized value. However, we have static container names, which is used in Grafana queries like:
Would I be able to GROUP BY name if it was a field ?? |
@sparrc thanks for pinging. I guess this change makes sense specially for InfluxDB, right? Just wondering would the change affect other outputs which are not that flexible from a data structure point of view...any idea? |
May be the best way is to make it configurable ? In my particular case
|
Investigating it further, I actually think that we'll need to keep container_name a tag (but I will keep container_id a field). The reason is that, if we don't make container_name a tag, then there is nothing to uniquely identify two containers with the same image. If container_name is creating too much cardinality then users can use the new To @tripledes point, this is true for InfluxDB, Graphite, and prometheus |
Thinking about this a bit more, I think that it makes the most sense to rename a few of the measurements, specifically:
would be renamed to:
Why? Because these metrics are specifically tracking per-container stats. The problem with per-container stats, in some use-cases, is that if containers are short-lived AND names are not kept consistent, then the series cardinality will balloon very quickly. So adding "_container" to each metric would:
NOTE to clarify, we will still be changing container_id to a field, and container_name will remain a tag (so GROUP BY and WHERE will still work on container_name). thoughts? |
Sounds good to me. |
@sparrc totally agree with you considering container_name. I think it should be keep as tag, first of all cause you've already mentioned - we removing the ability to identify two containers with the same image, and the second one - stable production environments run by docker/docker-compose do not use randomly generated names. |
90dd487
to
f559214
Compare
OK, this has now been merged into master as I think I've given everyone time to provide input, and overall I felt like there was positive feedback on this change, despite it being a breaking change. see the changelog for a full rundown thanks all for the input! |
this would make container_id a field
also renaming cont_name and cont_image to container_name and
container_image.
closes #1014