[Elastic Agent] Error extracting container id in kubernetes #27216
Pinging @elastic/integrations (Team:Integrations)
After debugging this offline we found that the error comes from add_kubernetes_metadata. While Elastic Agent does not explicitly enable the processor, the underlying Filebeat process
runs with the default config. Although it seems harmless, it is filling up the logs, so we need a way to handle it better. I think improving the logging in the processor's code might help here; tbh I don't see any reason to log these kinds of messages. @masci, @exekias, @MichaelKatsoulis, do you think we can push it for 7.15 (even as a bug-fix after FF)?
How about not enabling this by default? We mostly rely on dynamic inputs, also for K8s logs.
You mean not enabling the processors in the Beats' configs in general, right? Or disabling them only when they are run by Agent?
Given that Elastic Agent is now GA and the way forward, maybe just disabling them in the default configuration would be okay?
This would be a breaking change. I was thinking more about disabling them in Agent only.
This is something that needs to be updated on the Agent side, so that it will not use the default Filebeat configuration. Removing add_kubernetes_metadata from the default config would also affect non-Agent uses of Filebeat.
@blakerouse would it be possible to remove this processor from the Agent-managed Beats?
@exekias At the moment we rely on the default configuration that is shipped with a Beat; by changing that one behavior we affect all other Beats that might rely on something from their default configuration. We would need to send an empty list of processors in the configuration through the control protocol, but would Filebeat even reload that section? I understand that removing the default breaks things for others, but only if they are using the default configuration without any changes, correct? Is Filebeat even usable with a default configuration and no changes?
To my understanding, @exekias, that is not the case, so removing the processor from the default config wouldn't affect it.
Could we maybe leverage the if statements in the filebeat/metricbeat yaml, like in packetbeat.yml?
What @MichaelKatsoulis proposed above sounds good to me. We need a condition to verify that the metadata already exists before running the processor.
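One way to express such a condition is the processor's standard `when` clause, for example skipping enrichment when Kubernetes metadata is already present. A sketch; the exact field to guard on (`kubernetes.pod.name` here) is an assumption:

```yaml
processors:
  - add_kubernetes_metadata:
      # Hypothetical guard: only run the processor when no kubernetes
      # metadata has been attached to the event yet.
      when.not.has_fields: ['kubernetes.pod.name']
```

This would keep the processor harmless in setups where metadata is already added by dynamic inputs.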
Any option sounds good to me. Also consider that, given the proximity of 8.0, the possibility of doing this as a breaking change is not that far off.
After re-thinking this and chatting offline with Mike, I think we can avoid making the change at the configuration level; there are two options here.
Personally I'm +1 for applying both changes.
@exekias If you don't have any objection to that proposal, I will create a PR to fix this. I believe it is the correct approach.
SGTM!
Any idea when this is going to make it into a release? This bug is still present in 7.15.0.
@adammike this one will be fixed in the 7.16 version of Elastic Agent.
Seeing as 7.15.1 is not out yet, I assume 7.16 is months away?
7.16 is not coupled with any of the 7.15.x releases; the scopes are different. However, 7.16 is not frozen yet, so it will take some time, but not too much :). Btw, this is not a critical bug, so you can just ignore it, right? The only problem is that it might overflow the logs/disk.
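As a stopgap against the logs filling the disk, the Beat's log rotation can be tightened. A sketch using standard libbeat logging settings; the values are illustrative, not recommendations:

```yaml
# Cap disk usage of the Beat's own log files (illustrative values).
logging.to_files: true
logging.files:
  keepfiles: 2                # retain at most 2 rotated files
  rotateeverybytes: 10485760  # rotate at ~10 MiB
```

This does not stop the noisy messages, it only bounds how much disk they can consume.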
In our case, using Elastic Cloud, this simply kills everything, because it logs/sends these messages about 10 times per second.
Hey @tomsseisums, sorry to hear that :(. Could you use the 7.16 snapshot version in the meantime?
@ChrsMark Elastic Cloud itself is limited to 7.15 and upgrading agent to 7.16 snapshot results in:
Well, in Elastic Cloud you can choose snapshot versions in GCP Belgium, I think. However, this will take you out of any support/SLA, so be sure that you actually want to do this and understand what implications it would have for future updates.
To conclude: isn't it possible to use Elastic Agent to monitor K8s today, using Elastic Cloud?
@ChrsMark In our case, it seems like the issue still remains, at least to some extent. When Elastic Agent is started, it still fills the logs with something like 10k to 20k entries per minute. It does seem to cool down after a while, though, and the errors eventually disappear.
I just started an Elastic Cloud trial and I see this error using 8.0. Is the issue fixed?
Hey folks! We identified that the issue persists, but for another reason, explained at #29767. This will be resolved properly with elastic/elastic-agent#90, so I would suggest following that issue too (fyi @ph).
@ChrsMark For me, the errors (also in the range of 20k per minute) appear to be caused by the processor's lookup coming up empty. Personally I don't care if that processor is enabled by default without a way to change it, so long as it doesn't fail at this magnitude when what it's trying to find is empty (which is clearly a valid scenario).

Notes: my on-prem cluster runs with a non-standard setup, and mounting individual folders did not help.
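For context on the "mounting individual folders" point: the standalone Agent manifests rely on host-path mounts so the embedded Filebeat can read container logs. A sketch of the typical mounts (volume names and paths are the commonly used defaults and may differ per runtime and cluster; treat them as assumptions):

```yaml
# Typical host-path volumeMounts in an elastic-agent DaemonSet container
# spec. Paths vary with the container runtime (docker vs containerd).
volumeMounts:
  - name: varlogcontainers
    mountPath: /var/log/containers
    readOnly: true
  - name: varlogpods
    mountPath: /var/log/pods
    readOnly: true
  - name: varlibdockercontainers
    mountPath: /var/lib/docker/containers
    readOnly: true
```

If the runtime writes logs somewhere these mounts don't cover, the container-id extraction has nothing to match against, which is consistent with the empty-lookup failures described above.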
Hey @WoodyWoodsta! If the processor is failing while it is enabled intentionally, then we should handle that in another issue. In the case of Elastic Agent, the processor should not be enabled by default, and that is the purpose of this issue. Are you running Elastic Agent and seeing this issue? If so, please keep track of elastic/elastic-agent#90 (fyi @jlind23). If you still want to use the processor but hit issues, please open another issue, since that is a different use case. In any case, at the moment Elastic Agent automatically adds k8s metadata without the need to enable the processor in most cases.
@ChrsMark Thanks, I just wanted to point out that, on top of the processor being enabled/disabled by default (which seems to be the focus of the discussion in related issues and threads), the processor fails at this magnitude when its lookup comes up empty. If that sounds like a separate thing to you, I'm more than happy to open a new issue!
Yes @WoodyWoodsta, feel free to file a different issue for this :). It's highly possible that this is a configuration issue or just a corner case we need to fix. Let's take the discussion there once we have the new issue, though.
Running the https://github.com/elastic/beats/tree/master/deploy/kubernetes/elastic-agent-standalone deployment in GKE with version 7.13.4 results in the running Filebeat repeatedly logging the following error:
I was trying to reproduce #25435, but came across this issue instead.