[Elastic Agent] Error extracting container id in kubernetes #27216
Pinging @elastic/integrations (Team:Integrations)
After debugging this offline we found that the error comes from add_kubernetes_metadata. While Elastic Agent does not explicitly enable the processor, the underlying Filebeat process
runs with the default config. Although it seems harmless, it is filling up the logs, so we need a way to handle it better. I think improving the logging in the processor's code might help here; tbh I don't see any reason to log these kinds of messages. @masci, @exekias, @MichaelKatsoulis, do you think we can push it for 7.15 (even as a bug-fix after FF)?
How about not enabling this by default? We mostly rely on dynamic inputs, also for K8s logs.
You mean not enabling the processors in the Beats' configs in general, right? Or disabling them only when they are run by Agent?
Given that Elastic Agent is now GA and the way forward, maybe just disabling them in the default configuration would be okay?
This would be a breaking change. I was thinking more about disabling them in Agent only.
This is something that needs to be updated on the Agent side, so that it will not use the default Filebeat configuration. Removing add_kubernetes_metadata from the default config would also affect non-Agent uses of Filebeat.
@blakerouse would it be possible to remove this processor from the Agent-managed Beats?
@exekias At the moment we rely on the default configuration that is shipped with a Beat; by changing that one behavior we affect all other Beats that might rely on something from their default configuration. We would need to send an empty list of processors in the configuration through the control protocol, but would Filebeat even reload that section? I understand that removing the default breaks things for others, but only if they are using the default configuration without any changes, correct? Is Filebeat even usable with a default configuration and no changes?
To my understanding, @exekias, that is not the case, so removing the processor from the default config wouldn't affect it.
Could we maybe leverage the if statements in the filebeat/metricbeat yaml, like in packetbeat.yml?
What @MichaelKatsoulis proposed above sounds good to me. We need a condition to verify that the metadata already exists before running the processor.
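One way to express such a condition is the processor's standard `when` clause, for example skipping enrichment when Kubernetes metadata is already present. A sketch; the exact field to guard on (`kubernetes.pod.name` here) is an assumption:

```yaml
processors:
  - add_kubernetes_metadata:
      # Hypothetical guard: only run the processor when no kubernetes
      # metadata has been attached to the event yet.
      when.not.has_fields: ['kubernetes.pod.name']
```

This would keep the processor harmless in setups where metadata is already added by dynamic inputs.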
Any option sounds good to me. Also consider that, given the proximity of 8.0, the possibility of doing this as a breaking change is not that far off.
After re-thinking this and chatting offline with Mike, I think we can avoid making the change at the configuration level; there are two options here.
Personally I'm +1 for applying both changes.
@exekias If you don't have any objection to that proposal, I will create a PR to fix this. I believe it is the correct approach.
SGTM!
Any idea when this is going to make it into a release? This bug is still present in 7.15.0.
@adammike this one will be fixed in the 7.16 version of Elastic Agent.
Seeing as 7.15.1 is not out yet, I assume 7.16 is months away?
7.16 is not coupled with any of the 7.15.x releases; the scopes are different. However, 7.16 is not frozen yet, so it will take some time, but not too much :). Btw, this is not a critical bug, so you can just ignore it, right? The only problem is that it might overflow the logs/disk.
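As a stopgap against the logs filling the disk, the Beat's log rotation can be tightened. A sketch using standard libbeat logging settings; the values are illustrative, not recommendations:

```yaml
# Cap disk usage of the Beat's own log files (illustrative values).
logging.to_files: true
logging.files:
  keepfiles: 2                # retain at most 2 rotated files
  rotateeverybytes: 10485760  # rotate at ~10 MiB
```

This does not stop the noisy messages, it only bounds how much disk they can consume.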
In our case, using Elastic Cloud, this simply kills everything, because it logs/sends these messages about 10 times per second.
Hey @tomsseisums, sorry to hear that :(. Could you use the 7.16 snapshot version in the meantime?
@ChrsMark Elastic Cloud itself is limited to 7.15 and upgrading agent to 7.16 snapshot results in:
Well, in Elastic Cloud you can choose snapshot versions in GCP Belgium, I think. However, this will take you out of any support/SLA, so be sure that you actually want to do this and understand what implications it would have for future updates.
To conclude: isn't it possible to use Elastic Agent to monitor K8s today, using Elastic Cloud?
@ChrsMark In our case, it seems like the issue still remains, at least to some extent. When Elastic Agent is started, it still fills the logs with something like 10k to 20k entries per minute. It does seem to cool down after a while, though, and the errors eventually disappear.
I just started an Elastic Cloud trial and I see this error using 8.0. Is the issue fixed?
Hey folks! We identified that the issue persists, but for another reason, explained at #29767. This will be resolved properly with elastic/elastic-agent#90, so I would suggest following that issue too (fyi @ph).
@ChrsMark For me, the errors (also in the range of 20k per minute) appear to be caused by the processor's lookup coming up empty. Personally I don't care if that processor is enabled by default without a way to change it, so long as it doesn't fail at this magnitude when what it's trying to find is empty (which is clearly a valid scenario).

Notes: my on-prem cluster runs with a non-standard setup, and mounting individual folders did not help.
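For context on the "mounting individual folders" point: the standalone Agent manifests rely on host-path mounts so the embedded Filebeat can read container logs. A sketch of the typical mounts (volume names and paths are the commonly used defaults and may differ per runtime and cluster; treat them as assumptions):

```yaml
# Typical host-path volumeMounts in an elastic-agent DaemonSet container
# spec. Paths vary with the container runtime (docker vs containerd).
volumeMounts:
  - name: varlogcontainers
    mountPath: /var/log/containers
    readOnly: true
  - name: varlogpods
    mountPath: /var/log/pods
    readOnly: true
  - name: varlibdockercontainers
    mountPath: /var/lib/docker/containers
    readOnly: true
```

If the runtime writes logs somewhere these mounts don't cover, the container-id extraction has nothing to match against, which is consistent with the empty-lookup failures described above.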
Hey @WoodyWoodsta! If the processor is failing while it is enabled intentionally, then we should handle that in another issue. In the case of Elastic Agent, the processor should not be enabled by default, and that is the purpose of this issue. Are you running Elastic Agent and seeing this issue? If so, please keep track of elastic/elastic-agent#90 (fyi @jlind23). If you still want to use the processor but hit issues, please open another issue, since that is a different use case. In any case, at the moment Elastic Agent automatically adds k8s metadata without the need to enable the processor in most cases.
@ChrsMark Thanks, I just wanted to point out that, on top of the processor being enabled/disabled by default (which seems to be the focus of the discussion in related issues and threads), the processor fails at this magnitude when its lookup comes up empty. If that sounds like a separate thing to you, I'm more than happy to open a new issue!
Yes @WoodyWoodsta, feel free to file a different issue for this :). It's highly possible that this is a configuration issue or just a corner case we need to fix. Let's take the discussion there once we have the new issue, though.
Running the https://github.com/elastic/beats/tree/master/deploy/kubernetes/elastic-agent-standalone deployment in GKE with version 7.13.4 results in the running Filebeat repeatedly logging the following error:
I was trying to reproduce #25435, but came across this issue instead.