Metricbeat panics when getting the cluster UUID #34384
Comments
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
Looking at the function flow, it is difficult to see how we could end up with a nil reference here. I discussed this with @belimawr, and one hypothesis is that we pass a pointer to a pointer to the json.Unmarshal call, which is unusual and may trigger an edge case. I'll try to reproduce it and see if there is a specific state that causes this error.
Playing around with json.Unmarshal, I was able to get to a state where both the returned error and the passed pointer are nil after the operation: it happens when the bytes are the JSON literal `null`. When only passing a pointer (and not a pointer to a pointer), the […]
One thing to keep in mind here is that it is possible this was always happening and we just never noticed it, because the Metricbeat log files prior to 8.6 were kept separate from the main Elastic Agent log files. In 8.6 we merged all of the log files into one, which makes problems like this much more obvious. We likely would have only noticed this before if someone were specifically reading the agent monitoring Metricbeat logs on a regular basis, which I don't think was the case. This is the same reason nobody noticed the regular log messages saying that Metricbeat couldn't obtain a cluster UUID at all prior to 8.6.
It has already been fixed by #34480.
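Given the failure mode above, the natural shape of a fix is a nil guard before the dereference. The sketch below is hypothetical (the type and function names are illustrative, and this is not the actual diff from #34480):

```go
package main

import "fmt"

// monitoring and state mirror the shape of the decoded stats response;
// the names are illustrative, not the actual metricbeat types.
type monitoring struct {
	ClusterUUID string `json:"cluster_uuid"`
}

type state struct {
	Monitoring *monitoring `json:"monitoring"`
}

// clusterUUID returns the UUID only when the Monitoring section was
// actually present in the parsed response, avoiding the nil pointer
// dereference that caused the panic.
func clusterUUID(s *state) string {
	if s == nil || s.Monitoring == nil {
		return ""
	}
	return s.Monitoring.ClusterUUID
}

func main() {
	fmt.Println(clusterUUID(&state{}))                                           // ""
	fmt.Println(clusterUUID(&state{Monitoring: &monitoring{ClusterUUID: "abc"}})) // "abc"
}
```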
Affected branches: main, 8.6

Metricbeat sometimes panics during startup when calling this function:

beats/metricbeat/module/beat/stats/stats.go, lines 79 to 100 in 64f98ca
Here is an example of the log generated:

[Pretty printed stack trace]
It seems that state.Monitoring (beats/metricbeat/module/beat/stats/stats.go, line 85 in 64f98ca) is nil, probably due to the order in which things get initialised when running under Elastic-Agent.

Steps to Reproduce
This issue does not happen all the time; it seems to be a race condition, so one might have to try a few times before reproducing it.
I have been able to reproduce it consistently on Linux:

elastic-package stack up -v --version=8.7.0-SNAPSHOT -d

Then watch the elastic-package-stack-fleet-server-1 container; most of the time it will panic. You can tail the docker logs once the container is created.
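The exact tail command was not preserved here; a likely invocation (an assumption, using the standard docker CLI and the container name from the steps above) would be:

```shell
# Follow the fleet-server container logs and surface panic lines,
# keeping two lines of context after each match.
docker logs --follow elastic-package-stack-fleet-server-1 2>&1 | grep -i -A 2 'panic:'
```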