[Metricbeat] Monitoring metrics don't work when containerised #6620
My initial guess is that it cannot find its own metrics in … This problem extends from the fact that … @kvch can you try to reproduce this? I think we need a test case for this, and then we can discuss some possible solutions.
@Constantin07 Just so we are on the same page, I should clarify that those log messages indicate a problem with the self-monitoring feature added in 6.2, which lets the Beat report its own CPU/memory/load information in the log (with a message of "Non-zero metrics in the last 30s") and to X-Pack Monitoring if configured. Your regular metrics from the system module should not be affected.
@andrewkroh I can reproduce the problem.
How about the fix in #6641? Running Metricbeat with the system.hostfs argument is enough to reproduce the problem; it is not related to Docker itself.
I think a general solution without modifying gosigar could be to obtain the PID of the process from …
@andrewkroh yes, you are right. I do see the regular metrics in Elasticsearch, but the error log message seems misleading to me (it's kind of "it works", but not completely).
@jsoriano can you give more details about the general solution? The hostfs proc directory contains the host's processes, so /hostfs/proc/self/status will not be the Metricbeat process. I will try to check it myself later, but for now I don't understand how it would work.
@ewgRa the host proc directory contains all processes running in any PID namespace of the machine, including the namespace in which the Metricbeat process runs. The special file … I say it could be a general solution because it would also work when no namespacing is used: …
@jsoriano thanks for the brilliant idea! I made the changes and it works, close to magic level :) Can you review it again? The failed CI looks unrelated to my changes. I see only two problems/limitations with this solution: …
But I think these are acceptable edge cases.
Fix self metrics when containerised #6620
Fixed by #6641
Thanks |
Is there a workaround for this? I am running 6.2.4 and the issue still persists.
+1, still seeing: … I added hostPid: true in accordance with #6734. Edit: I just realized that's not where it goes. I moved it in accordance with https://kubernetes.io/docs/concepts/policy/pod-security-policy/. Still no dice. We see the same errors in the logs as before.
@grantcurell what version of Metricbeat are you using? Could you also share the configuration you are using to start it?
Update: It took me a second to get it in the right place, but I'm no longer receiving the error in the logs. Metricbeat is 6.2.4. What is strange is that the dashboard behaves very erratically and the data is incorrect. For example, I'm running a controlled test where I pump 5 Gb/s into a security sensor I have, and I'm confirming with traditional monitoring tools that the sensor receives the expected 5 Gb/s (4.65 Gb/s to be exact, after the loss from overhead). Metricbeat's reading for inbound traffic, however, jumps all over the place: anywhere from 17 MB/s to 120 MB/s, changing on each 5-second refresh interval I have the dashboard set to. The other problem I have is this: if I set the time period to anything less than 30 minutes, the entire top part of the dashboard zeros out, but the accompanying data continues to display correctly, including network speed, which sits appropriately at 600 MB/s (4800 Mb/s). Additional info: this is running on Kubernetes 1.9.7.
@grantcurell thanks for all the details. 6.2.4 doesn't include the fix for this yet; you need 6.3.0 or later. In any case, the … Regarding the other problems, it would be great if you could confirm them with a more recent Metricbeat version and open specific issues.
Can I update Metricbeat independently of Elasticsearch in this case?
If you are using Elasticsearch 6.x this should be fine; check the product compatibility matrix.
@jsoriano upgrading to Metricbeat 6.4.2 didn't fix the problem. I still get a bunch of strange partial data if the time interval is anything less than 30 minutes. For example, the Kubernetes dashboard: … or … But move it to 30 minutes and you get: …
Hello,
I'm still getting these errors in the Metricbeat logs with version 6.2.3 when deployed as a Docker container: …
but there is no clue as to what the problem could be. I'm running the official Metricbeat Docker image and trying to pull stats from the host.
The metricbeat container is run using this command: … cat system.yml …
Any idea what could be the cause?