Alternative to Ganglia #1159
Comments
Thanks for that note - do you have suggestions for open-source alternatives? Perhaps something simple for a load dashboard using Grafana or the like.... |
I found netdata (https://github.com/netdata/netdata) very interesting. xdmod (https://github.com/ubccr/xdmod/) is another one that seems well maintained. EDIT: Other mentions: It would be nice if OpenHPC pushed for an ELK stack or a TICK stack. This would allow integration into existing monitoring solutions. |
Now that Ganglia (and Nagios) are deprecated from OpenHPC 2.0, which alternatives are officially recommended for monitoring? |
@e-alfred I can offer my two cents: choose modern monitoring software where tasks are neatly separated, such as Prometheus for collecting and storing the data, in combination with Grafana for the dashboards. Or another stack like TICK, as already mentioned by @Spenhouet. What I don't like about Ganglia is its monolithic approach, with almost no possibility of decoupling single components (e.g. replacing or extending gmond is not an easy task), not to mention changing or adapting the web dashboard. While Ganglia was an important piece of monitoring technology (with the added bonus of being open source), its approach is nowadays quite dated (IMHO). Modern monitoring software should be modular, with different components, each of them replaceable, which means we can now talk about monitoring pipelines (or stacks) rather than a single piece of software in charge of every aspect of monitoring. In particular, modern monitoring software components should fall into the following four macro categories according to their tasks, with a neat distinction between each of them:
|
I forgot to add that I am also a great believer in Netdata, even though it does not exactly follow the clear division of tasks I mentioned before. It can be easily installed (via packages too nowadays) and immediately gives a very rich and detailed dashboard accessible via web browser. But there is an important issue not yet solved: there is no support for Infiniband network monitoring. |
Greetings, For the Infiniband collector, I currently monitor all elements that would be provided by Also, I'd like to provide a bit of experience about netdata usage in HPC environment (real experience, not just ideas):
fwiw, I'm using netdata as the only monitoring tool on my 1K+ HFT nodes & 1K+ HPC nodes. Best of both worlds. Cheers |
Hello! |
While I really like Netdata, it has two problems/disadvantages:
All of this makes it not completely useful on, e.g., diskless systems and for centralized monitoring. |
Allow me to correct the points you made: Netdata does handle alerting itself, and provides a wide range of notification methods by default (https://my-netdata.io/infographic.html): mail, http request, irc, syslog, pagerduty, slack, alerta, awssns, dynatrace, flock, hangouts, matrix, messagebird, discord, pushbullet, prowl, telegram, twilio, pushover... It also provides a cloud-based long-term retention service (netdata.cloud), or you can integrate it with any TSDB you already have that understands one of these formats: graphite, json, mongodb, opentsdb, prometheus. If you don't want a TSDB, you can also set one or more netdata servers to act as collectors (streaming feature), and they'll hold the data of other nodes, with no tool other than the default binary. It also allows storing no data locally, which is useful for diskless nodes. If you don't want the web server, you can either filter IP addresses, or disable it entirely with a single configuration:
and it'll only push the data to backends or netdata collector. If you don't want an instance of netdata everywhere, you can set a single netdata instance to monitor multiple out-of-band elements, like BMC (through freeipmi), fping, or the multiple collectors available. My current workflow is the following:
And I think I have pretty acceptable Infiniband monitoring now. If you can provide feedback, it'll be greatly appreciated ❤️ |
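For readers wondering what Infiniband monitoring involves at the lowest level, here is a minimal sketch that reads the per-port counters the Linux kernel exposes under `/sys/class/infiniband`. It is not the netdata collector discussed above, only an illustration; the function name `read_ib_counters` is made up for this example, and the set of counter files that exist depends on the driver and hardware. Values are returned raw; as far as I know, the `port_rcv_data` and `port_xmit_data` counters are reported in units of 4 octets, so check the kernel ABI documentation before converting to bytes.

```python
from __future__ import annotations

from pathlib import Path


def read_ib_counters(root: Path = Path("/sys/class/infiniband")) -> dict[tuple[str, str], dict[str, int]]:
    """Return {(device, port): {counter_name: raw_value}} from a sysfs-style tree.

    `read_ib_counters` is a hypothetical helper for illustration only.
    """
    result: dict[tuple[str, str], dict[str, int]] = {}
    if not root.is_dir():
        # No Infiniband devices (or not a Linux host): nothing to report.
        return result
    # Layout: <root>/<device>/ports/<port>/counters/<counter_name>
    for counter_dir in sorted(root.glob("*/ports/*/counters")):
        device = counter_dir.parents[2].name
        port = counter_dir.parent.name
        values: dict[str, int] = {}
        for counter_file in sorted(counter_dir.iterdir()):
            try:
                values[counter_file.name] = int(counter_file.read_text().strip())
            except (OSError, ValueError):
                # Skip counters that are unreadable or not plain integers.
                continue
        result[(device, port)] = values
    return result


if __name__ == "__main__":
    for (device, port), counters in read_ib_counters().items():
        print(device, port, counters)
```

Sampling this twice and dividing the difference by the interval gives a rate, which is the kind of value a dashboard would plot.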
netdata is not a replacement for Ganglia. |
That's the case: it's already deployed in banks, industry, HPC and other air-gapped environments. I also stated it in the comment:
You mismatched the cloud dashboard and the agent, or just didn't read the doc at all. As a general matter, please read thoughtfully before making a (wrong) opinion. |
Top-10 banks have enough resources to build their own netdata.cloud alternative. What about enthusiasts without frontend dev experience? Ganglia is not only the gmond service; it is also a web frontend. If we accept that netdata can/should replace gmond, then Grafana integration must be described in the installation guide and some kind of dashboard provided. |
Hello, I'm deeply interested in this thread, and according to this site: https://www.netdata.cloud/integrations/#featured there is some support for exporters. I think we may be able to archive historical data on those services, is this correct @Saruspete? If yes, the question from @severgun would be answered. Also, can netdata read from them, after exporting, to display the historical data? At this moment we are considering
No dev experience is required: just 1 configuration file and it'll start sending data to the TSDB of your choice (eg opentsdb / grafana)
Netdata also provides a web frontend, which by default only monitors the local host. Any netdata instance can be a central collector and show the data other nodes send to it.
And that's all: you have a live graph of your choice.
Indeed, I can help you by providing these configuration templates (a lot of them are already in the GitHub documentation, but not really well indexed on the website). But even better: you can let end users choose which solution they want, e.g.:
Yes, netdata is able to send its collected values into any of these databases (and many more: 33 in fact
Of course. You can also choose to send to multiple different DBs at the same time.
I don't think so: the agent can keep its values in a highly compressed local DB (in 1G in
I'm using beegfs, and was planning on doing the beegfs collector. I can either create a wrapper over beegfs-mon, or add an output format in beegfs itself. The former would allow integration with older versions of beegfs, while the latter would be more efficient.
As a side note, this is not how most companies for which IT is not a core business work: IT is seen as a cost center, and they are not willing to do internal development and take the blame in case of an issue when they can just pay for support and shift the blame to the vendor ("nobody ever got fired for buying ibm"). Final note: I have no part or interest in netdata and am not an employee. I'm only pushing a tool which has tremendous potential in all kinds of workloads, enabled me to investigate & fix hidden issues with major constructors, and enforce high code standards to trust them in the future. |
I recommend Nightingale (see its GitHub page); it has these advantages:
|
Maybe, we can still work with Ganglia through a centos7-ganglia-web-docker-container |
Any updates? @berlin2123 that's some abomination tbh. It might work, but it's not supposed to be like that. If you really want to use Ganglia that much just clone/migrate the repository to GitHub and keep it updated. I bet there are still people who are willing to maintain the project. |
@vkhodygo It doesn't look to me like anyone is still willing to maintain the Ganglia software. There are dozens of forks of the project but nothing has been done. I think Ganglia is not usable in this state, so I would like to join in asking: does OpenHPC already have a plan for what to use instead? |
Ganglia still exists in EPEL for CentOS Stream 8 and CentOS Stream 9. Eventually we'll have to find a solution that is maintained. I have a VictoriaMetrics server collecting statistics from my HPC cluster, with each node running the Prometheus node_exporter agent. I suppose it should be "simple" to create a custom Grafana dashboard to show critical metrics for the cluster nodes. |
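As a rough illustration of the setup described above, here is a minimal sketch of pulling a per-node load figure out of a Prometheus-compatible query API (VictoriaMetrics serves `/api/v1/query`) and ranking nodes by it. The host name `vm.example`, the helper names, and the sample response are all made up for this example; `node_load1` is the 1-minute load average exported by node_exporter. No network call is made here; the response is a hand-written sample in the instant-query JSON shape.

```python
from __future__ import annotations

from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for a Prometheus-compatible HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload: dict) -> dict[str, float]:
    """Map each series' `instance` label to its sample value."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each result entry looks like {"metric": {...labels}, "value": [timestamp, "value-as-string"]}.
    return {
        series["metric"].get("instance", "<unknown>"): float(series["value"][1])
        for series in data["result"]
    }


def busiest_nodes(loads: dict[str, float], top: int = 3) -> list[tuple[str, float]]:
    """Return the `top` nodes with the highest load, highest first."""
    return sorted(loads.items(), key=lambda item: item[1], reverse=True)[:top]


# Hand-written sample response (hypothetical nodes c1 and c2), not real cluster data.
SAMPLE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "node_load1", "instance": "c1:9100"}, "value": [1700000000, "0.42"]},
            {"metric": {"__name__": "node_load1", "instance": "c2:9100"}, "value": [1700000000, "7.5"]},
        ],
    },
}

if __name__ == "__main__":
    print(build_query_url("http://vm.example:8428", "node_load1"))
    print(busiest_nodes(parse_instant_vector(SAMPLE)))
```

A Grafana panel would run the same PromQL query directly against the data source; the parsing step is only needed when scripting outside Grafana.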
ganglia-web version 3.7.6 was released 3 days ago and now works directly on RHEL 9/8 with PHP 8/7. Official RPMs (EPEL) or deb packages may become available soon. |
ganglia-web (rpm) in version 3.7.6 is in epel-testing |
3.7.6 is in epel (not-testing) now. Feel free to test. An issue that has been identified is that the MONTH and YEAR pages still have problems in php8 (el9), which has been fixed in PULL 379 of ganglia-web. Feel free to submit other issues !!! |
There is also a bug in physical_view.php Issue |
A friendly reminder that this issue had no activity for 30 days. |
Ganglia currently has no maintainers: https://sourceforge.net/p/ganglia/mailman/message/36795542/
For the CentOS 8.1 version you might want to think about replacing Ganglia.