
Configuring Ganglia

waldner edited this page Aug 27, 2015 · 4 revisions

Here we'll see how Ganglia is managed in Aloja.

Gmond

Each node that we want to monitor with Ganglia has to run the Ganglia monitoring daemon (aka gmond).

Since UDP multicast, Ganglia's preferred way to exchange data, is not always supported, one node is chosen as the gmond "receiver" (normally the cluster's main node, e.g. node -00), and all nodes are configured to send to this node using UDP unicast. This allows the same configuration file to be used on all the cluster nodes, including the receiver node itself.

The receiver node is thus the only one that knows the status of the entire cluster, so this is the node to which gmetad, Ganglia's meta collector daemon, needs to connect to receive the cluster data (see below for details about gmetad). In the following, we assume that the gmond receiver node is node -00 of the cluster (e.g. al-73-00 for cluster al-73).

To install gmond on a node, include the following in the cluster or node configuration:

extraLocalCommands="
...
install_ganglia_gmond;
config_ganglia_gmond $clusterName;
..."

$clusterName is the value that ends up in gmond's config file (/etc/ganglia/gmond.conf):

cluster {
  name = "al-73"    /* this one */
  ...
}

Normally this is the real cluster name; in special cases (e.g. standalone nodes) you may want to change it to something more appropriate.
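To make the unicast layout described in this section more concrete, here is a minimal sketch of the channel sections it would correspond to in gmond.conf. The helper name print_gmond_unicast_channels is hypothetical and the exact file written by config_ganglia_gmond may differ; the sketch assumes the receiver is the -00 node and that gmond's default port 8649 is used throughout.

```shell
# Sketch only: print the channel sections of a unicast gmond.conf.
# $1 = receiver host (assumed to be the cluster's -00 node)
# $2 = port (optional, defaults to 8649)
print_gmond_unicast_channels() {
  local receiver="$1" port="${2:-8649}"
  cat <<EOF
/* every node sends its metrics to the receiver via UDP unicast */
udp_send_channel {
  host = ${receiver}
  port = ${port}
}

/* only matters on the receiver, harmless on the other nodes */
udp_recv_channel {
  port = ${port}
}

/* gmetad connects here via TCP to read the cluster state */
tcp_accept_channel {
  port = ${port}
}
EOF
}

print_gmond_unicast_channels al-73-00
```

Because every node both sends to the receiver and declares a receive channel, the same file can be deployed unchanged on all nodes, which is the property mentioned above.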

Gmetad

Gmetad is the daemon in charge of collecting data about one or more clusters, by connecting via TCP port 8649 (by default) to a gmond daemon that knows the status of the cluster (in our case, the node with the receiving gmond, eg the -00 node).

So we should arrange for this port on the node running the receiving gmond to be reachable by gmetad. Usually this is achieved using an SSH tunnel, directly against node -00's public IP if available, or against the appropriate forwarded port of the cluster's public IP. If gmetad is in the same internal network as the receiving gmond, no tunnel is necessary (see examples below).
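Before pointing gmetad at a host/port pair, it can be useful to check that a gmond really answers there. A gmond accepting TCP connections sends an XML description of the cluster state and then closes the connection, so a quick check is to read that XML and look at the cluster name. The following is a sketch, assuming nc is installed; gmond_cluster_names is a hypothetical helper, not part of Aloja or Ganglia.

```shell
# Sketch only: read gmond's XML dump on stdin and print the NAME
# attribute of every <CLUSTER> element found in it.
gmond_cluster_names() {
  sed -n 's/.*<CLUSTER NAME="\([^"]*\)".*/\1/p'
}

# Example usage (needs a reachable gmond, so it is left commented out):
#   nc al-73-00 8649 | gmond_cluster_names
```

If nothing is printed, either the port is not reachable or whatever is listening there is not a gmond.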

If one or more SSH tunnels are needed on a gmetad node, install the ssh-tunnel package with:

extraLocalCommands="
...
install_ssh_tunnel <tunnel1> <tunnel2> ...;
..."

where <tunnel1>, <tunnel2> etc. are the tunnels we want to run on the node. Each one can simply be a cluster name, provided that the SSH parameters to connect to it can be determined from the cluster configuration files, for example:

extraLocalCommands="
...
install_ssh_tunnel 'al-73' 'al-41';
..."

This sets up a tunnel named after the cluster, with a TCP port listening on localhost; connections to that port are forwarded to port 8649 of the receiving gmond in the cluster. The local TCP port is the same as the one used at the cluster border to forward SSH connections to the internal node; in the example above, localhost:27300 and localhost:24100 connect to the receiving gmond of al-73 and al-41 respectively.

In special cases we may need to specify the exact SSH command line manually. We can do so by putting an asterisk at the beginning of the argument. The format after the asterisk is "tunnel_name ssh_args", for example:

extraLocalCommands="
...
install_ssh_tunnel 'al-73' '*minerva -o StrictHostKeyChecking=no -L 8888:127.0.0.1:8649 user@minerva-101';
..."

So here, besides localhost:27300 for cluster al-73, we also set up localhost:8888 to reach minerva's receiving gmond.
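For reference, each of these tunnels boils down to an SSH local port forward from a port on localhost to port 8649 on the receiving gmond. The sketch below only prints such a command; gmond_tunnel_cmd is a hypothetical helper, and the exact options used by the ssh-tunnel package (including whether it passes -N) are an assumption here.

```shell
# Sketch only: print an ssh command that forwards a local port to the
# receiving gmond. Not the actual command run by the ssh-tunnel package.
# $1 = local listening port, $2 = ssh target, $3 = gmond port (default 8649)
gmond_tunnel_cmd() {
  local local_port="$1" target="$2" gmond_port="${3:-8649}"
  # -N: do not run a remote command, just keep the forward open
  printf 'ssh -N -L %s:127.0.0.1:%s %s\n' "$local_port" "$gmond_port" "$target"
}

gmond_tunnel_cmd 8888 user@minerva-101
# prints: ssh -N -L 8888:127.0.0.1:8649 user@minerva-101
```

The 127.0.0.1 in the forward is resolved on the remote side, so the SSH target has to be the node running the receiving gmond.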

Once we know which clusters we want to reach, and which host/port pairs to use to connect to them, we can install gmetad as follows:

extraLocalCommands="
...
install_ganglia_gmetad;
config_ganglia_gmetad 'al-73 localhost:27300' 'minerva localhost:8888' 'local local-gmond:8649';
..."

The format of each argument is 'clustername host:port'. Here we are monitoring 3 clusters: al-73 (reachable via the previously set up SSH tunnel on localhost:27300), minerva (reachable via the manually-configured SSH tunnel on localhost:8888) and another one called local whose gmond is directly reachable without any tunnel, so we go directly to its port 8649.

The above config results in the following lines in /etc/ganglia/gmetad.conf:

...
data_source "al-73" localhost:27300
data_source "minerva" localhost:8888
data_source "local" local-gmond:8649
...
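The mapping from arguments to data_source lines can be sketched as follows. This is an illustration of the 'clustername host:port' format only, not the actual implementation of config_ganglia_gmetad; the function name gmetad_data_sources is hypothetical.

```shell
# Sketch only: turn 'clustername host:port' arguments into
# gmetad.conf data_source lines.
gmetad_data_sources() {
  local spec name endpoint
  for spec in "$@"; do
    name="${spec%% *}"      # everything before the first space
    endpoint="${spec#* }"   # everything after the first space
    printf 'data_source "%s" %s\n' "$name" "$endpoint"
  done
}

gmetad_data_sources 'al-73 localhost:27300' 'minerva localhost:8888' 'local local-gmond:8649'
```

Run with the three arguments from the example, this prints the same three data_source lines shown above.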

Ganglia web interface

The web interface needs to be installed on a machine with a running gmetad, and will show all the clusters that gmetad knows about. No configuration is necessary; just install it with:

extraLocalCommands="
vm_install_webserver;
...
install_ganglia_web;
..."

Then go to http://webserver_ip_or_address/ganglia/ to access it.