Usage Documentation

All RHEV checks are executed via the RHEV REST-API, so the following requirements must be met:

- HTTPS connection from the monitoring server to the RHEV Manager (default: TCP/443)
- A user for the REST-API login:
  - RHEV 3.0/oVirt 3.0: only admin@internal or another admin user can connect to the REST-API
  - RHEV 3.1/oVirt 3.1 (and newer): create an Admin role with "Login Permissions" and assign it to this user
  - oVirt 3.3: add the role "viewer" to this user
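Before configuring any checks, it can help to confirm from the monitoring server that the REST-API is reachable with the chosen user. The following is a minimal sketch using curl with HTTP Basic authentication, assuming the plugin defaults documented below (port 443, API path /api). The host name, credentials, and CA file path are placeholders, not values the plugin requires.

```shell
#!/bin/sh
# Placeholder connection settings; they mirror the plugin's -H, -p and -A options.
RHEVM_HOST=rhevm
RHEVM_PORT=443
API_PATH=/api
CA_FILE=./rhev-ca.crt                 # hypothetical path to the RHEV CA certificate
CREDENTIALS='admin@internal:password' # <User>@<Domain>:<Password>

API_URL="https://${RHEVM_HOST}:${RHEVM_PORT}${API_PATH}"

# Request the API entry point. --fail turns HTTP errors (e.g. 401) into a
# non-zero exit status, so the function can be used in scripts.
check_api() {
  curl --fail --silent --show-error \
       --cacert "$CA_FILE" \
       --user "$CREDENTIALS" \
       --output /dev/null \
       "$API_URL"
}
```

Run it as `check_api && echo 'REST-API reachable'`; a non-zero exit status points to a network, certificate, or authentication problem that the plugin would run into as well.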

The following checks can be performed by this plugin:

General Options

-vv: verbose mode
-vvv: debug mode
-p <port>: Port of REST-API (default: 443)

Note that check_rhev3 1.3 and older use port 8443 by default, while check_rhev3 1.4 and newer use port 443.

--ca-file: Path to RHEV-CA
-A <api>: REST-API path (default: /api)
-a <User>@<Domain>:<Password>: Authentication credentials
-f <Auth file>: Authentication file
-t <timeout>: Seconds before connection times out (default: 15)
-w <warn>: warning value
-c <crit>: critical value
-V: display version of plugin and exit
-h: Print detailed help screen

Authentication can be done either with -a:

$ check_rhev3 -a <User>@<Domain>:<Password>

or with -f (the advantage is that your password isn't visible in the Icinga/Nagios configuration):

$ check_rhev3 -f <auth file>

The auth file must use the following syntax:

username=<User>@<Domain>
password=<Password>
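A minimal sketch for creating such a file from the shell. The file name and credentials are placeholders; the point is to write the file with permissions that keep other local users from reading the password. Adjust ownership so that the Icinga/Nagios user can still read it.

```shell
#!/bin/sh
# Placeholder location; pick a path readable by the Icinga/Nagios user.
AUTH_FILE=./rhev_auth

# Write the file in a subshell with a restrictive umask, so a newly created
# file is never readable by group or others.
(
  umask 077
  printf 'username=%s\npassword=%s\n' 'admin@internal' 'password' > "$AUTH_FILE"
)
# chmod also covers the case where the file already existed.
chmod 600 "$AUTH_FILE"
```

The file can then be used as `check_rhev3 -H rhevm -f ./rhev_auth -D default -l status`.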

Deprecated options since check_rhev3 1.3:

-o: Use cookie-based authenticated sessions (requires RHEV >= 3.1)

Since version 1.3, cookie-based authentication is used by default. See https://github.com/ovido/check_rhev3/issues/1 for details.

Datacenter Checks

Datacenter checks are generally executed as follows:

$ check_rhev3 -H <RHEV-Manager> -a <User>@<Domain>:<Password> -D <datacenter> [-l <check>] [-s <subcheck>]

You can monitor more than one datacenter with option -D.

For example, assume you have 3 datacenters with the following names:

testing
production01
production02

Depending on your -D argument, you can monitor a specific datacenter, several datacenters, or all of them:

-D * : monitor all datacenters (testing, production01 and production02)
-D production* : monitor all production* datacenters (production01 and production02)
-D production01: monitor only production01 datacenter
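Two practical notes on patterns. First, quote them on the command line; otherwise the shell may expand an unquoted `*` against file names in the current directory before check_rhev3 sees it. Second, the examples above are consistent with shell-style wildcard matching. The sketch below emulates that selection with a `case` statement purely as an illustration, so you can preview which names a pattern selects; it is not how the plugin matches names internally and does not contact RHEV.

```shell
#!/bin/sh
# Illustration only: print the names that a shell-style wildcard pattern selects.
match_names() {
  pattern=$1
  shift
  for name in "$@"; do
    # An unquoted variable in a case pattern is treated as a wildcard pattern.
    case $name in
      $pattern) printf '%s\n' "$name" ;;
    esac
  done
}

match_names 'production*' testing production01 production02
# prints production01 and production02

# On a real command line, quote the pattern so the shell passes it through:
#   check_rhev3 -H rhevm -a admin@internal:password -D 'production*'
```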

Datacenter Status

Get the status of your datacenter(s) with the following option:

-l status

Hint: If you don't specify a check with -l, the status of the datacenter will be checked.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -D default -l status
RHEV OK: Datacenters ok - 1/1 Datacenters with state UP|Datacenters=1;1;1;0;
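Everything after the `|` in the plugin output is performance data in the usual Nagios plugin format, `label=value[UOM];[warn];[crit];[min];[max]`. The following sketch splits the example output above into its parts with POSIX parameter expansion. It only handles the first perfdata item and is meant for ad-hoc inspection, not as a complete perfdata parser.

```shell
#!/bin/sh
# Example output taken from the datacenter status check above.
output='RHEV OK: Datacenters ok - 1/1 Datacenters with state UP|Datacenters=1;1;1;0;'

status_text=${output%%"|"*}        # text before the first '|'
perfdata=${output#*"|"}            # text after the first '|'
first_item=${perfdata%%" "*}       # perfdata items are space-separated; keep the first

label=${first_item%%=*}
fields=${first_item#*=}

# Split "value;warn;crit;min;max" on semicolons.
IFS=';' read -r value warn crit min max <<EOF
$fields
EOF

printf 'status: %s\n' "$status_text"
printf 'label=%s value=%s warn=%s crit=%s min=%s\n' \
  "$label" "$value" "$warn" "$crit" "$min"
```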

Datacenter Version

Get the version of your datacenter and exit:

-l version

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -D default -l version
RHEV OK: Version ok - default: 3.0

Datacenter Storagedomain status

Get the status of all attached storagedomains with:

-l storage -s status

Hint: If you don't specify a subcheck with -s, the status of the storagedomains will be checked.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -D default -l storage -s status
RHEV OK: Datacenters ok - 3/3 Storagedomains with state Active|storagedomains=3;3;3;0;

Datacenter Storagedomain usage

Get the used disk space of all attached storagedomains:

-l storage -s usage [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60% (warning) and 80% (critical) are used.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -D default -l storage -s usage -w 60 -c 80
RHEV WARNING: storage warning - 70.90% used (Default-iscsi) |storage_Default-iscsi=70.90%;60;80;0;
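To reason about which state a given value produces, here is a sketch of threshold evaluation for the percentage-based checks, returning the standard plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL). It assumes that a value at or above a threshold triggers it; the plugin's exact boundary handling is not documented here, but the sketch reproduces the example above (70.90% with -w 60 -c 80 is WARNING).

```shell
#!/bin/sh
# Sketch: classify a percentage against warning/critical thresholds.
# Assumption: "higher is worse" and reaching a threshold triggers it.
classify_usage() {
  awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
    if (v + 0 >= c + 0) { print "CRITICAL"; exit 2 }
    if (v + 0 >= w + 0) { print "WARNING";  exit 1 }
    print "OK"; exit 0
  }'
}

printf 'state: %s\n' "$(classify_usage 70.90 60 80)"   # prints: state: WARNING
```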

Datacenter Storagedomain overall usage

Get the overall used disk space of all attached storagedomains:

-l storage -s overall-usage [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60% (warning) and 80% (critical) are used.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -D default -l storage -s overall-usage -w 60 -c 80
RHEV WARNING: storage warning - 70.90% used (default) |ovstorage=70.90%;60;80;0;

Cluster Checks

Cluster checks are generally executed as follows:

$ check_rhev3 -H <RHEV-Manager> -a <User>@<Domain>:<Password> -C <cluster> [-l <check>] [-s <subcheck>]

You can monitor more than one cluster with option -C.

For example, assume you have 3 clusters with the following names:

testing
production01
production02

Depending on your -C argument, you can monitor a specific cluster, several clusters, or all of them:

-C * : monitor all clusters (testing, production01 and production02)
-C production* : monitor all production* clusters (production01 and production02)
-C production01: monitor only production01 cluster

Cluster Host Status

Get the status of all cluster hosts with:

-l hosts [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values for the number of hosts that must be UP. If you don't, all hosts must be UP.

Hint: If you don't specify a check with -l, the status of the hosts will be checked.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -C default -l hosts -w 1 -c 1
RHEV OK: Cluster ok - 1/1 Hosts with state UP|hosts=1;1;1;0;

Cluster VM Status

Get the status of all cluster vms with:

-l vms [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values for the number of VMs that must be UP. If you don't, all VMs must be UP.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -C default -l vms -w 20 -c 25
RHEV CRITICAL: Cluster critical - 6/32 Vms with state UP|vms=6;20;25;0;

Cluster Network Status

Get the status of all cluster networks with:

-l networks [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values for the number of networks that must be Operational. If you don't, all networks must be Operational.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -C default -l networks -w 7 -c 7
RHEV OK: Clusters ok - 7/7 Networks with state Operational|networks=7;7;7;0;

Host Checks

Host checks are generally executed as follows:

$ check_rhev3 -H <RHEV-Manager> -a <User>@<Domain>:<Password> -R <Host> [-l <check>] [-s <subcheck>]

You can monitor more than one host with option -R.

For example, assume you have 3 hosts with the following names:

rhev-test01
rhev-prod01
rhev-prod02

Depending on your -R argument you can monitor a specific, multiple or all hosts:

-R * : monitor all hosts (rhev-test01, rhev-prod01 and rhev-prod02)
-R rhev-prod* : monitor all rhev-prod* hosts (rhev-prod01 and rhev-prod02)
-R rhev-prod01: monitor only rhev-prod01 host

Host Status

Get the status of your host(s) with the following option:

-l status

Hint: If you don't specify a check with -l, the status of the hosts will be checked.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l status
RHEV OK: Hosts ok - 1/1 Hosts with state UP|Hosts=1;1;1;0;

Host VM Status

Get the status of the VMs running on these host(s) with the following option:

-l vms

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l vms
RHEV CRITICAL: Vms critical - 4/41 Vms with state UP

If you specify -v (verbose), you get more detailed information:

$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l vms -v
RHEV CRITICAL: Vms critical - 4/41 Vms with state UP [Details: 4 up, 1 suspended, 36 down]

Host Load

Get the load average (5-minute average) of your host with the following option:

-l load [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, the default warning value equals the number of CPU cores and the default critical value is twice the number of CPU cores. For example, with two 4-core CPUs the default warning value is 8 and the default critical value is 16.
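The default load thresholds are simple arithmetic on the core count. This sketch reproduces the example from the text (2 CPUs with 4 cores each); the socket and core counts are hardcoded, it does not query the host.

```shell
#!/bin/sh
# Example host from the text: 2 CPUs (sockets) with 4 cores each.
sockets=2
cores_per_socket=4

cores=$((sockets * cores_per_socket))
load_warn=$cores             # default warning: number of CPU cores
load_crit=$((cores * 2))     # default critical: twice the number of cores

printf '%s %s %s %s\n' -w "$load_warn" -c "$load_crit"   # prints: -w 8 -c 16
```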

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l load -w 2 -c 4
RHEV OK: cpu.load.avg.5m ok - 0.010  (rhevh) |cpu.load.avg.5m=0.010;2;4;0;

Host CPU utilization

Get the CPU utilization of your host with:

-l cpu -s usage [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60 (warning) and 80 (critical) are used.

Hint: If you don't specify a subcheck with -s, the cpu usage of this host will be checked.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l cpu -s usage -w 60 -c 80
RHEV OK: cpu ok - 11% used (rhevh) |cpu=11%;60;80;0; cpu.current.user=7;cpu.current.system=4;cpu.current.idle=89;

Host KSM usage

Get the percentage of CPU usage for Kernel Samepage Merging (KSM) with:

-l ksm [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60 (warning) and 80 (critical) are used.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l ksm -w 60 -c 80
RHEV OK: ksm.cpu.current ok - 3% used (rhevh) |ksm.cpu.current=3%;60;80;0;

Host Memory usage

Monitor the memory usage in percentage with:

-l memory -s mem [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60 (warning) and 80 (critical) are used.

Hint: If you don't specify a subcheck with -s, the memory usage of this host will be checked.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l memory -s mem -w 60 -c 80
RHEV CRITICAL: memory critical - 80.11% used (rhevh) |memory=80.11%;60;80;0; memory.cached=0;memory.used=9485712097.28;memory.buffers=0;

Host Swap usage

Get the swap space usage:

-l memory -s swap [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60 (warning) and 80 (critical) are used.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l memory -s swap
RHEV OK: swap ok - 11.71% used (rhevh) |swap=11.71%;60;80;0;

Host Network status

Get the status of all network interfaces:

-l network -s status

Specify a network interface:

-l network -s status -n <nic>

Hint: If you don't specify a subcheck with -s, the nic status of this host will be checked.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l network -s status
RHEV OK: Hosts ok - 2/2 Nics with state Active|nics=2;2;2;0;

$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l network -s status -n eth0
RHEV OK: Hosts ok - 1/1 Nics with state Active|nics=1;1;1;0;

Host Network traffic

Get the network traffic of all nics in Mbit/s:

-l network -s traffic [-w <warn>] [-c <crit>]

Specify a network interface:

-l network -s traffic -n <nic> [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 500 (warning) and 700 (critical) are used.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l network -s traffic -w 2048 -c 3062
RHEV OK: traffic ok - eth0: 0 Mbit/s eth1: 0 Mbit/s |traffic_eth0=0MB;62.5;87.5;0; traffic_eth1=0MB;62.5;87.5;0;

$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l network -s traffic -w 2048 -c 3062 -n eth0
RHEV OK: traffic ok - eth0: 0 Mbit/s |traffic_eth0=0MB;62.5;87.5;0;
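In the example output, the perfdata thresholds are 62.5 and 87.5 with the unit MB. These equal the documented defaults of 500 and 700 divided by 8, which suggests that the perfdata reports megabytes per second rather than megabits. That reading is an inference from the example values only; the conversion itself is:

```shell
#!/bin/sh
# Convert a rate in Mbit/s to MByte/s (8 bits per byte).
mbit_to_mbyte() {
  awk -v m="$1" 'BEGIN { printf "%g\n", m / 8 }'
}

mbit_to_mbyte 500   # prints 62.5
mbit_to_mbyte 700   # prints 87.5
```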

Host Network errors

Get the network errors of all nics:

-l network -s errors [-w <warn>] [-c <crit>]

Specify a network interface:

-l network -s errors -n <nic> [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l network -s errors -w 5 -c 10 -n eth0
RHEV OK: errors ok - eth0: 0 Errors (rhevh) |errors_eth0=0c;5;10;0;

Storagedomain Checks

Storagedomain checks are generally executed as follows:

$ check_rhev3 -H <RHEV-Manager> -a <User>@<Domain>:<Password> -S <storagedomain> [-l <check>]

You can monitor more than one storagedomain with option -S.

For example, assume you have 3 storagedomains with the following names:

isos
exports
iscsi01

Depending on your -S argument you can monitor a specific, multiple or all storagedomains:

-S * : monitor all storagedomains (isos, exports, iscsi01)
-S is* : monitor all is* storagedomains (isos and iscsi01)
-S iscsi01: monitor only iscsi01 storagedomain

Storagedomain Usage

Monitor used disk space of storagedomains:

-l usage [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60% (warning) and 80% (critical) are used.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -S isos -l usage -w 60 -c 80
RHEV OK: storage ok - 39.78% used (isos) |storage_isos=39.78%;60;80;0;

Virtual Machine Checks

VM checks are generally executed as follows:

$ check_rhev3 -H <RHEV-Manager> -a <User>@<Domain>:<Password> -M <VM> [-l <check>] [-s <subcheck>]

You can monitor more than one VM with option -M.

For example, assume you have 3 VMs with the following names:

rhev-test01
rhev-prod01
rhev-prod02

Depending on your -M argument you can monitor a specific, multiple or all VMs:

-M * : monitor all VMs (rhev-test01, rhev-prod01 and rhev-prod02)
-M rhev-prod* : monitor all rhev-prod* VMs (rhev-prod01 and rhev-prod02)
-M rhev-prod01: monitor only rhev-prod01 VM

VM Status

Get the status of your virtual machine(s) with the following option:

-l status

Hint: If you don't specify a check with -l, the status of this VM will be checked.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -M vm -l status
RHEV OK: Vms ok - 1/1 Vms with state UP|Vms=1;1;1;0;

VM CPU utilization

Monitor CPU utilization with:

-l cpu [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60 (warning) and 80 (critical) are used.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -M vm -l cpu -w 60 -c 80
RHEV OK: cpu ok - 17% used (vm) |cpu=17%;60;80;0; cpu.current.guest=17;cpu.current.hypervisor=0;

VM Memory utilization

Monitor memory utilization with:

-l memory [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60 (warning) and 80 (critical) are used.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -M vm -l memory -w 60 -c 80
RHEV CRITICAL: memory critical - 82.00% used (vm) |memory=82.00%;60;80;0;

VM Network traffic

Get network traffic in Mbit/s with:

-l network -s traffic [-w <warn>] [-c <crit>]

Specify a nic with:

-l network -s traffic -n <nic> [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 500 (warning) and 700 (critical) are used.

Hint: If you don't specify a subcheck with -s, the traffic of all nics will be checked.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -M vm -l network -s traffic -w 900 -c 1000
RHEV OK: traffic ok - nic1: 0 Mbit/s (vm) |traffic_nic1=0MB;62.5;87.5;0;

VM Network errors

Get the network errors of all nics:

-l network -s errors [-w <warn>] [-c <crit>]

Specify a network interface:

-l network -s errors -n <nic> [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -M vm -l network -s errors -w 5 -c 10
RHEV OK: errors ok - nic1: 0 Errors (vm) |errors_nic1=0c;5;10;0;

Virtual Machine Pool Checks

VM Pool checks are generally executed as follows:

$ check_rhev3 -H <RHEV-Manager> -a <User>@<Domain>:<Password> -P <VM-Pool> [-l <check>]

You can monitor more than one VM Pool with option -P.

For example, assume you have 3 VM Pools with the following names:

rhev-test01
rhev-prod01
rhev-prod02

Depending on your -P argument you can monitor a specific, multiple or all VM Pools:

-P * : monitor all VM Pools (rhev-test01, rhev-prod01 and rhev-prod02)
-P rhev-prod* : monitor all rhev-prod* VM Pools (rhev-prod01 and rhev-prod02)
-P rhev-prod01: monitor only rhev-prod01 VM Pool

VM Pool Usage

Get the number of running virtual machine(s) of this VM Pool(s):

-l usage [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values for the number of free VMs.

Hint: If you don't specify a check with -l, the usage of this VM Pool will be checked.

Example:

$ check_rhev3 -H rhevm -a admin@internal:password -P pool -l usage -w 1 -c 2
RHEV WARNING: VM Pool warning - 3/4 vms free|vmpool=1;1;2;0;

Icinga/Nagios Definitions

Now that you know how to use this plugin from the command line, here's a short description of how to integrate it into Icinga/Nagios.

For details see Icinga and Nagios documentation available on:

Command

First of all, you have to define the check_rhev3 command:

define command{
  command_name check_rhev3
  command_line $USER1$/check_rhev3 -H $_HOSTRHEVM$ -a $ARG1$ $ARG2$
}

In this example, the custom host variable _rhevm holds the IP address or hostname of the RHEV Manager; in a command definition it is referenced as $_HOSTRHEVM$. See http://docs.icinga.org/latest/en/customobjectvars.html for details on custom object variables.

Host

To monitor a RHEV Hypervisor you have to create a host:

define host{
  use       linux-server
  host_name rhevh
  alias     RHEV Hypervisor
  address   192.168.1.2
  _rhevm    192.168.1.1
}

The hypervisor is monitored through REST-API calls to the RHEV Manager, so your Icinga/Nagios server doesn't need access to the hypervisor's IP address, but it must be able to connect to the RHEV Manager. The IP address or hostname of the RHEV Manager is specified via the custom variable _rhevm.

Note: You have to set this variable for all hosts monitored with this plugin!

An example for a VM would be:

define host{
  use       linux-server
  host_name my-vm
  alias     My virtual machine
  address   192.168.1.3
  _rhevm    192.168.1.1
}

Again, the _rhevm variable tells the plugin which RHEV Manager to contact.

Service

After defining hosts, you have to create services and assign these services to your hosts:

define service{
  use                 generic-service
  host_name           rhevh
  service_description RHEV CPU Check
  check_command       check_rhev3!admin@internal:password!-R $HOSTNAME$ -l cpu
}

In this example, a CPU check for RHEV-Hypervisor rhevh is defined.
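To see what Icinga/Nagios actually executes for this service, the sketch below performs the macro substitution by hand for the rhevh example. The plugin directory behind $USER1$ is an assumption (it is set in resource.cfg and differs between installations), and the shell variable names merely stand in for the macros.

```shell
#!/bin/sh
# Stand-ins for the Icinga/Nagios macros used by the rhevh service example.
user1=/usr/lib/nagios/plugins      # $USER1$: hypothetical plugin directory
rhevm=192.168.1.1                  # the host's _rhevm custom variable
host_name=rhevh                    # $HOSTNAME$
arg1='admin@internal:password'     # $ARG1$: first '!'-separated argument
arg2="-R $host_name -l cpu"        # $ARG2$: second argument, $HOSTNAME$ expanded

# Resulting command line (printed, not executed).
printf '%s\n' "$user1/check_rhev3 -H $rhevm -a $arg1 $arg2"
```

Running the printed command manually as the Icinga/Nagios user is a quick way to debug a failing service.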

Note that we use the $HOSTNAME$ macro as the name of the RHEV host, so make sure that the host_name in your Icinga/Nagios host definition matches the name of this host in your RHEV environment. Otherwise, hardcode the name in the Icinga/Nagios configuration!

A memory check for your VM with warning and critical values:

define service{
  use                 generic-service
  host_name           my-vm
  service_description RHEV Memory Check
  check_command       check_rhev3!admin@internal:password!-M $HOSTNAME$ -l memory -w 70 -c 90
}

As with the RHEV Hypervisor, make sure that host_name matches the VM name in RHEV!
