Usage Documentation
All RHEV Checks are executed via RHEV REST-API, so the following requirements must be met:
- HTTPS connection from Monitoring server to RHEV Manager (default: TCP/443)
- User for REST-API login
- RHEV 3.0/oVirt 3.0: only admin@internal or another admin user can connect to REST-API
- RHEV 3.1/oVirt 3.1 (and newer): create Admin-role with "Login Permissions" and assign to this user
- oVirt 3.3: add role "viewer" to this user
The plugin accepts the following general options:
- -vv: verbose Mode
- -vvv: debug Mode
- -p <port>: Port of REST-API (default: 443)
Note that check_rhev3 <= 1.3 uses 8443 as default port and check_rhev3 >= 1.4 uses 443
- --ca-file: Path to RHEV-CA
- -A <api>: REST-API path (default: /api)
- -a <User>@<Domain>:<Password>: Authentication credentials
- -f <Auth file>: Authentication file
- -t <timeout>: Seconds before connection times out (default: 15)
- -w <warn>: warning value
- -c <crit>: critical value
- -V: display version of plugin and exit
- -h: Print detailed help screen
Authentication can be done either with -a
$ check_rhev3 -a <User>@<Domain>:<Password>
or with -f (the advantage of this variant is that your password isn't visible in the Icinga/Nagios configuration):
$ check_rhev3 -f <auth file>
Syntax of auth file must be:
username=<User>@<Domain>
password=<Password>
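A minimal sketch of creating such an auth file from Python, assuming a hypothetical path and placeholder credentials. The file is created with mode 0600 so that only the owner can read the password; that permission choice is a suggestion, not a documented requirement of the plugin.

```python
import os
import stat
import tempfile


def write_auth_file(path, user, domain, password):
    """Write an auth file in the username=/password= format shown above."""
    content = f"username={user}@{domain}\npassword={password}\n"
    # Create the file with owner-only permissions (0600) from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    return content


# Hypothetical location and placeholder credentials, for illustration only.
demo_path = os.path.join(tempfile.mkdtemp(), "rhev_auth")
demo_content = write_auth_file(demo_path, "admin", "internal", "password")
demo_mode = stat.S_IMODE(os.stat(demo_path).st_mode)
print(demo_content, end="")
```

The resulting file is then passed to the plugin with -f instead of putting the password on the command line.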
- -o: Use cookie-based authenticated sessions (requires RHEV >= 3.1)
Cookie-based authentication is used by default in version 1.3 and newer. See https://github.com/ovido/check_rhev3/issues/1 for details.
Datacenter checks are executed generally in the following way:
$ check_rhev3 -H <RHEV-Manager> -a <User>@<Domain>:<Password> -D <datacenter> [-l <check>] [-s <subcheck>]
You can monitor more than one datacenter with option -D.
For example, suppose you have 3 datacenters with the following names:
- testing
- production01
- production02
Depending on your -D argument you can monitor a specific, multiple or all datacenters:
- -D * : monitor all datacenters (testing, production01 and production02)
- -D production* : monitor all production* datacenters (production01 and production02)
- -D production01: monitor only production01 datacenter
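To illustrate how the patterns above select datacenters, here is a small sketch using Python's `fnmatch` module. It assumes the plugin treats `*` like a shell-style wildcard; the plugin's actual matching code is not shown here and may differ in details.

```python
from fnmatch import fnmatchcase

# The three example datacenters from the text above.
DATACENTERS = ["testing", "production01", "production02"]


def select(pattern, names):
    """Return the names matched by a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


for pattern in ("*", "production*", "production01"):
    print(pattern, "->", select(pattern, DATACENTERS))
```

When calling the plugin from a shell, quote the pattern (for example `-D 'production*'`) so that the shell does not expand `*` against file names in the current directory.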
Get the status of your datacenter(s) with the following option:
-l status
Hint: If you don't specify a check with -l, the status of the datacenter will be checked.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -D default -l status
RHEV OK: Datacenters ok - 1/1 Datacenters with state UP|Datacenters=1;1;1;0;
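The output follows the usual Nagios plugin layout: a status text, then `|`, then performance data in the form `label=value;warn;crit;min;max`. The following sketch splits such a line in Python. The function name `parse_plugin_output` is made up for this example and is not part of check_rhev3, and the sketch only handles the plain space-separated perfdata form seen in this example.

```python
import re

# A perfdata value: a number followed by an optional unit such as "%" or "MB".
VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)([A-Za-z%]*)$")


def to_number(field):
    """Convert a threshold field to float, or None if empty or not numeric."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_plugin_output(line):
    """Split a plugin output line into (status text, perfdata dict)."""
    text, _, perf = line.partition("|")
    perfdata = {}
    for token in perf.split():
        label, _, rest = token.partition("=")
        fields = rest.split(";")
        match = VALUE_RE.match(fields[0])
        if not match:
            continue  # skip tokens that are not in label=value form
        # Pad so that warn, crit, min and max always exist.
        fields += [""] * (5 - len(fields))
        perfdata[label] = {
            "value": float(match.group(1)),
            "unit": match.group(2),
            "warn": to_number(fields[1]),
            "crit": to_number(fields[2]),
            "min": to_number(fields[3]),
            "max": to_number(fields[4]),
        }
    return text.strip(), perfdata


sample = "RHEV OK: Datacenters ok - 1/1 Datacenters with state UP|Datacenters=1;1;1;0;"
status_text, perf = parse_plugin_output(sample)
print(status_text)
print(perf)
```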
Get the version of your datacenter and exit:
-l version
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -D default -l version
RHEV OK: Version ok - default: 3.0
Get the status of all attached storagedomains with:
-l storage -s status
Hint: If you don't specify a subcheck with -s, the status of the storagedomains will be checked.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -D default -l storage -s status
RHEV OK: Datacenters ok - 3/3 Storagedomains with state Active|storagedomains=3;3;3;0;
Get the used disk space of all attached storagedomains:
-l storage -s usage [-w <warn> ] [-c <crit>]
You can optionally specify warning and critical values. If you don't specify them, the defaults of 60% (warning) and 80% (critical) are used.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -D default -l storage -s usage -w 60 -c 80
RHEV WARNING: storage warning - 70.90% used (Default-iscsi) |storage_Default-iscsi=70.90%;60;80;0;
Get the overall used disk space of all attached storagedomains:
-l storage -s overall-usage [-w <warn> ] [-c <crit>]
You can optionally specify warning and critical values. If you don't specify them, the defaults of 60% (warning) and 80% (critical) are used.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -D default -l storage -s overall-usage -w 60 -c 80
RHEV WARNING: storage warning - 70.90% used (default) |ovstorage=70.90%;60;80;0;
Cluster checks are executed generally in the following way:
$ check_rhev3 -H <RHEV-Manager> -a <User>@<Domain>:<Password> -C <cluster> [-l <check>] [-s <subcheck>]
You can monitor more than one cluster with option -C.
For example, suppose you have 3 clusters with the following names:
- testing
- production01
- production02
Depending on your -C argument you can monitor a specific, multiple or all clusters:
- -C * : monitor all clusters (testing, production01 and production02)
- -C production* : monitor all production* clusters (production01 and production02)
- -C production01: monitor only production01 cluster
Get the status of all cluster hosts with:
-l hosts [-w <warn> ] [-c <crit>]
You can optionally specify warning and critical values for the number of hosts which have to be UP. If you don't specify them, all hosts have to be UP.
Hint: If you don't specify a check with -l, the status of the hosts will be checked.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -C default -l hosts -w 1 -c 1
RHEV OK: Cluster ok - 1/1 Hosts with state UP|hosts=1;1;1;0;
Get the status of all cluster vms with:
-l vms [-w <warn> ] [-c <crit>]
You can optionally specify warning and critical values for the number of VMs which have to be UP. If you don't specify them, all VMs have to be UP.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -C default -l vms -w 20 -c 25
RHEV CRITICAL: Cluster critical - 6/32 Vms with state UP|vms=6;20;25;0;
Get the status of all cluster networks with:
-l networks [-w <warn> ] [-c <crit>]
You can optionally specify warning and critical values for the number of networks which have to be Operational. If you don't specify them, all networks have to be Operational.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -C default -l networks -w 7 -c 7
RHEV OK: Clusters ok - 7/7 Networks with state Operational|networks=7;7;7;0;
Host checks are executed generally in the following way:
$ check_rhev3 -H <RHEV-Manager> -a <User>@<Domain>:<Password> -R <Host> [-l <check>] [-s <subcheck>]
You can monitor more than one host with option -R.
For example, suppose you have 3 hosts with the following names:
- rhev-test01
- rhev-prod01
- rhev-prod02
Depending on your -R argument you can monitor a specific, multiple or all hosts:
- -R * : monitor all hosts (rhev-test01, rhev-prod01 and rhev-prod02)
- -R rhev-prod* : monitor all rhev-prod* hosts (rhev-prod01 and rhev-prod02)
- -R rhev-prod01: monitor only rhev-prod01 host
Get the status of your host(s) with the following option:
-l status
Hint: If you don't specify a check with -l, the status of the hosts will be checked.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l status
RHEV OK: Hosts ok - 1/1 Hosts with state UP|Hosts=1;1;1;0;
Get the status of the VMs running on the host(s) with the following option:
-l vms
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l vms
RHEV CRITICAL: Vms critical - 4/41 Vms with state UP
If you specify -v (verbose) you see more detailed information:
$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l vms -v
RHEV CRITICAL: Vms critical - 4/41 Vms with state UP [Details: 4 up, 1 suspended, 36 down]
Get the load usage (5min average) of your host with the following option:
-l load [-w <warn>] [-c <crit>]
You can optionally specify warning and critical values. If you don't specify them, the default warning value equals the number of CPU cores and the default critical value equals twice the number of CPU cores. For example, with two 4-core CPUs the default warning value is 8 and the default critical value is 16.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l load -w 2 -c 4
RHEV OK: cpu.load.avg.5m ok - 0.010 (rhevh) |cpu.load.avg.5m=0.010;2;4;0;
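The default load thresholds described above can be written out as a small calculation. This sketch only mirrors the rule stated in the text (warning = number of cores, critical = twice that); the function name is hypothetical and the plugin's own implementation is not shown here.

```python
def default_load_thresholds(sockets, cores_per_socket):
    """Return (warning, critical) load defaults for a host."""
    total_cores = sockets * cores_per_socket
    return total_cores, 2 * total_cores


# Two 4-core CPUs, as in the example in the text.
warn, crit = default_load_thresholds(2, 4)
print(f"warning={warn} critical={crit}")
```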
Get the cpu utilization of your host with:
-l cpu -s usage [-w <warn>] [-c <crit>]
You can optionally specify warning and critical values. If you don't specify them, the defaults of 60 (warning) and 80 (critical) are used.
Hint: If you don't specify a subcheck with -s, the cpu usage of this host will be checked.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l cpu -s usage -w 60 -c 80
RHEV OK: cpu ok - 11% used (rhevh) |cpu=11%;60;80;0; cpu.current.user=7;cpu.current.system=4;cpu.current.idle=89;
Get the percentage of CPU usage for Kernel SamePage Merging with:
-l ksm [-w <warn>] [-c <crit>]
You can optionally specify warning and critical values. If you don't specify them, the defaults of 60 (warning) and 80 (critical) are used.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l ksm -w 60 -c 80
RHEV OK: ksm.cpu.current ok - 3% used (rhevh) |ksm.cpu.current=3%;60;80;0;
Monitor the memory usage in percentage with:
-l memory -s mem [-w <warn>] [-c <crit>]
You can optionally specify warning and critical values. If you don't specify them, the defaults of 60 (warning) and 80 (critical) are used.
Hint: If you don't specify a subcheck with -s, the memory usage of this host will be checked.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l memory -s mem -w 60 -c 80
RHEV CRITICAL: memory critical - 80.11% used (rhevh) |memory=80.11%;60;80;0; memory.cached=0;memory.used=9485712097.28;memory.buffers=0;
Get the swap space usage:
-l memory -s swap [-w <warn>] [-c <crit>]
You can optionally specify warning and critical values. If you don't specify them, the defaults of 60 (warning) and 80 (critical) are used.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l memory -s swap
RHEV OK: swap ok - 11.71% used (rhevh) |swap=11.71%;60;80;0;
Get the status of all network interfaces:
-l network -s status
Specify a network interface:
-l network -s status -n <nic>
Hint: If you don't specify a subcheck with -s, the nic status of this host will be checked.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l network -s status
RHEV OK: Hosts ok - 2/2 Nics with state Active|nics=2;2;2;0;
$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l network -s status -n eth0
RHEV OK: Hosts ok - 1/1 Nics with state Active|nics=1;1;1;0;
Get the network traffic of all nics in Mbit/s:
-l network -s traffic [-w <warn>] [-c <crit>]
Specify a network interface:
-l network -s traffic -n <nic> [-w <warn>] [-c <crit>]
You can optionally specify warning and critical values. If you don't specify them, the defaults of 500 (warning) and 700 (critical) are used.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l network -s traffic -w 2048 -c 3062
RHEV OK: traffic ok - eth0: 0 Mbit/s eth1: 0 Mbit/s |traffic_eth0=0MB;62.5;87.5;0; traffic_eth1=0MB;62.5;87.5;0;
$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l network -s traffic -w 2048 -c 3062 -n eth0
RHEV OK: traffic ok - eth0: 0 Mbit/s |traffic_eth0=0MB;62.5;87.5;0;
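The perfdata thresholds in the sample output above (62.5 and 87.5) equal the default thresholds of 500 and 700 Mbit/s divided by 8. That suggests, though the text does not state it, that perfdata values are reported in megabytes per second. Under that assumption the conversion is:

```python
BITS_PER_BYTE = 8


def mbit_to_mbyte(mbit_per_s):
    """Convert a rate in Mbit/s to MB/s (the assumed perfdata unit)."""
    return mbit_per_s / BITS_PER_BYTE


# The default warning and critical thresholds from the text.
for threshold in (500, 700):
    print(f"{threshold} Mbit/s = {mbit_to_mbyte(threshold)} MB/s")
```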
Get the network errors of all nics:
-l network -s errors [-w <warn>] [-c <crit>]
Specify a network interface:
-l network -s errors -n <nic> [-w <warn>] [-c <crit>]
You can optionally specify warning and critical values.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -R rhevh -l network -s errors -w 5 -c 10 -n eth0
RHEV OK: errors ok - eth0: 0 Errors (rhevh) |errors_eth0=0c;5;10;0;
Storagedomain checks are executed generally in the following way:
$ check_rhev3 -H <RHEV-Manager> -a <User>@<Domain>:<Password> -S <storagedomain> [-l <check>]
You can monitor more than one storagedomain with option -S.
For example, suppose you have 3 storagedomains with the following names:
- isos
- exports
- iscsi01
Depending on your -S argument you can monitor a specific, multiple or all storagedomains:
- -S * : monitor all storagedomains (isos, exports, iscsi01)
- -S is* : monitor all is* storagedomains (isos and iscsi01)
- -S iscsi01: monitor only iscsi01 storagedomain
Monitor used disk space of storagedomains:
-l usage [-w <warn>] [-c <crit>]
You can optionally specify warning and critical values. If you don't specify them, the defaults of 60% (warning) and 80% (critical) are used.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -S isos -l usage -w 60 -c 80
RHEV OK: storage ok - 39.78% used (isos) |storage_isos=39.78%;60;80;0;
VM checks are executed generally in the following way:
$ check_rhev3 -H <RHEV-Manager> -a <User>@<Domain>:<Password> -M <VM> [-l <check>] [-s <subcheck>]
You can monitor more than one VM with option -M.
For example, suppose you have 3 VMs with the following names:
- rhev-test01
- rhev-prod01
- rhev-prod02
Depending on your -M argument you can monitor a specific, multiple or all VMs:
- -M * : monitor all VMs (rhev-test01, rhev-prod01 and rhev-prod02)
- -M rhev-prod* : monitor all rhev-prod* VMs (rhev-prod01 and rhev-prod02)
- -M rhev-prod01: monitor only rhev-prod01 VM
Get the status of your virtual machine(s) with the following option:
-l status
Hint: If you don't specify a check with -l, the status of this VM will be checked.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -M vm -l status
RHEV OK: Vms ok - 1/1 Vms with state UP|Vms=1;1;1;0;
Monitor CPU utilization with:
-l cpu [-w <warn>] [-c <crit>]
You can optionally specify warning and critical values. If you don't specify them, the defaults of 60 (warning) and 80 (critical) are used.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -M vm -l cpu -w 60 -c 80
RHEV OK: cpu ok - 17% used (vm) |cpu=17%;60;80;0; cpu.current.guest=17;cpu.current.hypervisor=0;
Monitor Memory utilization with:
-l memory [-w <warn>] [-c <crit>]
You can optionally specify warning and critical values. If you don't specify them, the defaults of 60 (warning) and 80 (critical) are used.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -M vm -l memory -w 60 -c 80
RHEV CRITICAL: memory critical - 82.00% used (vm) |memory=82.00%;60;80;0;
Get network traffic in Mbit/s with:
-l network -s traffic [-w <warn>] [-c <crit>]
Specify a nic with:
-l network -s traffic -n <nic> [-w <warn>] [-c <crit>]
You can optionally specify warning and critical values. If you don't specify them, the defaults of 500 (warning) and 700 (critical) are used.
Hint: If you don't specify a subcheck with -s, the traffic of all nics will be checked.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -M vm -l network -s traffic -w 900 -c 1000
RHEV OK: traffic ok - nic1: 0 Mbit/s (vm) |traffic_nic1=0MB;62.5;87.5;0;
Get the network errors of all nics:
-l network -s errors [-w <warn>] [-c <crit>]
Specify a network interface:
-l network -s errors -n <nic> [-w <warn>] [-c <crit>]
You can optionally specify warning and critical values.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -M vm -l network -s errors -w 5 -c 10
RHEV OK: errors ok - nic1: 0 Errors (vm) |errors_nic1=0c;5;10;0;
VM Pool checks are executed generally in the following way:
$ check_rhev3 -H <RHEV-Manager> -a <User>@<Domain>:<Password> -P <VM-Pool> [-l <check>]
You can monitor more than one VM Pool with option -P.
For example, suppose you have 3 VM Pools with the following names:
- rhev-test01
- rhev-prod01
- rhev-prod02
Depending on your -P argument you can monitor a specific, multiple or all VM Pools:
- -P * : monitor all VM Pools (rhev-test01, rhev-prod01 and rhev-prod02)
- -P rhev-prod* : monitor all rhev-prod* VM Pools (rhev-prod01 and rhev-prod02)
- -P rhev-prod01: monitor only rhev-prod01 VM Pool
Get the number of running virtual machines of the VM Pool(s):
-l usage [-w <warn>] [-c <crit>]
You can optionally specify warning and critical values for the number of free VMs.
Hint: If you don't specify a check with -l, the usage of this VM Pool will be checked.
Example:
$ check_rhev3 -H rhevm -a admin@internal:password -P pool -l usage -w 1 -c 2
RHEV WARNING: VM Pool warning - 3/4 vms free|vmpool=1;1;2;0;
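All checks above share the same command shape: a target option (-D, -C, -R, -S, -M or -P), an optional check (-l), an optional subcheck (-s) and optional thresholds (-w, -c). As a rough sketch, such a call could be assembled in Python as follows. The helper `build_command`, the manager name and the auth file path are assumptions made for this example; only options documented above are used.

```python
import shlex

# Target options documented above, keyed by the kind of object to monitor.
TARGET_FLAGS = {
    "datacenter": "-D",
    "cluster": "-C",
    "host": "-R",
    "storagedomain": "-S",
    "vm": "-M",
    "vmpool": "-P",
}


def build_command(manager, auth_file, kind, name,
                  check=None, subcheck=None, warn=None, crit=None):
    """Build the argument list for one check_rhev3 call."""
    argv = ["check_rhev3", "-H", manager, "-f", auth_file,
            TARGET_FLAGS[kind], name]
    # Append only the optional arguments that were actually given.
    for flag, value in (("-l", check), ("-s", subcheck),
                        ("-w", warn), ("-c", crit)):
        if value is not None:
            argv += [flag, str(value)]
    return argv


# Hypothetical manager name and auth file path, for illustration only.
argv = build_command("rhevm", "/etc/icinga/rhev_auth", "host", "rhev-prod*",
                     check="memory", subcheck="swap", warn=60, crit=80)
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv)` directly, without a shell, avoids any expansion of `*` by the shell.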
Now that you know how to use this plugin from the command line, here's a short description of how to integrate it into Icinga/Nagios.
For details see Icinga and Nagios documentation available on:
- http://docs.icinga.org/latest/en/hostchecks.html
- http://docs.icinga.org/latest/en/servicechecks.html
First of all, you have to define the check_rhev3 command:
define command{
    command_name    check_rhev3
    command_line    $USER1$/check_rhev3 -H $_HOSTRHEVM$ -a $ARG1$ $ARG2$
}
In this example we use a custom object variable (_rhevm, defined on each host below) to pass the address of the RHEV-Manager to the command.
To monitor a RHEV Hypervisor you have to create a host:
define host{
    use         linux-server
    host_name   rhevh
    alias       RHEV Hypervisor
    address     192.168.1.2
    _rhevm      192.168.1.1
}
The hypervisor is monitored through a REST-API call to the RHEV-Manager, so your Icinga/Nagios server doesn't need access to the IP of your hypervisor, but it must be able to connect to the RHEV-Manager.
The IP address or hostname of the RHEV-Manager is specified via the custom object variable _rhevm.
Note: You have to set this variable for all hosts monitored with this plugin!
An example for a VM would be:
define host{
    use         linux-server
    host_name   my-vm
    alias       My virtual machine
    address     192.168.1.3
    _rhevm      192.168.1.1
}
Again, we use the custom object variable _rhevm to specify the RHEV-Manager.
After defining hosts, you have to create services and assign these services to your hosts:
define service{
    use                 generic-service
    host_name           rhevh
    service_description RHEV CPU Check
    check_command       check_rhev3!admin@internal:password!-R $HOSTNAME$ -l cpu
}
In this example, a CPU check for RHEV-Hypervisor rhevh is defined.
Note that we use the variable $HOSTNAME$, so host_name must match the name of the host in RHEV.
A memory check for your VM with warning and critical values:
define service{
    use                 generic-service
    host_name           my-vm
    service_description RHEV Memory Check
    check_command       check_rhev3!admin@internal:password!-M $HOSTNAME$ -l memory -w 70 -c 90
}
As with the RHEV Hypervisor, make sure that host_name matches the VM name in RHEV!
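If many services follow the same pattern, the definitions can be generated rather than typed by hand. The sketch below is one possible approach in Python; the helper `render_service` and the list of checks are illustrative, and the rendered fields simply mirror the service definitions shown above.

```python
SERVICE_TEMPLATE = """define service{{
    use                 generic-service
    host_name           {host}
    service_description {description}
    check_command       check_rhev3!{credentials}!{arguments}
}}
"""


def render_service(host, description, credentials, arguments):
    """Render one Icinga/Nagios service definition for check_rhev3."""
    return SERVICE_TEMPLATE.format(
        host=host,
        description=description,
        credentials=credentials,
        arguments=arguments,
    )


# Illustrative checks for the hypervisor "rhevh" from the example above.
checks = [
    ("RHEV CPU Check", "-R $HOSTNAME$ -l cpu"),
    ("RHEV Load Check", "-R $HOSTNAME$ -l load"),
]
config = "".join(
    render_service("rhevh", description, "admin@internal:password", arguments)
    for description, arguments in checks
)
print(config)
```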