Monitoring
NRPE Listener Configuration
EPAS supports remote monitoring by centralized monitoring solutions that implement the Nagios monitoring suite and its protocol. Monitoring is performed through the Nagios Remote Plugin Executor (NRPE). The NRPE listener is enabled and configured on the System → Log Data page, in the Nagios monitoring section. EPAS performs and reports several checks for the EPAS MASTER, WORKER(s) and AGENT(s):
- System status
- Temperature readings
- Fan readings
- Disk space status
- License information

All reported events return one of the status levels OK, WARNING, CRITICAL or UNKNOWN, depending on the result of the operation.

To enable Nagios monitoring, follow these steps:
- Tick the checkbox for enabling the feature.
- Enter the listening port (TCP) on which the NRPE daemon should listen on the EPAS appliance. The default port is 5666.
- Enter a comma-delimited (no whitespace) list of all IP addresses that are allowed to access the NRPE service on the EPAS MASTER.
- Click Update Nagios settings to save the settings and enable the EPAS Nagios listener.
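
The port and address-list rules in the steps above can be checked before the values are entered in the interface. The following is a minimal sketch, not part of EPAS: the function names are hypothetical, and it assumes that only plain IP addresses are accepted (whether EPAS also accepts network ranges or hostnames is not stated here).

```python
import ipaddress


def parse_allowed_hosts(value: str) -> list[str]:
    """Validate a comma-delimited address list that must not contain whitespace."""
    if any(ch.isspace() for ch in value):
        raise ValueError("the address list must not contain whitespace")
    hosts = []
    for item in value.split(","):
        # ip_address() raises ValueError for empty or malformed entries.
        hosts.append(str(ipaddress.ip_address(item)))
    return hosts


def validate_port(port: int) -> int:
    """Check that the value is a valid TCP port number."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    return port


# Example values only; 192.0.2.0/24 is a documentation address range.
print(parse_allowed_hosts("192.0.2.10,192.0.2.11"))
print(validate_port(5666))
```

A list such as `192.0.2.10, 192.0.2.11` (with a space after the comma) is rejected by this sketch, mirroring the "no whitespace" requirement.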

NRPE Commands
After the above configuration is performed, the EPAS MASTER exposes five (5) NRPE commands on the defined port:
- check_systems
- check_disks
- check_fans
- check_temps
- check_license
| Command | Output | Return Values |
|---|---|---|
| check_systems | For each EPAS system: serial number, system type, system status. Note: Agent systems report offline status only when used with the Enforcer functionality. Example: `EPAS SYSTEM CHECK - OK \|\| 4163-8053-5384-65A2=WORKER/ONLINE\| 17AA-E9B8-52FD-B51A=AGENT/ONLINE\|` | CRITICAL: One or more systems are reported offline (shutdown, rebooting or non-responsive). WARNING: System data could not be retrieved due to operational errors. OK: All EPAS systems are online. UNKNOWN: NRPE EPAS functionality encountered an unexpected error, please contact support. |
| check_disks | For each EPAS system and for each hard disk: serial number, disk letter, total space, free space, SMART status. Example: `EPAS DISK CHECK - OK \|\| 6455-0486-47CA-08A2=A/220G/176G/OK\| 6455-0486-47CA-08A2=B/1800G/1600G/OK\| 6455-0486-47CA-08A2=C/1800G/1600G/OK\| 9F8C-9F3F-C870-7678=A/235G/219G/OK\| 9F8C-9F3F-C870-7678=B/1800G/1600G/OK\|` | CRITICAL: One or more disks have reported high failure probability (SMART) or are critically low on disk-space (less than 20GB). WARNING: One or more disks are reporting low disk space (less than 50GB). Alternatively, system data could not be retrieved due to operational errors. OK: All EPAS disks are within operational parameters. UNKNOWN: NRPE EPAS functionality encountered an unexpected error, please contact support. |
| check_license | For the EPAS MASTER, it reports the expiration date of the current license, and the number of days until it expires. Example: `EPAS LICENSE CHECK - OK LICENSE=2019-08-31/136` | CRITICAL: License will expire in less than three days or has already expired. WARNING: License will expire in less than 7 days. OK: License is operational. UNKNOWN: NRPE EPAS functionality encountered an unexpected error, please contact support. |
| check_fans | For each EPAS system and each installed airflow sensor: serial number, sensor number, rotations per minute. Example: `EPAS FANS CHECK - OK \|\| 4163-8053-5384-65A2=1/7560/OK\| 4163-8053-5384-65A2=2/7560/OK\| 4163-8053-5384-65A2=3/7560/OK\| 4163-8053-5384-65A2=4/7560/OK\| 4163-8053-5384-65A2=5/7560/OK\| 4163-8053-5384-65A2=6/7560/OK\| 4163-8053-5384-65A2=7/7560/OK\| 4163-8053-5384-65A2=8/7560/OK\| 17AA-E9B8-52FD-B51A=1/5550/OK\| 17AA-E9B8-52FD-B51A=2/5475/OK\|` | CRITICAL: One or more sensors is reporting no fan rotations. WARNING: One or more sensors reports low sensor readings, inadequate for efficient airflow; one or more sensors is reporting errors. Alternatively, system data could not be retrieved due to operational errors. OK: All EPAS airflow sensors are within operational parameters. UNKNOWN: NRPE EPAS functionality encountered an unexpected error, please contact support. |
| check_temps | For each EPAS system and each installed temperature sensor: serial number, sensor number, sensor type (CPU, GPU, SYS, RAM), temperature (Celsius). Example: `EPAS TEMP CHECK - OK \|\| 4163-8053-5384-65A2=1/CPU/22/\| 4163-8053-5384-65A2=2/CPU/20/\| 4163-8053-5384-65A2=3/CPU/20/\| 4163-8053-5384-65A2=4/CPU/23/\| 4163-8053-5384-65A2=5/CPU/25/\| 4163-8053-5384-65A2=6/CPU/28/\| 4163-8053-5384-65A2=7/CPU/25/\| 4163-8053-5384-65A2=8/CPU/28/\| 4163-8053-5384-65A2=9/GPU/39/\| 4163-8053-5384-65A2=10/GPU/35/\| 4163-8053-5384-65A2=11/RAM/26/\| 4163-8053-5384-65A2=12/RAM/26/\| 4163-8053-5384-65A2=13/RAM/30/\| 4163-8053-5384-65A2=14/RAM/31/\| 4163-8053-5384-65A2=15/SYS/23/\|` | CRITICAL: One or more sensors is reporting critical temperatures (>95°C); EPAS audit operations will be prevented from starting. WARNING: One or more sensors is reporting high temperatures (>85°C) - if no audit operations are running, check system airflow. Alternatively, system data could not be retrieved due to operational errors. OK: All EPAS temperature sensors are within operational parameters. UNKNOWN: NRPE EPAS functionality encountered an unexpected error, please contact support. |

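
The numeric thresholds in the Return Values column can be read as simple comparisons. The sketch below only restates those documented limits; it is not the EPAS implementation, the function names are hypothetical, and it leaves out the non-numeric conditions (SMART failure predictions, sensor errors, data retrieval errors).

```python
def classify_temperature(celsius: float) -> str:
    """Temperature limits from the check_temps row: >95 critical, >85 warning."""
    if celsius > 95:
        return "CRITICAL"
    if celsius > 85:
        return "WARNING"
    return "OK"


def classify_free_space(free_gb: float) -> str:
    """Free-space limits from the check_disks row: <20GB critical, <50GB warning."""
    if free_gb < 20:
        return "CRITICAL"
    if free_gb < 50:
        return "WARNING"
    return "OK"


def classify_license(days_left: int) -> str:
    """License limits from the check_license row: <3 days critical, <7 days warning."""
    if days_left < 3:  # also covers already-expired licenses (negative values)
        return "CRITICAL"
    if days_left < 7:
        return "WARNING"
    return "OK"


print(classify_temperature(39), classify_free_space(176), classify_license(136))
```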
All NRPE messages contain additional information about the error/warning/condition encountered, if different from the normal state (OK). If the message size exceeds 1024 characters, the NRPE plugin will recommend checking the current state of the system(s) using the EPAS management interface page (System → EPAS Systems).
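
The example outputs in the table above share a common shape: a header naming the check and its status, followed by `key=value` entries whose values are slash-separated fields. A minimal parsing sketch under that assumption follows. The format is inferred from the examples only, so messages carrying additional error text, or truncated at the 1024-character limit, may not parse cleanly. `parse_epas_output` and `CheckResult` are hypothetical names.

```python
import re
from typing import NamedTuple

# Header such as "EPAS DISK CHECK - OK"; inferred from the documented examples.
HEADER = re.compile(r"^EPAS (\w+) CHECK - (OK|WARNING|CRITICAL|UNKNOWN)")
# Entries such as "6455-0486-47CA-08A2=A/220G/176G/OK" or "LICENSE=2019-08-31/136".
ENTRY = re.compile(r"([\w-]+)=([^|\s]+)")


class CheckResult(NamedTuple):
    check: str
    status: str
    entries: list[tuple[str, list[str]]]


def parse_epas_output(line: str) -> CheckResult:
    """Split one NRPE output line into check name, status and per-key fields."""
    header = HEADER.match(line)
    if header is None:
        raise ValueError("unrecognized EPAS NRPE output")
    rest = line[header.end():]
    # A trailing "/" in a value (as in the check_temps example) yields an empty last field.
    entries = [(key, value.split("/")) for key, value in ENTRY.findall(rest)]
    return CheckResult(header.group(1), header.group(2), entries)


sample = ("EPAS SYSTEM CHECK - OK || 4163-8053-5384-65A2=WORKER/ONLINE| "
          "17AA-E9B8-52FD-B51A=AGENT/ONLINE|")
result = parse_epas_output(sample)
print(result.check, result.status)
for serial, fields in result.entries:
    print(serial, fields)
```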
Server Configuration & Requirements
Before configuring the Nagios server that collects the monitoring data from the EPAS MASTER, make sure the nagios-plugins package is installed and/or the check_nrpe binary is present in the Nagios plugin folder(s).
This guide assumes that a host corresponding to the EPAS MASTER appliance has already been defined in the default hosts configuration file (hosts.cfg). The host configuration should contain the IP address, hostname, check interval, check period and notification statuses, as well as the default availability check command (e.g. ping).
In the commands configuration file (commands.cfg), configure a Nagios command using the following data, replacing <path_to_check_nrpe_binary> with the full, absolute path to the check_nrpe binary:
define command {
    command_name check_nrpe
    command_line <path_to_check_nrpe_binary> -H $HOSTADDRESS$ -c $ARG1$
}
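
Before wiring the command into Nagios, it can help to run check_nrpe by hand from the Nagios server. The sketch below wraps that call in Python. The function names and the binary path in the usage comment are assumptions. `-H`, `-p` and `-c` are standard check_nrpe options, and `-p` is passed explicitly because the command definition above omits it and therefore relies on the default port 5666. The exit-code mapping follows the usual Nagios plugin convention; that EPAS checks follow it is assumed, not stated in this guide.

```python
import subprocess

# Conventional Nagios plugin exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_check_nrpe_argv(binary: str, host: str, command: str, port: int = 5666) -> list[str]:
    """Build the argument list for one check_nrpe invocation."""
    return [binary, "-H", host, "-p", str(port), "-c", command]


def run_check(binary: str, host: str, command: str, port: int = 5666,
              timeout: float = 30.0) -> tuple[str, str]:
    """Run check_nrpe and return (state name, first-hand plugin output)."""
    completed = subprocess.run(
        build_check_nrpe_argv(binary, host, command, port),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    state = NAGIOS_STATES.get(completed.returncode, "UNKNOWN")
    return state, completed.stdout.strip()


# Usage (path and address are placeholders for your environment):
# state, output = run_check("/usr/lib/nagios/plugins/check_nrpe", "192.0.2.20", "check_systems")
print(build_check_nrpe_argv("check_nrpe", "192.0.2.20", "check_systems"))
```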
The commands can be executed using the following service template, replacing <check_command> with one of the commands (check_systems, check_disks, check_fans, check_temps, check_license) and <epas_host> with the host_name value of the EPAS MASTER appliance, as defined in the hosts.cfg configuration file:
define service {
    use generic-service
    host_name <epas_host>
    service_description EPAS <check_command>
    check_command check_nrpe!<check_command>
}
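
Since the template is repeated once per check, the service definitions can be generated rather than typed by hand. The following sketch renders one block for each of the five commands listed under NRPE Commands. The host name `epas-master` is a placeholder, and `render_services` is a hypothetical helper, not part of Nagios or EPAS.

```python
EPAS_COMMANDS = [
    "check_systems",
    "check_disks",
    "check_fans",
    "check_temps",
    "check_license",
]

# Doubled braces are literal braces in str.format().
SERVICE_TEMPLATE = """define service {{
    use                 generic-service
    host_name           {host}
    service_description EPAS {command}
    check_command       check_nrpe!{command}
}}
"""


def render_services(host: str) -> str:
    """Render one Nagios service definition per EPAS NRPE command."""
    return "\n".join(
        SERVICE_TEMPLATE.format(host=host, command=command)
        for command in EPAS_COMMANDS
    )


# "epas-master" stands in for the host_name value from hosts.cfg.
print(render_services("epas-master"))
```

The printed text can be reviewed and then pasted into the services configuration file.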
After these settings have been added to the services.cfg configuration file, restart the Nagios service to load the changes. Because Nagios configuration is flexible, adapt the above recommendations to fit your Nagios deployment. Icinga and other Nagios-based monitoring tools add NRPE commands in a similar way, as documented by the respective solution provider.