Moved Healthcheck to its own directory

2022-04-06 19:26:56 +02:00
parent f0b83c8687
commit 73cf1e3984
6 changed files with 66 additions and 55 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1 +1 @@
-healthcheck.cfg
+*.cfg
--- a/README.md
+++ b/README.md
@@ -1,67 +1,17 @@
 # Selfhost utilities
 A collection of utilities for self hosters.
+Every utility lives in its own folder with its relevant configuration and is completely independent of the others, so you can install only the ones you need.

 ## HEALTHCHECK
 A simple server health check.
-Sends an email and/or executes a command in case of alarm.
+Sends an email and/or executes a command in case of alarm (high temperature, failed RAID disk, etc.).
 As an example, the command may be a ntfy call to obtain a notification on a mobile phone or desktop computer.
 Meant to be run with a cron (see healthcheck.cron.example).
 Tested on Debian 11, but should run on almost any standard linux box.

 ![Email](images/healthcheck_email_notification.png)      ![Ntfy](images/healthcheck_ntfy_notification.png)

-### Alarms
-Provided ready-to-use alarms in config file:
- cpu load
- disk space
- raid status
- battery level / charger status (for laptops used as servers, apparently common among the self hosters)
- memory status
-
-Alarms that need basic configuration to work on your system:
- cpu temperature (needs to be adapted as every system has a different name for the sensor)
- fan speed (needs to be adapted as every system has a different name for the sensor)
-
-... or you can write your own custom alarm!
-
-### How does it work
-The config file contains a list of checks. The most common checks are provided in the config file, but it is possible to configure custom checks, if needed.
-Every check definition has:
- DISABLED: boolean, wether to run the check
- ALARM_VALUE_MORE_THAN: float, the alarm is issued if detected value exceeds the configured one
- ALARM_VALUE_LESS_THAN: float, the alarm is issued if detected value is less than the configured one
- ALARM_VALUE_EQUAL: float, the alarm is issued if detected value is equal to the configured one (the values are always compared as floats)
- ALARM_VALUE_NOT_EQUAL: float, the alarm is issued if detected value is not equal to the configured one (the values are always compared as floats)
- ALARM_STRING_EQUAL: string, the alarm is issued if detected value is equal to the configured one (the values are always compared as strings)
- ALARM_STRING_NOT_EQUAL: string, the alarm is issued if detected value is not equal to the configured one (the values are always compared as strings)
- COMMAND: the command to run to obtain the value
- REGEXP: a regular expression that will be executed on the command output and returns a single group that will be compared with ALARM_*. If omitted, the complete command output will be used for comparation.
-
-### Installation
-Copy the script and the config file into the system to check:
-```
-cp healthcheck.py /usr/local/bin/healthcheck.py
-cp healthcheck.cfg.example /usr/local/etc/healthcheck.cfg
-```
-Edit `/usr/local/etc/healthcheck.cfg` enabling the checks you need and configuring email settings.
-Run `/usr/local/bin/healthcheck.py /usr/local/etc/healthcheck.cfg` to check it is working. If needed, change the config to make a check fail and see if the notification mail is delivered. If you need to do some testing without spamming emails, run with the parameter `--dry-run`.
-Now copy the cron file:
-```
-cp healthcheck.cron.example /etc/cron.d/healthcheck
-```
-For increased safety, edit the cron file placing your email address in MAILTO var to be notified in case of healthcheck.py catastrophic failure.
-
-Setup is now complete: the cron runs the script every minute and you will receive emails in case of failed checks.
-
-### Useful notes
-#### Note on system load averages**:
-As stated in the `uptime` command manual:
-> System load averages is the average number of processes that are either in a runnable or uninterruptable state.  A process in a runnable state is either using the CPU  or  waiting  to  use the CPU.  A process in uninterruptable state is waiting for some I/O access, eg waiting for disk.  The averages are taken over the three time intervals.  Load averages are not normalized for the number of CPUs in a system, so a load average of 1 means a single CPU system  is  loaded  all  the  time while on a 4 CPU system it means it was idle 75% of the time.
-
-#### Note on temperature and fan speed checks:
-The check to run needs lm-sensors to be installed and configured. Check your distribution install guide.
-The sensors have different name in every system, so you WILL need to adapt the configuration.
-Some systems have a single temperature sensors for the whole CPU, while some other has a sensor for every core. In this last case, you may want to copy the `[cpu_temperature]` config in N different configs like `[cpu_temperature_0]`, one for every core, and change the REGEX to match `Core 0`, `Core 1` and so on...
+Please see the [healthcheck documentation](healthcheck/README.md).

 # License
 This whole repository is released under GNU General Public License version 3: see http://www.gnu.org/licenses/
--- a/healthcheck/README.md
+++ b/healthcheck/README.md
@@ -0,0 +1,61 @@
+# HEALTHCHECK
+A simple server health check.
+Sends an email and/or executes a command in case of alarm.
+As an example, the command may be a ntfy call to obtain a notification on a mobile phone or desktop computer.
+Meant to be run with a cron (see healthcheck.cron.example).
+Tested on Debian 11, but should run on almost any standard linux box.
+
+![Email](../images/healthcheck_email_notification.png)      ![Ntfy](../images/healthcheck_ntfy_notification.png)
+
+## Alarms
+Ready-to-use alarms provided in the config file:
+- cpu load
+- disk space
+- raid status
+- battery level / charger status (for laptops used as servers, apparently common among self hosters)
+- memory status
+
+Alarms that need basic configuration to work on your system:
+- cpu temperature (needs to be adapted as every system has a different name for the sensor)
+- fan speed (needs to be adapted as every system has a different name for the sensor)
+
+... or you can write your own custom alarm!
+
+## How it works
+The config file contains a list of checks. The most common checks are provided in the config file, but it is possible to configure custom checks, if needed.
+Every check definition has:
+- DISABLED: boolean, whether to skip the check
+- ALARM_VALUE_MORE_THAN: float, the alarm is issued if the detected value exceeds the configured one
+- ALARM_VALUE_LESS_THAN: float, the alarm is issued if the detected value is less than the configured one
+- ALARM_VALUE_EQUAL: float, the alarm is issued if the detected value is equal to the configured one (the values are always compared as floats)
+- ALARM_VALUE_NOT_EQUAL: float, the alarm is issued if the detected value is not equal to the configured one (the values are always compared as floats)
+- ALARM_STRING_EQUAL: string, the alarm is issued if the detected value is equal to the configured one (the values are always compared as strings)
+- ALARM_STRING_NOT_EQUAL: string, the alarm is issued if the detected value is not equal to the configured one (the values are always compared as strings)
+- COMMAND: the command to run to obtain the value
+- REGEXP: a regular expression that is applied to the command output and must capture a single group, which is then compared with ALARM_*. If omitted, the complete command output is used for the comparison.
+
+## Installation
+Copy the script and the config file into the system to check:
+```
+cp healthcheck.py /usr/local/bin/healthcheck.py
+cp healthcheck.cfg.example /usr/local/etc/healthcheck.cfg
+```
+Edit `/usr/local/etc/healthcheck.cfg` enabling the checks you need and configuring email settings.
+Run `/usr/local/bin/healthcheck.py /usr/local/etc/healthcheck.cfg` to check it is working. If needed, change the config to make a check fail and see if the notification mail is delivered. If you need to do some testing without spamming emails, run with the parameter `--dry-run`.
+Now copy the cron file:
+```
+cp healthcheck.cron.example /etc/cron.d/healthcheck
+```
+For increased safety, edit the cron file and set the MAILTO variable to your email address, so that you are notified if healthcheck.py itself fails catastrophically.
+
+Setup is now complete: the cron runs the script every minute and you will receive emails in case of failed checks.
+
+## Useful notes
+### Note on system load averages:
+As stated in the `uptime` command manual:
+> System load averages is the average number of processes that are either in a runnable or uninterruptable state.  A process in a runnable state is either using the CPU  or  waiting  to  use the CPU.  A process in uninterruptable state is waiting for some I/O access, eg waiting for disk.  The averages are taken over the three time intervals.  Load averages are not normalized for the number of CPUs in a system, so a load average of 1 means a single CPU system  is  loaded  all  the  time while on a 4 CPU system it means it was idle 75% of the time.
+
+### Note on temperature and fan speed checks:
+These checks need lm-sensors to be installed and configured. Check your distribution's install guide.
+Sensors are named differently on every system, so you WILL need to adapt the configuration.
+Some systems have a single temperature sensor for the whole CPU, while others have a sensor for every core. In the latter case, you may want to copy the `[cpu_temperature]` config into N different configs like `[cpu_temperature_0]`, one for every core, and change the REGEXP to match `Core 0`, `Core 1` and so on.
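
The check fields described in the healthcheck README above (COMMAND, REGEXP, ALARM_*, DISABLED) can be illustrated with a minimal Python sketch. This is not the code in healthcheck.py: the function names `detect_value`, `alarm_reason` and `run_check`, the dict-based check definition and the sample `echo` command are all assumptions made for illustration.

```python
import re
import subprocess


def detect_value(command, regexp=None):
    """Run `command` in a shell and return the detected value as a string.

    Hypothetical helper, not part of healthcheck.py.
    """
    output = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    ).stdout.strip()
    if regexp is None:
        # No REGEXP: the complete command output is the value.
        return output
    match = re.search(regexp, output, re.MULTILINE)
    if match is None:
        raise ValueError("REGEXP did not match the command output")
    # The single captured group is the value compared with ALARM_*.
    return match.group(1)


def alarm_reason(check, value):
    """Return the name of the first ALARM_* key that fires, or None."""
    # String alarms compare the raw strings.
    if "ALARM_STRING_EQUAL" in check and value == check["ALARM_STRING_EQUAL"]:
        return "ALARM_STRING_EQUAL"
    if "ALARM_STRING_NOT_EQUAL" in check and value != check["ALARM_STRING_NOT_EQUAL"]:
        return "ALARM_STRING_NOT_EQUAL"
    # Value alarms compare both sides as floats.
    numeric_tests = {
        "ALARM_VALUE_MORE_THAN": lambda v, limit: v > limit,
        "ALARM_VALUE_LESS_THAN": lambda v, limit: v < limit,
        "ALARM_VALUE_EQUAL": lambda v, limit: v == limit,
        "ALARM_VALUE_NOT_EQUAL": lambda v, limit: v != limit,
    }
    for key, fires in numeric_tests.items():
        if key in check and fires(float(value), float(check[key])):
            return key
    return None


def run_check(check):
    """Skip disabled checks, otherwise detect the value and test the alarms."""
    if str(check.get("DISABLED", "False")).lower() in ("true", "1", "yes", "on"):
        return None
    value = detect_value(check["COMMAND"], check.get("REGEXP"))
    return alarm_reason(check, value)


if __name__ == "__main__":
    # Assumed example check: a fake load average produced with echo.
    sample = {
        "COMMAND": "echo 'load average: 3.50'",
        "REGEXP": r"load average: ([\d.]+)",
        "ALARM_VALUE_MORE_THAN": "2.0",
    }
    print(run_check(sample))
```

In a real setup the check definitions would come from the config file rather than a hand-written dict, and a fired alarm would trigger the email or notification command instead of a `print`.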
--- a/healthcheck/healthcheck.cfg.example
+++ b/healthcheck/healthcheck.cfg.example
--- a/healthcheck/healthcheck.cron.example
+++ b/healthcheck/healthcheck.cron.example
--- a/healthcheck/healthcheck.py
+++ b/healthcheck/healthcheck.py
@@ -76,7 +76,7 @@ class Main:
 		self.hostname = os.uname()[1]

 	def run(self, dryRun):
-		''' Runs the healtg checks '''
+		''' Runs the health checks '''

 		for section in self.config:
 			if section == 'DEFAULT':