Implemented alarm command
This commit is contained in:
parent
8c786d6190
commit
c0b115559d
22
README.md
22
README.md
@ -3,20 +3,27 @@ A collection of utilities for self hosters.
|
|||||||
|
|
||||||
## HEALTHCHECK
|
## HEALTHCHECK
|
||||||
A simple server health check.
|
A simple server health check.
|
||||||
Sends an email in case of alarm.
|
Sends an email and/or executes a command in case of alarm.
|
||||||
Meant to be run with a cron (see healthcheck.cron.example)
|
As an example, the command may be a ntfy call to obtain a notification on a mobile phone or desktop computer.
|
||||||
Tested on Debian 11, but should run on almost any standard linux box
|
Meant to be run with a cron (see healthcheck.cron.example).
|
||||||
|
Tested on Debian 11, but should run on almost any standard linux box.
|
||||||
|
|
||||||
|
![Email](images/healthcheck_email_notification.png) ![Ntfy](images/healthcheck_ntfy_notification.png)
|
||||||
|
|
||||||
### Alarms
|
### Alarms
|
||||||
Provided ready-to-use alarms in config file:
|
Provided ready-to-use alarms in config file:
|
||||||
- system load
|
- cpu load
|
||||||
- disk space
|
- disk space
|
||||||
- raid status
|
- raid status
|
||||||
- battery level / charger status (for laptops used as servers, apparently common among the self hosters)
|
- battery level / charger status (for laptops used as servers, apparently common among the self hosters)
|
||||||
- memory status
|
- memory status
|
||||||
|
|
||||||
|
Alarms that need basic configuration to work on your system:
|
||||||
- cpu temperature (needs to be adapted as every system has a different name for the sensor)
|
- cpu temperature (needs to be adapted as every system has a different name for the sensor)
|
||||||
- fan speed (needs to be adapted as every system has a different name for the sensor)
|
- fan speed (needs to be adapted as every system has a different name for the sensor)
|
||||||
|
|
||||||
|
... or you can write your own custom alarm!
|
||||||
|
|
||||||
### How does it work
|
### How does it work
|
||||||
The config file contains a list of checks. The most common checks are provided in the config file, but it is possible to configure custom checks, if needed.
|
The config file contains a list of checks. The most common checks are provided in the config file, but it is possible to configure custom checks, if needed.
|
||||||
Every check definition has:
|
Every check definition has:
|
||||||
@ -37,7 +44,7 @@ cp healthcheck.py /usr/local/bin/healthcheck.py
|
|||||||
cp healthcheck.cfg.example /usr/local/etc/healthcheck.cfg
|
cp healthcheck.cfg.example /usr/local/etc/healthcheck.cfg
|
||||||
```
|
```
|
||||||
Edit `/usr/local/etc/healthcheck.cfg` enabling the checks you need and configuring email settings.
|
Edit `/usr/local/etc/healthcheck.cfg` enabling the checks you need and configuring email settings.
|
||||||
Run `/usr/local/bin/healthcheck.py /usr/local/etc/healthcheck.cfg` to check it is working. If needed, change the config to make a check fail and see if the notification mail is delivered.
|
Run `/usr/local/bin/healthcheck.py /usr/local/etc/healthcheck.cfg` to check it is working. If needed, change the config to make a check fail and see if the notification mail is delivered. If you need to do some testing without spamming emails, run with the parameter `--dry-run`.
|
||||||
Now copy the cron file:
|
Now copy the cron file:
|
||||||
```
|
```
|
||||||
cp healthcheck.cron.example /etc/cron.d/healthcheck
|
cp healthcheck.cron.example /etc/cron.d/healthcheck
|
||||||
@ -54,4 +61,7 @@ As stated in the `uptime` command manual:
|
|||||||
#### Note on temperature and fan speed checks:
|
#### Note on temperature and fan speed checks:
|
||||||
The check to run needs lm-sensors to be installed and configured. Check your distribution install guide.
|
The check to run needs lm-sensors to be installed and configured. Check your distribution install guide.
|
||||||
The sensors have different name in every system, so you WILL need to adapt the configuration.
|
The sensors have different name in every system, so you WILL need to adapt the configuration.
|
||||||
|
Some systems have a single temperature sensors for the whole CPU, while some other has a sensor for every core. In this last case, you may want to copy the `[cpu_temperature]` config in N different configs like `[cpu_temperature_0]`, one for every core, and change the REGEX to match `Core 0`, `Core 1` and so on...
|
||||||
|
|
||||||
|
# License
|
||||||
|
This whole repository is released under GNU General Public License version 3: see http://www.gnu.org/licenses/
|
||||||
|
@ -1,12 +1,13 @@
|
|||||||
[DEFAULT]
|
[DEFAULT]
|
||||||
|
|
||||||
#### EMAIL NOTIFICATIONS
|
#### EMAIL NOTIFICATIONS ####
|
||||||
|
|
||||||
# Notify this email address(es) in case of alarm, multiple addresses separated by commas
|
# Notify this email address(es) in case of alarm, multiple addresses separated by commas
|
||||||
|
# Comment this if you don't want email to be sent (maybe because using ALARM_COMMAND below)
|
||||||
MAILTO=root@localhost, user@localhost
|
MAILTO=root@localhost, user@localhost
|
||||||
|
|
||||||
# Sender address
|
# Sender address
|
||||||
MAILFROM=root@localhost
|
#MAILFROM=root@localhost
|
||||||
|
|
||||||
# Use a remote SMTP host (enable by removing comment)
|
# Use a remote SMTP host (enable by removing comment)
|
||||||
#SMTPHOST=my.smtp.host:465
|
#SMTPHOST=my.smtp.host:465
|
||||||
@ -19,6 +20,24 @@ MAILFROM=root@localhost
|
|||||||
#SMTPSSL=True
|
#SMTPSSL=True
|
||||||
|
|
||||||
|
|
||||||
|
#### RUN COMMAND IN CASE OF ALARM ####
|
||||||
|
# You can run a command or script when an alert is issued.
|
||||||
|
#
|
||||||
|
# In this example, `curl` is used to send a POST request to Ntfy (https://ntfy.sh/), a service
|
||||||
|
# that delivers push notifications to smartphones and desktop computers.
|
||||||
|
# If you want to use ntfy, just change the topic name with something unique (see documentation
|
||||||
|
# at https://ntfy.sh/docs/ ), uncomment the ALARM_COMMAND entry and you are ready to go.
|
||||||
|
# If you generate a lot of traffic, please consider hosting your own ntfy server.
|
||||||
|
#
|
||||||
|
# Otherwise, you can replace the curl command with anything you want, you can use the following
|
||||||
|
# placeholders to pass your command/script the details about the event:
|
||||||
|
# %%CHECKNAME%% The name of the check (the one between square brackets in this config)
|
||||||
|
# %%HOSTNAME%% The host name
|
||||||
|
# %%DATETIME%% The date and time of the event, in human readable format
|
||||||
|
# %%ERROR%% An human readable error description (the same used in the mail alert)
|
||||||
|
|
||||||
|
#ALARM_COMMAND=curl -H "%%CHECKNAME%% alarm on %%HOSTNAME%%" -d "%%ERROR%% on %%DATETIME%%" ntfy.sh/my-unique-topic-name
|
||||||
|
|
||||||
|
|
||||||
#### HEALTH CHECKS ####
|
#### HEALTH CHECKS ####
|
||||||
# Every health check is based on a command being executed, its result being parsed with a regexp
|
# Every health check is based on a command being executed, its result being parsed with a regexp
|
||||||
@ -91,7 +110,7 @@ ALARM_VALUE_LESS_THAN=90
|
|||||||
DISABLED=True
|
DISABLED=True
|
||||||
COMMAND=acpi -a
|
COMMAND=acpi -a
|
||||||
REGEXP=Adapter \d: (.+)
|
REGEXP=Adapter \d: (.+)
|
||||||
ALARM_STRING_NOT_EQUAL=on-line
|
ALARM_STRING_EQUAL=off-line
|
||||||
|
|
||||||
[free_ram]
|
[free_ram]
|
||||||
# Free ram in %
|
# Free ram in %
|
||||||
|
@ -73,6 +73,7 @@ class Main:
|
|||||||
|
|
||||||
self.config = configparser.ConfigParser(interpolation=None) # Disable interpolation because contains regexp
|
self.config = configparser.ConfigParser(interpolation=None) # Disable interpolation because contains regexp
|
||||||
self.config.read(configPath)
|
self.config.read(configPath)
|
||||||
|
self.hostname = os.uname()[1]
|
||||||
|
|
||||||
def run(self, dryRun):
|
def run(self, dryRun):
|
||||||
''' Runs the healtg checks '''
|
''' Runs the healtg checks '''
|
||||||
@ -93,7 +94,10 @@ class Main:
|
|||||||
# Alarm!
|
# Alarm!
|
||||||
logging.warning('Alarm for {}: {}!'.format(section, error))
|
logging.warning('Alarm for {}: {}!'.format(section, error))
|
||||||
if not dryRun:
|
if not dryRun:
|
||||||
self.sendMail(s, error)
|
if s.mailto:
|
||||||
|
self.sendMail(s, error)
|
||||||
|
if s.alarmCommand:
|
||||||
|
self.executeAlarmCommand(s, error)
|
||||||
|
|
||||||
# Calls the provided command, checks the value parsing it with the provided regexp
|
# Calls the provided command, checks the value parsing it with the provided regexp
|
||||||
# and returns an error string, or null if the value is within its limits
|
# and returns an error string, or null if the value is within its limits
|
||||||
@ -138,18 +142,16 @@ class Main:
|
|||||||
return 'value is {}, but should not exceed {}'.format(locale.atof(detectedValue), config.alarm_value_more_than)
|
return 'value is {}, but should not exceed {}'.format(locale.atof(detectedValue), config.alarm_value_more_than)
|
||||||
if config.alarm_value_less_than and locale.atof(detectedValue) < float(config.alarm_value_less_than):
|
if config.alarm_value_less_than and locale.atof(detectedValue) < float(config.alarm_value_less_than):
|
||||||
return 'value is {}, but should be greater than {}'.format(locale.atof(detectedValue), config.alarm_value_less_than)
|
return 'value is {}, but should be greater than {}'.format(locale.atof(detectedValue), config.alarm_value_less_than)
|
||||||
|
|
||||||
|
|
||||||
def sendMail(self, s, error):
|
def sendMail(self, s, error):
|
||||||
if s.smtphost:
|
if s.smtphost:
|
||||||
logging.info("Sending detailed logs to %s via %s", s.mailto, s.smtphost)
|
logging.info("Sending alarm email to %s via %s", s.mailto, s.smtphost)
|
||||||
else:
|
else:
|
||||||
logging.info("Sending detailed logs to %s using local smtp", s.mailto)
|
logging.info("Sending alarm email to %s using local smtp", s.mailto)
|
||||||
|
|
||||||
# Create main message
|
# Create main message
|
||||||
hostname = os.uname()[1]
|
|
||||||
msg = MIMEMultipart()
|
msg = MIMEMultipart()
|
||||||
msg['Subject'] = EMAIL_SUBJECT_TPL.format(hostname, s.name)
|
msg['Subject'] = EMAIL_SUBJECT_TPL.format(self.hostname, s.name)
|
||||||
if s.mailfrom:
|
if s.mailfrom:
|
||||||
m_from = s.mailfrom
|
m_from = s.mailfrom
|
||||||
else:
|
else:
|
||||||
@ -161,7 +163,7 @@ class Main:
|
|||||||
# Add base text
|
# Add base text
|
||||||
body = EMAIL_MESSAGE_TPL.format(
|
body = EMAIL_MESSAGE_TPL.format(
|
||||||
s.name,
|
s.name,
|
||||||
hostname,
|
self.hostname,
|
||||||
time.strftime("%a, %d %b %Y %H:%M:%S"),
|
time.strftime("%a, %d %b %Y %H:%M:%S"),
|
||||||
error
|
error
|
||||||
)
|
)
|
||||||
@ -183,6 +185,25 @@ class Main:
|
|||||||
smtp.sendmail(m_from, s.mailto, msg.as_string())
|
smtp.sendmail(m_from, s.mailto, msg.as_string())
|
||||||
smtp.quit()
|
smtp.quit()
|
||||||
|
|
||||||
|
def executeAlarmCommand(self, s, error):
|
||||||
|
cmdToRun = s.alarmCommand
|
||||||
|
cmdToRun = cmdToRun.replace('%%CHECKNAME%%', s.name)
|
||||||
|
cmdToRun = cmdToRun.replace('%%HOSTNAME%%', self.hostname)
|
||||||
|
cmdToRun = cmdToRun.replace('%%DATETIME%%', time.strftime("%a, %d %b %Y %H:%M:%S"))
|
||||||
|
cmdToRun = cmdToRun.replace('%%ERROR%%', error)
|
||||||
|
|
||||||
|
logging.debug("Executing alarm command %s", cmdToRun)
|
||||||
|
|
||||||
|
ret = subprocess.run(cmdToRun, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
|
||||||
|
if ret.stderr:
|
||||||
|
self._log.info('{} subprocess stderr:\n{}', cmdToRun, ret.stderr.decode())
|
||||||
|
if ret.stdout:
|
||||||
|
stdout = ret.stdout.decode()
|
||||||
|
self._log.debug('{} subprocess stdout:\n{}', cmdToRun, stdout)
|
||||||
|
if ret.returncode != 0:
|
||||||
|
self._log.error('subprocess {} exited with error code {}'.format(cmdToRun, ret.returncode))
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
class Settings:
|
class Settings:
|
||||||
''' Represents settings for a check '''
|
''' Represents settings for a check '''
|
||||||
@ -203,9 +224,14 @@ class Settings:
|
|||||||
self.smtpuser = self.getStr(name, 'SMTPUSER', None)
|
self.smtpuser = self.getStr(name, 'SMTPUSER', None)
|
||||||
self.smtppass = self.getStr(name, 'SMTPPASS', None)
|
self.smtppass = self.getStr(name, 'SMTPPASS', None)
|
||||||
self.smtpssl = self.getBoolean(name, 'SMTPSSL', False)
|
self.smtpssl = self.getBoolean(name, 'SMTPSSL', False)
|
||||||
## List of email address to notify about backup status (mandatory)
|
## List of email address to notify in case of alarms (disabled if missing)
|
||||||
mailtoList = config.get(name, 'MAILTO')
|
mailtoList = self.getStr(name, 'MAILTO', None)
|
||||||
self.mailto = [ x.strip() for x in mailtoList.strip().split(self.EMAIL_LIST_SEP) ]
|
if mailtoList:
|
||||||
|
self.mailto = [ x.strip() for x in mailtoList.strip().split(self.EMAIL_LIST_SEP) ]
|
||||||
|
else:
|
||||||
|
self.mailto = None
|
||||||
|
## Command to execute in case of alarms (disabled if missing)
|
||||||
|
self.alarmCommand = self.getStr(name, 'ALARM_COMMAND', None)
|
||||||
## Sender address for the notification email
|
## Sender address for the notification email
|
||||||
self.mailfrom = self.getStr(name, 'MAILFROM', getpass.getuser()+'@'+socket.gethostname())
|
self.mailfrom = self.getStr(name, 'MAILFROM', getpass.getuser()+'@'+socket.gethostname())
|
||||||
## Values to compare
|
## Values to compare
|
||||||
@ -233,6 +259,7 @@ class Settings:
|
|||||||
return defaultValue
|
return defaultValue
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == '__main__':
|
if __name__ == '__main__':
|
||||||
import argparse
|
import argparse
|
||||||
|
|
||||||
|
BIN
images/healthcheck_email_notification.png
Normal file
BIN
images/healthcheck_email_notification.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 57 KiB |
BIN
images/healthcheck_ntfy_notification.png
Normal file
BIN
images/healthcheck_ntfy_notification.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 56 KiB |
Loading…
Reference in New Issue
Block a user