2 Commits
v0.1 ... devel

9 changed files with 281 additions and 55 deletions

2
.gitignore vendored
View File

@ -1 +1 @@
healthcheck.cfg
*.cfg

View File

@ -1,67 +1,17 @@
# Selfhost utilities
A collection of utilities for self hosters.
Every utility is in a folder with its relevant configuration and is completely separated from the other, so you can install only the ones you need.
## HEALTHCHECK
A simple server health check.
Sends an email and/or executes a command in case of alarm.
Sends an email and/or executes a command in case of alarm (high temperature, RAID disk failed etc...).
As an example, the command may be a ntfy call to obtain a notification on a mobile phone or desktop computer.
Meant to be run with a cron (see healthcheck.cron.example).
Tested on Debian 11, but should run on almost any standard linux box.
![Email](images/healthcheck_email_notification.png) ![Ntfy](images/healthcheck_ntfy_notification.png)
### Alarms
Provided ready-to-use alarms in config file:
- cpu load
- disk space
- raid status
- battery level / charger status (for laptops used as servers, apparently common among the self hosters)
- memory status
Alarms that need basic configuration to work on your system:
- cpu temperature (needs to be adapted as every system has a different name for the sensor)
- fan speed (needs to be adapted as every system has a different name for the sensor)
... or you can write your own custom alarm!
### How does it work
The config file contains a list of checks. The most common checks are provided in the config file, but it is possible to configure custom checks, if needed.
Every check definition has:
- DISABLED: boolean, wether to run the check
- ALARM_VALUE_MORE_THAN: float, the alarm is issued if detected value exceeds the configured one
- ALARM_VALUE_LESS_THAN: float, the alarm is issued if detected value is less than the configured one
- ALARM_VALUE_EQUAL: float, the alarm is issued if detected value is equal to the configured one (the values are always compared as floats)
- ALARM_VALUE_NOT_EQUAL: float, the alarm is issued if detected value is not equal to the configured one (the values are always compared as floats)
- ALARM_STRING_EQUAL: string, the alarm is issued if detected value is equal to the configured one (the values are always compared as strings)
- ALARM_STRING_NOT_EQUAL: string, the alarm is issued if detected value is not equal to the configured one (the values are always compared as strings)
- COMMAND: the command to run to obtain the value
- REGEXP: a regular expression that will be executed on the command output and returns a single group that will be compared with ALARM_*. If omitted, the complete command output will be used for comparation.
### Installation
Copy the script and the config file into the system to check:
```
cp healthcheck.py /usr/local/bin/healthcheck.py
cp healthcheck.cfg.example /usr/local/etc/healthcheck.cfg
```
Edit `/usr/local/etc/healthcheck.cfg` enabling the checks you need and configuring email settings.
Run `/usr/local/bin/healthcheck.py /usr/local/etc/healthcheck.cfg` to check it is working. If needed, change the config to make a check fail and see if the notification mail is delivered. If you need to do some testing without spamming emails, run with the parameter `--dry-run`.
Now copy the cron file:
```
cp healthcheck.cron.example /etc/cron.d/healthcheck
```
For increased safety, edit the cron file placing your email address in MAILTO var to be notified in case of healthcheck.py catastrophic failure.
Setup is now complete: the cron runs the script every minute and you will receive emails in case of failed checks.
### Useful notes
#### Note on system load averages**:
As stated in the `uptime` command manual:
> System load averages is the average number of processes that are either in a runnable or uninterruptable state. A process in a runnable state is either using the CPU or waiting to use the CPU. A process in uninterruptable state is waiting for some I/O access, eg waiting for disk. The averages are taken over the three time intervals. Load averages are not normalized for the number of CPUs in a system, so a load average of 1 means a single CPU system is loaded all the time while on a 4 CPU system it means it was idle 75% of the time.
#### Note on temperature and fan speed checks:
The check to run needs lm-sensors to be installed and configured. Check your distribution install guide.
The sensors have different name in every system, so you WILL need to adapt the configuration.
Some systems have a single temperature sensors for the whole CPU, while some other has a sensor for every core. In this last case, you may want to copy the `[cpu_temperature]` config in N different configs like `[cpu_temperature_0]`, one for every core, and change the REGEX to match `Core 0`, `Core 1` and so on...
Please see [healthcheck documentation](healthcheck/README.md)
# License
This whole repository is released under GNU General Public License version 3: see http://www.gnu.org/licenses/

3
dashboard/README.md Normal file
View File

@ -0,0 +1,3 @@
# Dashboard
Allows using a tablet, smartphone, ebook reader or any other low-power internet-connected hardware as system monitor for an host on the same network.

View File

@ -0,0 +1,95 @@
[DEFAULT]
# The webpage will be available at http://this.host.ip.address:PORT
PORT=8080
# The webpage will be updated every REFRESH_SECONDS. Set to 0 to disable autorefresh.
REFRESH_SECONDS=1
#### SENSORS ####
# Every sensor value is obtained on a command being executed, its result being parsed with a regexp
# to extract (as a single group) the numeric or string value, and the value being used to plot the
# graph. This sensor definitions are ready to be used, just enable the ones you need.
# You can add your own declaring another section like this:
#
# [my_sensor]
# DISABLED=False
# COMMAND=/my/custom/binary --with parameters
# REGEXP=my regex to parse (awesome|disappointing) command output
# TYPE=TIMEGRAPH # May also be GRAPH or ERROR
[system_load_1min]
# The system load average in the last minute
DISABLED=True
COMMAND=uptime
REGEXP=.*load average: (\d+[,.]\d+), \d+[,.]\d+, \d+[,.]\d+
TYPE=TIMEGRAPH
[system_load_5min]
# The system load average in the last 5 minutes
DISABLED=True
COMMAND=uptime
REGEXP=.*load average: \d+[,.]\d+, (\d+[,.]\d+), \d+[,.]\d+
TYPE=TIMEGRAPH
[system_load_15min]
# The system load average in the last 15 minutes
DISABLED=True
COMMAND=uptime
REGEXP=.*load average: \d+[,.]\d+, \d+[,.]\d+, (\d+[,.]\d+)
TYPE=TIMEGRAPH
[used_disk_space]
# Used disk space in percent
DISABLED=True
COMMAND=df -h /dev/sda1
REGEXP=(\d{1,3})%
TYPE=GRAPH
[raid_status]
# Raid status
DISABLED=True
COMMAND=cat /proc/mdstat
REGEXP=.*\] \[([U_]+)\]\n
TYPE=ERROR
[laptop_charger_disconnected]
# Laptop charger disconnected
# For laptops used as servers, apparently common among the self hosters. Requires acpi package installed.
DISABLED=True
COMMAND=acpi -a
REGEXP=Adapter \d: (.+)
TYPE=ERROR
[free_ram]
# Free ram in %
# Shows another approach: does all the computation in the command and picks up
# all the output (by not declaring a regexp).
DISABLED=True
COMMAND=free | grep Mem | awk '{print int($4/$2 * 100.0)}'
TYPE=TIMEGRAPH
[available_ram]
# Like Free ram, but shows available instead of free. You may want to use this if you use a memcache.
DISABLED=True
COMMAND=free | grep Mem | awk '{print int($7/$2 * 100.0)}'
TYPE=TIMEGRAPH
[cpu_temperature]
# CPU Temperature alarm: requires lm-sensors installed and configured (check your distribution's guide)
# The regexp must be adapted to your configuration: run `sensors` in the command line
# to find the name of the temperature sensor in your system. In this case is `Core 0`,
# but may be called Tdie or a lot of different names, there is no standard.
DISABLED=True
COMMAND=sensors
REGEXP=Core 0: +\+?(-?\d{1,3}).\d°[CF]
TYPE=TIMEGRAPH
[fan_speed]
# Fan speed alarm: requires lm-sensors installed and configured (check your distribution's guide)
# The regexp must be adapted to your configuration: run `sensors` in the command line
# to find the name of the fan speed sensor in your system.
DISABLED=True
COMMAND=sensors
REGEXP=cpu_fan: +(\d) RPM
TYPE=TIMEGRAPH

117
dashboard/dashboard.py Executable file
View File

@ -0,0 +1,117 @@
#!/usr/bin/env python3
import os
import sys
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
import logging
import traceback
import re
import locale
import subprocess
import configparser
""" @package docstring
Resources dashboard
Starts a webserver on a specific port of the monitored server and serves a simple webpage containing
the monitored sensors graphs.
Installation:
- Copy dashboard.cfg in /usr/local/etc/dashboard.cfg and customize it
- Copy dashboard.py in /usr/local/bin/dashboard.py
Usage:
Start the server:
/usr/local/bin/dashboard.py /usr/local/etc/dashboard.cfg
@author Daniele Verducci <daniele.verducci@ichibi.eu>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
"""
NAME = 'dashboard'
VERSION = '0.1'
DESCRIPTION = 'A simple system resources dashboard'
class WebServer(BaseHTTPRequestHandler):
def __init__(self, configPath):
''' Sets up locale (needed for parsing numbers) '''
# Get system locale from $LANG (i.e. "en_GB.UTF-8")
systemLocale = os.getenv('LANG')
if not systemLocale:
raise ValueError('System environment variabile $LANG is not set!')
locale.setlocale(locale.LC_ALL, systemLocale)
''' Reads the config '''
self._log = logging.getLogger('main')
if not os.path.exists(configPath) or not os.path.isfile(configPath):
raise ValueError('configPath must be a file')
self.config = configparser.ConfigParser(interpolation=None) # Disable interpolation because contains regexp
self.config.read(configPath)
self.hostname = os.uname()[1]
def do_GET(self):
self.readSensors()
self.send_response(200)
self.send_header("Content-type", "text/html")
self.end_headers()
self.wfile.write(bytes("<html><head><title>https://pythonbasics.org</title></head>", "utf-8"))
self.wfile.write(bytes("<p>Request: %s</p>" % self.path, "utf-8"))
self.wfile.write(bytes("<body>", "utf-8"))
self.wfile.write(bytes("<p>This is an example web server.</p>", "utf-8"))
self.wfile.write(bytes("</body></html>", "utf-8"))
def readSensors():
return
if __name__ == '__main__':
import argparse
parser = argparse.ArgumentParser(
prog = NAME + '.py',
description = NAME + ' ' + VERSION + '\n' + DESCRIPTION,
formatter_class = argparse.RawTextHelpFormatter
)
parser.add_argument('configFile', help="configuration file path")
parser.add_argument('-q', '--quiet', action='store_true', help="suppress non-essential output")
args = parser.parse_args()
if args.quiet:
level = logging.WARNING
else:
level = logging.INFO
logging.basicConfig(level=level)
port =
httpd = HTTPServer(('localhost', port), Server)
logging.info('Serving on port {}'.format(port))
try:
httpd.serve_forever()
except KeyboardInterrupt:
pass
except Exception as e:
logging.critical(traceback.format_exc())
print('ERROR: {}'.format(e))
sys.exit(1)
finally:
httpd.server_close()
sys.exit(0)

61
healthcheck/README.md Normal file
View File

@ -0,0 +1,61 @@
# HEALTHCHECK
A simple server health check.
Sends an email and/or executes a command in case of alarm.
As an example, the command may be a ntfy call to obtain a notification on a mobile phone or desktop computer.
Meant to be run with a cron (see healthcheck.cron.example).
Tested on Debian 11, but should run on almost any standard linux box.
![Email](../images/healthcheck_email_notification.png) ![Ntfy](../images/healthcheck_ntfy_notification.png)
## Alarms
Provided ready-to-use alarms in config file:
- cpu load
- disk space
- raid status
- battery level / charger status (for laptops used as servers, apparently common among the self hosters)
- memory status
Alarms that need basic configuration to work on your system:
- cpu temperature (needs to be adapted as every system has a different name for the sensor)
- fan speed (needs to be adapted as every system has a different name for the sensor)
... or you can write your own custom alarm!
## How does it work
The config file contains a list of checks. The most common checks are provided in the config file, but it is possible to configure custom checks, if needed.
Every check definition has:
- DISABLED: boolean, wether to run the check
- ALARM_VALUE_MORE_THAN: float, the alarm is issued if detected value exceeds the configured one
- ALARM_VALUE_LESS_THAN: float, the alarm is issued if detected value is less than the configured one
- ALARM_VALUE_EQUAL: float, the alarm is issued if detected value is equal to the configured one (the values are always compared as floats)
- ALARM_VALUE_NOT_EQUAL: float, the alarm is issued if detected value is not equal to the configured one (the values are always compared as floats)
- ALARM_STRING_EQUAL: string, the alarm is issued if detected value is equal to the configured one (the values are always compared as strings)
- ALARM_STRING_NOT_EQUAL: string, the alarm is issued if detected value is not equal to the configured one (the values are always compared as strings)
- COMMAND: the command to run to obtain the value
- REGEXP: a regular expression that will be executed on the command output and returns a single group that will be compared with ALARM_*. If omitted, the complete command output will be used for comparation.
## Installation
Copy the script and the config file into the system to check:
```
cp healthcheck.py /usr/local/bin/healthcheck.py
cp healthcheck.cfg.example /usr/local/etc/healthcheck.cfg
```
Edit `/usr/local/etc/healthcheck.cfg` enabling the checks you need and configuring email settings.
Run `/usr/local/bin/healthcheck.py /usr/local/etc/healthcheck.cfg` to check it is working. If needed, change the config to make a check fail and see if the notification mail is delivered. If you need to do some testing without spamming emails, run with the parameter `--dry-run`.
Now copy the cron file:
```
cp healthcheck.cron.example /etc/cron.d/healthcheck
```
For increased safety, edit the cron file placing your email address in MAILTO var to be notified in case of healthcheck.py catastrophic failure.
Setup is now complete: the cron runs the script every minute and you will receive emails in case of failed checks.
## Useful notes
### Note on system load averages**:
As stated in the `uptime` command manual:
> System load averages is the average number of processes that are either in a runnable or uninterruptable state. A process in a runnable state is either using the CPU or waiting to use the CPU. A process in uninterruptable state is waiting for some I/O access, eg waiting for disk. The averages are taken over the three time intervals. Load averages are not normalized for the number of CPUs in a system, so a load average of 1 means a single CPU system is loaded all the time while on a 4 CPU system it means it was idle 75% of the time.
### Note on temperature and fan speed checks:
The check to run needs lm-sensors to be installed and configured. Check your distribution install guide.
The sensors have different name in every system, so you WILL need to adapt the configuration.
Some systems have a single temperature sensors for the whole CPU, while some other has a sensor for every core. In this last case, you may want to copy the `[cpu_temperature]` config in N different configs like `[cpu_temperature_0]`, one for every core, and change the REGEX to match `Core 0`, `Core 1` and so on...

View File

@ -76,7 +76,7 @@ class Main:
self.hostname = os.uname()[1]
def run(self, dryRun):
''' Runs the healtg checks '''
''' Runs the health checks '''
for section in self.config:
if section == 'DEFAULT':