μMon - minimal host monitoring

Homepage: https://tomscii.sig7.se/umon
Source: https://github.com/tomszilagyi/umon

μMon (ascii: uMon; pron.: micro-mon) is a minimal host monitoring toolkit based on RRDtool to store, aggregate and graph time-series metrics data. Metrics are collected via SNMP and simple shell scripts.

For a popular introduction to μMon, please read this article, which tells you the story of why it exists, compares it with existing solutions, and exhibits some screenshots. For a more technical overview and specific instructions, read the rest of this document.

μMon is currently usable under Linux (tested on Debian) and OpenBSD, but should be portable to other Unix-like systems with minimal hassle.

μMon is published under a permissive BSD license.

Overview

μMon essentials

As opposed to more accomplished (and much more complex) monitoring stacks, the overarching architectural principle of μMon is simplicity. This results in some noteworthy properties:

  • From μMon's perspective, all hosts are self-contained. Each probe gathers metrics directly observed on the local host. Data collection does not inherently entail network traffic. There is no single point of failure in the monitoring system; as long as a host is up, its own metrics will be collected.
  • There is no central database to run queries against; each probe feeds data into a local round-robin archive (an RRDtool database file). Data collection is scheduled by a trivial crontab entry.
  • As a consequence of self-containment, all hosts run their own instance of the μMon fastCGI handler to serve the web UI. This is a small C++ program that accepts input on its stdin and writes to its stdout and stderr. This program does not contain any networking code. Thus, it must be run via some kind of network wrapper. Both socat and (x)inetd work well; the latter is recommended for actual usage.
  • Because everything is local, and RRDtool does all the data series aggregation, serving graphs and views is fast. It does not matter if you are looking at the last hour or last year of data.
  • The fastCGI handler must be hooked into a webserver, which might run on the same host or a different host. There are obvious implications of these choices, none of which concern μMon itself. See below for detailed setup and configuration guidance.
  • There is no alerting logic, only (scheduled) metrics collection and (on-demand) graph generation. μMon is meant as a lightweight solution for robust systems that need not be closely monitored.

The downside of this very simple architecture is the lack of a "central instance" to upgrade: the μMon deployment will have to be updated on all systems to roll out a change. Also, there is no systemic way to collect and display metrics that span multiple hosts, short of implementing a single probe on one of the hosts to collect and aggregate the data in the desired way.

Probes

μMon currently offers these probes for monitoring various aspects of the system:

Name         Description                                       Source    Parameter
cpu          Per-core CPU usage percentages                    SNMP
df           Filesystem utilisation ("disk free")              df
diskio       Block device I/O ops/sec and transfer rates       SNMP      <device>
doveadm-who  dovecot: Number of connected IMAP users           doveadm
if           Network interface packets/sec and transfer rates  SNMP      <interface>
load         System load averages (1, 5 and 15 mins)           uptime
smtpd        OpenSMTPD statistics                              smtpctl
upsc         UPS: line voltage, battery charge, load %         upsc      <ups-name>
vmstat       Virtual memory subsystem statistics               vmstat

Some probes assume they are run as singletons – e.g., it does not make sense to have more than one instance of the load probe on any system. Other probes must be parameterized for each instance. For example, the if probe needs a separate instance for each interface to be monitored (and the user must decide which interfaces to monitor).

Configuring what to monitor is as simple as listing all probe instances – with any necessary parameters – in the file probes.conf. Here is a simple example:

# This is probes.conf - commented lines like this are discarded.
probe load
probe cpu
probe vmstat
probe df
probe diskio sda
probe if ens3
probe if lo

Note that not all supported probes are present. This example is from a host that does not run any mail software, so the doveadm-who and smtpd probes have been omitted. There is no UPS either, so upsc is missing as well.
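
To make the file format concrete, here is a minimal sketch of how such a file could be read: comments are stripped and each remaining "probe" line yields a probe name and an optional parameter. The helper name list_probes is hypothetical and is not part of μMon; the actual parsing in sample.sh may differ.

```shell
#!/bin/sh
# Sketch: list probe instances from a probes.conf-style file.
# list_probes is a hypothetical helper, not part of uMon.
list_probes() {
    # Drop comments, then keep only "probe <name> [<param>]" lines.
    sed -e 's/#.*$//' "$1" |
        awk '$1 == "probe" { if (NF >= 3) print $2, $3; else print $2 }'
}

# Demonstrate on a temporary copy of a small example config.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# This is probes.conf - commented lines like this are discarded.
probe load
probe diskio sda
probe if ens3
probe if lo
EOF

list_probes "$conf"
rm -f "$conf"
```

Singleton probes show up as a bare name, while parameterized probes such as if appear once per instance.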

Graphs

Each defined probe instance produces and feeds one round-robin archive file. This RRA will contain a set of data series (variables). Some probes have a hardwired set of variables, such as load (moving averages for 1, 5 and 15 minutes) or if (received and transmitted bytes and packets on the given network interface). Other probes determine the set of variables dynamically, for example cpu (depending on the number of cores) or df (depending on the actual file systems).

Graphs are loosely coupled to the probes. They are defined under graphs/ and use all or a subset of the data series maintained by a given probe. The graph is named after the probe, suffixed with a specialization if it only displays part of the captured data. For example, the cpu graph script creates the CPU utilisation graph (there is only one such graph on any system). On the other hand, diskio-ops and diskio-xfer are two graphs based on diskio probe data, available for each monitored block device.

In theory, it would be possible to have graphs that display data from multiple probes, but there is no such graph yet.

Graphs might look at the probe state (environment) to make use of dynamically determined information that the probe has stored there, such as the names of data series (e.g., in case of df, the actual filesystem mount points). They also receive the time span (e.g., to display the last 6 hours) as a command line argument, so they invoke rrdtool graph with the correct parameter.
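
As an illustration of the last point, the sketch below shows how a graph script might turn a time span argument into an rrdtool graph call. It prints the command instead of running it. The data series names (load1, load5, load15), the colours and the db/load.rrd path are assumptions made for this example, not taken from the actual graph scripts.

```shell
#!/bin/sh
# Sketch: build an rrdtool graph command line from a time span argument.
# Data series names and the RRD path are assumptions for illustration.
graph_cmd() {
    span="$1"    # e.g. 6h, 1d, 1y
    rrd="$2"
    # Print the command rather than execute it, so this sketch works
    # without rrdtool installed. The file name "-" would make rrdtool
    # write the image to stdout.
    printf '%s\n' "rrdtool graph - --imgformat PNG --start -$span --end now \
DEF:l1=$rrd:load1:AVERAGE DEF:l5=$rrd:load5:AVERAGE DEF:l15=$rrd:load15:AVERAGE \
LINE1:l1#cc0000:1min LINE1:l5#00aa00:5min LINE1:l15#0000cc:15min"
}

# Default to the last 6 hours when no argument is given.
graph_cmd "${1:-6h}" db/load.rrd
```

Because only the --start value changes with the requested span, the same script can serve every duration the web UI offers.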

Views

The contents of probes.conf inform the view layout as well, by getting filtered through the view scripts under views/. For example, on the main view, each defined probe instance will generate some graph, but (to reduce clutter) not necessarily all the graphs based on the probe's data. More specialized views work the opposite way. For example, the upsc view will generate all the graphs based on data collected by the upsc probe, and no graphs for any of the other probes.

Not all views might make sense for a given probe configuration. If there is no upsc probe instance, the upsc view should not be offered. This is something left to be configured by the user. The file views.conf should contain the names of views, one per line. This would be a good fit for the probe config above:

# This is views.conf - commented lines like this are discarded.
main
io
disk
network
vmstat

Extending μMon

Implementing new probes is fairly straightforward based on existing ones – it is best to clone one that has something in common with the new probe to be added: SNMP vs. command-based query; fixed vs. dynamic set of data series; etc. The scripts reside under probes/.

Probes are permitted to have some persistent state (considered to be the probe's environment), which is customarily saved beside the probe script in an .env file. For example, the CPU usage probe cpu.sh creates cpu.env on its first invocation to store the number of cores so it does not have to be queried on every run. If a probe is changed in an incompatible way, this .env file must be deleted together with the RRA so the probe can start with a clear state.
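
The caching pattern described above can be sketched as follows. The variable name NCORES, the file contents and the use of getconf are assumptions chosen for the example; the real cpu.sh may obtain and store the value differently (for instance via SNMP).

```shell
#!/bin/sh
# Sketch of the .env caching pattern. NCORES, the file layout and the use
# of getconf are assumptions; the real cpu.sh may differ.
load_cores() {
    env_file="$1"
    if [ ! -f "$env_file" ]; then
        # First run: query once and persist the result.
        n=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
        echo "NCORES=$n" > "$env_file"
    fi
    # Every run: read the cached value back from the state file.
    . "$env_file"
}

# Demonstrate with a throwaway state file.
env_file="${TMPDIR:-/tmp}/cpu-sketch.$$.env"
load_cores "$env_file"
echo "cores: $NCORES"
rm -f "$env_file"
```

Deleting the state file forces the query to run again on the next invocation, which is why an incompatible probe change requires removing it.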

Depending on the probe, its data will be displayed on one or more graphs. Creating these graphs is mostly copy-paste based on existing ones, with some trial and error to make the output graphs of rrdtool look good. The new graphs will have to be added to some existing views (chiefly the main view), with possibly a bespoke (thematic) view created to display all of them.

This sounds like more hassle than it actually is, but it is more or less only cloning and light adaptation of some small shell scripts, not a "serious" programming activity. As a plus, the fastCGI handler (and its network wrapper, e.g. xinetd) does not have to be restarted during development. Changes made to any of the shell scripts are immediately effective (on the next sample for probes, and on the next page load for graphs and views).

Installation and getting started

Since μMon is very hands-on and sooner or later you will want to change something in its source scripts, it is probably best to clone it from git onto some dev machine. This can be your personal laptop, a dedicated test (staging) host, etc. On this machine, run μMon directly from the git clone. This allows you to have direct feedback on any changes you make and version-control those changes on a local branch. When you want to deploy your flavour of μMon to "production" (i.e., your other machines), execute make dist in the top-level folder of this checkout. This will create umon.tar.gz that is referenced by the below instructions.

OpenBSD

Initial deployment

Create a dedicated unprivileged user:

# useradd -c "uMon" -d /var/umon -s /sbin/nologin -L daemon _umon
# mkdir /var/umon
# chown _umon:_umon /var/umon

Untar the umon archive to the home directory of this user:

# doas -u _umon /bin/sh -c "cd && tar xzf /path/to/umon.tar.gz"

Install dependencies:

# pkg_add rrdtool socat

Note: socat is only needed for running the fcgi as a standalone service; for production, usage of inetd (part of the base system, hence no need to install it) is recommended.

Copy examples/*.conf to the main μMon directory (one level up). You do not necessarily need to make changes to all of these files, but you are encouraged to look at them. If you decide to make changes (now or later), keeping the upstream versions under examples/ protects your local config from being overwritten when you untar an updated archive on top of your μMon instance.

Setting up metrics collection

Configure and enable snmpd:

# cp /etc/examples/snmpd.conf /etc/
# echo "listen on 127.0.0.1 snmpv2c" >> /etc/snmpd.conf
# echo "read-only community public" >> /etc/snmpd.conf
# rcctl enable snmpd
# rcctl start snmpd

Note: you are free (and encouraged) to use a different community string. Please set SNMP_COMMUNITY in umon.conf accordingly if you do so.

Verify that SNMP queries work; the output should be similar to the following:

$ snmp walk -v 2c -c public localhost ifDescr
ifDescr.1 = STRING: em0
ifDescr.2 = STRING: enc0
ifDescr.3 = STRING: lo0
ifDescr.4 = STRING: pflog0

By default, RRD databases will be created under db/, with consolidation functions to yield appropriate granularity on view durations ranging from the last 1 hour to the last 2 years. You can change these by editing umon.conf, but everything should be fine as is.
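
To show how sampling step, consolidation and retention relate, here is a sketch that derives round-robin archive (RRA) sizes for a 60-second step and prints a matching rrdtool create command. All resolutions, retention spans and the data series name load1 are illustrative assumptions, not the values shipped in umon.conf.

```shell
#!/bin/sh
# Sketch: derive RRA sizes for a 60-second sampling step.
# Resolutions and retention spans are illustrative assumptions.
STEP=60

# rra <steps-per-row> <retention-in-seconds>
# Prints an RRA definition with enough rows to cover the retention span.
rra() {
    rows=$(( $2 / (STEP * $1) ))
    echo "RRA:AVERAGE:0.5:$1:$rows"
}

create_cmd() {
    # One gauge named "load1" (hypothetical) with a heartbeat of two steps.
    echo "rrdtool create $1 --step $STEP DS:load1:GAUGE:$((STEP * 2)):0:U" \
        "$(rra 1 86400)" "$(rra 5 604800)" "$(rra 60 63072000)"
}

# Print the command rather than run it, so rrdtool need not be installed.
# Archives: 1-minute rows for a day, 5-minute rows for a week,
# hourly rows for two years.
create_cmd db/example.rrd
```

Coarser archives keep long time spans cheap to store and to graph, which is why viewing a year of data costs about the same as viewing an hour.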

Configure the probes by editing probes.conf. This should be straightforward, but you might need to check your system to find the appropriate device names.
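
One way to find interface names is to reuse the SNMP walk output from the verification step. The sketch below converts such output into candidate lines for probes.conf; if_probe_lines is a hypothetical helper, and the sample input is the output shown earlier. Quoted names, as printed by some SNMP tools, are handled as well.

```shell
#!/bin/sh
# Sketch: turn interface names reported by SNMP into probes.conf candidates.
# if_probe_lines is a hypothetical helper; it reads walk output on stdin.
if_probe_lines() {
    # Take the text after "STRING: ", strip optional double quotes,
    # and emit one "probe if <name>" line per interface.
    sed -n 's/^.*STRING: *"\{0,1\}\([^"]*\)"\{0,1\} *$/probe if \1/p'
}

if_probe_lines <<'EOF'
ifDescr.1 = STRING: em0
ifDescr.2 = STRING: enc0
ifDescr.3 = STRING: lo0
ifDescr.4 = STRING: pflog0
EOF
```

On a live host the walk command can be piped straight into the helper; remove the lines for interfaces you do not want to monitor before adding the rest to probes.conf.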

Depending on which probes you have enabled, you might need some or all of the below privilege escalations added to /etc/doas.conf:

# Permit _umon to run query commands
permit nopass _umon as root cmd doveadm args who
permit nopass _umon as root cmd smtpctl args show stats

Now run probes/sample.sh manually and observe that it creates RRD databases without errors:

$ probes/sample.sh
Creating directory for RRD files: db
...
Creating db/load.rrd
Creating db/df.rrd
Creating db/cpu.rrd
Creating db/vmstat.rrd
...

Create a crontab entry for running sample.sh once every minute:

$ crontab -l
# min hr  dom mon dow command
*   *   *   *   *   /var/umon/probes/sample.sh 2>&1 >/dev/null
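
If you prefer to script this step, the following sketch adds the job to an existing crontab text without duplicating it on repeated runs. ensure_cron_line is a hypothetical helper that filters stdin to stdout; it does not touch the real crontab by itself.

```shell
#!/bin/sh
# Sketch: add the sampling job to a crontab text without duplicating it.
# ensure_cron_line is a hypothetical helper; it filters stdin to stdout.
CRON_LINE='* * * * * /var/umon/probes/sample.sh 2>&1 >/dev/null'

ensure_cron_line() {
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # Append the job only if no line already mentions sample.sh.
    if ! printf '%s\n' "$existing" | grep -q 'probes/sample\.sh'; then
        printf '%s\n' "$CRON_LINE"
    fi
}

# Dry run on an empty crontab: prints the single job line.
printf '' | ensure_cron_line
```

To apply it for real, run as the _umon user something like: crontab -l 2>/dev/null | ensure_cron_line | crontab -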

Setting up the web view

Configure httpd by adding this to /etc/httpd.conf:

# uMon - rrdtool-based monitoring
server "default" {
    listen on * port 8888
    location "/*" {
        fastcgi socket tcp 127.0.0.1 3333
    }
}

Feel free to change the listen address/port as appropriate, e.g. listen on 127.0.0.1 instead of listen on * if you do not want the μMon web page to be accessible from the network. In that case, you can still access it through an ssh tunnel. Alternatively, set up HTTP basic auth so that only people who know the credentials have access.

Make sure your changes take effect:

# rcctl reload httpd

Compile the fastCGI server:

make -C fcgi

This creates the fcgi/umon_fcgi binary executable from its C++ sources.

If you want to develop the fastCGI server program of μMon, it is convenient to run the server standalone so you can see the stdout in the console. You can do that by executing the wrapper script fcgi/standalone.sh. This requires socat to be installed. Note: to develop or change probes, graphs or views, you do not need to touch the fastCGI server, as it only invokes the corresponding shell scripts and is itself quite generic. Hence, you most probably want to deploy as "normal".

For normal deployment, you will want to set up inetd to invoke the fastCGI server via the fcgi/inetd.sh wrapper. This is convenient for development of graphs and views as well, in the sense that nothing has to be restarted after rebuilding the umon_fcgi executable; however, stderr will go to /dev/null. Use a config similar to this:

# cat /etc/inetd.conf
127.0.0.1:3333  stream  tcp     nowait  _umon     /var/umon/fcgi/inetd.sh

Make sure you enable and start inetd as appropriate:

# rcctl enable inetd
# rcctl start inetd

Configure the views you want to access by editing views.conf.

Navigate your browser to http://your.hostname:8888. The main view should load. (Tip: you can read this document by clicking on the μMon logo in the navbar!)

Updating the deployment

If you have a new, updated source archive umon.tar.gz, you can safely untar it on top of your existing installation:

doas -u _umon /bin/sh -c "cd && tar xzf /path/to/umon.tar.gz && make -C fcgi"

Your actual config files will not be overwritten. If there were changes made to the example config files, you might want to migrate some or all of those to your actual config (*.conf in the main μMon directory).

If there is a change to the RRD format (data series names) produced by a probe, reinitialize it by deleting the corresponding RRD file(s) under db/ and any probe state files probes/*.env produced by the probe (the base name should match the probe).
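
The reinitialization step can be sketched as a small helper that lists the affected files by default and deletes them only when called with -f. The file name patterns (db/<probe>*.rrd and probes/<probe>.env) are assumptions derived from the description above; check what the dry run prints before deleting, since the pattern could also match another probe whose name starts with the same letters.

```shell
#!/bin/sh
# Sketch: list (or, with -f, delete) the files that hold a probe's state.
# File name patterns are assumptions: db/<probe>*.rrd and probes/<probe>.env.
reset_probe() {
    force=0
    if [ "$1" = "-f" ]; then force=1; shift; fi
    root="$1"; probe="$2"
    for f in "$root"/db/"$probe"*.rrd "$root"/probes/"$probe".env; do
        [ -e "$f" ] || continue      # an unmatched glob stays literal; skip it
        if [ "$force" -eq 1 ]; then
            rm -f -- "$f" && echo "removed $f"
        else
            echo "would remove $f"
        fi
    done
}

# Dry run by default; on a real host, run this as the _umon user.
if [ -d /var/umon ]; then
    reset_probe /var/umon cpu
fi
```

The probe recreates both files on its next scheduled run.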

Linux

Initial deployment

Create a dedicated unprivileged user:

# useradd -s /usr/sbin/nologin -r -M -d /var/umon _umon
# mkdir /var/umon
# chown _umon:_umon /var/umon

Untar the umon archive to the home directory of this user:

# sudo -u _umon /bin/sh -c "cd && tar xzf /path/to/umon.tar.gz"

Install dependencies:

# apt-get install rrdtool socat

Note: socat is only needed for running the fcgi as a standalone service; for production, usage of xinetd is recommended.

Copy examples/*.conf to the main μMon directory (one level up). You do not necessarily need to make changes to all of these files, but you are encouraged to look at them. If you decide to make changes (now or later), keeping the upstream versions under examples/ protects your local config from being overwritten when you untar an updated archive on top of your μMon instance.

Setting up metrics collection

Install and configure Net-SNMP:

# apt-get install snmp snmpd snmp-mibs-downloader

Edit /etc/snmp/snmpd.conf and open up access to OIDs by adding a line such as:

view   systemonly  included   .1.3.6

Restart snmpd for the changes to take effect:

# systemctl restart snmpd

Note: you are free (and encouraged) to use a different community string. Please set SNMP_COMMUNITY in umon.conf accordingly if you do so.

Verify that SNMP queries work; the output should be similar to the following:

$ snmpwalk -O n -v 2c -c public localhost .1.3.6.1.2.1.31.1.1.1.1
.1.3.6.1.2.1.31.1.1.1.1.1 = STRING: "lo"
.1.3.6.1.2.1.31.1.1.1.1.2 = STRING: "ens3"

By default, RRD databases will be created under db/, with consolidation functions to yield appropriate granularity on view durations ranging from the last 1 hour to the last 2 years. You can change these by editing umon.conf, but everything should be fine as is.

Configure the probes by editing probes.conf. This should be straightforward, but you might need to check your system to find the appropriate device names.

Depending on which probes you have enabled, you might need to delegate the privilege of running certain commands (as root) to the _umon user. On a stock Linux using sudo, just create /etc/sudoers.d/umon with some or all of the below lines:

# Permit _umon to run query commands
_umon   ALL=(root) NOPASSWD: /usr/bin/doveadm
_umon   ALL=(root) NOPASSWD: /usr/bin/smtpctl

Now run probes/sample.sh manually and observe that it creates RRD databases without errors:

# sudo -u _umon /bin/bash
$ cd
$ probes/sample.sh
Creating directory for RRD files: db
...
Creating db/load.rrd
Creating db/df.rrd
Creating db/cpu.rrd
Creating db/vmstat.rrd
...

Create a crontab entry for running sample.sh once every minute:

$ crontab -l
# min hr  dom mon dow command
*   *   *   *   *   /var/umon/probes/sample.sh 2>&1 >/dev/null

If you are collecting or tailing syslog messages as a matter of course, you might be annoyed by the nonsensical verbosity of cronjob logging, emitting not one but three entries per minute. To stop these useless entries from flooding your logs, create /etc/rsyslog.d/umon_block.conf with the below content:

if $msg contains "pam_unix(cron:session)" or $msg contains "_umon"
then {
    stop
}

Do not forget to restart the syslog daemon for this to take effect.

Setting up the web view

The below configuration example applies to nginx on the local host. If you use a different webserver or want to access the fastCGI socket from a different host, please adapt your config accordingly.

Merge this snippet into your enabled virtual hosts configs:

server {
    fastcgi_param  CONTENT_LENGTH     $content_length;
    fastcgi_param  CONTENT_TYPE       $content_type;
    fastcgi_param  DOCUMENT_URI       $document_uri;
    fastcgi_param  QUERY_STRING       $query_string;
    fastcgi_param  REQUEST_METHOD     $request_method;
    fastcgi_param  REQUEST_URI        $request_uri;

    listen 8888;
    location / {
        fastcgi_pass 127.0.0.1:3333;
    }
}

(If you do not have any webserver installed, just apt-get install nginx and add the above snippet to /etc/nginx/sites-enabled/default.)

Make sure your changes take effect:

systemctl reload nginx

Compile the fastCGI server:

make -C fcgi

Please see above in the equivalent OpenBSD section for a discussion on running the fastCGI server for development purposes.

For normal deployment, you will want to set up xinetd to invoke the fastCGI server via the fcgi/inetd.sh wrapper. This is convenient for development of graphs and views as well, in the sense that nothing has to be restarted after rebuilding the umon_fcgi executable; however, stderr will go to /dev/null.

If you do not yet have it installed, apt-get install xinetd. Then, create a file /etc/xinetd.d/umon_fcgi with the below content:

service umon_fcgi
{
        disable         = no
        type            = UNLISTED
        socket_type     = stream
        protocol        = tcp
        interface       = 127.0.0.1
        port            = 3333
        user            = _umon
        wait            = no
        server          = /var/umon/fcgi/inetd.sh
}

Don't forget to enable/start xinetd as appropriate:

# systemctl enable xinetd
# systemctl start xinetd

Configure the views you want to access by editing views.conf.

Navigate your browser to http://your-hostname:8888. The main view should load. (Tip: you can read this document by clicking on the μMon logo in the navbar!)

Updating the deployment

The same considerations apply as with OpenBSD (see above).

The equivalent update command to use:

sudo -u _umon /bin/sh -c "cd && tar xzf /path/to/umon.tar.gz && make -C fcgi"