Monitoring software sends alerts (typically e-mails) when one or more services
fail. The time taken to fix the problem is a performance indicator for the
system administrator, and it becomes even more critical when the problem has
disrupted deployed, live services. What if the monitoring software itself could
fix problems as and when they arise? Consider what a system administrator does
to fix an issue: he or she may execute a couple of commands, edit configuration
files or restart services. Given these actions as a batch, software can execute
them against the affected system to fix the problem. This is where Monit helps.
Besides monitoring the system, Monit can be instructed to take custom actions on
specific alerts corresponding to problems or failures in services.
Monit is described on its website, http://mmonit.com/monit, as a free Open
Source utility for managing and monitoring processes, files, directories and
file systems on a UNIX system. So, we decided to put it to the test. The setup
and examples in this article have been executed using Monit version 5.0.3 on a
machine running Fedora 12.
Installation and basic configuration
Installing Monit is as simple as issuing the following:
yum install monit
Open the file named monit.conf found in /etc/monit and append the following
lines to it:
set alert admin@foo.com
set mailserver localhost
Replace admin@foo.com with the e-mail address to which you want to send
alerts. If no SMTP server such as sendmail or postfix is running on the
machine that runs Monit, replace localhost with the name or IP address of a
machine running an SMTP or mail server on your network.
To specify more than one recipient, append lines such as the following
to the monit.conf file:
set alert admin1@foo.com
set alert admin2@foo.com
Start the Monit service as: service monit start
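Before starting or reloading Monit, it can be worth checking the configuration
for syntax errors. The following is a minimal sketch, assuming the monit binary
is on the PATH; safe_reload and MONIT_CMD are names invented for this
illustration and are not part of Monit.

```shell
#!/bin/bash
# Hypothetical helper (not part of Monit): validate the control file, then reload.
# MONIT_CMD can be overridden, which allows trying the function without Monit.
MONIT_CMD="${MONIT_CMD:-monit}"

safe_reload() {
    # "monit -t" runs a syntax check on the control file
    if "$MONIT_CMD" -t; then
        # ask the running daemon to re-read its configuration
        "$MONIT_CMD" reload
    else
        echo "Monit configuration has errors; not reloading" >&2
        return 1
    fi
}
```

Calling safe_reload after editing a configuration file reloads Monit only when
the syntax check passes.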
Subsequently, we can set up the various services to monitor
by writing Monit configuration file(s) in the directory /etc/monit.d/. Some
examples follow:
1. Monitoring a service
Let's take the Apache web server as an example. If the Apache server is
overloaded with connections, not responding or has died, the following Monit
configuration can detect it.
check host Apache with address localhost
if failed url http://localhost/index.php
then alert
Save the above configuration in a file named apache in the
directory /etc/monit.d. Then reload Monit as:
service monit reload
Assuming that Monit is running on the same machine as the
Apache web server, the above tries to fetch the URL http://localhost/index.php.
If the fetch fails, Monit sends an e-mail alert to the e-mail address(es)
configured in monit.conf (as explained above). The e-mail alert looks something
like this:
Subject: monit alert -- Connection failed Apache
Connection failed Service Apache
Date: Tue, 16 Feb 2010 17:32:33 +0530
Action: alert
Host: laptop.it4enterprise.net
Description: failed, cannot open a connection to INET
But wait, didn't we say that Monit can fix the problem as well as
alert? In this case, Monit should perhaps restart the Apache web server.
To do this, modify the configuration in /etc/monit.d/apache as follows:
check host Apache with address localhost
start program = "/etc/init.d/httpd restart"
if failed url http://localhost/index.php
then restart
Issue service monit reload. And voila, Monit will send the
alert as before, but this time it will also restart Apache. What's more, in a
few seconds, it will send an e-mail reporting that the service has resumed
successfully. The e-mail looks as follows:
Subject: monit alert -- Connection succeeded Apache
Connection succeeded Service Apache
Date: Tue, 16 Feb 2010 17:47:26 +0530
Action: alert
Host: laptop.it4enterprise.net
Description: connection succeeded to INET
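To make the logic of the rule concrete, here is a hand-rolled shell
approximation of "fetch the URL, and restart on failure". It is only a sketch
and not how Monit is implemented; it assumes curl is installed, and probe_url
and restart_if_down are names invented for this illustration.

```shell
#!/bin/bash
# Illustration only: a hand-rolled approximation of the Monit rule above.

# Succeeds if the URL can be fetched within the timeout (default 10 seconds).
probe_url() {
    curl --fail --silent --max-time "${2:-10}" --output /dev/null "$1"
}

# Usage: restart_if_down URL COMMAND [ARGS...]
restart_if_down() {
    local url="$1"
    shift
    if probe_url "$url"; then
        echo "up: $url"
    else
        echo "down: $url, running: $*"
        # run the remedial command passed by the caller
        "$@"
    fi
}
```

It could be called as restart_if_down http://localhost/index.php
/etc/init.d/httpd restart. Unlike Monit, this sketch sends no e-mail and has to
be scheduled by some other means, which is exactly the work Monit takes off
your hands.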
2. A failover setup
For this example, let's assume Monit is running on a separate machine (say
192.168.2.1) and monitoring an application server. To check whether the server
machine is up and running, a Monit configuration file looks as follows:
check host appserver with address 192.168.2.100
if failed icmp type echo count 3
then alert
Save the above configuration in a file named appserver in
the directory /etc/monit.d. Then reload Monit. The above configuration checks
whether the application server (called appserver) with IP 192.168.2.100 is
alive, i.e. responding to ping. If not, it sends an alert as follows:
Subject: monit alert -- ICMP failed appserver
ICMP failed Service appserver
Date: Tue, 16 Feb 2010 18:12:07 +0530
Action: alert
Host: laptop.it4enterprise.net
Description: failed ICMP test
So far so good. But what will be the remedial action? Let's
assume that there is a backup/mirror server running at 192.168.2.101. What if
we create a network alias on it with the IP set to 192.168.2.100? All
subsequent traffic to 192.168.2.100 (which is not accessible) will then land at
the backup server. To do this by hand, we would SSH into the 192.168.2.101
server and issue the following command:
ifconfig eth0:0 192.168.2.100
But we will automate this via a shell script which will be
used by the appserver Monit configuration. The shell script is as follows:
#!/bin/bash
ssh root@192.168.2.101 "ifconfig eth0:0 192.168.2.100"
Save the above in a file named add_alias.sh in the
directory /opt. Give the file executable permissions as:
chmod +x /opt/add_alias.sh
The SSH password prompt can be suppressed by using SSH keys
(generated by ssh-keygen). Refer to the tutorial at http://rcsg-gsir.imsb-dsgi.nrc-cnrc.gc.ca/documents/internet/node31.html
for setting up a password-less SSH login. Next, modify the file
/etc/monit.d/appserver as follows:
check host appserver with address 192.168.2.100
if failed icmp type echo count 3
then exec "/opt/add_alias.sh"
Note the use of “exec” to execute an external command or
shell script. One can argue that something like Heartbeat is better suited to
such a failover setup. While this is true, our point is that if you already have
Monit installed, you can use it for this purpose too.
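When Monit runs the script, there is no terminal to answer a password prompt,
so it may be safer for ssh to fail outright than to wait for input. The
following variant of add_alias.sh is a sketch under that assumption, with
key-based login already set up. BatchMode and ConnectTimeout are standard
OpenSSH options; run, add_alias and DRY_RUN are names invented for this
illustration.

```shell
#!/bin/bash
# Illustration only: a more defensive take on add_alias.sh.
BACKUP_HOST="192.168.2.101"   # backup/mirror server from the example
SERVICE_IP="192.168.2.100"    # address of the failed application server

# With DRY_RUN=1, print the command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

add_alias() {
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout bounds the wait if the backup server is unreachable too.
    run ssh -o BatchMode=yes -o ConnectTimeout=10 \
        "root@${BACKUP_HOST}" "ifconfig eth0:0 ${SERVICE_IP}"
}
```

To use it as the script that Monit executes, add a final line that calls
add_alias. Setting DRY_RUN=1 beforehand prints the ssh command without touching
the backup server, which helps when rehearsing a failover.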
3. When the storage space goes low
Suppose one of the servers is using a storage volume which is nearly
exhausted, having reached about 99% usage. Monit can monitor this and also
switch to a larger storage volume. The Monit configuration would be as follows:
check filesystem storage with path /mnt/storage
if space usage > 30%
then exec /opt/assign_premium_storage
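Note that the 30% threshold above is far lower than the 99% scenario just
described. A low value makes the rule easy to trigger while testing, and it
should be raised to suit your volume. The assign_premium_storage script can be
something like the following: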
#!/bin/bash
# stop at the first failed command, so the old volume is never unmounted after a failed copy
set -e
# mount the premium (larger) volume on /mnt/storage2
mount -t cifs //storage-server/premium-volume /mnt/storage2 -o username=storageadmin,password=secret
# copy all the files from the current storage to the new storage
cp -R /mnt/storage/* /mnt/storage2
# unmount both volumes
umount /mnt/storage
umount /mnt/storage2
# mount the new premium storage onto the old location to make it totally transparent to the user
mount -t cifs //storage-server/premium-volume /mnt/storage -o username=storageadmin,password=secret
The above script assumes that the current network storage
volume is mounted at /mnt/storage.
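For readers who want to see what a "space usage" test boils down to, the check
can be approximated in shell with df. This is only a sketch and not how Monit
itself is implemented; usage_percent and over_threshold are names invented for
this illustration.

```shell
#!/bin/bash
# Illustration only: roughly the kind of check a "space usage > N%" rule performs.

# Print the used-space percentage (an integer) of the filesystem holding PATH.
usage_percent() {
    # df -P prints one POSIX-format line per filesystem; column 5 is e.g. "42%"
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeeds when the usage of PATH is strictly above LIMIT percent.
# Usage: over_threshold PATH LIMIT
over_threshold() {
    [ "$(usage_percent "$1")" -gt "$2" ]
}
```

For example, over_threshold /mnt/storage 30 && /opt/assign_premium_storage
mirrors the rule in the Monit configuration, and is a quick way to try the
script by hand before handing it over to Monit.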
With the power of custom scripts in hand, you can use Monit
even for intricate tasks. For instance, when the load on the virtual machines
in a cluster goes high, Monit can automatically provision another virtual
machine to share the load!