Once the production system is working, you may think that the job is
done and the developers can switch to a new project. Unfortunately,
in most cases the server will still need to be maintained to make
sure that everything is working as expected, to ensure that the web
server is always up, and much more. A large part of this job can be
automated, which will save time. It will also increase the uptime of
the server, since automated processes generally work faster than
manual ones. If created properly, automated processes will also
always work correctly, whereas human operators are likely to make
occasional mistakes.
5.10.1. Interactive Monitoring
When you're getting started, it usually helps to monitor the server
interactively. Many different tools are available to do this. We will
discuss a few of them now.
When writing automated monitoring tools, you should start by
monitoring the tools themselves until they are reliable and stable
enough to be left to work by themselves.
Even when everything is automated, you should check at regular
intervals that everything is working OK, since a minor change in a
single component can silently break the whole monitoring system. A
good example is a silent failure of the mail system—if all
alerts from the monitoring tools are delivered through email, having
no messages from the system does not necessarily mean that everything
is OK. If emails alerting about a problem cannot reach the webmaster
because of a broken email system, the webmaster will not realize that
a problem exists. (Of course, the mailing system should be monitored
as well, but then problems must be reported by means other than
email. One common solution is to send messages both by email and to a
mobile phone's short message service.)
Another very important (albeit often-forgotten) period of risk is the
post-upgrade period. Even after a minor upgrade, the whole service
should be monitored closely for a while.
The first and simplest check is to visit a few pages from the service
to make sure that things are working. Of course, this might not
suffice, since different pages might use different
resources—while code that does not use the database system
might work properly, code that does use it might not work if the
database server is down.
The second thing to check is the web server's error_log file. If
there are any problems, they will probably be reported here. However,
only obvious syntax errors or outright malfunctions will appear here;
the subtle bugs that are a result of bad program logic will be
revealed only through careful testing (which should have been
completed before upgrading the live server).
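As a rough sketch of this check, the shell function below counts how many of the most recent lines in a log file carry one of Apache's bracketed severity tags. The function name count_log_errors and the 200-line default window are assumptions made for illustration. Messages that mod_perl code writes to the log without such a tag will not be counted, so treat the result as a hint rather than a complete report.

```shell
#!/bin/sh
# count_log_errors FILE [N]
# Print how many of the last N lines (default 200) of FILE contain an
# "[error]", "[crit]", "[alert]" or "[emerg]" severity tag.
count_log_errors() {
    log=$1
    lines=${2:-200}
    # grep -c prints the count even when it is 0, but then exits with
    # status 1; "|| true" keeps the function's exit status clean.
    tail -n "$lines" "$log" |
        grep -c -E '\[(error|crit|alert|emerg)\]' || true
}
```

For example, count_log_errors /home/httpd/httpd_perl/logs/error_log 500 would inspect the last 500 lines; the path shown here is only a guess based on the layout used elsewhere in this chapter.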
Periodic system health checking can be done using the top utility,
which shows free memory and swap space, the machine's CPU load, etc.
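When a quick scripted check is preferred over watching top, the load figure can also be taken from uptime. The following is a minimal sketch, assuming that uptime prints a "load average:" (or "load averages:") field as it does on common Linux and BSD systems; the function name load_above and the limit used in the example are hypothetical.

```shell
#!/bin/sh
# load_above LIMIT
# Reads one line of uptime(1) output on stdin and succeeds (exit
# status 0) only if the 1-minute load average is greater than LIMIT.
load_above() {
    # Extract the first number after "load average:" / "load averages:"
    sed -n 's/.*load averages\{0,1\}: *\([0-9.]*\).*/\1/p' |
        awk -v limit="$1" '
            NR == 1 { over = ($1 + 0 > limit + 0) }
            END     { exit !over }'
}

# Typical use (the limit 5 is an arbitrary example):
#   uptime | load_above 5 && echo "load is high"
```

If the input does not contain a recognizable load field, the function fails rather than reporting a high load.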
5.10.2. Apache::VMonitor—The Visual System and Apache Server Monitor
The Apache::VMonitor module provides even better monitoring
functionality than top. It supplies all the relevant information that
top does, plus all the Apache-specific information provided by
Apache's mod_status module (request processing time, last request's
URI, number of requests served by each child, etc.). In addition,
Apache::VMonitor emulates the reporting functions of the top, mount,
and df utilities.
Apache::VMonitor has a special mode for mod_perl
processes. It also has visual alerting capabilities and a
configurable "automatic refresh"
mode. A web interface can be used to show or hide all sections
dynamically.
The module provides two main viewing modes:
Multi-processes and overall system status
Single-process extensive reporting
5.10.2.1. Prerequisites and configuration
To run Apache::VMonitor, you need to have Apache::Scoreboard
installed and configured in httpd.conf. Apache::Scoreboard, in turn,
requires mod_status to be installed with ExtendedStatus enabled. In
httpd.conf, add:
ExtendedStatus On
Turning on extended mode will add a certain overhead to each
request's response time. If every millisecond
counts, you may not want to use it in production.
You also need Time::HiRes and
GTop to be installed. And, of course, you need a
running mod_perl-enabled Apache server.
To enable Apache::VMonitor, add the following
configuration to httpd.conf:
<Location /system/vmonitor>
    SetHandler perl-script
    PerlHandler Apache::VMonitor
</Location>
The monitor will be displayed when you request
http://localhost/system/vmonitor/.
You probably want to protect this location from unwanted visitors. If
you always access this location from the same IP address, you can use
simple host-based access control:
<Location /system/vmonitor>
    SetHandler perl-script
    PerlHandler Apache::VMonitor
    order deny,allow
    deny from all
    allow from 132.123.123.3
</Location>
Alternatively, you may use Basic or other authentication schemes
provided by Apache and its extensions.
You should load the module in httpd.conf:
PerlModule Apache::VMonitor
or from the startup file:
use Apache::VMonitor ();
You can control the behavior of Apache::VMonitor
by configuring variables in the startup file or inside a
<Perl> section. To alter the monitor's reporting behavior, set
entries of the %Apache::VMonitor::Config hash from within the
startup file.
You can sort the mod_perl processes report by any one of the
following columns: pid, mode, elapsed, lastreq, served, size, share,
vsize, rss, client, or request. For example, to sort by the process
size, use the following setting:
$Apache::VMonitor::Config{SORT_BY} = "size";
As the application provides an option to monitor processes other than
mod_perl processes, you can define a regular expression to match the
relevant processes. For example, to match the process names that
include "httpd_docs", "mysql", or "squid", the following regular
expression could be used:
$Apache::VMonitor::PROC_REGEX = 'httpd_docs|mysql|squid';
We will discuss all these configuration options and their influence
on the application shortly.
5.10.2.2. Multi-processes and system overall status reporting mode
The first mode is the one that's used most often, since it allows you
to monitor almost all important system resources from one location.
For your convenience, you can turn individual sections of the report
on and off, so that the report fits on one screen.
This mode comes with the following features:
Automatic Refresh Mode
You can tell the application to refresh the report every few seconds.
You can preset this value at server startup. For example, to set the
refresh to 60 seconds, add the following configuration setting:
$Apache::VMonitor::Config{REFRESH} = 60;
A 0 (zero) value turns off automatic refresh.
Once the server is running, you can always adjust the refresh rate
through the user interface.
top Emulation: System Health Report
Like top, this shows the current date/time,
machine uptime, average load, and all the system CPU and memory usage
levels (CPU load, real memory, and swap partition usage).
The top section includes a swap space usage
visual alert capability. As we will explain later in this chapter,
swapping is very undesirable on production systems. This tool helps
to detect abnormal swapping situations by changing the swap report
row's color according to the following rules:
swap usage                        report color
---------------------------------------------------------
5 MB < swap < 10 MB               light red
20% < swap (swapping is bad!)     red
70% < swap (almost all used!)     red + blinking (if enabled)
Note that you can turn on the blinking mode with:
$Apache::VMonitor::Config{BLINKING} = 1;
The module doesn't alert when swap is being used just a little
(< 5 MB), since swapping is common on many Unix systems, even when
there is plenty of free RAM.
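The alert rules above can be sketched as a small shell function. This is an illustration of the thresholds in the table, not the module's own code: the function name swap_alert and its arguments are hypothetical, and because the table does not say what happens between 10 MB and 20% usage, the sketch assumes that "light red" continues to apply there.

```shell
#!/bin/sh
# swap_alert USED_MB TOTAL_MB
# Print the report color suggested by the swap usage table.
# Percentages are compared by cross-multiplication so that no
# precision is lost to integer division.
swap_alert() {
    used=$1
    total=$2
    if [ $((used * 100)) -gt $((total * 70)) ]; then
        echo "red + blinking"   # blinking only applies if BLINKING is on
    elif [ $((used * 100)) -gt $((total * 20)) ]; then
        echo "red"
    elif [ "$used" -gt 5 ]; then
        # Assumption: everything above 5 MB and below 20% is light red.
        echo "light red"
    else
        echo "none"
    fi
}
```

Checking the percentage rules first means that a small swap device is reported as red even when only a few megabytes are in use.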
If you don't want the system section to be
displayed, set:
$Apache::VMonitor::Config{SYSTEM} = 0;
The default is to display this section.
top Emulation: Apache/mod_perl Processes Status
Like top, this emulation gives a report of the
processes, but it shows only information relevant to mod_perl
processes. The report includes the status of the process (Starting,
Reading, Sending, Waiting, etc.), process ID, time since the current
request was started, last request processing time, and the size,
shared, virtual, and resident memory sizes. It also shows the last
client's IP address and the first 64 characters of the request URI.
This report can be sorted by any column by clicking on the name of
the column while running the application. The sorting can also be
preset with the following setting:
$Apache::VMonitor::Config{SORT_BY} = "size";
The valid choices are pid, mode, elapsed, lastreq, served, size,
share, vsize, rss, client, and request.
The section is concluded with a report about the total memory being
used by all mod_perl processes as reported by the kernel, plus an
extra number approximating the real memory usage when memory sharing
is taking place. We discuss this in more detail in Chapter 10.
If you don't want the mod_perl processes section to
be displayed, set:
$Apache::VMonitor::Config{APACHE} = 0;
The default is to display this section.
top Emulation: Any Processes
This section, just like the mod_perl processes section, displays the
information as the top program would. To enable
this section, set:
$Apache::VMonitor::Config{PROCS} = 1;
The default is not to display this section.
You need to specify which processes are to be monitored. The regular
expression that will match the desired processes is required for this
section to work. For example, if you want to see all the processes
whose names include any of the strings "http", "mysql", or "squid",
use the following regular expression:
$Apache::VMonitor::PROC_REGEX = 'http|mysql|squid';
Figure 5-1 visualizes the sections that have been
discussed so far. As you can see, the swap memory is heavily used.
Although you can't see it here, the swap memory
report is colored red.
Figure 5-1. Emulation of top, centralized information about mod_perl and selected processes
mount Emulation
This section provides information about mounted filesystems, as if
you had called mount with no parameters.
If you want the mount section to be displayed, set:
$Apache::VMonitor::Config{MOUNT} = 1;
The default is not to display this section.
df Emulation
This section completely reproduces the df
utility. For each mounted filesystem, it reports the number of total
and available blocks for both superuser and user, and usage in
percentages. In addition, it reports available and used file inodes
in numbers and percentages.
This section can give you a visual alert when a filesystem becomes
more than 90% full or when fewer than 10% of its file inodes remain
free. The relevant filesystem row will be displayed in red and
in a bold font. A mount point directory will blink if blinking is
turned on. You can turn the blinking on with:
$Apache::VMonitor::Config{BLINKING} = 1;
If you don't want the df section to be displayed, set:
$Apache::VMonitor::Config{FS_USAGE} = 0;
The default is to display this section.
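The same 90% rule can be applied from the command line. The sketch below assumes the POSIX output format produced by df -P, in which the fifth column is the capacity and the sixth is the mount point; the function name full_filesystems is hypothetical. It covers block usage only, not the inode check, and it will truncate mount points that contain spaces.

```shell
#!/bin/sh
# full_filesystems [LIMIT]
# Reads "df -P" output on stdin and prints "MOUNTPOINT PERCENT%" for
# every filesystem whose capacity is above LIMIT percent (default 90).
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 {                      # skip the header line
            pct = $5
            sub(/%$/, "", pct)        # "95%" -> "95"
            if (pct + 0 > limit + 0)
                print $6, pct "%"
        }'
}

# Typical use:
#   df -P | full_filesystems 90
```

Reading the df output from standard input keeps the function easy to try out on saved or hand-written samples.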
Figure 5-2 presents an example of the report
consisting of the last two sections that were discussed
(df and mount emulation),
plus the ever-important mod_perl processes report.
Figure 5-2. Emulation of df, both inodes and blocks
In Figure 5-2, the /mnt/cdrom
and /usr filesystems are more than 90% full and
therefore are colored red. This is normal for
/mnt/cdrom, which is a mounted CD-ROM, but might
be critical for the /usr filesystem, which
should be cleaned up or enlarged.
Abbreviations and hints
The report uses many abbreviations that might be new for you. If you
enable the VERBOSE mode with:
$Apache::VMonitor::Config{VERBOSE} = 1;
this section will reveal the full names of the abbreviations at the
bottom of the report.
The default is not to display this section.
5.10.2.3. Single-process extensive reporting system
If you need to get in-depth information about a single process, just
click on its PID. If the chosen process is a mod_perl process, the
following information is displayed:
Process type (child or parent), status of the process (Starting,
Reading, Sending, Waiting, etc.), and how long the current request
has been processing (or how long the previous request took to
process, if the process was inactive at the moment the report
was made).
How many bytes have been transferred so far, and how many requests
have been served per child and per slot. (When the child process
quits, it is replaced by a new process running in the same slot.)
CPU times used by the process: total, utime, stime, cutime, cstime.
For all processes (mod_perl and non-mod_perl), the following
information is reported:
General process information: UID, GID, state, TTY, and command-line
arguments
Memory usage: size, share, VSize, and RSS
Memory segments usage: text, shared lib, data, and stack
Memory maps: start-end, offset, device_major:device_minor, inode,
perm, and library path
Sizes of loaded libraries
Just as with the multi-process mode, this mode allows you to
automatically refresh the page at the desired intervals.
Figure 5-3. Extended information about processes: general process information
Figure 5-4. Extended information about processes: memory usage and maps
Figure 5-5. Extended information about processes: loaded libraries
5.10.3. Automated Monitoring
As we mentioned earlier, the more things are automated, the more
stable the server will be. In general, there are three things that we
want to ensure:
Apache is up and properly serving requests. Remember that it can be
running but unable to serve requests (for example, if there is a
stale lock and all processes are waiting to acquire it).
All the resources that mod_perl relies on are available and working.
This might include database engines, SMTP services, NIS or LDAP
services, etc.
The system is healthy. Make sure that there is no system resource
contention, such as a small amount of free RAM, a heavily swapping
system, or low disk space.
None of these categories has a higher priority than the others. A
system administrator's role includes the proper
functioning of the whole system. Even if the administrator is
responsible for just part of the system, she must still ensure that
her part does not cause problems for the system as a whole. If any of
the above categories is not monitored, the system is not safe.
A specific setup might certainly have additional concerns that are
not covered here, but it is most likely that they will fall into one
of the above categories.
Before we delve into details, we should mention that all automated
tools can be divided into two categories: tools that know how to
detect problems and notify the owner, and tools that not only detect
problems but also try to solve them, notifying the owner about both
the problems and the results of the attempt to solve them.
Automatic tools are generally called watchdogs.
They can alert the owner when there is a problem, just as a watchdog
will bark when something is wrong. They will also try to solve
problems themselves when the owner is not around, just as watchdogs
will bite thieves when their owners are asleep.
Although some tools can perform corrective actions when something
goes wrong without human intervention (e.g., during the night or on
weekends), for some problems it may be that only human intervention
can resolve the situation. In such cases, the tool should not attempt
to do anything at all. For example, if a hardware failure occurs, it
is almost certain that a human will have to intervene.
Below are some techniques and tools that apply to each category.
5.10.3.1. mod_perl server watchdogs
One simple watchdog solution is to use a slightly modified apachectl
script, which we have called apache.watchdog. Call it from cron every
30 minutes, or even every minute, to make sure that the server is
always up.
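The crontab entry for 30-minute intervals would read (the path is a
placeholder):
0,30 * * * * /path/to/the/apache.watchdog >/dev/null 2>&1
The apache.watchdog script itself follows: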
#!/bin/sh

# This script is a watchdog checking whether
# the server is online.
# If the server is down, it tries to start it and
# sends an email alert to the admin.

# admin's email
EMAIL=webmaster@example.com

# the path to the PID file
PIDFILE=/home/httpd/httpd_perl/logs/httpd.pid

# the path to the httpd binary, including any options if necessary
HTTPD=/home/httpd/httpd_perl/bin/httpd_perl

# check for pidfile
if [ -f "$PIDFILE" ] ; then
    PID=`cat "$PIDFILE"`
    # kill -0 sends no signal; it only tests whether the process exists
    if kill -0 "$PID" 2>/dev/null; then
        STATUS="httpd (pid $PID) running"
        RUNNING=1
    else
        STATUS="httpd (pid $PID?) not running"
        RUNNING=0
    fi
else
    STATUS="httpd (no pid file) not running"
    RUNNING=0
fi

if [ $RUNNING -eq 0 ]; then
    echo "$0: httpd not running, trying to start"
    if $HTTPD ; then
        echo "$0: httpd started"
        # options must precede the recipient for many mail programs
        mail -s "$0: httpd started" $EMAIL \
            < /dev/null > /dev/null 2>&1
    else
        echo "$0: httpd could not be started"
        mail -s "$0: httpd could not be started" $EMAIL \
            < /dev/null > /dev/null 2>&1
    fi
fi
Another approach is to use the Perl LWP module to
test the server by trying to fetch a URI served by the server. This
is more practical because although the server may be running as a
process, it may be stuck and not actually serving any
requests—for example, when there is a stale lock that all the
processes are waiting to acquire. Failing to get the document will
trigger a restart, and the problem will probably go away.
We set a cron job to call this LWP script every
few minutes to fetch a document generated by a very light script. The
best thing, of course, is to call it every minute (the finest
resolution cron provides). Why so often? If the
server gets confused and starts to fill the disk with lots of error
messages written to the error_log, the system
could run out of free disk space in just a few minutes, which in turn
might bring the whole system to its knees. In these circumstances, it
is unlikely that any other child will be able to serve requests,
since the system will be too busy writing to the
error_log file. Think big—if running a
heavy service, adding one more request every minute will have no
appreciable impact on the server's load.
#!/usr/bin/perl -Tw

# These prevent taint checking failures
$ENV{PATH} = '/bin:/usr/bin';
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

use strict;
use diagnostics;
use vars qw($VERSION $ua);
$VERSION = '0.01';

require LWP::UserAgent;

###### Config ########
my $test_script_url = 'http://www.example.com:81/perl/test.pl';
my $monitor_email   = 'root@localhost';
my $restart_command = '/home/httpd/httpd_perl/bin/apachectl restart';
my $mail_program    = '/usr/lib/sendmail -t -n';
######################

$ua = LWP::UserAgent->new;
$ua->agent("$0/watchdog " . $ua->agent);
# Uncomment the following two lines if running behind a firewall
# my $proxy = "http://www-proxy";
# $ua->proxy('http', $proxy) if $proxy;

# If it returns '1' it means that the service is alive, no need to
# continue
exit if checkurl($test_script_url);

# Houston, we have a problem.
# The server seems to be down, try to restart it.
my $status = system $restart_command;

my $message = ($status == 0)
    ? "Server was down and successfully restarted!"
    : "Server is down. Can't restart.";

my $subject = ($status == 0)
    ? "Attention! Webserver restarted"
    : "Attention! Webserver is down. can't restart";

# email the monitoring person
my $to   = $monitor_email;
my $from = $monitor_email;
send_mail($from, $to, $subject, $message);

# input:  URL to check
# output: 1 for success, 0 for failure
#######################
sub checkurl {
    my($url) = @_;

    # Fetch document
    my $res = $ua->request(HTTP::Request->new(GET => $url));

    # Check the result status
    return 1 if $res->is_success;

    # failed
    return 0;
}

# send email about the problem
#######################
sub send_mail {
    my($from, $to, $subject, $messagebody) = @_;

    open MAIL, "|$mail_program"
        or die "Can't open a pipe to a $mail_program :$!\n";

    # The empty line after Subject: separates the headers from the body
    print MAIL <<__END_OF_MAIL__;
To: $to
From: $from
Subject: $subject

$messagebody

--
Your faithful watchdog
__END_OF_MAIL__

    close MAIL or die "failed to close |$mail_program: $!";
}
Of course, you may want to replace a call to
sendmail with Mail::Send,
Net::SMTP code, or some other preferred email-sending technique.
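For sites that prefer to keep the watchdog in shell, the fetch-then-restart logic can be sketched as follows. This is an illustration under stated assumptions rather than a replacement for the LWP script: the function name check_or_restart and the FETCH variable are hypothetical, curl is assumed to be installed, and the 30-second time limit is an arbitrary choice. The sketch only reports the outcome; sending the alert email is left to the caller.

```shell
#!/bin/sh
# FETCH is the command used to retrieve the URL; it must exit with a
# non-zero status on failure. The curl flags are an assumption of this
# sketch: -f fails on HTTP errors, --max-time bounds a hung server.
FETCH=${FETCH:-"curl -fsS --max-time 30 -o /dev/null"}

# check_or_restart URL RESTART_COMMAND [ARGS...]
# Prints "alive", "restarted" or "restart failed"; the exit status is
# non-zero only when the restart command itself fails.
check_or_restart() {
    url=$1
    shift
    # $FETCH is deliberately unquoted so that its flags are split
    # into separate words.
    if $FETCH "$url" >/dev/null 2>&1; then
        echo "alive"
    elif "$@" >/dev/null 2>&1; then
        echo "restarted"
    else
        echo "restart failed"
        return 1
    fi
}

# Typical use, with the URL and restart command from the script above:
#   check_or_restart http://www.example.com:81/perl/test.pl \
#       /home/httpd/httpd_perl/bin/apachectl restart
```

Putting a time limit on the fetch matters for the stale-lock case described earlier, where the server accepts the connection but never answers.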