When you're developing code on a development server, anything goes: modifying the
configuration, adding or upgrading Perl modules without checking that
they are syntactically correct, not checking that Perl modules
don't collide with other modules, adding
experimental new modules from CPAN, etc. If something goes wrong,
configuration changes can be rolled back (assuming
you're using some form of version control), modules
can be uninstalled or reinstalled, and the server can be started and
stopped as many times as required to get it working.
Of course, if there is more than one developer working on a
development server, things can't be quite so
carefree. Possible solutions for the problems that can arise when
multiple developers share a development server will be discussed
shortly.
The most difficult situation is transitioning changes to a live
server. However much the changes have been tested on a development
server, there is always the risk of breaking something when a change
is made to the live server. Ideally, any changes should be made in a
way that will go unnoticed by the users, except as new or improved
functionality or better performance. No users should be exposed to
even a single error message from the upgraded
service—especially not the "database
busy" or "database
error" messages that some high-profile sites seem to
consider acceptable.
Live services can be divided into two categories: servers that must
be up 24 hours a day and 7 days a week, and servers that can be
stopped during non-working hours. The latter generally applies to
Intranets of companies with offices located more or less in the same
time zone and not scattered around the world. Since the Intranet
category is the easier case, let's talk about it
first.
5.8.1. Upgrading Intranet Servers
An Intranet server generally serves the
company's internal staff by allowing them to share
and distribute internal information, read internal email, and perform
other similar tasks. When all the staff is located in the same time
zone, or when the time difference between sites does not exceed a few
hours, there is often no need for the server to be up all the time.
This doesn't necessarily mean that no one will need
to access the Intranet server from home in the evenings, but it does
mean that the server can probably be stopped for a few minutes when
it is necessary to perform some maintenance work.
Even if the update of a live server occurs during working hours and
goes wrong, the staff will generally tolerate the inconvenience
unless the Intranet has become a really mission-critical tool. For
servers that are mission critical, the following
section will describe the least disruptive and safest upgrade
approach.
If possible, any administration or upgrades of the
company's Intranet server should be undertaken
during non-working hours, or, if this is not possible, during the
times of least activity (e.g., lunch time). Upgrades that are carried
out while users are using the service should be done with a great
deal of care.
In very large organizations, upgrades are often scheduled events and
employees are notified ahead of time that the service might not be
available. Some organizations deem these periods
"at-risk" times, when employees are
expected to use the service as little as possible and then only for
noncritical work. Again, these major updates are generally scheduled
during the weekends and late evening hours.
The next section deals with this issue for services that need to be
available all the time.
5.8.2. Upgrading 24 × 7 Internet Servers
Internet servers are normally expected to be available 24 hours a
day, 7 days a week. E-commerce sites, global B2B
(business-to-business) sites, and any other revenue-producing sites
may be critical to the companies that run them, and their
unavailability could prove to be very expensive. The approach taken
to ensure that servers remain in service even when they are being
upgraded depends on the type of server in use. There are two
categories to consider: server clusters and
single servers.
5.8.2.1. The server cluster
When a service is very popular, a single
machine probably will not be able to keep up with the number of
requests the service has to handle. In this situation, the solution
is to add more machines and to distribute the load amongst them. From
the user's point of view, the use of multiple
servers must be completely transparent; users must still have a
single access point to the service (i.e., the same single URL) even
though there may be many machines with different server names
actually delivering the service. The requests must also be properly
distributed across the machines: not simply by giving equal numbers
of requests to each machine, but rather by giving each machine a load
that reflects its actual capabilities, given that not all machines
are built with identical hardware. This leads to the need for some
smart load-balancing techniques.
Most load-balancing techniques are based on a central machine
that dispatches all incoming requests to machines that do the real
processing. Think of it as the only entrance into a building with a
doorkeeper directing people into different rooms, all of which have
identical contents but possibly a different number of clerks.
Regardless of what room they're directed to, all
people use the entrance door to enter and exit the building, and an
observer located outside the building cannot tell what room people
are visiting. The same thing happens with the cluster of
servers—users send their browsers to URLs, and back come the
pages they requested. They remain unaware of the particular machines
from which their browsers collected their pages.
No matter what load-balancing technique is used, it should always be
straightforward to tell the central machine that a new
machine is available or that some machine is no longer available.
How does this long introduction relate to the upgrade problem?
Simple. When a particular machine requires upgrading, the dispatching
server is told to stop sending requests to that machine. All the
requests currently being executed must be left to complete, at which
point whatever maintenance and upgrade work is to be done can be
carried out. Once the work is complete and has been tested to ensure
that everything works correctly, the central machine can be told that
it can again send requests to the newly upgraded machine. At no point
has there been any interruption of service or any indication to users
that anything has occurred. Note that for some services, particularly
ones to which users must log in, the wait for all the users to either
log out or time out may be considerable. Thus, some sites stop
requests to a machine at the end of the working day, in the hope that
all requests will have completed or timed out by the morning.
How do we talk to the central machine? This depends on the
load-balancing technology that is implemented and is beyond the scope
of this book. The references section at the end of this chapter gives
a list of relevant online resources.
5.8.2.2. The single server
It's not uncommon for a popular web site to run
on a single machine. It's also common for a web site
to run on multiple machines, with one machine dedicated to serving
static objects (such as images and static HTML files), another
serving dynamically generated responses, and perhaps even a third
machine that acts as a dedicated database server.
Therefore, the situation that must be addressed is where just one
machine runs the service or where the service is spread over a few
machines, with each performing a unique task, such that no machine
can be shut down even for a single minute, and leaving the service
unavailable for more than five seconds is unacceptable. In this case,
two different tasks may be required: upgrading the software on the
server (including the Apache server), and upgrading the code of the
service itself (i.e., custom modules and scripts).
5.8.2.2.1. Upgrading live server components by swapping machines
There are many things that you might need to
update on a server, ranging from a major upgrade of the operating
system to just an update of a single piece of software (such as the
Apache server itself).
One simple approach to performing an upgrade painlessly is to have a
backup machine, of similar capacity and identical configuration, that
can replace the production machine while the upgrade is happening. It
is a good idea to have such a machine handy and to use it whenever
major upgrades are required. The two machines must be kept
synchronized, of course. (For Unix/Linux users, tools such as
rsync and mirror can be
used for synchronization.)
However, it may not be necessary to have a special machine on standby
as a backup. Unless the service is hosted elsewhere and you
can't switch the machines easily, the development
machine is probably the best choice for a backup—all the
software and scripts are tested on the development machine as a
matter of course, and it probably has a software setup identical to
that of the production machine. The development machine might not be
as powerful as the live server, but this may well be acceptable for a
short period, especially if the upgrade is timed to happen when the
site's traffic is fairly quiet.
It's much better to have a slightly slower service
than to close the doors completely. A web log analysis tool such as
analog can be used to determine the hour of the
day when the server is under the least load.
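If a log analysis tool is not at hand, a rough picture of the quietest hour can be taken straight from the access_log. The following is a minimal sketch, assuming the log is in Common Log Format (so the fourth field looks like [10/Oct/2001:13:55:36); the function name requests_per_hour is made up for illustration.

```shell
# Count requests per hour of day in a Common Log Format access_log.
# Assumes field 4 looks like "[10/Oct/2001:13:55:36".
# Prints "hour count" pairs, quietest hour first.
requests_per_hour() {
    awk '{ split($4, t, ":"); hits[t[2]]++ }
         END { for (h in hits) print h, hits[h] }' "$1" | sort -k2,2n -k1,1
}
```

It could be called as requests_per_hour /home/httpd/httpd_perl/logs/access_log. Hours with no requests at all do not appear in the output, so an hour missing from the list is quieter than any hour shown.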
Switching between the two machines is very simple:
1. Shut down the network on the backup machine.
2. Configure the backup machine to use the same IP address and domain
name as the live machine.
3. Shut down the network on the live machine (do not shut down the
machine itself!).
4. Start up the network on the backup machine.
When you are certain that the backup server has successfully replaced
the live server (that is, requests are being serviced, as revealed by
the backup machine's
access_log), it is safe to switch off the original live
machine or perform any necessary upgrades on it.
Why bother waiting to check that everything is working correctly with
the backup machine? If something goes wrong, the change can
immediately be rolled back by putting the known working machine back
online. With the service restored, there is time to analyze and fix
the problem with the replacement machine before trying it again.
Without the ability to roll back, the service may be out of operation
for some time before the problem is solved, and users may become
frustrated.
We recommend that you practice this technique with two unused
machines before using the production boxes.
After the backup machine has been put into service and the original
machine has been upgraded, test the original machine. Once the
original machine has been passed as ready for service, the server
replacement technique described above should be repeated in reverse.
If the original machine does not work correctly once returned to
service, the backup machine can immediately be brought online while
the problems with the original are fixed.
You cannot have two machines configured to use the same IP address,
so the first machine must release the IP address by shutting down the
link using this IP before the second machine can enable its own link
with the same IP address. This leads to a short downtime during the
switch. You can use the heartbeat utility to
automate this process and thus possibly shorten the downtime period.
See the references section at the end of this chapter for more
information about heartbeat.
5.8.2.2.2. Upgrading a live server with port forwarding
Using more than one machine to perform an
update may not be convenient, or even possible. An alternative
solution is to use the port-forwarding capabilities of the
host's operating system.
One approach is to configure the web server to listen on an
unprivileged port, such as 8000, instead of 80. Then, using a
firewalling tool such as iptables,
ipchains, or ipfwadm,
redirect all traffic coming for port 80 to port 8000. Keeping a rule
like this enabled at all times on a production machine will not
noticeably affect performance.
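With iptables, the redirection described above could be set up roughly as follows. This is a sketch rather than a complete firewall configuration: the run helper and the forward_port function are names made up for illustration, and the command is only printed unless DRY_RUN=0 is set (applying it requires root).

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to apply them for real.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# Redirect traffic arriving on the public port to the port Apache listens on.
# $1 = public port, $2 = port the web server is actually bound to
forward_port() {
    run iptables -t nat -A PREROUTING -p tcp --dport "$1" -j REDIRECT --to-ports "$2"
}

forward_port 80 8000
```

Rules in the PREROUTING chain of the nat table apply to packets arriving from the network, so connections made from the server itself to its own port 80 are not redirected by this rule.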
Once this rule is in place, it's a matter of getting
the new code in place, adjusting the web server configuration to
point to the new location, and picking a new unused port, such as
8001. This way, you can start the
"new" server listening on that port
and not affect the current setup.
To check that everything is working, you could test the server by
accessing it directly by port number. However, this might break links
and redirections. Instead, add another port forwarding rule before
the first one, redirecting traffic for port 80 from your test machine
or network to port 8001.
Once satisfied with the new server, publishing the change is just a
matter of changing the port-forwarding rules one last time. You can
then stop the now old server and everything is done.
Now you have your primary server listening on port 8001, answering
requests coming in through port 80, and nobody will have noticed the
change.
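The preview and publish steps above might be sketched with iptables as shown here. The address 192.0.2.10 stands in for a test machine, and the run, preview_for, and publish names are assumptions introduced for this example. Commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
# Dry-run by default: commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# Let one test client reach the new server through port 80 by inserting
# a rule ahead of the general redirect.
# $1 = test client address, $2 = port of the new server
preview_for() {
    run iptables -t nat -I PREROUTING 1 -p tcp -s "$1" --dport 80 \
        -j REDIRECT --to-ports "$2"
}

# Publish the new server for everyone, then delete the old general rule
# and the preview rule.
# $1 = test client address, $2 = old port, $3 = new port
publish() {
    run iptables -t nat -I PREROUTING 1 -p tcp --dport 80 -j REDIRECT --to-ports "$3" &&
    run iptables -t nat -D PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports "$2" &&
    run iptables -t nat -D PREROUTING -p tcp -s "$1" --dport 80 -j REDIRECT --to-ports "$3"
}

preview_for 192.0.2.10 8001
publish 192.0.2.10 8000 8001
```

Inserting the new general rule before deleting the old one means there is no moment at which port 80 has no redirect in place.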
5.8.2.2.3. Upgrading a live server with prepackaged components
Assuming that the testbed machine and the
live server have an identical software installation, consider
preparing an upgrade package with the components that must be
upgraded. Test this package on the testbed machine, and when it is
evident that the package gets installed flawlessly, install it on the
live server. Do not build the software from scratch on the live
server, because if a mistake is made, it could cause the live server
to misbehave or even to fail.
For example, many Linux distributions use the Red Hat Package Manager
(RPM) utility, rpm, to distribute source and
binary packages. It is not necessary for a binary package to include
any compiled code (for example, it can include Perl scripts, but it
is still called a binary). A binary package allows the new or
upgraded software to be used the moment you install it. The
rpm utility is smart enough to make upgrades
(i.e., remove previous installation files, preserve configuration
files, and execute appropriate installation scripts).
If, for example, the mod_perl server needs to be upgraded, one
approach is to prepare a package on a similarly configured machine.
Once the package has been built, tested, and proved satisfactory, it
can then be transferred to the live machine. The
rpm utility can then be used to upgrade the
mod_perl server. For example, if the package file is called
mod_perl-1.26-10.i386.rpm, this command:
panic% rpm -Uvh mod_perl-1.26-10.i386.rpm
will remove the previous server (if any) and install the new one.
There's no problem upgrading software that
doesn't break any dependencies in other packages, as
in the above example. But what would happen if, for example, the Perl
interpreter needs to be upgraded on the live machine?
If the mod_perl package described earlier was properly prepared, it
would specify the packages on which it depends and their versions. So
if Perl was upgraded using an RPM package, the
rpm utility would detect that the upgrade would
break a dependency, since the mod_perl package is supposed to work
with the previous version of Perl. rpm will not
allow the upgrade unless forced to.
This is a very important feature of RPM. Of course, it relies on the
fact that the person who created the package has set all the
dependencies correctly. Do not trust packages downloaded from the
Web. If you have to use an RPM package prepared by someone else, get
its source, read its specification file, and make doubly sure that
it's what you want.
The Perl upgrade task is in fact a very easy problem to solve. Have
two packages ready on the development machine: one for Perl and the
other for mod_perl, the latter built using the Perl version that is
going to be installed. Upload both of them to the live server and
install them together. For example:
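As a sketch, assuming a hypothetical Perl package file named perl-5.6.1-5.i386.rpm alongside the mod_perl package shown earlier, a combined invocation might look like the following. The run helper and the upgrade_together name are also assumptions; commands are only printed unless DRY_RUN=0 is set.

```shell
# Dry-run by default: commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# Upgrade several interdependent packages in ONE rpm invocation.
# rpm's --test option checks for conflicts without installing anything,
# so the real upgrade runs only if the check passes.
upgrade_together() {
    run rpm -Uvh --test "$@" && run rpm -Uvh "$@"
}

# The Perl file name is hypothetical.
upgrade_together perl-5.6.1-5.i386.rpm mod_perl-1.26-10.i386.rpm
```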
This should be done as an atomic
operation—i.e., as a single execution of the
rpm program. If the installation of the packages
is attempted with separate commands, they will both fail, because
each of them will break some dependency.
If a mistake is made and checks reveal that a faulty package has been
installed, it is easy to roll back. Just make sure that the previous
version of the properly packaged software is available. The packages
can be downgraded by using the --force
option, and voilà, the previously working system is
restored. For example:
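A sketch of such a rollback, assuming the previously installed packages were kept under the hypothetical file names perl-5.6.0-4.i386.rpm and mod_perl-1.26-9.i386.rpm. The run helper and downgrade_to are illustrative names; the command is only printed unless DRY_RUN=0 is set.

```shell
# Dry-run by default: commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# Reinstall the previously working packages over the faulty ones,
# again as a single rpm invocation so dependencies stay consistent.
downgrade_to() {
    run rpm -Uvh --force "$@"
}

# Both file names are hypothetical.
downgrade_to perl-5.6.0-4.i386.rpm mod_perl-1.26-9.i386.rpm
```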
Although this example uses the rpm utility,
other similar utilities exist for various operating systems and
distributions. Creating packages provides a simple way of upgrading
live systems (and downgrading them if need be). The packages used for
any successful upgrade should be kept, because they will become the
packages to downgrade to if a subsequent upgrade with a new package
fails.
When using a cluster of machines with identical setups, there is
another important benefit of prepackaged upgrades. Instead of doing
all the upgrades by hand, which could potentially involve dozens or
even hundreds of files, preparing a package can save lots of time and
will minimize the possibility of error. If the packages are properly
written and have been tested thoroughly, it is perfectly possible to
make updates to machines that are running live services. (Note that
not all operating systems permit the upgrading of running software.
For example, Windows does not permit DLLs that are in active use to
be updated.)
It should be noted that the packages referred to in this discussion
are ones made locally, specifically for the systems to be upgraded,
not generic packages downloaded from the Internet. Making local
packages provides complete control over what is installed and
upgraded and makes upgrades into atomic actions that can be rolled
back if necessary. We do not recommend using third-party packaged
binaries, as they will almost certainly have been built for a
different environment and will not have been fine-tuned for your system.
5.8.2.2.4. Upgrading a live server using symbolic links
Yet another alternative is to use symbolic links for
upgrades. This concept is quite simple: install a package into some
directory and symlink to it. So, if some software was expected in the
directory /usr/local/foo, you could simply
install the first version of the software in the directory
/usr/local/foo-1.0 and point to it from the
expected directory:
panic# ln -sfn /usr/local/foo-1.0 /usr/local/foo
If later you want to install a second version of the software,
install it into the directory /usr/local/foo-2.0
and change the symbolic link to this new directory:
panic# ln -sfn /usr/local/foo-2.0 /usr/local/foo
Now if something goes wrong, you can always switch back with:
panic# ln -sfn /usr/local/foo-1.0 /usr/local/foo
The -n option matters once the link exists: without it, ln follows
the existing link and creates the new link inside the old directory
instead of replacing /usr/local/foo.
In reality, things aren't as simple as in this
example. It works if you can place all the software components under
a single directory, as with the default Apache installation.
Everything is installed under a single directory, so you can have:
/usr/local/apache-1.3.17
/usr/local/apache-1.3.19
and use the symlink /usr/local/apache to switch
between the two versions.
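The switch can be wrapped in a small helper that refuses to point the link at a directory that does not exist. This is a sketch only; switch_version is a made-up name, and it assumes an ln that supports the -n option (GNU and BSD versions do).

```shell
# Point a stable path at one versioned install directory.
# $1 = versioned directory, $2 = symlink the server configuration refers to
switch_version() {
    [ -d "$1" ] || { echo "no such directory: $1" >&2; return 1; }
    # -n replaces an existing symlink instead of descending into its target
    ln -sfn "$1" "$2"
}
```

For example, switch_version /usr/local/apache-1.3.19 /usr/local/apache moves to the newer version, and calling it again with /usr/local/apache-1.3.17 switches back.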
However, if you use a default installation of Perl, files are spread
across multiple directories. In this case, it's not
easy to use symlinks—you need several of them, and
they're hard to keep track of. Unless you automate
the symlinks with a script, it might take a while to do a switch,
which might mean some downtime. Of course, you can install all the
Perl components under a single root, just like the default Apache
installation, which simplifies things.
Another complication with upgrading Perl is that you may need to
recompile mod_perl and other Perl third-party modules that use
XS extensions. Therefore, you probably want to
build everything on some other machine, test it, and when ready, just
untar everything at once on the production
machine and adjust the symbolic links.
5.8.2.2.5. Upgrading Perl code
Although new versions of mod_perl and Apache may not be released for months at a
time and the need to upgrade them may not be pressing, the handlers
and scripts being used at a site may need regular tweaks and changes,
and new ones may be added quite frequently.
Of course, the safest and best option is to prepare an RPM (or
equivalent) package that can be used to automatically upgrade the
system, as explained earlier in this chapter. Once an RPM
specification file has been written (a task that might take some
effort), future upgrades will be much less time consuming and have
the advantage of being very easy to roll back.
But if the policy is to just overwrite files by hand, this section
will explain how to do so as safely as possible.
All code should be thoroughly tested on a development machine before
it is put on the live server, and both machines must have an
identical software base (i.e., the same versions of the operating
system, Apache, any software that Apache and mod_perl depend on,
mod_perl itself, and all Perl modules). If the versions do not match,
code that works perfectly on the development machine might not work
on the live server.
For example, we have encountered a problem when the live and
development servers were using different versions of the MySQL
database server. The new code took advantage of new features added in
the version installed on the development machine. The code was tested
and shown to work correctly on the development machine, and when it
was copied to the live server it seemed to work fine. Only by chance
did we discover that scripts did not work correctly when the new
features were used.
If the code hadn't worked at all, the problem would
have been obvious and been detected and solved immediately, but the
problem was subtle. Only after a thorough analysis did we understand
that the problem was that we had an older version of the MySQL server
on the live machine. This example reminded us that all modifications
on the development machine should be logged and the live server
updated with all of the modifications, not just the new version of
the Perl code for a project.
We solved this particular problem by immediately reverting to the old
code, upgrading the MySQL server on the live machine, and then
successfully reapplying the new code.
5.8.2.2.6. Moving files and restarting the server
Now let's discuss the techniques used to upgrade live server scripts and
handlers.
The most common scenario is a live running service that needs to be
upgraded with a new version of the code. The new code has been
prepared and uploaded to the production server, and the server has
been restarted. Unfortunately, the service does not work anymore.
What could be worse than that? There is no way back, because the
original code has been overwritten with the new but non-working code.
Another scenario is where a whole set of files is being transferred
to the live server but some network problem has occurred in the
middle, which has slowed things down or totally aborted the transfer.
With some of the files old and some new, the service is most likely
broken. Since some files were overwritten, you can't
roll back to the previously working version of the service.
No matter what file transfer technique is used, be it FTP, NFS, or
anything else, live running code should never be directly overwritten
during file transfer. Instead, files should be transferred to a
temporary directory on the live machine, ready to be moved when
necessary. If the transfer fails, it can then be restarted safely.
Both scenarios can be made safer with two approaches. First, do not
overwrite working files. Second, use a revision control system such
as CVS so that changes to working code can easily be undone if the
working code is accidentally overwritten. Revision control will be
covered later in this chapter.
We recommend performing all updates on the live server in the
following sequence. Assume for this example that the
project's code directory is
/home/httpd/perl/rel. When
we're about to update the files, we create a new
directory, /home/httpd/perl/test, into which we
copy the new files. Then we do some final sanity checks: check that
file permissions are readable and executable for the user the server
is running under, and run perl -Tcw on the new
modules to make sure there are no syntax errors in them.
To save some typing, we set up some aliases for some of the
apachectl commands and for
tailing the error_log file:
panic% alias graceful /home/httpd/httpd_perl/bin/apachectl graceful
panic% alias restart /home/httpd/httpd_perl/bin/apachectl restart
panic% alias start /home/httpd/httpd_perl/bin/apachectl start
panic% alias stop /home/httpd/httpd_perl/bin/apachectl stop
panic% alias err tail -f /home/httpd/httpd_perl/logs/error_log
Finally, when we think we are ready, we do:
panic% cd /home/httpd/perl
panic% mv rel old && mv test rel && stop && sleep 3 && restart && err
Note that all the commands are typed as a single line, joined by
&&, and only at the end should the Enter
key be pressed. The && ensures that if any
command fails, the following commands will not be executed.
The elements of this command line are:
mv rel old &&
    Backs up the working directory to old, so none
    of the original code is deleted or overwritten
mv test rel &&
    Puts the new code in place of the original
stop &&
    Stops the server
sleep 3 &&
    Allows the server a few seconds to shut down (it might need a longer
    sleep)
restart &&
    Restarts the server
err
    Tails the error_log file to
    make sure that everything is OK
If mv is overridden by a global alias
mv -i, which requires confirming every action,
you will need to call mv -f to override the
-i option.
When updating code on a remote machine, it's a good
idea to prepend nohup to the beginning of the
command line:
panic% nohup mv rel old && mv test rel && stop && sleep 3 && restart && err
This approach ensures that if the connection is suddenly dropped, the
server will not stay down if the last command that executes is
stop.
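One caveat: the shell applies nohup only to the command it directly prefixes, which in the line above is the first mv. A possible way to protect the whole chain is to hand it to a single shell that itself runs under nohup. The sketch below assumes a POSIX sh; run_detached is a made-up name, and because aliases are not visible inside sh -c, the apachectl path has to be spelled out in full.

```shell
# Run a whole && chain under one nohup, so that a dropped connection
# cannot interrupt it halfway through.
# $1 = command line to run, $2 = log file that receives its output
run_detached() {
    nohup sh -c "$1" > "$2" 2>&1
}
```

A call might look like run_detached 'cd /home/httpd/perl && mv rel old && mv test rel && /home/httpd/httpd_perl/bin/apachectl stop && sleep 3 && /home/httpd/httpd_perl/bin/apachectl restart' /tmp/upgrade.log, where /tmp/upgrade.log is an arbitrary choice of log file. The error_log can then be tailed separately.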
apachectl generates its status messages a little
too early. For example, when we execute apachectl
stop, a message saying that the server has been stopped is
displayed, when in fact the server is still running. Similarly, when
we execute apachectl start, a message is
displayed saying that the server has been started, while it is
possible that it hasn't yet. In both cases, this
happens because these status messages are not generated by Apache
itself. Do not rely on them. Rely on the
error_log file instead, where the running Apache
server indicates its real status.
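Watching the error_log can also be scripted. The sketch below waits until a line matching a given pattern is appended to a log file, ignoring lines that were already there. The function names are made up for illustration, and the pattern to look for is an assumption that should be checked against your own error_log (Apache 1.3 typically logs a line containing "resuming normal operations" once it is serving requests).

```shell
# Number of lines currently in a file.
log_lines() { wc -l < "$1" | tr -d ' '; }

# Wait until a NEW line matching a pattern shows up in a log file.
# $1 = log file, $2 = number of old lines to skip,
# $3 = grep pattern, $4 = maximum number of seconds to wait
wait_for_log() {
    i=0
    until tail -n "+$(($2 + 1))" "$1" 2>/dev/null | grep -q -- "$3"; do
        [ "$i" -ge "$4" ] && return 1
        sleep 1
        i=$((i + 1))
    done
}
```

Typical use is to record before=$(log_lines error_log) just before restarting the server and then call wait_for_log error_log "$before" 'resuming normal operations' 30. A nonzero return means the message did not appear within the time limit.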
Also note that we use restart and not just
start. This is because of
Apache's potentially long stopping times if it has
to run lots of destruction and cleanup code on exit. If
start is used and Apache has not yet released
the port it is listening to, the start will fail and the
error_log will report that the port is in use.
For example:
Address already in use: make_sock: could not bind to port 8000
However, if restart is used,
apachectl will wait for the server to quit and
unbind the port and will then cleanly restart it.
Now, what happens if the new modules are broken and the newly
restarted server reports problems or refuses to start at all?
The aliased err command executes tail
-f on the error_log, so that the
failed restart or any other problems will be immediately apparent.
The situation can quickly and easily be rectified by returning the
system to its pre-upgrade state with this command:
panic% mv rel bad && mv old rel && stop && sleep 3 && restart && err
This command line moves the new code to the directory
bad, moves the original code back into the
runtime directory rel, then stops and restarts
the server. Once the server is back up and running, you can analyze
the cause of the problem, fix it, and repeat the upgrade again.
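The upgrade and rollback command lines can be captured as two small functions so that they are typed the same way every time. This is a sketch under stated assumptions: deploy, rollback, and the RESTART_CMD hook are names invented here, and RESTART_CMD defaults to the no-op ":" so the functions can be tried in a scratch directory before being pointed at a real server.

```shell
# Command used to bounce the server after the directories are swapped.
# Defaults to ":" (do nothing); set it to e.g. "apachectl restart" with
# the full path for real use.
RESTART_CMD=${RESTART_CMD:-:}

# $1 = directory that contains rel/ (live code) and test/ (new code)
deploy() {
    # Refuse to run if old/ is left over, since mv would move rel INTO it.
    ( cd "$1" && [ ! -e old ] && mv rel old && mv test rel && $RESTART_CMD )
}

# $1 = same directory, after a deploy that turned out to be broken
rollback() {
    ( cd "$1" && [ ! -e bad ] && mv rel bad && mv old rel && $RESTART_CMD )
}
```

Each function runs in a subshell, so the caller's working directory is unchanged, and the && chain stops at the first step that fails, just as in the command lines above.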
Usually everything will be fine if the code has been extensively
tested on the development server. When upgrades go smoothly, the
downtime should be only about 5-10 seconds, and most users will not
even notice anything has happened.
5.8.2.2.7. Using CVS for code upgrades
The Concurrent Versions System (CVS) is an open source version-control
system that allows multiple developers to work on code or
configuration in a central repository while tracking any changes
made. We use it because it's the dominant open
source tool, but it's not the only possibility:
commercial tools such as Perforce would also work for these purposes.
If you aren't familiar with CVS, you can learn about
it from the resources provided at the end of this chapter. CVS is too
broad a topic to be covered in this book. Instead, we will
concentrate on the CVS techniques that are relevant to our purpose.
Things are much simpler when using CVS for server updates, especially
since it allows you to tag each production release. By
tagging files, we mean having a group of files
under CVS control share a common label. Like RCS and other
revision-control systems, CVS gives each file its own version number,
which allows us to manipulate different versions of this file. But if
we want to operate on a group of many files, chances are that they
will have different version numbers. Suppose we want to take
snapshots of the whole project so we can refer to these snapshots
some time in the future, after the files have been modified and their
respective version numbers have been changed. We can do this using
tags.
To tag the project whose module name is
myproject, execute the following from any
directory on any machine:
panic% cvs rtag PRODUCTION_1_20 myproject
Now when the time comes to update the online version, go to the
directory on the live machine that needs to be updated and execute:
panic% cvs update -dP -r PRODUCTION_1_20
The -P option to cvs prunes
empty directories and deleted files, the -d
option brings in new directories and files (like cvs
checkout does), and -r
PRODUCTION_1_20 tells CVS to update the current directory
recursively to the PRODUCTION_1_20 CVS version of
the project.
Suppose that after a while, we have more code updated and we need to
make a new release. The currently running version has the tag
PRODUCTION_1_20, and the new version has the tag
PRODUCTION_1_21. First we tag the files in the
current state with a new tag:
panic% cvs rtag PRODUCTION_1_21 myproject
and update the live machine:
panic% cvs update -dP -r PRODUCTION_1_21
Now if there is a problem, we can go back to the previous working
version very easily. If we want to get back to version
PRODUCTION_1_20, we can run the command:
panic% cvs update -dP -r PRODUCTION_1_20
As before, the update brings in new files and directories not already
present in the local directory (because of the -d option) and prunes
empty directories (because of the -P option).
Remember that when you use CVS to update the live server, you should
avoid making any minor changes to the code on this server.
That's because of potential collisions that might
happen during the CVS update. If you modify a single line in a single
file and then do cvs update, and someone else
modifies the same line at the same time and commits it just before
you do, CVS will try to merge the changes. If they are different, it
will see a conflict and insert both versions into the file. CVS
leaves it to you to resolve the conflict. If this file is Perl code,
it won't compile and it will cause temporary trouble
until the conflict is resolved. Therefore, the best approach is to
think of live server files as being read-only.
Updating the live code directory should be done only if the update is
atomic—i.e., if all files are updated in a very short period of
time, and when no network problems can occur that might delay the
completion of the file update.
The safest approach is to use CVS in conjunction with the safe code
update technique presented previously, by working with CVS in a
separate directory. When all files are extracted, move them to the
directory the live server uses. Better yet, use symbolic links, as
described earlier in this chapter: when you update the code, prepare
everything in a new directory and, when you're
ready, just change the symlink to point to this new directory. This
approach will prevent cases where only a partial update happens
because of a network or other problem.
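The symlink approach can be sketched as a small shell function. This is only an illustration: install_release, the releases directory, and the live symlink are hypothetical names, and the sketch assumes that CVSROOT is set and that cvs export is an acceptable way to extract the tagged files.

```shell
# Sketch only: install_release and its arguments are illustrative names.
CVS=${CVS:-cvs}

install_release() {
    tag=$1          # release tag, e.g. PRODUCTION_1_21
    releases=$2     # directory holding one subdirectory per release
    live_link=$3    # symlink that the live server's configuration points at

    # Extract a clean copy of the tagged files into a directory of its own.
    # If the export fails (network problem, unknown tag), stop here so
    # that the live symlink is never touched.
    (cd "$releases" && "$CVS" -Q export -r "$tag" -d "$tag" myproject) || return 1

    # Repoint the symlink only after the whole tree is in place.
    ln -sfn "$releases/$tag" "$live_link"
}
```

Going back to the previous release is then a matter of pointing the link at the older directory again, which is still on disk.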
The use of CVS needn't apply exclusively to code. It
can be of great benefit for configuration management, too. Just as
you want your mod_perl programs to be identical between the
development and production servers, you probably also want to keep
your httpd.conf files in sync. CVS is well
suited for this task too, and the same methods apply.
5.8.3. Disabling Scripts and Handlers on a Live Server
Perl programs running on the mod_perl server may be dependent on
resources that can become temporarily unavailable when they are being
upgraded or maintained.
For example, once in a while a database server (and possibly its
corresponding DBD module) may need to be upgraded, rendering it
unusable for a short period of time.
Using the development server as a temporary replacement is probably
the best way to continue to provide service during the upgrade. But
if you can't, the service will be unavailable for a
while.
Since none of the code that relies on the temporarily unavailable
resource will work, users trying to access the mod_perl server will
see either the ugly gray "An Error has
occurred" message or a customized error message (if
code has been added to trap errors and customize the error-reporting
facility). In any case, it's not a good idea to let
users see these errors, as they will make your web site seem
amateurish.
A friendlier approach is to confess to the users that some
maintenance work is being undertaken and plead for patience,
promising that the service will become fully functional in a few
minutes (or however long the scheduled downtime is expected to be).
It is a good idea to be honest and report the real duration of the
maintenance operation, not just "we will be back in
10 minutes." Think of a user (or journalist) coming
back 20 minutes later and still seeing the same message! Make sure
that if the time of resumption of service is given, it is not the
system's local time, since users will be visiting
the site from different time zones. Instead, we suggest using
Greenwich Mean Time (GMT). Most users have some idea of the time
difference between their location and GMT, or can find out easily
enough. Although GMT is known by programmers as Coordinated Universal
Time (UTC), end users may not know what UTC is, so using the older
acronym is probably best.
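As a rough illustration, the time to announce can be computed in GMT from the expected length of the maintenance window. The helper name back_online_at is hypothetical, and the sketch assumes a date command that supports the %s (seconds since the epoch) format, which is common but not required by POSIX.

```shell
# Sketch only: back_online_at is an illustrative helper.
back_online_at() {
    minutes=$1                               # expected downtime in minutes
    now=$(date -u +%s)                       # seconds since the epoch
    t=$(( (now + minutes * 60) % 86400 ))    # seconds past midnight GMT at that moment
    # Print only the time of day; a window that runs past midnight GMT
    # would also need the date.
    printf '%02d:%02d GMT\n' $(( t / 3600 )) $(( t % 3600 / 60 ))
}
```

Calling back_online_at 30 prints the time half an hour from now in a form that can be pasted into the maintenance message.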
5.8.3.1. Disabling code running under Apache::Registry
If just a few scripts
need to be disabled temporarily, and if they are running under the
Apache::Registry handler, a maintenance message
can be displayed without messing with the server. Prepare a little
script in /home/httpd/perl/down4maintenance.pl:
#!/usr/bin/perl -Tw
use strict;
print "Content-type: text/plain\n\n",
qq{We regret that the service is temporarily
unavailable while essential maintenance is undertaken.
It is expected to be back online from 12:20 GMT.
Please, bear with us. Thank you!};
Let's say you now want to disable the
/home/httpd/perl/chat.pl script. Just do this:
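One way to make the swap, sketched here under the assumption that the original script is set aside under a .orig name and a symbolic link to the maintenance script takes its place, is shown below. The function names, the PERL_DIR variable, and the .orig suffix are illustrative choices.

```shell
# Sketch only: disable_script, enable_script, and PERL_DIR are illustrative.
PERL_DIR=${PERL_DIR:-/home/httpd/perl}

disable_script() {
    # Keep the real script under another name and let requests for it
    # reach the maintenance script through a symbolic link.
    mv "$PERL_DIR/$1" "$PERL_DIR/$1.orig" &&
    ln -s "$PERL_DIR/down4maintenance.pl" "$PERL_DIR/$1"
}

enable_script() {
    # Remove the link and put the real script back. The touch gives the
    # restored file a fresh modification time, so that it is newer than
    # the copy Apache::Registry has loaded in memory.
    rm "$PERL_DIR/$1" &&
    mv "$PERL_DIR/$1.orig" "$PERL_DIR/$1" &&
    touch "$PERL_DIR/$1"
}

# Usage: disable_script chat.pl   (and later: enable_script chat.pl)
```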
Now make sure that the script has the current timestamp:
panic% touch /home/httpd/perl/chat.pl
Apache::Registry will automatically detect the
change and use the new script from now on.
This scenario is possible because Apache::Registry
checks the modification time of the script before each invocation. If
the script's file is more recent than the version
already loaded in memory, Apache::Registry reloads
the script from disk.
5.8.3.2. Disabling code running under other handlers
Under non-Apache::Registry handlers, you need to modify the
configuration. You must either point all requests to a new location or
replace the handler with one that will serve the requests during the
maintenance period. The Book::Maintenance handler below is an example of
such a replacement:
package Book::Maintenance;

use strict;
use Apache::Constants qw(:common);

sub handler {
    my $r = shift;

    # Send the HTTP headers, then the plain-text maintenance notice.
    $r->send_http_header("text/plain");
    print qq{We regret that the service is temporarily
unavailable while essential maintenance is undertaken.
It is expected to be back online from 12:20 GMT.
Please be patient. Thank you!};

    return OK;
}
1;
In practice, the maintenance handler may well read the
"back online" time from a variable
set with a PerlSetVar directive in
httpd.conf, so the handler itself need never be
changed.
Once httpd.conf has been edited so that the affected requests are served
by Book::Maintenance instead of the usual handler, restart the server.
Users will be happy to read their email for 10 minutes, knowing that
they will return to a much improved service.
5.8.3.3. Disabling services with help from the frontend server
Many sites use a more
complicated setup in which a
"light" Apache frontend server
serves static content but proxies all requests for dynamic content to
the "heavy" mod_perl backend server
(see Chapter 12). Those sites can use a third
solution to temporarily disable scripts.
Since the frontend machine rewrites all incoming requests to
appropriate requests for the backend machine, a change to the
RewriteRule is sufficient to take handlers out of
service. Just change the directives to rewrite all incoming requests
(or a subgroup of them) to a single URI. This URI simply tells users
that the service is not available during the maintenance period.
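For example, the following RewriteRule rewrites all URIs starting with
/perl/ to the maintenance URI /control/maintain on the mod_perl server
(assumed here to listen on localhost:8000, as in the configuration shown
later in this section):
RewriteRule ^/perl/ http://localhost:8000/control/maintain [P,L]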
The Book::Maintenance handler from the previous
section can be used to generate the response to the URI
/control/maintain.
Make sure that this rule is placed before all the other
RewriteRules so that none of the other rules need
to be commented out. Once the change has been made, check that the
configuration is not broken and restart the server so that the new
configuration takes effect. Now the database server can be shut down,
the upgrade can be performed, and the database server can be
restarted. The RewriteRule that has just been
added can be commented out and the Apache server stopped and
restarted. If the changes lead to any problems, the maintenance
RewriteRule can simply be uncommented while you
sort them out.
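The "check, then restart" step lends itself to a small wrapper, so that a broken configuration never reaches the running server. The following is a sketch only: apply_config is a hypothetical name, and it assumes that the stock apachectl script, with its configtest and restart commands, is used to control the frontend server.

```shell
# Sketch only: apply_config is an illustrative helper.
APACHECTL=${APACHECTL:-apachectl}

apply_config() {
    # Restart only if the syntax check passes, so that a typo made in a
    # hurry cannot take the running server down.
    if "$APACHECTL" configtest; then
        "$APACHECTL" restart
    else
        echo "configuration test failed; server left running as is" >&2
        return 1
    fi
}
```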
Of course, all this is error-prone, especially when the maintenance
is urgent. Therefore, it can be a good idea to prepare all the
required configurations ahead of time, by having different
configuration sections and enabling the right one with the help of
the IfDefine directive during server startup.
The following configuration will make this approach clear:
RewriteEngine On
<IfDefine maintain>
RewriteRule ^/perl/ http://localhost:8000/control/maintain [P,L]
</IfDefine>
<IfDefine !maintain>
RewriteRule ^/perl/(.*)$ http://localhost:8000/$1 [P,L]
# more directives
</IfDefine>
Now enable the maintenance section by starting the server with:
panic% httpd -Dmaintain
Request URIs starting with /perl/ will be
processed by the /control/maintain handler or
script on the mod_perl side.
If the -Dmaintain option is not passed, the
"normal" configuration will take
effect and each URI will be remapped to the mod_perl server as usual.
Of course, if apachectl or any other script is
used for server control, this new mode should be added so that it
will be easy to make the correct change without making any mistakes.
When you're in a rush, the less typing you have to
do, the better. Ideally, all you'd have to type is:
panic% apachectl maintain
This will both shut down the server (if it is running) and start it
with the -Dmaintain option. Alternatively, you
could use:
panic% apachectl start_maintain
to start the server in maintenance mode. apachectl
graceful will stop the server and restart it in normal
mode.
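The two extra modes could be provided by a thin wrapper around the existing control script rather than by editing apachectl itself. This is a sketch under assumptions: server_ctl is a hypothetical name, httpd and apachectl are assumed to be on the PATH (or supplied through the HTTPD and APACHECTL variables), and no attempt is made to wait for the old server to exit before the new one is started.

```shell
# Sketch only: server_ctl is an illustrative wrapper, not part of Apache.
HTTPD=${HTTPD:-httpd}
APACHECTL=${APACHECTL:-apachectl}

server_ctl() {
    case "$1" in
        start_maintain)
            # Start in maintenance mode; assumes the server is not running.
            "$HTTPD" -Dmaintain
            ;;
        maintain)
            # Stop the server if it is running, then start it in
            # maintenance mode. A production script would wait here until
            # the old process has really gone.
            "$APACHECTL" stop || true
            "$HTTPD" -Dmaintain
            ;;
        *)
            # Any other command is handed to the stock apachectl unchanged.
            "$APACHECTL" "$@"
            ;;
    esac
}
```

With this in place, server_ctl maintain is all that needs to be typed when the maintenance window opens. To leave maintenance mode, stop the server and start it again without the -Dmaintain option.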
5.8.4. Scheduled Routine Maintenance
If maintenance tasks can be scheduled when no one is using the server,
you can write a simple PerlAccessHandler that will automatically
disable the server and return a page stating that the
server is under maintenance and will be back online at a specified
time. When using this approach, you don't need to
worry about fiddling with the server configuration when the
maintenance hour comes. However, all maintenance must be completed
within the given time frame, because once the time is up, the service
will resume.
The Apache::DayLimit module from http://www.modperl.com/ is a good example of
such a module. It provides options for specifying which day server
maintenance occurs. For example, if Sundays are used for maintenance,
the configuration for Apache::DayLimit is as
follows:
It is very easy to adapt this module to do more advanced filtering.
For example, to specify both a day and a time, use a configuration
similar to this: