The next step after building and installing a mod_perl-enabled Apache
server is to configure it. This is done in two distinct steps:
getting the server running with a standard Apache configuration, and
then applying mod_perl-specific configuration directives to get the
full benefit out of it.
For readers who haven't previously been exposed to
the Apache web server, our discussion begins with standard Apache
directives and then continues with mod_perl-specific material.
The startup.pl file can be used in many ways to
improve performance. We will talk about all these issues later in the
book. In this chapter, we discuss the configuration possibilities
that the startup.pl file gives us.
<Perl> sections are a great time saver if
you have complex configuration files. We'll talk
about <Perl> sections in this chapter.
Another important issue we'll cover in this chapter
is how to validate the configuration file. This is especially
important on a live production server. If we break something and
don't validate it, the server won't
restart. This chapter discusses techniques for catching such
problems before they bring the server down.
At the end of this chapter, we discuss various tips and tricks you
may find useful for server configuration, talk about a few security
concerns related to server configuration, and finally look at a few
common pitfalls people encounter when they misconfigure their
servers.
4.1. Apache Configuration
Apache configuration can be confusing. To minimize the number of
things that can go wrong, it's a good idea to first
configure Apache itself without mod_perl. So before we go into
mod_perl configuration, let's look at the basics of
Apache itself.
4.1.1. Configuration Files
Prior to Version
1.3.4, the default Apache installation used three configuration
files: httpd.conf,
srm.conf, and access.conf.
Although there were historical reasons for having three separate
files (dating back to the NCSA server), it stopped mattering which
file you used for what a long time ago, and the Apache team finally
decided to combine them. Apache Versions 1.3.4 and later are
distributed with the configuration directives in a single file,
httpd.conf.
Therefore, whenever we mention a configuration file, we are referring
to httpd.conf.
By default, httpd.conf is
installed in the
conf directory under the server root directory.
The default server root is /usr/local/apache/ on
many Unix platforms, but it can be any directory of your choice
(within reason). Users new to Apache and mod_perl will probably find
it helpful to keep to the directory layouts we use in this book.
There is also a special file called
.htaccess, used for per-directory
configuration. When Apache tries to access a file on the filesystem,
it will first search for .htaccess files in the
requested file's parent directories. If found,
Apache scans .htaccess for further configuration
directives, which it then applies only to that directory in which the
file was found and its subdirectories. The name
.htaccess is confusing, because it can contain
almost any configuration directives, not just those related to
resource access control. Note that if the following directive is in
httpd.conf:
<Directory />
AllowOverride None
</Directory>
Apache will not look for .htaccess at all unless
AllowOverride is set to a value other than
None in a more specific
<Directory> section.
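As a minimal sketch of that last point, a more specific section can
re-enable .htaccess processing for one subtree while the rest of the
filesystem stays locked down. The directory path and the AuthConfig
override class below are illustrative assumptions, not requirements:

```apache
# Ignore .htaccess files everywhere by default.
<Directory />
    AllowOverride None
</Directory>

# Hypothetical subtree: .htaccess files found here may set
# authentication directives (the AuthConfig group) only.
<Directory /usr/local/apache/htdocs/private>
    AllowOverride AuthConfig
</Directory>
```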
.htaccess can be renamed by using the
AccessFileName directive. The following example
configures Apache to look in the target directory for a file called
.acl instead of .htaccess:
AccessFileName .acl
However, you must also make sure that this file
can't be accessed directly from the Web, or else you
risk exposing your configuration. Apache's default configuration
file does this for .ht* files, but for other files you
need to use:
<Files .acl>
Order Allow,Deny
Deny from all
</Files>
Another often-mentioned file is the startup file, usually named
startup.pl. This file contains Perl code that will be
executed at server startup. We'll discuss the
startup.pl file in greater detail later in this
chapter, in Section 4.3.
Beware of editing httpd.conf
without understanding all the implications.
Modifying the configuration file and adding new directives can
introduce security problems and have performance implications. If you
are going to modify anything, read through the documentation
beforehand. The Apache distribution comes with an extensive
configuration manual. In addition, each section of the distributed
configuration file includes helpful comments explaining how each
directive should be configured and what the default values are.
If you haven't moved Apache's
directories around, the installation program will configure
everything for you. You can just start the server and test it. To
start the server, use the apachectl utility
bundled with the Apache distribution. It resides in the same
directory as httpd, the Apache
server itself. Execute:
panic% /usr/local/apache/bin/apachectl start
Now you can test the server, for example by accessing
http://localhost/ from a browser running on the
same host.
4.1.2. Configuration Directives
A basic setup
requires little configuration.
If you moved any directories after Apache was installed, their new
locations should be updated in httpd.conf. Here are a couple
of examples of other settings you might want to adjust:
You can change the port to which the server is bound by editing the
Port directive. This example sets the port to 8080
(the default for the HTTP protocol is 80):
Port 8080
You might want to change the user and group names under which the
server will run. If Apache is started by the user
root (which is generally the case), the parent
process will continue to run as root, but its
children will run as the user and group specified in the
configuration, thereby avoiding many potential security problems.
This example uses the httpd user and group:
User httpd
Group httpd
Make sure that the user and group httpd already
exist. They can be created using useradd(1) and
groupadd(1) or equivalent utilities.
Many other directives may need to be configured as well. In addition
to directives that take a single value, there are whole sections of
the configuration (such as the <Directory>
and <Location> sections) that apply to only
certain areas of the web space. The httpd.conf
file supplies a few examples, and these will be discussed shortly.
4.1.3. <Directory>, <Location>, and <Files> Sections
Let's discuss the basics of the
<Directory>,
<Location>, and
<Files> sections. Remember that there is
more to know about them than what we list here, and the rest of the
information is available in the Apache documentation. The information
we'll present here is just what is important for
understanding mod_perl configuration.
Apache considers directories and files on the machine it runs on as
resources. A particular behavior can be
specified for each resource; that behavior will apply to every
request for information from that particular resource.
Directives in <Directory>
sections apply to specific directories on the host machine, and those
in <Files>
sections apply only to specific files (actually, groups of files with
names that have something in common).
<Location> sections
apply to specific URIs. Locations are given relative to the document
root, whereas directories are given as absolute paths starting from
the filesystem root (/). For example, in the
default server directory layout where the server root is
/usr/local/apache and the document root is
/usr/local/apache/htdocs, files under the
/usr/local/apache/htdocs/pub directory can be
referred to as:
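A minimal sketch of the two forms, with the section bodies left as
placeholder comments. The first names the resource by URI; the second
names the same files by absolute filesystem path:

```apache
# By URI, relative to the document root:
<Location /pub>
    # directives for requests whose URI starts with /pub
</Location>

# By absolute path on the filesystem:
<Directory /usr/local/apache/htdocs/pub>
    # directives for files under this directory
</Directory>
```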
Exercise caution when using <Location> under
Win32. Filesystems in the Windows family of operating systems are
case-insensitive, whereas <Location> matching is case-sensitive.
In the above example, configuration directives specified for the
location /pub will not be applied when the request URI is
/Pub, even though on Windows both URIs can map to the same
files. When URIs map to existing files, such as
Apache::Registry scripts, it is safer to use the
<Directory> or
<Files> directives, which correctly
canonicalize filenames according to local filesystem semantics.
It is up to you to decide which directories on your host machine are
mapped to which locations. This should be done with care, because the
security of the server may be at stake. In particular, essential
system directories such as /etc/
shouldn't be mapped to locations accessible through
the web server. As a general rule, it might be best to organize
everything accessed from the Web under your
ServerRoot, so that it stays organized and you
can keep track of which directories are actually accessible.
Locations do not necessarily have to refer to existing physical
directories, but may refer to virtual resources that the server
creates upon a browser request. As you will see, this is often the
case for a mod_perl server.
When a client (browser)
requests a resource (URI plus optional
arguments) from the server, Apache determines from its configuration
whether or not to serve the request, whether to pass the request on
to another server, what (if any) authentication and authorization is
required for access to the resource, and which module(s) should be
invoked to generate the response.
For any given resource, the various sections in the configuration may
provide conflicting information. Consider, for example, a
<Directory> section that specifies that
authorization is required for access to the resource, and a
<Files> section that says that it is not. It
is not always obvious which directive takes precedence in such cases.
This can be a trap for the unwary.
4.1.3.1. <Directory directoryPath> ... </Directory>
Scope: Can appear in server and virtual host
configurations.
<Directory> and
</Directory> are used to enclose a group of
directives that will apply to only the named directory and its
contents, including any subdirectories. Any directive that is allowed
in a directory context (see the Apache documentation) may be used.
The path given in the <Directory> directive
is either the full path to a directory, or a string containing
wildcard characters (also called globs). In the
latter case, ? matches any single character,
* matches any sequence of characters, and
[ ] matches character ranges. These are similar to
the wildcards used by sh and similar shells. For
example:
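The section below, sketched with a placeholder Options directive as
its body (an assumption made only for illustration):

```apache
<Directory /home/httpd/docs>
    Options Indexes
</Directory>
```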
matches /home/httpd/docs and applies to all its
subdirectories.
Matching a regular expression is done by using the
<DirectoryMatch regex> ...
</DirectoryMatch> or <Directory
~ regex> ... </Directory> syntax. For example:
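The section below is a sketch whose pattern is inferred from the
description that follows it; the body is left as a placeholder
comment:

```apache
<Directory ~ "^/home/www/.*/public">
    # directives for every matching public directory
</Directory>
```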
will match /home/www/foo/public but not
/home/www/foo/private. In a regular expression,
.* matches any character (represented by
.) zero or more times (represented by
*). This is entirely different from the
shell-style wildcards used by the plain
<Directory> directive. Regular expressions make it easy to
apply a common configuration to a set of public directories, and
because they are more flexible than globs, this method
provides more options to the experienced user.
If multiple (non-regular expression)
<Directory> sections match the directory (or
its parents) containing a document, the directives are applied in the
order of the shortest match first, interspersed with the directives
from any .htaccess files. Consider the following
configuration:
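A configuration consistent with the steps described next would look
like this minimal sketch:

```apache
<Directory />
    AllowOverride None
</Directory>

<Directory /home/httpd/docs/>
    AllowOverride FileInfo
</Directory>
```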
Let us detail the steps Apache goes through when it receives a
request for the file
/home/httpd/docs/index.html:
1. Apply the directive AllowOverride None (disabling
   .htaccess files).
2. Apply the directive AllowOverride FileInfo for the
   directory /home/httpd/docs/ (which now enables
   .htaccess in /home/httpd/docs/ and its subdirectories).
3. Apply any directives in the group FileInfo, which
   control document types (AddEncoding,
   AddLanguage, AddType, etc.; see the Apache documentation
   for more information), found
   in /home/httpd/docs/.htaccess.
4.1.3.2. <Files filename > ... </Files>
Scope: Can appear in server and virtual host
configurations, as well as in .htaccess files.
The <Files> directive provides access control by
filename and is comparable to the
<Directory> and
<Location> directives.
<Files> should be closed with the
corresponding </Files>. The directives specified
within this section will be applied to any object with a basename
matching the specified filename. (A basename is the last component of
a path, generally the name of the file.)
<Files> sections are processed in the order
in which they appear in the configuration file, after the
<Directory> sections and
.htaccess files are read, but before
<Location> sections. Note that
<Files> can be nested inside
<Directory> sections to restrict the portion
of the filesystem to which they apply. However,
<Files> cannot be nested inside
<Location> sections.
The filename argument should include a filename or a wildcard string,
where ? matches any single character and
* matches any sequence of characters, just as with
<Directory> sections. Extended regular
expressions can also be used, placing a tilde character
(~) between the directive and the regular
expression. The regular expression should be in quotes. The dollar
symbol ($) refers to the end of the string. The
pipe character (|) indicates alternatives, and
parentheses (()) can be used for grouping. Special
characters in extended regular expressions must be escaped with
backslashes (\). For example:
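The section below is a sketch of such a pattern. The SetHandler line
is an illustrative assumption about what one might do with the
matched files on a mod_perl server:

```apache
<Files ~ "\.(pl|cgi)$">
    SetHandler perl-script
</Files>
```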
would match all the files ending with the .pl or
.cgi extension (most likely Perl scripts).
Alternatively, the <FilesMatch regex> ...
</FilesMatch> syntax can be used.
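As a sketch of that alternative, a section matching files that end in
.pl or .cgi can be written with <FilesMatch>, which takes the regular
expression directly, with no tilde. The SetHandler body is an
illustrative assumption:

```apache
<FilesMatch "\.(pl|cgi)$">
    SetHandler perl-script
</FilesMatch>
```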
Regular Expressions
There is much more to regular expressions than what we have shown you
here. If you are a Perl programmer, learning to use regular
expressions is very important, and what you learn will be applicable
to your Apache configuration too.
See the perlretut manpage and the book
Mastering Regular Expressions by Jeffrey E. F.
Friedl (O'Reilly) for more information.
4.1.3.3. <Location URI> ... </Location>
Scope: Can appear in server and virtual host
configurations.
The <Location> directive
provides for directive scope limitation by URI. It is similar to the
<Directory> directive and starts a section
that is terminated with the </Location>
directive.
<Location> sections are processed in the
order in which they appear in the configuration file, after the
<Directory> sections,
.htaccess files, and
<Files> sections have been interpreted.
The <Location> section is the directive that
is used most often with mod_perl.
Note that URIs do not have to refer to real directories or files
within the filesystem at all; <Location>
operates completely outside the filesystem. Indeed, it may sometimes
be wise to ensure that <Location>s do not
match real paths, to avoid confusion.
The URI may use wildcards. In a wildcard string, ?
matches any single character, * matches any
sequence of characters, and [ ] groups characters
to match. For regular expression matches, use the
<LocationMatch regex> ...
</LocationMatch> syntax.
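A minimal sketch of a regular-expression match on URIs follows. Both
URIs in the pattern are hypothetical, and the example assumes that
mod_status is compiled into the server:

```apache
# Applies to requests for exactly /status or /server-status.
<LocationMatch "^/(status|server-status)$">
    SetHandler server-status
</LocationMatch>
```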
The <Location> functionality is especially
useful when combined with the SetHandler
directive. For example, to enable server status requests (via
mod_status) but allow them only from browsers at
*.example.com, you might use:
<Location /status>
SetHandler server-status
Order Deny,Allow
Deny from all
Allow from .example.com
</Location>
As you can see, the /status path does not exist
on the filesystem, but that doesn't matter because
the filesystem isn't consulted for this
request—it's passed on directly to mod_status.
4.1.4. Merging <Directory>, <Location>, and <Files> Sections
When configuring the server,
it's important to understand the order in which the
rules of each section are applied to requests. The order of merging
is:
1. <Directory> (except for regular expressions)
   and .htaccess are processed simultaneously, with
   the directives in .htaccess overriding
   <Directory>.
2. <DirectoryMatch> and <Directory
   ~ > with regular expressions are processed next.
3. <Files> and
   <FilesMatch> are processed simultaneously.
4. <Location> and
   <LocationMatch> are processed
   simultaneously.
Apart from <Directory>, each group is
processed in the order in which it appears in the configuration
files. <Directory>s (group 1 above) are
processed in order from the shortest directory component to the
longest (e.g., first / and only then
/home/www). If multiple
<Directory> sections apply to the same
directory, they are processed in the configuration file order.
Sections inside <VirtualHost> sections are
applied as if you were running several independent servers. The
directives inside one <VirtualHost> section
do not interact with directives in other
<VirtualHost> sections. They are applied
only after processing any sections outside the virtual host
definition. This allows virtual host configurations to override the
main server configuration.
If there is a conflict, sections found later in the configuration
file override those that come earlier.
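The sketch below, adapted from the kind of example the Apache
documentation uses to illustrate section merging, shows how the group
order can surprise you. The host name badguy.example.com is a
placeholder. Because <Location> sections are merged after
<Directory> sections, the <Location> section prevails here even
though it appears first in the file:

```apache
<Location />
    Order Deny,Allow
    Deny from all
</Location>

# This section has no effect on access control: it is merged
# before the <Location> section above, which then overrides it.
<Directory />
    Order Allow,Deny
    Allow from all
    Deny from badguy.example.com
</Directory>
```

File order decides only between sections that belong to the same
group.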
4.1.5. Subgrouping of <Directory>, <Location>, and <Files> Sections
Let's say that you want all files to be
handled the same way, except for a few of the files in a specific
directory and its subdirectories. For example, say you want all the
files in /home/httpd/docs to be processed as
plain files, but any files ending with .html or
.txt to be processed by the content handler of
the Apache::Compress module (assuming that you are
already running a mod_perl server):
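A configuration along the following lines would do it. This is a
sketch: the SetHandler line reflects our assumption that the
perl-script handler is not already in effect for these files.

```apache
<Directory /home/httpd/docs>
    <FilesMatch "\.(html|txt)$">
        SetHandler perl-script
        PerlHandler +Apache::Compress
    </FilesMatch>
</Directory>
```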
The + before Apache::Compress
tells mod_perl to load the Apache::Compress module
before using it, as we will see later.
Using <FilesMatch>,
it is possible to embed sections inside other sections to create
subgroups that have their own distinct behavior. Alternatively, you
could also use a <Files> section inside an
.htaccess file.
Note that you can't put
<Files> or
<FilesMatch> sections inside a
<Location> section, but you can put them
inside a <Directory> section.
4.1.6. Options Directive Merging
Normally, if multiple Options
directives apply to a directory, the most specific one is taken
completely; the options are not merged.
However, if all the options on the Options
directive are preceded by either a + or
- symbol, the options are merged. Any options
preceded by + are added to the options currently
active, and any options preceded by - are removed.
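For example, consider this sketch of two nested directories
configured without any + or - symbols (the option lists are inferred
from the outcome described next):

```apache
<Directory /home/httpd/docs>
    Options Indexes FollowSymLinks
</Directory>

<Directory /home/httpd/docs/shtml>
    Options Includes
</Directory>
```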
Indexes and FollowSymLinks will
be set for /home/httpd/docs/, but only
Includes will be set for the
/home/httpd/docs/shtml/ directory. However, if
the second Options directive uses the
+ and - symbols:
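as in this sketch, whose second option list is inferred from the
result stated next:

```apache
<Directory /home/httpd/docs>
    Options Indexes FollowSymLinks
</Directory>

<Directory /home/httpd/docs/shtml>
    Options +Includes -Indexes
</Directory>
```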
then the options FollowSymLinks and
Includes will be set for the
/home/httpd/docs/shtml/ directory.
4.1.7. MinSpareServers, MaxSpareServers, StartServers, MaxClients, and MaxRequestsPerChild
MinSpareServers, MaxSpareServers, StartServers, and
MaxClients are standard Apache configuration
directives that control the number of servers being launched at
server startup and kept alive during the server's
operation. When Apache starts, it spawns
StartServers child processes. Apache makes sure
that at any given time there will be at least
MinSpareServers but no more than
MaxSpareServers idle servers. However, the
MinSpareServers rule is completely satisfied only
if the total number of live servers is no bigger than
MaxClients.
MaxRequestsPerChild lets you specify the maximum
number of requests to be served by each child. When a process has
served MaxRequestsPerChild requests, the parent
kills it and replaces it with a new one. There may also be other
reasons why a child is killed, so each child will not necessarily
serve this many requests; however, each child will not be allowed to
serve more than this number of requests. This feature is handy to
gain more control of the server, and especially to avoid child
processes growing too big (RAM-wise) under mod_perl.
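As a rough sketch, the five directives might appear together in
httpd.conf as follows. The numbers are placeholders chosen only to
show the syntax; they are not tuning recommendations:

```apache
# Children spawned at startup.
StartServers 5
# Keep at least this many idle children available.
MinSpareServers 5
# Kill idle children beyond this count.
MaxSpareServers 10
# Upper limit on the number of simultaneous children.
MaxClients 50
# Replace a child after it has served this many requests.
MaxRequestsPerChild 1000
```

Note that each comment sits on its own line: Apache does not allow a
comment on the same line as a directive.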
These five directives are very important for getting the best
performance out of your server. The process of tuning these variables
is described in great detail in Chapter 11.