To understand mod_perl, you should understand how
request processing works within Apache. When Apache receives a
request, it processes it in 11 phases. For every phase, a standard
default handler is supplied by Apache. You can also write your own
Perl handlers for each phase; they will override or extend the
default behavior. The 11 phases (illustrated in Figure 1-4) are:
Figure 1-4. Apache 1.3 request processing phases
Post-read-request
This phase occurs when the server has
read the request line and parsed the HTTP headers of the
incoming request (the request body, if any, has not been read yet).
This stage is typically used for work that
should be done once per request, as early as possible.
Module authors usually use this phase to
initialize per-request data to be used in subsequent phases.
URI translation
In this phase, the requested URI is
translated to the name of a physical file or the name of a virtual
document that will be created on the fly. Apache performs the
translation based on configuration directives such as
ScriptAlias. This translation can be completely
modified by modules such as mod_rewrite, which
register themselves with Apache to be invoked in this phase of the
request processing.
Header parsing
During this phase, you can examine and
modify the request headers and take special action if
needed, such as blocking unwanted agents as early as possible.
Access control
This phase allows the server owner to
restrict access to specific resources based on various rules, such as
the client's IP address or the day of the week.
Authentication
Sometimes
you want to make sure that a user
really is who he claims to be. To verify his identity, challenge him
with a question that only he can answer. Generally, the question is a
login name and password, but it can be any other challenge that
allows you to distinguish between users.
Authorization
The service might have various restricted
areas, and you might want to allow each user to access only some of these
areas. Once a user has passed the authentication process, it is easy
to check whether that user may access a specific location.
MIME type checking
Apache handles requests for different types
of files in different ways. For static HTML files, the content is
simply sent directly to the client from the filesystem. For CGI
scripts, the processing is done by mod_cgi, while for mod_perl
programs, the processing is done by mod_perl and the appropriate Perl
handler. During this phase, Apache decides which method
to use, basing its choice on various things such as configuration
directives, the filename's extension, or an analysis
of its content. When the choice has been made, Apache selects the
appropriate content handler, which will be used in the next phase.
Fixup
This phase is provided to allow
last-minute adjustments to the environment and the request record
before the actual work in the content handler starts.
Response
This is the phase where most of the work
happens. First, the handler that generates the response (a content
handler) sends a set of HTTP headers to the client. These headers
include the Content-Type header, which is either
picked by the MIME-type-checking phase or provided dynamically by a
program. Then the actual content is generated and sent to the client.
The content generation might entail reading a simple file (in the
case of static files) or performing a complex database query and
HTML-ifying the results (in the case of the dynamic content that
mod_perl handlers provide).
This is where mod_cgi, Apache::Registry, and other
content handlers run.
Logging
By default, a single line describing
every request is logged into a flat file. Using the configuration
directives, you can specify which bits of information should be
logged and where. This phase lets you hook in custom logging
handlers, for example to log into a relational database or to
send log information to a dedicated master machine that collects
the logs from many different hosts.
Cleanup
At the end of each request, the
modules that participated in one or more previous phases are allowed
to perform various cleanups, such as ensuring that the resources that
were locked but not freed are released (e.g., a process aborted by a
user who pressed the Stop button), deleting temporary files, and so
on.
Each module registers its cleanup code, either in its source code or
as a separate configuration entry.
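The header parsing, access control, authentication, and authorization phases described above can be sketched as plain functions that each inspect a request and return a status. This is a conceptual model in Python, not the Apache or mod_perl API: the status constants, the dictionary standing in for the request record, the blocked agent name, the trusted network 10.0.0.0/16, the closed-on-Sunday rule, and the users and /admin/ area are all hypothetical. Each function returns OK, DECLINED ("nothing to do here"), or an HTTP error code, and run_gatekeepers stops at the first error, mirroring how a failing phase aborts the request.

```python
import base64
import ipaddress
from datetime import date

# Hypothetical status codes, modeled on Apache's conventions
OK, DECLINED = 0, -1
HTTP_UNAUTHORIZED, HTTP_FORBIDDEN = 401, 403

# Hypothetical site policy
BLOCKED_AGENTS = ("BadBot",)
TRUSTED_NETWORK = ipaddress.ip_network("10.0.0.0/16")
CLOSED_WEEKDAY = 6                                   # date.weekday(): 6 is Sunday
USERS = {"alice": "s3cret", "bob": "hunter2"}        # login name -> password
AREA_MEMBERS = {"/admin/": {"alice"}}                # URI prefix -> allowed users


def header_parsing(r):
    """Reject unwanted user agents as early as possible."""
    agent = r["headers"].get("User-Agent", "")
    if any(bad in agent for bad in BLOCKED_AGENTS):
        return HTTP_FORBIDDEN
    return DECLINED                                  # nothing to do here


def access_control(r):
    """Restrict access by client IP address and day of the week."""
    if ipaddress.ip_address(r["remote_ip"]) not in TRUSTED_NETWORK:
        return HTTP_FORBIDDEN
    if r["date"].weekday() == CLOSED_WEEKDAY:
        return HTTP_FORBIDDEN
    return OK


def authentication(r):
    """Verify the login name and password sent with HTTP Basic auth."""
    auth = r["headers"].get("Authorization", "")
    if not auth.startswith("Basic "):
        return HTTP_UNAUTHORIZED
    try:
        decoded = base64.b64decode(auth[len("Basic "):], validate=True).decode()
    except ValueError:                               # bad base64 or bad UTF-8
        return HTTP_UNAUTHORIZED
    user, _, password = decoded.partition(":")
    if user not in USERS or USERS[user] != password:
        return HTTP_UNAUTHORIZED
    r["user"] = user                                 # used by the next phase
    return OK


def authorization(r):
    """Check that the authenticated user may enter the requested area."""
    for prefix, members in AREA_MEMBERS.items():
        if r["uri"].startswith(prefix) and r.get("user") not in members:
            return HTTP_FORBIDDEN
    return OK


def run_gatekeepers(r):
    """Run the phases in Apache's order and stop at the first error."""
    for phase in (header_parsing, access_control, authentication, authorization):
        status = phase(r)
        if status not in (OK, DECLINED):
            return status
    return OK
```

Note that authentication only establishes who the user is; whether bob may enter /admin/ is decided separately in authorization, which is why the two are distinct phases.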
At almost every phase, if there is an error and the request is
aborted, Apache returns an error code to the client using the default
error handler (or a custom one, if provided).
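The whole request loop can be modeled as a list of phases run in order, with per-request data flowing from one phase to the next. The sketch below is a simplified Python model, not Apache's implementation: the alias map, document root, extension table, fake filesystem, and handler names are hypothetical. Phases without a handler in this sketch simply do nothing, standing in for Apache's defaults; an error status aborts the remaining phases and produces an error document, while logging and cleanup still run, so a failed request is logged too.

```python
import posixpath

# Hypothetical status codes, modeled on Apache's conventions
OK, DECLINED = 0, -1
HTTP_NOT_FOUND = 404

# The 11 Apache 1.3 request phases, in order
PHASES = ["post-read-request", "uri-translation", "header-parsing",
          "access-control", "authentication", "authorization",
          "mime-type-checking", "fixup", "response", "logging", "cleanup"]

# Hypothetical configuration and a fake filesystem
ALIASES = {"/perl/": "/srv/perl/"}                # like an Alias/ScriptAlias map
DOCUMENT_ROOT = "/srv/htdocs"
TYPES = {".html": ("text/html", "default-handler"),
         ".pl": ("text/html", "perl-script")}     # extension -> (type, handler)
FILES = {"/srv/htdocs/index.html": "<h1>Hello</h1>"}


def post_read_request(r):
    r["notes"] = {}                               # per-request data for later phases
    return OK


def uri_translation(r):
    for prefix, directory in ALIASES.items():
        if r["uri"].startswith(prefix):
            r["filename"] = directory + r["uri"][len(prefix):]
            return OK
    r["filename"] = DOCUMENT_ROOT + r["uri"]
    return OK


def mime_type_checking(r):
    ext = posixpath.splitext(r["filename"])[1]
    r["content_type"], r["handler"] = TYPES.get(ext, ("text/plain", "default-handler"))
    return OK


def response(r):
    if r["handler"] == "default-handler":         # static file: send it as-is
        if r["filename"] not in FILES:
            return HTTP_NOT_FOUND
        r["body"] = FILES[r["filename"]]
    else:                                         # dynamic content
        r["body"] = "generated by " + r["handler"]
    return OK


def log_request(r):
    r["log_line"] = f'"GET {r["uri"]}" {r["status"]}'
    return OK


def cleanup(r):
    r.pop("notes", None)                          # release per-request resources
    return OK


HANDLERS = {"post-read-request": post_read_request,
            "uri-translation": uri_translation,
            "mime-type-checking": mime_type_checking,
            "response": response,
            "logging": log_request,
            "cleanup": cleanup}


def process_request(uri):
    r = {"uri": uri, "status": 200}
    for phase in PHASES[:PHASES.index("logging")]:
        handler = HANDLERS.get(phase)
        if handler is None:                       # stand-in for Apache's default
            continue
        status = handler(r)
        if status not in (OK, DECLINED):          # abort: build an error document
            r["status"] = status
            r["body"] = f"Error {status}"
            break
    for phase in ("logging", "cleanup"):          # these run even after an abort
        HANDLERS[phase](r)
    return r
```

For example, process_request("/missing.html") stops at the response phase with status 404, yet its log line is still written and its per-request notes are still cleaned up.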
1.4.1. Apache 1.3 Modules and the mod_perl 1.0 API
The advantage of breaking up the request
process into phases is that Apache gives a programmer the opportunity
to "hook" into the process at any
of those phases. Apache has been designed with modularity in mind. A
small set of core functions handles the basic tasks of dealing with
the HTTP protocol and managing child processes. Everything else is
handled by modules. The core supplies an easy way to plug modules
into Apache at build time or runtime and enable them at runtime.
Modules for the most common tasks, such as serving directory indexes
or logging requests, are supplied and compiled in by default. mod_cgi
is one such module. Other modules are bundled with the Apache
distribution but are not compiled in by default: this is the case
with more specialized modules such as mod_rewrite or mod_proxy. There
are also a vast number of third-party modules, such as mod_perl, that
can handle a wide variety of tasks. Many of these can be found in the
Apache Module Registry
(http://modules.apache.org/).
Modules take control of request processing at each of the phases
through a set of well-defined hooks provided by Apache. The
subroutine or function in charge of a particular request phase is
called a handler. Examples include the authentication
handler supplied by mod_auth_dbi and the content handler supplied by
mod_cgi. Some modules, such as mod_rewrite, install handlers for more
than one request phase.
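The hook idea can be sketched as a small registry in which each module registers handlers for the phases it cares about. This is a toy Python model, not Apache's C interface; the module names, handlers, and paths are hypothetical. The run_all flag distinguishes two dispatch styles: stop at the first handler that does not decline (so a rewrite module can take over URI translation before the core fallback), or give every registered handler a turn. Which real phase uses which style is not modeled here.

```python
from collections import defaultdict

OK, DECLINED = "OK", "DECLINED"


class HookRegistry:
    """Toy model of per-phase hooks; not Apache's actual C interface."""

    def __init__(self):
        self.hooks = defaultdict(list)            # phase -> [(module, handler), ...]

    def register(self, module, phase, handler):
        self.hooks[phase].append((module, handler))

    def run(self, phase, r, run_all=False):
        """Call handlers in registration order; return the modules that acted.

        run_all=False: stop at the first handler that does not decline.
        run_all=True:  give every registered handler a turn.
        """
        acted = []
        for module, handler in self.hooks[phase]:
            if handler(r) != DECLINED:
                acted.append(module)
                if not run_all:
                    break
        return acted


def rewrite_translate(r):                         # hypothetical rewrite module
    if r["uri"].startswith("/old/"):
        r["filename"] = "/srv/htdocs/new/" + r["uri"][len("/old/"):]
        return OK
    return DECLINED


def rewrite_fixup(r):                             # same module, second phase
    r.setdefault("notes", []).append("rewrite")
    return OK


def core_translate(r):                            # fallback: use the document root
    r["filename"] = "/srv/htdocs" + r["uri"]
    return OK


registry = HookRegistry()
registry.register("rewrite", "uri-translation", rewrite_translate)
registry.register("core", "uri-translation", core_translate)
registry.register("rewrite", "fixup", rewrite_fixup)
```

With this setup, a request for /old/page.html is translated by the rewrite module alone, while /index.html falls through to the core handler; the rewrite module also acts in the fixup phase, so one module spans two phases.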
Apache also provides modules with a comprehensive set of functions
they can call to achieve common tasks, including file I/O, sending
HTTP headers, or parsing URIs. These functions are collectively known
as the Apache Application Programming Interface (API).
Apache is written in C and currently requires that modules be written
in the same language. However, as we will see, mod_perl provides the
full Apache API in Perl, so modules can be written in Perl as well,
although mod_perl must be installed for them to run.
1.4.2. mod_perl 1.0 and the mod_perl API
Like other Apache modules, mod_perl is written in C,
registers handlers for request phases, and uses the Apache API.
However, mod_perl doesn't directly process requests.
Rather, it allows you to write handlers in
Perl.
When the Apache core yields control to mod_perl through one of its
registered handlers, mod_perl dispatches processing to one of the
registered Perl handlers.
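This division of labor can be modeled as one C-level entry point per phase that merely forwards to the Perl handlers configured for that phase. The directive names in the table are mod_perl 1.0's per-phase handler directives; everything else, including the My::Auth::handler name, the Python callables standing in for compiled Perl subroutines, and the simplified dispatch rule, is a hypothetical sketch for illustration only.

```python
# Phase each mod_perl 1.0 handler directive attaches to
DIRECTIVE_PHASE = {
    "PerlPostReadRequestHandler": "post-read-request",
    "PerlTransHandler": "uri-translation",
    "PerlHeaderParserHandler": "header-parsing",
    "PerlAccessHandler": "access-control",
    "PerlAuthenHandler": "authentication",
    "PerlAuthzHandler": "authorization",
    "PerlTypeHandler": "mime-type-checking",
    "PerlFixupHandler": "fixup",
    "PerlHandler": "response",
    "PerlLogHandler": "logging",
    "PerlCleanupHandler": "cleanup",
}


class ModPerlModel:
    """Toy model of mod_perl's role: a C-level hook that forwards to Perl code."""

    def __init__(self, config, perl_subs):
        # config: (directive, handler name) pairs, as if read from httpd.conf
        # perl_subs: handler name -> callable standing in for a compiled Perl sub
        self.by_phase = {}
        for directive, name in config:
            phase = DIRECTIVE_PHASE[directive]
            self.by_phase.setdefault(phase, []).append(perl_subs[name])

    def c_hook(self, phase, r):
        """Called by the Apache core; mod_perl only dispatches to Perl handlers."""
        for sub in self.by_phase.get(phase, []):
            status = sub(r)
            if status != "DECLINED":
                return status
        return "DECLINED"                         # let Apache's default run


def my_auth_handler(r):                           # hypothetical My::Auth::handler
    return "OK" if r.get("user") == "alice" else "AUTH_REQUIRED"


mod_perl = ModPerlModel(
    config=[("PerlAuthenHandler", "My::Auth::handler")],
    perl_subs={"My::Auth::handler": my_auth_handler},
)
```

Here mod_perl.c_hook("authentication", ...) returns whatever the configured Perl handler decided, while a phase with no Perl handler configured, such as fixup, returns "DECLINED" and leaves the work to Apache.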
Since Perl handlers need to perform the same basic tasks as their C
counterparts, mod_perl exposes the Apache API through a mod_perl API,
which is a set of Perl functions and objects. When a Perl handler
calls such a function or method, mod_perl translates it into the
appropriate Apache C function.
Perl handlers extract the last drop of performance from the Apache
server. Unlike mod_cgi and Apache::Registry, they
are not restricted to the content generation phase and can be tied to
any phase in the request loop. You can create your own custom
authentication by writing a PerlAuthenHandler, or
you can write specialized logging code in a
PerlLogHandler.
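As a sketch of the kind of specialized logging a log-phase handler makes possible, the function below records each request in a relational database instead of a flat file. It is written in Python with the standard sqlite3 module purely for illustration; a real PerlLogHandler would typically talk to a database server through Perl's DBI. The table layout, column names, and request fields (remote_ip, uri, status, bytes_sent) are assumptions.

```python
import sqlite3
import time


def make_db_log_handler(conn):
    """Return a logging-phase handler that writes one row per request."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS access_log ("
        "ts INTEGER, remote_ip TEXT, uri TEXT, status INTEGER, bytes INTEGER)"
    )

    def log_handler(r):
        with conn:                                # commit the insert on success
            conn.execute(
                "INSERT INTO access_log VALUES (?, ?, ?, ?, ?)",
                (int(time.time()), r["remote_ip"], r["uri"], r["status"], r["bytes_sent"]),
            )
        return "OK"

    return log_handler


conn = sqlite3.connect(":memory:")                # a real setup would use a shared server
log_handler = make_db_log_handler(conn)
log_handler({"remote_ip": "10.0.3.7", "uri": "/index.html", "status": 200, "bytes_sent": 14})
```

Because the rows land in a table, questions such as "which URIs returned 404 today?" become a single SQL query rather than a scan of a text log.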
Handlers are not compatible with the CGI specification. Instead, they
use the mod_perl API directly for every aspect of request processing.
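The contrast can be sketched in Python as a conceptual model only. The CGI-style function receives its input through environment variables such as QUERY_STRING and writes headers and body to standard output, as the CGI specification prescribes; the handler-style function does everything through methods on a request object. The Request class is hypothetical: its method names are loosely modeled on mod_perl 1.0's $r->args, $r->content_type, and $r->print, and a real mod_perl 1.0 handler would also send the headers explicitly (for example with $r->send_http_header).

```python
import io


def cgi_script(environ, stdout):
    """CGI style: input arrives in environment variables, output is printed."""
    query = environ.get("QUERY_STRING", "")
    stdout.write("Content-Type: text/plain\r\n\r\n")
    stdout.write(f"query={query}\n")


class Request:
    """Hypothetical stand-in for a request object like mod_perl's $r."""

    def __init__(self, args):
        self._args = args
        self.headers_out = {}
        self.body = []

    def args(self):
        return self._args

    def content_type(self, value):
        self.headers_out["Content-Type"] = value

    def print(self, text):
        self.body.append(text)


def handler(r):
    """Handler style: every step goes through the request object's methods."""
    r.content_type("text/plain")
    r.print(f"query={r.args()}\n")
    return 0                                      # stands in for OK


out = io.StringIO()                               # stands in for the CGI's stdout
cgi_script({"QUERY_STRING": "a=1"}, out)
r = Request("a=1")
status = handler(r)
```

Both produce the same response, but only the first could run unchanged under mod_cgi; the second depends entirely on the request object it is handed.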
mod_perl provides access to the Apache API for Perl handlers via an
extensive collection of methods and variables exported by the Apache
core. This includes methods for dealing with the request (such as
retrieving headers or posted content), methods for setting up the
response (such as sending HTTP headers), methods providing access to
configuration information derived from the server's configuration
file, and a slew of other methods providing access to most of
Apache's rich feature set.
Using the mod_perl API is not limited to mod_perl handlers.
Apache::Registry scripts can also call API
methods, at the price of forgoing CGI compatibility.
We suggest that you refer to the book Writing Apache
Modules with Perl and C, by Lincoln Stein and Doug
MacEachern (O'Reilly), if you want to learn more
about API methods.