Now that you know how CGI works, let's talk about
how Apache implements mod_cgi. This is important because it will help
you understand the limitations of mod_cgi and why mod_perl is such a
big improvement. This discussion will also build a foundation for the
rest of the performance chapters of this book.
1.2.1. Forking
Apache 1.3 on all Unix flavors uses the forking model.[8] When you
start the server, a single process, called the parent process, is
started. Its main responsibility is starting and killing child
processes as needed. Various Apache configuration directives let you
control how many child processes are spawned initially, the number of
spare idle processes, and the maximum number of processes the parent
process is allowed to fork.
[8] In Chapter 24 we talk about Apache 2.0, which introduces a few
more server models.
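The following minimal Python sketch makes the parent's bookkeeping concrete under a simplified model. The fields mirror the Apache 1.3 directives StartServers, MinSpareServers, MaxSpareServers, and MaxClients. The class and method names are hypothetical, and no real processes are forked. The real server also grows and shrinks the pool more gradually than this sketch, which jumps straight to the target.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical model of the parent's pool bookkeeping. The field names mirror
# Apache 1.3 directives, but this is not Apache's code and forks nothing.
@dataclass
class PoolConfig:
    start_servers: int = 5       # StartServers: children forked at startup
    min_spare_servers: int = 5   # MinSpareServers: fork if fewer are idle
    max_spare_servers: int = 10  # MaxSpareServers: kill if more are idle
    max_clients: int = 150       # MaxClients: hard cap on child processes


@dataclass
class ParentProcess:
    config: PoolConfig
    busy: int = 0  # children currently serving a request
    idle: int = 0  # children waiting for a request

    def start(self) -> None:
        # At startup the parent forks StartServers children, all idle.
        self.idle = min(self.config.start_servers, self.config.max_clients)

    def total(self) -> int:
        return self.busy + self.idle

    def maintain(self) -> tuple[int, int]:
        """Fork or kill children so the idle count stays within bounds.

        Returns (forked, killed) for this maintenance pass.
        """
        forked = killed = 0
        if self.idle < self.config.min_spare_servers:
            wanted = self.config.min_spare_servers - self.idle
            room = self.config.max_clients - self.total()
            forked = max(0, min(wanted, room))  # never exceed MaxClients
            self.idle += forked
        elif self.idle > self.config.max_spare_servers:
            killed = self.idle - self.config.max_spare_servers
            self.idle -= killed
        return forked, killed
```

Here is a quick example. After `start()`, `ParentProcess(PoolConfig(max_clients=12))` has 5 idle children. If four of them become busy, `maintain()` forks four more and returns `(4, 0)`.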
Each child process has its own lifespan, which is controlled by the
configuration directive MaxRequestsPerChild. This directive specifies
the number of requests a child should serve before it is instructed to
step down and is replaced by a new process. Figure 1-3 illustrates
this lifecycle.
Figure 1-3. The Apache 1.3 server lifecycle
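The lifecycle governed by MaxRequestsPerChild can be sketched as a small simulation under stated assumptions. The `Pool` and `Child` classes and the fake PIDs are hypothetical, and a replacement child is modeled simply as a new record. As with the real directive, a limit of 0 means children never retire.

```python
import itertools
from typing import List


class Child:
    """Hypothetical stand-in for a child process: a fake PID and a counter."""

    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.served = 0


class Pool:
    def __init__(self, size: int, max_requests_per_child: int) -> None:
        self._pids = itertools.count(1000)  # fake, increasing PIDs
        self.max_requests = max_requests_per_child
        self.children = [self._fork() for _ in range(size)]
        self.retired: List[int] = []  # PIDs of children that stepped down

    def _fork(self) -> Child:
        return Child(next(self._pids))

    def serve(self, slot: int) -> int:
        """Let the child in `slot` serve one request; return its PID."""
        child = self.children[slot]
        child.served += 1
        pid = child.pid
        # A limit of 0 means "never retire", as with the Apache directive.
        if self.max_requests and child.served >= self.max_requests:
            self.retired.append(pid)
            self.children[slot] = self._fork()  # replacement process
        return pid
```

With a limit of 3, seven requests on one slot are served by three successive children: 1000 three times, 1001 three times, then 1002.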
When a client initiates a request, the parent process checks whether
there is an idle child process and, if so, tells it to handle the
request. If there are no idle processes, the parent checks whether it
is allowed to fork more processes. If it is, a new process is forked
to handle the request. Otherwise, the incoming request is queued until
a child process becomes available to handle it.
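The dispatch rules in the previous paragraph amount to a three-way decision: use an idle child first, fork a new one if the limit allows, and otherwise queue the request. The sketch below models them exactly as described here. The `Dispatcher` class and its counters are hypothetical and represent the parent's view of the pool, not Apache's internals.

```python
from collections import deque
from typing import Deque, Optional


class Dispatcher:
    """Hypothetical model of the request-dispatch rules described above."""

    def __init__(self, idle: int, max_clients: int) -> None:
        self.idle = idle                  # children waiting for work
        self.busy = 0                     # children serving a request
        self.max_clients = max_clients    # upper bound on child processes
        self.queue: Deque[str] = deque()  # requests waiting for a child

    def on_request(self, request: str) -> str:
        if self.idle > 0:
            # Hand the request to an existing idle child.
            self.idle -= 1
            self.busy += 1
            return "idle child"
        if self.busy + self.idle < self.max_clients:
            # Allowed to grow: fork a new child for this request.
            self.busy += 1
            return "forked"
        # At the limit: park the request until a child frees up.
        self.queue.append(request)
        return "queued"

    def on_request_done(self) -> Optional[str]:
        """A child finished; give it the oldest queued request, if any."""
        if self.queue:
            return self.queue.popleft()  # the child stays busy
        self.busy -= 1
        self.idle += 1
        return None
```

With one idle child and a limit of two, three simultaneous requests are handled as "idle child", "forked", and "queued". The queued one is picked up by whichever child finishes first.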
The maximum number of queued requests is set by the ListenBacklog
configuration directive. When this limit is reached, new connection
attempts are rejected or dropped by the operating system, and the
client reports that the server is unreachable.
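The effect of ListenBacklog can be modeled as a bounded queue. In this hypothetical sketch, the `ListenQueue` class stands in for the operating system's queue of pending connections. In practice the kernel enforces the limit and may treat the configured value only approximately.

```python
from collections import deque
from typing import Deque


class ListenQueue:
    """Hypothetical model of a fixed-size listen backlog."""

    def __init__(self, backlog: int) -> None:
        self.backlog = backlog
        self.pending: Deque[str] = deque()

    def connect(self, client: str) -> bool:
        """Return True if the connection is queued, False if it is rejected."""
        if len(self.pending) >= self.backlog:
            return False  # backlog full: the client cannot get through
        self.pending.append(client)
        return True

    def accept(self) -> str:
        # A free child takes the oldest pending connection.
        return self.pending.popleft()
```

With a backlog of 2, a third waiting client is rejected. A new client gets in again as soon as a child accepts one of the pending connections.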
This is how requests for static objects, such as HTML documents and
images, are processed. When a CGI request is received, an additional
step is performed: mod_cgi in the child Apache process forks a new
process to execute the CGI script. When the script has completed
processing the request, the forked process exits.
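The extra per-request process can be demonstrated with Python's `subprocess` module. In this sketch, a short Python program stands in for the Perl CGI script, and `run_cgi` is a hypothetical helper. It passes the request through the standard CGI variables `REQUEST_METHOD` and `QUERY_STRING` and starts a fresh interpreter. It then collects that interpreter's output and returns only after the process has exited.

```python
import os
import subprocess
import sys

# Stand-in "CGI script". Under mod_cgi this would typically be a Perl file;
# here a Python program plays that role.
CGI_SCRIPT = r'''
import os
print("Content-Type: text/plain")
print()
print("pid=%d query=%s" % (os.getpid(), os.environ.get("QUERY_STRING", "")))
'''


def run_cgi(query_string: str) -> str:
    """Start a fresh interpreter for one request, wait for it, return its output."""
    env = {**os.environ, "REQUEST_METHOD": "GET", "QUERY_STRING": query_string}
    result = subprocess.run(
        [sys.executable, "-c", CGI_SCRIPT],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # subprocess.run waits, so the script's process has already exited here.
    return result.stdout
```

Calling `run_cgi("name=apache")` returns a header block followed by a body. The PID in the body belongs to a process that no longer exists by the time the call returns.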
1.2.2. CGI Scripts Under the Forking Model
One of the benefits of this model is that if something causes the
child process to die (e.g., a badly written CGI script), it won't bring
down the whole service. In fact, only the client that initiated the
request will notice there was a problem.
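Fault isolation follows directly from running each script in its own process. In this sketch, `handle` and the `SCRIPTS` table are hypothetical. A crashing script becomes an error status for its own request only, and the calling process, which plays the part of the Apache child, keeps serving.

```python
import subprocess
import sys

# Hypothetical scripts keyed by URL path; "/broken" crashes on purpose.
SCRIPTS = {
    "/ok": 'print("hello")',
    "/broken": "1 / 0",  # raises ZeroDivisionError, so the process exits non-zero
}


def handle(path: str) -> str:
    """Run the script for `path` in its own process and map the result to a status."""
    proc = subprocess.run(
        [sys.executable, "-c", SCRIPTS[path]],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # Only this request fails; the caller keeps serving others.
        return "500 Internal Server Error"
    return "200 OK"
```

Serving `/ok`, `/broken`, and `/ok` in sequence yields a 200, a 500, and another 200. The crash never reaches the caller.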
Many free (and non-free) CGI scripts are badly written, but they still
work, which is why nobody bothers to improve them. Examples of poor
CGI programming practices include forgetting to close open files,
using uninitialized global variables, ignoring the warnings Perl
generates, and forgetting to turn on taint checks (thus creating huge
security holes that crackers happily exploit to break into online
systems).
Why do these sloppily written scripts work under mod_cgi? The reason
lies in the way mod_cgi invokes them: every time a Perl CGI script is
run, a new process is forked, and a new Perl interpreter is loaded.
This Perl interpreter lives only for the duration of the request, and
when the script exits (no matter how), the process and the interpreter
exit as well, cleaning up on
the way. When a new interpreter is started, it has no history of
previous requests. All the variables are created from scratch, and
all the files are reopened if needed. Although this detail may seem
obvious, it will be of paramount importance when we discuss mod_perl.
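The "no history" property is easy to observe. In the sketch below, a script that increments a global counter runs once per request in a fresh interpreter, CGI-style, and always reports 1. The same logic kept in a long-lived process accumulates state across requests, which is the behavior a persistent environment such as mod_perl exposes. The function names are hypothetical, and Python again stands in for Perl.

```python
import subprocess
import sys
from typing import List

# A script that keeps a global hit counter, a common sloppy pattern.
SCRIPT = "hits = 0\nhits += 1\nprint(hits)"


def cgi_style(requests: int) -> List[int]:
    """Run the script in a brand-new interpreter for every request."""
    results = []
    for _ in range(requests):
        out = subprocess.run(
            [sys.executable, "-c", SCRIPT],
            capture_output=True,
            text=True,
            check=True,
        ).stdout
        results.append(int(out))  # the counter was rebuilt from scratch
    return results


_hits = 0  # lives as long as this (long-running) process


def persistent_style() -> int:
    """Handle one request inside the same long-lived interpreter."""
    global _hits
    _hits += 1  # state carries over from earlier requests
    return _hits
```

Three CGI-style requests report `[1, 1, 1]`, while three persistent requests report `[1, 2, 3]`.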
1.2.3. Performance Drawbacks of Forking
There are several drawbacks to mod_cgi that triggered the development
of improved web technologies. The first problem is that a new process
is forked and a new Perl interpreter is loaded for each CGI script
invocation. This has several implications:
- It adds the overhead of forking, although this is almost
  insignificant on modern Unix systems.
- Loading the Perl interpreter adds significant overhead to server
  response times.
- The script's source code and the modules that it uses need to be
  loaded into memory and compiled each time from scratch. This adds
  even more overhead to response times.
- Process termination on the script's completion makes it impossible
  to create persistent variables, which in turn prevents the
  establishment of persistent database connections and in-memory
  databases.
- Starting a new interpreter removes the benefit of memory sharing
  that could be obtained by preloading code modules at server startup.
  Also, database connections can't be pre-opened at server startup.
Another drawback is limited functionality: mod_cgi allows developers
to write only content handlers within CGI scripts. If you need to
access the much broader core functionality Apache provides, such as
authentication or URL rewriting, you must resort to third-party Apache
modules written in C, which can make the production server environment
cumbersome to manage. More components require more administration work
to keep the server in a healthy state.
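The difference in reach can be pictured as a request passing through several phases. The sketch below is a simplified, hypothetical model. The `Server` class and the three phase names are illustrative, Apache's real request cycle has more phases, and none of this is Apache's or mod_perl's API. A CGI-only setup can contribute a handler to the response phase alone, whereas URL rewriting and authentication need hooks in earlier phases.

```python
from typing import Callable, Dict, List, Optional

Request = Dict[str, str]
Handler = Callable[[Request], Optional[str]]

# Illustrative subset of request phases; a real server defines more.
PHASES = ["uri_translation", "authentication", "response"]


class Server:
    def __init__(self) -> None:
        self.handlers: Dict[str, List[Handler]] = {p: [] for p in PHASES}

    def add_handler(self, phase: str, handler: Handler) -> None:
        self.handlers[phase].append(handler)

    def handle(self, request: Request) -> str:
        for phase in PHASES:
            for handler in self.handlers[phase]:
                result = handler(request)
                if result is not None:  # a handler may finish the request
                    return result
        return "404 Not Found"


def rewrite(req: Request) -> Optional[str]:
    req["uri"] = req["uri"].replace("/old/", "/new/")  # URL rewriting
    return None  # continue with the next phase


def auth(req: Request) -> Optional[str]:
    return None if req.get("user") == "alice" else "401 Unauthorized"


def content(req: Request) -> Optional[str]:
    return "200 OK " + req["uri"]  # the only kind of handler CGI provides


cgi_only = Server()
cgi_only.add_handler("response", content)

full = Server()
full.add_handler("uri_translation", rewrite)
full.add_handler("authentication", auth)
full.add_handler("response", content)
```

The `cgi_only` server can generate content but cannot rewrite URLs or refuse unauthenticated clients before the content phase. The `full` server does both.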