25.2. New Apache Phases and Corresponding Perl*Handlers
Because the majority of the
Apache phases supported by
mod_perl haven't changed since mod_perl 1.0, in this
chapter we will discuss only those phases and corresponding handlers
that were added or changed in mod_perl 2.0.
Figure 25-1 depicts the Apache 2.0 server cycle. You
can see the mod_perl phases PerlOpenLogsHandler,
PerlPostConfigHandler, and
PerlChildInitHandler, which we will discuss
shortly. Later, we will zoom into the connection cycle depicted in
Figure 25-2, which will expose other mod_perl
handlers.
Figure 25-1. Apache 2.0 server lifecycle
Apache 2.0 starts by parsing the configuration file. After the
configuration file is parsed, any
PerlOpenLogsHandler handlers are executed. After
that, any PerlPostConfigHandler handlers are run.
When the post_config phase is finished, the
server immediately restarts, to make sure that it can survive
graceful restarts after starting to serve the clients.
When the restart is completed, Apache 2.0 spawns the workers that
will do the actual work. Depending on the MPM used, these can be
threads, processes, or a mixture of both. For example, the
worker MPM spawns a number of processes, each
running a number of threads. When each child process is started,
PerlChildInitHandlers are executed. Notice that
they are run once for each child process, not for each thread.
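From that moment on, each worker process (or thread) handles
connections until it is killed by the server or the
server is shut down. When the server is shut down, any registered
PerlChildExitHandlers are executed.
The Book::StartupLog module below implements a handler for each of
these phases and logs every call, so we can watch the order in which
they run: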
package Book::StartupLog;
use strict;
use warnings;
use Apache::Log ();
use Apache::ServerUtil ();
use File::Spec::Functions;
use Apache::Const -compile => 'OK';
my $log_file = catfile "logs", "startup_log";
my $log_fh;
sub open_logs {
    my($conf_pool, $log_pool, $temp_pool, $s) = @_;
    my $log_path = Apache::server_root_relative($conf_pool, $log_file);
    $s->warn("opening the log file: $log_path");
    open $log_fh, ">>$log_path" or die "can't open $log_path: $!";
    # make the log handle unbuffered
    my $oldfh = select($log_fh); $| = 1; select($oldfh);
    say("process $$ is born to reproduce");
    return Apache::OK;
}
sub post_config {
    my($conf_pool, $log_pool, $temp_pool, $s) = @_;
    say("configuration is completed");
    return Apache::OK;
}
sub child_exit {
    my($child_pool, $s) = @_;
    say("process $$ now exits");
    return Apache::OK;
}
sub child_init {
    my($child_pool, $s) = @_;
    say("process $$ is born to serve");
    return Apache::OK;
}
sub say {
    # name of the calling sub, without its package prefix
    my($caller) = (caller(1))[3] =~ /([^:]+)$/;
    if (defined $log_fh) {
        printf $log_fh "[%s] - %-11s: %s\n",
            scalar(localtime), $caller, $_[0];
    }
    else {
        # when the log file is not open
        warn __PACKAGE__ . " says: $_[0]\n";
    }
}
END {
    # say() already appends the newline
    say("process $$ is shutdown");
}
1;
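For the module to do anything, its handlers have to be registered in httpd.conf. A main-server configuration along the following lines should work; it uses the four directives discussed in this section. Preloading the module with PerlModule is a choice of this sketch, and StartServers 4 matches the sample run shown next.

```apache
# Load Book::StartupLog at startup and register its four server-phase handlers
PerlModule            Book::StartupLog
PerlOpenLogsHandler   Book::StartupLog::open_logs
PerlPostConfigHandler Book::StartupLog::post_config
PerlChildInitHandler  Book::StartupLog::child_init
PerlChildExitHandler  Book::StartupLog::child_exit
# Start four child processes, as in the sample run
StartServers 4
```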
When we start the server and then shut it down,
logs/startup_log is created if it didn't already exist (it lives in
the same directory as error_log and the other standard log files),
and each stage appends its log information to it. So when we run:
panic% bin/apachectl start && bin/apachectl stop
the following is logged to logs/startup_log:
[Thu Mar  6 15:57:08 2003] - open_logs  : process 21823 is born to reproduce
[Thu Mar  6 15:57:08 2003] - post_config: configuration is completed
[Thu Mar  6 15:57:09 2003] - END        : process 21823 is shutdown
[Thu Mar  6 15:57:10 2003] - open_logs  : process 21825 is born to reproduce
[Thu Mar  6 15:57:10 2003] - post_config: configuration is completed
[Thu Mar  6 15:57:11 2003] - child_init : process 21830 is born to serve
[Thu Mar  6 15:57:11 2003] - child_init : process 21831 is born to serve
[Thu Mar  6 15:57:11 2003] - child_init : process 21832 is born to serve
[Thu Mar  6 15:57:11 2003] - child_init : process 21833 is born to serve
[Thu Mar  6 15:57:12 2003] - child_exit : process 21833 now exits
[Thu Mar  6 15:57:12 2003] - child_exit : process 21832 now exits
[Thu Mar  6 15:57:12 2003] - child_exit : process 21831 now exits
[Thu Mar  6 15:57:12 2003] - child_exit : process 21830 now exits
[Thu Mar  6 15:57:12 2003] - END        : process 21825 is shutdown
First, we can clearly see that Apache always restarts itself after
the first post_config phase is over. The logs
show that the
post_config phase is preceded by the
open_logs phase. Only after Apache has restarted
itself and has completed the open_logs and
post_config phases again is the
child_init phase run for each child process. In our
example we had the setting StartServers 4;
therefore, you can see that four child processes were started.
Finally, you can see that on server shutdown, the
child_exit phase is run for each child process and
the END { } block is executed by the parent
process only.
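The second column of each log line comes from say(), which asks caller(1) for the fully qualified name of the subroutine that called it, keeps only the part after the last colon, and pads it to 11 characters with %-11s. The standalone sketch below reproduces just that formatting outside Apache. format_line() is a hypothetical helper that returns the line instead of printing it, and the timestamp is passed in so the output is predictable.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the body of Book::StartupLog::say().
sub format_line {
    my ($time, $msg) = @_;
    # caller(1) describes our caller's frame; element 3 is its fully
    # qualified name, e.g. "main::open_logs". Keep the part after the last colon.
    my ($caller) = (caller(1))[3] =~ /([^:]+)$/;
    return sprintf "[%s] - %-11s: %s\n", $time, $caller, $msg;
}

# Stand-ins named like the real handlers, so the caller column matches.
sub open_logs   { return format_line($_[0], "process $$ is born to reproduce") }
sub post_config { return format_line($_[0], "configuration is completed") }

print open_logs("Thu Mar  6 15:57:08 2003");
print post_config("Thu Mar  6 15:57:08 2003");
```

The second line printed is `[Thu Mar  6 15:57:08 2003] - post_config: configuration is completed`, the same shape as the entries in logs/startup_log.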
Apache also specifies the
pre_config phase, which is executed before the
configuration files are parsed, but this is of no use to mod_perl,
because mod_perl is loaded only during the configuration phase.
Now let's discuss each of the mentioned startup
handlers and their implementation in the
Book::StartupLog module in detail.
25.2.1. Server Configuration and Startup Phases
open_logs, configured with
PerlOpenLogsHandler, and
post_config, configured with
PerlPostConfigHandler, are the two new phases
available during server startup.
25.2.1.1. PerlOpenLogsHandler
The open_logs phase happens just before the
post_config phase.
Handlers registered by
PerlOpenLogsHandler are usually used for opening
module-specific log files (e.g., httpd core and
mod_ssl open their log files during this phase).
At this stage the STDERR stream is not yet
redirected to error_log, and therefore any
messages to that stream will be printed to the console from which the
server is starting (if one exists).
The PerlOpenLogsHandler directive may appear in
the main configuration files and within
<VirtualHost> sections.
Apache will continue executing all handlers registered for this phase
until the first handler returns something other than
Apache::OK or Apache::DECLINED.
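This "keep going while handlers return OK or DECLINED" rule is the one Apache applies to several of the phases in this section. As a rough illustration outside Apache, here is a pure-Perl sketch of it. run_all() is a hypothetical name, and the numeric values of OK and DECLINED (0 and -1, as in Apache's httpd.h) are hard-coded only so the sketch runs without mod_perl.

```perl
use strict;
use warnings;

# Local stand-ins for Apache's status codes. Real handlers should use
# Apache::OK and Apache::DECLINED from Apache::Const instead.
use constant {
    OK           => 0,
    DECLINED     => -1,
    SERVER_ERROR => 500,    # any other value means "stop here"
};

# Call each handler in registration order; stop at the first one whose
# return value is neither OK nor DECLINED and propagate that value.
sub run_all {
    my @handlers = @_;
    for my $handler (@handlers) {
        my $rc = $handler->();
        return $rc if $rc != OK && $rc != DECLINED;
    }
    return OK;    # every handler returned OK or DECLINED
}

my @ran;
my $rc = run_all(
    sub { push @ran, 'first';  OK },
    sub { push @ran, 'second'; DECLINED },
    sub { push @ran, 'third';  SERVER_ERROR },
    sub { push @ran, 'fourth'; OK },    # never reached
);
print "ran: @ran; result: $rc\n";
```

Running it prints `ran: first second third; result: 500`: the third handler's error ends the chain before the fourth one runs.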
As we saw in the Book::StartupLog::open_logs
handler, the open_logs phase handlers accept
four arguments: the configuration pool,[60] the logging streams
pool, the temporary pool, and the server object:
[60] Pools are
used by Apache for memory-handling functions. You can make use of
them from the Perl space, too.
sub open_logs {
    my($conf_pool, $log_pool, $temp_pool, $s) = @_;
    my $log_path = Apache::server_root_relative($conf_pool, $log_file);
    $s->warn("opening the log file: $log_path");
    open $log_fh, ">>$log_path" or die "can't open $log_path: $!";
    my $oldfh = select($log_fh); $| = 1; select($oldfh);
    say("process $$ is born to reproduce");
    return Apache::OK;
}
In our example the handler uses the
Apache::server_root_relative( ) function to turn the relative
logs/startup_log path into a full path under the server root; the
file is then opened for appending and set to unbuffered mode. Finally,
the handler logs the fact that it's running in the parent process.
This handler is
configured by adding the following to
httpd.conf:
PerlOpenLogsHandler Book::StartupLog::open_logs
25.2.1.2. PerlPostConfigHandler
The post_config phase happens right after Apache has
processed the configuration files, before any child processes are
spawned (which happens at the child_init phase).
This phase can be used for initializing things to be shared between
all child processes. You can do the same in the startup file, but in
the post_config phase you have access to a
complete configuration tree.
The post_config phase is very similar to the
open_logs phase. The
PerlPostConfigHandler directive may appear in the main
configuration files and within <VirtualHost>
sections. Apache will run all registered handlers for this phase
until a handler returns something other than
Apache::OK or Apache::DECLINED.
This phase's handlers receive the same four
arguments as the open_logs
phase's handlers. From our example:
sub post_config {
    my($conf_pool, $log_pool, $temp_pool, $s) = @_;
    say("configuration is completed");
    return Apache::OK;
}
This example handler just logs that the configuration was completed
and returns right away.
This handler is configured by adding the following to
httpd.conf:
PerlPostConfigHandler Book::StartupLog::post_config
25.2.1.3. PerlChildInitHandler
The child_init phase happens immediately after a child
process is spawned. Each child process (not a thread!) will run the
hooks of this phase only once in its lifetime.
In the prefork MPM this phase is useful for
initializing any data structures that should be private to each
process. For example, Apache::DBI preopens
database connections during this phase, and
Apache::Resource sets the
process's resource limits.
The
PerlChildInitHandler directive should appear in the
top-level server configuration file. All
PerlChildInitHandlers will be executed,
regardless of their return values (although mod_perl expects a return
value, so returning Apache::OK is a good idea).
In the Book::StartupLog example we used the
child_init( ) handler:
sub child_init {
    my($child_pool, $s) = @_;
    say("process $$ is born to serve");
    return Apache::OK;
}
The child_init( ) handler accepts two arguments:
the child process pool and the server object. The example handler
logs the PID of the child process in which it's run
and returns.
This handler is configured by adding the following to
httpd.conf:
PerlChildInitHandler Book::StartupLog::child_init
25.2.1.4. PerlChildExitHandler
The child_exit phase is executed before the child
process exits. Notice that it happens only when the process exits,
not when the thread exits (assuming that you are using a threaded
MPM).
The PerlChildExitHandler directive should appear
in the top-level server configuration file. mod_perl will run all
registered
PerlChildExitHandler handlers for this phase until a
handler returns something other than Apache::OK or
Apache::DECLINED.
In the Book::StartupLog example we used the
child_exit( ) handler:
sub child_exit {
    my($child_pool, $s) = @_;
    say("process $$ now exits");
    return Apache::OK;
}
The child_exit( ) handler accepts two arguments:
the child process pool and the server object. The example handler
logs the PID of the child process in which it's run
and returns.
This handler is configured by adding the
following to httpd.conf:
PerlChildExitHandler Book::StartupLog::child_exit
25.2.2. Connection Phases
Since Apache 2.0 makes it possible to implement protocols other than
HTTP, the connection phases
pre_connection, configured with
PerlPreConnectionHandler, and
process_connection, configured with
PerlProcessConnectionHandler, were added. The
pre_connection phase is used to make runtime adjustments to
each connection; for example, mod_ssl uses the
pre_connection phase to add the SSL filters if
SSLEngine On is configured, regardless of whether
the protocol is HTTP, FTP, NNTP, etc. The
process_connection phase is used to implement various
protocols, usually those similar to HTTP. The HTTP protocol itself is
handled like any other protocol; internally it runs the request
handlers similar to Apache 1.3.
When a connection is issued by a client, it's first
run through the PerlPreConnectionHandler and then
passed to the PerlProcessConnectionHandler, which
generates the response. When
PerlProcessConnectionHandler reads data from
the client, that data can be filtered by connection input filters. The
generated response can also be filtered through connection output
filters. Filters are usually used for modifying the data flowing
through them, but they can be used for other purposes as well (e.g.,
logging interesting information). Figure 25-2
depicts the connection cycle and the data flow and highlights which
handlers are available to mod_perl 2.0.
Figure 25-2. Apache 2.0 connection cycle
Now let's discuss the
PerlPreConnectionHandler and
PerlProcessConnectionHandler handlers in detail.
25.2.2.1. PerlPreConnectionHandler
The
pre_connection phase happens just after the server
accepts the connection, but before it is handed off to a protocol
module to be served. It gives modules an opportunity to modify the
connection as soon as possible and insert filters if needed. The core
server uses this phase to set up the connection record based on the
type of connection that is being used. mod_perl itself uses this
phase to register the connection input and output filters.
In mod_perl 1.0, during code development
Apache::Reload was used to automatically reload
Perl modules modified since the last request. It was invoked during
post_read_request, the first phase of an HTTP
request. In mod_perl 2.0,
pre_connection is the earliest phase, so if we
want to make sure that all modified Perl modules are reloaded for any
protocols and their phases, it's best to set the
scope of the Perl interpreter to the lifetime of the connection via:
PerlInterpScope connection
and invoke the Apache::Reload handler during the
pre_connection phase. However, this
development-time advantage can become a disadvantage in
production—for example, if a connection handled by the HTTP
protocol is configured as KeepAlive and there are
several requests coming on the same connection (one handled by
mod_perl and the others by the default image handler), the Perl
interpreter won't be available to other threads
while the images are being served.
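Put together, the development setup described above might look like the following httpd.conf fragment. This is a sketch: Apache::Reload's own options (for example, which modules it watches) are left at their defaults, and, given the caveat above, the PerlInterpScope setting belongs on a development server rather than a production one.

```apache
# Development only: keep one Perl interpreter for the whole connection
PerlInterpScope connection
# Check for modified modules in the earliest phase, for any protocol
PerlModule               Apache::Reload
PerlPreConnectionHandler Apache::Reload
```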
Apache will continue executing all handlers registered for this phase
until the first handler returns something other than
Apache::OK or Apache::DECLINED.
The
PerlPreConnectionHandler directive may appear in the main
configuration files and within <VirtualHost>
sections.
A pre_connection handler accepts a connection
record and a socket object as its arguments:
sub handler {
    my ($c, $socket) = @_;
    # ...
    return Apache::OK;
}
25.2.2.2. PerlProcessConnectionHandler
The
process_connection phase is used to process incoming
connections. Only protocol modules should assign handlers for this
phase, as it gives them an opportunity to replace the standard HTTP
processing with processing for some other protocol (e.g., POP3, FTP,
etc.).
Apache will continue executing all handlers registered for this phase
until the first handler returns something other than
Apache::DECLINED.
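Note the difference from the pre_connection rule: here OK also ends the chain, so the first handler that does anything other than decline owns the connection. A pure-Perl sketch of that rule follows. run_first() and the protocol names in the usage are hypothetical, and the numeric codes are hard-coded stand-ins for Apache::OK and Apache::DECLINED (0 and -1 in httpd.h).

```perl
use strict;
use warnings;

# Local stand-ins for Apache::OK and Apache::DECLINED.
use constant { OK => 0, DECLINED => -1 };

# Call handlers in order until one returns anything other than DECLINED;
# that handler has taken the connection, so later ones never run.
sub run_first {
    for my $handler (@_) {
        my $rc = $handler->();
        return $rc unless $rc == DECLINED;
    }
    return DECLINED;    # nobody claimed the connection
}

my @ran;
my $rc = run_first(
    sub { push @ran, 'pop3';  DECLINED },    # not this protocol
    sub { push @ran, 'eliza'; OK },          # handles the connection
    sub { push @ran, 'http';  OK },          # never reached
);
print "ran: @ran; result: $rc\n";
```

Running it prints `ran: pop3 eliza; result: 0`.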
The
PerlProcessConnectionHandler directive may appear in the main
configuration files and within <VirtualHost>
sections.
The process_connection handler can be written in
two ways. The first way is to manipulate bucket brigades, in a way
very similar to the filters. The second, simpler way is to bypass all
the filters and to read from and write to the connection socket
directly.
A process_connection handler accepts a
connection record object as its only argument:
sub handler {
    my ($c) = @_;
    # ...
    return Apache::OK;
}
Now let's look at two examples of connection
handlers. The first uses the connection socket to read and write the
data, and the second uses bucket brigades to accomplish the same
thing and allow the connection filters to do their work.
25.2.2.2.1. Socket-based protocol module
To demonstrate the workings of a protocol module, we'll
take a look at the Book::Eliza module, which sends
the data read from the client as input to
Chatbot::Eliza, a module that implements a mock
Rogerian psychotherapist, and forwards the psychotherapist's response
back to the client. In this module we will use the
implementation that works directly with the connection socket and
therefore bypasses any connection filters.
A protocol handler is configured using the
PerlProcessConnectionHandler directive, and we
will use the Listen and
<VirtualHost> directives to bind to the
nonstandard port 8084.
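A configuration along those lines might look like this. Binding a `_default_` virtual host to the port and loading the module with PerlModule inside it are choices of this sketch.

```apache
Listen 8084
<VirtualHost _default_:8084>
    PerlModule                   Book::Eliza
    PerlProcessConnectionHandler Book::Eliza
</VirtualHost>
```

With the server restarted, a telnet session with Eliza looks like this: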
panic% telnet localhost 8084
Trying 127.0.0.1...
Connected to localhost (127.0.0.1).
Escape character is '^]'.
Hello Eliza
How do you do. Please state your problem.
How are you?
Oh, I?
Why do I have core dumped?
You say Why do you have core dumped?
I feel like writing some tests today, you?
I'm not sure I understand you fully.
Good bye, Eliza
Does talking about this bother you?
Connection closed by foreign host.
package Book::Eliza;
use strict;
use warnings FATAL => 'all';
use Apache::Connection ();
use APR::Socket ();
require Chatbot::Eliza;
use Apache::Const -compile => 'OK';
use constant BUFF_LEN => 1024;
my $eliza = Chatbot::Eliza->new;
sub handler {
    my $c = shift;
    my $socket = $c->client_socket;
    my $buff;
    my $last = 0;
    while (1) {
        # $rlen goes in as the maximum to read and comes back
        # as the number of bytes actually read
        my $rlen = BUFF_LEN;
        $socket->recv($buff, $rlen);
        last if $rlen <= 0;
        # \r is sent instead of \n if the client is talking over telnet
        $buff =~ s/[\r\n]*$//;
        $last++ if $buff =~ /good bye/i;
        $buff = $eliza->transform($buff) . "\n\n";
        $socket->send($buff, length $buff);
        last if $last;
    }
    return Apache::OK;
}
1;
The example handler starts with the standard package declaration and,
of course, turns on use strict. As with
all Perl*Handlers, the subroutine name defaults to
handler. However, in the case of a protocol
handler, the first argument is not a request_rec,
but a conn_rec blessed into the
Apache::Connection class. We have direct access to
the client socket via
Apache::Connection's
client_socket( ) method, which returns an object
blessed into the APR::Socket class.
Inside the read/send loop, the handler attempts to read
BUFF_LEN bytes from the client socket into the
$buff buffer. The $rlen
parameter will be set to the number of bytes actually read. The
APR::Socket::recv( ) method returns an APR status
value, but we need only check the read length to break out of the
loop if it is less than or equal to 0 bytes. The handler also breaks
the loop after processing an input including the
"good bye" string.
Otherwise, if the handler receives some data, it sends this data to
the $eliza object (which represents the
psychotherapist), whose returned text is then sent back to the client
with the APR::Socket::send( ) method. When the
read/send loop is finished, the handler returns
Apache::OK, telling Apache to terminate the
connection. As mentioned earlier, since this handler is working
directly with the connection socket, no filters can be applied.
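The in/out use of $rlen works because Perl passes arguments by alias: a method that assigns to $_[1] or $_[2] changes the caller's variables. The sketch below runs the same read/transform/send loop against a hypothetical in-memory FakeSocket class that only imitates the recv($buff, $rlen) and send($buff, $len) calls used above; it is not APR::Socket, and transform() is a stand-in for $eliza->transform.

```perl
use strict;
use warnings;

package FakeSocket;    # hypothetical stand-in, not APR::Socket

sub new {
    my ($class, @chunks) = @_;
    return bless { in => [@chunks], out => '' }, $class;
}

# recv($buff, $rlen): fill the caller's $buff and set the caller's $rlen
# to the number of bytes read (0 once the input is used up).
sub recv {
    my ($self, undef, $max) = @_;
    my $chunk = shift @{ $self->{in} };
    $chunk = '' unless defined $chunk;
    $chunk = substr $chunk, 0, $max;
    $_[1] = $chunk;           # @_ aliases the caller's $buff...
    $_[2] = length $chunk;    # ...and the caller's $rlen
    return 0;
}

sub send {
    my ($self, $data, $len) = @_;
    $self->{out} .= substr $data, 0, $len;
    return $len;
}

package main;

use constant BUFF_LEN => 1024;

sub transform { return "You said: $_[0]" }    # hypothetical Eliza stand-in

# Same loop shape as Book::Eliza::handler.
sub serve {
    my ($socket) = @_;
    my $last = 0;
    while (1) {
        my $buff;
        my $rlen = BUFF_LEN;
        $socket->recv($buff, $rlen);
        last if $rlen <= 0;
        $buff =~ s/[\r\n]*$//;    # strip telnet's trailing \r\n
        $last++ if $buff =~ /good bye/i;
        $buff = transform($buff) . "\n\n";
        $socket->send($buff, length $buff);
        last if $last;
    }
}

my $socket = FakeSocket->new("Hello Eliza\r\n", "Good bye, Eliza\r\n", "unread\r\n");
serve($socket);
print $socket->{out};
```

The third chunk is never read: the "good bye" input sets $last, so the loop ends right after sending the reply to it.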
25.2.2.2.2. Bucket brigade-based protocol module
Now let's look at the
same module, but this time implemented by manipulating bucket
brigades. It runs its output through a connection output filter that
turns all uppercase characters into their lowercase equivalents.
To try it, we define a
<VirtualHost> listening on port 8085 that
enables the Book::Eliza2 connection handler, which
runs its output through the
Book::Eliza2::lowercase_filter filter.
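Such a <VirtualHost> might be configured as follows. The `_default_` address and the placement of PerlModule are assumptions of this sketch. PerlOutputFilterHandler registers the filter; because lowercase_filter carries the FilterConnectionHandler attribute, it should act as a connection output filter.

```apache
Listen 8085
<VirtualHost _default_:8085>
    PerlModule                   Book::Eliza2
    PerlProcessConnectionHandler Book::Eliza2
    PerlOutputFilterHandler      Book::Eliza2::lowercase_filter
</VirtualHost>
```

A session on port 8085 then looks like this: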
panic% telnet localhost 8085
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
Hello Eliza!
hi. what seems to be your problem?
Problem? I don't have any problems ;)
does that trouble you?
Not at all, I don't like problems.
i'm not sure i understand you fully.
I said that I don't like problems.
that is interesting. please continue.
You are boring :(
does it please you to believe i am boring?
Yes, yes!
please tell me some more about this.
Good bye!
i'm not sure i understand you fully.
Connection closed by foreign host.
As you can see, the response, which normally is a mix of upper- and
lowercase words, now is all in lowercase, because of the output
filter. The implementation of the connection and the filter handlers
is shown in Example 25-4.
Example 25-4. Book/Eliza2.pm
package Book::Eliza2;
use strict;
use warnings FATAL => 'all';
use Apache::Connection ();
use APR::Bucket ();
use APR::Brigade ();
use APR::Util ();
require Chatbot::Eliza;
use APR::Const -compile => qw(SUCCESS EOF);
use Apache::Const -compile => qw(OK MODE_GETLINE);
my $eliza = Chatbot::Eliza->new;
sub handler {
    my $c = shift;
    my $bb_in  = APR::Brigade->new($c->pool, $c->bucket_alloc);
    my $bb_out = APR::Brigade->new($c->pool, $c->bucket_alloc);
    my $last = 0;
    while (1) {
        # read a line's worth of data through the connection input filters
        my $rv = $c->input_filters->get_brigade($bb_in,
                                                Apache::MODE_GETLINE);
        if ($rv != APR::SUCCESS or $bb_in->empty) {
            my $error = APR::strerror($rv);
            unless ($rv == APR::EOF) {
                warn "[eliza] get_brigade: $error\n";
            }
            $bb_in->destroy;
            last;
        }
        while (!$bb_in->empty) {
            my $bucket = $bb_in->first;
            $bucket->remove;
            if ($bucket->is_eos) {
                $bb_out->insert_tail($bucket);
                last;
            }
            my $data;
            my $status = $bucket->read($data);
            return $status unless $status == APR::SUCCESS;
            if ($data) {
                $data =~ s/[\r\n]*$//;
                $last++ if $data =~ /good bye/i;
                $data = $eliza->transform($data) . "\n\n";
                $bucket = APR::Bucket->new($data);
            }
            $bb_out->insert_tail($bucket);
        }
        # a flush bucket makes the output filters send the data right away
        my $flush = APR::Bucket::flush_create($c->bucket_alloc);
        $bb_out->insert_tail($flush);
        $c->output_filters->pass_brigade($bb_out);
        last if $last;
    }
    return Apache::OK;
}
use base qw(Apache::Filter);
use constant BUFF_LEN => 1024;
sub lowercase_filter : FilterConnectionHandler {
    my $filter = shift;
    while ($filter->read(my $buffer, BUFF_LEN)) {
        $filter->print(lc $buffer);
    }
    return Apache::OK;
}
1;
For the purpose of explaining how this connection handler works, we
are going to simplify the handler. The whole handler can be
represented by the following pseudocode:
while ($bb_in = get_brigade()) {
    while ($bucket_in = $bb_in->get_bucket()) {
        my $data = $bucket_in->read();
        $data = transform($data);
        $bucket_out = new_bucket($data);
        $bb_out->insert_tail($bucket_out);
    }
    $bb_out->insert_tail($flush_bucket);
    pass_brigade($bb_out);
}
The handler receives the incoming data via bucket brigades, one at a
time, in a loop. It then processes each brigade by retrieving the
buckets contained in it, reading in the data, transforming that data,
creating new buckets using the transformed data, and attaching them
to the outgoing brigade. When all the buckets from the incoming
bucket brigade are transformed and attached to the outgoing bucket
brigade, a flush bucket is created and added as the last bucket, so
when the outgoing bucket brigade is passed out to the outgoing
connection filters, it will be sent to the client right away, not
buffered.
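To make the flow concrete without an Apache build, here is a toy model of one pass of that loop. It is not the APR API: buckets are plain hash refs with a type of data, eos, or flush, a brigade is an array ref, and transform() is a hypothetical stand-in for $eliza->transform. It keeps two details of the real handler: an end-of-stream bucket is passed through unchanged and ends the inner loop, and a flush bucket is always appended last.

```perl
use strict;
use warnings;

sub transform { return "Eliza heard: $_[0]" }    # hypothetical stand-in

# One pass of the outer loop: drain the incoming brigade bucket by bucket,
# build the outgoing brigade, and finish it with a flush bucket.
sub process_brigade {
    my ($bb_in) = @_;
    my @bb_out;
    while (@$bb_in) {
        my $bucket = shift @$bb_in;    # like $bb_in->first + $bucket->remove
        if ($bucket->{type} eq 'eos') {
            push @bb_out, $bucket;     # end of stream passes through as is
            last;
        }
        if ($bucket->{type} eq 'data' && length($bucket->{data})) {
            (my $data = $bucket->{data}) =~ s/[\r\n]*$//;
            $bucket = { type => 'data', data => transform($data) . "\n\n" };
        }
        push @bb_out, $bucket;
    }
    push @bb_out, { type => 'flush' };    # send now rather than buffer
    return \@bb_out;
}

my $bb_out = process_brigade([
    { type => 'data', data => "Hello Eliza!\r\n" },
    { type => 'eos' },
]);
print join(', ', map { $_->{type} } @$bb_out), "\n";
print $bb_out->[0]{data};
```

The outgoing brigade comes out as `data, eos, flush`, with the transformed text in the first bucket.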
If you look at the complete handler, the loop is terminated when one
of the following conditions occurs: an error happens, the
end-of-stream bucket has been seen (i.e., there's no
more input at the connection), or the received data contains the
string "good bye". As you saw in
the demonstration, we used the string "good
bye" to terminate our shrink's
session.
We will skip the filter discussion here, since we are going to talk
in depth about filters in the following sections. All you need to
know at this stage is that the data sent from the connection handler
is filtered by the outgoing filter, which transforms it to be
all lowercase.
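The filter can lowercase each chunk on its own because lc works character by character, so it does not matter where the BUFF_LEN boundaries fall. The following pure-Perl sketch imitates that read-and-print loop over a string, with an artificially small chunk size; lowercase_stream() is a hypothetical helper, not part of Apache::Filter.

```perl
use strict;
use warnings;

use constant BUFF_LEN => 4;    # deliberately tiny to force several reads

# Consume the input BUFF_LEN characters at a time, as the filter's
# $filter->read loop does, and emit each chunk lowercased.
sub lowercase_stream {
    my ($input) = @_;
    my $output = '';
    for (my $pos = 0; $pos < length($input); $pos += BUFF_LEN) {
        my $buffer = substr $input, $pos, BUFF_LEN;
        $output .= lc $buffer;
    }
    return $output;
}

my $reply = "Does that trouble you?\n\n";
print lowercase_stream($reply);
```

The result equals lc($reply) on the whole string. A filter that matches multi-character patterns, such as the "good bye" test in the handler, could not rely on this, because a match may straddle two chunks.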
25.2.3. HTTP Request Phases
The HTTP request phases themselves have not
changed from mod_perl 1.0, except that the PerlHandler
directive has been renamed PerlResponseHandler to
better match the corresponding Apache phase name
(response).
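In configuration terms, a response handler that mod_perl 1.0 registered with PerlHandler is registered with PerlResponseHandler in mod_perl 2.0. In the sketch below, Book::Hello and the /hello location are hypothetical, and SetHandler perl-script is one of the handler types mod_perl 2.0 accepts (the other is modperl).

```apache
# mod_perl 1.0 style:
#   PerlHandler Book::Hello
<Location /hello>
    SetHandler perl-script
    PerlResponseHandler Book::Hello
</Location>
```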
The other difference is that it is now possible to
register HTTP request input and output filters, so
PerlResponseHandler will filter its input and output
through them. Figure 25-3 depicts the HTTP request
cycle, which should be familiar to mod_perl 1.0 users, with the new
addition of the request filters. From the diagram you can also see
that the request filters are stacked on top of the connection
filters. The request input filters filter only a request body, and
the request output filters filter only a response body. Request and
response headers can be accessed and modified using the
$r->headers_in,
$r->headers_out, and other methods.