25.2. New Apache Phases and Corresponding Perl*Handlers

Because the majority of the Apache phases supported by mod_perl haven't changed since mod_perl 1.0, in this chapter we will discuss only those phases and corresponding handlers that were added or changed in mod_perl 2.0.

Figure 25-1 depicts the Apache 2.0 server cycle. You can see the mod_perl phases PerlOpenLogsHandler, PerlPostConfigHandler, and PerlChildInitHandler, which we will discuss shortly. Later, we will zoom into the connection cycle depicted in Figure 25-2, which will expose other mod_perl handlers.

Figure 25-1. Apache 2.0 server lifecycle

Apache 2.0 starts by parsing the configuration file. After the configuration file is parsed, any PerlOpenLogsHandler handlers are executed. After that, any PerlPostConfigHandler handlers are run. When the post_config phase is finished the server immediately restarts, to make sure that it can survive graceful restarts after starting to serve the clients.

When the restart is completed, Apache 2.0 spawns the workers that will do the actual work. Depending on the MPM used, these can be threads, processes, or a mixture of both. For example, the worker MPM spawns a number of processes, each running a number of threads. When each child process is started PerlChildInitHandlers are executed. Notice that they are run for each starting process, not thread.

From that moment on each working process (or thread) processes connections until it's killed by the server or the server is shut down. When the server is shut down, any registered PerlChildExitHandlers are executed.

Example 25-2 demonstrates all the startup phases.

Example 25-2. Book/StartupLog.pm

package Book::StartupLog;

use strict;
use warnings;

use Apache::Log ( );
use Apache::ServerUtil ( );

use File::Spec::Functions;

use Apache::Const -compile => 'OK';

my $log_file = catfile "logs", "startup_log";
my $log_fh;

sub open_logs {
    my($conf_pool, $log_pool, $temp_pool, $s) = @_;
    my $log_path = Apache::server_root_relative($conf_pool, $log_file);

    $s->warn("opening the log file: $log_path");
    open $log_fh, ">>$log_path" or die "can't open $log_path: $!";
    my $oldfh = select($log_fh); $| = 1; select($oldfh);

    say("process $$ is born to reproduce");
    return Apache::OK;
}

sub post_config {
    my($conf_pool, $log_pool, $temp_pool, $s) = @_;
    say("configuration is completed");
    return Apache::OK;
}

sub child_exit {
    my($child_pool, $s) = @_;
    say("process $$ now exits");
    return Apache::OK;
}

sub child_init {
    my($child_pool, $s) = @_;
    say("process $$ is born to serve");
    return Apache::OK;
}

sub say {
    my($caller) = (caller(1))[3] =~ /([^:]+)$/;
    if (defined $log_fh) {
        printf $log_fh "[%s] - %-11s: %s\n", 
            scalar(localtime), $caller, $_[0];
    }
    else {
        # when the log file is not open
        warn __PACKAGE__ . " says: $_[0]\n";
    }
}

END {
    say("process $$ is shutdown\n");
}

1;

Here's the httpd.conf configuration section:

PerlModule            Book::StartupLog
PerlOpenLogsHandler   Book::StartupLog::open_logs
PerlPostConfigHandler Book::StartupLog::post_config
PerlChildInitHandler  Book::StartupLog::child_init
PerlChildExitHandler  Book::StartupLog::child_exit

When we perform a server startup followed by a shutdown, the logs/startup_log file is created if it didn't already exist (it shares a directory with error_log and the other standard log files), and each stage appends its log information to it. So when we perform:

panic% bin/apachectl start && bin/apachectl stop

the following is logged to logs/startup_log:

[Thu Mar  6 15:57:08 2003] - open_logs  : process 21823 is born to reproduce
[Thu Mar  6 15:57:08 2003] - post_config: configuration is completed
[Thu Mar  6 15:57:09 2003] - END        : process 21823 is shutdown

[Thu Mar  6 15:57:10 2003] - open_logs  : process 21825 is born to reproduce
[Thu Mar  6 15:57:10 2003] - post_config: configuration is completed
[Thu Mar  6 15:57:11 2003] - child_init : process 21830 is born to serve
[Thu Mar  6 15:57:11 2003] - child_init : process 21831 is born to serve
[Thu Mar  6 15:57:11 2003] - child_init : process 21832 is born to serve
[Thu Mar  6 15:57:11 2003] - child_init : process 21833 is born to serve
[Thu Mar  6 15:57:12 2003] - child_exit : process 21833 now exits
[Thu Mar  6 15:57:12 2003] - child_exit : process 21832 now exits
[Thu Mar  6 15:57:12 2003] - child_exit : process 21831 now exits
[Thu Mar  6 15:57:12 2003] - child_exit : process 21830 now exits
[Thu Mar  6 15:57:12 2003] - END        : process 21825 is shutdown

First, we can clearly see that Apache always restarts itself after the first post_config phase is over. The logs show that the post_config phase is preceded by the open_logs phase. Only after Apache has restarted itself and has completed the open_logs and post_config phases again is the child_init phase run for each child process. In our example we had the setting StartServers=4; therefore, you can see that four child processes were started.

Finally, you can see that on server shutdown, the child_exit phase is run for each child process and the END { } block is executed by the parent process only.
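The observations above can be checked mechanically. The following is a minimal sketch in plain Perl (no mod_perl required) that groups the second half of the log shown earlier by phase; the tally_phases( ) helper is a hypothetical name introduced only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: group the PIDs mentioned in startup_log lines
# by the phase name that precedes the colon.
sub tally_phases {
    my (@lines) = @_;
    my %pids_by_phase;
    for my $line (@lines) {
        # Expected shape: "[timestamp] - phase      : message"
        next unless $line =~ /^\[[^\]]+\] - (\w+)\s*: (.*)$/;
        my ($phase, $message) = ($1, $2);
        my ($pid) = $message =~ /process (\d+)/;
        # post_config lines carry no PID; record a placeholder
        push @{ $pids_by_phase{$phase} }, defined $pid ? $pid : '-';
    }
    return \%pids_by_phase;
}

# The entries logged after the automatic restart, copied from the log above
my @log = (
    '[Thu Mar  6 15:57:10 2003] - open_logs  : process 21825 is born to reproduce',
    '[Thu Mar  6 15:57:10 2003] - post_config: configuration is completed',
    '[Thu Mar  6 15:57:11 2003] - child_init : process 21830 is born to serve',
    '[Thu Mar  6 15:57:11 2003] - child_init : process 21831 is born to serve',
    '[Thu Mar  6 15:57:11 2003] - child_init : process 21832 is born to serve',
    '[Thu Mar  6 15:57:11 2003] - child_init : process 21833 is born to serve',
    '[Thu Mar  6 15:57:12 2003] - child_exit : process 21833 now exits',
    '[Thu Mar  6 15:57:12 2003] - child_exit : process 21832 now exits',
    '[Thu Mar  6 15:57:12 2003] - child_exit : process 21831 now exits',
    '[Thu Mar  6 15:57:12 2003] - child_exit : process 21830 now exits',
    '[Thu Mar  6 15:57:12 2003] - END        : process 21825 is shutdown',
);

my $tally = tally_phases(@log);
printf "%-11s ran in: %s\n", $_, join(', ', @{ $tally->{$_} })
    for sort keys %$tally;
```

In this data the PID reported by END matches the one reported by open_logs, which is the parent process, while child_init and child_exit each list the same four child PIDs.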

Apache also specifies the pre_config phase, which is executed before the configuration files are parsed, but this is of no use to mod_perl, because mod_perl is loaded only during the configuration phase.

Now let's discuss each of the mentioned startup handlers and their implementation in the Book::StartupLog module in detail.

25.2.1. Server Configuration and Startup Phases

open_logs, configured with PerlOpenLogsHandler, and post_config, configured with PerlPostConfigHandler, are the two new phases available during server startup.

25.2.1.1. PerlOpenLogsHandler

The open_logs phase happens just before the post_config phase.

Handlers registered by PerlOpenLogsHandler are usually used for opening module-specific log files (e.g., httpd core and mod_ssl open their log files during this phase).

At this stage the STDERR stream is not yet redirected to error_log, and therefore any messages to that stream will be printed to the console from which the server is starting (if one exists).

The PerlOpenLogsHandler directive may appear in the main configuration files and within <VirtualHost> sections.

Apache will continue executing all handlers registered for this phase until the first handler returns something other than Apache::OK or Apache::DECLINED.
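The stopping rule just described can be modeled in a few lines of plain Perl. This is only a sketch of the rule, not Apache's implementation: the run_all_sketch( ) function is hypothetical, and the OK, DECLINED, and SERVER_ERROR constants are local stand-ins for the corresponding Apache:: constants.

```perl
use strict;
use warnings;

# Local stand-ins for Apache::OK, Apache::DECLINED and an error status
use constant { OK => 0, DECLINED => -1, SERVER_ERROR => 500 };

# Run the handlers in order; stop at the first status that is neither
# OK nor DECLINED and hand that status back to the caller.
sub run_all_sketch {
    my ($handlers, @args) = @_;
    for my $handler (@$handlers) {
        my $status = $handler->(@args);
        return $status unless $status == OK || $status == DECLINED;
    }
    return OK;
}

my @seen;
my @handlers = (
    sub { push @seen, 'first';  return OK },
    sub { push @seen, 'second'; return DECLINED },
    sub { push @seen, 'third';  return SERVER_ERROR },
    sub { push @seen, 'fourth'; return OK },   # never reached
);

my $status = run_all_sketch(\@handlers);
print "ran: @seen; final status: $status\n";
```

The fourth handler is never called, because the third one returned a status other than OK or DECLINED.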

As we saw in the Book::StartupLog::open_logs handler, the open_logs phase handlers accept four arguments: the configuration pool,[60] the logging streams pool, the temporary pool, and the server object:

[60]Pools are used by Apache for memory-handling functions. You can make use of them from the Perl space, too.

sub open_logs {
    my($conf_pool, $log_pool, $temp_pool, $s) = @_;
    my $log_path = Apache::server_root_relative($conf_pool, $log_file);

    $s->warn("opening the log file: $log_path");
    open $log_fh, ">>$log_path" or die "can't open $log_path: $!";
    my $oldfh = select($log_fh); $| = 1; select($oldfh);

    say("process $$ is born to reproduce");
    return Apache::OK;
}

In our example the handler uses the Apache::server_root_relative( ) function to set the full path to the log file, which is then opened for appending and set to unbuffered mode. Finally, it logs the fact that it's running in the parent process.

As you've seen in this example, this handler is configured by adding the following to httpd.conf:

PerlOpenLogsHandler Book::StartupLog::open_logs
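The file handling done by open_logs( ) and say( ) can be tried outside the server. The sketch below is plain Perl under some assumptions: a temporary directory stands in for the server root, IO::Handle's autoflush( ) replaces the select( )/$| idiom, and the open_log_sketch( ) and note( ) names are hypothetical.

```perl
use strict;
use warnings;

use File::Spec::Functions qw(catfile);
use File::Temp qw(tempdir);
use IO::Handle ();

my $log_fh;

# Open the log for appending and switch it to unbuffered mode
sub open_log_sketch {
    my ($server_root) = @_;
    my $log_path = catfile($server_root, "startup_log");
    open $log_fh, '>>', $log_path or die "can't open $log_path: $!";
    $log_fh->autoflush(1);
    return $log_path;
}

# Like say() in Book::StartupLog: label each entry with the name of
# the calling subroutine, with the package prefix stripped
sub note {
    my ($message) = @_;
    my ($caller) = (caller(1))[3] =~ /([^:]+)$/;
    printf {$log_fh} "[%s] - %-11s: %s\n",
        scalar(localtime), $caller, $message;
}

sub open_logs { note("process $$ is born to reproduce") }

my $root     = tempdir(CLEANUP => 1);   # stand-in for the server root
my $log_path = open_log_sketch($root);
open_logs();

# Because the handle is unbuffered, the entry is already on disk
open my $in, '<', $log_path or die "can't read $log_path: $!";
print while <$in>;
close $in;
```

The three-argument form of open( ) used here keeps the mode separate from the path, which avoids surprises if the path ever contains unusual characters.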

25.2.1.2. PerlPostConfigHandler

The post_config phase happens right after Apache has processed the configuration files, before any child processes are spawned (which happens at the child_init phase).

This phase can be used for initializing things to be shared between all child processes. You can do the same in the startup file, but in the post_config phase you have access to a complete configuration tree.

The post_config phase is very similar to the open_logs phase. The PerlPostConfigHandler directive may appear in the main configuration files and within <VirtualHost> sections. Apache will run all registered handlers for this phase until a handler returns something other than Apache::OK or Apache::DECLINED. This phase's handlers receive the same four arguments as the open_logs phase's handlers. From our example:

sub post_config {
    my($conf_pool, $log_pool, $temp_pool, $s) = @_;
    say("configuration is completed");
    return Apache::OK;
}

This example handler just logs that the configuration was completed and returns right away.

This handler is configured by adding the following to httpd.conf:

PerlPostConfigHandler Book::StartupLog::post_config

25.2.1.3. PerlChildInitHandler

The child_init phase happens immediately after a child process is spawned. Each child process (not a thread!) will run the hooks of this phase only once in its lifetime.

In the prefork MPM this phase is useful for initializing any data structures that should be private to each process. For example, Apache::DBI preopens database connections during this phase, and Apache::Resource sets the process's resource limits.

The PerlChildInitHandler directive should appear in the top-level server configuration file. All PerlChildInitHandlers will be executed, disregarding their return values (although mod_perl expects a return value, so returning Apache::OK is a good idea).

In the Book::StartupLog example we used the child_init( ) handler:

sub child_init {
    my($child_pool, $s) = @_;
    say("process $$ is born to serve");
    return Apache::OK;
}

The child_init( ) handler accepts two arguments: the child process pool and the server object. The example handler logs the PID of the child process in which it's run and returns.

This handler is configured by adding the following to httpd.conf:

PerlChildInitHandler Book::StartupLog::child_init
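To see why data set up during child_init is private to each process, consider this plain-Perl sketch. It assumes a Unix-like system where fork( ) creates a real process, and it uses hypothetical names (child_init_sketch( ), spawn_child( )) in place of anything Apache does internally:

```perl
use strict;
use warnings;

use POSIX ();

my %private;   # per-process state; stays empty in the parent

# Stand-in for a child_init handler: runs once in each new process
sub child_init_sketch {
    %private = (pid => $$, started => time);
    return 0;
}

# Fork one child, let it initialize itself, and report back which
# PID it recorded in its own copy of %private
sub spawn_child {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                 # child
        close $reader;
        child_init_sketch();
        print {$writer} "$private{pid}\n";
        close $writer;               # flushes the line into the pipe
        POSIX::_exit(0);             # leave without running END blocks
    }

    close $writer;                   # parent
    chomp(my $reported = <$reader>);
    close $reader;
    waitpid($pid, 0);
    return ($pid, $reported);
}

my ($child_pid, $reported_pid) = spawn_child();
print "child $child_pid initialized its own state (pid $reported_pid)\n";
print "parent state is ", (%private ? "set" : "still empty"), "\n";
```

The child's assignment to %private never reaches the parent, which is the property that makes this phase a natural place for per-process resources such as database connections.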

25.2.1.4. PerlChildExitHandler

The child_exit phase is executed before the child process exits. Notice that it happens only when the process exits, not when the thread exits (assuming that you are using a threaded MPM).

The PerlChildExitHandler directive should appear in the top-level server configuration file. mod_perl will run all registered PerlChildExitHandler handlers for this phase until a handler returns something other than Apache::OK or Apache::DECLINED.

In the Book::StartupLog example we used the child_exit( ) handler:

sub child_exit {
    my($child_pool, $s) = @_;
    say("process $$ now exits");
    return Apache::OK;
}

The child_exit( ) handler accepts two arguments: the child process pool and the server object. The example handler logs the PID of the child process in which it's run and returns.

As you saw in the example, this handler is configured by adding the following to httpd.conf:

PerlChildExitHandler Book::StartupLog::child_exit

25.2.2. Connection Phases

Since Apache 2.0 makes it possible to implement protocols other than HTTP, the connection phases pre_connection, configured with PerlPreConnectionHandler, and process_connection, configured with PerlProcessConnectionHandler, were added. The pre_connection phase is used for runtime adjustments of things for each connection—for example, mod_ssl uses the pre_connection phase to add the SSL filters if SSLEngine On is configured, regardless of whether the protocol is HTTP, FTP, NNTP, etc. The process_connection phase is used to implement various protocols, usually those similar to HTTP. The HTTP protocol itself is handled like any other protocol; internally it runs the request handlers similar to Apache 1.3.

When a connection is issued by a client, it's first run through the PerlPreConnectionHandler and then passed to the PerlProcessConnectionHandler, which generates the response. When PerlProcessConnectionHandler is reading data from the client, it can be filtered by connection input filters. The generated response can also be filtered through connection output filters. Filters are usually used for modifying the data flowing through them, but they can be used for other purposes as well (e.g., logging interesting information). Figure 25-2 depicts the connection cycle and the data flow and highlights which handlers are available to mod_perl 2.0.

Figure 25-2. Apache 2.0 connection cycle

Now let's discuss the PerlPreConnectionHandler and PerlProcessConnectionHandler handlers in detail.

25.2.2.1. PerlPreConnectionHandler

The pre_connection phase happens just after the server accepts the connection, but before it is handed off to a protocol module to be served. It gives modules an opportunity to modify the connection as soon as possible and insert filters if needed. The core server uses this phase to set up the connection record based on the type of connection that is being used. mod_perl itself uses this phase to register the connection input and output filters.

In mod_perl 1.0, during code development Apache::Reload was used to automatically reload Perl modules modified since the last request. It was invoked during post_read_request, the first HTTP request's phase. In mod_perl 2.0, pre_connection is the earliest phase, so if we want to make sure that all modified Perl modules are reloaded for any protocols and their phases, it's best to set the scope of the Perl interpreter to the lifetime of the connection via:

PerlInterpScope connection

and invoke the Apache::Reload handler during the pre_connection phase. However, this development-time advantage can become a disadvantage in production—for example, if a connection handled by the HTTP protocol is configured as KeepAlive and there are several requests coming on the same connection (one handled by mod_perl and the others by the default image handler), the Perl interpreter won't be available to other threads while the images are being served.

Apache will continue executing all handlers registered for this phase until the first handler returns something other than Apache::OK or Apache::DECLINED.

The PerlPreConnectionHandler directive may appear in the main configuration files and within <VirtualHost> sections.

A pre_connection handler accepts a connection record and a socket object as its arguments:

sub handler {
    my ($c, $socket) = @_;
    # ...
    return Apache::OK;
}

25.2.2.2. PerlProcessConnectionHandler

The process_connection phase is used to process incoming connections. Only protocol modules should assign handlers for this phase, as it gives them an opportunity to replace the standard HTTP processing with processing for some other protocol (e.g., POP3, FTP, etc.).

Apache will continue executing all handlers registered for this phase until the first handler returns something other than Apache::DECLINED.
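Here the rule is "first handler that doesn't decline wins," so a protocol handler that returns Apache::DECLINED leaves the connection to the next one. The sketch below models that in plain Perl; run_first_sketch( ), the hash used as a connection record, and the port check are all assumptions made for illustration, and OK and DECLINED are local stand-ins for the Apache:: constants.

```perl
use strict;
use warnings;

# Local stand-ins for Apache::OK and Apache::DECLINED
use constant { OK => 0, DECLINED => -1 };

# Call the handlers in order and stop at the first one that returns
# anything other than DECLINED
sub run_first_sketch {
    my ($handlers, $conn) = @_;
    for my $handler (@$handlers) {
        my $status = $handler->($conn);
        return $status unless $status == DECLINED;
    }
    return DECLINED;
}

my @served_by;
my @handlers = (
    # A protocol handler that only takes connections on port 8084
    sub {
        my ($conn) = @_;
        return DECLINED unless $conn->{local_port} == 8084;
        push @served_by, 'eliza';
        return OK;
    },
    # Fallback standing in for the standard HTTP processing
    sub { push @served_by, 'http'; return OK },
);

run_first_sketch(\@handlers, { local_port => 8084 });
run_first_sketch(\@handlers, { local_port => 80 });
print "served by: @served_by\n";
```

In a real configuration the choice is normally made by placing the directive inside the appropriate <VirtualHost> section rather than by testing the port in code.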

The PerlProcessConnectionHandler directive may appear in the main configuration files and within <VirtualHost> sections.

The process_connection handler can be written in two ways. The first way is to manipulate bucket brigades, in a way very similar to the filters. The second, simpler way is to bypass all the filters and to read from and write to the connection socket directly.

A process_connection handler accepts a connection record object as its only argument:

sub handler {
    my ($c) = @_;
    # ...
    return Apache::OK;
}

Now let's look at two examples of connection handlers. The first uses the connection socket to read and write the data, and the second uses bucket brigades to accomplish the same thing and allow the connection filters to do their work.

25.2.2.2.1. Socket-based protocol module

To demonstrate the workings of a protocol module, we'll take a look at the Book::Eliza module, which sends the data read from the client as input to Chatbot::Eliza, which in turn implements a mock Rogerian psychotherapist and forwards the response from the psychotherapist back to the client. In this module we will use the implementation that works directly with the connection socket and therefore bypasses any connection filters.

A protocol handler is configured using the PerlProcessConnectionHandler directive, and we will use the Listen and <VirtualHost> directives to bind to the nonstandard port 8084:

Listen 8084
<VirtualHost _default_:8084>
    PerlModule                   Book::Eliza
    PerlProcessConnectionHandler Book::Eliza
</VirtualHost>

Book::Eliza is then enabled when starting Apache:

panic% httpd

And we give it a whirl:

panic% telnet localhost 8084
Trying 127.0.0.1...
Connected to localhost (127.0.0.1).
Escape character is '^]'.
Hello Eliza
How do you do. Please state your problem.

How are you?
Oh, I?

Why do I have core dumped?
You say Why do you have core dumped?

I feel like writing some tests today, you?
I'm not sure I understand you fully.

Good bye, Eliza
Does talking about this bother you?

Connection closed by foreign host.

The code is shown in Example 25-3.

Example 25-3. Book/Eliza.pm

package Book::Eliza;

use strict;
use warnings FATAL => 'all';

use Apache::Connection ( );
use APR::Socket ( );

require Chatbot::Eliza;

use Apache::Const -compile => 'OK';

use constant BUFF_LEN => 1024;

my $eliza = new Chatbot::Eliza;

sub handler {
    my $c = shift;
    my $socket = $c->client_socket;

    my $buff;
    my $last = 0;
    while (1) {
        my($rlen, $wlen);
        $rlen = BUFF_LEN;
        $socket->recv($buff, $rlen);
        last if $rlen <= 0;

        # \r is sent instead of \n if the client is talking over telnet
        $buff =~ s/[\r\n]*$//;
        $last++ if $buff =~ /good bye/i;
        $buff = $eliza->transform( $buff ) . "\n\n";
        $socket->send($buff, length $buff);
        last if $last;
    }

    Apache::OK;
}
1;

The example handler starts with the standard package declaration and, of course, use strict;. As with all Perl*Handlers, the subroutine name defaults to handler. However, in the case of a protocol handler, the first argument is not a request_rec, but a conn_rec blessed into the Apache::Connection class. We have direct access to the client socket via Apache::Connection's client_socket( ) method, which returns an object blessed into the APR::Socket class.

Inside the read/send loop, the handler attempts to read BUFF_LEN bytes from the client socket into the $buff buffer. The $rlen parameter will be set to the number of bytes actually read. The APR::Socket::recv( ) method returns an APR status value, but we need only check the read length to break out of the loop if it is less than or equal to 0 bytes. The handler also breaks the loop after processing an input including the "good bye" string.

Otherwise, if the handler receives some data, it sends this data to the $eliza object (which represents the psychotherapist), whose returned text is then sent back to the client with the APR::Socket::send( ) method. When the read/send loop is finished, the handler returns Apache::OK, telling Apache to terminate the connection. As mentioned earlier, since this handler works directly with the connection socket, no filters can be applied.
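The per-message work done inside the loop (stripping the line ending, spotting "good bye", building the reply) can be isolated from the socket calls and tried on its own. In this sketch respond_to( ) is a hypothetical helper, and transform_stub( ) is a trivial stand-in for $eliza->transform( ) so that Chatbot::Eliza is not needed:

```perl
use strict;
use warnings;

# Trivial stand-in for $eliza->transform(); it only echoes its input
sub transform_stub {
    my ($text) = @_;
    return "You said: $text";
}

# Turn one chunk read from the client into a reply, and report
# whether the session should end after the reply is sent
sub respond_to {
    my ($buff) = @_;
    $buff =~ s/[\r\n]*$//;                        # drop trailing CR/LF
    my $is_last = $buff =~ /good bye/i ? 1 : 0;   # case-insensitive
    my $reply   = transform_stub($buff) . "\n\n";
    return ($reply, $is_last);
}

for my $raw ("Hello Eliza\r\n", "Good bye, Eliza\r\n") {
    my ($reply, $is_last) = respond_to($raw);
    print $reply;
    print "(closing connection)\n" if $is_last;
}
```

Note that the reply to the closing message is still generated and sent; the loop ends only afterward, which matches the telnet session shown earlier.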

25.2.2.2.2. Bucket brigade-based protocol module

Now let's look at the same module, but this time implemented by manipulating bucket brigades. It runs its output through a connection output filter that turns all uppercase characters into their lowercase equivalents.

The following configuration defines a <VirtualHost> listening on port 8085 that enables the Book::Eliza2 connection handler, which will run its output through the Book::Eliza2::lowercase_filter filter:

Listen 8085
<VirtualHost _default_:8085>
    PerlModule                   Book::Eliza2
    PerlProcessConnectionHandler Book::Eliza2
    PerlOutputFilterHandler      Book::Eliza2::lowercase_filter
</VirtualHost>

As before, we start the httpd server:

panic% httpd

and try the new connection handler in action:

panic% telnet localhost 8085
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
Hello Eliza!
hi. what seems to be your problem?

Problem? I don't have any problems ;)
does that trouble you?

Not at all, I don't like problems.
i'm not sure i understand you fully.

I said that I don't like problems.
that is interesting. please continue.

You are boring :(
does it please you to believe i am boring?

Yes, yes!
please tell me some more about this.

Good bye!
i'm not sure i understand you fully.

Connection closed by foreign host.

As you can see, the response, which normally is a mix of upper- and lowercase words, now is all in lowercase, because of the output filter. The implementation of the connection and the filter handlers is shown in Example 25-4.

Example 25-4. Book/Eliza2.pm

package Book::Eliza2;

use strict;
use warnings FATAL => 'all';

use Apache::Connection ( );
use APR::Bucket ( );
use APR::Brigade ( );
use APR::Util ( );

require Chatbot::Eliza;

use APR::Const -compile => qw(SUCCESS EOF);
use Apache::Const -compile => qw(OK MODE_GETLINE);

my $eliza = new Chatbot::Eliza;

sub handler {
    my $c = shift;

    my $bb_in  = APR::Brigade->new($c->pool, $c->bucket_alloc);
    my $bb_out = APR::Brigade->new($c->pool, $c->bucket_alloc);
    my $last = 0;

    while (1) {
        my $rv = $c->input_filters->get_brigade($bb_in, 
                                                Apache::MODE_GETLINE);

        if ($rv != APR::SUCCESS or $bb_in->empty) {
            my $error = APR::strerror($rv);
            unless ($rv == APR::EOF) {
                warn "[eliza] get_brigade: $error\n";
            }
            $bb_in->destroy;
            last;
        }

        while (!$bb_in->empty) {
            my $bucket = $bb_in->first;

            $bucket->remove;

            if ($bucket->is_eos) {
                $bb_out->insert_tail($bucket);
                last;
            }

            my $data;
            my $status = $bucket->read($data);
            return $status unless $status == APR::SUCCESS;

            if ($data) {
                $data =~ s/[\r\n]*$//;
                $last++ if $data =~ /good bye/i;
                $data = $eliza->transform( $data ) . "\n\n";
                $bucket = APR::Bucket->new($data);
            }

            $bb_out->insert_tail($bucket);
        }

        my $b = APR::Bucket::flush_create($c->bucket_alloc);
        $bb_out->insert_tail($b);
        $c->output_filters->pass_brigade($bb_out);
        last if $last;
    }

    Apache::OK;
}

use base qw(Apache::Filter);
use constant BUFF_LEN => 1024;

sub lowercase_filter : FilterConnectionHandler {
    my $filter = shift;

    while ($filter->read(my $buffer, BUFF_LEN)) {
        $filter->print(lc $buffer);
    }

    return Apache::OK;
}

1;

For the purpose of explaining how this connection handler works, we are going to simplify the handler. The whole handler can be represented by the following pseudocode:

while ($bb_in = get_brigade( )) {
    while ($bucket_in = $bb_in->get_bucket( )) {
        my $data = $bucket_in->read( );
        $data = transform($data);
        $bucket_out = new_bucket($data);

        $bb_out->insert_tail($bucket_out);
    }
    $bb_out->insert_tail($flush_bucket);
    pass_brigade($bb_out);
}

The handler receives the incoming data via bucket brigades, one at a time, in a loop. It then processes each brigade by retrieving the buckets contained in it, reading in the data, transforming that data, creating new buckets using the transformed data, and attaching them to the outgoing brigade. When all the buckets from the incoming bucket brigade are transformed and attached to the outgoing bucket brigade, a flush bucket is created and added as the last bucket, so when the outgoing bucket brigade is passed out to the outgoing connection filters, it will be sent to the client right away, not buffered.

If you look at the complete handler, the loop is terminated when one of the following conditions occurs: an error happens, the end-of-stream bucket has been seen (i.e., there's no more input at the connection), or the received data contains the string "good bye". As you saw in the demonstration, we used the string "good bye" to terminate our shrink's session.

We will skip the filter discussion here, since we are going to talk in depth about filters in the following sections. All you need to know at this stage is that the data sent from the connection handler is filtered by the outgoing filter, which transforms it to be all lowercase.
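To make the data flow concrete without a running server, the pseudocode can be turned into a plain-Perl simulation. Everything in it is an assumption made for illustration: a brigade is modeled as an array of hashes, transform_stub( ) stands in for $eliza->transform( ), lowercase_stage( ) plays the role of the output filter, and an end-of-stream bucket ends the loop directly (the real handler ends on the following get_brigade( ) call).

```perl
use strict;
use warnings;

# Simulated buckets: plain hashes tagged with a type
sub data_bucket  { return { type => 'data', data => $_[0] } }
sub flush_bucket { return { type => 'flush' } }

# Stand-in for $eliza->transform()
sub transform_stub { return "You Said: $_[0]" }

# Plays the role of the output filter: lowercase the data buckets
# and pass every other bucket through untouched
sub lowercase_stage {
    return map {
        $_->{type} eq 'data' ? data_bucket(lc $_->{data}) : $_
    } @_;
}

# $incoming: list of brigades from the client; $sent: what went out
sub handle_connection_sketch {
    my ($incoming, $sent) = @_;
    for my $bb_in (@$incoming) {
        my (@bb_out, $last);
        for my $bucket (@$bb_in) {
            if ($bucket->{type} eq 'eos') {      # no more input
                push @bb_out, $bucket;
                $last = 1;
                last;
            }
            (my $line = $bucket->{data}) =~ s/[\r\n]*$//;
            $last = 1 if $line =~ /good bye/i;
            push @bb_out, data_bucket(transform_stub($line) . "\n\n");
        }
        push @bb_out, flush_bucket();            # send without buffering
        push @$sent, lowercase_stage(@bb_out);
        last if $last;
    }
}

my @incoming = (
    [ data_bucket("Hello Eliza\r\n") ],
    [ data_bucket("Good bye\r\n") ],
    [ data_bucket("never processed\r\n") ],
);
my @sent;
handle_connection_sketch(\@incoming, \@sent);
print $_->{type} eq 'data' ? $_->{data} : "<$_->{type}>\n" for @sent;
```

Each outgoing brigade ends with a flush bucket, and the third incoming brigade is never touched because the loop stops after the "good bye" message.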

25.2.3. HTTP Request Phases

The HTTP request phases themselves have not changed from mod_perl 1.0, except that the PerlHandler directive has been renamed PerlResponseHandler to better match the corresponding Apache phase name (response).

The only difference is that now it's possible to register HTTP request input and output filters, so PerlResponseHandler will filter its input and output through them. Figure 25-3 depicts the HTTP request cycle, which should be familiar to mod_perl 1.0 users, with the new addition of the request filters. From the diagram you can also see that the request filters are stacked on top of the connection filters. The request input filters filter only a request body, and the request output filters filter only a response body. Request and response headers can be accessed and modified using the $r->headers_in, $r->headers_out, and other methods.