Perl Specifics in the mod_perl Environment (Practical mod

6.4. Perl Specifics in the mod_perl Environment

In the following sections, we discuss the specifics of Perl's behavior under mod_perl.

6.4.1. exit( )

Perl's core exit( ) function shouldn't be used in mod_perl code. Calling it causes the mod_perl process to exit, which defeats the purpose of using mod_perl. The Apache::exit( ) function should be used instead. Starting with Perl Version 5.6.0, mod_perl overrides exit( ) behind the scenes using CORE::GLOBAL::, a new magical package.

The CORE:: Package

CORE:: is a special package that provides access to Perl's built-in functions. You may need to use this package to override some of the built-in functions. For example, if you want to override the exit( ) built-in function, you can do so with:

use subs qw(exit); exit( ) if $DEBUG; sub exit { warn "exit( ) was called"; }

Now when you call exit( ) in the same scope in which it was overridden, the program won't exit, but instead will just print a warning "exit( ) was called". If you want to use the original built-in function, you can still do so with:

# the 'real' exit CORE::exit( );

Apache::Registry and Apache::PerlRun override exit( ) with Apache::exit( ) behind the scenes; therefore, scripts running under these modules don't need to be modified to use Apache::exit( ).

If CORE::exit( ) is used in scripts running under mod_perl, the child will exit, but the current request won't be logged. More importantly, a proper exit won't be performed. For example, if there are some database handles, they will remain open, causing costly memory and (even worse) database connection leaks.

If the child process needs to be killed, Apache::exit(Apache::Constants::DONE)should be used instead. This will cause the server to exit gracefully, completing the logging functions and protocol requirements.

If the child process needs to be killed cleanly after the request has completed, use the $r->child_terminate method. This method can be called anywhere in the code, not just at the end. This method sets the value of the MaxRequestsPerChild configuration directive to 1 and clears the keepalive flag. After the request is serviced, the current connection is broken because of the keepalive flag, which is set to false, and the parent tells the child to cleanly quit because MaxRequestsPerChild is smaller than or equal to the number of requests served.

In an Apache::Registryscript you would write:

Apache->request->child_terminate;

and in httpd.conf:

PerlFixupHandler "sub { shift->child_terminate }"

You would want to use the latter example only if you wanted the child to terminate every time the registered handler was called. This is probably not what you want.

You can also use a post-processing handler to trigger child termination. You might do this if you wanted to execute your own cleanup code before the process exits:

my $r = shift;
$r->post_connection(\&exit_child);

sub exit_child {
    # some logic here if needed
    $r->child_terminate;
}

This is the code that is used by the Apache::SizeLimit module, which terminates processes that grow bigger than a preset quota.

6.4.2. die( )

die( ) is usually used to abort the flow of the program if something goes wrong. For example, this common idiom is used when opening files:

open FILE, "foo" or die "Cannot open 'foo' for reading: $!";

If the file cannot be opened, the script will die( ): script execution is aborted, the reason for death is printed, and the Perl interpreter is terminated.

You will hardly find any properly written Perl scripts that don't have at least one die( ) statement in them.

CGI scripts running under mod_cgi exit on completion, and the Perl interpreter exits as well. Therefore, it doesn't matter whether the interpreter exits because the script died by natural death (when the last statement in the code flow was executed) or was aborted by a die( )statement.

Under mod_perl, we don't want the process to quit. Therefore, mod_perl takes care of it behind the scenes, and die( ) calls don't abort the process. When die( ) is called, mod_perl logs the error message and calls Apache::exit( ) instead of CORE::die( ). Thus, the script stops, but the process doesn't quit. Of course, we are talking about the cases where the code calling die( ) is not wrapped inside an exception handler (e.g., an eval { } block) that traps die( ) calls, or the $SIG{_ _DIE_ _}sighandler, which allows you to override the behavior of die( ) (see Chapter 21). Section 6.13 at the end of this chapter mentions a few exception-handling modules available from CPAN.

6.4.3. Global Variable Persistence

Under mod_perl a child process doesn't exit after serving a single request. Thus, global variables persist inside the same process from request to request. This means that you should be careful not to rely on the value of a global variable if it isn't initialized at the beginning of each request. For example:

# the very beginning of the script
use strict;
use vars qw($counter);
$counter++;

relies on the fact that Perl interprets an undefined value of $counter as a zero value, because of the increment operator, and therefore sets the value to 1. However, when the same code is executed a second time in the same process, the value of $counter is not undefined any more; instead, it holds the value it had at the end of the previous execution in the same process. Therefore, a cleaner way to code this snippet would be:

use strict;
use vars qw($counter);
$counter = 0;
$counter++;

In practice, you should avoid using global variables unless there really is no alternative. Most of the problems with global variables arise from the fact that they keep their values across functions, and it's easy to lose track of which function modifies the variable and where. This problem is solved by localizing these variables with local( ). But if you are already doing this, using lexical scoping (with my( )) is even better because its scope is clearly defined, whereas localized variables are seen and can be modified from anywhere in the code. Refer to the perlsub manpage for more details. Our example will now be written as:

use strict;
my $counter = 0;
$counter++;

Note that it is a good practice to both declare and initialize variables, since doing so will clearly convey your intention to the code's maintainer.

You should be especially careful with Perl special variables, which cannot be lexically scoped. With special variables, local( ) must be used. For example, if you want to read in a whole file at once, you need to undef( ) the input record separator. The following code reads the contents of an entire file in one go:

open IN, $file or die $!;
$/ = undef;
$content = <IN>; # slurp the whole file in
close IN;

Since you have modified the special Perl variable $/ globally, it'll affect any other code running under the same process. If somewhere in the code (or any other code running on the same server) there is a snippet reading a file's content line by line, relying on the default value of $/ (\n), this code will work incorrectly. Localizing the modification of this special variable solves this potential problem:

{
  local $/; # $/ is undef now
  $content = <IN>; # slurp the whole file in
}

Note that the localization is enclosed in a block. When control passes out of the block, the previous value of $/ will be restored automatically.

6.4.4. STDIN, STDOUT, and STDERR Streams

Under mod_perl, both STDIN and STDOUT are tied to the socket from which the request originated. If, for example, you use a third-party module that prints some output to STDOUT when it shouldn't (for example, control messages) and you want to avoid this, you must temporarily redirect STDOUT to /dev/null. You will then have to restore STDOUT to the original handle when you want to send a response to the client. The following code demonstrates a possible implementation of this workaround:

{
    my $nullfh = Apache::gensym( );
    open $nullfh, '>/dev/null' or die "Can't open /dev/null: $!";
    local *STDOUT = $nullfh;
    call_something_thats_way_too_verbose( );
    close $nullfh;
}

The code defines a block in which the STDOUT stream is localized to print to /dev/null. When control passes out of this block, STDOUT gets restored to the previous value.

STDERR is tied to a file defined by the ErrorLog directive. When native syslog support is enabled, the STDERRstream will be redirected to /dev/null.

6.4.5. Redirecting STDOUT into a Scalar Variable

Sometimes you encounter a black-box function that prints its output to the default file handle (usually STDOUT) when you would rather put the output into a scalar. This is very relevant under mod_perl, where STDOUT is tied to the Apache request object. In this situation, the IO::String package is especially useful. You can re-tie( ) STDOUT (or any other file handle) to a string by doing a simple select( ) on the IO::String object. Call select( ) again at the end on the original file handle to re-tie( ) STDOUT back to its original stream:

my $str;
my $str_fh = IO::String->new($str);

my $old_fh = select($str_fh);
black_box_print( );
select($old_fh) if defined $old_fh;

In this example, a new IO::String object is created. The object is then selected, the black_box_print( ) function is called, and its output goes into the string object. Finally, we restore the original file handle, by re-select( ) ing the originally selected file handle. The $str variable contains all the output produced by the black_box_print( ) function.

6.4.6. print( )

Under mod_perl, CORE::print( ) (using either STDOUT as a filehandle argument or no filehandle at all) will redirect output to Apache::print( ), since the STDOUT file handle is tied to Apache. That is, these two are functionally equivalent:

print "Hello";
$r->print("Hello");

Apache::print( ) will return immediately without printing anything if $r->connection->aborted returns true. This happens if the connection has been aborted by the client (e.g., by pressing the Stop button).

There is also an optimization built into Apache::print( ): if any of the arguments to this function are scalar references to strings, they are automatically dereferenced. This avoids needless copying of large strings when passing them to subroutines. For example, the following code will print the actual value of $long_string:

my $long_string = "A" x 10000000;
$r->print(\$long_string);

To print the reference value itself, use a double reference:

$r->print(\\$long_string);

When Apache::print( )sees that the passed value is a reference, it dereferences it once and prints the real reference value:

SCALAR(0x8576e0c)

6.4.7. Formats

The interface to file handles that are linked to variables with Perl's tie( ) function is not yet complete. The format( ) and write( ) functions are missing. If you configure Perl with sfio, write( ) and format( )should work just fine.

Instead of format( ), you can use printf( ). For example, the following formats are equivalent:

format   printf
---------------
##.##    %2.2f
####.##  %4.2f

To print a string with fixed-length elements, use the printf( ) format %n.ms where n is the length of the field allocated for the string and m is the maximum number of characters to take from the string. For example:

printf "[%5.3s][%10.10s][%30.30s]\n",
       12345, "John Doe", "1234 Abbey Road"

prints:

[  123][  John Doe][                1234 Abbey Road]

Notice that the first string was allocated five characters in the output, but only three were used because m=5 and n=3 (%5.3s). If you want to ensure that the text will always be correctly aligned without being truncated, n should always be greater than or equal to m.

You can change the alignment to the left by adding a minus sign (-) after the %. For example:

printf "[%-5.5s][%-10.10s][%-30.30s]\n",
       123, "John Doe", "1234 Abbey Road"

prints:

[123  ][John Doe  ][1234 Abbey Road                ]

You can also use a plus sign (+) for the right-side alignment. For example:

printf "[%+5s][%+10s][%+30s]\n",
       123, "John Doe", "1234 Abbey Road"

prints:

[  123][  John Doe][                1234 Abbey Road]

Another alternative to format( ) and printf( ) is to use the Text::Reform module from CPAN.

In the examples above we've printed the number 123 as a string (because we used the %s format specifier), but numbers can also be printed using numeric formats. See perldoc -f sprintf for full details.

6.4.8. Output from System Calls

The output of system( ), exec( ), and open(PIPE,"|program") calls will not be sent to the browser unless Perl was configured with sfio. To learn if your version of Perl is sfio-enabled, look at the output of the perl -V command for the useperlio and d_sfio strings.

You can use backticks as a possible workaround:

print `command here`;

But this technique has very poor performance, since it forks a new process. See the discussion about forking in Chapter 10.

6.4.9. BEGIN blocks

Perl executes BEGIN blocks as soon as possible, when it's compiling the code. The same is true under mod_perl. However, since mod_perl normally compiles scripts and modules only once, either in the parent process or just once per child, BEGIN blocks are run only once. As the perlmod manpage explains, once a BEGIN block has run, it is immediately undefined. In the mod_perl environment, this means that BEGIN blocks will not be run during the response to an incoming request unless that request happens to be the one that causes the compilation of the code. However, there are cases when BEGIN blocks will be rerun for each request.

BEGIN blocks in modules and files pulled in via require( ) or use( ) will be executed:

Only once, if pulled in by the parent process.
Once per child process, if not pulled in by the parent process.
One additional time per child process, if the module is reloaded from disk by Apache::StatINC.
One additional time in the parent process on each restart, if PerlFreshRestart is On.
On every request, if the module with the BEGIN block is deleted from %INC, before the module's compilation is needed. The same thing happens when do( ) is used, which loads the module even if it's already loaded.

BEGIN blocks in Apache::Registry scripts will be executed:

Only once, if pulled in by the parent process via Apache::RegistryLoader.
Once per child process, if not pulled in by the parent process.
One additional time per child process, each time the script file changes on disk.
One additional time in the parent process on each restart, if pulled in by the parent process via Apache::RegistryLoader and PerlFreshRestart is On.

Note that this second list is applicable only to the scripts themselves. For the modules used by the scripts, the previous list applies.

6.4.10. END Blocks

As the perlmod manpage explains, an ENDsubroutine is executed when the Perl interpreter exits. In the mod_perl environment, the Perl interpreter exits only when the child process exits. Usually a single process serves many requests before it exits, so END blocks cannot be used if they are expected to do something at the end of each request's processing.

If there is a need to run some code after a request has been processed, the $r->register_cleanup( ) function should be used. This function accepts a reference to a function to be called during the PerlCleanupHandler phase, which behaves just like the END block in the normal Perl environment. For example:

$r->register_cleanup(sub { warn "$$ does cleanup\n" });

or:

sub cleanup { warn "$$ does cleanup\n" };
$r->register_cleanup(\&cleanup);

will run the registered code at the end of each request, similar to END blocks under mod_cgi.

As you already know by now, Apache::Registry handles things differently. It does execute all END blocks encountered during compilation of Apache::Registryscripts at the end of each request, like mod_cgi does. That includes any END blocks defined in the packages use( ) d by the scripts.

If you want something to run only once in the parent process on shutdown and restart, you can use register_cleanup( ) in startup.pl:

warn "parent pid is $$\n";
Apache->server->register_cleanup(
    sub { warn "server cleanup in $$\n" });

This is useful when some server-wide cleanup should be performed when the server is stopped or restarted.