6.5.1. $^T and time( )
Under mod_perl, processes don't quit after serving a
single request. Thus, $^T
gets initialized to the server startup time and retains this value
throughout the process's life. Even if you
don't use this variable directly,
it's important to know that Perl refers to the value
of $^T internally.
For example, Perl uses $^T with the
-M, -C, or
-A file test operators. As a result, files created
after the child server's startup are reported as
having a negative age when using those operators.
-M returns the age of the script file relative to
the value of the $^T special variable.
If you want to have -M report the
file's age relative to the current request, reset
$^T, just as in any other Perl script. Add the
following line at the beginning of your scripts:
local $^T = time;
You can also do:
local $^T = $r->request_time;
The second technique is better performance-wise, as it skips the
time( ) system call and uses the timestamp of the
request's start time, available via the
$r->request_time method.
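The effect is easy to reproduce outside the server. The following is a minimal sketch, assuming a plain command-line Perl rather than mod_perl: it simulates an old startup time by localizing $^T, creates a temporary file, and compares what -M reports before and after the reset. The one-hour offset and the variable names are our own illustrative assumptions; under mod_perl, $r->request_time could be used in place of time.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Pretend the process started an hour ago, as a long-lived child might have.
local $^T = time - 3600;

# Create a file "now", that is, after the simulated startup.
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "data\n";
close $fh or die "Cannot close $filename: $!";

# -M is computed as ($^T - mtime) / 86400, so the age comes out negative.
my $stale_age = -M $filename;

my $fresh_age;
{
    # Reset the base time for this scope only.
    local $^T = time;
    $fresh_age = -M $filename;
}

printf "relative to startup: %.4f days\n", $stale_age;
printf "relative to now:     %.4f days\n", $fresh_age;
```

The first figure is negative because the file is younger than the simulated startup time; the second is close to zero.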
If this correction needs to be applied to a lot of handlers, a more
scalable solution is to specify a fixup handler, which will be
executed during the fixup stage:
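sub Apache::PerlBaseTime::handler {
    # reset the base time to the start of the current request
    $^T = shift->request_time;
    # decline, so that any other fixup handlers still get to run
    return Apache::Constants::DECLINED;
}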
and then add the following line to httpd.conf:
PerlFixupHandler Apache::PerlBaseTime
Now no modifications to the content-handler code and scripts need to
be performed.
6.5.2. Command-Line Switches
When a Perl script is run from the command line, the shell invokes the
Perl interpreter via the #!/bin/perl directive, which is the first line of
the script (sometimes referred to as the shebang
line). In scripts running under mod_cgi, you may use Perl
switches as described in the perlrun manpage,
such as -w, -T, or
-d. Under the Apache::Registry family of handlers, all switches
except -w are ignored (and use of the
-T switch triggers a warning). The support for
-w was added for backward compatibility with
mod_cgi.
Most command-line switches have special Perl variable equivalents
that allow them to be set/unset in code. Consult the
perlvar manpage for more details.
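As a rough illustration, the sketch below reads the variables that mirror a few common switches: $^W for -w, $^P for -d, $^D for -D, and the read-only ${^TAINT} for -T. It assumes Perl 5.8 or later (for ${^TAINT}); the helper name switch_state and its hash keys are ours, not part of any API.

```perl
use strict;

# Report the current state of the variables that mirror some switches.
# switch_state() is a hypothetical helper written for this example.
sub switch_state {
    return (
        w => ($^W       ? 1 : 0),   # warnings, as set by -w
        T => (${^TAINT} ? 1 : 0),   # taint mode, read-only reflection of -T
        d => ($^P       ? 1 : 0),   # debugger flags, as set by -d
        D => ($^D       ? 1 : 0),   # internal debugging flags, as set by -D
    );
}

my %outer = switch_state();
print "$_: $outer{$_}\n" for sort keys %outer;

my %inner;
{
    # Unlike the taint flag, $^W can be changed from code.
    local $^W = 1;
    %inner = switch_state();
}
print "warnings inside the block: $inner{w}\n";
```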
mod_perl provides its own equivalents to -w and
-T in the form of configuration directives, as
we'll discuss presently.
Finally, if you still need to set additional Perl startup flags, such
as -d and -D, you can use
the PERL5OPT environment variable. Switches in
this variable are treated as if they were on every Perl command line.
According to the perlrun manpage, only the
-[DIMUdmw] switches are allowed.
6.5.2.1. Warnings
There are three ways to enable warnings:
- Globally to all processes
In httpd.conf, set:
PerlWarn On
You can then fine-tune your code, turning warnings off and on by
setting the $^W variable in your scripts.
- Locally to a script
Including the following line:
#!/usr/bin/perl -w
will turn warnings on for the scope of the script. You can turn them
off and on in the script by setting the $^W
variable, as noted above.
- Locally to a block
This code turns warnings on for the scope of the block:
{
local $^W = 1;
# some code
}
# $^W assumes its previous value here
This turns warnings off:
{
local $^W = 0;
# some code
}
# $^W assumes its previous value here
If $^W isn't properly localized, this code will affect the
current request and all subsequent requests processed by this child.
Thus:
$^W = 0;
will turn the warnings off, no matter what.
If you want to turn warnings on for the scope of the whole file, as
in the previous item, you can do this by adding:
local $^W = 1;
at the beginning of the file. Since a file is effectively a block,
file scope behaves like a block's curly braces
({ }), and local $^W at the
start of the file will be effective for the whole file.
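The difference between a localized and a plain assignment can be seen in a few lines. This is a minimal sketch run as a standalone script; careless_handler( ) is a hypothetical routine standing in for code that forgets to use local.

```perl
use strict;

my @trace;

$^W = 0;                  # baseline: warnings off
push @trace, $^W;

{
    local $^W = 1;        # on for the duration of this block only
    push @trace, $^W;
}
push @trace, $^W;         # the previous value has been restored

# Hypothetical handler that assigns to $^W without localizing it.
sub careless_handler { $^W = 1 }
careless_handler();
push @trace, $^W;         # the change has leaked out of the subroutine

print "@trace\n";         # prints "0 1 0 1"
```

Under mod_perl the last value would stay in effect for every later request served by the same child.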
While having warnings mode turned on is essential for a development
server, you should turn it globally off on a production server.
Having warnings enabled introduces a non-negligible performance
penalty. Also, if every request served generates one warning, and
your server processes millions of requests per day, the
error_log file will eat up all your disk space
and the system won't be able to function normally
anymore.
Perl 5.6.x introduced the warnings pragma,
which allows very flexible control over warnings. This pragma allows
you to enable and disable groups of warnings. For example, to enable
only the syntax warnings, you can use:
use warnings 'syntax';
Later in the code, if you want to disable syntax warnings and enable
signal-related warnings, you can use:
no warnings 'syntax';
use warnings 'signal';
But usually you just want to use:
use warnings;
which is the equivalent of:
use warnings 'all';
If you want your code to be really
clean and consider all warnings
as errors, Perl will help you to do that. With the following code,
any warning in the lexical scope of the definition will trigger a
fatal error:
use warnings FATAL => 'all';
Of course, you can fine-tune the groups of warnings and make only
certain groups of warnings fatal. For example, to make only closure
problems fatal, you can use:
use warnings FATAL => 'closure';
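To see a fatal warning in action, here is a small sketch. It uses the uninitialized category rather than closure, simply because that warning is easy to trigger on demand; format_label( ) is a hypothetical function invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: inside it, "uninitialized" warnings are fatal.
sub format_label {
    my ($name) = @_;
    use warnings FATAL => 'uninitialized';
    return "label: " . $name;    # dies if $name is undef
}

my $ok    = eval { format_label('disk') };
my $bad   = eval { format_label(undef) };
my $error = $@;

print "ok:    $ok\n";
print "error: $error" unless defined $bad;
```

The second call never returns a value: the warning is raised as an exception, which eval { } catches and leaves in $@.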
Using the warnings pragma, you can also disable
warnings locally:
{
no warnings;
# some code that would normally emit warnings
}
In this way, you can avoid some warnings that you are aware of but
can't do anything about.
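The following sketch shows that the suppression really is limited to the enclosing block. It assumes a __WARN__ signal handler is installed only so that the warnings can be counted; the variable names are ours.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };

my $missing;    # deliberately left undefined

{
    no warnings 'uninitialized';
    my $quiet = "value: " . $missing;    # silent inside this block
}

my $loud = "value: " . $missing;         # warns: outside the block

print scalar(@caught), " warning(s) caught\n";    # prints "1 warning(s) caught"
```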
For more information about the warnings pragma,
refer to the perllexwarn manpage.
6.5.2.2. Taint mode
Perl's -T switch enables taint mode.
In taint mode, Perl performs some checks on
how your program is using the data passed to it. For example, taint
checks prevent your program from passing some external data to a
system call without this data being explicitly checked for nastiness,
thus avoiding a fairly large number of common security holes. If you
don't force all your scripts and handlers to run
under taint mode, it's more likely that
you'll leave some holes to be exploited by malicious
users. (See Chapter 23 and the
perlsec manpage for more information. Also read
the re pragma's manpage.)
Since the -T switch can't be
turned on from within Perl (this is because when Perl is running,
it's already too late to mark
all external data as tainted), mod_perl provides the
PerlTaintCheck directive to turn on taint checks
globally. Enable this mode with:
PerlTaintCheck On
anywhere in httpd.conf (though
it's better to place it as early as possible for
clarity).
For more information on taint checks and how to untaint data, refer
to the perlsec manpage.
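As a reminder of what untainting looks like, here is a minimal sketch. It relies on the rule from perlsec that only data extracted through a regex capture is considered clean. The function name untaint_username( ) and the accepted character set are assumptions made for this example; a real application should choose a pattern that fits its own data. The code runs with or without taint mode, and Scalar::Util's tainted( ) simply reports false when taint mode is off.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical validator: accept 1 to 32 word characters, reject the rest.
# Returning the captured $1 (not the original string) is what untaints.
sub untaint_username {
    my ($input) = @_;
    return unless defined $input;
    if ($input =~ /\A([A-Za-z0-9_]{1,32})\z/) {
        return $1;
    }
    return;
}

for my $candidate ('alice', 'alice; rm -rf /') {
    my $clean = untaint_username($candidate);
    if (defined $clean) {
        printf "accepted '%s' (tainted: %s)\n",
            $clean, tainted($clean) ? 'yes' : 'no';
    }
    else {
        print "rejected '$candidate'\n";
    }
}
```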
6.5.3. Compiled Regular Expressions
When using a regular expression containing an interpolated Perl variable
that you are confident will not change during the execution of the
program, a standard speed-optimization technique is to add the
/o modifier to the regex pattern.
This compiles the regular expression once, for the entire lifetime of
the script, rather than every time the pattern is executed. Consider:
my $pattern = '^\d+$'; # likely to be input from an HTML form field
foreach (@list) {
print if /$pattern/o;
}
This is usually a big win in loops over lists, or when using the
grep( ) or map( ) operators.
In long-lived mod_perl scripts and handlers, however, the variable
may change with each invocation. In that case, this caching of the
compiled pattern can
pose a problem. The first request processed by a fresh mod_perl child
process will compile the regex and perform the search correctly.
However, all subsequent requests running the same code in the same
process will use the cached pattern and not the fresh one supplied
by users. The code will appear to be broken.
Imagine that you run a search engine service, and one person enters a
search keyword of her choice and finds what she's
looking for. Then another person who happens to be served by the same
process searches for a different keyword, but unexpectedly receives
the same search results as the previous person.
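The problem does not need a server to show up; calling the same code twice within one process is enough. In this sketch, search_with_o( ) is a hypothetical routine standing in for a handler, and the two calls play the role of two requests served by the same child.

```perl
use strict;
use warnings;

# Hypothetical "handler": the /o modifier freezes the pattern on first use.
sub search_with_o {
    my ($pattern, @list) = @_;
    return grep { /$pattern/o } @list;
}

my @items = qw(12 ab 34 cd);

my @first  = search_with_o('^\d+$',    @items);   # first "request"
my @second = search_with_o('^[a-z]+$', @items);   # second "request"

print "first:  @first\n";    # prints "first:  12 34"
print "second: @second\n";   # prints "second: 12 34" (stale pattern)
```

The second call asks for lowercase words but still gets the digits, because the regex compiled during the first call is reused.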
There are two solutions to this problem.
The first solution is to use the eval
q// construct to force the code to be
recompiled each time it's run. It's
important that the eval string covers the entire
processing loop, not just the pattern match itself.
The original code fragment would be rewritten as:
my $pattern = '^\d+$';
eval q{
    foreach (@list) {
        print if /$pattern/o;
    }
};
die $@ if $@; # report any error raised by the eval'd code
If we were to write this:
foreach (@list) {
eval q{ print if /$pattern/o; };
}
the regex would be compiled for every element in the list, instead of
just once for the entire loop over the list (and the
/o modifier would essentially be useless).
However, watch out for using strings coming from an untrusted origin
inside eval—they might contain Perl code
dangerous to your system, so make sure to sanity-check them first.
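Wrapped in a subroutine, the eval-based fix might look like the sketch below; search_with_eval( ) is a hypothetical name. Note that q{ } does not interpolate, so the contents of $pattern are never pasted into the code string. The eval'd code refers to the variable itself, and only the regex engine sees the user's text.

```perl
use strict;
use warnings;

# Hypothetical "handler": the string eval is recompiled on every call,
# so /o freezes the pattern only for the duration of one call.
sub search_with_eval {
    my ($pattern, @list) = @_;
    my @hits;
    eval q{
        @hits = grep { /$pattern/o } @list;
        1;
    } or die "search failed: $@";
    return @hits;
}

my @items = qw(12 ab 34 cd);

my @first  = search_with_eval('^\d+$',    @items);
my @second = search_with_eval('^[a-z]+$', @items);

print "first:  @first\n";    # prints "first:  12 34"
print "second: @second\n";   # prints "second: ab cd"
```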
This approach can be used if there is more than one pattern-match
operator in a given section of code. If the section contains only one
regex operator (be it m// or
s///), you can rely on the property of the
null pattern, which reuses the last pattern
seen. This leads to the second solution, which also eliminates the
use of eval.
The above code fragment becomes:
my $pattern = '^\d+$';
"0" =~ /$pattern/; # dummy match that must not fail!
foreach (@list) {
print if //;
}
The only caveat is that the dummy match that boots the regular
expression engine must succeed; otherwise
the pattern will not be cached, and the // will
match everything. If you can't count on fixed text
to ensure the match succeeds, you have two options.
If you can guarantee that the pattern variable contains no
metacharacters (such as *, +,
^, $, \d,
etc.), you can use the dummy match of the pattern itself:
$pattern =~ /\Q$pattern\E/; # guaranteed if no metacharacters present
The \Q modifier ensures that any special regex
characters will be escaped.
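Here is a sketch of this first option applied to a plain keyword search, where the user's input is meant to be taken literally. The name search_literal_keyword( ) is hypothetical. The empty-keyword check is our own addition: an empty keyword would turn the priming match itself into a null pattern.

```perl
use strict;
use warnings;

# Hypothetical "handler": prime the null pattern with the quoted keyword.
sub search_literal_keyword {
    my ($keyword, @list) = @_;
    return unless defined $keyword && length $keyword;

    # A string always matches its own \Q-quoted form, so this succeeds.
    $keyword =~ /\Q$keyword\E/
        or die "priming match unexpectedly failed";

    my @hits;
    foreach (@list) {
        push @hits, $_ if //;    # reuses the last successful pattern
    }
    return @hits;
}

my @titles = ('mod_perl guide', 'apache', 'perl5');

my @first  = search_literal_keyword('perl',   @titles);
my @second = search_literal_keyword('apache', @titles);

print join(', ', @first),  "\n";   # prints "mod_perl guide, perl5"
print join(', ', @second), "\n";   # prints "apache"
```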
If there is a possibility that the pattern contains metacharacters,
you should match the pattern itself, or the nonsearchable
\377 character, as follows:
"\377" =~ /$pattern|^\377$/; # guaranteed if metacharacters present
6.5.3.1. Matching patterns repeatedly
Another technique may also be used,
depending on the complexity of the regex to which it is applied. One
common situation in which a compiled regex is usually more efficient
is when you are matching any one of a group of patterns over and over
again.
To make this approach easier to use, we'll use a
slightly modified helper routine from Jeffrey
Friedl's book Mastering Regular
Expressions (O'Reilly):
sub build_match_many_function {
    my @list = @_;
    my $expr = join '||',
        map { "\$_[0] =~ m/\$list[$_]/o" } (0..$#list);
    my $matchsub = eval "sub { $expr }";
    die "Failed in building regex @list: $@" if $@;
    return $matchsub;
}
This function accepts a list of patterns as an argument, builds a
match regex for each item in the list against
$_[0], and uses the logical ||
(OR) operator to stop the matching when the first match succeeds. The
chain of pattern matches is then placed into a string and compiled
within an anonymous subroutine using eval. If
eval fails, the code aborts with die( );
otherwise, a reference to this subroutine is returned to
the caller.
Here is how it can be used:
my @agents = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
my $known_agent_sub = build_match_many_function(@agents);
while (<ACCESS_LOG>) {
my $agent = get_agent_field($_);
warn "Unknown Agent: $agent\n"
unless $known_agent_sub->($agent);
}
This code takes lines of log entries from the
access_log file already opened on the
ACCESS_LOG file handle, extracts the agent field
from each entry in the log file, and tries to match it against the
list of known agents. Every time the match fails, it prints a warning
with the name of the unknown agent.
An alternative approach is to use the
qr// operator, which is used to compile a
regex. The previous example can be rewritten as:
my @agents = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
my @compiled_re = map qr/$_/, @agents;
while (<ACCESS_LOG>) {
my $agent = get_agent_field($_);
my $ok = 0;
for my $re (@compiled_re) {
$ok = 1, last if $agent =~ $re; # match the agent field, not the whole log line
}
warn "Unknown Agent: $agent\n"
unless $ok;
}
In this code, we compile the patterns once before we use them,
similar to build_match_many_function( ) from the
previous example, but now we save an extra call to a subroutine. A
simple benchmark shows that this example is about 2.5 times faster
than the previous one.
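To compare the two approaches on your own system, you could use something like the following sketch, which is not the benchmark referred to above. It assumes a handful of made-up agent strings in place of a real access_log, and the names @sample, unknown_via_sub( ), and unknown_via_qr( ) are ours. Timings depend on the Perl version and the data, so treat the numbers as indicative only and raise the iteration count for a more reliable result.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @agents = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);

# Hypothetical agent strings standing in for fields parsed from access_log.
my @sample = ('Mozilla/4.0', 'Lynx/2.8.5', 'UnknownBot/1.0', 'libwww-perl/5.8');

# Helper routine as given in the text.
sub build_match_many_function {
    my @list = @_;
    my $expr = join '||',
        map { "\$_[0] =~ m/\$list[$_]/o" } (0..$#list);
    my $matchsub = eval "sub { $expr }";
    die "Failed in building regex @list: $@" if $@;
    return $matchsub;
}

my $known_agent_sub = build_match_many_function(@agents);
my @compiled_re     = map { qr/$_/ } @agents;

# Approach 1: one generated subroutine per list of patterns.
sub unknown_via_sub {
    return grep { !$known_agent_sub->($_) } @sample;
}

# Approach 2: a list of precompiled qr// objects.
sub unknown_via_qr {
    my @unknown;
    AGENT: for my $agent (@sample) {
        for my $re (@compiled_re) {
            next AGENT if $agent =~ $re;
        }
        push @unknown, $agent;
    }
    return @unknown;
}

cmpthese(20_000, {
    eval_sub => sub { my @u = unknown_via_sub() },
    qr_list  => sub { my @u = unknown_via_qr() },
});
```

Both routines should report the same unknown agents; only the time they take differs.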