And in fact that's what we see when we execute this
script for the first time. But let's reload it a few
times.... After a few reloads, the counter suddenly stops counting
from 1. As we continue to reload, we see that it keeps on growing,
but not steadily, starting almost randomly at 10, 10, 10, 15, 20...,
which makes no sense at all!
6.2.1. The First Mystery: Why Does the Script Go Beyond 5?
If we look at
the error_log file (we did
enable warnings), we'll see something like this:
Variable "$counter" will not stay shared
at /home/httpd/perl/counter.pl line 13.
This warning is generated when a script contains a named (as opposed
to an anonymous) nested subroutine that refers to a lexically scoped
(with my( )) variable defined outside this nested
subroutine.
Do you see a nested named subroutine in our script? We
don't! What's going on? Maybe
it's a bug in Perl? But wait, maybe the Perl
interpreter sees the script in a different way! Maybe the code goes
through some changes before it actually gets executed? The easiest
way to check what's actually happening is to run the
script with a debugger.
Since we must debug the script when it's being
executed by the web server, a normal debugger won't
help, because the debugger has to be invoked from within the web
server. Fortunately, we can use Doug MacEachern's
Apache::DB module to debug our script. While
Apache::DB allows us to debug the code
interactively (as we will show in Chapter 21), we
will use it noninteractively in this example.
To enable the debugger, modify the
httpd.conf file in the following way:
PerlSetEnv PERLDB_OPTS "NonStop=1 LineInfo=/tmp/db.out AutoTrace=1 frame=2"
PerlModule Apache::DB
<Location /perl>
PerlFixupHandler Apache::DB
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
PerlSendHeader On
</Location>
We have added a debugger configuration setting using the
PERLDB_OPTS
environment
variable, which has the same effect as calling the debugger from the
command line. We have also loaded and enabled
Apache::DB as a
PerlFixupHandler.
In addition, we'll load the Carp
module, using <Perl> sections (this could
also be done in the startup.pl file):
<Perl>
use Carp;
</Perl>
After applying the changes, we restart the server and issue a request
to /perl/counter.pl, as before. On the surface,
nothing has changed; we still see the same output as before. But two
things have happened in the background:
-
The file /tmp/db.out was written, with a
complete trace of the code that was executed.
-
Since we have loaded the Carp module, the
error_log file now contains the real code that
was actually executed. This is produced as a side effect of reporting
the 'Variable "$counter" will not stay shared at...' warning that we
saw earlier.
Here is the code that was actually executed:
package Apache::ROOT::perl::counter_2epl;
use Apache qw(exit);
sub handler {
BEGIN {
$^W = 1;
};
$^W = 1;
use strict;
print "Content-type: text/plain\n\n";
my $counter = 0;
for (1..5) {
increment_counter( );
}
sub increment_counter {
$counter++;
print "Counter is equal to $counter !\n";
}
}
Note that the code in error_log wasn't indented; we've
indented it to make it obvious that the code was wrapped inside the
handler( ) subroutine.
From looking at this code, we learn that every
Apache::Registry script is cached under a package
whose name is formed from the Apache::ROOT::
prefix and the script's URI
(/perl/counter.pl) by replacing all occurrences
of / with :: and
. with _2e.
That's how mod_perl knows which script should be
fetched from the cache on each request: each script is
transformed into a package with a unique name and with a single
subroutine named handler( ), which includes all
the code that was originally in the script.
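To make the naming rule concrete, here is a minimal sketch of the
transformation in plain Perl. The function name uri_to_package is our
own invention for illustration, and the sketch is a simplification: it
escapes every character other than letters, digits, underscores, and
slashes as an underscore followed by its two-digit hex code. The real
Apache::Registry code may handle additional cases.

```perl
use strict;
use warnings;

# Hypothetical helper: derive a package name from a script URI,
# following the rule described in the text (simplified).
sub uri_to_package {
    my ($uri) = @_;
    my $name = $uri;

    # Escape anything that is not a word character or a slash,
    # so "." (hex 2e) becomes "_2e".
    $name =~ s{([^A-Za-z0-9_/])}{sprintf "_%02x", ord $1}ge;

    # Turn path separators into package separators.
    $name =~ s{/}{::}g;

    return "Apache::ROOT$name";
}

print uri_to_package("/perl/counter.pl"), "\n";
# prints "Apache::ROOT::perl::counter_2epl"
```

Because the package name is derived from the URI, two different URIs
that map to the same file on disk would be cached as two separate
packages under this scheme.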
Essentially, what's happened is that because
increment_counter( ) is a subroutine that refers
to a lexical variable defined outside of its scope, it has become a
closure. Closures
don't normally trigger warnings, but in this case we
have a nested subroutine. That means that the first time the
enclosing subroutine handler( ) is called, both
subroutines are referring to the same variable, but after that,
increment_counter( ) will keep its own copy of
$counter (which is why $counter
is not shared) and increment its own copy.
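Because of this, the copy of $counter seen by increment_counter( )
keeps increasing and is never reset to 0, while each later call to
handler( ) initializes a fresh $counter that nothing else touches.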
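The effect can be reproduced without a web server. The following
sketch wraps the counter code in a handler( ) subroutine by hand, much
as the wrapped code shown earlier does, and calls it twice to simulate
two requests served by the same process. Collecting the values in
@seen and returning them as a string is our own addition, made only so
that the two calls are easy to compare.

```perl
use strict;
use warnings;   # Perl reports "will not stay shared" at compile time

sub handler {
    my $counter = 0;    # a fresh lexical on every call after the first
    my @seen;
    push @seen, increment_counter() for 1 .. 5;
    return "@seen";

    # Named nested subroutine: it is bound to the $counter that
    # existed during the first call to handler() and keeps it.
    sub increment_counter {
        $counter++;
        return $counter;
    }
}

my $first  = handler();   # first "request"
my $second = handler();   # second "request" in the same process

print "$first\n";         # prints "1 2 3 4 5"
print "$second\n";        # prints "6 7 8 9 10"
```

The second call resets only its own new $counter to 0; the nested
subroutine carries on from where its private copy left off.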
If we were to use the
diagnostics
pragma in the script, which by default turns terse warnings into
verbose warnings, we would see a reference to an inner (nested)
subroutine in the text of the warning. By observing the code that
gets executed, it is clear that increment_counter( ) is a named
nested subroutine, since it gets defined inside the handler( )
subroutine.
Any subroutine defined in the body of the script executed under
Apache::Registry becomes a nested subroutine. If the code
is placed into
a library or a module that the
script require( )s or use( )s,
this effect doesn't occur.
For example, if we move the code from the script into the subroutine
run( ), place the subroutines in the
mylib.pl file, save it in the same directory as
the script itself, and require( ) it, there will
be no problem at all.[29]
Examples 6-1 and 6-2 show how we spread the code across the two
files.
[29]Don't forget
the 1; at the end of the library, or the
require( ) call might fail.
Example 6-1. mylib.pl
my $counter;
sub run {
$counter = 0;
for (1..5) {
increment_counter( );
}
}
sub increment_counter {
$counter++;
print "Counter is equal to $counter !\n";
}
1;
Example 6-2. counter.pl
use strict;
require "./mylib.pl";
print "Content-type: text/plain\n\n";
run( );
This solution is the easiest and fastest way to solve the nested
subroutine problem. All you have to do is to move the code into a
separate file, by first wrapping the initial code into some function
that you later call from the script, and keeping the lexically scoped
variables that could cause the problem out of this function.
As a general rule, it's best to put all the code in
external libraries (unless the script is very short) and have only a
few lines of code in the main script. Usually the main script simply
calls the main function in the library, which is often called
init( ) or run( ). This way,
you don't have to worry about the effects of named
nested subroutines.
As we will show later in this chapter, however, this quick solution
might be problematic on a different front. If you have many scripts,
you might end up moving the code of more than one script into
identically named files, like mylib.pl, and libraries that share a
filename can collide with one another.
A much cleaner solution would be to
spend a little bit more time on the porting process and use a fully
qualified package, as in
Examples 6-3 and 6-4.
Example 6-3. Book/Counter.pm
package Book::Counter;
my $counter = 0;
sub run {
$counter = 0;
for (1..5) {
increment_counter( );
}
}
sub increment_counter {
$counter++;
print "Counter is equal to $counter !\n";
}
1;
__END__
Example 6-4. counter-clean.pl
use strict;
use Book::Counter;
print "Content-type: text/plain\n\n";
Book::Counter::run( );
As you can see, the only difference is in the package declaration. As
long as the package name is unique, you won't
encounter any collisions with other scripts running on the same
server.
Another solution to this problem is to change the
lexical variables to global
variables. There are two ways global variables can be used:
-
Using the vars pragma. With the
use strict 'vars' setting, global variables can be
used after being declared with vars. For example,
this code:
use strict;
use vars qw($counter $result);
# later in the code
$counter = 0;
$result = 1;
is similar to this code if use strict is not used:
$counter = 0;
$result = 1;
However, the former style of coding is much cleaner, because it
allows you to use global variables by declaring them, while avoiding
the problem of misspelled variables being treated as undeclared
globals.
The only drawback to using vars is that each
global declared with it consumes more memory than the undeclared but
fully qualified globals, as we will see in the next item.
-
Using fully qualified variables. Instead of using
$counter, we can use
$Foo::counter, which will place the global
variable $counter into the package
Foo. Note that we don't know
which package name Apache::Registry will assign to
the script, since it depends on the location from which the script
will be called. Remember that globals must always be initialized
before they can be used.
Perl 5.6.x also introduces a third way, with the our( ) declaration.
our( ) can be used in different scopes, similar to my( ), but the
name it declares refers to a global (package) variable rather than
to a lexical one.
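As a rough illustration of the global-variable approach, the sketch
below repeats the hand-wrapped handler( ) experiment, but declares
$counter with our( ) instead of my( ). It assumes Perl 5.6 or later;
the @seen bookkeeping is again our own addition for comparing calls.

```perl
use strict;
use warnings;

sub handler {
    our $counter = 0;   # package global, reset on every call
    my @seen;
    push @seen, increment_counter() for 1 .. 5;
    return "@seen";

    # The nested subroutine refers to the same package global,
    # so there is no private copy to drift out of sync.
    sub increment_counter {
        $counter++;
        return $counter;
    }
}

my $first  = handler();
my $second = handler();

print "$first\n";    # prints "1 2 3 4 5"
print "$second\n";   # prints "1 2 3 4 5"
```

Since a package global exists once per package rather than once per
call, both subroutines always see the same variable, and no "will not
stay shared" warning is issued.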
Finally, it's possible to avoid this problem
altogether by always passing the
variables as arguments to the
functions (see Example 6-5).
Example 6-5. counter2.pl
#!/usr/bin/perl -w
use strict;
print "Content-type: text/plain\n\n";
my $counter = 0;
for (1..5) {
$counter = increment_counter($counter);
}
sub increment_counter {
my $counter = shift;
$counter++;
print "Counter is equal to $counter !\n";
return $counter;
}
In this case, there is no variable-sharing problem. The drawback is
that this approach adds the overhead of passing and returning the
variable from the function. But on the other hand, it ensures that
your code is doing the right thing and is not dependent on whether
the functions are wrapped in other blocks, which is the case with the
Apache::Registry family of handlers.
When Stas (one of the authors of this book) had just started using
mod_perl and wasn't aware of the nested subroutine
problem, he happened to write a pretty complicated registration
program that was run under mod_perl. We will reproduce here only the
interesting part of that script:
use CGI;
$q = CGI->new;
my $name = $q->param('name');
print_response( );
sub print_response {
print "Content-type: text/plain\n\n";
print "Thank you, $name!";
}
Stas and his boss checked the program on the development server and
it worked fine, so they decided to put it in production. Everything
seemed to be normal, but the boss decided to keep on checking the
program by submitting variations of his profile using The
Boss as his username. Imagine his surprise when, after a
few successful submissions, he saw the response
"Thank you,
Stas!" instead of
"Thank you, The
Boss!"
After investigating the problem, they learned that they had been hit
by the nested subroutine problem. Why didn't they
notice this when they were trying the software on their development
server? We'll explain shortly.
To conclude this first mystery, remember to keep warnings enabled
on the development server and to watch the error_log
file for warnings.
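The registration bug described above can be imitated in a few lines.
This is only a sketch: the subroutine names are hypothetical, the CGI
module is left out, and the submitted name is passed in directly as an
argument that stands in for $q->param('name'). The second pair of
subroutines shows the same code with the name passed to the nested
subroutine as an argument.

```perl
use strict;
use warnings;

# Buggy version: the nested subroutine reads the enclosing $name.
sub buggy_handler {
    my ($name) = @_;
    return buggy_response();

    sub buggy_response {
        return "Thank you, $name!";   # bound to the first call's $name
    }
}

# Fixed version: the name is passed explicitly.
sub fixed_handler {
    my ($name) = @_;
    return fixed_response($name);

    sub fixed_response {
        my ($name) = @_;
        return "Thank you, $name!";
    }
}

my @buggy = map { buggy_handler($_) } "Stas", "The Boss";
my @fixed = map { fixed_handler($_) } "Stas", "The Boss";

print "$_\n" for @buggy;   # prints "Thank you, Stas!" twice
print "$_\n" for @fixed;   # prints "Thank you, Stas!" then "Thank you, The Boss!"
```

In the buggy version, whoever submits first in a given process has his
name remembered for every later request served by that process.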
6.2.2. The Second Mystery—Inconsistent Growth over Reloads
Let's return to our original example and
proceed with the
second mystery we noticed. Why have we seen inconsistent results over
numerous reloads?
What happens is that each request for the page is served by one of
the server's child processes, not by the parent itself. Each child
process runs its own copy of the script. This means that each child
process has its own copy of $counter, which will
increment independently of all the others. So not only does the value
of each $counter increase independently with each
invocation, but because different children handle the requests at
different times, the increment seems to grow inconsistently. For
example, if there are 10 httpd children, the
first 10 reloads might be correct (if each request went to a
different child). But once reloads start reinvoking the script from
the child processes, strange results will appear.
Moreover, the results can appear random, since there is no guarantee
about which child process will serve a given request. At any given
moment, one of the children could have served the same script more
times than any other, while another child may never have run it.
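The following toy model, which does not fork any real processes, shows
how per-child counters produce this pattern. It assumes three children
and strict round-robin dispatch purely for the sake of a predictable
result; a real server makes no such promise, which is why the numbers
seen in practice look random. All names in it are hypothetical.

```perl
use strict;
use warnings;

my $children = 3;                  # hypothetical number of httpd children
my @leaked   = (0) x $children;    # each child's private, never-reset counter

# Serve one request in the given child and return the first
# counter value that the script would print for that request.
sub serve_request {
    my ($child) = @_;
    my $first_value = $leaked[$child] + 1;
    $leaked[$child] += 5;          # the script increments five times
    return $first_value;
}

# Nine reloads, handed to the children in round-robin order
# (an assumption made only to keep the output deterministic).
my @first_values = map { serve_request($_ % $children) } 0 .. 8;

print "@first_values\n";           # prints "1 1 1 6 6 6 11 11 11"
```

The first three reloads look correct because each reaches a child that
has never run the script; after that, every reload continues from
whatever that particular child had counted up to.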
Stas and his boss didn't discover the aforementioned
problem with the user registration system before going into
production because the error_log file was too
crowded with warnings continuously logged by multiple child
processes.
To immediately recognize the problem visually (so you can see
incorrect results), you need to run the server as a
single process. You
can do this by invoking the server with the -X
option:
panic% httpd -X
Since there are no other servers (children) running, you will get the
problem report on the second reload.
Enabling the warnings mode (as explained earlier
in this chapter) and monitoring the error_log
file will help you detect most of the possible errors. Some warnings
can become errors, as we have just seen. You should check every
reported warning and eliminate it, so it won't
appear in error_log again. If your
error_log file is filled up with hundreds of
lines on every script invocation, you will have difficulty noticing
and locating real problems, and on a production server
you'll soon run out of disk space if your site is
popular.