You may find the modules discussed in this section very useful while
debugging your code. They will allow you to learn a lot about Perl
internals.
21.8.1. B::Deparse
Perl optimizes many things away at compile time, which
explains why Perl is so fast under mod_perl. If you want to see what
code is actually executed at runtime, use the
-MO=Deparse Perl option.
For example, if you aren't sure whether Perl will do
what you expect it to, it will show you what Perl is really going to
do. Consider this trap we discussed earlier:
open IN, "filename" || die $!;
This looks like perfectly valid code, and indeed it compiles without
any errors, but let's see what Perl is executing:
panic% perl -MO=Deparse -e 'open IN, "filename" || die $!'
open IN, 'filename';
As you can see, the die( ) part was optimized
away. open( ) is a list operator (since it accepts
a list of arguments), and list operators have lower precedence than
the || operator. Therefore, Perl executes the
following:
open IN, ("filename" || die $!);
Since in our example we have used "filename",
which is a true value, the rest of the expression in the parentheses
above is discarded. The code is reduced to:
open IN, "filename";
at compile time. So if the file cannot be opened for some reason, the
program will never call die( ), since Perl has
removed this part of the statement.
To do the right thing you should either use parentheses explicitly to
specify the order of execution or use the low-precedence
or operator. Both examples below will do the right
thing:
panic% perl -MO=Deparse -e 'open(IN, "filename") || die $!'
die $! unless open IN, 'filename';
panic% perl -MO=Deparse -e 'open IN, "filename" or die $!'
die $! unless open IN, 'filename';
As you can see, Perl compiles both sources into exactly the same code.
Notice that if the "filename" argument is not
true, the code gets compiled to this:
panic% perl -MO=Deparse,-p -e 'open IN, "" || die $!'
open(IN, die($!));
which causes the program to die($!) without any
reason in $!:
panic% perl -e 'open IN, "" || die $!'
Died at -e line 1.
while if we do the right thing, we should see the reason for the
open( ) failure:
panic% perl -e 'open IN, "" or die $!'
No such file or directory at -e line 1.
Also consider:
panic% perl -MO=Deparse,-p -e 'select MYSTD || die $!'
select(MYSTD);
Since select( ) always returns a true value, the
right part of the expression will never be executed. Therefore, Perl
optimizes it away. In the case of select( ), it
always returns the currently selected file handle, and there always
is one.
We have used this cool -MO=Deparse technique
without explaining it so far. B::Deparse is a
backend module for the Perl compiler that generates Perl source code,
based on the internal compiled structure that Perl itself creates
after parsing a program. Therefore, you may find it useful while
developing and debugging your code. We will show here one more useful
thing it does. See its manpage for an extensive usage manual.
When you use the -p option, the output also
includes parentheses (even when they are not required by precedence),
which can make it easy to see if Perl is parsing your expressions the
way you intended. If we repeat the last example:
panic% perl -MO=Deparse,-p -e 'open IN, "filename" or die $!'
(open(IN, 'filename') or die($!));
we can see the exact execution precedence. For example, if you are
writing constructor code that can serve as a class method and an
instance method, so you can instantiate objects in both ways:
my $cool1 = PerlCool->new( );
my $cool2 = $cool1->new( );
and you are unsure whether you can write this:
package PerlCool;
sub new {
my $self = shift;
my $type = ref $self || $self;
return bless { }, type;
}
or whether you have to put in parentheses:
my $type = ref ($self) || $self;
you can use B::Deparse to verify your assumptions:
Indeed, ref( ) has a higher precedence than
||, and therefore this code will do the right
thing:
my $type = ref $self || $self;
On the other hand, it might confuse other readers of your code, or
even yourself some time in the future, so if you are unsure about
code readability, use the parentheses.
Of course, if you forget the simple mathematical operations
precedences, you can ask the backend compiler to help you. This one
is obvious:
Just as we explained earlier, the and operator has
a lower precendence than the = operator. We can
explicitly see this in the output of B::Deparse,
which rewrites the statement in a less obscure way.
Of course, it's worth learning the precedences of
the Perl operators from the perlop manpage so
you don't have to resort to usingB::Deparse.
21.8.2. -D Runtime Option
You can watch your code as
it's being compiled and executed by Perl via the
-D runtime option. Perl will generate different
output according to the extra options (letters or numbers) specified
after -D. You can supply one or more options at
the same time. Here are the available options for Perl Version 5.6.0
(reproduced from the perlrun manpage):
1 p Tokenizing and parsing
2 s Stack snapshots
4 l Context (loop) stack processing
8 t Trace execution
16 o Method and overloading resolution
32 c String/numeric conversions
64 P Print preprocessor command for -P
128 m Memory allocation
256 f Format processing
512 r Regular expression parsing and execution
1024 x Syntax tree dump
2048 u Tainting checks
4096 L Memory leaks (needs -DLEAKTEST when compiling Perl)
8192 H Hash dump -- usurps values( )
16384 X Scratchpad allocation
32768 D Cleaning up
65536 S Thread synchronization
Let's look at some of these options. Consider this
one-line example:
which simply substitutes the string
"foo" with the string
"bar" in the variable
$_ and prints out its value. Now
let's see how Perl compiles and executes the regular
expression substitution part of the code. We will use
Perl's -Dr (or
-D512) option:
panic% perl -Dr -le '$_="yafoo"; s/foo/bar/; print'
Compiling REx 'foo'
size 3 first at 1
rarest char f at 0
1: EXACT <foo>(3)
3: END(0)
anchored 'foo' at 0 (checking anchored isall) minlen 3
Omitting $` $& $' support.
EXECUTING...
Guessing start of match, REx 'foo' against 'yafoo'...
Found anchored substr 'foo' at offset 2...
Starting position does not contradict /^/m...
Guessed: match at offset 2
Matching REx 'foo' against 'foo'
Setting an EVAL scope, savestack=3
2 <ya> <foo> | 1: EXACT <foo>
5 <yafoo> <> | 3: END
Match successful!
yabar
Freeing REx: 'foo'
As you can see, there are two stages: compilation and execution.
During the compilation stage, Perl records the stages it should go
through when matching the string, notes what length it should match
for, and notes whether one of the $',
$&, or $' special variables
will be used.[57]
During the execution we can see the actual process of matching. In
our example the match was successful.
[57]You should avoid using these at all,
since they add a performance hit, and once used in any regular
expression they will be set in every other regular expression, even
if you didn't ask for them.
The trace doesn't mention the
replace segment of the s///
construct, since it's not a part of the regular
expression per se.
The -Dx (or -D1024) option
tells Perl to print the syntax tree dump. We'll use
some very simple code so the execution tree will not be too long to
be presented here:
This code shows us the tree of opcodes after the compilation process.
Each opcode is prefixed with a number, which then is used to
determine the order of execution. You can see that each opcode is
linked to some other opcode (a number following the = =
=> tag). If you start from the opcode numbered 1, jump
to the opcode it's linked to (2, in this example),
and continue this way, you will see the execution pass of the code.
Since the code might have conditional branches, Perl cannot
predetermine a definite order at compile time; therefore, when you
follow the execution, the numbers will not necessarily be in
sequence.
Of course, internally Perl uses opcode pointers in memory, not
numbers. Numbers are used in the debug printout only for our
convenience.
Another interesting fact that we learn from this output is that Perl
optimizes everything it can at compile time. For example, when you
need to know how many seconds are in 12 hours, you could calculate it
manually and use the resulting number. But, as we see from:
SV = IV(43200)
Perl has already done the calculation at compile time, so no runtime
overhead occurs if you say 12*60*60 and not
43200. The former is also more self-explanatory,
while the latter may require an explicit comment to tell us what it
is.
Now let's bundle a few other options together and
see a subroutine argument stack snapshot via s,
context stack processing via l, and trace
execution via t all at once:
panic% perl -Dtls -le 'print 12*60*60'
=>
(-e:1) const(IV(12))
=> IV(12)
(-e:1) const(IV(60))
=> IV(12) IV(60)
(-e:1) multiply
=>
(-e:1) const(IV(720))
=> IV(720)
(-e:1) const(IV(60))
=> IV(720) IV(60)
(-e:1) multiply
(-e:1) ENTER scope 2 at op.c:6501
(-e:1) LEAVE scope 2 at op.c:6811
(-e:0) LEAVE scope 1 at perl.c:1319
(-e:0) ENTER scope 1 at perl.c:1327
(-e:0) Setting up jumplevel 0xbffff8cc, was 0x40129f40
You can see how Perl pushes constants 12 and 60 onto an argument
stack, executes multiply( ), gets a result of 720,
pushes it back onto the stack, pushes 60 again, and executes another
multiplication. The tracing and argument stack options show us this
information. All this happens at compile time.
In addition, we see a number of scope entering and leaving messages,
which come from the context stack status report. These options might
be helpful when you want to see Perl entering and leaving block
scopes (loops, subroutines, files, etc.). As you can see, bundling a
few options together gives very useful reports.
Since we have been using command-line execution rather than code
placed in the file, Perl uses -e as the
code's filename. Line 0 doesn't
exist; it's used for special purposes.
Having finished the compilation, now we proceed to the execution part:
EXECUTING...
=>
(-e:0) enter
(-e:0) ENTER scope 2 at pp_hot.c:1535
Entering block 0, type BLOCK
=>
(-e:0) nextstate
=>
(-e:1) pushmark => * (-e:1) const(IV(43200)) => * IV(43200) (-e:1) print43200 => SV_YES
(-e:1) leave
Leaving block 0, type BLOCK
(-e:0) LEAVE scope 2 at pp_hot.c:1657
(-e:0) LEAVE scope 1 at perl.c:395
Here you can see what Perl does on each line of your source code. So
basically the gist of this code (bolded in the example) is pushing
the constant integer scalar (const(IV)) onto the
execution stack, and then calling print( ). The
SV_YESsymbol indicates that print(
) returns a scalar value. The rest of the output consists
of controlling messages, where Perl changes scopes.
Of course, as the code gets more complicated, the traces will get
longer and trickier to understand. But sometimes these traces can be
as indispensable as interactive debugging.
You can use the -D[letter|number] techniques
from within mod_perl as well by setting the
PERL5OPT environment variable. For example, using
the bashshell to see the compilation and
execution traces, you can start the server in this way:
panic% PERL5OPT=-Dt ./httpd_perl -X
You will see a lot of output while the server starts. Once it
finishes the tracing, open the error_log file
and issue a request to your code. The tracing output will show up in
this file.
21.8.3. Devel::Peek and Apache::Peek
Devel::Peek is
a very useful module for looking at the
Perl internals. It's especially useful for debugging
XS code. With Devel::Peek we can look at Perl
variables' data structures. This code:
use Devel::Peek;
my $x = 'mod_perl rules';
Dump $x;
prints:
SV = PV(0x804c674) at 0x80571fc
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK)
PV = 0x805ce78 "mod_perl rules"\0
CUR = 14
LEN = 15
We can see that this variable is a scalar, whose reference count is 1
(there are no other variables pointing to it). Its value is the
string "mod_perl rules", terminated
by \0 (one more character is used for the
string-terminating \0 character, which is handled
behind the scenes, transparently to the user), whose length is 15
characters including the terminating \0 character.
The data structure starts at 0x80571fc, and its
string value is stored starting from the address
0x805ce78.
If you want to look at more complicated structures, such as a hash or
an array, you should create references to them and pass the
references to the Dump( ) function.
The Apache::Peek module is built for use with
mod_perl's Devel::Peek, so you
can use it to peek at mod_perl's code internals.
In Chapter 10 we showed a few examples where
Devel::Peek and Apache::Peek
have been found very useful. To learn about Perl
variables' internals, refer to the
perlguts manpage.
21.8.4. Devel::Symdump and Apache::Symdump
Devel::Symdumpallows us to access
Perl's symbol table. This package is object
oriented. To instantiate an object, you should provide the name of
the package to traverse. If no package is provided as an argument,
the main package is used. If the object is created
with new( ), Devel::Symdump
analyzes only the packages that are given as arguments; if
rnew( ) is used, nested modules are analyzed
recursively.
Once the object is instantiated, the methods packages(
), scalars( ), arrays(
), hashes( ), functions(
), ios( ), and unknowns(
) can be used. Each method returns an array of fully
qualified symbols of the specified type in all packages that are held
within a Devel::Symdump object, but without the
leading "$", "@", or
"%". In a scalar context, they will return the
number of such symbols. Unknown symbols are usually either formats or
variables that don't yet have defined values.
You may find this package useful to see what symbols are defined,
traverse trees of symbols from inherited packages, and more. See the
package's manpage for more information.
Apache::Symdump uses
Devel::Symdump to record snapshots of the Perl
symbol table in ServerRoot/logs/symdump.$$.$n.
Here $$ is the process ID and
$n is incremented each time the handler is run.
To enable this module, add the following to
httpd.conf:
PerlLogHandler Apache::Symdump
This module is useful for watching the growth of the processes and
hopefully, by taking steps against the growth, reducing it. One of
the reasons for process growth is the definition of new symbols. You
can use the diff utility to compare snapshots
and get an idea of what might be making a process grow. For example:
panic% diff -u symdump.1892.0 symdump.1892.1
where 1892 is PID. Normally, new symbols come from modules or scripts
that were not preloaded, the Perl method cache, and so on.
Let's write a simple script that uses
DB_File, which wasn't preloaded
(see Example 21-13).
Example 21-13. use_dbfile.pl
use strict;
require DB_File;
my $r = shift;
$r->send_http_header("text/plain");
$r->print("Hello $$\n");
If we issue a few requests and then compare two consecutive request
dumps for the same process, nothing happens. That's
because the module is loaded on the first request, and therefore from
now on the symbol table will be the same. So in order to help
Apache::Symdump to help us detect the load, we
will require the module only on the second reload (see Example 21-14).
Example 21-14. use_dbfile1.pl
use strict;
use vars qw($loaded);
require DB_File if defined $loaded;
$loaded = 1;
my $r = shift;
$r->send_http_header("text/plain");
$r->print("Hello $$\n");
reveals that there were 301 symbols defined, mostly from the
DB_File and Fcntl packages. We
can also see what new files were loaded, by applying
diff on the incdump.$$.$n
files, which dump the contents of %INC after each
request:
The following are a few other modules that you may find of use, but
in this book we won't delve deeply into their
details:
Apache::DumpHeaders is used to watch an HTTP transaction,
looking at the client and server headers.
Apache::DebugInfo offers the ability to monitor various
bits of per-request data. Similar to
Apache::DumpHeaders.
Devel::StackTrace encapsulates the information that can be
found through using the caller( ) function and
provides a simple interface to this data.
Apache::Symbol provides XS tricks to avoid a mandatory
"Subroutine redefined" warning when
reloading a module that contains a subroutine that is eligible for
inlining. Useful during development when using
Apache::Reload or
Apache::StatINC to reload modules.
21.7. Hanging Processes: Detection and Diagnostics