When do you need to profile your code? You do that when you suspect
that some part of your code is called very often and maybe there is a
need to optimize it, which could significantly improve the overall
performance.
9.5.1. Profiling with Devel::DProf
Devel::DProf collects
information on the execution time of a Perl script and of the
subroutines in that script.
Let's take for example the
diagnostics pragma and measure the impact of its
usage on the program compilation and execution speed. This pragma
extends the terse diagnostics normally emitted by both the Perl
compiler and the Perl interpreter, augmenting them with the more
verbose and endearing descriptions found in the
perldiag manpage. We have claimed that this
pragma should not be used on a production server. We are going to use
Devel::DProf to explain our claim.
We will run a benchmark, once with diagnostics enabled and once
disabled, on a subroutine called test_code( ).
The code inside the subroutine does either a lexical or a numeric
comparison of two strings. It assigns one string to another if the
condition tests true, but the condition is always false. To
demonstrate the diagnostics pragma overhead, the
comparison operator that we use in Example 9-7 is
intentionally wrong. It should be a string comparison
(eq), and we use a numeric one (=
=).
Example 9-7. bench_diagnostics.pl
use Benchmark;
use diagnostics;
use strict;
my $count = 50000;
disable diagnostics;
my $t1 = timeit($count,\&test_code);
enable diagnostics;
my $t2 = timeit($count,\&test_code);
print "Off: ",timestr($t1),"\n";
print "On : ",timestr($t2),"\n";
sub test_code {
my ($a, $b) = qw(foo bar);
my $c;
if ($a = = $b) {
$c = $a;
}
}
For only a few lines of code we get:
Off: 1 wallclock secs ( 0.81 usr + 0.00 sys = 0.81 CPU)
On : 13 wallclock secs (12.54 usr + 0.01 sys = 12.55 CPU)
With diagnostics enabled, the subroutine test_code(
) is 16 times slower (12.55/0.81: remember that
we're talking in CPU time, not wallclock seconds)
than with diagnostics disabled!
Now let's fix the comparison the way it should be,
by replacing = = with eq, so we
get:
my ($a, $b) = qw(foo bar);
my $c;
if ($a eq $b) {
$c = $a;
}
and run the same benchmark again:
Off: 1 wallclock secs ( 0.57 usr + 0.00 sys = 0.57 CPU)
On : 1 wallclock secs ( 0.56 usr + 0.00 sys = 0.56 CPU)
Now there is no overhead at all. The diagnostics
pragma slows things down only when warnings are generated.
After we have verified that using the diagnostics
pragma might add a big overhead to execution runtime,
let's use code profiling to understand why this
happens. We use Devel::DProf to profile the code
shown in Example 9-8.
Example 9-8. diagnostics.pl
use diagnostics;
test_code( );
sub test_code {
my($a, $b) = qw(foo bar);
my $c;
if ($a = = $b) {
$c = $a;
}
}
Run it with the profiler enabled, and then create the profiling
statistics with the help of dprofpp:
panic% perl -d:DProf diagnostics.pl
panic% dprofpp
Total Elapsed Time = 0.342236 Seconds
User+System Time = 0.335420 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
92.1 0.309 0.358 1 0.3089 0.3578 main::BEGIN
14.9 0.050 0.039 3161 0.0000 0.0000 diagnostics::unescape
2.98 0.010 0.010 2 0.0050 0.0050 diagnostics::BEGIN
0.00 0.000 -0.000 2 0.0000 - Exporter::import
0.00 0.000 -0.000 2 0.0000 - Exporter::export
0.00 0.000 -0.000 1 0.0000 - Config::BEGIN
0.00 0.000 -0.000 1 0.0000 - Config::TIEHASH
0.00 0.000 -0.000 2 0.0000 - Config::FETCH
0.00 0.000 -0.000 1 0.0000 - diagnostics::import
0.00 0.000 -0.000 1 0.0000 - main::test_code
0.00 0.000 -0.000 2 0.0000 - diagnostics::warn_trap
0.00 0.000 -0.000 2 0.0000 - diagnostics::splainthis
0.00 0.000 -0.000 2 0.0000 - diagnostics::transmo
0.00 0.000 -0.000 2 0.0000 - diagnostics::shorten
0.00 0.000 -0.000 2 0.0000 - diagnostics::autodescribe
It's not easy to see what is responsible for this
enormous overhead, even if main::BEGINseems to be
running most of the time. To get the full picture we must see the OPs
tree, which shows us who calls whom, so we run:
panic% dprofpp -T
The output is:
main::BEGIN
diagnostics::BEGIN
Exporter::import
Exporter::export
diagnostics::BEGIN
Config::BEGIN
Config::TIEHASH
Exporter::import
Exporter::export
Config::FETCH
Config::FETCH
diagnostics::unescape
.....................
3159 times [diagnostics::unescape] snipped
.....................
diagnostics::unescape
diagnostics::import
diagnostics::warn_trap
diagnostics::splainthis
diagnostics::transmo
diagnostics::shorten
diagnostics::autodescribe
main::test_code
diagnostics::warn_trap
diagnostics::splainthis
diagnostics::transmo
diagnostics::shorten
diagnostics::autodescribe
diagnostics::warn_trap
diagnostics::splainthis
diagnostics::transmo
diagnostics::shorten
diagnostics::autodescribe
So we see that 2 executions of diagnostics::BEGIN
and 3,161 of diagnostics::unescape are responsible
for most of the running overhead.
If we comment out the diagnostics module, we get:
Total Elapsed Time = 0.079974 Seconds
User+System Time = 0.059974 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
0.00 0.000 -0.000 1 0.0000 - main::test_code
It is possible to profile code running under mod_perl with the
Devel::DProf module, available on CPAN. However,
you must have PerlChildExitHandler enabled during
the mod_perl build process. When the server is started,
Devel::DProf installs an END
block to write the tmon.out file. This block
will be called at server shutdown. Here is how to start and stop a
server with the profiler enabled:
panic% setenv PERL5OPT -d:DProf
panic% httpd -X -d `pwd` &
... make some requests to the server here ...
panic% kill `cat logs/httpd.pid`
panic% unsetenv PERL5OPT
panic% dprofpp
The Devel::DProf package is a Perl code profiler.
It will collect information on the execution time of a Perl script
and of the subroutines in that script (remember that print(
) and map( ) are just like any other
subroutines you write, but they come bundled with Perl!).
Another approach is to use Apache::DProf, which
hooks Devel::DProf into mod_perl. The
Apache::DProf module will run a
Devel::DProf profiler inside the process and write
the tmon.out file in the directory
$ServerRoot/logs/dprof/$$ (make sure that
it's writable by the server!) when the process is
shut down (where $$ is the PID of the process).
All it takes to activate this module is to modify
httpd.conf.
You can test for a command-line switch in
httpd.conf. For example, to test if the server
was started with -DPERLDPROF, use:
<Location /perl>
SetHandler perl-script
PerlHandler Apache::Registry
<IfDefine PERLDPROF>
PerlModule Apache::DProf
</IfDefine>
</Location>
And to activate profiling, use:
panic% httpd -X -DPERLDPROF &
Remember that any PerlHandler that was pulled in
before Apache::DProf in the
httpd.conf or startup.pl
file will not have code-debugging information inserted. To run
dprofpp, chdir to
$ServerRoot/logs/dprof/$$[36] and run:
[36]Look
up the ServerRoot directive's
value in httpd.conf to figure out what your
$ServerRoot is.
panic% dprofpp
Use the command-line options for dropfpp(1) if a
nondefault output is desired, as explained in the
dropfpp manpage. You might especially want to
look at the -r switch to display wallclock times
(more relevant in a web-serving environment) and the
-l switch to sort by number of subroutine calls.
If you are running Perl 5.6.0 or higher, take a look at the new
module Devel::Profiler (Version 0.04 as of this
writing), which is supposed to be a drop-in replacement for
Apache::DProf, with improved
functionality and stability.
9.5.2. Profiling with Devel::SmallProf
The Devel::SmallProf profiler
is focused on the time taken for a program run on a line-by-line
basis. It is called "small" because
it's supposed to impose very little extra load on
the machine (speed- and memory-wise) while profiling the code.
Let's take a look at the simple example shown in
Example 9-9.
Example 9-9. table_gen.pl
for (1..1000) {
my @rows = ( );
push @rows, Tr( map { td($_) } 'a'..'d' );
push @rows, Tr( map { td($_) } 'e'..'h' );
my $var = table(@rows);
}
sub table { my @rows = @_; return "<table>\n@rows</table>\n";}
sub Tr { my @cells = @_; return "<tr>@cells</tr>\n"; }
sub td { my $cell = shift; return "<td>$cell</td>"; }
It creates the same HTML table in $var, with the
cells of the table filled with single letters. The functions
table( ), Tr( ), and
td( ) insert the data into appropriate HTML tags.
Notice that we have used Tr( ) and not
tr( ), since the latter is a built-in Perl
function, and we have used the same function name as in
CGI.pm that does the same thing. If we print
$var we will see something like this:
<table>
<tr><td>a</td> <td>b</td> <td>c</td> <td>d</td></tr>
<tr><td>e</td> <td>f</td> <td>g</td> <td>h</td></tr>
</table>
We have looped a thousand times through the same code in order to get
a more precise speed measurement. If the code runs very quickly we
won't be able to get any meaningful results from
just one loop.
If we run this code with Devel::SmallProf:
panic% perl -d:SmallProf table_gen.pl
we get the following output in the autogenerated
smallprof.out file:
count wall tm cpu time line
1001 0.003855 0.030000 1: for (1..1000) {
1000 0.004823 0.040000 2: my @rows = ( );
5000 0.272651 0.410000 3: push @rows, Tr( map { td($_) }
5000 0.267107 0.360000 4: push @rows, Tr( map { td($_) }
1000 0.067115 0.120000 5: my $var = table(@rows);
0 0.000000 0.000000 6: }
3000 0.033798 0.080000 7: sub table { my @rows = @_; return
6000 0.078491 0.120000 8: sub Tr { my @cells = @_; return
24000 0.267353 0.490000 9: sub td { my $cell = shift; return
0 0.000000 0.000000 10:
We can see in the CPU time column that Perl
spends most of its time in the td( ) function;
it's also the code that's visited
by Perl the most times. In this example we could find this out
ourselves without much effort, but if the code is longer it will be
harder to find the lines of code where Perl spends most of its time.
So we sort the output by the third column as a numerical value, in
descending order:
panic% sort -k 3nr,3 smallprof.out | less
24000 0.267353 0.490000 9: sub td { my $cell = shift; return
5000 0.272651 0.410000 3: push @rows, Tr( map { td($_) }
5000 0.267107 0.360000 4: push @rows, Tr( map { td($_) }
1000 0.067115 0.120000 5: my $var = table(@rows);
6000 0.078491 0.120000 8: sub Tr { my @cells = @_; return
3000 0.033798 0.080000 7: sub table { my @rows = @_; return
1000 0.004823 0.040000 2: my @rows = ( );
1001 0.003855 0.030000 1: for (1..1000) {
According to the Devel::SmallProf manpage, the
wallclock's measurements are fairly accurate (we
suppose that they're correct on an unloaded
machine), but CPU clock time is always more accurate.
That's because if it takes more than one CPU time
slice for a directive to complete, the time that some other process
uses CPU is counted in the wallclock counts. Since the load on the
same machine may vary greatly from moment to moment,
it's possible that if we rerun the same test a few
times we will get inconsistent results.
Let's try to improve the td( )
function and at the same time the Tr( ) and
table( ) functions. We will not copy the passed
arguments, but we will use them directly in all three functions.
Example 9-10 shows the new version of our script.
Example 9-10. table_gen2.pl
for (1..1000) {
my @rows = ( );
push @rows, Tr( map { td($_) } 'a'..'d' );
push @rows, Tr( map { td($_) } 'e'..'h' );
my $var = table(@rows);
}
sub table { return "<table>\n@_</table>\n";}
sub Tr { return "<tr>@_</tr>\n"; }
sub td { return "<td>@_</td>"; }
Now let's rerun the code with the profiler:
panic% perl -d:SmallProf table_gen2.pl
The results are much better now—only 0.34 CPU clocks are spent
in td( ), versus 0.49 in the earlier run:
panic% sort -k 3nr,3 smallprof.out | less
5000 0.279138 0.400000 4: push @rows, Tr( map { td($_) }
16000 0.241350 0.340000 9: sub td { return "<td>@_</td>"; }
5000 0.269940 0.320000 3: push @rows, Tr( map { td($_) }
4000 0.050050 0.130000 8: sub Tr { return "<tr>@_</tr>\n"; }
1000 0.065324 0.080000 5: my $var = table(@rows);
1000 0.006650 0.010000 2: my @rows = ( );
2000 0.020314 0.030000 7: sub table{ return "<table>\n@_</table>\n";}
1001 0.006165 0.030000 1: for (1..1000) {
You will also notice that Devel::SmallProf reports
that the functions were executed different numbers of times in the
two runs. That's because in our original example all
three functions had two statements on each line, but in the improved
version they each had only one. Devel::SmallProf
looks at the code after it's been parsed and
optimized by Perl—thus, if some optimizations took place, it
might not be exactly the same as the code that you wrote.
In most cases you will probably find Devel::DProf
more useful than Devel::SmallProf, as it allows
you to analyze the code by subroutine and not by line.
Just as there is the Apache::DProf equivalent for
Devel::DProf, there is the
Apache::SmallProf equivalent for
Devel::SmallProf. It uses a configuration similar
to Apache::DProf—i.e., it is registered as a
PerlFixupHandler—but it also requires
Apache::DB. Therefore, to use it you should add
the following configuration to httpd.conf:
<Perl>
if (Apache->define('PERLSMALLPROF')) {
require Apache::DB;
Apache::DB->init;
}
</Perl>
<Location /perl>
SetHandler perl-script
PerlHandler Apache::Registry
<IfDefine PERLSMALLPROF>
PerlFixupHandler Apache::SmallProf
</IfDefine>
</Location>
Now start the server:
panic% httpd -X -DPERLSMALLPROF &
This will activate Apache::SmallProf::handler
during the request. As a result, the profile files will be written to
the $ServerRoot/logs/smallprof/ directory.
Unlike with Devel::SmallProf, the profile is split
into several files based on package name. For example, if
CGI.pm was used, one of the generated profile
files will be called CGI.pm.prof.