If you want to benchmark your Perl code, you can use
the Benchmark module. For example,
let's say that our code generates many long strings
and finally prints them out. We wonder what is the most efficient way
to handle this task—we can try to concatenate the strings into
a single string, or we can store them (or references to them) in an
array before generating the output. The easiest way to get an answer
is to try each approach, so we wrote the benchmark shown in Example 9-3.
Example 9-3. strings_benchmark.pl
use Benchmark;
use Symbol;
my $fh = gensym;
open $fh, ">/dev/null" or die $!;
my($one, $two, $three) = map { $_ x 4096 } 'a'..'c';
timethese(100_000, {
ref_array => sub {
my @a;
push @a, \($one, $two, $three);
my_print(@a);
},
array => sub {
my @a;
push @a, $one, $two, $three;
my_print(@a);
},
concat => sub {
my $s;
$s .= $one;
$s .= $two;
$s .= $three;
my_print($s);
},
});
sub my_print {
for (@_) {
print $fh ref($_) ? $$_ : $_;
}
}
As you can see, we generate three big strings and then use three
anonymous functions to print them out. The first one
(ref_array) stores the references to the strings
in an array. The second function (array) stores
the strings themselves in an array. The third function
(concat) concatenates the three strings into a
single string. At the end of each function we print the stored data.
If the data structure includes references, they are first
dereferenced (relevant for the first function only). We execute each
subtest 100,000 times to get more precise results. If your results
are too close and are below 1 CPU clocks, you should try setting the
number of iterations to a bigger number. Let's
execute this benchmark and check the results:
First, it's important to remember that the reported
wallclock times can be misleading and thus should not be relied upon.
If during one of the subtests your computer was more heavily loaded
than during the others, it's possible that this
particular subtest will take more wallclocks to complete, but this
doesn't matter for our purposes. What matters is the
CPU clocks, which tell us the exact amount of CPU time each test took
to complete. You can also see the fraction of the CPU allocated to
usr and sys, which stand
for the user and kernel (system) modes, respectively. This tells us
what proportions of the time the subtest has spent running code in
user mode and in kernel mode.
Now that you know how to read the results, you can see that
concatenation outperforms the two array functions, because
concatenation only has to grow the size of the string, whereas array
functions have to extend the array and, during the print, iterate
over it. Moreover, the array method also creates a string copy before
appending the new element to the array, which makes it the slowest
method of the three.
Let's make the strings much smaller. Using our
original code with a small correction:
my($one, $two, $three) = map { $_ x 8 } 'a'..'c';
we now make three strings of 8 characters, instead of 4,096. When we
execute the modified version we get the following picture:
Concatenation still wins, but this time the array method is a bit
faster than ref_array, because the overhead of taking string
references before pushing them into an array and dereferencing them
afterward during print( ) is bigger than the
overhead of making copies of the short strings.
As these examples show, you should benchmark your code by rewriting
parts of the code and comparing the benchmarks of the modified and
original versions.
Also note that benchmarks can give different results under different
versions of the Perl interpreter, because each version might have
built-in optimizations for some of the functions. Therefore, if you
upgrade your Perl interpreter, it's best to
benchmark your code again. You may see a completely different result.
Another Perl code benchmarking method is to use the
Time::HiRes module, which allows you to get the
runtime of your code with a fine-grained resolution of the order of
microseconds. Let's compare a few methods to
multiply two numbers (see Example 9-4).
Example 9-4. hires_benchmark_time.pl
use Time::HiRes qw(gettimeofday tv_interval);
my %subs = (
obvious => sub {
$_[0] * $_[1]
},
decrement => sub {
my $a = shift;
my $c = 0;
$c += $_[0] while $a--;
$c;
},
);
for my $x (qw(10 100)) {
for my $y (qw(10 100)) {
for (sort keys %subs) {
my $start_time = [ gettimeofday ];
my $z = $subs{$_}->($x,$y);
my $end_time = [ gettimeofday ];
my $elapsed = tv_interval($start_time,$end_time);
printf "%-9.9s: Doing %3.d * %3.d = %5.d took %f seconds\n",
$_, $x, $y, $z, $elapsed;
}
print "\n";
}
}
We have used two methods here. The first (obvious)
is doing the normal multiplication, $z=$x*$y. The
second method is using a trick of the systems where there is no
built-in multiplication function available; it uses only the addition
and subtraction operations. The trick is to add $x
for $y times (as you did in school before you
learned multiplication).
Note that if the processor is very fast or the OS has a coarse
time-resolution granularity (i.e., cannot count microseconds) you may
get zeros as reported times. This of course
shouldn't be the case with applications that do a
lot more work.
If you run this benchmark again, you will notice that the numbers
will be slightly different. This is because the code measures
absolute time, not the real execution time (unlike the previous
benchmark using the Benchmark module).
You can see that doing 10*100 as opposed to
100*10 results in quite different results for the
decrement method. When the arguments are 10*100,
the code performs the add 100 operation only 10
times, which is obviously faster than the second invocation,
100*10, where the code performs the add
10 operation 100 times. However, the normal multiplication
takes a constant time.
Let's run the same code using the
Benchmark module, as shown in Example 9-5.
Example 9-5. hires_benchmark.pl
use Benchmark;
my %subs = (
obvious => sub {
$_[0] * $_[1]
},
decrement => sub {
my $a = shift;
my $c = 0;
$c += $_[0] while $a--;
$c;
},
);
for my $x (qw(10 100)) {
for my $y (qw(10 100)) {
print "\nTesting $x*$y\n";
timethese(300_000, {
obvious => sub {$subs{obvious}->($x, $y) },
decrement => sub {$subs{decrement}->($x, $y)},
});
}
}
You can observe exactly the same behavior, but this time using the
average CPU clocks collected over 300,000 tests and not the absolute
time collected over a single sample. Obviously, you can use the
Time::HiRes module in a benchmark that will
execute the same code many times to report a more precise runtime,
similar to the way the Benchmark module reports
the CPU time.
However, there are situations where getting the average speed is not
enough. For example, if you're testing some code
with various inputs and calculate only the average processing times,
you may not notice that for some particular inputs the code is very
ineffective. Let's say that the average is 0.72
seconds. This doesn't reveal the possible fact that
there were a few cases when it took 20 seconds to process the input.
Therefore, getting the variance[34] in addition to the average may be important.
Unfortunately Benchmark.pm cannot provide such
results—system timers are rarely good enough to measure fast
code that well, even on single-user systems, so you must run the code
thousands of times to get any significant CPU time. If the code is
slow enough that each single execution can be measured, most likely
you can use the profiling tools.
[34]See Chapter 15 in the
book Mastering Algorithms with Perl, by Jon
Orwant, Jarkko Hietaniemi, and John Macdonald
(O'Reilly). Of course, there are gazillions of
statistics-related books and resources on the Web; http://mathforum.org/ and http://mathworld.wolfram.com/ are two good
starting points for anything that has to do with mathematics.