Chapter 9. Essential Tools for Performance Tuning

To be able to improve the performance of your system you need a prior understanding of what can be improved, how it can be improved, how much it can be improved, and, most importantly, what impact the improvement will have on the overall performance of your system. You need to be able to identify those things that, after you have done your best to improve them, will yield substantial benefits for the overall system performance. Concentrate your efforts on them, and avoid wasting time on improvements that give little overall gain.

If you have a small application it may be possible to detect places that could be improved simply by inspecting the code. On the other hand, if you have a large application, or many applications, it's usually impossible to do the detective work with the naked eye. You need observation instruments and measurement tools. These belong to the benchmarking and code-profiling categories.

It's important to understand that in the majority of the benchmarking tests that we will execute, we will not be looking at absolute results. Few machines will have exactly the same hardware and software setup, so this kind of comparison would usually be misleading, and in most cases we will be trying to show which coding approach is preferable, so the hardware is almost irrelevant.

Rather than looking at absolute results, we will be looking at the differences between two or more result sets run on the same machine. This is what you should do; you shouldn't try to compare the absolute results collected here with the results of those same benchmarks on your own machines.

In this chapter we will present a few existing tools that are widely used; we will apply them to example code snippets to show you how performance can be measured, monitored, and improved; and we will give you an idea of how you can develop your own tools.

9.1. Server Benchmarking

As web service developers, the most important thing we should strive for is to offer the user a fast, trouble-free browsing experience. Measuring the response rates of our servers under a variety of load conditions and benchmark programs helps us to do this.

A benchmark program may consume significant resources, so you cannot find the real times that a typical user will wait for a response from your service by running the benchmark on the server itself. Ideally you should run it from a different machine. A benchmark program is unlike a typical user in the way it generates requests. It should be able to emulate multiple concurrent users connecting to the server by generating many concurrent requests. We want to be able to tell the benchmark program what load we want to emulate—for example, by specifying the number or rate of requests to be made, the number of concurrent users to emulate, lists of URLs to request, and other relevant arguments.

9.1.1. ApacheBench

ApacheBench (ab) is a tool for benchmarking your Apache HTTP server. It is designed to give you an idea of the performance that your current Apache installation can give. In particular, it shows you how many requests per second your Apache server is capable of serving. The ab tool comes bundled with the Apache source distribution, and like the Apache web server itself, it's free.

Let's try it. First we create a test script, as shown in Example 9-1.

Example 9-1. simple_test.pl

my $r = shift;
$r->send_http_header('text/plain');
print "Hello\n";

We will simulate 10 users concurrently requesting the file simple_test.pl through https://localhost/perl/simple_test.pl. Each simulated user makes 500 requests. We generate 5,000 requests in total:

panic% ./ab -n 5000 -c 10 https://localhost/perl/simple_test.pl

Server Software:        Apache/1.3.25-dev
Server Hostname:        localhost
Server Port:            8000

Document Path:          /perl/simple_test.pl
Document Length:        6 bytes

Concurrency Level:      10
Time taken for tests:   5.843 seconds
Complete requests:      5000
Failed requests:        0
Broken pipe errors:     0
Total transferred:      810162 bytes
HTML transferred:       30006 bytes
Requests per second:    855.72 [#/sec] (mean)
Time per request:       11.69 [ms] (mean)
Time per request:       1.17 [ms] (mean, across all concurrent requests)
Transfer rate:          138.66 [Kbytes/sec] received

Connnection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0     1    1.4      0    17
Processing:     1    10   12.9      7   208
Waiting:        0     9   13.0      7   208
Total:          1    11   13.1      8   208

Most of the report is not very interesting to us. What we really care about are the Requests per second and Connection Times results:

Requests per second: The number of requests (to our test script) the server was able to serve in one second
Connect and Waiting times: The amount of time it took to establish the connection and get the first bits of a response
Processing time: The server response time—i.e., the time it took for the server to process the request and send a reply
Total time: The sum of the Connect and Processing times

As you can see, the server was able to respond on average to 856 requests per second. On average, it took no time to establish a connection to the server both the client and the server are running on the same machine and 10 milliseconds to process each request. As the code becomes more complicated you will see that the processing time grows while the connection time remains constant. The latter isn't influenced by the code complexity, so when you are working on your code performance, you care only about the processing time. When you are benchmarking the overall service, you are interested in both.

Just for fun, let's benchmark a similar script, shown in Example 9-2, under mod_cgi.

Example 9-2. simple_test_mod_cgi.pl

#!/usr/bin/perl
print "Content-type: text/plain\n\n";
print "Hello\n";

The script is configured as:

ScriptAlias /cgi-bin/ /usr/local/apache/cgi-bin/

panic% /usr/local/apache/bin/ab -n 5000 -c 10 \
https://localhost/cgi-bin/simple_test_mod_cgi.pl

We will show only the results that interest us:

Requests per second:    156.40 [#/sec] (mean)
Time per request:       63.94 [ms] (mean)

Now, when essentially the same script is executed under mod_cgi instead of mod_perl, we get 156 requests per second responded to, not 856.

ApacheBench can generate KeepAlives, GET (default) and POST requests, use Basic Authentication, send cookies and custom HTTP headers. The version of ApacheBench released with Apache version 1.3.20 adds SSL support, generates gnuplot and CSV output for postprocessing, and reports median and standard deviation values.

HTTPD::Bench::ApacheBench, available from CPAN, provides a Perl interface for ab.

9.1.2. httperf

httperf is another tool for measuring web server performance. Its input and reports are different from the ones we saw while using ApacheBench. This tool's manpage includes an in-depth explanation of all the options it accepts and the results it generates. Here we will concentrate on the input and on the part of the output that is most interesting to us.

With httperf you cannot specify the concurrency level; instead, you have to specify the connection opening rate (—rate) and the number of calls (—num-call) to perform on each opened connection. To compare the results we received from ApacheBench we will use a connection rate slightly higher than the number of requests responded to per second reported by ApacheBench. That number was 856, so we will try a rate of 860 (—rate 860) with just one request per connection (—num-call 1). As in the previous test, we are going to make 5,000 requests (—num-conn 5000). We have set a timeout of 60 seconds and allowed httperf to use as many ports as it needs (—hog).

So let's execute the benchmark and analyze the results:

panic% httperf --server localhost --port 80 --uri /perl/simple_test.pl \
--hog --rate 860 --num-conn 5000 --num-call 1 --timeout 60

Maximum connect burst length: 11

Total: connections 5000 requests 5000 replies 5000 test-duration 5.854 s

Connection rate: 854.1 conn/s (1.2 ms/conn, <=50 concurrent connections)
Connection time [ms]: min 0.8 avg 23.5 max 226.9 median 20.5 stddev 13.7
Connection time [ms]: connect 4.0
Connection length [replies/conn]: 1.000

Request rate: 854.1 req/s (1.2 ms/req)
Request size [B]: 79.0

Reply rate [replies/s]: min 855.6 avg 855.6 max 855.6 stddev 0.0 (1 samples)
Reply time [ms]: response 19.5 transfer 0.0
Reply size [B]: header 184.0 content 6.0 footer 2.0 (total 192.0)
Reply status: 1xx=0 2xx=5000 3xx=0 4xx=0 5xx=0

CPU time [s]: user 0.33 system 1.53 (user 5.6% system 26.1% total 31.8%)
Net I/O: 224.4 KB/s (1.8*10^6 bps)

Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

As before, we are mostly interested in the average Reply rate—855, almost exactly the same result reported by ab in the previous section. Notice that when we tried —rate 900 for this particular setup, the reported request rate went down drastically, since the server's performance gets worse when there are more requests than it can handle.

9.1.3. http_load

http_load is yet another utility that does web server load testing. It can simulate a 33.6 Kbps modem connection (-throttle) and allows you to provide a file with a list of URLs that will be fetched randomly. You can specify how many parallel connections to run (-parallel N) and the number of requests to generate per second (-rate N). Finally, you can tell the utility when to stop by specifying either the test time length (-seconds N) or the total number of fetches (-fetches N).

Again, we will try to verify the results reported by ab (claiming that the script under test can handle about 855 requests per second on our machine). Therefore we run http_load with a rate of 860 requests per second, for 5 seconds in total. We invoke is on the file urls, containing a single URL:

https://localhost/perl/simple_test.pl

Here is the generated output:

panic% http_load -rate 860 -seconds 5 urls
4278 fetches, 325 max parallel, 25668 bytes, in 5.00351 seconds
6 mean bytes/connection
855 fetches/sec, 5130 bytes/sec
msecs/connect: 20.0881 mean, 3006.54 max, 0.099 min
msecs/first-response: 51.3568 mean, 342.488 max, 1.423 min
HTTP response codes:
  code 200 -- 4278

This application also reports almost exactly the same response-rate capability: 855 requests per second. Of course, you may think that it's because we have specified a rate close to this number. But no, if we try the same test with a higher rate:

panic% http_load -rate 870 -seconds 5 urls
4045 fetches, 254 max parallel, 24270 bytes, in 5.00735 seconds
6 mean bytes/connection
807.813 fetches/sec, 4846.88 bytes/sec
msecs/connect: 78.4026 mean, 3005.08 max, 0.102 min

we can see that the performance goes down—it reports a response rate of only 808 requests per second.

The nice thing about this utility is that you can list a few URLs to test. The URLs that get fetched are chosen randomly from the specified file.

Note that when you provide a file with a list of URLs, you must make sure that you don't have empty lines in it. If you do, the utility will fail and complain:

./http_load: unknown protocol -

9.1.4. Other Web Server Benchmark Utilities

The following are also interesting benchmarking applications implemented in Perl:

HTTP::WebTest: The HTTP::WebTest module (available from CPAN) runs tests on remote URLs or local web files containing Perl, JSP, HTML, JavaScript, etc. and generates a detailed test report.
HTTP::Monkeywrench: HTTP::Monkeywrench is a test-harness application to test the integrity of a user's path through a web site.
Apache::Recorder and HTTP::RecordedSession: Apache::Recorder (available from CPAN) is a mod_perl handler that records an HTTP session and stores it on the web server's filesystem. HTTP::RecordedSession reads the recorded session from the filesystem and formats it for playback using HTTP::WebTest or HTTP::Monkeywrench. This is useful when writing acceptance and regression tests.

Many other benchmark utilities are available both for free and for money. If you find that none of these suits your needs, it's quite easy to roll your own utility. The easiest way to do this is to write a Perl script that uses the LWP::Parallel::UserAgent and Time::HiRes modules. The former module allows you to open many parallel connections and the latter allows you to take time samples with microsecond resolution.