Handling the "User Pressed Stop Button" Case (Practical mod

6.9. Handling the "User Pressed Stop Button" Case

When a user presses the Stop or Reload button, the current socket connection is broken (aborted). It would be nice if Apache could always immediately detect this event. Unfortunately, there is no way to tell whether the connection is still valid unless an attempt to read from or write to the connection is made.

Note that no detection technique will work if the connection to the backend mod_perl server is coming from a frontend mod_proxy (as discussed in Chapter 12). This is because mod_proxy doesn't break the connection to the backend when the user has aborted the connection.

If the reading of the request's data is completed and the code does its processing without writing anything back to the client, the broken connection won't be noticed. When an attempt is made to send at least one character to the client, the broken connection will be noticed and the SIGPIPE signal (Broken Pipe) will be sent to the process. The program can then halt its execution and perform all its cleanup requirements.

Prior to Apache 1.3.6, SIGPIPE was handled by Apache. Currently, Apache does not handle SIGPIPE, but mod_perl takes care of it.

Under mod_perl, $r->print (or just print( )) returns a true value on success and a false value on failure. The latter usually happens when the connection is broken.

If you want behavior similar to the old SIGPIPE (as it was before Apache version 1.3.6), add the following configuration directive:

PerlFixupHandler Apache::SIG

When Apache's SIGPIPE handler is used, Perl may be left in the middle of its eval( ) context, causing bizarre errors when subsequent requests are handled by that child. When Apache::SIG is used, it installs a different SIGPIPE handler that rewinds the context to make sure Perl is in a normal state before the new request is served, preventing these bizarre errors. But in general, you don't need to use Apache::SIG.

If you use Apache::SIG and you would like to log when a request was canceled by a SIGPIPE in your Apache access_log, you must define a custom LogFormat in your httpd.conf. For example:

PerlFixupHandler Apache::SIG
LogFormat "%h %l %u %t \"%r\" %s %b %{SIGPIPE}e"

If the server has noticed that the request was canceled via a SIGPIPE, the log line will end with 1. Otherwise, it will just be a dash. For example:

127.0.0.1 - - [09/Jan/2001:10:27:15 +0100] 
"GET /perl/stopping_detector.pl HTTP/1.0" 200 16 1
127.0.0.1 - - [09/Jan/2001:10:28:18 +0100] 
"GET /perl/test.pl HTTP/1.0"              200 10 -

6.9.1. Detecting Aborted Connections

Now let's use the knowledge we have acquired to trace the execution of the code and watch all the events as they happen. Let's take a simple Apache::Registryscript that purposely hangs the server process, like the one in Example 6-28.

Example 6-28. stopping_detector.pl

my $r = shift;
$r->send_http_header('text/plain');

print "PID = $$\n";
$r->rflush;

while (1) {
    sleep 1;
}

The script gets a request object $r by shift( ) ing it from the @_ argument list (passed by the handler( )subroutine that was created on the fly by Apache::Registry). Then the script sends a Content-Type header telling the client that we are going to send a plain-text response.

Next, the script prints out a single line telling us the ID of the process that handled the request, which we need to know in order to run the tracing utility. Then we flush Apache's STDOUT buffer. If we don't flush the buffer, we will never see this information printed (our output is shorter than the buffer size used for print( ), and the script intentionally hangs, so the buffer won't be auto-flushed).[31]

[31]Buffering is used to reduce the number of system calls (which do the actual writing) and therefore improve performance. When the buffer (usually a few kilobytes in size) is getting full, it's flushed and the data is written.

Then we enter an infinite while loop that does nothing but sleep( ), emulating code that doesn't generate any output. For example, it might be a long-running mathematical calculation, a database query, or a search for extraterrestrial life.

Running strace -p PID, where PID is the process ID as printed on the browser, we see the following output printed every second:

rt_sigprocmask(SIG_BLOCK, [CHLD], [  ], 8)  = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [  ], NULL, 8)  = 0
nanosleep({1, 0}, {1, 0})                 = 0
time([978969822])                         = 978969822
time([978969822])                         = 978969822

Alternatively, we can run the server in single-server mode. In single-server mode, we don't need to print the process ID, since the PID is the process of the single mod_perl process that we're running. When the process is started in the background, the shell program usually prints the PID of the process, as shown here:

panic% httpd -X &
 [1] 20107

Now we know what process we have to attach to with strace (or a similar utility):

panic% strace -p 20107
rt_sigprocmask(SIG_BLOCK, [CHLD], [  ], 8)  = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [  ], NULL, 8)  = 0
nanosleep({1, 0}, {1, 0})                 = 0
time([978969822])                         = 978969822
time([978969822])                         = 978969822

We see the same output as before.

Let's leave strace running and press the Stop button. Did anything change? No, the same system calls trace is printed every second, which means that Apache didn't detect the broken connection.

Now we are going to write \0 (NULL) characters to the client in an attempt to detect the broken connection as soon as possible after the Stop button is pressed. Since these are NULL characters, they won't be seen in the output. Therefore, we modify the loop code in the following way:

while (1) {
    $r->print("\0");
    last if $r->connection->aborted;
    sleep 1;
}

We add a print( )statement to print a NULL character, then we check whether the connection was aborted, with the help of the $r->connection->aborted method. If the connection is broken, we break out of the loop.

We run this script and run strace on it as before, but we see that it still doesn't work—the script doesn't stop when the Stop button is pressed.

The problem is that we aren't flushing the buffer. The NULL characters won't be printed until the buffer is full and is autoflushed. Since we want to try writing to the connection pipe all the time, we add an $r->rflush( ) call. Example 6-29 is a new version of the code.

Example 6-29. stopping_detector2.pl

my $r = shift;
$r->send_http_header('text/plain');

print "PID = $$\n";
$r->rflush;

while (1) {
    $r->print("\0");
    $r->rflush;
    last if $r->connection->aborted;
    sleep 1;
}

After starting the strace utility on the running process and pressing the Stop button, we see the following output:

rt_sigprocmask(SIG_BLOCK, [CHLD], [  ], 8)  = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [  ], NULL, 8)  = 0
nanosleep({1, 0}, {1, 0})               = 0
time([978970895])                       = 978970895
alarm(300)                              = 0
alarm(0)                                = 300
write(3, "\0", 1)                       = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) ---
chdir("/usr/src/httpd_perl")            = 0
select(4, [3], NULL, NULL, {0, 0})      = 1 (in [3], left {0, 0})
time(NULL)                              = 978970895
write(17, "127.0.0.1 - - [08/Jan/2001:19:21"..., 92) = 92
gettimeofday({978970895, 554755}, NULL) = 0
times({tms_utime=46, tms_stime=5, tms_cutime=0, 
  tms_cstime=0}) = 8425400
close(3)                                = 0
rt_sigaction(SIGUSR1, {0x8099524, [  ], SA_INTERRUPT|0x4000000}, 
  {SIG_IGN}, 8) = 0alarm(0)                                = 0
rt_sigprocmask(SIG_BLOCK, NULL, [  ], 8)  = 0
rt_sigaction(SIGALRM, {0x8098168, [  ], SA_RESTART|0x4000000}, 
  {0x8098168, [  ], SA_INTERRUPT|0x4000000}, 8) = 0
fcntl(18, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, 
  start=0, len=0}) = 0

Apache detects the broken pipe, as you can see from this snippet:

write(3, "\0", 1)                       = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) ---

Then it stops the script and does all the cleanup work, such as access logging:

write(17, "127.0.0.1 - - [08/Jan/2001:19:21"..., 92) = 92

where 17 is a file descriptor of the opened access_log file.

6.9.2. The Importance of Cleanup Code

Cleanup code is a critical issue with aborted scripts. For example, what happens to locked resources, if there are any? Will they be freed or not? If not, scripts using these resources and the same locking scheme might hang forever, waiting for these resources to be freed.

And what happens if a file was opened and never closed? In some cases, this might lead to a file-descriptor leakage. In the long run, many leaks of this kind might make your system unusable: when all file descriptors are used, the system will be unable to open new files.

First, let's take a step back and recall what the problems and solutions for these issues are under mod_cgi. Under mod_cgi, the resource-locking issue is a problem only if you use external lock files and use them for lock indication, instead of using flock( ). If the script running under mod_cgi is aborted between the lock and the unlock code, and you didn't bother to write cleanup code to remove old, dead locks, you're in big trouble.

The solution is to place the cleanup code in an END block:

END {
    # code that ensures that locks are removed
}

When the script is aborted, Perl will run the END block while shutting down.

If you use flock( ), things are much simpler, since all opened files will be closed when the script exits. When the file is closed, the lock is removed as well—all the locked resources are freed. There are systems where flock( ) is unavailable; on those systems, you can use Perl's emulation of this function.

With mod_perl, things can be more complex when you use global variables as filehandles. Because processes don't exit after processing a request, files won't be closed unless you explicitly close( ) them or reopen them with the open( ) call, which first closes the file. Let's see what problems we might encounter and look at some possible solutions.

6.9.2.1. Critical section

First, we want to take a little detour to discuss the "critical section" issue. Let's start with a resource-locking scheme. A schematic representation of a proper locking technique is as follows:

Lock a resource

<critical section starts>
Do something with the resource

<critical section ends>
Unlock the resource

If the locking is exclusive, only one process can hold the resource at any given time, which means that all the other processes will have to wait. The code between the locking and unlocking functions cannot be interrupted and can therefore become a service bottleneck. That's why this code section is called critical. Its execution time should be as short as possible.

Even if you use a shared locking scheme, in which many processes are allowed to concurrently access the resource, it's still important to keep the critical section as short as possible, in case a process requires an exclusive lock.

Example 6-30 uses a shared lock but has a poorly designed critical section.

Example 6-30. critical_section_sh.pl

use Fcntl qw(:flock);
use Symbol;

my $fh = gensym;
open $fh, "/tmp/foo" or die $!;

# start critical section
flock $fh, LOCK_SH;  # shared lock, appropriate for reading
seek $fh, 0, 0;
my @lines = <$fh>;
for (@lines) {
    print if /foo/;
}
close $fh; # close unlocks the file
# end critical section

The code opens the file for reading, locks and rewinds it to the beginning, reads all the lines from the file, and prints out the lines that contain the string "foo".

The gensym( ) function imported by the Symbol module creates an anonymous glob data structure and returns a reference to it. Such a glob reference can be used as a file or directory handle. Therefore, it allows lexically scoped variables to be used as filehandles.

Fcntl imports file-locking symbols, such as LOCK_SH, LOCK_EX, and others with the :flock group tag, into the script's namespace. Refer to the Fcntl manpage for more information about these symbols.

If the file being read is big, it will take a relatively long time for this code to complete printing out the lines. During this time, the file remains open and locked with a shared lock. While other processes may access this file for reading, any process that wants to modify the file (which requires an exclusive lock) will be blocked waiting for this section to complete.

We can optimize the critical section as follows. Once the file has been read, we have all the information we need from it. To make the example simpler, we've chosen to just print out the matching lines. In reality, the code might be much longer.

We don't need the file to be open while the loop executes, because we don't access it inside the loop. Closing the file before we start the loop will allow other processes to obtain exclusive access to the file if they need it, instead of being blocked for no reason.

Example 6-31 is an improved version of the previous example, in which we only read the contents of the file during the critical section and process it afterward, without creating a possible bottleneck.

Example 6-31. critical_section_sh2.pl

use Fcntl qw(:flock);
use Symbol;

my $fh = gensym;
open $fh, "/tmp/foo" or die $!;

# start critical section
flock $fh, LOCK_SH;
seek $fh, 0, 0;
my @lines = <$fh>;
close $fh; # close unlocks the file
# end critical section

for (@lines) {
    print if /foo/;
}

Example 6-32 is a similar example that uses an exclusive lock. The script reads in a file and writes it back, prepending a number of new text lines to the head of the file.

Example 6-32. critical_section_ex.pl

use Fcntl qw(:flock);
use Symbol;

my $fh = gensym;
open $fh, "+>>/tmp/foo" or die $!;

# start critical section
flock $fh, LOCK_EX;
seek $fh, 0, 0;
my @add_lines =
  (
   qq{Complete documentation for Perl, including FAQ lists,\n},
   qq{should be found on this system using 'man perl' or\n},
   qq{'perldoc perl'. If you have access to the Internet, point\n},
   qq{your browser at https://www.perl.com/, the Perl Home Page.\n},
  );

my @lines = (@add_lines, <$fh>);
seek $fh, 0, 0;
truncate $fh, 0;
print $fh @lines;
close $fh; # close unlocks the file
# end critical section

Since we want to read the file, modify it, and write it back without anyone else changing it in between, we open it for reading and writing with the help of "+>>" and lock it with an exclusive lock. You cannot safely accomplish this task by opening the file first for reading and then reopening it for writing, since another process might change the file between the two events. (You could get away with "+<" as well; please refer to the perlfunc manpage for more information about the open( ) function.)

Next, the code prepares the lines of text it wants to prepend to the head of the file and assigns them and the content of the file to the @lines array. Now we have our data ready to be written back to the file, so we seek( ) to the start of the file and truncate( ) it to zero size. Truncating is necessary when there's a chance the file might shrink. In our example, the file always grows, so in this case there is actually no need to truncate it; however, it's good practice to always use truncate( ), as you never know what changes your code might undergo in the future, and truncate( ) doesn't significantly affect performance.

Finally, we write the data back to the file and close it, which unlocks it as well.

Did you notice that we created the text lines to be prepended as close to the place of usage as possible? This complies with good "locality of code" style, but it makes the critical section longer. In cases like this, you should sacrifice style in order to make the critical section as short as possible. An improved version of this script with a shorter critical section is shown in Example 6-33.

Example 6-33. critical_section_ex2.pl

use Fcntl qw(:flock);
use Symbol;

my @lines =
  (
   qq{Complete documentation for Perl, including FAQ lists,\n},
   qq{should be found on this system using 'man perl' or\n},
   qq{'perldoc perl'. If you have access to the Internet, point\n},
   qq{your browser at https://www.perl.com/, the Perl Home Page.\n},
  );

my $fh = gensym;
open $fh, "+>>/tmp/foo" or die $!;

# start critical section
flock $fh, LOCK_EX;
seek $fh, 0, 0;
push @lines, <$fh>;

seek $fh, 0, 0;
truncate $fh, 0;
print $fh @lines;
close $fh; # close unlocks the file
# end critical section

There are two important differences. First, we prepared the text lines to be prepended before the file is locked. Second, rather than creating a new array and copying lines from one array to another, we appended the file directly to the @lines array.

6.9.2.2. Safe resource locking and cleanup code

Now let's get back to this section's main issue, safe resource locking. If you don't make a habit of closing all files that you open, you may encounter many problems (unless you use the Apache::PerlRun handler, which does the cleanup for you). An open file that isn't closed can cause file-descriptor leakage. Since the number of file descriptors available is finite, at some point you will run out of them and your service will fail. This will happen quite fast on a heavily used server.

You can use system utilities to observe the opened and locked files, as well as the processes that have opened (and locked) the files. On FreeBSD, use the fstat utility. On many other Unix flavors, use lsof. On systems with a /proc filesystem, you can see the opened file descriptors under /proc/PID/fd/, where PID is the actual process ID.

However, file-descriptor leakage is nothing compared to the trouble you will give yourself if the code terminates and the file remains locked. Any other process requesting a lock on the same file (or resource) will wait indefinitely for it to become unlocked. Since this will not happen until the server reboots, all processes trying to use this resource will hang.

Example 6-34 is an example of such a terrible mistake.

Example 6-34. flock.pl

use Fcntl qw(:flock);
open IN, "+>>filename" or die "$!";
flock IN, LOCK_EX;
# do something
# quit without closing and unlocking the file

Is this safe code? No—we forgot to close the file. So let's add the close( ), as in Example 6-35.

Example 6-35. flock2.pl

use Fcntl qw(:flock);
open IN, "+>>filename" or die "$!";
flock IN, LOCK_EX;
# do something
close IN;

Is it safe code now? Unfortunately, it is not. If the user aborts the request (for example, by pressing the browser's Stop or Reload buttons) during the critical section, the script will be aborted before it has had a chance to close( ) the file, which is just as bad as if we forgot to close it.

In fact, if the same process runs the same code again, an open( ) call will close( ) the file first, which will unlock the resource. This is because IN is a global variable. But it's quite possible that the process that created the lock will not serve the same request for a while, since it might be busy serving other requests. During that time, the file will be locked for other processes, making them hang. So relying on the same process to reopen the file is a bad idea.

This problem happens only if you use global variables as file handles. Example 6-36 has the same problem.

Example 6-36. flock3.pl

use Fcntl qw(:flock);
use Symbol ( );
use vars qw($fh);
$fh = Symbol::gensym( );
open $fh, "+>>filename" or die "$!";
flock $fh, LOCK_EX;
# do something
close $fh;

$fh is still a global variable, and therefore the code using it suffers from the same problem.

The simplest solution to this problem is to always use lexically scoped variables (created with my( )). The lexically scoped variable will always go out of scope (assuming that it's not used in a closure, as explained in the beginning of this chapter), whether the script gets aborted before close( ) is called or you simply forgot to close( ) the file. Therefore, if the file was locked, it will be closed and unlocked. Example 6-37 is a good version of the code.

Example 6-37. flock4.pl

use Fcntl qw(:flock);
use Symbol ( );
my $fh = Symbol::gensym( );
open $fh, "+>>filename" or die "$!";
flock $fh, LOCK_EX;
# do something
close $fh;

If you use this approach, please don't conclude that you don't have to close files anymore because they are automatically closed for you. Not closing files is bad style and should be avoided.

Note also that Perl 5.6 provides a Symbol.pm-like functionality as a built-in feature, so you can write:

open my $fh, ">/tmp/foo" or die $!;

and $fh will be automatically vivified as a valid filehandle. You don't need to use Symbol::gensym and Apache::gensym anymore, if backward compatibility is not a requirement.

You can also use IO::* modules, such as IO::File or IO::Dir. These are much bigger than the Symbol module (as a matter of fact, these modules use the Symbol module themselves) and are worth using for files or directories only if you are already using them for the other features they provide. Here is an example of their usage:

use IO::File;
use IO::Dir;
my $fh = IO::File->new(">filename");
my $dh = IO::Dir->new("dirname");

Alternatively, there are also the lighter FileHandle and DirHandle modules.

If you still have to use global filehandles, there are a few approaches you can take to clean up in the case of abnormal script termination.

If you are running under Apache::Registry and friends, the END block will perform the cleanup work for you. You can use END in the same way for scripts running under mod_cgi, or in plain Perl scripts. Just add the cleanup code to this block, and you are safe.

For example, if you work with DBM files, it's important to flush the DBM buffers by calling a sync( ) method:

END {
    # make sure that the DB is flushed
    $dbh->sync( );
}

Under mod_perl, the above code will work only for Apache::Registry and Apache::PerlRunscripts. Otherwise, execution of the END block is postponed until the process terminates. If you write a handler in the mod_perl API, use the register_cleanup( ) method instead. It accepts a reference to a subroutine as an argument. You can rewrite the DBM synchronization code in this way:

$r->register_cleanup(sub { $dbh->sync( ) });

This will work under Apache::Registry as well.

Even better would be to check whether the client connection has been aborted. Otherwise, the cleanup code will always be executed, and for normally terminated scripts, this may not be what you want. To perform this check, use:

$r->register_cleanup(
  # make sure that the DB is flushed
  sub {
      $dbh->sync( ) if Apache->request->connection->aborted( );
  }
);

Or, if using an END block, use:

END {
    # make sure that the DB is flushed
    $dbh->sync( ) if Apache->request->connection->aborted( );
}

Note that if you use register_cleanup( ), it should be called at the beginning of the script or as soon as the variables you want to use in this code become available. If you use it at the end of the script, and the script happens to be aborted before this code is reached, no cleanup will be performed.

For example, CGI.pm registers a cleanup subroutine in its new( ) method:

sub new {
  # code snipped
  if ($MOD_PERL) {
      Apache->request->register_cleanup(\&CGI::_reset_globals);
      undef $NPH;
  }
  # more code snipped
}

Another way to register a section of cleanup code for mod_perl API handlers is to use PerlCleanupHandler in the configuration file:

<Location /foo>
    SetHandler perl-script
    PerlHandler        Apache::MyModule
    PerlCleanupHandler Apache::MyModule::cleanup( )
    Options ExecCGI
</Location>

Apache::MyModule::cleanup performs the cleanup.