10.2. Forking and Executing Subprocesses from mod_perl
When you fork Apache, you are forking the entire Apache server, lock,
stock and barrel. Not only are you duplicating your Perl code and the
Perl interpreter, but you are also duplicating all the core routines
and whatever modules you have used in your server—for example,
mod_ssl, mod_rewrite, mod_log, mod_proxy, and mod_speling (no,
that's not a typo!). This can be a large overhead on
some systems, so wherever possible, it's desirable
to avoid forking under mod_perl.
Modern operating systems
have a light version of fork( ),
optimized to do the absolute minimum of memory-page
duplication, which adds little overhead when called. This fork relies
on the copy-on-write technique. The gist of this technique is
as follows: the parent process's memory pages
aren't all copied immediately to the
child's space when fork( ) is called; a page
is copied only later, when the child or the parent modifies the data in
that shared page.
If you need to call a Perl program from your mod_perl code,
it's better to try to convert the program into a
module and call it as a function without spawning a special process
to do that. Of course, if you cannot do that or the program is not
written in Perl, you have to call the program via system( )
or an equivalent function, which spawns a new process. If
the program is written in C, you can try to write some Perl glue code
with the help of the Inline, XS, or SWIG architectures. Then the program
will be executed as a Perl subroutine and avoid a fork( )
call.
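As a minimal sketch of the first suggestion, assume a standalone script that counts the entries in a directory. Moving its logic into a package lets mod_perl code call it directly instead of spawning a process. The name My::DirCount and its count_entries( ) function are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

# Hypothetical module that replaces a standalone script; in real code it
# would live in its own file (My/DirCount.pm) and be loaded with "use".
package My::DirCount;

# Return the number of entries in $dir, not counting "." and "..".
sub count_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return scalar @entries;
}

package main;

# An ordinary function call: no fork( ) and no new Perl interpreter.
my $count = My::DirCount::count_entries('.');
print "Found $count entries\n";
```

Because the function is compiled once per Apache child, repeated requests pay only the cost of the call itself.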
Also, by trying to spawn a subprocess, you might be trying to do the
wrong thing. If you just want to do some post-processing after
sending a response to the browser, look into the
PerlCleanupHandler directive, which allows
you to do exactly that. If you just
need to run some cleanup code, you may want to register this code
during the request processing via:
my $r = shift;
$r->register_cleanup(\&do_cleanup);
sub do_cleanup {
    # some clean-up code here
}
But when a lengthy job needs to be done, there is not much choice but
to use fork( ). You cannot just run such a job
within an Apache process, since firstly it will keep the Apache
process busy instead of letting it do the job it was designed for,
and secondly, unless it is coded so as to detach from the Apache
processes group, if Apache should happen to be stopped the lengthy
job might be terminated as well.
In the following sections, we'll discuss how to
properly spawn new processes under mod_perl.
10.2.1. Forking a New Process
The typical way to call fork( ) under mod_perl is
illustrated
in Example 10-13.
Example 10-13. fork1.pl
defined (my $kid = fork) or die "Cannot fork: $!\n";
if ($kid) {
# Parent runs this block
}
else {
# Child runs this block
# some code comes here
CORE::exit(0);
}
# possibly more code here usually run by the parent
When using fork( ), you should check its return
value, since a return of undef means that the
call was unsuccessful and no process was spawned. This can happen, for
example, when the system is already running too many processes and
cannot spawn new ones.
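The failure case can also be handled by retrying rather than dying at once. The following sketch is not one of the numbered examples: fork_with_retry( ) is a hypothetical helper that retries a few times when fork( ) fails with EAGAIN (the error typically reported when a process limit has been reached) and gives up immediately on any other error.

```perl
use strict;
use warnings;
use Errno qw(EAGAIN);

# Hypothetical helper: call fork( ) up to $attempts times, pausing for a
# second between attempts when the failure looks temporary (EAGAIN).
sub fork_with_retry {
    my ($attempts) = @_;
    for my $try (1 .. $attempts) {
        my $pid = fork;
        return $pid if defined $pid;   # PID in the parent, 0 in the child
        die "Cannot fork: $!\n" unless $! == EAGAIN;
        sleep 1 if $try < $attempts;
    }
    die "Cannot fork after $attempts attempts\n";
}

my $kid = fork_with_retry(3);
if ($kid) {
    waitpid($kid, 0);                  # parent: collect the child's status
    print "Child $kid exited with status ", $? >> 8, "\n";
}
else {
    CORE::exit(0);                     # child: nothing to do in this sketch
}
```

Whether retrying is appropriate depends on the application; under a heavily loaded server it may be better to fail fast and report the error.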
When the process is successfully forked, the parent receives the PID
of the newly spawned child as the return value of the fork( )
call and the child receives 0. Now the
program splits into two. In the above example, the code inside the
first block after if will be executed by the
parent, and the code inside the first block after
else will be executed by the child.
It's important not to forget to explicitly call
exit( ) at the end of the child code when forking.
If you don't and there is some code outside the
if...else block, the child process will execute it
as well. But under mod_perl there is another nuance: you must
use CORE::exit( ) and not exit( ),
which is automatically overridden by
Apache::exit( ) if used in conjunction with
Apache::Registry and similar modules. You want the
spawned process to quit when its work is done, or
it'll just stay alive, using resources and doing
nothing.
The parent process usually completes its execution and returns to the
pool of free servers to wait for a new assignment. If the execution
is to be aborted earlier for some reason, you should use
Apache::exit( ) or die( ). In
the case of Apache::Registry or
Apache::PerlRun handlers, a simple exit( )
will do the right thing.
10.2.2. Freeing the Parent Process
In the child code, you must also close all the
pipes to the connection socket that were opened by the parent process
(i.e., STDIN and STDOUT) and
inherited by the child, so the parent will be able to complete the
request and free itself for serving other requests. If you need the
STDIN and/or STDOUT streams,
you should reopen them. You may need to close or reopen the
STDERR file handle, too. As inherited from its
parent, it's opened to append to the
error_log file, so the chances are that you will
want to leave it untouched.
Under mod_perl, the spawned process also inherits the file descriptor
that's tied to the socket through which all the
communications between the server and the client pass. Therefore, you
need to free this stream in the forked process. If you
don't, the server can't be
restarted while the spawned process is still running. If you attempt
to restart the server, you will get the following error:
[Mon May 20 23:04:11 2002] [crit]
(98)Address already in use: make_sock:
could not bind to address 127.0.0.1 port 8000
Apache::SubProcess comes to help,
providing a method called
cleanup_for_exec( ) that takes care of closing
this file descriptor.
The simplest way to free
the parent process is to
close the STDIN, STDOUT, and
STDERR streams (if you don't need
them) and untie the Apache socket. If the mounted partition is to be
unmounted at a later time, in addition you may want to change the
current directory of the forked process to / so
that the forked process won't keep the mounted
partition busy.
To summarize all these issues, here is an example of a fork that
takes care of freeing the parent process (Example 10-14).
Example 10-14. fork2.pl
use Apache::SubProcess;
my $r = shift; # the request object, as passed to an Apache::Registry script
defined (my $kid = fork) or die "Cannot fork: $!\n";
if ($kid) {
# Parent runs this block
}
else {
# Child runs this block
$r->cleanup_for_exec( ); # untie the socket
chdir '/' or die "Can't chdir to /: $!";
close STDIN;
close STDOUT;
close STDERR;
# some code goes here
CORE::exit(0);
}
# possibly more code here usually run by the parent
Of course, the real code should be placed between the code that frees
the parent and the termination of the child process.
10.2.3. Detaching the Forked Process
Now what happens if the forked process is running and we
decide that we need to restart the web server? This forked process
will be aborted, because when the parent process dies during the
restart, it will kill its child processes as well. In order to avoid
this, we need to detach the process from its parent session by
opening a new session with the help of the setsid( )
system call (provided by the POSIX module). This
is demonstrated in Example 10-15.
Example 10-15. fork3.pl
use POSIX 'setsid';
defined (my $kid = fork) or die "Cannot fork: $!\n";
if ($kid) {
# Parent runs this block
}
else {
# Child runs this block
setsid or die "Can't start a new session: $!";
# ...
}
Now the spawned child process has a life of its own, and it
doesn't depend on the parent any more.
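To see the effect of setsid( ), the child can compare its process group before and after the call. This standalone sketch (not mod_perl code; the pipe is only a convenient way for the child to report back) relies on the fact that a successful setsid( ) makes the caller the leader of a new session and of a new process group, so its process group ID becomes equal to its own PID:

```perl
use strict;
use warnings;
use POSIX 'setsid';

pipe(my $reader, my $writer) or die "Cannot create pipe: $!";

defined(my $kid = fork) or die "Cannot fork: $!\n";
if (!$kid) {
    # Child: record the process group before and after setsid( )
    close $reader;
    my $before = getpgrp();
    setsid() or die "Can't start a new session: $!";
    my $after = getpgrp();
    print {$writer} "$$ $before $after\n";
    close $writer;
    CORE::exit(0);
}

# Parent: read the child's report and collect its exit status
close $writer;
my $report = <$reader>;
waitpid($kid, 0);
defined $report or die "Child reported nothing\n";
chomp $report;

my ($pid, $before, $after) = split ' ', $report;
print "Child $pid moved from process group $before to $after\n";
```

Before the call the child still belongs to the process group it inherited from its parent; afterwards it leads its own.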
10.2.4. Avoiding Zombie Processes
Normally, every process has a
parent. Many processes are children of the
init process, whose PID is 1. When you fork a
process, you must wait( ) or waitpid( )
for it to finish. If you don't
wait( ) for it, it becomes a zombie when it terminates.
A zombie is a process that has terminated but whose exit status has
not yet been collected by its parent.
When the child quits, it reports the termination to its parent. If the
parent never wait( )s to collect the exit status of the
child, the child remains in the process table as a ghost process that
can be seen as a process but cannot be killed, since it is already dead.
It will disappear only when the
parent process that spawned it waits for it or exits.
Generally, the ps(1) utility displays these
processes with the <defunct> tag, and you may
see the zombies counter increment when using top(1).
Zombie processes occupy slots in the process table and
are generally undesirable.
The proper way to do a fork, to avoid zombie processes, is shown in
Example 10-16.
Example 10-16. fork4.pl
my $r = shift;
$r->send_http_header('text/plain');
defined (my $kid = fork) or die "Cannot fork: $!";
if ($kid) {
waitpid($kid,0);
print "Parent has finished\n";
}
else {
# do something
CORE::exit(0);
}
In most cases, the only reason you would want to fork is when you
need to spawn a process that will take a long time to complete. So if
the Apache process that spawns this new child process has to wait for
it to finish, you have gained nothing. You can neither wait for its
completion (because you don't have the time to) nor
continue without waiting, because if you do you will get yet another
zombie process.
The waitpid( ) call in this example is a blocking call: the
process is blocked from doing anything else
until the call completes.
The simplest solution is to ignore your dead children. Just add this
line before the fork( ) call:
$SIG{CHLD} = 'IGNORE';
When you set the CHLD (SIGCHLD
in C) signal handler to 'IGNORE', the system
reaps terminated child processes automatically, without the parent
having to wait( ) for them, and they are
therefore prevented from becoming zombies. This
doesn't work everywhere, but it has been proven to
work at least on Linux.
Note that you cannot localize this setting with local( ).
If you try, it won't have the desired
effect.
The latest version of the code is shown in Example 10-17.
Example 10-17. fork5.pl
my $r = shift;
$r->send_http_header('text/plain');
$SIG{CHLD} = 'IGNORE';
defined (my $kid = fork) or die "Cannot fork: $!\n";
if ($kid) {
print "Parent has finished\n";
}
else {
# do something time-consuming
CORE::exit(0);
}
Note that the waitpid( ) call is gone. The
$SIG{CHLD} = 'IGNORE'; statement protects us from
zombies, as explained above.
Another solution (more portable, but slightly more expensive) is to
use a double fork approach, as shown in Example 10-18.
Example 10-18. fork6.pl
my $r = shift;
$r->send_http_header('text/plain');
defined (my $kid = fork) or die "Cannot fork: $!\n";
if ($kid) {
waitpid($kid,0);
}
else {
defined (my $grandkid = fork) or die "Kid cannot fork: $!\n";
if ($grandkid) {
CORE::exit(0);
}
else {
# code here
# do something long lasting
CORE::exit(0);
}
}
The grandkid becomes a child of init, i.e., a
child of the process whose PID is 1, which collects its exit status
when it terminates.
Note that the previous two solutions do not allow you to determine the
exit status of the process, but in our example, we
don't care about it.
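The double fork can be wrapped in a reusable function. The sketch below is a standalone illustration rather than mod_perl code: spawn_detached( ) is a hypothetical helper name, and the pipe exists only so that the demo can observe that the grandkid really ran.

```perl
use strict;
use warnings;

# Hypothetical helper: run $job in a grandchild process. The caller only
# waits for the short-lived intermediate kid, never for the job itself.
sub spawn_detached {
    my ($job) = @_;
    defined(my $kid = fork) or die "Cannot fork: $!\n";
    if ($kid) {
        waitpid($kid, 0);          # returns quickly: the kid exits at once
        return;
    }
    # Kid: fork the grandkid, then exit so that the grandkid is orphaned
    defined(my $grandkid = fork) or CORE::exit(1);
    CORE::exit(0) if $grandkid;
    # Grandkid: do the real work, then terminate
    $job->();
    CORE::exit(0);
}

# Usage: the job writes one line into the pipe for the parent to read
pipe(my $reader, my $writer) or die "Cannot create pipe: $!";
spawn_detached(sub {
    close $reader;
    print {$writer} "job done by process $$\n";
    close $writer;
});
close $writer;
my $line = <$reader>;              # blocks until the grandkid has written
print "Parent $$ read: $line";
```

In real use the job would not report back through a pipe, since the whole point is that the caller does not wait for it.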
Yet another solution is to use a different SIGCHLD
handler:
use POSIX 'WNOHANG';
$SIG{CHLD} = sub { while( waitpid(-1,WNOHANG)>0 ) { } };
This is useful when you fork( ) more than one
process. The handler could call wait( ) as well,
but for a variety of reasons involving the handling of stopped
processes and the rare event in which two children exit at nearly the
same moment, the best technique is to call waitpid( )
in a tight loop with a first argument of
-1 and a second argument of
WNOHANG. Together these arguments tell
waitpid( ) to reap the next child
that's available and prevent the call from blocking
if there happens to be no child ready for reaping. The handler will
loop until waitpid( ) returns a negative number or
zero, indicating that no more reapable children remain.
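The following standalone sketch shows such a handler reaping several children. It extends the one-line handler above in two ways that are assumptions of this example rather than requirements: it records each child's exit code in a hash (%exit_status is a hypothetical name) and it localizes $! and $? so that the handler does not disturb the interrupted code.

```perl
use strict;
use warnings;
use POSIX 'WNOHANG';

my %exit_status;   # PID => exit code of every child reaped so far

$SIG{CHLD} = sub {
    local ($!, $?);                # don't clobber the interrupted code's values
    # Reap every child that is ready; stop when none is left
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $exit_status{$pid} = $? >> 8;
    }
};

# Fork three children that exit with codes 1, 2, and 3
my @kids;
for my $code (1 .. 3) {
    defined(my $kid = fork) or die "Cannot fork: $!\n";
    if ($kid) {
        push @kids, $kid;
        next;
    }
    CORE::exit($code);
}

# The parent is free to do other work; here it just idles (for at most
# ten seconds) until the handler has collected all the children.
my $deadline = time + 10;
sleep 1 while keys(%exit_status) < @kids && time < $deadline;

print "Child $_ exited with code $exit_status{$_}\n" for sort keys %exit_status;
```

The loop inside the handler matters: if two children exit close together, only one signal may be delivered, and a single waitpid( ) call would leave the second child as a zombie.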
While testing and debugging code that uses one of the above examples,
you might want to write debug information to the
error_log file so that you know
what's happening.
Read the perlipc manpage for more information
about signal handlers.
10.2.5. A Complete Fork Example
Now let's put all the bits of
code together and show a well-written example that solves all the
problems discussed so
far. We will use an Apache::Registry script for
this purpose. Our script is shown in Example 10-19.
Example 10-19. proper_fork1.pl
use strict;
use POSIX 'setsid';
use Apache::SubProcess;
my $r = shift;
$r->send_http_header("text/plain");
$SIG{CHLD} = 'IGNORE';
defined (my $kid = fork) or die "Cannot fork: $!\n";
if ($kid) {
print "Parent $$ has finished, kid's PID: $kid\n";
}
else {
$r->cleanup_for_exec( ); # untie the socket
chdir '/' or die "Can't chdir to /: $!";
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>/tmp/log' or die "Can't write to /tmp/log: $!";
setsid or die "Can't start a new session: $!";
my $oldfh = select STDERR;
local $| = 1;
select $oldfh;
warn "started\n";
# do something time-consuming
sleep 1, warn "$_\n" for 1..20;
warn "completed\n";
CORE::exit(0); # terminate the process
}
The script starts with the usual declaration of
strict mode, then loads the
POSIX and Apache::SubProcess
modules and imports the setsid( ) symbol from the
POSIX package.
The HTTP header is sent next, with the
Content-Type of text/plain. To
avoid zombies, the parent process gets ready to ignore the child, and
the fork is called.
The if condition evaluates to a true value for the
parent process and to a false value for the child process; therefore,
the first block is executed by the parent and the second by the
child.
The parent process announces its PID and the PID of the spawned
process, and finishes its block. If there is any code outside the
if statement, it will be executed by the parent as
well.
The child process starts its code by disconnecting from the socket,
changing its current directory to /, and opening
the STDIN and STDOUT streams to
/dev/null (this has the effect of closing them
both before opening them). In fact, in this example we
don't need either of these, so we could just
close( ) both. The child process completes its
disengagement from the parent process by opening the
STDERR stream to /tmp/log, so
it can write to that file, and creates a new session with the help of
setsid( ). Now the child process has nothing to do
with the parent process and can do the actual processing that it has
to do. In our example, it outputs a series of warnings, which are
logged to /tmp/log:
my $oldfh = select STDERR;
local $| = 1;
select $oldfh;
warn "started\n";
# do something time-consuming
sleep 1, warn "$_\n" for 1..20;
warn "completed\n";
We set $|=1 to unbuffer the
STDERR stream, so we can immediately see the debug
output generated by the program. We use the keyword
local so that buffering in other processes is not
affected. In fact, we don't really need to unbuffer
output when it is generated by warn( ). You need
it only if you use print( ) to debug.
Finally, the child process terminates by calling:
CORE::exit(0);
which makes sure that it terminates at the end of the block and
won't run some code that it's not
supposed to run.
This code example will allow you to verify that indeed the spawned
child process has its own life, and that its parent is free as well.
Simply issue a request that will run this script, see that the
process starts writing warnings to the file
/tmp/log, and issue a complete server stop and
start. If everything is correct, the server will successfully restart
and the long-term process will still be running. You will know that
it's still running if the warnings are still being
written into /tmp/log. If Apache takes a long
time to stop and restart, you may need to raise the number of
warnings to make sure that you don't miss the end of
the run.
If there are only five warnings to be printed, you should see the
following output in the /tmp/log file:
started
1
2
3
4
5
completed
10.2.6. Starting a Long-Running External Program
What happens if we cannot just run
Perl code from the spawned process? We may have a compiled utility,
such as a program written in C, or a Perl program that cannot easily
be converted into a module and thus called as a function. In this
case, we have to use system( ), exec( ),
qx( ), or ``
(backticks) to start it.
When using any of these methods, and when taint mode is enabled, we
must also untaint the
PATH environment variable and delete a few other
insecure environment variables, as is done near the top of
Example 10-21. This information can be found in the
perlsec manpage.
Now all we have to do is reuse the code from the previous section.
First we move the core program into the
external.pl file, then we add the shebang line
so that the program will be executed by Perl, tell the program to run
under taint mode (-T), possibly enable
warnings mode (-w), and
make it executable. These changes are shown in Example 10-20.
Example 10-20. external.pl
#!/usr/bin/perl -Tw
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>/tmp/log' or die "Can't write to /tmp/log: $!";
my $oldfh = select STDERR;
local $| = 1;
select $oldfh;
warn "started\n";
# do something time-consuming
sleep 1, warn "$_\n" for 1..20;
warn "completed\n";
Now we replace the code that we moved into the external program with
a call to exec( ) to run it, as shown in Example 10-21.
Example 10-21. proper_fork_exec.pl
use strict;
use POSIX 'setsid';
use Apache::SubProcess;
$ENV{'PATH'} = '/bin:/usr/bin';
delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};
my $r = shift;
$r->send_http_header("text/html");
$SIG{CHLD} = 'IGNORE';
defined (my $kid = fork) or die "Cannot fork: $!\n";
if ($kid) {
print "Parent has finished, kid's PID: $kid\n";
}
else {
$r->cleanup_for_exec( ); # untie the socket
chdir '/' or die "Can't chdir to /: $!";
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
setsid or die "Can't start a new session: $!";
exec "/home/httpd/perl/external.pl" or die "Cannot execute exec: $!";
}
Notice that exec( ) never returns unless it fails
to start the process. Therefore you shouldn't put
any code after exec( ), since it will not be
executed in the case of success. Use system( ) or
backticks instead if you want to continue doing other things in the
process. But then you probably will want to terminate the process
after the program has finished. Since system( )
returns 0 on success, you will have to write:
system("/home/httpd/perl/external.pl") == 0
or die "Cannot execute system: $?";
CORE::exit(0);
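When more detail about the outcome is needed, the value that system( ) leaves in $? can be decoded as described in the perlfunc manpage: -1 means the program could not be started, the low seven bits hold the number of the signal that killed the program (if any), and the high byte holds its exit value. The sketch below wraps this in a hypothetical helper, run_and_describe( ), and uses the running Perl interpreter ($^X) as a stand-in for a real external program:

```perl
use strict;
use warnings;

# Hypothetical helper: run a command given as a list (so no shell is
# involved when more than one element is passed) and describe how it
# ended, based on the value system( ) leaves in $?.
sub run_and_describe {
    my @cmd = @_;
    system(@cmd);
    if ($? == -1) {
        return "failed to execute: $!";
    }
    elsif ($? & 127) {
        return sprintf "died with signal %d", $? & 127;
    }
    return sprintf "exited with value %d", $? >> 8;
}

# $^X is the path of the running perl
print run_and_describe($^X, '-e', 'exit 3'), "\n";
print run_and_describe($^X, '-e', 'exit 0'), "\n";
```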
Another important nuance is that we have to close all the
STD streams in the forked process, even if the
called program does that.
If the external program is written in Perl, you can pass complicated
data structures to it using one of the methods to serialize and then
restore Perl data. The Storable and
FreezeThaw modules come in handy.
Let's say that we have a program called
master.pl (Example 10-22) calling
another program called slave.pl (Example 10-23).
Example 10-22. master.pl
# we are within the mod_perl code
use Storable ( );
my @params = (foo => 1, bar => 2);
my $params = Storable::freeze(\@params);
exec "./slave.pl", $params or die "Cannot execute exec: $!";
Example 10-23. slave.pl
#!/usr/bin/perl -w
use Storable ( );
my @params = @ARGV ? @{ Storable::thaw(shift)||[ ] } : ( );
# do something
As you can see, master.pl serializes the
@params data structure with
Storable::freeze and passes it to
slave.pl as a single argument.
slave.pl recovers it with
Storable::thaw, by shifting the first value of the
@ARGV array (if available). The
FreezeThaw module does a very similar thing.
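One caveat is that the output of Storable is binary and may contain NUL bytes, which cannot be passed inside a command-line argument. A possible workaround, shown here as a sketch and not as part of the original examples, is to hex-encode the frozen string before passing it and to decode it before thawing. The helper names encode_params( ) and decode_params( ) are hypothetical, and nfreeze( ) is used instead of freeze( ) so that the byte order is the same on both sides:

```perl
use strict;
use warnings;
use Storable ();

# Hypothetical helpers: serialize a data structure to a hex string that
# is safe to pass in @ARGV, and restore it on the other side.
sub encode_params { return unpack 'H*', Storable::nfreeze($_[0]) }
sub decode_params { return Storable::thaw(pack 'H*', $_[0]) }

# What master.pl would pass:  exec "./slave.pl", $arg
my $arg = encode_params([foo => 1, bar => 2]);

# What slave.pl would do with $ARGV[0]
my $params = decode_params($arg);
print "Restored: @$params\n";
```

The encoded string is twice as long as the frozen data, so for large structures a temporary file or a pipe may be a better transport.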
10.2.7. Starting a Short-Running External Program
Sometimes you need to call an external program
and you cannot continue before this program completes its run (e.g.,
if you need it to return some result). In this case, the fork
solution doesn't help. There are a few ways to
execute such a program. First, you could use system( ):
system "perl -e 'print 5+5'";
You would never call the Perl interpreter for doing a simple
calculation like this, but for the sake of a simple example
it's good enough.
The problem with this approach is that we cannot get the results
printed to STDOUT. That's where
backticks or qx( ) can help. If you use either:
my $result = `perl -e 'print 5+5'`;
or:
my $result = qx{perl -e 'print 5+5'};
the whole output of the external program will be stored in the
$result variable.
Of course, you can use other solutions, such as opening a pipe
(|) to the program if you need to submit many
arguments. And there are more evolved solutions provided by other
Perl modules, such as IPC::Open2 and
IPC::Open3, that allow you to open a process for
reading, writing, and error handling.
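As an illustration of the last point, here is a minimal standalone sketch that uses IPC::Open2 to write to a program and read its reply. The external program is simulated by a second Perl interpreter ($^X) that uppercases its input; any filter that reads standard input and writes standard output could take its place. The sketch has not been tested under mod_perl, where the standard streams are handled differently.

```perl
use strict;
use warnings;
use IPC::Open2;

# Start a child perl that uppercases every line it reads. $^X (the path
# of the running perl) stands in for a real external program.
my $pid = open2(my $from_child, my $to_child, $^X, '-ne', 'print uc');

print {$to_child} "hello from the parent\n";
close $to_child;                 # send EOF so that the child finishes

my $answer = <$from_child>;      # read the child's reply
close $from_child;
waitpid($pid, 0);                # reap the child to avoid a zombie

print "Child said: $answer";
```

Writing everything and closing the input handle before reading keeps this small exchange free of deadlocks; with large amounts of data in both directions, more careful interleaving is needed.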
10.2.8. Executing system( ) or exec( ) in the Right Way
The Perl exec( ) and system( )
functions behave identically in the way they spawn a
program. Let's use system( ) as
an example. Consider the following code:
system("echo", "Hi");
Perl will use the first argument as a program to execute, find the
echo executable along the search path, invoke it
directly, and pass the string "Hi"
as an argument.
Note that Perl's system( ) is not
the same as the standard libc system(3) call.
If there is more than one argument to system( ) or
exec( ), or the argument is an array with more
than one element in it, the arguments are passed directly to the
C-level functions. When the argument is a single scalar or an array
with only a single scalar in it, it will first be checked to see if
it contains any shell metacharacters (e.g., *,
?). If there are any, the Perl interpreter invokes
a real shell program (/bin/sh -c on Unix
platforms). If there are no shell metacharacters in the argument, it
is split into words and passed directly to the C level, which is more
efficient.
In other words, only if you do:
system "echo *";
will Perl actually exec( ) a copy of
/bin/sh to parse your command, which may incur a
slight overhead on certain OSes.
It's especially important to remember to run your
code with taint mode enabled when system( ) or
exec( ) is called using a single argument. There
can be bad consequences if user input gets to the shell without
proper laundering first. Taint mode will alert you when such a
condition happens.
Perl will try to do the most efficient thing no matter how the
arguments are passed, and the additional overhead may be incurred
only if you need the shell to expand some metacharacters before doing
the actual call.
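The difference between the two calling styles is easiest to see with input that contains a shell metacharacter. In this standalone sketch the list form of a piped open( ) and qx( ) are used instead of system( ) only so that the output can be captured; the first runs the program directly, and the second hands the string to the shell. The hostile-looking value in $user_input is made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical hostile input containing a shell metacharacter (;)
my $user_input = 'a.txt; echo INJECTED';

# List form: echo is run directly and receives the input as one argument
open(my $fh, '-|', 'echo', $user_input) or die "Cannot run echo: $!";
my $direct = do { local $/; <$fh> };
close $fh;

# Single-string form: the command goes through /bin/sh, which treats
# the text after ";" as a second command
my $via_shell = qx{echo $user_input};

print "direct:    $direct";
print "via shell: $via_shell";
```

With the list form the semicolon is just another character in the argument; with the single string, the shell runs a second command chosen by whoever supplied the input. This is exactly the situation that taint mode is designed to catch.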