Driver Hardening Test Harness
The driver hardening test harness verifies that the I/O fault services and defensive
programming requirements have been correctly fulfilled. Hardened device drivers are resilient to potential
hardware faults. You must test the resilience of device drivers as part of
the driver development process. This type of testing requires that the driver handle
a wide range of typical hardware faults in a controlled and repeatable way.
The driver hardening test harness enables you to simulate such hardware faults in
software.
The driver hardening test harness is a Solaris device driver development tool. The
test harness injects a wide range of simulated hardware faults when the driver
under development accesses its hardware. This section describes how to configure the test
harness, create error-injection specifications (referred to as errdefs), and execute the tests on your
device driver.
The test harness intercepts calls from the driver to various DDI routines, then
corrupts the result of the calls as if the hardware had caused
the corruption. In addition, the harness can corrupt accesses to specific
registers and lets you define more random types of corruption.
The test harness can generate test scripts automatically by tracing all register accesses
as well as direct memory access (DMA) and interrupt usage during the running
of a specified workload. A script is generated that reruns that workload while
injecting a set of faults into each access.
The driver tester should remove duplicate test cases from the generated scripts.
The test harness is implemented as a device driver called bofi, which
stands for bus_ops fault injection, and two user-level utilities, th_define(1M) and th_manage(1M).
The test harness does the following tasks:
- Validates compliant use of Solaris DDI services
- Facilitates controlled corruption of programmed I/O (PIO) and DMA requests and interference with interrupts, thus simulating faults that occur in the hardware managed by the driver
- Facilitates simulation of failures in the data path between the CPU and the device, which are reported from parent nexus drivers
- Monitors a driver's access during a specified workload and generates fault-injection scripts
Fault Injection
The driver hardening test harness intercepts and, when requested, corrupts each access a
driver makes to its hardware. This section provides information you should understand to
create faults to test the resilience of your driver.
Solaris devices are managed inside a tree-like structure called the device tree (devinfo
tree). Each node of the devinfo tree stores information that relates to a
particular instance of a device in the system. Each leaf node corresponds to
a device driver, while all other nodes are called nexus nodes. Typically, a
nexus represents a bus. A bus node isolates leaf drivers from bus dependencies,
which enables architecturally independent drivers to be produced.
Many of the DDI functions, particularly the data access functions, result in upcalls
to the bus nexus drivers. When a leaf driver accesses its hardware, it
passes a handle to an access routine. The bus nexus understands how
to manipulate the handle and fulfill the request. A DDI-compliant driver only accesses hardware
through use of these DDI access routines. The test harness intercepts these upcalls
before they reach the specified bus nexus. If the data access matches the
criteria specified by the driver tester, the access is corrupted. If the data
access does not match the criteria, it is given to the bus nexus
to handle in the usual way.
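The interception described above can be modeled in ordinary user-level C. The following is a minimal sketch only, not the bofi implementation: the structure `fault_criteria`, the functions `intercepted_read32` and `fake_nexus_read`, and the single XOR corruption are all hypothetical names and simplifications chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access types; the real harness has its own encoding. */
enum acc_type { ACC_PIO_R, ACC_PIO_W };

/* Hypothetical, simplified criteria supplied by the driver tester. */
struct fault_criteria {
	int           rnumber;  /* register set to target */
	size_t        offset;   /* start of the targeted range in the set */
	size_t        len;      /* length of the targeted range in bytes */
	enum acc_type type;     /* kind of access to corrupt */
	uint32_t      xor_mask; /* corruption applied to matching reads */
};

/* Stand-in for the bus nexus, which performs the real access. */
typedef uint32_t (*nexus_read_fn)(int rnumber, size_t offset);

/* Hypothetical device model: every register reads as 0x1000 + offset. */
uint32_t
fake_nexus_read(int rnumber, size_t offset)
{
	(void)rnumber;
	return (0x1000u + (uint32_t)offset);
}

static int
criteria_match(const struct fault_criteria *fc, int rnumber, size_t offset,
    enum acc_type type)
{
	return (fc->rnumber == rnumber && fc->type == type &&
	    offset >= fc->offset && offset - fc->offset < fc->len);
}

/*
 * Interposed read: the nexus always performs the real access. The result
 * is corrupted only when the access matches the tester's criteria;
 * otherwise it is returned unchanged.
 */
uint32_t
intercepted_read32(const struct fault_criteria *fc, nexus_read_fn nexus_read,
    int rnumber, size_t offset)
{
	uint32_t value = nexus_read(rnumber, offset);

	if (fc != NULL && criteria_match(fc, rnumber, offset, ACC_PIO_R))
		value ^= fc->xor_mask;
	return (value);
}
```

In this model, a read of register set 1 at an offset inside the targeted range comes back with bits flipped, while any other read passes through untouched, which is the behavior a hardened driver must tolerate.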
A driver obtains an access handle by using the ddi_regs_map_setup(9F) function:
ddi_regs_map_setup(dip, rset, ma, offset, size, handle)
The arguments specify which “offboard” memory is to be mapped. The driver must
use the returned handle when it references the mapped I/O addresses, since handles
are meant to isolate drivers from the details of bus hierarchies. Therefore, do
not directly use the returned mapped address, ma. Direct use of the
mapped address bypasses the data access functions and defeats both the current
and future uses of that mechanism.
For programmed I/O, the suite of data access functions is:
I/O to Host:
ddi_getX(handle, ma)
ddi_rep_getX(handle, buf, ma, repcnt, flag)
Host to I/O:
ddi_putX(handle, ma, value)
ddi_rep_putX(handle, buf, ma, repcnt, flag)
X is the data width of each transfer in bits: 8, 16, 32, or 64. repcnt is
the number of transfers of that width to perform.
DMA has a similar, yet richer, set of data access functions.
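To make the roles of the width suffix and of repcnt concrete for the PIO functions, the following user-level model mimics ddi_get16() and ddi_rep_get16(). It is a sketch under stated assumptions: `mock_handle`, `mock_get16`, `mock_rep_get16`, and the `MOCK_DEV_*` flags are hypothetical stand-ins, and a real driver would call the DDI functions with a handle obtained from ddi_regs_map_setup(9F).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DDI auto-increment flags. */
#define	MOCK_DEV_AUTOINCR	0
#define	MOCK_DEV_NO_AUTOINCR	1

/* Hypothetical handle: in this model it only records the mapped window. */
struct mock_handle {
	volatile uint16_t *base;   /* start of the mapped 16-bit registers */
	size_t             nregs;  /* number of 16-bit registers mapped */
};

/* Model of ddi_get16(): one 16-bit read through the handle. */
uint16_t
mock_get16(const struct mock_handle *h, size_t index)
{
	return (h->base[index]);
}

/*
 * Model of ddi_rep_get16(): performs repcnt separate 16-bit reads into buf.
 * With MOCK_DEV_NO_AUTOINCR every read hits the same register (for example,
 * a FIFO port). With MOCK_DEV_AUTOINCR the register index advances after
 * each read.
 */
void
mock_rep_get16(const struct mock_handle *h, uint16_t *buf, size_t index,
    size_t repcnt, int flag)
{
	for (size_t i = 0; i < repcnt; i++) {
		buf[i] = mock_get16(h, index);
		if (flag == MOCK_DEV_AUTOINCR)
			index++;
	}
}
```

Note that repcnt counts accesses, not bytes: three 16-bit reads move six bytes in total.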
Setting Up the Test Harness
The driver hardening test harness is part of the Solaris Developer Cluster. If
you have not installed this Solaris cluster, you must manually install the test
harness packages appropriate for your platform.
Installing the Test Harness
To install the test harness packages (SUNWftduu and SUNWftdur), use the pkgadd(1M) command.
As superuser, go to the directory in which the packages are located and
type:
# pkgadd -d . SUNWftduu SUNWftdur
Configuring the Test Harness
After the test harness is installed, set the properties in the /kernel/drv/bofi.conf
file to configure the harness to interact with your driver. When the harness
configuration is complete, reboot the system to load the harness driver.
The test harness behavior is controlled by boot-time properties that are set in
the /kernel/drv/bofi.conf configuration file.
When the harness is first installed, enable the harness to intercept the DDI
accesses to your driver by setting these properties:
- bofi-nexus
Bus nexus type, such as the PCI bus
- bofi-to-test
Name of the driver under test
For example, to test a PCI bus network driver called xyznetdrv, set the
following property values:
bofi-nexus="pci"
bofi-to-test="xyznetdrv"
Other properties control how the harness checks the driver's use of the Solaris DDI
data access mechanisms, both for reading from and writing to peripherals that use PIO and for
transferring data to and from peripherals that use DMA.
- bofi-range-check
When this property is set, the test harness checks the consistency of the arguments that are passed to PIO data access functions.
- bofi-ddi-check
When this property is set, the test harness verifies that the mapped address that is returned by ddi_regs_map_setup(9F) is not used outside of the context of the data access functions.
- bofi-sync-check
When this property is set, the test harness verifies correct usage of DMA functions and ensures that the driver makes compliant use of ddi_dma_sync(9F).
Testing the Driver
This section describes how to create and inject faults by using the
th_define(1M) and th_manage(1M) commands.
Creating Faults
The th_define utility provides an interface to the bofi device driver for
defining errdefs. An errdef corresponds to a specification for how to corrupt a
device driver's accesses to its hardware. The th_define command-line arguments determine the precise nature
of the fault to be injected. If the supplied arguments define a
consistent errdef, the th_define process stores the errdef with the bofi driver. The
process suspends itself until the criteria given by the errdef are satisfied. In
practice, the suspension ends when the access counts go to zero (0).
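The countdown can be pictured with a small model. The exact meaning of the counts is defined in the th_define(1M) man page. The sketch below assumes one plausible reading, in which a number of qualifying accesses pass through untouched, a number of subsequent accesses are faulted, and the errdef is finished once both counts reach zero. The structure and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-errdef counters; the names are illustrative only. */
struct errdef_counts {
	unsigned access_count;	/* qualifying accesses to let through first */
	unsigned fail_count;	/* qualifying accesses to fault after that */
};

/*
 * Account for one qualifying access. Returns true if this access should
 * be faulted, false if it should pass through unchanged.
 */
bool
errdef_on_access(struct errdef_counts *ec)
{
	if (ec->access_count > 0) {
		ec->access_count--;
		return (false);
	}
	if (ec->fail_count > 0) {
		ec->fail_count--;
		return (true);
	}
	return (false);
}

/* In this model, the waiting process resumes once this returns true. */
bool
errdef_done(const struct errdef_counts *ec)
{
	return (ec->access_count == 0 && ec->fail_count == 0);
}
```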
Injecting Faults
The test harness operates at the level of data accesses. A data
access has the following characteristics:
- Type of hardware being accessed (driver name)
- Instance of the hardware being accessed (driver instance)
- Register set being tested
- Subset of the register set that is targeted
- Direction of the transfer (read or write)
- Type of access (PIO or DMA)
The test harness intercepts data accesses and injects appropriate faults into the driver.
An errdef, specified by the th_define(1M) command, encodes the following information:
- The driver instance and register set being tested (-n name, -i instance, and -r reg_number).
- The subset of the register set eligible for corruption. This subset is indicated by providing an offset into the register set and a length from that offset (-l offset [len]).
- The kind of access to be intercepted: log, pio, dma, pio_r, pio_w, dma_r, dma_w, intr (-a acc_types).
- How many accesses should be faulted (-c count [failcount]).
- The kind of corruption that should be applied to a qualifying access (-o operator [operand]):
  - Replace datum with a fixed value (EQUAL)
  - Perform a bitwise operation on the datum (AND, OR, XOR)
  - Ignore the transfer, for host to I/O accesses (NO_TRANSFER)
  - Lose, delay, or inject spurious interrupts (LOSE, DELAY, EXTRA)
Use the -a acc_chk option to simulate framework faults in an errdef.
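The effect of the data operators listed above on a single datum can be sketched as follows. This is an illustrative model rather than harness code: the enum `corrupt_op` and the function `corrupt_datum` are hypothetical, and the interrupt operators are omitted because they act on interrupts, not on data.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of the data-corruption operators. */
enum corrupt_op { OP_EQUAL, OP_AND, OP_OR, OP_XOR, OP_NO_TRANSFER };

/*
 * Apply one operator to a 32-bit datum in place. Returns false when the
 * transfer should be dropped (NO_TRANSFER), true otherwise.
 */
bool
corrupt_datum(enum corrupt_op op, uint32_t operand, uint32_t *datum)
{
	switch (op) {
	case OP_EQUAL:		/* replace the datum with a fixed value */
		*datum = operand;
		return (true);
	case OP_AND:		/* clear the bits that are zero in operand */
		*datum &= operand;
		return (true);
	case OP_OR:		/* force the bits that are set in operand */
		*datum |= operand;
		return (true);
	case OP_XOR:		/* flip the bits that are set in operand */
		*datum ^= operand;
		return (true);
	case OP_NO_TRANSFER:	/* leave the datum alone and drop the write */
		return (false);
	}
	return (true);
}
```

For example, OR with an operand of 0x100 models a register bit that is stuck at one, and AND with the complement of a mask models bits that are stuck at zero.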
Fault-Injection Process
The process of injecting a fault involves two phases:
1. Use the th_define(1M) command to create errdefs.
Create errdefs by passing test definitions to the bofi driver, which stores the definitions so that they can be accessed by using the th_manage(1M) command.
2. Create a workload, then use the th_manage command to activate and manage the errdef.
The th_manage command is a user interface to the various ioctls that are recognized by the bofi harness driver. The th_manage command operates at the level of driver names and instances and includes these commands: get_handles to list access handles, start to activate errdefs, and stop to deactivate errdefs.
Activating an errdef causes qualifying data accesses to be faulted. The th_manage utility also supports these commands: broadcast to provide the current state of the errdef and clear_errors to clear the errdef.
See the th_define(1M) and th_manage(1M) man pages for more information.
Test Harness Warnings
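You can configure the test harness to handle warning messages in the following
ways:
- Write warning messages to the console
- Write warning messages to the console and then panic the system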
Use the second method to help pinpoint the root cause of a
problem.
When the bofi-range-check property value is set to warn, the harness prints the
following messages (or panics if set to panic) when it detects a range
violation of a DDI function by your driver:
ddi_getX() out of range addr %x not in %x
ddi_putX() out of range addr %x not in %x
ddi_rep_getX() out of range addr %x not in %x
ddi_rep_putX() out of range addr %x not in %x
X is 8, 16, 32, or 64.
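The kind of check that produces an out of range warning can be sketched as a bounds test on the mapped region. This is an assumption about the general technique, not the bofi source: the function `access_in_range` and its parameters are hypothetical, and the sketch covers only accesses whose device address advances after each transfer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if repcnt accesses of 'width' bytes each, starting at 'addr'
 * and advancing after each access, lie wholly inside the mapping
 * [base, base + len). All arithmetic is arranged to avoid overflow.
 */
bool
access_in_range(uintptr_t base, size_t len, uintptr_t addr, size_t width,
    size_t repcnt)
{
	size_t span;

	if (width == 0 || repcnt == 0 || repcnt > SIZE_MAX / width)
		return (false);
	span = width * repcnt;
	if (addr < base || span > len)
		return (false);
	return (addr - base <= len - span);
}
```

A 4-byte access at the last aligned word of a 0x100-byte mapping passes this check, while the same access two bytes later does not.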
When the harness has been requested to insert over 1000 extra interrupts, the
following message is printed if the driver does not detect interrupt jabber:
undetected interrupt jabber - %s %d
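One way a driver might detect interrupt jabber is to count consecutive interrupts for which the device had no work, and to give up on the device once the run becomes implausibly long. The sketch below assumes that approach. The names `intr_state` and `intr_account` and the limit of 1000 are hypothetical choices for illustration, not values taken from the harness.

```c
#include <stdbool.h>

#define	JABBER_LIMIT	1000	/* hypothetical threshold chosen by the driver */

/* Hypothetical per-instance soft state for jabber detection. */
struct intr_state {
	unsigned unclaimed_run;	/* consecutive interrupts with no work found */
	bool     disabled;	/* set once the driver gives up on the device */
};

/*
 * Call from the interrupt handler after checking the device. 'had_work' is
 * true when the device really had something to service. Returns true if
 * the interrupt should be claimed.
 */
bool
intr_account(struct intr_state *st, bool had_work)
{
	if (had_work) {
		st->unclaimed_run = 0;
		return (true);
	}
	if (++st->unclaimed_run >= JABBER_LIMIT) {
		/* A real driver would also mask the device and report the fault. */
		st->disabled = true;
	}
	return (false);
}
```

Resetting the counter whenever real work is found keeps occasional spurious interrupts from accumulating into a false detection.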
Using Scripts to Automate the Test Process
You can create fault-injection test scripts by using the logging access type of
the th_define(1M) utility:
# th_define -n name -i instance -a log [-e fixup_script]
The th_define command takes the instance offline and brings it back online. Then
th_define runs the workload that is described by the fixup_script and logs I/O accesses
that are made by the driver instance.
The fixup_script is called twice with the set of optional arguments. The script
is called once just before the instance is taken offline, and it is
called again after the instance has been brought online.
The following variables are passed into the environment of the called executable:
- DRIVER_PATH
Device path of the instance
- DRIVER_INSTANCE
Instance number of the driver
- DRIVER_UNCONFIGURE
Set to 1 when the instance is about to be taken offline
- DRIVER_CONFIGURE
Set to 1 when the instance has just been brought online
Typically, the fixup_script ensures that the device under test is in a suitable
state to be taken offline (unconfigured) or in a suitable state for error
injection (for example, configured, error free, and servicing a workload). The following script
is a minimal script for a network driver:
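#!/bin/ksh
# Minimal fixup_script for a network driver.
driver=xyznetdrv
ifnum=$driver$DRIVER_INSTANCE

if [[ $DRIVER_CONFIGURE = 1 ]]; then
    # Instance has just been brought online: configure it, start the workload.
    ifconfig $ifnum plumb
    ifconfig $ifnum ...
    ifworkload start $ifnum
elif [[ $DRIVER_UNCONFIGURE = 1 ]]; then
    # Instance is about to be taken offline: stop the workload, unconfigure.
    ifworkload stop $ifnum
    ifconfig $ifnum down
    ifconfig $ifnum unplumb
fi
exit $?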
Note - The ifworkload command should initiate the workload as a background task. The fault
injection occurs after the fixup_script configures the driver under test and brings
it online (DRIVER_CONFIGURE is set to 1).
If the -e fixup_script option is present, it must be the last option
on the command line. If the -e option is not present, a default
script is used. The default script repeatedly attempts to bring the device under
test offline and online. Thus the workload consists of the driver's attach() and
detach() paths.
The resulting log is converted into a set of executable scripts that are
suitable for running unassisted fault-injection tests. These scripts are created in a subdirectory
of the current directory with the name driver.test.id. The scripts inject faults,
one at a time, into the driver while running the workload that is
described by the fixup_script.
The driver tester has substantial control over the errdefs that are produced by
the test automation process. See the th_define(1M) man page.
If the tester chooses a suitable range of workloads for the test
scripts, the harness gives good coverage of the hardening aspects of the driver.
However, to achieve full coverage, the tester might need to create additional test cases
manually. Add these cases to the test scripts. To ensure that testing completes
in a timely manner, you might need to manually delete duplicate test cases.
Automated Test Process
The following process describes automated testing:
1. Identify the aspects of the driver to be tested.
Test all aspects of the driver that interact with the hardware.
A separate workload script (fixup_script) must be generated for each mode of use.
2. For each mode of use, prepare an executable program (fixup_script) that configures and unconfigures the device, and creates and terminates a workload.
3. Run the th_define(1M) command with the errdefs, together with an access type of -a log.
4. Wait for the logs to fill.
The logs contain a dump of the bofi driver's internal buffers. This data is included at the front of the script.
Because it can take from a few seconds to several minutes to create the logs, use the th_manage broadcast command to check the progress.
5. Change to the created test directory and run the master test script.
The master script runs each generated test script in sequence. Separate test scripts are generated per register set.
6. Store the results for analysis.
Successful test results, such as success (corruption reported) and success (corruption undetected), show that the driver under test is behaving properly. The results are reported as failure (no service impact reported) if the harness detects that the driver has failed to report the service impact after reporting a fault, or if the driver fails to detect that an access or DMA handle has been marked as faulted.
A few test not triggered failures in the output are acceptable. However, many such failures indicate that the test is not working properly. These failures can appear when the driver does not access the same registers as when the test scripts were generated.
7. Run the test on multiple instances of the driver concurrently to test the multithreading of error paths.
For example, each th_define command creates a separate directory that contains test scripts and a master script:
# th_define -n xyznetdrv -i 0 -a log -e script
# th_define -n xyznetdrv -i 1 -a log -e script
Once created, run the master scripts in parallel.
Note - The generated scripts produce only simulated fault injections that are based on what was logged during the time the logging errdef was active. When you define a workload, ensure that the required results are logged. Also analyze the resulting logs and fault-injection specifications. Verify that the hardware access coverage provided by the resulting test scripts is what is required.