Defensive Programming Techniques for Solaris Device Drivers
This section describes techniques that keep device drivers from causing system panics and hangs,
wasting system resources, or spreading data corruption. A driver is considered hardened when
it uses these defensive programming practices in addition to the I/O fault services
framework for error handling and diagnosis.
All Solaris drivers should follow these coding practices:
Each piece of hardware should be controlled by a separate instance of the device driver. See Device Configuration Concepts.
Programmed I/O (PIO) must be performed only through the DDI access functions, using the appropriate data access handle. See Chapter 7, Device Access: Programmed I/O.
The device driver must assume that data that is received from the device might be corrupted. The driver must check the integrity of the data before the data is used.
The driver must avoid releasing bad data to the rest of the system.
Use only documented DDI functions and interfaces in your driver.
The driver must ensure that the device writes only into pages of memory in the DMA buffers (DDI_DMA_READ) that are controlled entirely by the driver. This technique prevents a DMA fault from corrupting an arbitrary part of the system's main memory.
The device driver must not be an unlimited drain on system resources if the device locks up. The driver should time out if a device claims to be continuously busy. The driver should also detect a pathological (stuck) interrupt request and take appropriate action.
The device driver must support hotplugging in the Solaris OS.
The device driver must use callbacks instead of waiting on resources.
The driver must free up resources after a fault. For example, the system must be able to close all minor devices and detach driver instances even after the hardware fails.
Using Separate Device Driver Instances
The Solaris kernel allows multiple instances of a driver. Each instance has its
own data space but shares the code (text) and some global data with
other instances. The device is managed on a per-instance basis. Drivers should use a
separate instance for each piece of hardware unless the driver is designed to
handle any failover internally. Multiple instances of a driver per slot can occur,
for example, with multifunction cards.
Exclusive Use of DDI Access Handles
All PIO access by a driver must use Solaris DDI access functions
from the following families of routines, where X is the access width in bits (8, 16, 32, or 64):
ddi_getX
ddi_putX
ddi_rep_getX
ddi_rep_putX
The driver should not directly access the mapped registers by the address that
is returned from ddi_regs_map_setup(9F). Avoid the ddi_peek(9F) and ddi_poke(9F) routines because these routines
do not use access handles.
The DDI access mechanism is important because every access passes through the handle,
which gives the framework an opportunity to control how data is read into the kernel.
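To illustrate why a single access path matters, here is a portable C sketch (not DDI code). The `acc_handle` structure and `acc_get32()` are hypothetical stand-ins for a `ddi_acc_handle_t` and `ddi_get32(9F)`; they show how routing every read through one function gives a single place to bounds-check and record faults, which a caller can later test in the spirit of `ddi_check_acc_handle(9F)`.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of an access handle: the register window plus a
 * sticky fault flag. In a real driver this role is played by the
 * ddi_acc_handle_t returned from ddi_regs_map_setup(9F).
 */
struct acc_handle {
    volatile uint32_t *base;   /* mapped register window */
    uint32_t nregs;            /* number of 32-bit registers in the window */
    bool fault;                /* set when an access goes wrong */
};

/*
 * Hypothetical stand-in for ddi_get32(9F): because every read goes through
 * here, bounds checks and fault recording live in one place instead of
 * being scattered across raw pointer dereferences.
 */
static uint32_t acc_get32(struct acc_handle *h, uint32_t reg)
{
    if (reg >= h->nregs) {
        h->fault = true;
        return 0xffffffffu;    /* all ones, like a failed PCI read */
    }
    return h->base[reg];
}
```

The caller checks `h->fault` after a sequence of reads rather than trusting each value individually.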
Detecting Corrupted Data
The following sections describe where data corruption can occur and how to detect
corruption.
Corruption of Device Management and Control Data
The driver should assume that any data obtained from the device, whether by
PIO or DMA, could have been corrupted. In particular, extreme care should be
taken with pointers, memory offsets, and array indexes that are based on data
from the device. Such values can be malignant, in that these values can
cause a kernel panic if dereferenced. All such values should be checked for
range and alignment (if required) before use.
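A minimal sketch of the range and alignment check described above, assuming the device reports a byte offset into a driver-owned buffer and that the object at that offset has a known size and power-of-two alignment. The function name is hypothetical. Note that the comparison is written as `off > buf_len - obj_len` so that a huge offset from the device cannot overflow the arithmetic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Validate an offset reported by the device before using it to index a
 * driver-owned buffer of buf_len bytes. The object at the offset is
 * obj_len bytes long and must be aligned to align bytes (a power of two).
 */
static bool offset_is_valid(uint32_t off, size_t obj_len, size_t align,
    size_t buf_len)
{
    if ((off & (align - 1)) != 0)
        return false;              /* misaligned */
    if (obj_len > buf_len)
        return false;              /* object cannot fit at all */
    if (off > buf_len - obj_len)
        return false;              /* would run past the end of the buffer */
    return true;
}
```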
Even a pointer that is not malignant can still be misleading. For example,
a pointer can point to a valid but not correct instance of an
object. Where possible, the driver should cross-check the pointer with the object to
which it is pointing, or otherwise validate the data obtained through that pointer.
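One way to cross-check a pointer against its object is to have each object record what the device should say about it. In this sketch the ring layout, the `slot` and `gen` fields, and `lookup_completion()` are all hypothetical: the device reports a completion as a slot number plus a generation number, and the driver accepts it only if the slot is in range and the buffer there agrees it is the one being referred to.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBUFS 16                   /* hypothetical ring size */

struct txbuf {
    uint32_t slot;                 /* index this buffer was posted at */
    uint32_t gen;                  /* generation written into the descriptor */
    bool     in_use;               /* posted to the device, not yet completed */
};

static struct txbuf ring[NBUFS];

/*
 * Return the buffer for a (slot, gen) completion only if the slot is in
 * range AND the buffer there agrees it is the object the device means.
 * NULL means the report is corrupt or stale and must not be acted on.
 */
static struct txbuf *lookup_completion(uint32_t slot, uint32_t gen)
{
    if (slot >= NBUFS)
        return NULL;               /* malignant: out of range */
    struct txbuf *b = &ring[slot];
    if (!b->in_use || b->slot != slot || b->gen != gen)
        return NULL;               /* valid pointer, but not the right object */
    return b;
}
```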
Other types of data can also be misleading, such as packet lengths, status
words, or channel IDs. These data types should be checked to the
extent possible. A packet length can be range-checked to ensure that the length
is neither negative nor larger than the containing buffer. A status word can be
checked for "impossible" bits. A channel ID can be matched against a list
of valid IDs.
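The three checks just described can be sketched as small predicates. The status-bit layout, the list of valid channel IDs, and the function names are hypothetical; a real driver would take them from the device's programming documentation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status-register layout: only these bits are defined. */
#define ST_RX_DONE  0x0001u
#define ST_TX_DONE  0x0002u
#define ST_LINK     0x0004u
#define ST_DEFINED  (ST_RX_DONE | ST_TX_DONE | ST_LINK)

/* Hypothetical set of channel IDs this device instance actually uses. */
static const uint16_t valid_channels[] = { 0, 1, 4, 5 };

/* A status word with any undefined bit set is "impossible". */
static bool status_is_plausible(uint32_t st)
{
    return (st & ~ST_DEFINED) == 0;
}

/* A length is neither negative nor larger than the containing buffer. */
static bool len_is_plausible(int32_t len, size_t buf_len)
{
    return len >= 0 && (size_t)len <= buf_len;
}

/* A channel ID must appear in the list of valid IDs. */
static bool channel_is_valid(uint16_t id)
{
    for (size_t i = 0; i < sizeof valid_channels / sizeof valid_channels[0]; i++)
        if (valid_channels[i] == id)
            return true;
    return false;
}
```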
Where a value is used to identify a stream, the driver must
ensure that the stream still exists. The asynchronous nature of processing STREAMS means that
a stream can be dismantled while device interrupts are still outstanding.
The driver should not reread data from the device. The data should
be read once, validated, and stored in the driver's local state. This technique avoids
the hazard of data that is correct when initially read, but is
incorrect when reread later.
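A minimal sketch of "read once, validate, store", assuming a hypothetical receive descriptor with `len` and `flags` fields and a hypothetical 2048-byte buffer size. The plain `volatile` field reads stand in for the DDI access functions a real driver would use; the point is that each field is read exactly once and only the validated local copy is used afterward.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receive descriptor as written by the device. */
struct rx_desc {
    volatile uint16_t len;
    volatile uint16_t flags;
};

/* Driver-private snapshot; only this copy is used after validation. */
struct rx_info {
    uint16_t len;
    uint16_t flags;
};

#define RX_BUF_SIZE 2048u          /* hypothetical receive buffer size */

/*
 * Read each device-written field once into local state, validate the copy,
 * and publish it only if it passes. Re-reading d->len after the check
 * could return a different, unchecked value.
 */
static bool rx_snapshot(const struct rx_desc *d, struct rx_info *out)
{
    struct rx_info s;

    s.len = d->len;                /* single read */
    s.flags = d->flags;            /* single read */
    if (s.len == 0 || s.len > RX_BUF_SIZE)
        return false;              /* hypothetical rule: empty or oversized */
    *out = s;
    return true;
}
```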
The driver should also ensure that all loops are bounded. For example, a
device that returns a continuous BUSY status should not be able to lock
up the entire system.
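A bounded wait for a BUSY bit might look like the following sketch. The bit value and the poll limit are hypothetical, and the pointer read stands in for a `ddi_get32(9F)` call through an access handle. A real driver would also pause between polls, for example with `drv_usecwait(9F)`, and treat a timeout as a device fault.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUSY_BIT  0x80u            /* hypothetical status bit */
#define MAX_POLLS 1000             /* hypothetical upper bound on status reads */

/*
 * Wait for the device to clear BUSY, but give up after MAX_POLLS reads so
 * that a device stuck at BUSY cannot hang the calling thread.
 */
static bool wait_not_busy(const volatile uint32_t *status)
{
    for (int i = 0; i < MAX_POLLS; i++) {
        if ((*status & BUSY_BIT) == 0)
            return true;
        /* a real driver would delay here before polling again */
    }
    return false;                  /* timed out: report the device as failed */
}
```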
Corruption of Received Data
Device errors can result in corrupted data being placed in receive buffers. Such
corruption is indistinguishable from corruption that occurs beyond the domain of the device,
for example, within a network. Typically, existing software already handles such
corruption. One example is integrity checking at the transport layer of a protocol
stack. Another example is integrity checking within the application that
uses the device.
If no higher layer checks the received data for integrity, the driver itself
can check it. Methods of detecting corruption in received data are typically
device-specific. Checksums and CRCs are examples of the kinds of checks that
can be done.
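As one example of such a check, here is a sketch that verifies a frame against a trailing CRC-32. The CRC itself is the standard reflected IEEE 802.3 CRC-32 (polynomial 0xEDB88320), computed bitwise for clarity rather than speed; the frame layout (payload followed by a little-endian CRC) is a hypothetical assumption, since real devices define their own formats.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical frame layout: payload followed by its CRC, little-endian. */
static bool frame_crc_ok(const uint8_t *frame, size_t len)
{
    if (len < 4)
        return false;              /* too short to hold a CRC */
    size_t plen = len - 4;
    uint32_t want = (uint32_t)frame[plen] | (uint32_t)frame[plen + 1] << 8 |
        (uint32_t)frame[plen + 2] << 16 | (uint32_t)frame[plen + 3] << 24;
    return crc32_ieee(frame, plen) == want;
}
```

The CRC-32 of the ASCII string "123456789" is the well-known check value 0xCBF43926, which is a convenient self-test.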
DMA Isolation
A defective device might initiate an improper DMA transfer over the bus. This
data transfer could corrupt good data that was previously delivered. A device that
fails might generate a corrupt address that can contaminate memory that does not
even belong to its own driver.
In systems with an IOMMU, a device can write only to pages
mapped as writable for DMA. Therefore, such pages should be owned solely by
one driver instance. These pages should not be shared with any other kernel
structure. While the page in question is mapped as writable for DMA, the
driver should be suspicious of data in that page. The page must be
unmapped from the IOMMU before the page is passed beyond the driver, and
before any validation of the data.
You can use ddi_umem_alloc(9F) to guarantee that a whole aligned page is
allocated, or allocate multiple pages and ignore the memory below the first page
boundary. You can find the size of an IOMMU page by using ddi_ptob(9F).
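The "allocate extra and skip to the first page boundary" approach is simple pointer arithmetic. This sketch assumes the buffer is `(npages + 1) * pgsz` bytes long and that `pgsz` is a power of two (in a driver it could come from `ddi_ptob(dip, 1)`); the function name is hypothetical. The `npages` whole pages starting at the returned address lie entirely inside the buffer, so no other kernel structure shares them, and the leftover bytes at each end are simply never used.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the first page boundary inside buf. If buf is (npages + 1) * pgsz
 * bytes long, the npages pages starting at the result lie wholly inside it.
 * pgsz must be a power of two.
 */
static void *first_page_boundary(void *buf, size_t pgsz)
{
    uintptr_t a = (uintptr_t)buf;
    uintptr_t aligned = (a + pgsz - 1) & ~(uintptr_t)(pgsz - 1);
    return (void *)aligned;
}
```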
Alternatively, the driver can choose to copy the data into a safe
part of memory before processing it. If this is done, the data must
first be synchronized using ddi_dma_sync(9F).
Calls to ddi_dma_sync() should specify DDI_DMA_SYNC_FORDEV before the device reads
data by DMA (a transfer to the device), and DDI_DMA_SYNC_FORCPU after the device
has written data to memory by DMA and before the CPU reads that data.
On some PCI-based systems with an IOMMU, devices can use PCI dual address
cycles (64-bit addresses) to bypass the IOMMU. This capability gives the device the
potential to corrupt any region of main memory. Device drivers must not attempt
to use such a mode and should disable it.
Handling Stuck Interrupts
The driver must identify stuck interrupts because a persistently asserted interrupt severely affects
system performance, almost certainly stalling a single-processor machine.
Sometimes the driver might have difficulty identifying a particular interrupt as invalid. For
network drivers, if a receive interrupt is indicated but no new buffers have
been made available, there was no work to do. When this situation is an isolated
occurrence, it is not a problem, because another routine, such as a read-side
service routine, might already have completed the actual work.
On the other hand, continuous interrupts with no work for the driver to
process can indicate a stuck interrupt line. For this reason, platforms allow a
number of apparently invalid interrupts to occur before taking defensive action.
While appearing to have work to do, a hung device might be
failing to update its buffer descriptors. The driver should defend against such repetitive requests.
In some cases, platform-specific bus drivers might be capable of identifying a persistently
unclaimed interrupt and can disable the offending device. However, this relies on the
driver's ability to identify the valid interrupts and return the appropriate value. The
driver should return a DDI_INTR_UNCLAIMED result unless the driver detects that the device
legitimately asserted an interrupt. The interrupt is legitimate only if the device actually
requires the driver to do some useful work.
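The following portable sketch shows an interrupt handler that claims an interrupt only when there was real work, and that counts consecutive no-work interrupts so it can defend against a stuck line. `INTR_CLAIMED` and `INTR_UNCLAIMED` are hypothetical stand-ins for `DDI_INTR_CLAIMED` and `DDI_INTR_UNCLAIMED`, and the limit of 100 is an arbitrary assumption; what a driver does once the limit is reached (mask the device, report a fault) is device-specific.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for DDI_INTR_UNCLAIMED / DDI_INTR_CLAIMED. */
enum intr_result { INTR_UNCLAIMED, INTR_CLAIMED };

#define STUCK_LIMIT 100            /* hypothetical consecutive no-work limit */

struct intr_state {
    uint32_t unclaimed_run;        /* consecutive interrupts with no work */
    bool     disabled;             /* driver judged the line stuck */
};

/*
 * Claim the interrupt only if the device really had work (work_done > 0).
 * Past STUCK_LIMIT consecutive no-work interrupts, assume a stuck line.
 */
static enum intr_result handle_intr(struct intr_state *s, int work_done)
{
    if (work_done > 0) {
        s->unclaimed_run = 0;
        return INTR_CLAIMED;
    }
    if (++s->unclaimed_run >= STUCK_LIMIT)
        s->disabled = true;        /* real driver: mask at device, report fault */
    return INTR_UNCLAIMED;
}
```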
The legitimacy of other, more incidental, interrupts is much harder to certify. An
interrupt-expected flag is a useful tool for evaluating whether an interrupt is valid.
Consider an interrupt such as descriptor free, which can be generated if all
the device's descriptors had been previously allocated. If the driver detects that it
has taken the last descriptor from the card, it can set an interrupt-expected
flag. If this flag is not set when the associated interrupt is delivered,
the interrupt is suspicious.
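The interrupt-expected flag for the "descriptor free" case might be kept as in this sketch. The structure and function names are hypothetical; the logic follows the paragraph above: the flag is set only when the driver takes the card's last descriptor, and a "descriptor free" interrupt that arrives with the flag clear is reported as suspicious.

```c
#include <stdbool.h>

/* Hypothetical driver state for the "descriptor free" interrupt. */
struct desc_state {
    int  free_descs;               /* descriptors the card still has */
    bool expect_desc_free;         /* set when we took the last one */
};

/* Called when the driver takes a descriptor from the card. */
static void take_desc(struct desc_state *s)
{
    if (s->free_descs > 0 && --s->free_descs == 0)
        s->expect_desc_free = true;    /* a "descriptor free" interrupt may follow */
}

/* Returns true if the "descriptor free" interrupt was expected. */
static bool desc_free_intr(struct desc_state *s)
{
    if (!s->expect_desc_free)
        return false;              /* suspicious: nothing was outstanding */
    s->expect_desc_free = false;
    return true;
}
```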
Some informative interrupts might not be predictable, such as one that indicates that
a medium has become disconnected or frame sync has been lost. The easiest
method of detecting whether such an interrupt is stuck is to mask this
particular source on first occurrence until the next polling cycle.
If the interrupt occurs again while disabled, the interrupt should be considered false.
Some devices report interrupt status bits even when the mask register has disabled
the associated source, so a set status bit does not necessarily mean that the source
caused the interrupt. You can devise a more appropriate algorithm specific to your devices.
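The mask-on-first-occurrence technique can be sketched as two small routines, one on the interrupt path and one on the polling cycle. This assumes the handler can reliably attribute an interrupt to this particular source, which, as noted above, not every device allows. The `link_src` structure and both function names are hypothetical, and the write to the device's mask register is left as a comment.

```c
#include <stdbool.h>

/* Hypothetical per-source state for an unpredictable "link down" interrupt. */
struct link_src {
    bool masked;                   /* masked since its last occurrence */
    bool bogus;                    /* fired again while masked: false interrupt */
};

/* Interrupt path: on first occurrence, handle it and mask the source. */
static void link_intr(struct link_src *s)
{
    if (s->masked) {
        s->bogus = true;           /* fired while disabled, so not genuine */
        return;
    }
    s->masked = true;              /* real driver: also write the mask register */
}

/* Polling cycle: unmask the source unless it has been judged false. */
static void link_poll(struct link_src *s)
{
    if (!s->bogus)
        s->masked = false;
}
```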
Avoid looping on interrupt status bits indefinitely. Break such loops if none of
the status bits set at the start of a pass requires any
real work.
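A sketch of a status-servicing loop that stops as soon as a pass begins with no bit that needs real work, with an additional hard cap on passes in case the device keeps reporting work. `WORK_BITS`, `MAX_PASSES`, and the function name are hypothetical, and the pointer read stands in for a DDI access function.

```c
#include <stdint.h>

#define WORK_BITS  0x0003u         /* hypothetical: bits that mean real work */
#define MAX_PASSES 16              /* hypothetical hard cap on passes */

/*
 * Service status bits in passes. Stop when a pass starts with no bit that
 * requires real work, and never exceed MAX_PASSES. Returns passes made.
 */
static int service_status(const volatile uint32_t *status)
{
    int passes = 0;
    while (passes < MAX_PASSES) {
        uint32_t st = *status;     /* snapshot at the start of the pass */
        if ((st & WORK_BITS) == 0)
            break;                 /* nothing useful to do: stop looping */
        passes++;
        /* ... handle each bit in (st & WORK_BITS), then acknowledge it ... */
    }
    return passes;
}
```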
Additional Programming Considerations
In addition to the requirements discussed in the previous sections, consider the following
issues:
Thread Interaction
Kernel panics in a device driver are often caused by unexpected interaction of
kernel threads after a device failure. When a device fails, threads can interact
in ways that you did not anticipate.
If a processing routine terminates early, threads waiting on a condition variable remain
blocked because the expected signal is never given. Attempting to inform other modules of the failure
or handling unanticipated callbacks can result in undesirable thread interactions. Consider the order
in which mutexes are acquired and released during a device failure.
Threads that originate in an upstream STREAMS module can cause unexpected behavior
if the driver uses those threads to call back into that module unexpectedly. Consider
using alternative threads to handle exception messages. For instance, a driver might use
a read-side service routine to send an M_ERROR upstream, rather than handling the error directly
with a read-side putnext(9F).
A failing STREAMS device that cannot be quiesced during close because of a
fault can generate an interrupt after the stream has been dismantled. The interrupt
handler must not attempt to use a stale stream pointer to try to
process the message.
Threats From Top-Down Requests
While protecting the system from defective hardware, you also need to protect against
driver misuse. Although the driver can assume that the kernel infrastructure is
always correct (a trusted core), user requests passed to the driver can be
destructive.
For example, a user can request an action to be performed upon
a user-supplied data block (M_IOCTL) that is smaller than the block size that is
indicated in the control part of the message. The driver should never trust
a user application.
Consider the construction of each type of ioctl that your driver can receive
and the potential harm that the ioctl could cause. The driver should perform
checks to ensure that it does not process a malformed ioctl.
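For example, a driver can refuse an ioctl whose data block is smaller than its own header claims. In this sketch the payload format (a count followed by fixed-size entries), the limits, and `set_table_ok()` are hypothetical; in a STREAMS driver, `data_len` would be the length of the data actually attached to the M_IOCTL message, not a count the user supplied. The header is copied with `memcpy()` so that a misaligned user buffer is not dereferenced directly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ioctl payload: a header saying how many entries follow. */
struct set_table_hdr {
    uint32_t nentries;
};

#define MAX_ENTRIES 64u            /* hypothetical driver limit */
#define ENTRY_SIZE  8u             /* hypothetical bytes per entry */

/*
 * Accept the request only if the data actually received (data_len bytes)
 * holds the header plus every entry the header claims, and the claimed
 * count is within the driver's limit.
 */
static bool set_table_ok(const void *data, size_t data_len)
{
    struct set_table_hdr h;

    if (data == NULL || data_len < sizeof h)
        return false;              /* too short even for the header */
    memcpy(&h, data, sizeof h);    /* avoid misaligned access */
    if (h.nentries == 0 || h.nentries > MAX_ENTRIES)
        return false;              /* implausible count */
    return data_len - sizeof h >= (size_t)h.nentries * ENTRY_SIZE;
}
```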
Adaptive Strategies
A driver can sometimes continue to provide service using faulty hardware by
working around the identified problem with an alternative strategy for
accessing the device. Given that broken hardware is unpredictable, and given the risk
of additional design complexity, adaptive strategies are not always wise. At most,
these strategies should be limited to periodic interrupt polling and retry attempts. Periodic
retries tell the driver when a device has recovered. Periodic polling can service
the device in place of the interrupt mechanism after a driver has been forced to disable interrupts.
Ideally, a system always has an alternative device to provide a vital system
service. Service multiplexors in kernel or user space offer the best method of
maintaining system services when a device fails. Such practices are beyond the scope
of this section.