Repairing a Damaged Device
This section describes how to determine device failure types, clear transient errors, and
replace a device.
Determining the Type of Device Failure
The term damaged device is rather vague, and can describe a number of possible
situations:
Bit rot – Over time, random events, such as magnetic influences and cosmic rays, can cause bits stored on disk to flip unpredictably. These events are relatively rare but common enough to cause potential data corruption in large or long-running systems. These errors are typically transient.
Misdirected reads or writes – Firmware bugs or hardware faults can cause reads or writes of entire blocks to reference the incorrect location on disk. These errors are typically transient, though a large number might indicate a faulty drive.
Administrator error – Administrators can unknowingly overwrite portions of the disk with bad data (such as copying /dev/zero over portions of the disk), causing permanent corruption of the data on disk. Because the device itself is not at fault, these errors are always transient.
Temporary outage – A disk might become unavailable for a period of time, causing I/Os to fail. This situation is typically associated with network-attached devices, though local disks can experience temporary outages as well. These errors might or might not be transient.
Bad or flaky hardware – This situation is a catch-all for the various problems that bad hardware exhibits. This could be consistent I/O errors, faulty transports causing random corruption, or any number of failures. These errors are typically permanent.
Offlined device – If a device is offline, it is assumed that the administrator placed the device in this state because it is presumed faulty. The administrator who placed the device in this state can determine whether this assumption is accurate.
Determining exactly what is wrong can be a difficult process. The first step
is to examine the error counts in the zpool status output as follows:
# zpool status -v pool
The errors are divided into I/O errors and checksum errors, both of which might indicate the possible type of failure. Typical operation predicts a very small number of errors (just a few over long periods of time). If you are seeing a large number of errors, then this situation probably indicates impending or complete device failure. However, administrator error can also result in large error counts. The other source of information is the system log. If the log shows a large number of SCSI or Fibre Channel driver messages, then this situation probably indicates serious hardware problems. If no syslog messages are generated, then the damage is likely transient.
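For example, the device listing in the zpool status output might look similar to the following fragment; the pool and device names here are placeholders, and the READ and WRITE columns report I/O error counts while the CKSUM column reports checksum error counts for each device:
# zpool status -v tank
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     3
            c1t1d0  ONLINE       0     0     0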
The goal is to answer the following question:
Is another error likely to occur on this device?
Errors that happen only once are considered transient, and do not indicate potential
failure. Errors that are persistent or severe enough to indicate potential hardware failure
are considered “fatal.” Determining the type of error is beyond the scope of any automated software currently available with ZFS, so it must be done manually by you, the administrator. Once the determination is made, the appropriate action can be taken: either clear the transient errors or replace the device because of fatal errors. These repair procedures are described in the next sections.
Even if the device errors are considered transient, they still might have caused
uncorrectable data errors within the pool. These errors require special repair procedures, even
if the underlying device is deemed healthy or otherwise repaired. For more information
on repairing data errors, see Repairing Damaged Data.
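For example, if uncorrectable data errors do exist, the zpool status -v command lists the affected files at the end of its output. The following fragment is illustrative only; the pool name and file path are placeholders:
errors: Permanent errors have been detected in the following files:
        /tank/home/user/file.txt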
Clearing Transient Errors
If the device errors are deemed transient, in that they are unlikely to
affect the future health of the device, then the device errors can
be safely cleared to indicate that no fatal error occurred. To clear error
counters for RAID-Z or mirrored devices, use the zpool clear command. For example:
# zpool clear tank c1t0d0
This syntax clears any errors associated with the device and clears any data
error counts associated with the device.
To clear all errors associated with the virtual devices in the pool, and
clear any data error counts associated with the pool, use the following syntax:
# zpool clear tank
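After clearing errors, you can verify that no problems remain. The zpool status -x command reports only pools that are exhibiting problems, so output similar to the following indicates that the cleared pool is healthy again:
# zpool status -x
all pools are healthy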
For more information about clearing pool errors, see Clearing Storage Pool Devices.
Replacing a Device in a ZFS Storage Pool
If device damage is permanent or future permanent damage is likely, the device
must be replaced. Whether the device can be replaced depends on the configuration.
Determining if a Device Can Be Replaced
For a device to be replaced, the pool must be in the
ONLINE state. The device must be part of a redundant configuration, or it
must be healthy (in the ONLINE state). If the disk is part of
a redundant configuration, sufficient replicas from which to retrieve good data must exist. If
two disks in a four-way mirror are faulted, then either disk can
be replaced because healthy replicas are available. However, if two disks in a
four-way RAID-Z device are faulted, then neither disk can be replaced because not
enough replicas from which to retrieve data exist. If the device is damaged
but otherwise online, it can be replaced as long as the pool is
not in the FAULTED state. However, any bad data on the device is
copied to the new device unless there are sufficient replicas with good data.
In the following configuration, the disk c1t1d0 can be replaced, and any data
in the pool is copied from the good replica, c1t0d0.
  mirror            DEGRADED
    c1t0d0          ONLINE
    c1t1d0          FAULTED
The disk c1t0d0 can also be replaced, though no self-healing of data can
take place because no good replica is available.
In the following configuration, neither of the faulted disks can be replaced. The
ONLINE disks cannot be replaced either, because the pool itself is faulted.
  raidz             FAULTED
    c1t0d0          ONLINE
    c2t0d0          FAULTED
    c3t0d0          FAULTED
    c4t0d0          ONLINE
In the following configuration, either top-level disk can be replaced, though any bad
data present on the disk is copied to the new disk.
c1t0d0 ONLINE
c1t1d0 ONLINE
If either disk were faulted, then no replacement could be performed because the
pool itself would be faulted.
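Before attempting a replacement, you can confirm which device is faulted and whether a healthy replica remains by reviewing the pool status. The following fragment is illustrative (the pool and device names are placeholders) and shows a mirror in which c1t1d0 is faulted but c1t0d0 is still online, so the faulted disk can be replaced:
# zpool status tank
        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  FAULTED      0     0     0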
Devices That Cannot Be Replaced
If the loss of a device causes the pool to become faulted,
or the device contains too many data errors in a non-redundant configuration, then
the device cannot safely be replaced. Without sufficient redundancy, no good data with which
to heal the damaged device exists. In this case, the only option is
to destroy the pool and re-create the configuration, restoring your data in the
process.
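The following sequence is only a sketch of that recovery path; the pool name and device names are placeholders for whatever applies to your environment, and destroying the pool removes all data on its devices, so verify that a usable backup exists first:
# zpool destroy tank
# zpool create tank mirror c1t0d0 c1t1d0
After the pool is re-created, restore the data from your most recent backup.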
For more information about restoring an entire pool, see Repairing ZFS Storage Pool-Wide Damage.
Replacing a Device in a ZFS Storage Pool
Once you have determined that a device can be replaced, use the
zpool replace command to replace the device. If you are replacing the damaged device
with a different device, use the following command:
# zpool replace tank c1t0d0 c2t0d0
This command begins migrating data to the new device from the damaged device,
or from other devices in the pool if it is in a redundant
configuration. When the command is finished, it detaches the damaged device from the
configuration, at which point the device can be removed from the system. If
you have already removed the device and replaced it with a new device
in the same location, use the single device form of the command. For
example:
# zpool replace tank c1t0d0
This command takes an unformatted disk, formats it appropriately, and then begins resilvering
data from the rest of the configuration.
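If the new disk occupies the same physical slot as the failed disk, a typical sequence, with illustrative pool and device names, might look like the following: optionally take the failed disk offline, physically swap in the new disk, start the replacement, and then monitor the resilver:
# zpool offline tank c1t0d0
# zpool replace tank c1t0d0
# zpool status tank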
For more information about the zpool replace command, see Replacing Devices in a Storage Pool.
Viewing Resilvering Status
The process of replacing a drive can take an extended period of
time, depending on the size of the drive and the amount of data
in the pool. The process of moving data from one device to another
device is known as resilvering, and can be monitored by using the
zpool status command.
Traditional file systems resilver data at the block level. Because ZFS eliminates the
artificial layering of the volume manager, it can perform resilvering in a much
more powerful and controlled manner. The two main advantages of this feature are
as follows:
ZFS resilvers only the minimum amount of necessary data. In the case of a short outage (as opposed to a complete device replacement), the device can be resilvered in a matter of minutes or seconds, rather than requiring the entire disk to be resilvered or complicating matters with “dirty region” logging that some volume managers support. When an entire disk is replaced, the resilvering process takes time proportional to the amount of data used on the disk. Replacing a 500-Gbyte disk can take seconds if only a few gigabytes of used space is in the pool.
Resilvering is interruptible and safe. If the system loses power or is rebooted, the resilvering process resumes exactly where it left off, without any need for manual intervention.
To view the resilvering process, use the zpool status command. For example:
# zpool status tank
  pool: tank
 state: DEGRADED
status: One or more devices is being resilvered.
action: Wait for the resilvering process to complete.
   see: https://www.sun.com/msg/ZFS-XXXX-08
 scrub: none requested
config:

        NAME                  STATE     READ WRITE CKSUM
        tank                  DEGRADED     0     0     0
          mirror              DEGRADED     0     0     0
            replacing         DEGRADED     0     0     0  52% resilvered
              c1t0d0          ONLINE       0     0     0
              c2t0d0          ONLINE       0     0     0
            c1t1d0            ONLINE       0     0     0
In this example, the disk c1t0d0 is being replaced by c2t0d0. This event
is observed in the status output by the presence of the replacing virtual device
in the configuration. This device is not real, nor is it possible for
you to create a pool by using this virtual device type. The purpose
of this device is solely to display the resilvering process, and to identify
exactly which device is being replaced.
Note that any pool currently undergoing resilvering is placed in the DEGRADED
state, because the pool cannot provide the desired level of redundancy until the
resilvering process is complete. Resilvering proceeds as fast as possible, though the I/O
is always scheduled with a lower priority than user-requested I/O, to minimize impact
on the system. Once the resilvering is complete, the configuration reverts to the
new, complete configuration. For example:
# zpool status tank
  pool: tank
 state: ONLINE
 scrub: scrub completed with 0 errors on Thu Aug 31 11:20:18 2006
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c2t0d0    ONLINE       0     0     0
            c1t1d0    ONLINE       0     0     0

errors: No known data errors
The pool is once again ONLINE, and the original bad disk (c1t0d0) has
been removed from the configuration.