21.2 Major File Systems in Linux
Two or three years ago, choosing a file system for a Linux system was a
matter of a few seconds (Ext2 or ReiserFS?). This is no longer the case.
Kernels starting from 2.4 offer a variety of file systems from which to
choose. The following overview describes how these file systems basically
work and which advantages they offer.
Bear in mind that there is probably no single file system that best suits
all kinds of applications. Each file system has its particular strengths
and weaknesses, which must be taken into account. Even the most
sophisticated file system, however, cannot replace a reasonable backup
strategy.
The terms data integrity and data
consistency, when used in this chapter, do not refer to the
consistency of the user space data (the data your application writes to its
files). Whether this data is consistent must be controlled by the
application itself.
IMPORTANT: Setting Up File Systems
Unless stated otherwise in this chapter, all the steps required to
set up or change partitions and file systems can be performed using
YaST.
21.2.1 ReiserFS
Officially one of the key features of the 2.4 kernel release, ReiserFS has
been available as a kernel patch for 2.2.x SUSE kernels since SUSE Linux
version 6.4. ReiserFS was designed by Hans Reiser and the Namesys
development team. It has proven itself to be a powerful alternative to
Ext2. Its key assets are better disk space utilization, better disk
access performance, and faster crash recovery.
ReiserFS's strengths, in more detail, are:
- Better Disk Space Utilization
-
In ReiserFS, all data is organized in a structure called a
B*-balanced tree. The tree structure
contributes to better disk space utilization because small files can be
stored directly in the B* tree leaf
nodes instead of being stored elsewhere and just maintaining a pointer
to the actual disk location. In addition to that, storage is not
allocated in chunks of 1 or 4 kB, but in portions of the exact size
needed. Another benefit lies in the dynamic allocation of inodes. This
keeps the file system more flexible than traditional file systems, like
Ext2, where the inode density must be specified at file system creation
time.
- Better Disk Access Performance
-
For small files, file data and
stat_data (inode) information are often stored next to each
other. They can be read with a single disk I/O operation, meaning that
only one access to disk is required to retrieve all the information
needed.
- Fast Crash Recovery
-
Using a journal to keep track of recent metadata changes makes a file
system check a matter of seconds, even for huge file systems.
- Reliability through Data Journaling
-
ReiserFS also supports data journaling and ordered data modes similar to
the concepts outlined in Section 21.2.3, Ext3. The default mode is
data=ordered, which ensures both data and metadata
integrity, but uses journaling only for metadata.
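The space savings described under Better Disk Space Utilization can be estimated with simple arithmetic. The following Python sketch compares two layouts. In the first, every file occupies whole blocks. In the second, each file gets exactly the bytes it needs. The sketch assumes a 4 kB block size and a hypothetical workload of 1000 files of 100 bytes each. It ignores tree, inode and journal overhead, so it only illustrates the order of magnitude and does not model ReiserFS's real on-disk format.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes (4 kB)


def block_allocated(sizes, block_size=BLOCK_SIZE):
    """Bytes consumed when every file is rounded up to whole blocks."""
    # Integer ceiling division: a 1-byte file still occupies a full block.
    return sum((size + block_size - 1) // block_size * block_size for size in sizes)


def exact_allocated(sizes):
    """Bytes consumed when each file gets exactly the space it needs."""
    return sum(sizes)


if __name__ == "__main__":
    # Hypothetical workload: 1000 small files of 100 bytes each.
    sizes = [100] * 1000
    print(block_allocated(sizes))  # 4096000
    print(exact_allocated(sizes))  # 100000
```

For this workload, whole-block allocation uses about 41 times as much space as exact allocation. Real savings depend on the actual file size distribution.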
21.2.2 Ext2
The origins of Ext2 go back to the early days of Linux history. Its
predecessor, the Extended File System, was implemented in April 1992 and
integrated in Linux 0.96c. The Extended File System underwent a number of
modifications and, as Ext2, became the most popular Linux file system for
years. With the advent of journaling file systems and their remarkably
short recovery times, Ext2 became less important.
A brief summary of Ext2's strengths helps explain why it was, and in some
areas still is, the favorite Linux file system of many Linux users.
- Solidity
-
Being quite an old-timer, Ext2 underwent many
improvements and was heavily tested. This may be the reason why people
often refer to it as rock-solid. After a system outage when the file
system could not be cleanly unmounted, e2fsck starts to analyze the file
system data. Metadata is brought into a consistent state and orphaned
files or data blocks are written to a designated directory (called
lost+found). In contrast to journaling file
systems, e2fsck analyzes the entire file system and not just the
recently modified bits of metadata. This takes significantly longer than
checking the log data of a journaling file system. Depending on file
system size, this procedure can take half an hour or more. Therefore, it
is not desirable to choose Ext2 for any server that needs high
availability. However, because Ext2 does not maintain a journal and uses
significantly less memory, it is sometimes faster than other file
systems.
- Easy Upgradability
-
The code for Ext2 is the strong foundation on which Ext3 could
become a highly acclaimed next-generation file system. Its reliability
and solidity were elegantly combined with the advantages of a journaling
file system.
21.2.3 Ext3
Ext3 was designed by Stephen Tweedie. Unlike all other next-generation file
systems, Ext3 does not follow a completely new design principle. It is
based on Ext2. These two file systems are very closely related to each
other. An Ext3 file system can be easily built on top of an Ext2 file
system. The most important difference between Ext2 and Ext3 is that Ext3
supports journaling. In summary, Ext3 has two major advantages to offer:
- Easy and Highly Reliable Upgrades from Ext2
-
Because Ext3 is based on the Ext2 code and shares its on-disk format as
well as its metadata format, upgrades from Ext2 to Ext3 are very easy.
Unlike transitions to other journaling file systems, such as ReiserFS or
XFS, which can be quite tedious (making backups of the entire file system
and recreating it from scratch), a transition to Ext3 is a matter of
minutes. It is also very safe, because it avoids recreating an entire file
system from scratch, which might not work flawlessly. Considering the
number of existing Ext2 systems that await an upgrade to a journaling file
system, it is easy to see why Ext3 matters to many system administrators.
Downgrading from Ext3 to Ext2 is as easy as the upgrade: perform a clean
unmount of the Ext3 file system and remount it as an Ext2 file system.
- Reliability and Performance
-
Some other journaling file systems follow the
metadata-only journaling approach. This means your
metadata is always kept in a consistent state, but the same cannot be
automatically guaranteed for the file system data itself. Ext3 is
designed to take care of both metadata and data. The degree of
care can be customized. Enabling Ext3 in the
data=journal mode offers maximum security (data
integrity), but can slow down the system because both metadata and data
are journaled. A relatively new approach is to use the
data=ordered mode, which ensures both data and metadata
integrity, but uses journaling only for metadata. The file system driver
collects all data blocks that correspond to one metadata update. These
data blocks are written to disk before the metadata is updated. As a
result, consistency is achieved for metadata and data without
sacrificing performance. A third option is
data=writeback, which allows data to be written into
the main file system after its metadata has been committed to the
journal. This option is often considered the best-performing one. It
can, however, allow old data to reappear in files after a crash and
recovery, while internal file system integrity is maintained. Unless you
specify something else, Ext3 is run with the
data=ordered default.
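The three modes differ mainly in the order in which data and metadata reach the disk. The following Python sketch is a deliberately simplified model of a single file update, based only on the descriptions above. The step names and both functions are hypothetical and do not correspond to the actual Ext3 or JBD code. The mode itself is selected with the data= mount option, for example in the options field of /etc/fstab.

```python
def write_order(mode):
    """Return a simplified sequence of write steps for one file update.

    Conceptual model of the three Ext3 data= modes described in the text;
    it is not the actual Ext3/JBD code path.
    """
    if mode == "journal":
        # Data and metadata both go to the journal first and are copied
        # (checkpointed) to their final locations afterwards.
        return ["journal: data", "journal: metadata", "journal: commit",
                "main fs: data", "main fs: metadata"]
    if mode == "ordered":
        # Data blocks belonging to the update are written to the main file
        # system before the metadata is committed to the journal.
        return ["main fs: data", "journal: metadata", "journal: commit",
                "main fs: metadata"]
    if mode == "writeback":
        # Only metadata is journaled; data may reach the main file system
        # after the metadata has already been committed.
        return ["journal: metadata", "journal: commit",
                "main fs: data", "main fs: metadata"]
    raise ValueError(f"unknown data= mode: {mode!r}")


def stale_data_possible(mode):
    """True if a crash right after the commit could expose old block contents."""
    steps = write_order(mode)
    commit = steps.index("journal: commit")
    # First step at which the new data is on disk (in the journal or main fs).
    data_on_disk = min(i for i, step in enumerate(steps) if step.endswith(": data"))
    return data_on_disk > commit


if __name__ == "__main__":
    for mode in ("journal", "ordered", "writeback"):
        print(mode, stale_data_possible(mode))
    # journal False
    # ordered False
    # writeback True
```

In this model, only writeback lets the journal commit happen before the new data is safely on disk. That ordering is how old data can reappear in a file after a crash.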
21.2.4 Converting an Ext2 File System into Ext3
To convert an Ext2 file system to Ext3, proceed as follows:
-
Create an Ext3 journal by running tune2fs -j, followed by the device
name of the partition, as root. This creates an Ext3 journal with the
default parameters.
To specify the size of the journal and the device on which it
should reside, run tune2fs -J instead, together with the
desired journal options size= and
device=. More information about the tune2fs program is
available in the tune2fs manual page.
-
To ensure that the Ext3 file system is recognized as such, edit
the file /etc/fstab
as root, changing the file system type specified for the corresponding
partition from ext2 to ext3.
The change takes effect after the next reboot.
-
To boot a root file system set up as an Ext3 partition, include the
modules ext3 and jbd in the
initrd. To do this, edit
/etc/sysconfig/kernel as root, adding
ext3 and jbd to the
INITRD_MODULES variable. After saving the changes, run the
mkinitrd command. This builds a new initrd and
prepares it for use.
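Steps 2 and 3 are plain text edits. The following Python sketch expresses them as two string-to-string functions, so the result can be checked before anything is written back. It does not read or write the real /etc/fstab or /etc/sysconfig/kernel, and it does not run tune2fs or mkinitrd. The function names and the device name /dev/sda2 are hypothetical. The sketch assumes the usual whitespace-separated fstab layout (device, mount point, type, options, dump, pass) and a double-quoted INITRD_MODULES="..." line.

```python
def fstab_ext2_to_ext3(fstab_text, device):
    """Return fstab_text with the file system type of `device` set to ext3.

    Only an entry whose first field is `device` and whose type field is
    ext2 is changed. The edited line is re-joined with single spaces, so
    its original column alignment is not preserved.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == device and fields[2] == "ext2":
            fields[2] = "ext3"
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


def add_initrd_modules(sysconfig_text, modules=("jbd", "ext3")):
    """Append missing `modules` to the INITRD_MODULES="..." assignment."""
    prefix = "INITRD_MODULES="
    out = []
    for line in sysconfig_text.splitlines():
        if line.startswith(prefix):
            current = line[len(prefix):].strip().strip('"').split()
            # Keep existing modules and their order; add only missing ones.
            # jbd is listed before ext3 because ext3 depends on it.
            current += [m for m in modules if m not in current]
            line = prefix + '"' + " ".join(current) + '"'
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    fstab = "/dev/sda2 /home ext2 defaults 1 2\n"
    print(fstab_ext2_to_ext3(fstab, "/dev/sda2"), end="")
    # /dev/sda2 /home ext3 defaults 1 2
    print(add_initrd_modules('INITRD_MODULES="reiserfs"\n'), end="")
    # INITRD_MODULES="reiserfs jbd ext3"
```

After writing the edited text back to the files as root, the journal still has to be created with tune2fs (step 1), and the initrd rebuilt with mkinitrd (step 3).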
21.2.5 Reiser4
Shortly after kernel 2.6 was released, the family of journaling file
systems was joined by another member: Reiser4. Reiser4 is fundamentally
different from its predecessor ReiserFS (version 3.6). It introduces the
concept of plug-ins to tweak the file system functionality and a
finer-grained security concept.
- Fine-Grained Security Concept
-
In designing Reiser4, its developers put an emphasis on the
implementation of security-relevant features. Reiser4 therefore comes
with a set of dedicated security plug-ins. The most important one
introduces the concept of file items. Currently, file
access controls are defined per file. If a large file
contains information relevant to several users, groups, or applications,
the access rights have to be fairly imprecise to include all parties
involved. In Reiser4, you can split those files into smaller portions
(the items). Access rights can then be set for
each item and each user separately, allowing much more precise file
security management. A perfect example would be
/etc/passwd. Currently, only root can edit the file, while
non-root users only get read
access to this file. Using the item concept of Reiser4, you could split
this file into a set of items (one item per user) and allow users or
applications to modify their own data but not
access other users' data. This concept adds to both security and
flexibility.
- Extensibility through Plug-Ins
-
Many file system functions and external functions normally used by a
file system are implemented as plug-ins in Reiser4. These plug-ins can
easily be added to the base system. You no longer need to recompile the
kernel or reformat the hard disk to add new functionality to your
file system.
- Better File System Layout through Delayed Allocation
-
Like XFS, Reiser4 supports delayed allocation (see Section 21.2.6,
XFS). Using delayed allocation even for metadata can result in a better
overall layout.
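The item concept can be illustrated with a small conceptual model. The following Python sketch is not Reiser4 code and does not use any Reiser4 interface; the class ItemizedFile and its methods are hypothetical. It models a file such as /etc/passwd split into one item per user. Root may access every item, and every other user may access only the item they own, which matches the access policy described above.

```python
from dataclasses import dataclass, field


@dataclass
class ItemizedFile:
    """Hypothetical model of one file split into items with per-item owners."""
    items: dict = field(default_factory=dict)   # item key -> item contents
    owners: dict = field(default_factory=dict)  # item key -> owning user

    def _check(self, user, key):
        # root may access every item; other users only the items they own.
        if user != "root" and self.owners.get(key) != user:
            raise PermissionError(f"{user} may not access item {key!r}")

    def read(self, user, key):
        self._check(user, key)
        return self.items[key]

    def write(self, user, key, data):
        self._check(user, key)
        self.items[key] = data


if __name__ == "__main__":
    passwd = ItemizedFile(
        items={"alice": "alice:x:1000:100::/home/alice:/bin/bash",
               "bob": "bob:x:1001:100::/home/bob:/bin/bash"},
        owners={"alice": "alice", "bob": "bob"},
    )
    passwd.write("alice", "alice", "alice:x:1000:100::/home/alice:/bin/zsh")
    print(passwd.read("alice", "alice"))
    try:
        passwd.read("alice", "bob")
    except PermissionError as err:
        print(err)  # alice may not access item 'bob'
```

Read and write share one policy check, which keeps the rule in a single place. A file system implementation would attach such rules to on-disk items rather than to an in-memory dictionary.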
21.2.6 XFS
SGI started XFS development in the early 1990s, originally intending it as
the file system for their IRIX OS. The idea behind XFS was to create a
high-performance 64-bit journaling file system to meet the extreme
computing challenges of today. XFS is very good at manipulating large files
and performs well on high-end hardware. However, even XFS has a
drawback. Like ReiserFS, XFS takes great care of metadata
integrity, but less of data integrity.
A quick review of XFS's key features explains why it may prove a
strong competitor for other journaling file systems in high-end
computing.
- High Scalability through the Use of Allocation Groups
-
At the creation time of an XFS file system, the block device underlying
the file system is divided into eight or more linear regions of equal
size. Those are referred to as allocation groups.
Each allocation group manages its own inodes and free disk space.
In practice, allocation groups can be seen as file systems within a file
system. Because allocation groups are rather independent of each other,
more than one of them can be addressed by the kernel simultaneously.
This feature is the key to XFS's great scalability. Naturally, the
concept of independent allocation groups suits the needs of
multiprocessor systems.
- High Performance through Efficient Management of Disk Space
-
Free space and inodes are handled by B+ trees inside the allocation groups. The
use of B+ trees greatly contributes
to XFS's performance and scalability. XFS uses delayed
allocation. It handles allocation by breaking the process
into two pieces. A pending transaction is stored in RAM and the
appropriate amount of space is reserved. XFS still does not decide where
exactly (speaking of file system blocks) the data should be stored. This
decision is delayed until the last possible moment. Some short-lived
temporary data may never make its way to disk, because it may be
obsolete by the time XFS decides where to actually save it. Thus XFS
increases write performance and reduces file system fragmentation.
Because delayed allocation results in less frequent write events than in
other file systems, data loss after a crash during a write is likely to
be more severe.
- Preallocation to Avoid File System Fragmentation
-
Before writing the data to the file system, XFS
reserves (preallocates) the free space needed for a
file. Thus, file system fragmentation is greatly reduced. Performance is
increased because the contents of a file are not distributed all over
the file system.
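Two of the mechanisms described above, allocation groups and delayed allocation, can be sketched as a toy model in Python. The function allocation_group_of assumes, as the text describes, that the device is divided into equal linear regions (eight by default in this sketch). The class DelayedAllocator reserves space at write time but assigns concrete block numbers only when flushing. Both names are hypothetical. The model ignores B+ trees, per-group free space management and journaling, so it is not how XFS is implemented.

```python
from dataclasses import dataclass, field


def allocation_group_of(block, device_blocks, ag_count=8):
    """Map a block number to its allocation group.

    Assumes the device is split into `ag_count` equal linear regions.
    """
    ag_size = -(-device_blocks // ag_count)  # ceiling division
    return block // ag_size


@dataclass
class DelayedAllocator:
    """Toy model of delayed allocation: reserve at write time, place at flush."""
    free_blocks: int
    next_block: int = 0
    reserved: int = 0
    pending: dict = field(default_factory=dict)    # file name -> block count
    placement: dict = field(default_factory=dict)  # file name -> block list

    def write(self, name, nblocks):
        # Step 1: keep the data in RAM and only reserve the space.
        if self.reserved + nblocks > self.free_blocks:
            raise OSError("no space left to reserve")
        self.reserved += nblocks
        self.pending[name] = self.pending.get(name, 0) + nblocks

    def delete(self, name):
        # Only models files still pending in RAM: data deleted before a
        # flush never reaches the disk, and its reservation is released.
        self.reserved -= self.pending.pop(name, 0)

    def flush(self):
        # Step 2: choose concrete block numbers only now, giving each
        # pending file one contiguous run.
        for name, nblocks in self.pending.items():
            start = self.next_block
            self.placement.setdefault(name, []).extend(range(start, start + nblocks))
            self.next_block += nblocks
        self.free_blocks -= self.reserved
        self.reserved = 0
        self.pending.clear()


if __name__ == "__main__":
    print(allocation_group_of(450, device_blocks=800))  # 4
    alloc = DelayedAllocator(free_blocks=100)
    alloc.write("log", 3)
    alloc.write("tmp", 5)   # short-lived temporary data
    alloc.write("log", 2)
    alloc.delete("tmp")     # deleted before it reached the disk
    alloc.flush()
    print(alloc.placement)  # {'log': [0, 1, 2, 3, 4]}
```

In the example, the temporary file deleted before the flush never receives any blocks. The two separate writes to log end up in one contiguous run, which is the fragmentation benefit described above.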