Now that you have a basic understanding of files, it is time to learn more advanced
things about them.
The Real Nature of Files: Hard Links and Inodes
Each file on your system is represented by an inode (for Information
Node; pronounced ``eye-node''). An inode contains all the information about
the file. However, the inode is not directly visible. Instead, each inode is
linked into the filesystem by one or more hard links. A hard link contains
the name of the file and the inode number. The inode holds everything else
about the file: the location of its data on disk, its access permissions,
the file type, and so on. The system can find any inode if it has the inode
number.
A single file can have more than one hard link. What this means is that multiple
filenames refer to the same file (that is, they are associated with the same
inode number). However, you can't make hard links across filesystems: All hard
links to a particular file (inode) must be on the same filesystem. This
is because each filesystem has its own set of inodes, and there can be duplicate
inode numbers on different filesystems.
Because all hard links to a given inode refer to the same file, you
can make changes to the file, referring to it by one name, and then see those
changes when referring to it by a different name. Try this:
cd; echo "hello" > firstlink
cd to your home directory and create a file called firstlink
containing the word ``hello.'' What you've actually done is redirect the output
of echo (echo just echoes back what you give to it), placing
the output in firstlink. See the chapter on shells for a full explanation.
cat firstlink
Confirms the contents of firstlink.
ln firstlink secondlink
Creates a hard link: secondlink now points to the same inode as firstlink.
cat secondlink
Confirms that secondlink is the same as firstlink.
ls -l
Notice that the number of hard links listed for firstlink and secondlink
is 2.
echo "change" >> secondlink
This is another shell redirection trick (don't worry about the details). You've
appended the word ``change'' to secondlink. Confirm this with cat
secondlink.
cat firstlink
firstlink also has the word ``change'' appended! That's because firstlink
and secondlink refer to the same file. It doesn't matter what
you call it when you change it.
chmod a+rwx firstlink
Changes permissions on firstlink. Enter the command ls -l
to confirm that permissions on secondlink were also changed. This means
that permissions information is stored in the inode, not in links.
rm firstlink
Deletes this link. This is a subtlety of rm. It really removes links,
not files. Now type ls -l and notice that secondlink is still
there. Also notice that the number of hard links for secondlink has
been reduced to one.
rm secondlink
Deletes the other link. When there are no more links to a file, Linux deletes
the file itself, that is, its inode.
All files work like this - even special types of files such as devices (e.g.
/dev/hda).
A directory is simply a list of filenames and inode numbers, that is, a list
of hard links. When you create a hard link, you're just adding a name-number
pair to a directory. When you delete a file, you're just removing a hard link
from a directory.
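You can look at the name-number pairs yourself, because ls prints inode
numbers when given the -i option. The following is a minimal sketch, not a
definitive recipe: it assumes GNU coreutils (for stat -c) and works in a
scratch directory made with mktemp, so the paths are illustrative only.

```shell
# Work in a throwaway directory so nothing in your home directory is touched.
dir=$(mktemp -d)
cd "$dir"

echo "hello" > firstlink
ln firstlink secondlink

# ls -i prints the inode number in front of each name;
# both names show the same number.
ls -i firstlink secondlink

# stat -c is a GNU coreutils feature: %i is the inode number,
# %h is the number of hard links to that inode.
inode_first=$(stat -c %i firstlink)
inode_second=$(stat -c %i secondlink)
link_count=$(stat -c %h firstlink)
echo "inodes: $inode_first $inode_second, links: $link_count"

# Clean up the scratch directory.
cd / && rm -r "$dir"
```

The two inode numbers are equal because the directory holds two entries that
pair different names with the same number.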
Types of Files
One detail we've been concealing up to now is that the Linux kernel considers
nearly everything to be a file. That includes directories and devices: They're
just special kinds of files.
As you may remember, the first character of an ls -l display represents
the type of the file. For an ordinary file, this will be simply -.
Other possibilities include the following:
d  directory
l  symbolic link
b  block device
c  character device
p  named pipe
s  socket
Symbolic links (also called ``symlinks'' or ``soft links'') are the other
kind of link besides hard links. A symlink is a special file that ``points
to'' a hard link on any mounted filesystem. When you try to read the contents
of a symlink, it gives the contents of the file it's pointing to rather than
the contents of the symlink itself. Because directories, devices, and other
symlinks are types of files, you can point a symlink at any of those things.
So a hard link is a filename and an inode number. A file is really an inode:
a location on disk, file type, permissions mode, etc. A symlink is an inode
that contains the name of a hard link. A symlink pairs one filename with a second
filename, whereas a hard link pairs a filename with an inode number.
All hard links to the same file have equal status. That is, one is as good as
another; if you perform any operation on one, it's just the same as performing
that operation on any of the others. This is because the hard links all refer
to the same inode. Operations on symlinks, on the other hand, sometimes affect
the symlink's own inode (the one containing the name of a hard link) and sometimes
affect the hard link being pointed to.
There are a number of important differences between symlinks and hard links.
Symlinks can cross filesystems. This is because they contain filenames, which
are looked up each time the link is used, and a complete filename, starting
with the root directory, is unique across the whole system. Because
hard links point to inode numbers, and inode numbers are unique only within
a single filesystem, they would be ambiguous if the filesystem wasn't known.
You can make symlinks to directories, but you can't make hard links to them.
Each directory has hard links - its listing in its parent directory, its .
entry, and the .. entry in each of its subdirectories - but to impose
order on the filesystem, no other hard links to directories are allowed. Consequently,
the number of subdirectories in a directory is equal to the number of hard links
to that directory minus two (you subtract the directory's name and the . link).
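The link-count arithmetic can be checked directly. Only subdirectories add
links to their parent (through their .. entries); ordinary files do not. This
is a sketch under some assumptions: GNU stat and find are available, the
directory names are made up for the example, and the filesystem maintains
directory link counts in the traditional way (some filesystems do not and
always report 1).

```shell
dir=$(mktemp -d)
mkdir "$dir/parent"
mkdir "$dir/parent/a" "$dir/parent/b" "$dir/parent/c"
touch "$dir/parent/plainfile"   # an ordinary file adds no link to the directory

# Count the subdirectories directly...
subdirs=$(find "$dir/parent" -mindepth 1 -maxdepth 1 -type d | wc -l)

# ...and compare with the directory's hard link count (%h, GNU stat).
links=$(stat -c %h "$dir/parent")
echo "subdirectories: $subdirs, hard links: $links"

rm -r "$dir"
```

On a filesystem that keeps traditional counts, the second number is the first
plus two: one link for the name in the parent directory and one for the .
entry.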
You can only make a hard link to a file that exists, because there must be an
inode number to refer to. However, you can make a symlink to any filename, whether
or not there actually is such a filename.
Removing a symlink removes only the link. It has no effect on the linked-to
file. Removing the only hard link to a file removes the file.
Try this:
cd; ln -s /tmp/me MyTmp
cd to your home directory. ln with the -s option
makes a symbolic link - in this case, one called MyTmp that points
to the filename /tmp/me.
ls -l MyTmp
Lists the link; the listing ends with MyTmp -> /tmp/me. The date and
user/group names will depend on your system. Notice that
the file type is l, indicating that this is a symbolic link. Also notice
the permissions, rwxrwxrwx: Symbolic links always have these permissions. If
you attempt to chmod a symlink, you'll actually change the permissions on the
file being pointed to.
chmod 700 MyTmp
You will get a No such file or directory error, because the file /tmp/me
doesn't exist. Notice that you could create a symlink to it anyway.
mkdir /tmp/me
Creates the directory /tmp/me.
chmod 700 MyTmp
Should work now.
touch MyTmp/myfile
Creates a file in MyTmp.
ls /tmp/me
The file is actually created in /tmp/me.
rm MyTmp
Removes the symbolic link. Notice that this removes the link, not what it points
to. Thus you use rm not rmdir.
rm /tmp/me/myfile; rmdir /tmp/me
Lets you clean up after yourself.
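The same behavior can be checked from a script. The sketch below is only an
illustration: it uses a scratch directory from mktemp and made-up names
(mylink, target) rather than the /tmp/me example. It relies on readlink, which
prints the name stored inside a symlink, and on the shell tests -L (is this
name a symlink?) and -e (does the name, after following links, exist?).

```shell
dir=$(mktemp -d)
cd "$dir"

ln -s "$dir/target" mylink      # the target does not exist yet

target_name=$(readlink mylink)  # the filename stored in the symlink
[ -L mylink ] && is_link=yes || is_link=no     # -L looks at the link itself
[ -e mylink ] && resolves=yes || resolves=no   # -e follows the link
echo "stored name: $target_name, symlink: $is_link, resolves: $resolves"

echo "hello" > "$dir/target"    # now create the target
contents=$(cat mylink)          # reading through the link reaches the new file
echo "read through link: $contents"

rm mylink                       # removes only the link
[ -e "$dir/target" ] && target_survives=yes || target_survives=no
echo "target still there: $target_survives"

cd / && rm -r "$dir"
```

Before the target is created, the link exists but does not resolve; after the
link is removed, the target is untouched.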
Device files refer to physical or virtual devices on your system, such as your
hard disk, video card, screen, and keyboard. An example of a virtual device
is the console, represented by /dev/console.
There are two kinds of devices: character and block. Character devices
can be accessed one character at a time; that is, the smallest unit of data
that can be written to or read from the device is a character (byte).
Block devices must be accessed in larger units called blocks, which
contain a number of characters. Your hard disk is a block device.
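The shell can tell the two kinds apart with the tests -c (character device)
and -b (block device). The helper below, device_kind, is a hypothetical name
introduced only for this sketch. The name of your disk's device file varies
from system to system, so the example checks only /dev/null, which is a
character device.

```shell
# Hypothetical helper: report which kind of device file a path is.
device_kind() {
    if [ -c "$1" ]; then
        echo "character device"
    elif [ -b "$1" ]; then
        echo "block device"
    else
        echo "not a device file"
    fi
}

device_kind /dev/null   # prints "character device"
device_kind /           # a directory, so prints "not a device file"
```

You can pass the function the device file for your own hard disk to see a
block device reported; ls -l shows the same information as a leading c or b.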
You can read and write device files just as you can other kinds of files,
though what you read may well be incomprehensible-to-humans gibberish.
Writing random data to these files is probably a bad idea. Sometimes it's useful,
though. For example, you can dump a PostScript file into the printer device
/dev/lp0 or send modem commands to the device file for the appropriate
serial port.
/dev/null is a special device file that discards anything you write
to it. If you don't want something, throw it in /dev/null. It's essentially
a bottomless pit. If you read /dev/null, you'll get an end-of-file
(EOF) indication immediately. /dev/zero is similar, except that when you
read from it you get an endless stream of \0 characters (bytes with the value
zero, not the same as the character 0).
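Both devices are safe to experiment with. Here is a small sketch, assuming
the usual head, wc, and od utilities are installed: it writes to /dev/null,
counts the bytes that can be read back from it, and dumps a few bytes of
/dev/zero in hexadecimal.

```shell
# Anything written to /dev/null is discarded.
echo "this text disappears" > /dev/null

# Reading /dev/null hits end-of-file at once: zero bytes come back.
null_bytes=$(wc -c < /dev/null)
echo "bytes read from /dev/null: $null_bytes"

# /dev/zero supplies as many zero bytes as you ask for.
# head -c 4 takes four of them; od -An -tx1 shows each byte in hexadecimal.
zero_dump=$(head -c 4 /dev/zero | od -An -tx1)
echo "four bytes from /dev/zero:$zero_dump"
```

Always limit how much you read from /dev/zero (here with head -c), because it
never reaches end-of-file on its own.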
A named pipe is a file that acts like a pipe. You put something into the file,
and it comes out the other end. Thus it's called a FIFO, or First-In-First-Out,
because the first thing you put in the pipe is the first thing to come out the
other end.
If you write to a named pipe, the process that is writing to the pipe doesn't
terminate until the information being written is read from the pipe. If you
read from a named pipe, the reading process waits until there's something to
read before terminating. The size of the pipe is always zero: It doesn't store
data, it just links two processes like the shell's | operator. However, because
this pipe has a name, the two processes don't have to be on the same command
line or even be run by the same user.
You can try it by doing the following:
cd; mkfifo mypipe
Makes the pipe.
echo "hello" > mypipe &
Puts a process in the background that tries to write ``hello'' to the pipe.
Notice that the process doesn't return from the background; it is waiting for
someone to read from the pipe.
cat mypipe
At this point, the echo process should return, because cat
read from the pipe, and the cat process will print hello.
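The same experiment can be run as a script. This is a minimal sketch using a
scratch directory from mktemp, so the path is illustrative only. The shell
test -p is true for a named pipe, and wait pauses until the background writer
has finished.

```shell
dir=$(mktemp -d)
mkfifo "$dir/mypipe"

[ -p "$dir/mypipe" ] && is_fifo=yes || is_fifo=no   # -p: is it a named pipe?

# The writer waits in the background until a reader opens the pipe.
echo "hello" > "$dir/mypipe" &

# The reader collects whatever the writer sends.
message=$(cat "$dir/mypipe")
wait    # let the background writer finish
echo "read from pipe: $message (named pipe: $is_fifo)"

rm -r "$dir"
```

The writer and the reader could just as well be started from two different
terminals, since all they share is the name of the pipe.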
Sockets are similar to pipes, only they work over the network. This is how your
computer does networking. You may have heard of ``WinSock,'' which is sockets
for Windows.
We won't go into these further because you probably won't have occasion to use
them unless you're programming. However, if you see a file marked with type
s on your computer, you know what it is.
The Linux kernel makes a special filesystem available, which is mounted under
/proc on Debian systems. This is a ``pseudo-filesystem'' because
it doesn't really exist on any of your physical devices.
The proc filesystem contains information about the system and running
processes. Some of the ``files'' in /proc are reasonably understandable
to humans (try typing cat /proc/meminfo or cat /proc/cpuinfo);
others are arcane collections of numbers. Often, system utilities use these
to gather information and present it to you in a more understandable way.
People frequently panic when they notice one file in particular - /proc/kcore
- which is generally huge. This is (more or less) a copy of the contents of
your computer's memory. It's used to debug the kernel. It doesn't actually exist
anywhere, so don't worry about its size.
If you want to know about all the things in /proc, type man
5 proc.
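Because the entries in /proc behave like text files, ordinary tools can pick
values out of them. The sketch below assumes a Linux system with /proc
mounted and awk installed; it reads the MemTotal line of /proc/meminfo and
the Name line of the status file for the current shell, whose process ID the
shell provides as $$.

```shell
# Total memory as the kernel reports it (the value is in kilobytes).
mem_total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
echo "Total memory: $mem_total_kb kB"

# Each running process has a directory named after its process ID.
# $$ is the process ID of the current shell.
shell_name=$(awk '/^Name:/ { print $2 }' "/proc/$$/status")
echo "Process $$ is running: $shell_name"
```

This is essentially what utilities that report memory and process
information do on your behalf.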
Sometimes you may want to copy one directory to another location. Maybe you're
adding a new hard disk and you want to copy /usr/local to it. There
are several ways you can do this.
The first is to use cp. The command cp -a will tell cp
to do a copy preserving all the information it can. So, you might use
cp -a /usr/local /destination
However, there are some things that cp -a won't catch. So, the best way to do a
large copy job is to chain two tar commands together, like so:
tar -cSpf - /usr/local | tar -xvSpf - -C /destination
The first tar command will archive the existing directory and pipe
it to the second. The second command will unpack the archive into the location
you specify with -C.
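Before running the pipeline on a real directory, you can rehearse it on a
small scratch tree. The sketch below assumes GNU tar and GNU stat, and uses
two temporary directories from mktemp in place of /usr/local and
/destination. It also uses -C on the archiving side so that the archive
holds a relative path. The check at the end looks at whether two hard links
and a restrictive permission mode survive the copy.

```shell
src=$(mktemp -d)
dst=$(mktemp -d)

# Build a small tree: one file with two hard links and a restricted mode.
mkdir "$src/data"
echo "hello" > "$src/data/firstlink"
ln "$src/data/firstlink" "$src/data/secondlink"
chmod 600 "$src/data/firstlink"

# -C changes directory first, so the archive holds the relative path "data".
tar -C "$src" -cSpf - data | tar -xSpf - -C "$dst"

# In the copy, both names should share one inode and keep mode 600.
copy_first=$(stat -c %i "$dst/data/firstlink")
copy_second=$(stat -c %i "$dst/data/secondlink")
copy_mode=$(stat -c %a "$dst/data/firstlink")
echo "inodes: $copy_first $copy_second, mode: $copy_mode"

rm -r "$src" "$dst"
```

The two names in the copy refer to one inode, so the hard link was carried
over rather than being turned into two separate files.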