Chapter 19. Files

Programs often deal with external data: data that lives outside volatile primary memory. This external data could be persistent data on a file system or transient data on an input-output device. Most operating systems provide a simple, uniform interface to external data via files. In the section called “File Semantics”, we provide an overview of the semantics of files. We cover the most important of Python's built-in functions for working with files in the section called “Built-in Functions”. In the section called “File Methods”, we describe some method functions of file objects.

Files are a deep subject. We'll touch on several modules that are related to managing files in Part IV, “Components, Modules and Packages”. These include Chapter 33, File Handling Modules and Chapter 34, File Formats: CSV, Tab, XML, Logs and Others.

File Semantics

In one sense, a file is a container for a sequence of bytes. A more useful view, however, is that a file is a container of data objects, encoded as a sequence of bytes. Files can be kept on persistent but slow devices like disks. Files can also be presented as a stream of bytes flowing through a network interface. Even the user's keyboard can be processed as if it were a file; in this case, reading from the file forces our software to wait until the person types something.

Our operating systems use the file abstraction as a way to unify access to a large number of devices and operating system services. In the Linux world, all external devices, plus a large number of in-memory data structures, are accessible through the file interface. The wide variety of things with file-like interfaces is a consequence of how Unix was originally designed. Since the number and types of devices that might be connected to a computer are essentially unbounded, device drivers were designed as simple, flexible plug-ins to the operating system. For more information on the ubiquity of files, see the section called “Additional Background”.

Files include more than disk drives and network interfaces. Kernel memory, random data generators, semaphores, shared memory blocks, and other things have file interfaces, even though they aren't, strictly speaking, devices. Our OS applies the file abstraction to many things. Python, similarly, extends the file interface to include certain kinds of in-memory buffers.
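
As a small illustration of an in-memory buffer with a file interface, here is a minimal sketch. It assumes Python 3's io module (Python 2 programs of this book's era would typically use the StringIO module instead), and count_lines is a hypothetical helper name, not something defined in this chapter.

```python
import io

def count_lines(source):
    """Count the lines in any file-like object that supports iteration."""
    return sum(1 for _ in source)

# An in-memory buffer that behaves like a text file opened for reading.
buffer = io.StringIO("first line\nsecond line\n")
print(count_lines(buffer))
```

The helper never asks where the lines come from, so the same function should work unchanged on a file opened from disk.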

GNU/Linux operating systems make all devices available through a standard file-oriented interface. Windows makes most devices available through a reasonably consistent file interface. Python's file class provides access to the OS file APIs, giving our applications the same uniform access to a variety of devices.
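
One device that is easy to try on most platforms is the null device. The sketch below assumes Python 3 and uses os.devnull, which names that device on the current platform; discard_and_read is a hypothetical function name chosen for this example.

```python
import os

def discard_and_read():
    """Write to, then read from, the platform's null device."""
    # Writing works like writing to any file, but the bytes are discarded.
    with open(os.devnull, "wb") as sink:
        sink.write(b"this text goes nowhere")
    # Reading reports end-of-file immediately, so nothing comes back.
    with open(os.devnull, "rb") as source:
        return source.read()

result = discard_and_read()
```

The point is only that open, write and read are the same calls we would use for an ordinary disk file.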

Important

The terminology is sometimes confusing. We have physical files on our disk, the file abstraction in our operating system, and file objects in our Python program. Our Python file object makes use of the operating system file APIs which, in turn, manipulate the files on a disk.

We'll try to be clear, but with only one overloaded word for three different things, this chapter may sometimes be confusing.

We rarely have a reason to talk about a physical file on a disk. Generally we'll talk about the OS abstraction of file and the Python class of file.
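
The relationship between the Python file object and the OS abstraction can be sketched in a few lines. This is an illustration only, assuming Python 3; layers_demo is a hypothetical name, and the temporary file stands in for any physical file on disk.

```python
import os
import tempfile

def layers_demo():
    """Write through a Python file object, then read back through the OS API."""
    with tempfile.TemporaryFile() as f:   # Python file object
        f.write(b"hello")
        f.flush()                         # hand Python's buffered bytes to the OS
        fd = f.fileno()                   # integer descriptor used by the OS file API
        os.lseek(fd, 0, os.SEEK_SET)      # rewind using the OS-level call
        data = os.read(fd, 5)             # read using the OS-level call
    return fd, data

fd, data = layers_demo()
```

In ordinary programs we stay at the Python file object level; the descriptor is shown here only to make the lower layer visible.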

Standard Files. Consistent with POSIX standards, all Python programs have three files available: sys.stdin, sys.stdout, and sys.stderr. These files are used by certain built-in statements and functions. The print statement, for example, writes to sys.stdout. The input and raw_input functions both write their prompt to sys.stdout and read their input from sys.stdin.

These standard files are always available, and Python ensures that they are handled consistently by all operating systems. The sys module makes these files available for explicit use. Newbies may want to check File Redirection for Newbies for some additional notes on these standard files.
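
Explicit use of the standard files might look like the following sketch. It assumes Python 3 for the io.StringIO capture at the end; report and its parameters are hypothetical names introduced for this example.

```python
import io
import sys

def report(message, out=None, err=None):
    """Write a result to one stream and a diagnostic to another."""
    # Default to the standard files when no replacements are given.
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    out.write(message + "\n")
    err.write("reported %d characters\n" % len(message))

# Normal use: the result goes to sys.stdout, the diagnostic to sys.stderr.
report("hello")

# Any file-like object can stand in for a standard file.
captured_out, captured_err = io.StringIO(), io.StringIO()
report("hello", captured_out, captured_err)
```

Keeping results and diagnostics on separate files lets a user redirect one without the other.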

File Organization and Structure. Some operating systems provide support for a large variety of file organizations. Different file organizations include different record termination rules, possibly with keys, and possibly with fixed-length records. The POSIX standard, however, considers a file to be nothing more than a sequence of bytes. It becomes entirely the job of the application program, or of libraries outside the operating system, to impose any organization on those bytes.
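
To make "imposing an organization on bytes" concrete, here is a minimal sketch of fixed-length records built by the application itself. It assumes Python 3 and the standard struct module; the record layout (a 4-byte integer followed by an 8-byte name) and the function names are hypothetical, chosen only for illustration.

```python
import io
import struct

# Hypothetical layout: little-endian 4-byte integer id, then an 8-byte name.
RECORD = struct.Struct("<i8s")

def write_records(f, rows):
    """Encode each (id, name) pair as one fixed-length record."""
    for ident, name in rows:
        # Names shorter than 8 bytes are padded with null bytes by struct.
        f.write(RECORD.pack(ident, name.encode("ascii")))

def read_records(f):
    """Decode fixed-length records until the bytes run out."""
    rows = []
    while True:
        chunk = f.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        ident, raw = RECORD.unpack(chunk)
        rows.append((ident, raw.rstrip(b"\x00").decode("ascii")))
    return rows

buffer = io.BytesIO()          # stands in for a binary file on disk
write_records(buffer, [(1, "ann"), (2, "bob")])
buffer.seek(0)
records = read_records(buffer)
```

The operating system sees only 24 bytes here; the notion of two records with two fields each exists entirely in the application code.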

The basic file objects in Python consider a file to be a sequence of characters. (These can be ASCII or Unicode characters.) The characters can be processed as a sequence of variable-length lines, each terminated with a newline character. Files moved from a Windows environment may contain lines with an extraneous ASCII carriage return character (\r) just before the newline; it is easily removed with the string strip method, or with rstrip if leading whitespace must be kept.
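
Removing the line terminators might look like the sketch below. The function name clean_lines is hypothetical, and rstrip is used in place of strip so that leading whitespace on each line is preserved.

```python
def clean_lines(lines):
    """Remove trailing carriage returns and newlines from each line."""
    return [line.rstrip("\r\n") for line in lines]

# Lines as they might arrive from a file created on Windows and on Linux.
raw = ["alpha\r\n", "  beta\r\n", "gamma\n"]
cleaned = clean_lines(raw)
```

Passing "\r\n" to rstrip removes any trailing run of those two characters, so the same call handles both Windows-style and POSIX-style line endings.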

Ordinary text files can be managed directly with the built-in file objects and their methods for reading and writing lines of data. We will cover this basic text file processing in the rest of this chapter.


 
 
Published under the terms of the Open Publication License