



Chapter 33. File Handling Modules

There are a number of operations closely related to file processing. Deleting and renaming files are examples of operations that change the directory information that the operating system maintains to describe a file. Python provides numerous modules for these operating system operations.

We can't begin to cover all of the various ways in which Python supports file handling. However, we can identify the essential modules that may help you avoid reinventing the wheel. Further, these modules can give you a view of the Pythonic way of working with data from files.

The following modules have features that are essential for supporting file processing. We'll cover selected features of each module that are directly relevant to file processing. We'll present these in the order you'd find them in the Python library documentation.

Chapter 11 - File and Directory Access. Chapter 11 of the Library reference covers many modules which are essential for reliable use of files and directories. We'll look closely at the following modules.


os.path

Common pathname manipulations. Use this to split and join full directory path names. This is operating-system neutral, with a correct implementation for all operating systems.

os

Miscellaneous OS interfaces. This includes parameters of the current process, additional file object creation, manipulations of file descriptors, managing directories and files, managing subprocesses, and additional details about the current operating system.

fileinput

This module has functions which will iterate over lines from multiple input streams. This allows you to write a single, simple loop that processes lines from any number of input files.

tempfile

Generate temporary files and temporary file names.

glob

UNIX shell style pathname pattern expansion. Unix shells translate name patterns like *.py into a list of files. This is called globbing. The glob module implements this within Python, which allows this feature to work even in Windows where it isn't supported by the OS itself.

fnmatch

UNIX shell style filename pattern matching. This implements the glob-style rules using *, ? and []. * matches any number of characters, ? matches any single character, [ chars ] encloses a list of allowed characters, [! chars ] encloses a list of disallowed characters.

shutil

High-level file operations, including copying and removal. The kinds of things that the shell handles with simple commands like cp or rm become available to a Python program, and are just as simple in Python as they are in the shell.
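To make the fnmatch pattern rules concrete, here is a minimal sketch that applies each of them to a short list of names. The file names are hypothetical and chosen only for illustration; fnmatch.filter and fnmatch.fnmatchcase are standard library functions.

```python
import fnmatch

# Hypothetical file names, used only for illustration.
names = ['report.py', 'report.pyc', 'a1.txt', 'b2.txt', 'notes.TXT']

# * matches any number of characters.
python_sources = fnmatch.filter(names, '*.py')

# ? matches exactly one character; [ab] allows only the listed characters.
a_or_b = [n for n in names if fnmatch.fnmatchcase(n, '[ab]?.txt')]

# [!a] rejects the listed characters.
not_a = [n for n in names if fnmatch.fnmatchcase(n, '[!a]?.txt')]
```

The glob module applies the same rules, but to the names it finds in a real directory rather than to a list of strings.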

Chapter 12 - Data Compression and Archiving. Data compression is covered in Chapter 12 of the Library reference. We'll look closely at the following modules.

tarfile, zipfile

These modules help you read and write archive files: files which are an archive of a complex directory structure. This includes GNU/Linux tape archive (.tar) files, compressed GZip tar files (.tgz files or .tar.gz files) sometimes called tarballs, and ZIP files.

zlib, gzip, bz2

These modules are all variations on a common theme of reading and writing files which are compressed to remove redundant bytes of data. The zlib and bz2 modules have a more sophisticated interface, allowing you to use compression selectively within a more complex application. The gzip module has a different (and simpler) interface that applies only to complete files.
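The difference between the two styles of interface can be sketched as follows. This is only an illustration: the sample data and the file name example.txt.gz are made up, and the file is written to a scratch directory that is removed at the end.

```python
import gzip
import os
import shutil
import tempfile
import zlib

# Made-up sample data with plenty of redundancy.
data = b'spam and eggs, ' * 100

# zlib works on byte strings held in memory, so it can be applied
# to any piece of data inside a larger application.
packed = zlib.compress(data)
restored = zlib.decompress(packed)

# gzip works on a complete file and looks much like the ordinary open().
scratch_dir = tempfile.mkdtemp()
path = os.path.join(scratch_dir, 'example.txt.gz')
with gzip.open(path, 'wb') as target:
    target.write(data)
with gzip.open(path, 'rb') as source:
    from_file = source.read()

# Remove the scratch directory and the file in it.
shutil.rmtree(scratch_dir)
```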

Chapter 26 - Python Runtime Services. The modules described in Chapter 26 of the Library reference include some that are used for handling various kinds of files. We'll look closely at just one.


sys

This module has several system-specific parameters and functions, including definitions of the three standard files that are available to every program.

The os.path Module

The os.path module contains useful functions for managing path and directory names. A serious mistake is to use ordinary string functions with literal strings for the path separators. A Windows program using \ as the separator won't work anywhere else. A less serious mistake is to use os.sep instead of the routines in the os.path module.
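As a small sketch of the difference, compare a name built with a literal separator to one built with os.path.join. The directory and file names here are hypothetical.

```python
import os.path

# Hypothetical path components, used only for illustration.
parts = ['data', 'logs', 'today.txt']

# Fragile: the Windows separator is written into the program.
fragile = 'data' + '\\' + 'logs' + '\\' + 'today.txt'

# Portable: os.path.join picks the separator for the current platform.
portable = os.path.join(*parts)

# The portable name also splits back into its pieces on any platform.
folder, name = os.path.split(portable)
```

The fragile name only splits correctly on a platform that treats \ as a separator; the portable one works wherever the program runs.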

The os.path module contains the following functions for completely portable path and filename manipulation.

os.path.basename ( path ) → fileName

Return the base filename, the second half of the result created by os.path.split( path ).

os.path.dirname ( path ) → dirName

Return the directory name, the first half of the result created by os.path.split( path ).

os.path.exists ( path ) → boolean

Return True if the pathname refers to an existing file or directory.

os.path.getatime ( path ) → time

Return the last access time of a file, reported by os.stat. See the time module for functions to process the time value.

os.path.getmtime ( path ) → time

Return the last modification time of a file, reported by os.stat. See the time module for functions to process the time value.

os.path.getsize ( path ) → int

Return the size of a file, in bytes, reported by os.stat.

os.path.isdir ( path ) → boolean

Return True if the pathname refers to an existing directory.

os.path.isfile ( path ) → boolean

Return True if the pathname refers to an existing regular file.

os.path.join ( string , ... ) → path

Join path components using the appropriate path separator.

os.path.split ( path ) → tuple

Split a pathname into two parts: the directory and the basename (the filename, without path separators, in that directory). The result (s, t) is such that os.path.join( s , t ) yields the original path.

os.path.splitdrive ( path ) → tuple

Split a pathname into a drive specification and the rest of the path. Useful on DOS/Windows/NT.

os.path.splitext ( path ) → tuple

Split a path into root and extension. The extension is everything starting at the last dot in the last component of the pathname; the root is everything before that. The result (r, e) is such that r+e yields the original path.
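The following sketch exercises several of these functions against a real file. It assumes nothing beyond the standard library; the file name sample.tar.gz is made up, and the file lives in a scratch directory that is removed at the end.

```python
import os.path
import shutil
import tempfile
import time

# Create a small scratch file so the query functions have something to inspect.
scratch_dir = tempfile.mkdtemp()
path = os.path.join(scratch_dir, 'sample.tar.gz')
with open(path, 'wb') as f:
    f.write(b'12345')

head, tail = os.path.split(path)      # the directory and the file name
root, ext = os.path.splitext(tail)    # only the last dot starts the extension

facts = {
    'exists': os.path.exists(path),
    'isfile': os.path.isfile(path),
    'isdir': os.path.isdir(head),
    'size': os.path.getsize(path),
    # The time module turns the raw time value into readable text.
    'modified': time.ctime(os.path.getmtime(path)),
}

# Remove the scratch directory and the file in it.
shutil.rmtree(scratch_dir)
```

Note that head and tail are exactly what os.path.dirname and os.path.basename would return for the same path.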

The following example is typical of the manipulations done with os.path.

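import sys, os.path

def process( oldName, newName ):
    # Placeholder: the real processing of the file goes here.
    pass

for oldFile in sys.argv[1:]:
    # Separate the directory from the file name, then the name from its extension.
    dirName, fileext= os.path.split( oldFile )
    baseName, ext= os.path.splitext( fileext )
    if ext.upper() == '.RST':
        newFile= os.path.join( dirName, baseName ) + '.HTML'
        print( '%s -> %s' % (oldFile, newFile) )
        process( oldFile, newFile )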

This program imports the sys and os.path modules.


The process function does something interesting and useful to the input file. It is the real heart of the program.


The for statement sets the variable oldFile to each string (after the first) in the sequence sys.argv.


Each file name is split into the path name and the base name. The base name is further split to separate the file name from the extension. The os.path module does this correctly for all operating systems, saving us from having to write platform-specific code. For example, splitext correctly handles the situation where a Linux file has multiple '.'s in the file name.


The extension is tested to be '.RST'. A new file name is created from the path, base name and a new extension ('.HTML'). The old and new file names are printed and some processing, defined in the process function, uses the oldFile and newFile names.
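The renaming rule can also be pulled out into a function of its own, which makes it easy to try on names that contain several dots. This is one possible arrangement, not part of the original program; the function name html_name and the sample file names are hypothetical.

```python
import os.path

def html_name(old_file):
    """Return the .HTML name for an .RST file, or None for any other file."""
    dir_name, file_ext = os.path.split(old_file)
    base, ext = os.path.splitext(file_ext)
    if ext.upper() != '.RST':
        return None
    return os.path.join(dir_name, base) + '.HTML'

# Only the final extension is replaced; earlier dots stay in the base name.
dotted = html_name(os.path.join('docs', 'v1.2.intro.RST'))
plain = html_name('notes.rst')
skipped = html_name('readme.txt')
```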

Published under the terms of the Open Publication License.