Most Useful Library Sections

This section gives an overview of about 50 of the most useful library modules. These modules are proven technology, widely used, heavily tested and constantly improved. The time spent learning these modules will reduce the time it takes you to build an application that does useful work.

We'll dig more deeply into just a few of these modules in subsequent chapters.

Lessons Learned

As consultants, we've seen far too many programmers writing modules which overlap these. There are two causes: ignorance and hubris. In this section, we hope to tackle the ignorance cause.

Python includes a large number of pre-built modules. The more you know about these, the less programming you have to do.

Hubris sometimes comes from the feeling that the library module doesn't fit our unique problem well enough to justify studying the library module. In many other languages you can't read the library module to see what it really does. In Python, the documentation is only an introduction; you're encouraged to actually read the library module.

We find that hubris is most closely associated with calendrical calculations. It isn't clear why programmers invest so much time and effort writing buggy calendrical calculations. Python provides many modules for dealing with times, dates and the calendar.

4. String Services. The String Services modules contain string-related functions or classes. See Chapter 12, Strings for more information on strings.

re

The re module is the core of text pattern recognition and processing. A regular expression is a formula that specifies how to recognize and parse strings. The re module is described in detail in Chapter 31, Complex Strings: the re Module.

struct

The avowed purpose of the struct module is to allow a Python program to access C-language APIs; it packs and unpacks C-language struct objects. It turns out that this module can also help you deal with files in packed binary formats.
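
As a minimal sketch of packing and unpacking, assume a made-up record layout (the format string and the three field names below are illustrative, not taken from any real file format):

```python
import struct

# Hypothetical record: a 2-byte unsigned id, a 4-byte signed count and an
# 8-byte float. The "<" prefix selects little-endian order with no padding.
RECORD_FORMAT = "<Hid"

# Pack three Python values into a compact byte string.
packed = struct.pack(RECORD_FORMAT, 7, -42, 3.5)

# Unpack the same bytes back into Python values.
record_id, count, value = struct.unpack(RECORD_FORMAT, packed)

# calcsize reports how many bytes one record occupies.
size = struct.calcsize(RECORD_FORMAT)
```

Reading a packed binary file then amounts to reading size bytes at a time and unpacking each chunk.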

difflib

The difflib module contains the essential algorithms for comparing two sequences, usually sequences of lines of text. This has algorithms similar to those used by the Unix diff command (the Windows COMP command).
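
A short sketch of a line-by-line comparison follows; the two lists of lines and the file labels are invented for illustration:

```python
import difflib

# Two hypothetical versions of the same small text file.
old_lines = ["spam\n", "eggs\n", "ham\n"]
new_lines = ["spam\n", "bacon\n", "ham\n"]

# unified_diff yields the lines of a diff in the familiar "unified" style.
diff = list(difflib.unified_diff(old_lines, new_lines,
                                 fromfile="old.txt", tofile="new.txt"))

# Removed lines are prefixed with "-", added lines with "+".
changed = [line for line in diff if line.startswith(("-", "+"))]
```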

StringIO, cStringIO

There are two variations on StringIO which provide file-like objects that read from or write to a string buffer. The StringIO module defines the class StringIO, from which subclasses can be derived. The cStringIO module provides a high-speed C-language implementation that can't be subclassed.

Note that these modules have atypical mixed-case names.

textwrap

This is a module to format plain text. While the word-wrapping task is sometimes handled by word processors, you may need this in other kinds of programs. Plain text files are still the most portable, standard way to provide a document.
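
For instance, a paragraph can be wrapped to a fixed width as sketched here (the 30-column width is an arbitrary choice for the example):

```python
import textwrap

text = ("Plain text files are still the most portable, "
        "standard way to provide a document.")

# wrap returns a list of lines, each no longer than the requested width.
lines = textwrap.wrap(text, width=30)

# fill does the same work but returns one string joined with newlines.
paragraph = textwrap.fill(text, width=30)
```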

codecs

This module has hundreds of text encodings. This includes the vast array of Windows code pages and the Macintosh code pages. The most commonly used are the various Unicode schemes (utf-16 and utf-8). However, there are also a number of codecs for translating between strings of text and arrays of bytes. These schemes include base-64, zip compression, bz2 compression, various quoting rules, and even the simple rot_13 substitution cipher.

5. Data Types. The Data Types modules implement a number of widely-used data structures. These aren't as useful as sequences, dictionaries or strings -- which are built into the language. These data types include dates, general collections, arrays, and scheduled events. This group also includes modules for searching lists, copying structures or producing a nicely formatted output for a complex structure.

datetime

The datetime module handles details of the calendar, including dates and times. Additionally, the time module provides some more basic functions for time and date processing. We'll cover both modules in detail in Chapter 32, Dates and Times: the time and datetime Modules.

These modules mean that you never need to attempt your own calendrical calculations. One of the important lessons learned in the late 90's was that many programmers love to tackle calendrical calculations, but their efforts had to be tested and reworked prior to January 1, 2000, because of innumerable small problems.

calendar

This module contains routines for displaying and working with the calendar. This can help you determine the day of the week on which a month starts and ends; it can count leap days in an interval of years, etc.
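
The two questions mentioned above can be answered as sketched here; the particular month and year range are only examples:

```python
import calendar

# monthrange returns the weekday of the first day of the month
# (Monday is 0) and the number of days in the month.
first_weekday, days_in_month = calendar.monthrange(2000, 2)

# leapdays counts the leap years from the first year up to, but not
# including, the second year.
leap_days = calendar.leapdays(1990, 2001)
```

February 2000 started on a Tuesday and had 29 days; the interval covers the leap years 1992, 1996 and 2000.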

collections

This package contains two data types, and is likely to grow with future releases of Python. One type is the deque -- a "double-ended queue" -- that can be used as a stack (LIFO) or queue (FIFO). The other class is a specialized dictionary, defaultdict, which can return a default value instead of raising an exception for missing keys.
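
Both types can be sketched in a few lines; the sample values are arbitrary:

```python
from collections import deque, defaultdict

# A deque supports efficient appends and pops at both ends.
queue = deque([1, 2, 3])
queue.append(4)
last = queue.pop()       # stack-style (LIFO): removes the newest item, 4
first = queue.popleft()  # queue-style (FIFO): removes the oldest item, 1

# defaultdict(int) supplies 0 for a missing key instead of raising KeyError.
counts = defaultdict(int)
for word in "the quick the lazy the".split():
    counts[word] += 1
```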

bisect

The bisect module contains the bisect function to search a sorted list for a specific value. It also contains the insort function to insert an item into a list while maintaining the sorted order. This is faster than simply appending values to a list and calling the sort method of the list. This module's source is instructive as a lesson in well-crafted algorithms.
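
A small sketch, using an invented list of scores:

```python
import bisect

scores = [10, 20, 30, 40]  # must already be sorted

# bisect returns the index at which 25 would be inserted to keep order.
position = bisect.bisect(scores, 25)

# insort performs the insertion in place, preserving the sorted order.
bisect.insort(scores, 25)
```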

array

The array module gives you a high-performance, highly compact collection of values. It isn't as flexible as a list or a tuple, but it is fast and takes up relatively little memory. This is helpful for processing media like image or sound files.

sched

The sched module contains the definition for the scheduler class that builds a simple task scheduler. When a scheduler is constructed, it is given two user-supplied functions: one returns the “time” and the other executes a “delay” waiting for the time to arrive. For real-time scheduling, the time module time and sleep functions can be used. The scheduler has a main loop that calls the supplied time function and compares the current time with the time for scheduled tasks; it then calls the supplied delay function for the difference in time. It runs the scheduled task, and calls the delay function with a duration of zero to release any resources.

Clearly, this simple algorithm is very versatile. By supplying custom time functions that work in minutes instead of seconds, and a delay function that does additional background processing while waiting for the scheduled time, a flexible task manager can be constructed.
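
To illustrate the idea of supplying custom functions, here is a sketch that drives a scheduler from a simulated clock instead of real time. The FakeClock class and the task labels are hypothetical names invented for this example:

```python
import sched


class FakeClock:
    """A simulated clock: 'time' is a counter in arbitrary units
    (minutes, say) and 'delay' advances the counter instead of sleeping."""

    def __init__(self):
        self.now = 0

    def time(self):
        return self.now

    def delay(self, amount):
        self.now += amount


clock = FakeClock()
scheduler = sched.scheduler(clock.time, clock.delay)

results = []

def record(label):
    # Remember which task ran and at what simulated time.
    results.append((label, clock.time()))

# enterabs(time, priority, action, argument): schedule at absolute times.
scheduler.enterabs(5, 1, record, ("late",))
scheduler.enterabs(2, 1, record, ("early",))

# run() loops until the queue is empty, calling delay between tasks.
scheduler.run()
```

Because the delay function only advances a counter, the whole schedule runs instantly, which is also handy for testing. Passing time.time and time.sleep instead would give real-time scheduling.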

copy

The copy module contains functions for making copies of complex objects. This module contains a function to make a shallow copy of an object, where any objects contained within the parent are not copied, but references are inserted in the parent. It also contains a function to make a deep copy of an object, where all objects contained within the parent object are duplicated.

Note that Python's simple assignment only creates a variable which is a label (or reference) to an object, not a duplicate copy. This module is the easiest way to create an independent copy.
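
The difference between assignment, a shallow copy and a deep copy can be seen in this sketch (the dictionary contents are invented):

```python
import copy

original = {"name": "deck", "cards": [1, 2, 3]}

alias = original                 # just another label for the same object
shallow = copy.copy(original)    # new dict, but it shares the inner list
deep = copy.deepcopy(original)   # new dict and a new inner list

# Mutate the list held by the original.
original["cards"].append(4)
```

After the append, the alias and the shallow copy both see the fourth card, because they refer to the same inner list; only the deep copy is fully independent.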

pprint

The pprint module contains some useful functions like pprint.pprint for printing easy-to-read representations of nested lists and dictionaries. It also has a PrettyPrinter class from which you can make subclasses to customize the way in which lists or dictionaries or other objects are printed.

6. Numeric and Mathematical Modules. These modules include more specialized mathematical functions and some additional numeric data types.

decimal

The decimal module provides decimal-based arithmetic which correctly handles significant digits, rounding and other features common to currency amounts.
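
A brief sketch of a currency calculation; the price and the tax rate are made-up figures:

```python
from decimal import Decimal, ROUND_HALF_UP

price = Decimal("19.99")
rate = Decimal("0.0825")  # hypothetical tax rate

# quantize rounds to two decimal places using the stated rounding rule.
tax = (price * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
total = price + tax

# Decimal arithmetic is exact for decimal fractions; binary floats are not.
decimal_sum = Decimal("0.1") + Decimal("0.2")
float_sum = 0.1 + 0.2
```

Note that the Decimal values are built from strings; building them from floats would carry the float's binary rounding error into the result.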

math

The math module was covered in the section called “The math Module”. It contains the math functions like sine, cosine and square root.

random

The random module was covered in the section called “The math Module”.

7. Internet Data Handling. The Internet Data Handling modules contain a number of handy algorithms. A great deal of data is defined by the Internet Request for Comments (RFCs). Since these effectively standardize data on the Internet, it helps to have modules already in place to process this standardized data. Most of these modules are specialized, but a few have much wider application.

mimify, base64, binascii, binhex, quopri, uu

These modules all provide various kinds of conversions, escapes or quoting so that binary data can be manipulated as safe, universal ASCII text. The number of these modules reflects the number of different clever solutions to the problem of packing binary data into ordinary email messages.
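
As one example of these conversions, the sketch below round-trips some arbitrary bytes through base64. It is written for current Python 3, where the data is a bytes object:

```python
import base64

# Arbitrary binary data, including bytes that are not printable ASCII.
raw = b"\x00\xff\x10binary data"

# The encoded form uses only letters, digits, "+", "/" and "=".
encoded = base64.b64encode(raw)
as_text = encoded.decode("ascii")

# Decoding recovers the original bytes exactly.
decoded = base64.b64decode(encoded)
```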

8. Structured Markup Processing Tools. The following modules contain algorithms for working with structured markup: Standard Generalized Markup Language (SGML), Hypertext Markup Language (HTML) and Extensible Markup Language (XML). These modules simplify the parsing and analysis of complex documents. In addition to these modules, you may also need to use the csv module for processing files; that's in chapter 9, File Formats.

htmllib

Ordinary HTML documents can be examined with the htmllib module. This module is based on the sgmllib module. The basic HTMLParser class definition is a superclass; you will typically override the various functions to do the appropriate processing for your application.

One problem with parsing HTML is that browsers, in order to conform with the applicable standards, must accept incorrect HTML. This means that many web sites publish HTML which is tolerated by browsers, but can't easily be parsed by htmllib. When confronted with serious horrors, consider downloading the Beautiful Soup module. This handles erroneous HTML more gracefully than htmllib.

xml.sax, xml.dom, xml.dom.minidom

The xml.sax and xml.dom modules provide the classes necessary to conveniently read and process XML documents. A SAX parser separates the various types of content and passes a series of events to the handler objects attached to the parser. A DOM parser decomposes the document into the Document Object Model (DOM).

The xml.dom module contains the classes which define an XML document's structure. The xml.dom.minidom module contains a parser which creates a DOM object.

Additionally, there is a Miscellaneous Module (in chapter 33) that goes along with these.

formatter

The formatter module can be used in conjunction with the HTML and XML parsers. A formatter instance depends on a writer instance that produces the final (formatted) output. It can also be used on its own to format text in different ways.

9. File Formats. These are modules for reading and writing files in a few of the amazing variety of file formats that are in common use. In addition to these common formats, modules in chapter 8, Structured Markup Processing Tools are also important.

csv

The csv module helps you parse and create Comma-Separated Value (CSV) data files. This helps you exchange data with many desktop tools that produce or consume CSV files. We'll look at this in the section called “Comma-Separated Values: The csv Module”.
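
As a quick sketch, csv.reader accepts any iterable of text lines, so a list of strings stands in for a file here; the sample data is invented:

```python
import csv

# Hypothetical CSV content; note the quoted field that contains a comma.
lines = [
    'name,qty',
    '"Smith, J.",3',
    'Jones,5',
]

# Each row comes back as a list of strings, with quoting already handled.
rows = list(csv.reader(lines))
header, records = rows[0], rows[1:]
```

Splitting each line on commas by hand would break the quoted name in two, which is the kind of detail the module takes care of.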

ConfigParser

Configuration files can take a number of forms. The simplest approach is to use a Python module as the configuration for a large, complex program. Sometimes configurations are encoded in XML. Many Windows legacy programs use .INI files. The ConfigParser module can gracefully parse these files. We'll look at this in the section called “Property Files and Configuration (or .INI) Files: The ConfigParser Module”.

10. Cryptographic Services. These modules aren't specifically encryption modules. Many popular encryption algorithms are protected by patents. Often, encryption requires compiled modules for performance reasons. These modules compute secure digests of messages using a variety of algorithms.

hashlib, hmac, md5, sha

These modules compute a secure hash or digest of a message to ensure that it was not tampered with. MD5, for example, is often used for validating that a downloaded file was received correctly and completely.
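
The sketch below shows the general pattern using hashlib. It uses SHA-256 rather than the MD5 mentioned above, simply as a choice for the example; the message is arbitrary and the code assumes current Python 3, where messages are bytes:

```python
import hashlib

message = b"The quick brown fox jumps over the lazy dog"

# hexdigest returns the digest as a string of hexadecimal digits.
digest = hashlib.sha256(message).hexdigest()

# Changing even one byte of the message produces a different digest.
tampered = hashlib.sha256(message + b".").hexdigest()

# For large files, feed the data in chunks with update().
hasher = hashlib.sha256()
hasher.update(message[:10])
hasher.update(message[10:])
chunked = hasher.hexdigest()
```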

11. File and Directory Access. We'll look at many of these modules in Chapter 33, File Handling Modules. These are the modules which are essential for handling data files.

os, os.path

The os and os.path modules are critical for creating portable Python programs. The popular operating systems (Linux, Windows and MacOS) each have different approaches to the common services provided by an operating system. A Python program can depend on os and os.path modules behaving consistently in all environments.

One of the most obvious differences among operating systems is the way that files are named. In particular, the path separator can be either the POSIX standard /, or the Windows \. Additionally, the Mac OS Classic mode can also use :. Rather than make each program aware of the operating system rules for path construction, Python provides the os.path module to make all of the common filename manipulations completely consistent.
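
For example, a path can be built and taken apart without ever writing a separator character. The directory and file names here are made up:

```python
import os.path

# join inserts whatever separator the current operating system uses.
path = os.path.join("data", "2024", "report.txt")

# split separates the final component from the directories before it.
directory, filename = os.path.split(path)

# splitext separates the extension from the rest of the name.
stem, extension = os.path.splitext(filename)
```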

fileinput

The fileinput module helps your program process a large number of files smoothly and simply.

glob, fnmatch

The glob and fnmatch modules help a Windows program handle wild-card file names in a manner consistent with other operating systems.
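
A sketch of wild-card matching against a list of names follows. The names are invented, and fnmatchcase is used so that the comparison is case-sensitive on every platform:

```python
import fnmatch

names = ["report.txt", "notes.TXT", "image.png", "data01.csv"]

# "*" matches any run of characters; "?" matches exactly one character.
text_files = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]
numbered = [n for n in names if fnmatch.fnmatchcase(n, "data??.csv")]
```

The glob module applies the same pattern rules to the names actually present in a directory.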

shutil

The shutil module provides shell-like utilities for file copy, file rename, directory moves, etc. This module lets you write short, effective Python programs that do things that are typically done by shell scripts.

Why use Python instead of the shell? Python is far easier to read, far more efficient, and far more capable of writing moderately sophisticated programs. Using Python saves you from having to write long, painful shell scripts.

12. Data Compression and Archiving. These modules handle the various file compression algorithms that are available. We'll look at these modules in Chapter 33, File Handling Modules.

tarfile, zipfile

These two modules create archive files, which contain a number of files that are bound together. The TAR format is not compressed, whereas the ZIP format is compressed. Often a TAR archive is compressed using GZIP to create a .tar.gz archive.

zlib, gzip, bz2

These modules implement different compression algorithms. They all have similar features to compress or uncompress files.

13. Data Persistence. There are several issues related to making objects persistent. In Chapter 9 of the Python Reference, there are several modules that help deal with files in various kinds of formats. We'll talk about these modules in detail in Chapter 34, File Formats: CSV, Tab, XML, Logs and Others.

There are several additional techniques for managing persistence. We can "pickle" or "shelve" an object. In this case, we don't define our file format in detail, instead we leave it to Python to persist our objects.

We can map our objects to a relational database. In this case, we'll use the SQL language to define our storage, create and retrieve our objects.

pickle, shelve

The pickle and shelve modules are used to create persistent objects: objects that persist beyond the one-time execution of a Python program. The pickle module produces a serial text representation of any object, however complex; it can also reconstitute an object from its text representation. The shelve module uses a dbm database to store and retrieve objects. The shelve module is not a complete object-oriented database, as it lacks any transaction management capabilities.
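
A minimal round trip with pickle might look like the sketch below. The object being stored is invented, and in current Python 3 the serialized form is a bytes object rather than text:

```python
import pickle

# A hypothetical nested object to make persistent.
hand = {"player": "north", "cards": [("A", "spades"), (10, "hearts")]}

# dumps serializes the object; dump would write it to an open file instead.
blob = pickle.dumps(hand)

# loads rebuilds an equal, but separate, object from the serialized form.
restored = pickle.loads(blob)
```

Only unpickle data from a source you trust, since loading a pickle can execute arbitrary code.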

sqlite3

This module provides access to the SQLite relational database. This database provides a significant subset of SQL language features, allowing us to build a relational database that's compatible with products like MySQL or Postgres.
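
The sketch below uses an in-memory database so that nothing is written to disk; the table and its contents are made up for the example:

```python
import sqlite3

# ":memory:" creates a temporary database that vanishes on close.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE module (name TEXT PRIMARY KEY, section INTEGER)")

# "?" placeholders let the driver handle quoting of the values.
conn.executemany(
    "INSERT INTO module VALUES (?, ?)",
    [("re", 4), ("struct", 4), ("datetime", 5)],
)

rows = conn.execute(
    "SELECT name FROM module WHERE section = ? ORDER BY name", (4,)
).fetchall()

conn.close()
```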

14. Generic Operating System Services. The following modules contain basic features that are common to all operating systems. Most of this commonality is achieved by using the C standard libraries. By using these modules, you can be assured that your Python application will be portable to almost any operating system.

os, os.path

These modules provide access to a number of operating system features. The os module provides control over Processes, Files and Directories. We'll look at os and os.path in the section called “The os Module” and the section called “The os.path Module”.

time

The time module provides basic functions for time and date processing. Additionally, datetime handles details of the calendar more gracefully than time does. We'll cover both modules in detail in Chapter 32, Dates and Times: the time and datetime Modules.

Having modules like datetime and time means that you never need to attempt your own calendrical calculations. One of the important lessons learned in the late 90's was that many programmers love to tackle calendrical calculations, but their efforts had to be tested and reworked because of innumerable small problems.

getopt, optparse

A well-written program makes use of the command-line interface. It is configured through options and arguments, as well as properties files. We'll cover the getopt, optparse and glob modules in Chapter 35, Programs: Standing Alone.

logging

Often, you want a simple, standardized log for errors as well as debugging information. We'll look at logging in detail in the section called “Log Files: The logging Module”.

18. Internet Protocols and Support. The following modules contain algorithms for responding to several of the most common Internet protocols. These modules greatly simplify developing applications based on these protocols.

cgi

The cgi module is used for web server applications invoked as CGI scripts. This allows you to put Python programming in the cgi-bin directory. When the web server invokes the CGI script, the Python interpreter is started and the Python script is executed.

urllib, urllib2, urlparse

These modules allow you to write relatively simple application programs which open a URL as if it were a standard Python file. The content can be read and perhaps parsed with the HTML or XML parser modules, described above. The urllib module depends on the httplib, ftplib and gopherlib modules. It will also open local files when the scheme of the URL is file:. The urlparse module includes the functions necessary to parse or assemble URLs. The urllib2 module handles more complex situations where there is authentication or cookies involved.

httplib, ftplib, gopherlib

The httplib, ftplib and gopherlib modules include relatively complete support for building client applications that use these protocols. Between the htmllib module and httplib module, a simple character-oriented web browser or web content crawler can be built.

poplib, imaplib

The poplib and imaplib modules allow you to build mail reader client applications. The poplib module is for mail clients using the Post-Office Protocol, POP3 (RFC 1725), to extract mail from a mail server. The imaplib module is for mail clients using the Internet Message Access Protocol, IMAP4 (RFC 2060), to manage mail on an IMAP server.

nntplib

The nntplib module allows you to build a network news reader. The newsgroups, like comp.lang.python, are processed by NNTP servers. You can build special-purpose news readers with this module.

SocketServer

The SocketServer module provides the relatively advanced programming required to create TCP/IP or UDP/IP server applications. This is typically the core of a stand-alone application server.

SimpleHTTPServer, CGIHTTPServer, BaseHTTPServer

The SimpleHTTPServer and CGIHTTPServer modules rely on the basic BaseHTTPServer and SocketServer modules to create a web server. The SimpleHTTPServer module provides the programming to handle basic URL requests. The CGIHTTPServer module adds the capability for running CGI scripts; it does this with the fork and exec functions of the os module, which are not necessarily supported on all platforms.

asyncore, asynchat

The asyncore (and asynchat) modules help to build a time-sharing application server. When client requests can be handled quickly by the server, complex multi-threading and multi-processing aren't really necessary. Instead, this module simply dispatches each client communication to an appropriate handler function.

22. Program Frameworks. We'll talk about a number of program-related issues in Chapter 35, Programs: Standing Alone and Chapter 36, Programs: Clients, Servers, the Internet and the World Wide Web. Much of this goes beyond the standard Python library. Within the library are two modules that can help you create large, sophisticated command-line application programs.

cmd

The cmd module contains a superclass useful for building the main command-reading loop of an interactive program. The standard features include printing a prompt, reading commands, providing help and providing a command history buffer. A subclass is expected to provide functions with names of the form do_command. When the user enters a line beginning with command, the appropriate do_command function is called.
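
The do_command naming convention can be sketched as follows. The Calculator class and its two commands are hypothetical; onecmd is used to feed single lines to the interpreter so the example runs without an interactive session:

```python
import cmd


class Calculator(cmd.Cmd):
    """A tiny interactive program that keeps a running total."""

    prompt = "calc> "

    def __init__(self):
        cmd.Cmd.__init__(self)
        self.total = 0

    def do_add(self, arg):
        """add N: add the integer N to the running total"""
        self.total += int(arg)

    def do_quit(self, arg):
        """quit: leave the command loop"""
        return True  # a true return value tells the loop to stop


calc = Calculator()

# onecmd dispatches one input line to the matching do_ method.
calc.onecmd("add 5")
calc.onecmd("add 7")
stop = calc.onecmd("quit")
```

Calling calc.cmdloop() instead would print the prompt and read commands from the user until quit is entered; the docstrings become the text shown by the built-in help command.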

shlex

The shlex module can be used to tokenize input in a simple language similar to the Linux shell languages. This module defines a basic shlex class with parsing methods that can separate words, quoted strings and comments, and return them to the requesting program.
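
The module also offers a convenience function, split, which is enough for many cases. In this sketch the command line being tokenized is invented:

```python
import shlex

line = 'copy "my file.txt" backup/ # trailing comment'

# split honors shell-style quoting; comments=True discards text after "#".
tokens = shlex.split(line, comments=True)
```

The quoted file name survives as a single token even though it contains a space.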

26. Python Runtime Services. The Python Runtime Services modules are considered to support the Python runtime environment. These can be divided into two groups: those that are an interface into the Python interpreter, and those that are generally useful for programming. The interpreter interface allows us to peer under the hood at how Python works internally. The programming category is more generally useful, and includes sys, pickle, and shelve.

sys

The sys module contains execution context information. It has the command-line arguments (in sys.argv) used to start the Python interpreter. It has the standard input, output and error file definitions. It has functions for retrieving exception information. It defines the platform, byte order, module search path and other basic facts. This is typically used by a main program to get run-time environment information.


 
 
Published under the terms of the Open Publication License.