Mid-Level Protocols: The urllib2 Module

A central piece of the design of the World Wide Web is the concept of a Uniform Resource Locator (URL) and the more general Uniform Resource Identifier (URI). A URL provides the information needed to get at a piece of data located somewhere on the internet. Here's an example URL: http://www.python.org/download/. A request for this URL involves the following elements.

  • A protocol (http)

  • A server (www.python.org)

  • A port number (80 is implied if no other port number is given)

  • A path (/download/)

  • An operation (browsers use GET or POST; some web services also use PUT and DELETE). The operation is not written in the URL itself; it is part of the request made for it.
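The elements listed above can be pulled out of a URL programmatically. The following is a minimal sketch, not part of the original example. It uses the standard library's urlparse function; the fallback import for Python 3 is an assumption added here so the sketch runs on either version. The parser reports no port when none is written in the URL, so the default of 80 is filled in by hand.

```python
try:
    from urlparse import urlparse        # Python 2, the version this chapter uses
except ImportError:
    from urllib.parse import urlparse    # assumption: fallback for Python 3

parts = urlparse("http://www.python.org/download/")

protocol = parts.scheme      # the protocol
server = parts.hostname      # the server
port = parts.port or 80      # no port is written in the URL, so use HTTP's default
path = parts.path            # the path, including its slashes

print("%s %s %d %s" % (protocol, server, port, path))
```

The operation (GET, POST and so on) does not appear in the result, because it belongs to the request rather than to the URL.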

We have a choice of several protocols, which makes URLs convenient to use. The protocols include

  • FTP - the File Transfer Protocol. This will send a single file from an FTP server to our client. For example, ftp://aeneas.mit.edu/pub/gnu/dictionary/cide.a is the identifier for a specific file.

  • HTTP - the Hypertext Transfer Protocol. Amongst other things that HTTP can do, it can send a single file from a web server to our client. For example, http://www.crummy.com/software/BeautifulSoup/download/BeautifulSoup.py retrieves the current release of the Beautiful Soup module.

  • FILE - the local file protocol. We can use a URL beginning with file:/// to access files on our local computer.
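As a rough illustration of the FILE protocol, the sketch below reads a local file through a file:/// URL. The helper read_url and the file name sample.txt are hypothetical, introduced only for this illustration. The Python 3 fallback import is an assumption, and the URL is built by simple concatenation, which assumes a POSIX-style absolute path with no characters that need escaping.

```python
import os
import tempfile

try:
    import urllib2                       # Python 2, as used in this chapter
except ImportError:
    import urllib.request as urllib2     # assumption: fallback for Python 3

def read_url(url):
    """Return the full content behind a URL, whatever its protocol."""
    response = urllib2.urlopen(url)
    try:
        return response.read()
    finally:
        response.close()

# Create a small stand-in file so the sketch does not depend on any real path.
folder = tempfile.mkdtemp()
local_path = os.path.join(folder, "sample.txt")   # hypothetical file name
with open(local_path, "wb") as target:
    target.write(b"Aardvark\n")

# A POSIX absolute path starts with "/", which gives a URL of the form file:///...
local_url = "file://" + os.path.abspath(local_path)
content = read_url(local_url)
print(len(content))
```

The same read_url function could be handed an http:// or ftp:// URL instead; only the string changes.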

HTTP Interaction. A great deal of information on the World Wide Web is available using simple URIs. In any well-designed web site, we can simply GET the resource that the URL identifies.

A large number of transactions are available through HTTP requests. Many web pages provide HTML that will be presented to a person using a browser.

In some cases, a web page provides an HTML form to a person. The person may fill in a form and click a button. This executes an HTTP POST transaction. The urllib2 module allows us to write Python programs which, in effect, fill in the blanks on a form and submit that request to a web server.
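A minimal sketch of preparing such a form submission is shown here. The form address and field names are hypothetical placeholders, and the Python 3 fallback imports are an assumption. The request is only built, not sent, so nothing is contacted over the network.

```python
try:
    import urllib2                       # Python 2, as used in this chapter
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2     # assumption: fallback for Python 3
    from urllib.parse import urlencode

# Hypothetical form: the address and the field names are placeholders.
form_url = "http://www.example.com/feedback"
fields = [("name", "Ada"), ("comment", "Hello, web")]

# Encode the fields the way a browser encodes a submitted form.
body = urlencode(fields).encode("ascii")

# Supplying a body makes the request a POST; without one it is a GET.
post_request = urllib2.Request(form_url, body)
get_request = urllib2.Request(form_url)

print(post_request.get_method())
print(get_request.get_method())

# Sending it would look like: response = urllib2.urlopen(post_request)
```

Whether a given server accepts such a request depends on the form it actually publishes, so the field names must match those in the page's HTML.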

Example. By using URLs in our programs, we can write software that reads remote files as easily as it reads local files. We'll show a simple situation where a file of content is read by our application. In this case, we located a file that is provided by both an HTTP server and an FTP server. We can also download this file and read it from our own local computer.

As an example, we'll look at the Collaborative International Dictionary of English, CIDE. Here are three places where these files can be found, each using a different protocol. Using the urllib2 module, we can read and process this file using any of these protocols and servers.

FTP

ftp://aeneas.mit.edu/pub/gnu/dictionary/cide.a This URL describes the aeneas.mit.edu server that has the CIDE files, and will respond to the FTP protocol.

HTTP

http://ftp.gnu.org/gnu/gcide/gcide-0.46/cide.a This URL names the ftp.gnu.org server that has the CIDE files, and responds to the HTTP protocol.

FILE

file:///Users/slott/Documents/dictionary/cide.a This URL names a file on my local computer.

Example 36.4. urlreader.py

#!/usr/bin/env python
"""Get the "A" section of the GNU CIDE Collaborative International Dictionary of English
"""
import urllib2

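# Any one of these URLs provides the file; uncomment the one to use.
#baseURL= "ftp://aeneas.mit.edu/pub/gnu/dictionary/cide.a"
baseURL= "http://ftp.gnu.org/gnu/gcide/gcide-0.46/cide.a"
#baseURL= "file:///Users/slott/Documents/dictionary/cide.a"

# urlopen's second argument is request data, not a file mode, so only the URL is passed.
dictXML= urllib2.urlopen( baseURL )
# Read the whole file and report its size in bytes.
print len(dictXML.read())
dictXML.close()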

1. We import the urllib2 module.

2. We name the URLs we'll be reading. In this case, any of these URLs will provide the file.

3. When we open the URL, we can read the file.
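Opening a URL can fail, for instance when a server is unreachable or a file does not exist. The sketch below is one possible way to guard the read. The function content_length and the file names are hypothetical, and the Python 3 fallback imports are an assumption. It uses temporary local files so it can run without a network connection, and it builds file URLs by concatenation, which assumes POSIX-style paths.

```python
import os
import tempfile

try:
    import urllib2                       # Python 2, as used in this chapter
    from urllib2 import URLError
except ImportError:
    import urllib.request as urllib2     # assumption: fallback for Python 3
    from urllib.error import URLError

def content_length(url):
    """Return the number of bytes behind url, or None if it cannot be opened."""
    try:
        source = urllib2.urlopen(url)
    except URLError:
        return None
    try:
        return len(source.read())
    finally:
        source.close()                   # close even if the read fails

# Stand-in files: one that exists and one that does not.
folder = tempfile.mkdtemp()
present = os.path.join(folder, "present.txt")
with open(present, "wb") as target:
    target.write(b"abc")

found = content_length("file://" + present)
missing = content_length("file://" + os.path.join(folder, "missing.txt"))
print("%s %s" % (found, missing))
```

Returning None is just one design choice; a program that cannot continue without the file might let the exception propagate instead.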


 
 
Published under the terms of the Open Publication License