Several Examples

We'll look at four examples of file processing. In all cases, we'll read simple text files. We'll show some traditional kinds of file processing programs and how those can be implemented using Python.

Reading a Text File

The following program will examine a standard unix password file. We'll read the file one line at a time with a for statement to show the processing in detail. We'll use the split method of the input string as an example of parsing a line of input.

Example 19.1. readpswd.py

pswd = file( "/etc/passwd", "r" )
for aLine in pswd:
    fields= aLine.split( ":" )
    print fields[0], fields[1]
pswd.close()
1

This program creates a file object, pswd, that represents the /etc/passwd file, opened for reading.

2

A file is a sequence of lines. We can use a file in the for statement, and the file object will return each individual line in response to the next method.

3

The input string is split into individual fields using ":" boundaries. Two particular fields are printed. Field 0 is the username and field 1 is the password.

4

Closing the file releases any resources used by the file processing.

For non-unix users, a password file looks like the following:

root:q.mJzTnu8icF.:0:10:God:/:/bin/csh
fred:6k/7KCFRPNVXg:508:10:% Fredericks:/usr2/fred:/bin/csh
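
For readers on Python 3, where the file function and the print statement no longer exist, the same processing might be sketched as follows. This is only an illustration: it reads the two sample lines above from an in-memory string instead of /etc/passwd, and the names SAMPLE and read_users are invented for this example.

```python
import io

# Sample data copied from the text above; a real program would open /etc/passwd.
SAMPLE = """\
root:q.mJzTnu8icF.:0:10:God:/:/bin/csh
fred:6k/7KCFRPNVXg:508:10:% Fredericks:/usr2/fred:/bin/csh
"""

def read_users(lines):
    """Return (username, password) pairs from passwd-format lines."""
    users = []
    for line in lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        users.append((fields[0], fields[1]))
    return users

# The with statement closes the file-like object even if an error occurs.
with io.StringIO(SAMPLE) as pswd:
    users = read_users(pswd)

for name, password in users:
    print(name, password)
```

To read a real file, replace io.StringIO(SAMPLE) with open("/etc/passwd", "r"); the with statement then takes the place of the explicit close call.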

Reading a File as a Sequence of Strings

This program shows us that a file is a sequence of individual lines. Because it is an iterable object, the for statement will provide the individual lines.

The input file uses the CSV (Comma-Separated Values) format, which we will parse by hand. The csv module does a far better job than this little program. We'll look at that module in the section called “Comma-Separated Values: The csv Module”.

A popular stock quoting service on the Internet will provide CSV files with current stock quotes. The files have comma-separated values in the following format:

stock, lastPrice, date, time, change, openPrice, daysHi, daysLo, volume

The stock, date and time are typically quoted strings. The other fields are numbers, typically in dollars or percents with two digits of precision. We can use the Python eval function on each column to gracefully evaluate each value, which will eliminate the quotes, and transform a string of digits into a floating-point price value. We'll look at dates in Chapter 32, Dates and Times: the time and datetime Modules.

This is an example of the file:

"^DJI",10623.64,"6/15/2001","4:09PM",-66.49,10680.81,10716.30,10566.55,N/A
"AAPL",20.44,"6/15/2001","4:01PM",+0.56,20.10,20.75,19.35,8122800
"CAPBX",10.81,"6/15/2001","5:57PM",+0.01,N/A,N/A,N/A,N/A

The first line shows a quote for an index: the Dow-Jones Industrial average. The trading volume doesn't apply to an index, so it is "N/A". The second line shows a regular stock (Apple Computer) that traded 8,122,800 shares on June 15, 2001. The third line shows a mutual fund. The detailed opening price, day's high, day's low and volume are not reported for mutual funds.
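
The three kinds of field described above (quoted strings, signed numbers and N/A) can also be converted without eval, which fails on N/A and will execute any expression it finds in the file. The following Python 3 sketch is one possible approach, not part of the original program; the helper name convert and the choice of None to represent N/A are assumptions made for this illustration.

```python
import ast

def convert(field):
    """Convert one raw CSV field to a str, a float, or None for N/A."""
    field = field.strip()
    if field == "N/A":
        return None
    if field.startswith('"'):
        # literal_eval accepts only Python literals, so it cannot run arbitrary code.
        return ast.literal_eval(field)
    return float(field)

row = '"AAPL",20.44,"6/15/2001","4:01PM",+0.56,20.10,20.75,19.35,8122800'
values = [convert(f) for f in row.split(",")]
print(values)
```

Splitting on "," is still naive: a quoted field that itself contains a comma would be cut in two, which is one of the cases the csv module handles.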

After looking at the results online, we clicked on the link to save the results as a CSV file. We called it quotes.csv. The following program will open and read the quotes.csv file after we download it from this service.

Example 19.2. readquotes.py

qFile= file( "quotes.csv", "r" )
for q in qFile:
    try:
        stock, price, date, time, change, opPrc, dHi, dLo, vol\
        = q.strip().split( "," )
        print eval(stock), float(price), date, time, change, vol
    except ValueError:
        pass
qFile.close()
1

We open our quotes file, quotes.csv, for reading, creating an object named qFile.

2

We use a for statement to iterate through the sequence of lines in the file.

3

The quotes file typically has an empty line at the end, which splits into a single empty field instead of the nine we expect, so we surround the multiple assignment with a try statement. Unpacking the wrong number of fields will raise a ValueError exception, which is caught in the except clause and ignored.

4

Each stock quote, q, is a string. By using the strip operation of the string, we create a new string with excess whitespace characters removed. The string which is created then performs the split ( ',' ) operation to separate the fields into a list. We use multiple assignment to assign each field to a relevant variable. Note that we split each line into nine fields, leading to a long statement. We put a \ to break the statement into two lines.

5

The name of the stock is a string which includes quotes. In order to gracefully remove the quotes, we use the eval function. The price is a string. We use the float function to convert this string to a proper numeric value for further processing.
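
As noted earlier, the csv module does a better job of this parsing. A minimal Python 3 sketch of the same loop using csv.reader might look like the following; it reads the three sample lines from an in-memory string, not from quotes.csv, and the names SAMPLE and quotes are assumptions for this illustration.

```python
import csv
import io

# The three sample lines from the text, plus a trailing blank line.
SAMPLE = '''"^DJI",10623.64,"6/15/2001","4:09PM",-66.49,10680.81,10716.30,10566.55,N/A
"AAPL",20.44,"6/15/2001","4:01PM",+0.56,20.10,20.75,19.35,8122800
"CAPBX",10.81,"6/15/2001","5:57PM",+0.01,N/A,N/A,N/A,N/A

'''

quotes = []
with io.StringIO(SAMPLE) as q_file:
    for row in csv.reader(q_file):
        if len(row) != 9:
            continue  # csv.reader returns an empty list for a blank line
        stock, price, date, time, change, op_prc, d_hi, d_lo, vol = row
        # The reader has already removed the quotes, so no eval is needed.
        quotes.append((stock, float(price), date, time, change, vol))

for quote in quotes:
    print(*quote)
```

Checking the number of fields before unpacking replaces the try statement, and the volume is left as a string because it may be N/A.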

Read, Sort and Write

For COBOL expatriates, here's an example that shows a short way to read a file into an in-memory sequence, sort that sequence and print the results. This is a very common COBOL design pattern, and it tends to be rather long and complex in COBOL.

This example looks forward to some slightly more advanced techniques like list sorting. We'll delve into sorting in Chapter 20, Advanced Sequences.

Example 19.3. sortquotes.py

data= []
qFile= file( "quotes.csv", "r" )
for q in qFile:
    fields= tuple( q.strip().split( "," ) )
    if len(fields) == 9: data.append( fields )
qFile.close()
def priceVolume(a,b):
    return cmp(a[1],b[1]) or cmp(a[8],b[8])
data.sort( priceVolume )
for stock, price, date, time, change, opPrc, dHi,  dLo, vol in data:
    print stock, price, date, time, change, vol
1

We create an empty sequence, data, to which we will append tuples created from splitting each line into fields.

2

We create a file object that will read all the lines of our CSV-format file.

3

This for loop will set q to each line in the file.

4

The variable fields is created by stripping whitespace from the line, q, breaking it up on the "," boundaries into separate fields, and making the resulting sequence of field values into a tuple.

If the line has the expected nine fields, the tuple of fields is appended to the data sequence. Lines with the wrong number of fields are typically the blank lines at the beginning or end of the file.

5

To prepare for the sort, we define a comparison function. This will compare fields 1 and 8, price and volume. This relies on the behavior of the or operator: if the comparison of field 1 is equal, the value of cmp will be 0, which is equivalent to False; so field 8 must be compared.

6

We can then sort the data sequence. The sort method of the list will use our priceVolume function to compare records. This kind of sort is covered in depth in the section called “Advanced List Sorting”.

7

Once the sequence of data elements is sorted, we can then print a report showing our stocks ranked by price, and for stocks of the same price, ranked by volume. We could expand on this by using the % operator to provide a nicer-looking report format.
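
Two points about this sort are worth illustrating. First, the fields are still strings, so comparing them directly orders them character by character: "10623.64" sorts before "20.44". Second, Python 3 removed both cmp and the comparison-function argument to sort. The sketch below, written for Python 3, sorts on a key function that converts price and volume to numbers. The helper names to_number and price_volume, and the decision to treat N/A as 0.0, are assumptions for this illustration.

```python
def to_number(text):
    """Convert a field to float; N/A becomes 0.0 so it sorts first (an arbitrary choice)."""
    return 0.0 if text == "N/A" else float(text)

def price_volume(fields):
    """Sort key: price first, then volume for records with equal prices."""
    return (to_number(fields[1]), to_number(fields[8]))

# The sample lines from the text, with a blank line like those a real file may contain.
lines = [
    '"^DJI",10623.64,"6/15/2001","4:09PM",-66.49,10680.81,10716.30,10566.55,N/A\n',
    '"AAPL",20.44,"6/15/2001","4:01PM",+0.56,20.10,20.75,19.35,8122800\n',
    '"CAPBX",10.81,"6/15/2001","5:57PM",+0.01,N/A,N/A,N/A,N/A\n',
    '\n',
]

data = [tuple(line.strip().split(",")) for line in lines]
data = [fields for fields in data if len(fields) == 9]  # drop blank lines
data.sort(key=price_volume)

for stock, price, date, time, change, op_prc, d_hi, d_lo, vol in data:
    print(stock, price, date, time, change, vol)
```

Tuples compare element by element, so the (price, volume) key gives the same two-level ordering that the or operator provides in the comparison function.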

Reading "Records"

In languages like C or COBOL, a "record" or "struct" describes the contents of a file. The advantage of a record is that the fields have names instead of numeric positions. In Python, we can achieve the same level of clarity using a dict for each line in the file.

For this, we'll download files from a web-based portfolio manager. This portfolio manager gives us stock information in a file called display.csv. Here is an example.

+/-,Ticker,Price,Price Change,Current Value,Links,# Shares,P/E,Purchase Price,
-0.0400,CAT,54.15,-0.04,2707.50,CAT,50,19,43.50,
-0.4700,DD,45.76,-0.47,2288.00,DD,50,23,42.80,
0.3000,EK,46.74,0.30,2337.00,EK,50,11,42.10,
-0.8600,GM,59.35,-0.86,2967.50,GM,50,16,53.90,

This file contains a header line that names the data columns, making processing considerably more reliable. We can use the column titles to create a dict for each line of data. By using each data line along with the column titles, we can make our program quite a bit more flexible. This shows a way of handling this kind of well-structured information.

Example 19.4. readportfolio.py

quotes=open( "display.csv", "rU" )
titles= quotes.next().strip().split( ',' )
invest= 0
current= 0
for q in quotes:
    values= q.strip().split( ',' )
    data= dict( zip(titles,values) )
    print data
    invest += float(data["Purchase Price"])*float(data["# Shares"])
    current += float(data["Price"])*float(data["# Shares"])
print invest, current, (current-invest)/invest
1

We open our portfolio file, display.csv, for reading, creating a file object named quotes.

2

The first line of input, quotes.next(), is the set of column titles. We strip any extraneous whitespace characters from this line, creating a new string. We perform a split ( ',' ) to create a list of individual column title strings. This list is saved in the variable titles.

3

We also initialize two counters, invest and current to zero. These will accumulate our initial investment and the current value of this portfolio.

4

We use a for statement to iterate through the remaining lines in the quotes file. Each line is assigned to q.

5

Each stock quote, q, is a string. We use the strip operation to remove excess whitespace characters; the string which is created then performs the split ( ',' ) operation to separate the fields into a list. We assign this list to the variable values.

6

We create a dict, data; the column titles in the titles list are the keys. The data fields from the current record, in values, are used to fill this dict. The built-in zip function is designed for precisely this situation. This function interleaves values from each list to create a new list of tuples. In this case, we will get a sequence of tuples; each tuple will be a value from titles and the corresponding value from values. This list of 2-tuples creates the dict.

Now, we have access to each piece of data using its proper column title. The number of shares is in the column titled "# Shares". We can find this information in data["# Shares"].

7

We perform some simple calculations on each dict. In this case, we convert the purchase price to a number, convert the number of shares to a number and multiply to determine how much we spent on this stock. We accumulate the sum of these products into invest.

We also convert the current price to a number and multiply this by the number of shares to get the current value of this stock. We accumulate the sum of these products into current.

8

When the loop has terminated, we can write out the two totals and compute the change as a fraction of the initial investment. Multiplying this fraction by 100 gives the percent change.
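
The steps above can also be sketched in Python 3, where the built-in next function replaces the quotes.next() call and print is a function. This version reads the sample data shown earlier from an in-memory string instead of display.csv; the name SAMPLE is an assumption for this illustration.

```python
import io

# Sample portfolio data copied from the text above.
SAMPLE = """\
+/-,Ticker,Price,Price Change,Current Value,Links,# Shares,P/E,Purchase Price,
-0.0400,CAT,54.15,-0.04,2707.50,CAT,50,19,43.50,
-0.4700,DD,45.76,-0.47,2288.00,DD,50,23,42.80,
0.3000,EK,46.74,0.30,2337.00,EK,50,11,42.10,
-0.8600,GM,59.35,-0.86,2967.50,GM,50,16,53.90,
"""

invest = 0.0
current = 0.0
with io.StringIO(SAMPLE) as quotes:
    # The first line holds the column titles; next() consumes it.
    titles = next(quotes).strip().split(",")
    for q in quotes:
        # Pair each title with the matching value on this line.
        data = dict(zip(titles, q.strip().split(",")))
        shares = float(data["# Shares"])
        invest += float(data["Purchase Price"]) * shares
        current += float(data["Price"]) * shares

print(invest, current, (current - invest) / invest)
```

Every line in this file ends with a comma, so both titles and the values list carry a trailing empty string. The dict therefore has one extra entry with an empty key, which does no harm here.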


 
 
Published under the terms of the Open Publication License