Python - Comma-Separated Values: The csv Module

Comma-Separated Values: The `csv` Module
	Chapter 34. File Formats: CSV, Tab, XML, Logs and Others

Comma-Separated Values: The `csv` Module

Often, we have data that is in Comma-Separated Value (CSV) format. This used by many spreadsheets and is a widely-used standard for data files.

In the section called “Several Examples” we parsed CSV files using simple string manipulations. The csv module does a far better job at parsing and creating CSV files than the programming we showed in those examples.

About CSV Files

CSV files are text files organized around data that has rows and columns. This format is used to exchange data between spread-sheet programs or databases. A CSV file uses a number of punctuation rules to encode the data.

Each row is delimited by a line-ending sequence of characters. This is usually the ASCII sequence \r\n. Since this may not be the default way to process text files on your platform, you have to open files using the "rb" and "wb" modes.
Within a row, columns are delimited by a ,. To handle the situation where a column's data contains a ,, the column data may be quoted; surrounded by "'s. If the column contains a ", there are two common rules used. One CSV dialect uses an escape character, usually \ for \". The other dialect uses double ""'s.

In the ideal case, a CSV file will have the same number of columns in each row, and the first row will be column titles. Almost as pleasant is a file without column titles, but with a known sequence of columns. In the more complex cases, the number of columns per row varies.

The CSV Module

The CSV module provides you with readers or writers; these are objects which use an existing file object, created with the file or open function. A CSV reader will read a file, parsing the commas and quotes, delivering you the data elements of each row in a sequence or mapping. A CSV writer will create a file, adding the necessary commas and quotes to create a valid CSV file.

Module-Level Constructors. The following constructors within the csv module are used to create a reader, DictReader, writer or DictWriter.

csv. reader( csvfile ) → reader: Creates a reader object which can parse the given file, returning a sequence of values for each line of the file. This can be used as follows: rdr= csv.reader( open( "file.csv", "rb" ) ). The csvfile can be any iterable object.
csv. writer( csvfile ) → writer: Creates a writer object which can format a sequence of values and write them to a line of the file. This can be used as follows: wtr= csv.writer( open( "file.csv", "wb" ) ). The csvfile can be any object which supports write.
csv. DictReader( csvfile , 〈 fieldnames 〉) → DictReader: Creates a DictReader object which can parse the given file, returning a dictionary of values for each line of the file. The dictionary keys are typically the first line of the file. You can, optionally, provide the field names if they are not the first line of the file. The csvfile can be any iterable object.
csv. DictWriter( csvfile , fieldnames ) → DictWriter: Creates a DictWriter object which can format a dictionary of values and write them to a line of the file. You must provide a sequence of field names which is used to format each individual dictionary entry. The csvfile can be any object which supports write.

Reader Functions. The following functions within a reader (or DictReader) object will read and parse the CSV file. In these function definitions cr is a csv.reader or csv.DictReader.

cr.(next) → sequence: Reads the next line of the source file, parses it, and returns a sequence (for reader) or dictionary (for DictReader) of the individual column values.

cr. line_num → number: Returns the line number of the source file.

Writer Functions. The following functions with a writer (or DictWriter) object will format and write a CSV file. In these function definitions cw is a csv.writer or csv.DictWriter.

cw.(writerow row ): Writes the next lines of the destination file from the given sequence (for writer) or dictionary (for DictWriter).
cw. ,(writerows rowList ): Writes the next lines of the destination file with each sequence (for writer) or dictionary (for DictWriter) from the list, rowList.

Basic CSV Reading

The basic CSV reader processing treats each line of the file as data. This is typical for files which lack column titles, or files which have such a complex format that special parsing and analysis is required. In some cases, a file has a simple, regular format with a single row of column titles, which can be processed by a special reader we'll look at below.

We'll revise the readquotes.py program from the section called “Reading a File as a Sequence of Strings”. This will properly handle all of the quoting rules, eliminating a number of irritating problems with the example in the previous chapter.

Example 34.1. readquotes2.py

import csv
qFile= file( "quotes.csv", "rb" )
csvReader= csv.reader( qFile )
for q in csvReader:
    try:
        stock, price, date, time, change, opPrc, dHi, dLo, vol = q
        print stock, float(price), date, time, change, vol
    except ValueError:
        pass
qFile.close()

	We open our quotes file, `quotes.csv`, for reading, creating an object named `qFile`.
	We create a `csv` reader object which will parse this file for us, transforming each line into a sequence of individual column values.
	We use a for statement to iterate through the sequence of lines in the file.
	In the unlikely event of an invalid number for the price, we surround this with a try statement. The invalid number line will raise a `ValueError` exception, which is caught in the except clause and quietly ignored.
	Each stock quote, `q`, is a sequence of column values. We use multiple assignment to assign each field to a relevant variable. We don't need to strip whitespace, split the string, or handle quotes; the reader already did this.
	Since the price is a string, we use the `float` function to convert this string to a proper numeric value for further processing.

Consistent Columns as Dictionaries

In some cases, you have a simple, regular file with a single line of column titles. In this case, you can transform each line of the file into a dictionary. The key for each field is the column title. This can lead to programs which are more clear, and more flexible. The flexibility comes from not assuming a specific order to the columns.

We'll revise the readportfolio.py program from the section called “Reading "Records"”. This will properly handle all of the quoting rules, eliminating a number of irritating problems with the example in the previous chapter. It will make use of the column titles in the file.

Example 34.2. readportfolio2.py

import csv
quotes=open( "display.csv", "rb" )
csvReader= csv.DictReader( quotes )
invest= 0
current= 0
for data in csvReader:
    print data
    invest += float(data["Purchase Price"])*float(data["# Shares"])
    current += float(data["Price"])*float(data["# Shares"])
print invest, current, (current-invest)/invest

	We open our portfolio file, `display.csv`, for reading, creating a file object named `quotes`.
	We create a `csv` `DictReader` object from our `quotes` file. This will read the first line of the file to get the column titles; each subsequent line will be parsed and transformed into a dictionary.
	We initialize two counters, `invest` and `current` to zero. These will accumulate our initial investment and the current value of this portfolio.
	We use a for statement to iterate through the lines in `quotes` file. Each line is parsed, and the column titles are used to create a dictionary, which is assigned to `data`.
	Each stock quote, `q`, is a `string`. We use the `strip` operation to remove excess whitespace characters; the `string` which is created then performs the `split`( `','` ) operation to separate the fields into a `list`. We assign this `list` to the variable `values`.
	We perform some simple calculations on each `dict`. In this case, we convert the purchase price to a number, convert the number of shares to a number and multiply to determine how much we spent on this stock. We accumulate the sum of these products into `invest`. We also convert the current price to a number and multiply this by the number of shares to get the current value of this stock. We accumulate the sum of these products into `current`.
	When the loop has terminated, we can write out the two numbers, and compute the percent change.

Writing CSV Files

The most general case for writing CSV is shown in the following example. Assume we've got a list of objects, named someList. Further, let's assume that each object has three attributes: this, that and aKey.

import csv
myFile= open( "
result
", "wb" )
wtr= csv.writer( myFile )
for someObject in 
someList
:
    aRow= [ someData.this, someData.that, someData.aKey, ]
    wtr.writerow( aRow )
myFile.close()

In this case, we assemble the list of values that becomes a row in the CSV file.

In some cases we can provide two methods to allow our classes to participate in CSV writing. We can define a csvRow method as well as a csvHeading method. These methods will provide the necessary tuples of heading or data to be written to the CSV file.

For example, let's look at the following class definition for a small database of sailboats. This class shows how the csvRow and csvHeading methods might look.

class Boat( object ):
    csvHeading= [ "name", "rig", "sails" ]
    def __init__( aBoat, name, rig, sails ):
        self.name= name
        self.rig= rig
        self.sails= sails
    def __str__( self ):
        return "%s (%s, %r)" % ( self.name, self.rig, self.sails )
    def csvRow( self ):
        return [ self.name, self.rig, self.sails ]

Including these methods in our class definitions simplifies the loop that writes the objects to a CSV file. Instead of building each row as a list, we can do the following: wtr.writerow( someData.csvRow() ).

Here's an example that leverages each object's internal dictionary (__dict__) to dump objects to a CSV file.

db= [
    Boat( "KaDiMa", "sloop", ( "main", "jib" ) ),
    Boat( "Glinda", "sloop", ( "main", "jib", "spinnaker" ) ),
    Boat( "Eillean Glas", "sloop", ( "main", "genoa" ) ),
    ]

test= file( "boats.csv", "wb" )
wtr= csv.DictWriter( test, Boat.csvHeading )
wtr.writerow( dict( zip( Boat.csvHeading, Boat.csvHeading ) ) )
for d in db:
    wtr.writerow( d.__dict__ )
test.close()


Chapter 34. File Formats: CSV, Tab, XML, Logs and Others		Tab Files: Nothing Special