File Module Exercises

  1. Source Lines of Code. One measure of the complexity of an application is the count of lines of source code. Often, this count discards comment lines. We'll write an application that reads Python source files, discards blank lines and lines beginning with #, and produces a count of source lines.

    We'll develop a function to process a single file. We'll use the glob module to locate all of the *.py files in a given directory.

    Develop a fileLineCount( name ) function which opens the file with the given name and examines all of its lines. Each line should have strip applied to remove leading and trailing whitespace. If the resulting line has length zero, it was effectively blank and can be skipped. If the resulting line begins with #, it is entirely a comment and can be skipped. All remaining lines should be counted, and fileLineCount( name ) returns this count.
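
    Here is a minimal sketch of what fileLineCount might look like, following the description above; anything beyond blank-line and whole-line-comment handling (for example, trailing comments on code lines) is deliberately ignored.

    def fileLineCount( name ):
        """Count the non-blank, non-comment lines in the named file."""
        count = 0
        sourceFile = open( name, "r" )
        for line in sourceFile:
            clean = line.strip()
            if len( clean ) == 0:
                continue                  # effectively blank
            if clean.startswith( "#" ):
                continue                  # entirely a comment
            count += 1
        sourceFile.close()
        return count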

    Develop a directoryLineCount( path ) function which uses glob.glob to expand the path into all matching file names. Each file name is processed with fileLineCount( name ) to get the number of non-comment source lines. Write the results to a tab-delimited file; each line should have the form “ filename \t lines ”.
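
    A sketch of directoryLineCount, assuming the fileLineCount function described above, might look like the following. The exercise does not name the output file, so the reportName parameter and its default of "linecount.txt" are assumptions made here for illustration.

    import glob

    def directoryLineCount( path, reportName="linecount.txt" ):
        """Write one tab-delimited record for each file matching path."""
        reportFile = open( reportName, "w" )
        for name in glob.glob( path ):
            reportFile.write( "%s\t%d\n" % ( name, fileLineCount( name ) ) )
        reportFile.close()

    It might be invoked as, for example, directoryLineCount( "*.py" ) to report on the current directory.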

    For a sample application of these functions, look in your Python distribution for Lib/idlelib/*.py.

  2. Summarize a Tab-Delimited File. The previous exercise produced a file where each line has the form “ filename \t lines ”. Read this tab-delimited file, producing a nicer-looking report that has column titles, file and line counts, and a total line count at the end of the report.
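
    One possible sketch, assuming the tab-delimited data is in a file named "linecount.txt" (the exercises do not fix a name) and using simple column formatting:

    def summarizeReport( source ):
        """Print a titled report of tab-delimited records, with a total."""
        total = 0
        print( "%-40s %8s" % ( "File", "Lines" ) )
        for record in source:
            record = record.strip()
            if not record:
                continue                  # ignore any blank line
            name, count = record.split( "\t" )
            print( "%-40s %8d" % ( name, int( count ) ) )
            total += int( count )
        print( "%-40s %8d" % ( "Total", total ) )

    summarizeReport( open( "linecount.txt", "r" ) )

    Passing an already-open file (or any iterable of lines) rather than a file name keeps the next exercise simple: the same function can be handed sys.stdin.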

  3. File Processing Pipeline. The previous two exercises produced programs which can be part of a processing pipeline. The program from the first exercise should produce its output on sys.stdout. The program from the second exercise should gather its input from sys.stdin. Once this capability is in place, the pipeline can be invoked using a command like the following:

    $ python lineCounter.py | python lineSummary.py
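
    For the reading side, a minimal, self-contained sketch of what lineSummary.py might contain follows; it repeats the summarizing logic from the previous sketch so the file stands alone, but reads from sys.stdin instead of a named file. The change needed in lineCounter.py is symmetric: write each “ filename \t lines ” record with sys.stdout.write instead of writing it to a named report file.

    # lineSummary.py -- a sketch: summarize tab-delimited records from stdin.
    import sys

    def main():
        total = 0
        print( "%-40s %8s" % ( "File", "Lines" ) )
        for record in sys.stdin:
            record = record.strip()
            if not record:
                continue                  # ignore any blank line
            name, count = record.split( "\t" )
            print( "%-40s %8d" % ( name, int( count ) ) )
            total += int( count )
        print( "%-40s %8d" % ( "Total", total ) )

    if __name__ == "__main__":
        main()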