Follow Techotopia on Twitter

On-line Guides
All Guides
eBook Store
iOS / Android
Linux for Beginners
Office Productivity
Linux Installation
Linux Security
Linux Utilities
Linux Virtualization
Linux Kernel
System/Network Admin
Programming
Scripting Languages
Development Tools
Web Development
GUI Toolkits/Desktop
Databases
Mail Systems
openSolaris
Eclipse Documentation
Techotopia.com
Virtuatopia.com
Answertopia.com

How To Guides
Virtualization
General System Admin
Linux Security
Linux Filesystems
Web Servers
Graphics & Desktop
PC Hardware
Windows
Problem Solutions
Privacy Policy

  




 

 

Chapter 31. Complex Strings: the re Module

There are a number of related problems when processing strings. When we get strings as input from files, we need to recognize the input as meaningful. Once we're sure it's in the right form, we need to parse the inputs, sometimes we'll ahve to convert some parts into numbers (or other objects) for further use.

For example, a file may contain lines which are supposed to be like "Birth Date: 3/8/85". We may need to determine if a given string has the right form. Then, we may need to break the string into individual elements for date processing.

We can accomplish these recognition, parsing and conversion operations with the re module in Python. A regular expression (RE) is a rule or pattern used for matching strings. It differs from the fairly simple “wild-card” rules used by many operating systems for naming files with a pattern. These simple operating system file-name matching rules are embodied in two simpler packages: fnmatch and glob.

We'll look at the semantics of a regular expression in the section called “Semantics”. We'll look at the syntax for defining a RE in the section called “Creating a Regular Expression”. In the section called “Using a Regular Expression” we'll put the regular expression to use.

Semantics

One way to look at regular expressions is as a production rule for constructing strings. In principle, such a rule could describe an infinite number of strings. The real purpose is not to enumerate all of the strings described by the production rule, but to match a candidate string against the production rule to see if the rule could have constructed the given string.

For example, a rule could be "aba". All strings of the form "aba" would match this simple rule. This rule produces only a single string. Determining a match between a given string and the one string produced by this rule is pretty simple.

A more complex rule could be "ab*a". The b* means zero or more copies of b. This rule produces an infinite set of strings including "aa", "aba", "abba", etc. It's a little more complex to see if a given string could have been produced by this rule.

The Python re module includes Python constructs for creating regular expressions (REs), matching candidate strings against RE's, and examining the details of the substrings that match. There is a lot of power and subtlety to this package. A complete treatment is beyond the scope of this book.


 
 
  Published under the terms of the Open Publication License Design by Interspire