## 12. Matching

Matching involves use of patterns called "regular expressions". This, as you will see, leads to Perl Paradox Number Four: Regular expressions aren't. See sections 13 and 14 of the Quick Reference.

The =~ operator performs pattern matching and substitution. For example, if:

`    $s = 'One if by land and two if by sea';`
then:
```
    if ($s =~ /if by la/) {print "YES"}
    else {print "NO"}
```
prints "YES", because the string $s matches the simple constant pattern "if by la".
```
    if ($s =~ /one/) {print "YES"}
    else {print "NO"}
```
prints "NO", because the string does not match the pattern. However, by adding the "i" option to ignore case, we would get a "YES" from the following:
```
    if ($s =~ /one/i) {print "YES"}
    else {print "NO"}
```

Patterns can contain a mind-boggling variety of special directions that facilitate very general matching. See Perl Reference Guide section 13, Regular Expressions. For example, a period matches any character (except the "newline" \n character).

`    if ($x =~ /l.mp/) {print "YES"}`
would print "YES" for $x = "lamp", "lump", "slumped", but not for $x = "lmp" or "less amperes".
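
The period example can be checked with a short loop. This is only a sketch: the names `@samples` and `%result` are made up for illustration and do not come from the Quick Reference.

```perl
use strict;
use warnings;

# Sample strings taken from the paragraph above.
my @samples = ("lamp", "lump", "slumped", "lmp", "less amperes");

# Record whether each string matches /l.mp/ (hash name is illustrative).
my %result;
for my $x (@samples) {
    $result{$x} = ($x =~ /l.mp/) ? "YES" : "NO";
    print "$x: $result{$x}\n";
}
```

"lmp" fails because the period must consume exactly one character between the "l" and the "mp".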

Parentheses () group pattern elements. An asterisk * means that the preceding character, element, or group of elements may occur zero times, one time, or many times. Similarly, a plus + means that the preceding element or group of elements must occur at least once. A question mark ? matches zero or one times. So:

```
    /fr.*nd/  matches "frnd", "friend", "front and back"
    /fr.+nd/  matches "frond", "friend", "front and back"
              but not "frnd".
    /10*1/    matches "11", "101", "1001", "100000001".
    /b(an)*a/ matches "ba", "bana", "banana", "banananana"
    /flo?at/  matches "flat" and "float"
              but not "flooat"
```
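
A few rows of the table above can be verified mechanically. The following is a minimal sketch, assuming the patterns are kept as plain strings and interpolated into the match; the names `@cases` and `$failures` are hypothetical.

```perl
use strict;
use warnings;

# Each entry: [pattern, string, expected result], taken from the table above.
my @cases = (
    ['fr.*nd',  "frnd",           1],
    ['fr.+nd',  "frnd",           0],
    ['fr.+nd',  "front and back", 1],
    ['10*1',    "100000001",      1],
    ['b(an)*a', "banananana",     1],
    ['flo?at',  "float",          1],
    ['flo?at',  "flooat",         0],
);

my $failures = 0;
for my $case (@cases) {
    my ($pat, $str, $expected) = @$case;
    my $got = ($str =~ /$pat/) ? 1 : 0;   # the string in $pat is used as the pattern
    $failures++ if $got != $expected;
    printf "/%s/ on \"%s\": %s\n", $pat, $str, $got ? "match" : "no match";
}
print "$failures unexpected results\n";
```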

Square brackets [ ] match a class of single characters.

```
    [0123456789] matches any single digit
    [0-9]        matches any single digit
    [0-9]+       matches any sequence of one or more digits
    [a-z]+       matches any lowercase word
    [A-Z]+       matches any uppercase word
    [ab n]*      matches the null string "", "b",
                 any number of blanks, "nab a banana"
```

[^...] matches characters that are not "...":

`    [^0-9]       matches any non-digit character.`
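
As a rough illustration of a class and its negation, the sketch below tests whether a string contains a digit and whether it contains anything that is not a digit. The variable names are invented for this example.

```perl
use strict;
use warnings;

# Does the string contain at least one digit?
my $has_digit_1 = ("abc123" =~ /[0-9]/) ? 1 : 0;   # yes
my $has_digit_2 = ("abc"    =~ /[0-9]/) ? 1 : 0;   # no

# Does the string contain at least one character that is NOT a digit?
my $has_other_1 = ("2024" =~ /[^0-9]/) ? 1 : 0;    # no, all digits
my $has_other_2 = ("20x4" =~ /[^0-9]/) ? 1 : 0;    # yes, the "x"

print "digits: $has_digit_1 $has_digit_2, non-digits: $has_other_1 $has_other_2\n";
```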

Curly braces allow more precise specification of repeated fields. For example `[0-9]{6}` matches any sequence of 6 digits, and `[0-9]{6,10}` matches any sequence of 6 to 10 digits.
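
A small sketch of the curly-brace counts follows. The sample strings and variable names are made up. Note the last case: since the pattern is not tied to the start or end of the string, six digits found inside a longer run of digits still count as a match.

```perl
use strict;
use warnings;

my $six     = ("order 123456 shipped" =~ /[0-9]{6}/)    ? 1 : 0;  # six digits in a row
my $too_few = ("order 12345 shipped"  =~ /[0-9]{6}/)    ? 1 : 0;  # only five digits
my $range   = ("id 12345678"          =~ /[0-9]{6,10}/) ? 1 : 0;  # eight digits, within 6 to 10
my $longer  = ("1234567890123"        =~ /[0-9]{6}/)    ? 1 : 0;  # 13 digits contain 6 digits

print "six=$six too_few=$too_few range=$range longer=$longer\n";
```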

Patterns float, unless anchored. The caret ^ (outside [ ]) anchors a pattern to the beginning, and the dollar sign $ anchors a pattern at the end, so:

```
    /at/         matches "at", "attention", "flat", & "flatter"
    /^at/        matches "at" & "attention" but not "flat"
    /at$/        matches "at" & "flat", but not "attention"
    /^at$/       matches "at" and nothing else.
    /^at$/i      matches "at", "At", "aT", and "AT".
    /^[ \t]*$/   matches a "blank line", one that contains nothing
                 or any combination of blanks and tabs.
```
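
One plausible use of the blank-line pattern is counting blank lines in a list of strings. This is a sketch only; `@lines` and `$blank` are illustrative names, and the sample data is invented.

```perl
use strict;
use warnings;

# Two of these five strings are "blank": the empty one and the blanks-and-tab one.
my @lines = ("first line", "", "  \t ", "last line", " x ");

my $blank = 0;
for my $line (@lines) {
    $blank++ if $line =~ /^[ \t]*$/;   # nothing but blanks and tabs from start to end
}
print "$blank blank lines\n";

# The anchors from the table, checked against "flat".
my $starts_with_at = ("flat" =~ /^at/) ? 1 : 0;   # no: "flat" begins with "fl"
my $ends_with_at   = ("flat" =~ /at$/) ? 1 : 0;   # yes
```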

The Backslash. Other characters simply match themselves, but the characters `+?.*^$()[]{}|\` and usually `/` must be escaped with a backslash `\` to be taken literally. Thus:

```
    /10.2/       matches "10Q2", "1052", and "10.2"
    /10\.2/      matches "10.2" but not "10Q2" or "1052"
    /\*+/        matches one or more asterisks
    /A:\\DIR/    matches "A:\DIR"
    /\/usr\/bin/ matches "/usr/bin"
```
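
The escapes in the table can be exercised as follows. This is a sketch with invented names (`$dos_path`, `%check`). Keep in mind that the backslash is also special inside a double-quoted Perl string, so the test string itself needs a doubled backslash.

```perl
use strict;
use warnings;

# In a double-quoted Perl string "\\" is one backslash, so this holds A:\DIR.
my $dos_path = "A:\\DIR";

my %check = (
    dot_any     => ("10Q2" =~ /10.2/)                 ? 1 : 0,  # bare period matches "Q"
    dot_literal => ("10Q2" =~ /10\.2/)                ? 1 : 0,  # escaped period does not
    dot_exact   => ("10.2" =~ /10\.2/)                ? 1 : 0,
    stars       => ("a***b" =~ /\*+/)                 ? 1 : 0,
    dos         => ($dos_path =~ /A:\\DIR/)           ? 1 : 0,
    unix        => ("/usr/bin/perl" =~ /\/usr\/bin/)  ? 1 : 0,
);

for my $name (sort keys %check) {
    print "$name: $check{$name}\n";
}
```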
If a backslash precedes an alphanumeric character, the sequence takes a special meaning, typically a short form of a [ ] character class. For example, `\d` is, for ASCII text, the same as the `[0-9]` digits character class.
```
    /[-+]?\d*\.?\d*/      is the same as
    /[-+]?[0-9]*\.?[0-9]*/
```
Either of the above matches decimal numbers: "-150", "-4.13", "3.1415", "+0000.00", etc.
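
One caveat is worth seeing in code. Every element of the decimal pattern is optional (`?` or `*`), so, left unanchored, it succeeds on any string by matching nothing at all. The sketch below compares it with a tightened variant. The variant and the names `loose_match` and `strict_match` are assumptions made for illustration, not part of the Quick Reference, and the variant deliberately requires at least one digit before the decimal point (so ".5" is rejected).

```perl
use strict;
use warnings;

# The pattern from the text: matches numbers, but also matches the empty string.
sub loose_match  { return $_[0] =~ /[-+]?\d*\.?\d*/   ? 1 : 0 }

# Illustrative stricter variant: anchored, with at least one leading digit.
sub strict_match { return $_[0] =~ /^[-+]?\d+\.?\d*$/ ? 1 : 0 }

for my $s ("-150", "-4.13", "3.1415", "+0000.00", "abc", "") {
    printf "%-10s loose=%d strict=%d\n", "\"$s\"", loose_match($s), strict_match($s);
}
```

For "abc" and the empty string the loose pattern reports a match while the strict one does not.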

A simple `\s` specifies "white space", the same as the character class `[ \t\n\r\f]` (blank, tab, newline, carriage return, form feed). A character may be specified in hexadecimal as a `\x` followed by two hexadecimal digits; `\x1b` is the ESC character.

A vertical bar | specifies "or".

```
    if ($answer =~ /^y|^yes|^yeah/i ) {
        print "Affirmative!";
    }
```
prints "Affirmative!" for $answer equal to "y" or "yes" or "yeah" (or "Y", "YeS", or "yessireebob, that's right").
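
Because the first alternative `^y` already accepts anything that begins with "y", the pattern above also says yes to "yellow". If only the three exact answers should count, one possible approach is to group the alternatives and anchor both ends. This is a sketch; `loose_yes` and `exact_yes` are hypothetical helper names.

```perl
use strict;
use warnings;

# The pattern from the text: anything starting with "y" or "Y".
sub loose_yes { return $_[0] =~ /^y|^yes|^yeah/i  ? 1 : 0 }

# Illustrative stricter variant: exactly "y", "yes", or "yeah", ignoring case.
sub exact_yes { return $_[0] =~ /^(y|yes|yeah)$/i ? 1 : 0 }

for my $answer ("y", "YeS", "yeah", "yellow", "yessireebob, that's right", "no") {
    printf "%-28s loose=%d exact=%d\n", "\"$answer\"", loose_yes($answer), exact_yes($answer);
}
```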
