
Perl Tutorial - An Introduction to Perl - Pattern Matching and Regex
12. Matching

Matching involves use of patterns called "regular expressions". This, as you will see, leads to Perl Paradox Number Four: Regular expressions aren't. See sections 13 and 14 of the Quick Reference.

The =~ operator performs pattern matching and substitution. For example, if:

    $s = 'One if by land and two if by sea';

then:

    if ($s =~ /if by la/) {print "YES"}
    else {print "NO"}

prints "YES", because the string $s matches the simple constant pattern "if by la".

    if ($s =~ /one/) {print "YES"}
    else {print "NO"}

prints "NO", because the string does not match the pattern. However, by adding the "i" option to ignore case, we would get a "YES" from the following:

    if ($s =~ /one/i) {print "YES"}
    else {print "NO"}
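The three tests above can be run together as one small script. This is only a sketch: the helper yes_no and the use of qr// to pass a pattern as an argument are illustrative choices, not part of the tutorial.

```perl
use strict;
use warnings;

my $s = 'One if by land and two if by sea';

# Hypothetical helper: "YES" if $string matches $pattern, otherwise "NO".
sub yes_no {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? "YES" : "NO";
}

my $plain       = yes_no($s, qr/if by la/);  # constant substring pattern
my $wrong_case  = yes_no($s, qr/one/);       # matching is case-sensitive by default
my $ignore_case = yes_no($s, qr/one/i);      # the "i" option ignores case

print "$plain $wrong_case $ignore_case\n";   # prints "YES NO YES"
```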
Patterns can contain a mind-boggling variety of special directions that facilitate very general matching. See Perl Reference Guide section 13, Regular Expressions. For example, a period matches any character (except the "newline" \n character).

    if ($x =~ /l.mp/) {print "YES"}

would print "YES" for $x = "lamp", "lump", "slumped", but not for $x = "lmp" or "less amperes".
Parentheses () group pattern elements. An asterisk * means that the preceding character, element, or group of elements may occur zero times, one time, or many times. Similarly, a plus + means that the preceding element or group of elements must occur at least once. A question mark ? matches zero or one times. So:

    /fr.*nd/   matches "frnd", "friend", "front and back"
    /fr.+nd/   matches "frond", "friend", "front and back"
               but not "frnd".
    /10*1/     matches "11", "101", "1001", "100000001".
    /b(an)*a/  matches "ba", "bana", "banana", "banananana"
    /flo?at/   matches "flat" and "float"
               but not "flooat"
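One way to check claims like these is to filter a list of candidate strings through each pattern. The sketch below does that for the period and the *, + and ? examples; the helper name matching is hypothetical, and "bnn" is an extra non-matching string added for contrast.

```perl
use strict;
use warnings;

# Hypothetical helper: return the candidates that match $pattern.
sub matching {
    my ($pattern, @candidates) = @_;
    return grep { $_ =~ $pattern } @candidates;
}

my @dot   = matching(qr/l.mp/,    "lamp", "lump", "slumped", "lmp", "less amperes");
my @star  = matching(qr/fr.*nd/,  "frnd", "friend", "front and back");
my @plus  = matching(qr/fr.+nd/,  "frnd", "friend", "front and back");
my @group = matching(qr/b(an)*a/, "ba", "bana", "banana", "bnn");
my @opt   = matching(qr/flo?at/,  "flat", "float", "flooat");

print join(", ", @dot),  "\n";   # prints "lamp, lump, slumped"
print join(", ", @plus), "\n";   # prints "friend, front and back"
print join(", ", @opt),  "\n";   # prints "flat, float"
```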
Square brackets [ ] match a class of single characters.

    [0123456789]  matches any single digit
    [0-9]         matches any single digit
    [0-9]+        matches any sequence of one or more digits
    [a-z]+        matches any lowercase word
    [A-Z]+        matches any uppercase word
    [ab n]*       matches the null string "", "b",
                  any number of blanks, "nab a banana"

[^...] matches characters that are not "...": [^0-9] matches any non-digit character.

Curly braces allow more precise specification of repeated fields. For example, {n} repeats the preceding element exactly n times, {n,} at least n times, and {n,m} between n and m times.
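As a rough sketch of character classes combined with curly-brace counts, the script below tests a few strings. The sample strings and the helper name matches are made up for illustration. Each pattern is wrapped in ^ and $, which this section describes next, so that the count applies to the whole string and not just to part of it.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $text matches $pattern, 0 otherwise.
sub matches {
    my ($text, $pattern) = @_;
    return $text =~ $pattern ? 1 : 0;
}

print matches("90210", qr/^[0-9]{5}$/),   "\n";  # exactly five digits: prints 1
print matches("9021",  qr/^[0-9]{5}$/),   "\n";  # only four digits: prints 0
print matches("perl",  qr/^[a-z]{2,4}$/), "\n";  # two to four lowercase letters: prints 1
print matches("perls", qr/^[a-z]{2,4}$/), "\n";  # five letters: prints 0
print matches("abc!",  qr/^[^0-9]{3,}$/), "\n";  # three or more non-digits: prints 1
print matches("ab1",   qr/^[^0-9]{3,}$/), "\n";  # contains a digit: prints 0
```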
Patterns float, unless anchored. The caret ^ (outside [ ]) anchors a pattern to the beginning, and dollar-sign $ anchors a pattern at the end, so:

    /at/         matches "at", "attention", "flat", & "flatter"
    /^at/        matches "at" & "attention" but not "flat"
    /at$/        matches "at" & "flat", but not "attention"
    /^at$/       matches "at" and nothing else.
    /^at$/i      matches "at", "At", "aT", and "AT".
    /^[ \t]*$/   matches a "blank line", one that contains nothing
                 or any combination of blanks and tabs.
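A common use of anchors is picking lines out of a list. The following sketch applies three of the patterns above to a made-up array of sample lines; the array contents are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample input: a few lines of text, three of them blank.
my @lines = ("at", "", "flat", " \t ", "attention", "\t");

# Lines that are empty or contain only blanks and tabs.
my @blank = grep { /^[ \t]*$/ } @lines;

# Lines that start with "at" versus lines that end with "at".
my @starts = grep { /^at/ } @lines;
my @ends   = grep { /at$/ } @lines;

print scalar(@blank), " blank lines\n";   # prints "3 blank lines"
print "starts: @starts\n";                # prints "starts: at attention"
print "ends: @ends\n";                    # prints "ends: at flat"
```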
The Backslash. Other characters simply match themselves, but characters that have a special meaning in patterns, such as . * + ? ^ $ ( ) [ ] { } | \ and the / delimiter, must be preceded by a backslash to be taken literally:

    /10.2/        matches "10Q2", "1052", and "10.2"
    /10\.2/       matches "10.2" but not "10Q2" or "1052"
    /\*+/         matches one or more asterisks
    /A:\\DIR/     matches "A:\DIR"
    /\/usr\/bin/  matches "/usr/bin"

If a backslash precedes an alphanumeric character, this sequence takes a special meaning, typically a short form of a [ ] character class. For example, \d is the same as the [0-9] digits character class.

    /[-+]?\d*\.?\d*/ is the same as
    /[-+]?[0-9]*\.?[0-9]*/

Either of the above matches decimal numbers: "-150", "-4.13", "3.1415", "+0000.00", etc.
A vertical bar | specifies "or".

    if ($answer =~ /^y|^yes|^yeah/i ) {
        print "Affirmative!";
    }

prints "Affirmative!" for $answer equal to "y" or "yes" or "yeah" (or "Y", "YeS", or "yessireebob, that's right").
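The backslash and vertical-bar rules can be combined with anchors and grouping when a stricter test is wanted. Every part of /[-+]?\d*\.?\d*/ is optional and the pattern is unanchored, so it also succeeds on an empty string. Likewise, /^y|^yes|^yeah/i accepts anything that begins with "y". The sketch below shows one possible tightening. Both helper names are hypothetical, and the exact rules (at least one digit required, whole string must match) are assumptions, not something the tutorial prescribes.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $text is a signed decimal number and nothing else.
# Requires at least one digit, either before or after an optional escaped dot.
sub looks_like_decimal {
    my ($text) = @_;
    return $text =~ /^[-+]?(\d+\.?\d*|\.\d+)$/ ? 1 : 0;
}

# Hypothetical helper: 1 only for exactly "y", "yes", or "yeah" in any case.
# The parentheses group the alternatives so both anchors apply to each one.
sub is_affirmative {
    my ($answer) = @_;
    return $answer =~ /^(y|yes|yeah)$/i ? 1 : 0;
}

print looks_like_decimal("-4.13"), "\n";                     # prints 1
print looks_like_decimal("10Q2"),  "\n";                     # prints 0
print looks_like_decimal(""),      "\n";                     # prints 0
print is_affirmative("YeS"),       "\n";                     # prints 1
print is_affirmative("yessireebob, that's right"), "\n";     # prints 0
```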