When we specified the pattern that split the song list line,
/\s*\|\s*/, we said we wanted to match a vertical bar surrounded by
an arbitrary amount of whitespace. We now know that the \s sequences
match a single whitespace character, so it seems likely that the
asterisks somehow mean ``an arbitrary amount.'' In fact, the asterisk
is one of a number of modifiers that allow you to match multiple
occurrences of a pattern.
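As a quick illustration of that split pattern, here is a minimal sketch. The contents of the line are made up for the example and are not taken from the actual song list file.

```ruby
# Hypothetical song list line; the field values are illustrative only.
line = "3:45 |  Fats Waller   | Ain't Misbehavin'"

# Split on a vertical bar surrounded by zero or more whitespace characters.
fields = line.split(/\s*\|\s*/)
p fields   # prints ["3:45", "Fats Waller", "Ain't Misbehavin'"]
```

Because \s* also matches no whitespace at all, the same pattern would split a line written as "3:45|Fats Waller" just as well.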
If r stands for the immediately preceding regular expression
within a pattern, then:
r*        matches zero or more occurrences of r.
r+        matches one or more occurrences of r.
r?        matches zero or one occurrence of r.
r{m,n}    matches at least ``m'' and at most ``n'' occurrences of r.
r{m,}     matches at least ``m'' occurrences of r.
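The following sketch tries each of these modifiers in turn. The test strings and variable names are made up for illustration. It relies on String#[] with a regular expression argument, which returns the first matching substring or nil when nothing matches.

```ruby
# All subject strings below are illustrative.
zero_or_more = "caaat"[/ca*t/]            # "caaat"
star_empty   = "ct"[/ca*t/]               # "ct": zero occurrences of "a" is fine
one_or_more  = "ct"[/ca+t/]               # nil: at least one "a" is required
optional     = "color colour"[/colou?r/]  # "color": the "u" may be absent
bounded      = "aaaaa"[/a{2,3}/]          # "aaa": at most three
at_least     = "aaaaa"[/a{2,}/]           # "aaaaa": two or more, no upper limit

p zero_or_more, star_empty, one_or_more, optional, bounded, at_least
```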
These repetition constructs have a high precedence---they bind only to
the immediately preceding regular expression in the pattern. /ab+/
matches an ``a'' followed by one or more ``b''s, not a sequence of
``ab''s. You have to be careful with the * construct too---the pattern
/a*/ will match any string; every string has zero or more ``a''s.
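A short sketch of both points follows. The strings are made up, and the parenthesized form looks slightly ahead: it assumes that parentheses group part of a pattern so that a modifier applies to the whole group.

```ruby
s = "ababab"
plus_on_b     = s[/ab+/]        # "ab": the + applies only to the "b"
plus_on_group = s[/(ab)+/]      # "ababab": the + applies to the group "ab"
run_of_b      = "abbb"[/ab+/]   # "abbb": one "a", then several "b"s

# /a*/ succeeds even when no "a" is present: it matches the empty
# string at the very start of the subject.
position = ("xyz" =~ /a*/)      # 0
matched  = "xyz"[/a*/]          # ""

p plus_on_b, plus_on_group, run_of_b, position, matched
```

If a pattern must find at least one ``a'', /a+/ is usually the safer choice.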
These patterns are called greedy, because by default they will
match as much of the string as they can. You can alter this
behavior, and have them match the minimum, by adding a question mark
suffix.
a = "The moon is made of cheese"
showRE(a, /\w+/)              »  <<The>> moon is made of cheese
showRE(a, /\s.*\s/)           »  The<< moon is made of >>cheese
showRE(a, /\s.*?\s/)          »  The<< moon >>is made of cheese
showRE(a, /[aeiou]{2,99}/)    »  The m<<oo>>n is made of cheese
showRE(a, /mo?o/)             »  The <<moo>>n is made of cheese
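To try the greedy and non-greedy forms side by side, the sketch below defines a stand-in helper. The name show_re and its body are assumptions modeled on how showRE is used in the examples; it is not necessarily how showRE itself is written.

```ruby
# Hypothetical helper: wraps the first match in << >>, or reports
# that nothing matched.
def show_re(str, re)
  m = re.match(str)
  return "no match" unless m
  "#{m.pre_match}<<#{m[0]}>>#{m.post_match}"
end

a = "The moon is made of cheese"
greedy    = show_re(a, /\s.*\s/)    # .* runs on to the last whitespace
lazy      = show_re(a, /\s.*?\s/)   # .*? stops at the first whitespace it can
lazy_plus = show_re(a, /\w+?/)      # +? still needs one character, then stops

puts greedy      # The<< moon is made of >>cheese
puts lazy        # The<< moon >>is made of cheese
puts lazy_plus   # <<T>>he moon is made of cheese
```

The question mark suffix works on the other repetition constructs as well, as the /\w+?/ line suggests: the match is as short as the rest of the pattern allows.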