Thinking in Java

Creating regular expressions

You can begin learning regular expressions with a useful subset of the possible constructs. A complete list of constructs for building regular expressions can be found in the javadocs for the Pattern class in the java.util.regex package.

Characters

B         The specific character B
\xhh      Character with hex value 0xhh
\uhhhh    The Unicode character with hex representation 0xhhhh
\t        Tab
\n        Newline
\r        Carriage return
\f        Form feed
\e        Escape
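As a rough illustration of the character constructs above, the following sketch checks a few of them with Pattern.matches( ). The class name CharacterEscapes and the helper matches( ) are made up for this example and are not part of the book's code.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name is an assumption for this sketch.
class CharacterEscapes {
    // Returns true when the entire input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String escapeChar = String.valueOf((char) 0x1B);

        // A plain character matches itself.
        System.out.println(matches("B", "B"));
        // Hex form: 0x42 is the code for 'B'.
        System.out.println(matches("\\x42", "B"));
        // Four-digit Unicode form of the same character.
        System.out.println(matches("\\u0042", "B"));
        // Regex escapes for tab and newline, tested against real control characters.
        System.out.println(matches("\\t\\n", "\t\n"));
        // The regex escape for the ESC control character.
        System.out.println(matches("\\e", escapeChar));
    }
}
```

Each call prints true. The doubled backslashes are required because the patterns are written as Java string literals.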

The power of regular expressions begins to appear when defining character classes. Here are some typical ways to create character classes, and some predefined classes:

Character Classes

.               Any character (line terminators are excluded unless the DOTALL flag is set)
[abc]           Any of the characters a, b, or c (same as a|b|c)
[^abc]          Any character except a, b, and c (negation)
[a-zA-Z]        Any character a through z or A through Z (range)
[abc[hij]]      Any of a, b, c, h, i, j (same as a|b|c|h|i|j) (union)
[a-z&&[hij]]    Either h, i, or j (intersection)
\s              A whitespace character (space, tab, newline, vertical tab, form feed, carriage return)
\S              A non-whitespace character ([^\s])
\d              A numeric digit [0-9]
\D              A non-digit [^0-9]
\w              A word character [a-zA-Z_0-9]
\W              A non-word character [^\w]
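The character classes above can be tried out with a small sketch like the following. The class name CharacterClasses and its two helper methods are assumptions made for this example only.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions for this sketch.
class CharacterClasses {
    // Returns true when the entire input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Same check, but with the DOTALL flag so '.' also matches line terminators.
    static boolean matchesDotAll(String regex, String input) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("[^abc]", "d"));            // true: negation
        System.out.println(matches("[abc[hij]]", "h"));        // true: union
        System.out.println(matches("[a-z&&[hij]]", "h"));      // true: intersection
        System.out.println(matches("[a-z&&[hij]]", "a"));      // false: 'a' is not in [hij]
        System.out.println(matches("\\d\\d", "42"));           // true: two digits
        System.out.println(matches("\\w+", "snake_case9"));    // true: word characters only
        System.out.println(matches("\\S+", "no spaces"));      // false: contains a space
        System.out.println(matches(".", "\n"));                // false: default '.' skips newline
        System.out.println(matchesDotAll(".", "\n"));          // true: DOTALL is set
    }
}
```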

If you have any experience with regular expressions in other languages, you’ll immediately notice a difference in the way backslashes are handled. In other languages, “\\” means “I want to insert a plain old (literal) backslash in the regular expression. Don’t give it any special meaning.” In Java, “\\” means “I’m inserting a regular expression backslash, so the following character has special meaning.” This is because the regular expression is written as a Java string literal, and the compiler reduces “\\” to a single backslash before the regular expression engine sees it. For example, if you want to indicate one or more word characters, your regular expression string will be “\\w+”. If you want to insert a literal backslash, you say “\\\\”. However, things like newlines and tabs just use a single backslash: “\n\t”.
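The backslash rules can be seen in a minimal sketch such as the one below. The class name BackslashDemo and its constant names are invented for illustration.

```java
// Hypothetical demo class; names are assumptions for this sketch.
class BackslashDemo {
    // Source text "\\w+" is the three-character string \w+ at run time.
    static final String WORD_CHARS = "\\w+";
    // Source text "\\\\" is the two-character regex \\ which matches one backslash.
    static final String LITERAL_BACKSLASH = "\\\\";
    // Ordinary Java escapes: the string holds a real newline and a real tab.
    static final String NEWLINE_TAB = "\n\t";

    public static void main(String[] args) {
        System.out.println(WORD_CHARS.length());                      // 3
        System.out.println("hello_42".matches(WORD_CHARS));           // true
        System.out.println(LITERAL_BACKSLASH.length());               // 2
        System.out.println("\\".matches(LITERAL_BACKSLASH));          // true: input is one backslash
        System.out.println("a\n\tb".matches("a" + NEWLINE_TAB + "b")); // true
    }
}
```

Printing the length of each string is a quick way to confirm what the regular expression engine actually receives after the compiler has processed the literal.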

What’s shown here is only a sampling; you’ll want to have the java.util.regex.Pattern JDK documentation page bookmarked or on your “Start” menu so you can easily access all the possible regular expression patterns.

Logical Operators

XY        X followed by Y
X|Y       X or Y
(X)       A capturing group. You can refer to the ith captured group later in the expression with \i
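A short sketch of these operators follows. It uses alternation to choose between two words and a back reference (\1) to find a doubled letter. The class name LogicalOperators and the method firstDoubledLetter( ) are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions for this sketch.
class LogicalOperators {
    // Finds the first letter that appears twice in a row, or returns null.
    // (\w) captures one word character; \1 requires the same character again.
    static String firstDoubledLetter(String text) {
        Matcher m = Pattern.compile("(\\w)\\1").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println("ab".matches("ab"));            // true: a followed by b
        System.out.println("dog".matches("cat|dog"));      // true: either alternative
        System.out.println("bird".matches("cat|dog"));     // false: neither alternative
        System.out.println(firstDoubledLetter("bookkeeper")); // o
        System.out.println(firstDoubledLetter("regex"));      // null
    }
}
```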


Boundary Matchers

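^         Beginning of a line (beginning of the whole input unless the MULTILINE flag is set)
$         End of a line (end of the whole input unless the MULTILINE flag is set)
\b        Word boundary
\B        Non-word boundary
\G        End of the previous match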
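The boundary matchers can be explored by counting how many times a pattern is found in some text. The sketch below is one way to do that; the class name BoundaryMatchers and the helper count( ) are assumptions for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions for this sketch.
class BoundaryMatchers {
    // Counts the non-overlapping matches of regex in text, using the given flags.
    static int count(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // \b: only the standalone words match, not the "cat" inside "concat".
        System.out.println(count("\\bcat\\b", 0, "cat concat cat"));   // 2
        // \B: only the "cat" inside "concat" matches.
        System.out.println(count("\\Bcat", 0, "cat concat"));          // 1
        // ^ matches only at the start of the input by default.
        System.out.println(count("^\\w+", 0, "one\ntwo\nthree"));      // 1
        // With MULTILINE, ^ matches at the start of every line.
        System.out.println(count("^\\w+", Pattern.MULTILINE, "one\ntwo\nthree")); // 3
        // \G: each match must start where the previous one ended.
        System.out.println(count("\\G\\d", 0, "123a45"));              // 3
    }
}
```

In the last call the digits 4 and 5 are never found, because the letter a breaks the chain of adjacent matches that \G requires.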

As an example, each of the following is a valid regular expression, and all will successfully match the character sequence "Rudolph":

Rudolph
[rR]udolph
[rR][aeiou][a-z]ol.*
R.*
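One way to confirm this is to test each expression with String.matches( ), as in the following sketch. The class name Rudolph and the method allMatch( ) are chosen for this example only.

```java
// Hypothetical demo class; names are assumptions for this sketch.
class Rudolph {
    // The four expressions listed above, written as Java string literals.
    static final String[] PATTERNS = {
        "Rudolph",
        "[rR]udolph",
        "[rR][aeiou][a-z]ol.*",
        "R.*"
    };

    // Returns true only if every pattern matches the whole input.
    static boolean allMatch(String input) {
        for (String pattern : PATTERNS) {
            if (!input.matches(pattern)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String pattern : PATTERNS) {
            System.out.println(pattern + " : " + "Rudolph".matches(pattern));
        }
    }
}
```

Each line of output ends in true. A lowercase "rudolph" would fail the first and last expressions, since neither allows a lowercase r.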


Reproduced courtesy of Bruce Eckel, MindView, Inc.