The sed FAQ

3.7. GNU/POSIX extensions to regular expressions

GNU sed supports "character classes" in addition to regular character sets, such as [0-9A-F]. Like regular character sets, character classes represent any single character within a set.

"Character classes are a new feature introduced in the POSIX standard. A character class is a special notation for describing lists of characters that have a specific attribute, but where the actual characters themselves can vary from country to country and/or from character set to character set. For example, the notion of what is an alphabetic character differs in the USA and in France." [quoted from the docs for GNU awk v3.1.0.]

Although character classes rarely save space on the line, they help make scripts portable for international use. The equivalent character sets for U.S. users follow:

     [[:alnum:]]  - [A-Za-z0-9]     Alphanumeric characters
     [[:alpha:]]  - [A-Za-z]        Alphabetic characters
     [[:blank:]]  - [ \x09]         Space or tab characters only
     [[:cntrl:]]  - [\x00-\x1F\x7F] Control characters
     [[:digit:]]  - [0-9]           Numeric characters
     [[:graph:]]  - [!-~]           Printable and visible characters
     [[:lower:]]  - [a-z]           Lower-case alphabetic characters
     [[:print:]]  - [ -~]           Printable (non-Control) characters
     [[:punct:]]  - [!-/:-@[-`{-~]  Punctuation characters
     [[:space:]]  - [ \t\n\v\f\r]   All whitespace chars
     [[:upper:]]  - [A-Z]           Upper-case alphabetic characters
     [[:xdigit:]] - [0-9a-fA-F]     Hexadecimal digit characters
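
As a minimal sketch of how these classes might be used in a script, the commands below assume a sed with POSIX character class support (such as GNU sed) and force the C locale with LC_ALL=C so that the classes behave like the U.S. equivalents listed above. The sample strings and variable names are invented for illustration.

```shell
# Assumption: sed supports POSIX character classes; LC_ALL=C pins the locale.

# Delete every character that is not alphanumeric.
alnum_only=$(printf 'Order #42: OK!\n' | LC_ALL=C sed 's/[^[:alnum:]]//g')
echo "$alnum_only"        # Order42OK

# Replace each run of digits with a single placeholder letter.
digits_masked=$(printf 'room 101, floor 3\n' | LC_ALL=C sed 's/[[:digit:]][[:digit:]]*/N/g')
echo "$digits_masked"     # room N, floor N

# Squeeze each run of spaces and tabs into one space.
blanks_squeezed=$(printf 'a \t  b\n' | LC_ALL=C sed 's/[[:blank:]][[:blank:]]*/ /g')
echo "$blanks_squeezed"   # a b
```

Note that the class name, including its own brackets and colons, goes inside an ordinary bracket expression, which is why the patterns are written [[:digit:]] and [^[:alnum:]] rather than [:digit:].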

Note that [[:graph:]] does not match the space " ", but [[:print:]] does. Some character classes may (or may not) match characters in the range 128-255 (0x80-0xFF), often loosely called "high ASCII", depending on which C library was used to compile sed and on the locale in effect. For non-English languages, [[:alpha:]] and other classes may also match characters in that range.
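
The difference between [[:graph:]] and [[:print:]] can be seen with a short test. This is only a sketch, assuming the C locale (forced here with LC_ALL=C); the sample text and variable names are invented for illustration.

```shell
# Assumption: C locale, so only 7-bit ASCII characters belong to the classes.
sample='a b'

# [[:graph:]] matches the letters but leaves the space alone.
graph_marked=$(printf '%s\n' "$sample" | LC_ALL=C sed 's/[[:graph:]]/X/g')
echo "[$graph_marked]"    # [X X]

# [[:print:]] matches the letters and the space.
print_marked=$(printf '%s\n' "$sample" | LC_ALL=C sed 's/[[:print:]]/X/g')
echo "[$print_marked]"    # [XXX]
```

Under a different locale setting the same commands may treat characters above 0x7F differently, so pinning the locale is one way to keep a script's behavior predictable.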


   Reprinted courtesy of Eric Pement.