The sed FAQ

6.7.4. Word boundaries

GNU sed, ssed, sed16, sed15 and sedmod use special symbols to mark the boundary between a "word character" and a nonword character. A word character fits the regex "[A-Za-z0-9_]". Note that a word character includes the underscore "_" but not the hyphen. This is probably because the underscore is permitted in sed labels and in identifiers in other scripting languages. (In gsed103, a word character did NOT include the underscore; it included alphanumerics only.)
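The following small Python sketch makes the definition concrete. Python is used only for illustration, since the FAQ itself is about sed, and the helper name `words` is hypothetical. The sketch splits a string into runs of [A-Za-z0-9_]. As a result, the underscore stays inside a word and the hyphen separates words.

```python
import re

# Word characters as defined above: ASCII letters, digits and the underscore.
WORD_CHAR = r"[A-Za-z0-9_]"

def words(text: str) -> list[str]:
    """Return the maximal runs of word characters in text (hypothetical helper)."""
    return re.findall(WORD_CHAR + "+", text)

sample = "foo_bar baz-qux x2!"
print(words(sample))  # ['foo_bar', 'baz', 'qux', 'x2']

# With re.ASCII, Python's \w matches exactly the same character set.
assert words(sample) == re.findall(r"\w+", sample, re.ASCII)
```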

These symbols include '\<' and '\>' (gsed, ssed, sed15, sed16, sedmod) and '\b' and '\B' (gsed only). The boundary symbols do not represent a character; they represent a position on the line. Word boundaries are used with literal characters or character sets to let you match (and delete or alter) whole words without affecting the spaces or punctuation marks outside those words. They can only be used in a "/pattern/" address or in the LHS of a 's/LHS/RHS/' command. The following table shows how these symbols may be used in HHsed (sed15 and sed16) and GNU sed. Sedmod follows the syntax of HHsed.

      Match position     Possible word boundaries        HHsed   GNU sed
      start of word      [nonword char]^[word char]      \<      \< or \b
      end of word        [word char]^[nonword char]      \>      \> or \b
      middle of word     [word char]^[word char]         none    \B
      outside of word    [nonword char]^[nonword char]   none    \B

      (In the second column, "^" marks the zero-width position being matched.)
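A rough Python sketch can check the table. Python's `re` module has `\b` and `\B`, and with the `re.ASCII` flag it uses the same [A-Za-z0-9_] word class. It has no `\<` or `\>`, so the sketch computes the "start" and "end" rows by hand. The helpers `is_word_char`, `classify` and `positions` are hypothetical names. Treating the start and end of the line as nonword context is an assumption of this sketch.

```python
import re

def is_word_char(ch: str) -> bool:
    # A word character fits [A-Za-z0-9_].
    return re.fullmatch(r"[A-Za-z0-9_]", ch) is not None

def classify(text: str) -> dict[str, list[int]]:
    """Sort every position 0..len(text) into the four rows of the table.
    Assumption: the start and end of the line count as nonword context."""
    rows: dict[str, list[int]] = {"start": [], "end": [], "middle": [], "outside": []}
    for pos in range(len(text) + 1):
        before = pos > 0 and is_word_char(text[pos - 1])
        after = pos < len(text) and is_word_char(text[pos])
        if after and not before:
            rows["start"].append(pos)
        elif before and not after:
            rows["end"].append(pos)
        elif before and after:
            rows["middle"].append(pos)
        else:
            rows["outside"].append(pos)
    return rows

def positions(pattern: str, text: str) -> list[int]:
    """Offsets where a zero-width pattern matches; re.ASCII keeps words to [A-Za-z0-9_]."""
    return [m.start() for m in re.finditer(pattern, text, re.ASCII)]

text = "a cat, ok"
rows = classify(text)
print(rows)  # {'start': [0, 2, 7], 'end': [1, 5, 9], 'middle': [3, 4, 8], 'outside': [6]}

# \b covers the first two rows and \B the last two, as in the GNU sed column.
assert positions(r"\b", text) == sorted(rows["start"] + rows["end"])
assert positions(r"\B", text) == sorted(rows["middle"] + rows["outside"])

# Whole-word replacement; GNU sed would write s/\bcat\b/dog/g or s/\<cat\>/dog/g.
print(re.sub(r"\bcat\b", "dog", "cat concat cat_1 cat-2", flags=re.ASCII))
# dog concat cat_1 dog-2
```

The last line shows why the underscore matters. "cat_1" is left alone because "_" is a word character. "cat-2" is changed because "-" is not a word character.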

In ssed, the symbols '\<' and '\>' lose their special meaning when the -R switch is used to enable Perl-style regular expressions. However, the same meaning can be obtained with the following non-consuming, zero-width assertions. The first is equivalent to '\<' and the second to '\>':

       (?<!\w)(?=\w)  and   (?<=\w)(?!\w)
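The assertions above can be tried in Python, whose `re` module supports the same Perl-style lookarounds. This is an illustration in Python rather than in ssed itself. The names `START_OF_WORD`, `END_OF_WORD` and `find_positions` are hypothetical. With `re.ASCII`, `\w` is limited to [A-Za-z0-9_].

```python
import re

# Perl-style stand-ins for \< (start of word) and \> (end of word).
START_OF_WORD = r"(?<!\w)(?=\w)"  # no word char before, a word char after
END_OF_WORD = r"(?<=\w)(?!\w)"    # a word char before, no word char after

def find_positions(pattern: str, text: str) -> list[int]:
    """Offsets at which a zero-width pattern matches."""
    return [m.start() for m in re.finditer(pattern, text, re.ASCII)]

text = "a cat, ok"
starts = find_positions(START_OF_WORD, text)
ends = find_positions(END_OF_WORD, text)
print(starts, ends)  # [0, 2, 7] [1, 5, 9]

# Together the two assertions match exactly the positions that \b matches.
assert sorted(starts + ends) == find_positions(r"\b", text)

# Being zero-width, they consume nothing: bracket every word in place.
bracketed = re.sub(END_OF_WORD, "]", text, flags=re.ASCII)
bracketed = re.sub(START_OF_WORD, "[", bracketed, flags=re.ASCII)
print(bracketed)  # [a] [cat], [ok]
```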

   Reprinted courtesy of Eric Pement.