The sed (Stream Editor) FAQ - 6.7.3. Special syntax in REs

The sed FAQ
Prev	Home	Next

6.7.3. Special syntax in REs

A. HHsed v1.5 (by Howard Helman)

The following expressions can be used for /RE/ addresses or in the LHS side of a substitution:

      +    - 1 or more occurrences of previous RE: same as \{1,\}
      \<   - boundary between nonword and word character
      \>   - boundary between word and nonword character

The following expressions can be used for /RE/ addresses or on either side of a substitution:

      \a   - bell         (ASCII 07, 0x07)
      \b   - backspace    (ASCII 08, 0x08)
      \e   - escape       (ASCII 27, 0x1B)
      \f   - formfeed     (ASCII 12, 0x0C)
      \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
      \r   - return       (ASCII 13, 0x0D)
      \t   - tab          (ASCII 09, 0x09)
      \v   - vertical tab (ASCII 11, 0x0B)
      \xHH - the ASCII character corresponding to 2 hex digits HH.

B. sed v1.6 (by Walter Briscoe)

sed v1.6 accepts every expression supported by sed v1.5 (above), plus the following elements, which can also used in the RHS of a substitution (in addition to those listed above):

      \\~  - insert replacement pattern defined in last s/// command
             (must be used alone in the RHS)
      \l   - change next element to lower case
      \L   - change remaining elements to lower case
      \u   - change next element to upper case
      \U   - change remaining elements to upper case
      \e   - end case conversion of next element
      \E   - end case conversion of remaining elements
      $0   - insert pattern space BEFORE the substitution
      $1-$9 - match Nth word on the pattern space

C. sedmod v1.0 (by Hern Chen)

The following expressions can be used for /RE/ addresses in the LHS of a substitution:

      +    - 1 or more occurrences of previous RE: same as \{1,\}
      \a   - any alphanumeric: same as [a-zA-Z0-9]
      \A   - 1 or more alphas: same as \a+
      \d   - any digit: same as [0-9]
      \D   - 1 or more digits: same as \d+
      \h   - any hex digit: same as [0-9a-fA-F]
      \H   - 1 or more hexdigits: same as \h+
      \l   - any letter: same as [A-Za-z]
      \L   - 1 or more letters: same as \l+
      \n   - newline      (read as 2 bytes, 0D 0A or ^M^J, in DOS)
      \s   - any whitespace character: space, tab, or vertical tab
      \S   - 1 or more whitespace chars: same as \s+
      \t   - tab          (ASCII 09, 0x09)
      \<   - boundary between nonword and word character
      \>   - boundary between word and nonword character

The following expressions can be used in the RHS of a substitution. "Elements" refer to \1 .. \9, &, $0, or $1 .. $9:

      &    - insert regexp defined on LHS
      \e   - end case conversion of next element
      \E   - end case conversion of remaining elements
      \l   - change next element to lower case
      \L   - change remaining elements to lower case
      \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
      \t   - tab          (ASCII 09, 0x09)
      \u   - change next element to upper case
      \U   - change remaining elements to upper case
      $0   - insert the original pattern space
      $1-$9 - match Nth word on the pattern space

D. UnixDos sed

The following expressions can be used in text, LHS, and RHS:

      \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)

E. GNU sed v1.03 (by Frank Whaley)

When used with the -x (extended) switch on the command line, or when '#x' occurs as the first line of a script, Whaley's gsed103 supports the following expressions in both the LHS and RHS of a substitution:

      \|      matches the expression on either side
      ?       0 or 1 occurrences of previous RE: same as \{0,1\}
      +       1 or more occurrence of previous RE: same as \{1,\}
      \a      "alert" beep     (BEL, Ctrl-G, 0x07)
      \b      backspace        (BS, Ctrl-H, 0x08)
      \f      formfeed         (FF, Ctrl-L, 0x0C)
      \n      newline          (LF, Ctrl-J, 0x0A)
      \r      carriage-return  (CR, Ctrl-M, 0x0D)
      \t      horizontal tab   (HT, Ctrl-I, 0x09)
      \v      vertical tab     (VT, Ctrl-K, 0x0B)
      \bBBB   binary char, where BBB are 1-8 binary digits, [0-1]
      \dDDD   decimal char, where DDD are 1-3 decimal digits, [0-9]
      \oOOO   octal char, where OOO are 1-3 octal digits, [0-7]
      \xHH    hex char, where HH are 1-2 hex digits, [0-9A-F]

In normal mode, with or without the -x switch, the following escape sequences are also supported in regex addressing or in the LHS of a substitution:

      \`      matches beginning of pattern space: same as /^/
      \'      matches end of pattern space: same as /$/
      \B      boundary between 2 word or 2 nonword characters
      \w      any nonword character [*BUG!* should be a word char]
      \W      any nonword character: same as /[^A-Za-z0-9]/
      \<      boundary between nonword and word char
      \>      boundary between word and nonword char

F. GNU sed v2.05 and higher versions

The following expressions can be used for /RE/ addresses or in the LHS side of a substitution:

      \`  - matches the beginning of the pattern space (same as "^")
      \'  - matches the end of the pattern space (same as "$")
      \?  - 0 or 1 occurrence of previous character: same as \{0,1\}
      \+  - 1 or more occurrences of previous character: same as \{1,\}
      \|  - matches the string on either side, e.g., foo\|bar
      \b  - boundary between word and nonword chars (reversible)
      \B  - boundary between 2 word or between 2 nonword chars
      \n  - embedded newline (usable after N, G, or similar commands)
      \w  - any word character: [A-Za-z0-9_]
      \W  - any nonword char: [^A-Za-z0-9_]
      \<  - boundary between nonword and word character
      \>  - boundary between word and nonword character

On \b, \B, \<, and \>, see section 6.7.4 ("Word boundaries"), below.

Undocumented -r switch:

Beginning with version 3.02, GNU sed has an undocumented -r switch (undocumented till version 4.0), activating Extended Regular Expressions in the following manner:

       ?      -  0 or 1 occurrence of previous character
       +      -  1 or more occurrences of previous character
       |      -  matches the string on either side, e.g., foo|bar
       (...)  -  enable grouping without backslash
       {...}  -  enable interval expression without backslash

When the -r switch (mnemonic: "regular expression") is used, prefix these symbols with a backslash to disable the special meaning.

Escape sequences:

Beginning with version 3.02.80, the following escape sequences can now be used on both sides of a "s///" substitution:

      \a      "alert" beep     (BEL, Ctrl-G, 0x07)
      \f      formfeed         (FF, Ctrl-L, 0x0C)
      \n      newline          (LF, Ctrl-J, 0x0A)
      \r      carriage-return  (CR, Ctrl-M, 0x0D)
      \t      horizontal tab   (HT, Ctrl-I, 0x09)
      \v      vertical tab     (VT, Ctrl-K, 0x0B)
      \oNNN   a character with the octal value NNN
      \dNNN   a character with the decimal value NNN
      \xHH    a character with the hexadecimal value HH

Note that GNU sed also supports "character classes", a POSIX extension to regexes, described in section 3.7, above.

G. sed 4.0 and higher versions

The following expressions can be used in the RHS of a substitution.

      \e   - end case conversion
      \l   - change next character to lower case
      \L   - change remaining text to lower case
      \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
      \t   - tab          (ASCII 09, 0x09)
      \u   - change next character to upper case
      \U   - change remaining text to upper case

In addition, GNU sed 4.0 can modify the way ^ and $ are interpreted, so that ^ can also match an empty string after a newline character, and $ can also match an empty string before a newline character (to do this, add an "M" after the regular expression terminator, like /^>/M -- see section 3.1.1). Even if you use this feature, \` and \' still match the beginning and the end of the pattern space, respectively.

H. ssed

Everything that was said for GNU sed applies to ssed as well. In addition, in Perl-mode (-R switch), these become active or inactive:

      .     - no longer matches new-line characters
      \A    - matches beginning of pattern space
      \Z    - matches end of pattern space or last newline in the PS
      \z    - matches end of pattern space
      \d    - matches any digit: same as [0-9]
      \D    - matches any non-digit: same as [^0-9]
      \`    - no longer matches beginning of pattern space
      \'    - no longer matches end of pattern space
      \<    - no longer matches boundary between nonword & word char
      \>    - no longer matches boundary between word & nonword char
      \oNNN - no longer matches char with octal value NNN
      \dNNN - no longer matches char with decimal value NNN
      \NNN  - matches char with octal value NNN

Perl mode supports lookahead (?=match) and lookbehind (?<=match) pattern matching. The matched text is NOT captured in "&" for s/// replacements!

      foo(?=bar)   - match "foo" only if "bar" follows it
      foo(?!bar)   - match "foo" only if "bar" does NOT follow it
      (?<=foo)bar  - match "bar" only if "foo" precedes it
      (?<!foo)bar  - match "bar" only if "foo" does NOT precede it

      (?<!in|on|at)foo
                  - match "foo" only if NOT preceded by "in", "on" or "at"
      (?<=\d{3})(?<!999)foo
                  - match "foo" only if preceded by 3 digits other than "999"

In Perl mode, there are two new switches in /addressing/ or s/// commands. Switches may be lowercase in s/// commands, but must be uppercase in /addressing/:

       /S  - lets "." match a newline also
       /X  - extra whitespace is ignored. See below, for sample usage.

Here are some examples of Perl-style regular expressions. Use the -R switch.

     (?i)abc    - case-insensitive match of abc, ABC, aBc, ABc, etc.
     ab(?i)c    - same as above; the (?i) applies throughout the pattern
     (ab(?i)c)  - matches abc or abC; the outer parens make the difference!
     (?m)       - multi-line pattern space: same as "s/FIND/REPL/M"
     (?s)       - set "." to match newline also: same as "s/FIND/REPL/S"
     (?x)       - ignore whitespace and #comments; see section (9) below.

     (?:abc)foo    - match "abcfoo", but do not capture 'abc' in \1
     (?:ab|cd)ef   - match "abef" or "cdef"; only 'cd' is captured in \1
     (?#remark)xy  - match "xy"; remarks after "#" are ignored.

And here are some sample uses of /X switch to add comments to complex expressions. To embed literal spaces, precede with \ or put inside [brackets].

     # ssed script to change "(123) 456-7890" into "[ac123] 456-7890"
     #
     s/ # BACKSLASH IS NEEDED AT END OF EACH LINE!   \
     \(                   # literal left paren, (    \
     (\d{3})              # 3 digits                 \
     \)                   # literal right paren, )   \
     [ \t]*               # zero or more spaces or tabs  \
     (\d{3}-\d{4})        # 3 digits, hyphen, 4 digits   \
     /[ac\1] \2/gx;       # replace g(lobally), with e(x)tended spacing

The sed FAQ
Prev	Home	Next