The sed (Stream Editor) FAQ - 3.3. Addressing and address ranges

The sed FAQ

3.3. Addressing and address ranges

Sed commands may have an optional "address" or "address range" prefix. If there is no address or address range given, then the command is applied to all the lines of the input file or text stream. Three commands cannot take an address prefix:

labels, used to branch or jump within the script
the close brace, '}', which ends the '{' "command"
the '#' comment character, also technically a "command"

An address can be a line number (such as 1, 5, 37, etc.), a regular expression (written in the form /RE/ or \xREx where 'x' is any character other than '\' and RE is the regular expression), or the dollar sign ($), representing the last line of the file. An exclamation mark (!) after an address or address range will apply the command to every line EXCEPT the ones named by the address. A null regex ("//") will be replaced by the last regex which was used. Also, some seds do not support \xREx as regex delimiters.

     5d               # delete line 5 only
     5!d              # delete every line except line 5
     /RE/s/LHS/RHS/g  # substitute only if RE occurs on the line
     /^$/b label      # if the line is blank, branch to ':label'
     /./!b label      # ... another way to write the same command
     \%.%!b label     # ... yet another way to write this command
     $!N              # on all lines but the last, get the Next line
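The address forms above can be tried on a few lines of toy input. This is only a sketch: the sample text and the variable names are invented for illustration, and the \%RE% form is skipped by seds that lack it.

```shell
# Sample input: five lines, with line 3 left blank.
input=$(printf 'one\ntwo\n\nfour\nfive')

# A line-number address: delete line 5 only.
no_last=$(printf '%s\n' "$input" | sed '5d')

# '!' inverts the address: delete every line except line 5.
only_five=$(printf '%s\n' "$input" | sed '5!d')

# A regex address: substitute only on lines containing "four".
subst=$(printf '%s\n' "$input" | sed '/four/s/o/0/g')

# '$' addresses the last line; -n suppresses automatic printing.
last=$(printf '%s\n' "$input" | sed -n '$p')

# An alternate delimiter: \%.% is the same address as /./ (non-blank lines).
nonblank=$(printf '%s\n' "$input" | sed -n '\%.%p')

printf '%s\n' "$only_five" "$last" "$nonblank"
```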

Note that an embedded newline can be represented in an address by the symbol \n, but this syntax is needed only if the script puts 2 or more lines into the pattern space via the N, G, or other commands. The \n symbol does not match the newline at an end-of-line because when sed reads each line into the pattern space for processing, it strips off the trailing newline, processes the line, and adds a newline back when printing the line to standard output. To match the end-of-line, use the '$' metacharacter, as follows:

     /tape$/       # matches the word 'tape' at the end of a line
     /tape$deck/   # matches the word 'tape$deck' with a literal '$'
     /tape\ndeck/  # matches 'tape' and 'deck' with a newline between
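To see the difference in practice, the three addresses can be run against a small sample. The input text and variable names below are assumptions made for this sketch; '$!N' is used only to get two lines into the pattern space.

```shell
# Four lines: "tape", "deck", "tape$deck" (with a literal dollar sign), "videotape".
input=$(printf 'tape\ndeck\ntape$deck\nvideotape')

# Without N, each cycle holds a single line with no newline, so \n never matches.
single=$(printf '%s\n' "$input" | sed -n '/tape\ndeck/p')

# '$!N' appends the next line, so the pattern space holds "tape<newline>deck".
joined=$(printf '%s\n' "$input" | sed -n '$!N; /tape\ndeck/p')

# '$' anchors to end-of-line only when it is the last character of the RE.
at_end=$(printf '%s\n' "$input" | sed -n '/tape$/p')
literal=$(printf '%s\n' "$input" | sed -n '/tape$deck/p')

printf '%s\n' "$joined" "$at_end" "$literal"
```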

The following sed commands usually accept only a single address. All other commands (except labels, '}', and '#') accept both single addresses and address ranges.

     =       print to stdout the line number of the current line
     a       after printing the current line, append "text" to stdout
     i       before printing the current line, insert "text" to stdout
     q       quit after the current line is matched
     r file  prints contents of "file" to stdout after line is matched

Note that we said "usually." If you need to apply the '=', 'a', 'i', or 'r' commands to each and every line within an address range, this behavior can be coerced by the use of braces. Thus, "1,9=" is an invalid command, but "1,9{=;}" will print each line number followed by its line for the first 9 lines (and then print the rest of the file normally).
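A scaled-down sketch of the brace technique, using lines 1-2 instead of 1-9. The sample input and variable names are made up for illustration; the 'a' command is written in its traditional multi-line form, which should be accepted by most seds.

```shell
input=$(printf 'alpha\nbeta\ngamma\ndelta')

# '=' inside braces runs once for each line in the range 1,2.
numbered=$(printf '%s\n' "$input" | sed '1,2{=;}')

# The same grouping works for 'a': append a rule after each of lines 1-2.
ruled=$(printf '%s\n' "$input" | sed '1,2{
a\
----
}')

printf '%s\n' "$numbered"
printf '%s\n' "$ruled"
```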

Address ranges occur in the form

       <address1>,<address2>    or    <address1>,<address2>!

where each address can be a line number or a standard /regex/. <address2> can also be a dollar sign, indicating the end of file. Under GNU sed 3.02+, ssed, and sed15+, <address2> may also be a notation of the form +num, indicating the next num lines after <address1> is matched.
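The range forms just described can be sketched as follows. The five-line input and the variable names are invented for this example, and the last command assumes a sed that supports the +num notation (GNU sed is assumed here).

```shell
input=$(printf 'a\nb\nc\nd\ne')

# <address1>,<address2> with two line numbers.
by_number=$(printf '%s\n' "$input" | sed -n '2,3p')

# The same range negated with '!': every line outside 2-3.
inverted=$(printf '%s\n' "$input" | sed -n '2,3!p')

# A regex as <address1> and '$' (end of file) as <address2>.
to_end=$(printf '%s\n' "$input" | sed -n '/c/,$p')

# +num as <address2>: the matching line plus the next 2 lines (not in every sed).
plus_two=$(printf '%s\n' "$input" | sed -n '/b/,+2p')

printf '%s\n' "$by_number" "$inverted" "$to_end" "$plus_two"
```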

Address ranges are:

(1) Inclusive. The range "/From here/,/eternity/" matches all the lines containing "From here" up to and including the line containing "eternity". It will not stop on the line just prior to "eternity". (If you don't like this, see section 4.24.)

(2) Plenary. They always match full lines, not just parts of lines. In other words, a command to change or delete an address range will change or delete whole lines; it won't stop in the middle of a line.

(3) Multi-linear. Address ranges normally match 2 lines or more. The second address will never match the same line the first address did; therefore a valid address range always spans at least two lines, with these exceptions which match only one line:

if the first address matches the last line of the file
if using the syntax "/RE/,3" and /RE/ occurs only once in the file, on line 3 or any later line
if using HHsed v1.5. See section 3.4.

(4) Minimalist. In address ranges with /regex/ as <address2>, the range "/foo/,/bar/" will stop at the first "bar" it finds, provided that "bar" occurs on a line below "foo". If the word "bar" occurs on several lines below the word "foo", the range will match all the lines from the first "foo" up to the first "bar". It will not continue hopping ahead to find more "bar"s. In other words, address ranges are not "greedy" the way regular expressions are.

(5) Repeating. An address range will try to match more than one block of lines in a file. However, the blocks cannot nest. In addition, a second match will not "take" the last line of the previous block. For example, given the following text,

       start
       stop  start
       stop

the sed command '/start/,/stop/d' will only delete the first two lines. It will not delete all 3 lines.

(6) Relentless. If the address range finds a "start" match but doesn't find a "stop", it will match every line from "start" to the end of the file. Thus, beware of the following behaviors:

     /RE1/,/RE2/  # If /RE2/ is not found, matches from /RE1/ to the
                  # end-of-file.

     20,/RE/      # If /RE/ is not found, matches from line 20 to the
                  # end-of-file.

     /RE/,30      # If /RE/ occurs any time after line 30, each
                  # occurrence will be matched in sed15+, sedmod, and
                  # GNU sed v3.02+. GNU sed v2.05 and 1.18 will match
                  # from the 2nd occurrence of /RE/ to the end-of-file.
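Several of the range properties listed above can be checked on toy input. This is a minimal sketch; the sample text and variable names are made up for illustration.

```shell
# (1) Inclusive and (4) minimalist: the range includes the "bar" line
# and ends at the first "bar", ignoring the later one.
first_bar=$(printf 'x\nfoo\ny\nbar\nz\nbar\n' | sed -n '/foo/,/bar/p')

# (3) Multi-linear: the end address is not tested on the line that
# matched the start address, so "foo bar" alone does not close the range.
same_line=$(printf 'foo bar\nmid\nbar\nend\n' | sed -n '/foo/,/bar/p')

# (5) Repeating: line 2 ends the first block and cannot also start a
# second one, so the final "stop" line survives.
repeating=$(printf 'start\nstop  start\nstop\n' | sed '/start/,/stop/d')

# (6) Relentless: with no "bar" anywhere, the range runs to end-of-file.
relentless=$(printf 'a\nfoo\nb\nc\n' | sed -n '/foo/,/bar/p')

printf '%s\n' "$first_bar" "$same_line" "$repeating" "$relentless"
```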

If these behaviors seem strange, remember that they occur because sed does not look "ahead" in the file. Doing so would stop sed from being a stream editor and have adverse effects on its efficiency. If these behaviors are undesirable, they can be circumvented or corrected by the use of nested testing within braces. The following scripts work under GNU sed 3.02:

     # Execute your_commands on range "/RE1/,/RE2/", but if /RE2/ is
     # not found, do nothing.
     /RE1/{:a;N;/RE2/!ba;your_commands;}

     # Execute your_commands on range "20,/RE/", but if /RE/ is not
     # found, do nothing.
     20{:a;N;/RE/!ba;your_commands;}

As a side note, once we've used N to "slurp" lines together to test for the ending expression, the pattern space will have gathered many lines (possibly thousands) together and concatenated them as a single expression, with the \n sequence marking line breaks. The REs within the pattern space may have to be modified (e.g., you must write '/\nStart/' instead of '/^Start/' and '/[^\n]*/' instead of '/.*/') and other standard sed commands will be unavailable or difficult to use.
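A small sketch of the first script above, with 's/\n/+/g' standing in for your_commands so that the embedded newlines become visible. GNU sed is assumed (one-line label syntax, and N printing the pattern space when it runs out of input); BEGIN, END, and the variable names are placeholders chosen for this example.

```shell
script='/BEGIN/{:a;N;/END/!ba;s/\n/+/g;}'

# END is present: the slurped block is one pattern space, so the
# substitution joins its lines with '+'.
found=$(printf 'a\nBEGIN\nx\nEND\nb\n' | sed "$script")

# END is absent: the loop reaches end-of-input inside N, the substitution
# never runs, and GNU sed prints the accumulated lines unchanged.
missing=$(printf 'a\nBEGIN\nx\ny\n' | sed "$script")

printf '%s\n' "$found"
printf '%s\n' "$missing"
```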

     # Execute your_commands on range "/RE/,30", but if /RE/ occurs
     # on line 31 or later, do not match it.
     1,30{/RE/,$ your_commands;}
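The same idea scaled down to a 3-line limit, with 'd' standing in for your_commands. The six-line input and the variable names are assumptions for this sketch, and the behavior of the plain range is that of GNU sed 3.02 or later.

```shell
# RE occurs on line 2 (inside the limit) and again on line 5 (outside it).
input=$(printf 'a\nRE\nc\nd\nRE\nf')

# Plain range: lines 2-3 are deleted, and the late match on line 5 is
# deleted too, as a one-line range.
plain=$(printf '%s\n' "$input" | sed '/RE/,3d')

# Guarded range: the inner range is only tested on lines 1-3, so the
# occurrence on line 5 is left alone.
guarded=$(printf '%s\n' "$input" | sed '1,3{/RE/,$d;}')

printf '%s\n' "$plain"
printf '%s\n' "$guarded"
```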

For related suggestions on using address ranges, see sections 4.2, 4.15, and 4.19 of this FAQ. Also, note the following section.
