Ruby Programming

Backslash Sequences in the Substitution

Earlier we noted that the sequences \1, \2, and so on are available in the pattern, standing for the nth group matched so far. The same sequences are available in the second argument of sub and gsub.

"fred:smith".sub(/(\w+):(\w+)/, '\2, \1')   # => "smith, fred"
"nercpyitno".gsub(/(.)(.)/, '\2\1')         # => "encryption"

There are additional backslash sequences that work in substitution strings: \& (last match), \+ (last matched group), \` (string prior to match), \' (string after match), and \\ (a literal backslash). It gets confusing if you want to include a literal backslash in a substitution. The obvious thing is to write

str.gsub(/\\/, '\\\\')

Clearly, this code is trying to replace each backslash in str with two. The programmer doubled up the backslashes in the replacement text, knowing that they'd be converted to "\\" in syntax analysis. However, when the substitution occurs, the regular expression engine performs another pass through the string, converting "\\" to "\", so the net effect is to replace each single backslash with another single backslash. You need to write gsub(/\\/, '\\\\\\\\')!

str = 'a\b\c'                 # string contents: a\b\c
str.gsub(/\\/, '\\\\\\\\')    # result contents: a\\b\\c
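
Counting characters at each stage can make the two passes easier to follow. The following is only an illustrative sketch: the helper name replace_backslashes is hypothetical and is not part of Ruby or of the original text.

```ruby
# Hypothetical helper (not from the original text): replace every
# backslash in str using the given replacement string.
def replace_backslashes(str, replacement)
  str.gsub(/\\/, replacement)
end

str = 'a\b\c'                 # 5 characters: a, \, b, \, c

# Pass 1 (syntax analysis): in a single-quoted literal, each \\ in the
# source becomes one backslash in the string.
two  = '\\\\'                 # holds 2 backslashes
four = '\\\\\\\\'             # holds 4 backslashes
puts two.length               # prints 2
puts four.length              # prints 4

# Pass 2 (substitution): gsub turns each \\ pair in the replacement
# into a single backslash before inserting it.
puts replace_backslashes(str, two)    # prints a\b\c   (unchanged)
puts replace_backslashes(str, four)   # prints a\\b\\c (doubled)
```

Eight backslashes in the source therefore become four in the string, and two in the final result.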

However, using the fact that \& is replaced by the matched string, you could also write

str = 'a\b\c'                 # string contents: a\b\c
str.gsub(/\\/, '\&\&')        # result contents: a\\b\\c

If you use the block form of gsub, the string for substitution is analyzed only once (during the syntax pass) and the result is what you intended.

str = 'a\b\c'                 # string contents: a\b\c
str.gsub(/\\/) { '\\\\' }     # result contents: a\\b\\c
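
Three of the other substitution sequences mentioned earlier, \& (the match), \` (the text before it), and \' (the text after it), can be seen side by side in a small sketch. The method name substitution_demo, the sample text, and the pattern are all hypothetical, chosen only to make each sequence visible. Note that \' is written inside a double-quoted string here, because in a single-quoted literal \' would be read as an escaped quote.

```ruby
# Hypothetical demo method: shows what each sequence expands to
# when the pattern /o w/ matches inside the given text.
def substitution_demo(text)
  pattern = /o w/
  {
    match:      text.sub(pattern, '[\&]'),    # \& -> the matched text
    pre_match:  text.sub(pattern, '[\`]'),    # \` -> text before the match
    post_match: text.sub(pattern, "[\\']"),   # \' -> text after the match
  }
end

substitution_demo("hello world").each do |name, result|
  puts "#{name}: #{result}"
end
# prints:
#   match: hell[o w]orld
#   pre_match: hell[hell]orld
#   post_match: hell[orld]orld
```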

Finally, as an example of the wonderful expressiveness of combining regular expressions with code blocks, consider the following code fragment from the CGI library module, written by Wakou Aoyama. The code takes a string containing HTML escape sequences and converts it into normal ASCII. Because it was written for a Japanese audience, it uses the "n" modifier on the regular expressions, which turns off wide-character processing. It also illustrates Ruby's case expression, which we discuss starting on page 81.

def unescapeHTML(string)
  str = string.dup
  str.gsub!(/&(.*?);/n) {
    match = $1.dup
    case match
    when /\Aamp\z/ni           then '&'
    when /\Aquot\z/ni          then '"'
    when /\Agt\z/ni            then '>'
    when /\Alt\z/ni            then '<'
    when /\A#(\d+)\z/n         then Integer($1).chr
    when /\A#x([0-9a-f]+)\z/ni then $1.hex.chr
    end
  }
  str
end

puts unescapeHTML("1&lt;2 &amp;&amp; 4&gt;3")
puts unescapeHTML("&quot;A&quot; = &#65; = &#x41;")
produces:
1<2 && 4>3
"A" = A = A
Published under the terms of the Open Publication License