String is probably the largest built-in Ruby class, with over 75
standard methods. We won't go through them all here; the library
reference has a complete list. Instead, we'll look at some common
string idioms: things that are likely to pop up during day-to-day
programming.
Let's get back to our jukebox. Although it's designed to be connected
to the Internet, it also holds copies of some popular songs on a local
hard drive. That way, if a squirrel chews through our 'net connection,
we'll still be able to entertain the customers.
For historical reasons (are there any other kind?), the list of songs
is stored as rows in a flat file. Each row holds the name of the file
containing the song, the song's duration, the artist, and the title,
all in vertical-bar-separated fields. A typical file might
start:
/jazz/j00132.mp3  | 3:45 | Fats     Waller     | Ain't Misbehavin'
/jazz/j00319.mp3  | 2:58 | Louis    Armstrong  | Wonderful World
/bgrass/bg0732.mp3| 4:09 | Strength in Numbers | Texas Red
         :           :            :                   :
Looking at the data, it's clear that we'll be using some of class
String's many methods to extract and clean up the fields before we
create Song objects based on them. At a minimum, we'll need to:
- break the line into fields,
- convert the running time from mm:ss to seconds, and
- remove those extra spaces from the artist's name.
Our first task is to split each line into fields, and String#split
will do the job nicely. In this case, we'll pass split a regular
expression, /\s*\|\s*/, which splits the line into tokens wherever
split finds a vertical bar, optionally surrounded by spaces. And,
because the line read from the file has a trailing newline, we'll use
String#chomp to strip it off just before we apply the split.
songs = SongList.new
songFile.each do |line|
  file, length, name, title = line.chomp.split(/\s*\|\s*/)
  songs.append Song.new(title, name, length)
end
puts songs[1]
produces:
Song: Wonderful World--Louis    Armstrong (2:58)
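
To see the chomp-then-split step on its own, here is a minimal sketch that works on one hard-coded row rather than the real file. The variable names line and fields are ours, chosen just for illustration.

```ruby
# One row from the song file, including the trailing newline that
# reading from a file would leave in place.
line = "/jazz/j00319.mp3  | 2:58 | Louis    Armstrong  | Wonderful World\n"

# chomp removes the newline; split breaks the row at each vertical bar,
# swallowing any whitespace on either side of the bar.
fields = line.chomp.split(/\s*\|\s*/)

p fields
# prints ["/jazz/j00319.mp3", "2:58", "Louis    Armstrong", "Wonderful World"]
```

Notice that the spaces inside the artist's name survive: the pattern only consumes whitespace that sits next to a vertical bar.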
Unfortunately, whoever created the original file entered the artists'
names in columns, so some of them contain extra spaces. These will
look ugly on our high-tech, super-twist, flat-panel Day-Glo display, so
we'd better remove these extra spaces before we go much further.
There are many ways of doing this, but probably the simplest is
String#squeeze, which trims runs of repeated characters. We'll use the
squeeze! form of the method, which alters the string in place.
songs = SongList.new
songFile.each do |line|
  file, length, name, title = line.chomp.split(/\s*\|\s*/)
  name.squeeze!(" ")
  songs.append Song.new(title, name, length)
end
puts songs[1]
produces:
Song: Wonderful World--Louis Armstrong (2:58)
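
The difference between squeeze and squeeze! is easy to trip over, so here is a small sketch comparing the two on a sample string of our own. The point to watch is the return value of the in-place form.

```ruby
# dup gives us a mutable copy, in case string literals are frozen.
name = "Louis    Armstrong".dup

# squeeze returns a new string and leaves the receiver untouched.
tidy = name.squeeze(" ")
p tidy                  # prints "Louis Armstrong"
p name                  # prints "Louis    Armstrong"

# squeeze! modifies the receiver in place. It returns nil when there
# was nothing to squeeze, so don't rely on its return value.
p name.squeeze!(" ")    # prints "Louis Armstrong"
p name.squeeze!(" ")    # prints nil
p name                  # prints "Louis Armstrong"
```

This is why the loop above calls name.squeeze!(" ") as a statement by itself and then carries on using name, rather than using the result of the call.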
Finally, there's the minor matter of the time format: the file says
2:58, and we want the number of seconds, 178. We could use split
again, this time splitting the time field around the colon character.
mins, secs = length.split(/:/)
Instead, we'll use a related method. String#scan is similar to split
in that it breaks a string into chunks based on a pattern. However,
unlike split, with scan you specify the pattern that you want the
chunks to match. In this case, we want to match one or more digits for
both the minutes and seconds component. The pattern for one or more
digits is /\d+/.
songs = SongList.new
songFile.each do |line|
  file, length, name, title = line.chomp.split(/\s*\|\s*/)
  name.squeeze!(" ")
  mins, secs = length.scan(/\d+/)
  songs.append Song.new(title, name, mins.to_i*60+secs.to_i)
end
puts songs[1]
produces:
Song: Wonderful World--Louis Armstrong (178)
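
To make the split-versus-scan contrast concrete, here is a short sketch. The to_seconds helper is hypothetical (it isn't part of the jukebox code); it just wraps the conversion used in the loop above.

```ruby
# Hypothetical helper: convert an "mm:ss" string to a number of seconds.
def to_seconds(length)
  mins, secs = length.scan(/\d+/)    # "2:58" -> ["2", "58"]
  mins.to_i * 60 + secs.to_i
end

# On clean input the two methods give the same pieces...
p "2:58".split(/:/)      # prints ["2", "58"]
p "2:58".scan(/\d+/)     # prints ["2", "58"]

# ...but split keeps whatever surrounds the separator, whereas scan
# returns only the text that matches the pattern.
p " 2:58 ".split(/:/)    # prints [" 2", "58 "]
p " 2:58 ".scan(/\d+/)   # prints ["2", "58"]

p to_seconds("2:58")     # prints 178
p to_seconds("3:45")     # prints 225
```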
Our jukebox has a keyword search capability. Given a word from a song
title or an artist's name, it will list all matching tracks. Type in
"fats," and it might come back with songs by Fats Domino, Fats
Navarro, and Fats Waller, for example. We'll implement this by
creating an indexing class. Feed it an object and some strings,
and it will index that object under every word (of two or more
characters) that occurs in those strings. This will illustrate a few
more of class String's many methods.
class WordIndex
  def initialize
    @index = Hash.new(nil)
  end
  def index(anObject, *phrases)
    phrases.each do |aPhrase|
      aPhrase.scan(/\w[-\w']+/) do |aWord|   # extract each word
        aWord.downcase!
        @index[aWord] = [] if @index[aWord].nil?
        @index[aWord].push(anObject)
      end
    end
  end
  def lookup(aWord)
    @index[aWord.downcase]
  end
end
The String#scan method extracts elements from a string that match a
regular expression. In this case, the pattern "\w[-\w']+" matches any
character that can appear in a word, followed by one or more of the
things specified in the brackets (a hyphen, another word character, or
a single quote). We'll talk more about regular expressions beginning
on page 56. To make our searches case insensitive, we map both the
words we extract and the words used as keys during the lookup to
lowercase. Note the exclamation mark at the end of the first downcase!
method name. As with the squeeze! method we used previously, this is
an indication that the method will modify the receiver in place, in
this case converting the string to lowercase.
[There's a minor bug
in this code example: the song "Gone, Gone, Gone" would get
indexed three times. Can you come up with a fix?]
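
If you'd like to compare notes, here is one possible fix, offered as a sketch rather than the definitive answer. It assumes it's acceptable to gather all the words up front, lowercase them, and discard repeats with Array#uniq before touching the index.

```ruby
class WordIndex
  def initialize
    @index = Hash.new(nil)
  end

  def index(anObject, *phrases)
    # Without a block, scan returns an array of matches. Collect the
    # words from every phrase, lowercase them, and drop duplicates.
    words = phrases.flat_map { |aPhrase| aPhrase.scan(/\w[-\w']+/) }
    words.map { |aWord| aWord.downcase }.uniq.each do |aWord|
      @index[aWord] = [] if @index[aWord].nil?
      @index[aWord].push(anObject)
    end
  end

  def lookup(aWord)
    @index[aWord.downcase]
  end
end

index = WordIndex.new
index.index(:gone, "Gone, Gone, Gone")   # a symbol stands in for a Song
p index.lookup("gone")                   # prints [:gone]
```

Because the duplicates are removed across all the phrases at once, a word that appears in both the title and the artist's name is also indexed only once.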
We'll extend our SongList class to index songs as they're added,
and add a method to look up a song given a word.
class SongList
  def initialize
    @songs = Array.new
    @index = WordIndex.new
  end
  def append(aSong)
    @songs.push(aSong)
    @index.index(aSong, aSong.name, aSong.artist)
    self
  end
  def lookup(aWord)
    @index.lookup(aWord)
  end
end
Finally, we'll test it all.
songs = SongList.new
songFile.each do |line|
  file, length, name, title = line.chomp.split(/\s*\|\s*/)
  name.squeeze!(" ")
  mins, secs = length.scan(/\d+/)
  songs.append Song.new(title, name, mins.to_i*60+secs.to_i)
end
puts songs.lookup("Fats")
puts songs.lookup("ain't")
puts songs.lookup("RED")
puts songs.lookup("WoRlD")
produces:
Song: Ain't Misbehavin'--Fats Waller (225)
Song: Ain't Misbehavin'--Fats Waller (225)
Song: Texas Red--Strength in Numbers (249)
Song: Wonderful World--Louis Armstrong (178)
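
The listing above relies on a Song class and an open songFile defined elsewhere. If you want to run the whole thing in one go, the following sketch fills those gaps with assumptions of our own: a minimal stand-in for Song (just enough to be indexed and printed) and a StringIO holding the three sample rows in place of the real file.

```ruby
require 'stringio'

# Minimal stand-in for Song (an assumption, not the full class).
class Song
  attr_reader :name, :artist, :duration
  def initialize(name, artist, duration)
    @name, @artist, @duration = name, artist, duration
  end
  def to_s
    "Song: #{@name}--#{@artist} (#{@duration})"
  end
end

class WordIndex
  def initialize
    @index = Hash.new(nil)
  end
  def index(anObject, *phrases)
    phrases.each do |aPhrase|
      aPhrase.scan(/\w[-\w']+/) do |aWord|
        aWord.downcase!
        @index[aWord] = [] if @index[aWord].nil?
        @index[aWord].push(anObject)
      end
    end
  end
  def lookup(aWord)
    @index[aWord.downcase]
  end
end

class SongList
  def initialize
    @songs = Array.new
    @index = WordIndex.new
  end
  def append(aSong)
    @songs.push(aSong)
    @index.index(aSong, aSong.name, aSong.artist)
    self
  end
  def lookup(aWord)
    @index.lookup(aWord)
  end
end

# StringIO stands in for the open song file (hypothetical test data).
songFile = StringIO.new(<<SONGS)
/jazz/j00132.mp3  | 3:45 | Fats     Waller     | Ain't Misbehavin'
/jazz/j00319.mp3  | 2:58 | Louis    Armstrong  | Wonderful World
/bgrass/bg0732.mp3| 4:09 | Strength in Numbers | Texas Red
SONGS

songs = SongList.new
songFile.each do |line|
  file, length, name, title = line.chomp.split(/\s*\|\s*/)
  name.squeeze!(" ")
  mins, secs = length.scan(/\d+/)
  songs.append Song.new(title, name, mins.to_i*60+secs.to_i)
end

puts songs.lookup("Fats")    # Song: Ain't Misbehavin'--Fats Waller (225)
puts songs.lookup("WoRlD")   # Song: Wonderful World--Louis Armstrong (178)
```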
We could spend the next 50 pages looking at all the methods in class
String. However, let's move on instead to look at a simpler
datatype: ranges.