The map, filter, and
reduce built-in functions are handy functions for
processing sequences. These owe much to the world of functional
programming languages. The idea is to take a small function you write
and apply it to all the elements of a sequence. This saves you writing
an explicit loop. The implicit loop within each of these functions may
be faster than an explicit
for
or
while
loop.
Additionally, each of these is a pure function, returning a result value. This allows the results of the functions to be combined into complex expressions relatively easily.
It is common to have multiple-step processes on
lists of values. For instance, filtering a large
set of data to locate useful samples, transforming those samples and
then computing a sum or average. Rather than write three explicit loops,
the result can be computed in a single expression.
Let's say we have two sequences and we have a multi-step process like this:
Instead of these lengthy explicit loops, we can take a functional approach that look like this:
f= reduce( f3, zip( map(f1,seq1), map(f2,seq2) )
)
Where f1, f2, and
f3 are functions that define the body of the each
of the above loops.
Definitions. These functions transform lists. The map
and filter each apply some function to a sequence
to create a new sequence. The reduce function
applies a function which will reduce the sequence to a single value.
The zip function interleaves values from lists to
create a list of tuples.
The map, zip and
filter functions have no internal state, they
simply apply the function to each individual value of the sequence. The
reduce function, in contrast, maintains an internal
state which is seeded from an initial value, passed to the function
along with each value of the sequence and returned as the final
result.
Here are the formal definitions.
map (
function
,
sequence
, [
sequence...
]
) → list
Create a new list from the results of
applying the given
function
to the items of
the the given
sequence
. If more than one
sequence is given, the function is called with multiple arguments,
consisting of the corresponding item of each sequence. If any
sequence is too short, None is used for missing
value. If the
function
is
None, map will create
tuples from corresponding items in each list, much like the
zip function.
filter (
function
,
sequence
) → list
Return a list containing those items
of
sequence
for which function
(
item
) is true. If
function
is None, return
a list of items that are equivalent to
True.
reduce (
function
,
sequence
,
[
initial
]) → value
Apply a
function
of two arguments
cumulatively to the items of a sequence, from left to right, so as
to reduce the sequence to a single value. For example, reduce(
lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5).
If
initial
is present, it is placed before
the items of the sequence in the calculation, and serves as a
default when the sequence is empty.
zip (
seq1
,
[
seq2,...
] ) → [
(
seq1
[0],
seq2
[0],...), ... ]
Return a list of
tuples, where each
tuple contains the matching element from
each of the argument sequences. The returned
list is truncated in length to the length
of the shortest argument sequence.
Costs and Benefits. What are the advantages? First, the functional version can be clearer. It's a single line of code that summarizes the processing. Second, and more important, Python can execute the sequence processing functions far faster than the equivalent explicit loop.
You can see that map and
filter are equivalent to simple list
comprehensions. This gives you two ways to specify these operations,
both of which have approximately equivalent performance. This means that
map and filter aren't
essential
to Python, but hey are widely
used.
The reduce function is a bit of a problem. It
can have remarkably bad performance if it is misused. Consequently,
there is some debate about the value of having this function. We'll
present the function along with the caveat that it can lead to
remarkable slowness.
The map Function. The map function transforms a sequence into
another sequence by applying a function to each item in the sequence.
The idea is to apply a mapping tranformation to a sequence. This is is
a common design pattern within numerous kinds of programs. Generally,
the transformation is the interesting part of the programming, and the
loop is just another boring old loop.
The function call
map(
aFunction
,
aSequence
) behaves as if it had the
following definition.
def map( aFunction, aSequence ):
return [ aFunction(v) for v in aSequence ]
For example:
>>>map( int, [ "10", "12", "14", 3.1415926, 5L ] )[10, 12, 14, 3, 5]
This applies the int function to each element
of the input sequence (a list that contains some
strings, a floating point value and a long
integer value) to create the output sequence (a
list of integers).
The function used in map can be a built-in
function, or a user-defined function created with the
def
statement (see Chapter 9, Functions
).
>>>def oddn(x):...return x*2+1...>>>map( oddn, range(6) )[1, 3, 5, 7, 9, 11]
This example defines a function oddn, which
creates an odd number from the input. The map
function applies our oddn function to each value of
the sequence created by range(
6
). We get the first 6 odd numbers.
The filter Function. The filter function chooses elements from
the input sequence where a supplied function is
True. Elements for which the supplied function is
False are discarded.
The function call filter
(
aFunction
,
aSequence
) behaves as if it had
the following definition.
def filter( aFunction, aSequence ):
return [ v for v in aSequence if aFunction(v) ]
For example:
>>>def gt2( a ):...return a > 2...>>>filter( gt2, range(8) )[3, 4, 5, 6, 7]
This example uses a function, gt2, which
returns True for inputs greater than 2. We create a
range from 0 to 7, but only keep the values for which the filter
function returns True.
Here's another example that keeps all numbers that are evenly divisible by 3.
>>>def div3( a ):...return a % 3 == 0...>>>filter( div3, range(10) )[0, 3, 6, 9]
Our function, div3, returns
True when the remainder of dividing a number by 3 is
exactly 0. This will keep numbers that are evenly divisible by 3. We
create a range, apply the function to each value in the range, and keep
only the values where the filter is True.
The reduce Function. The reduce function can be used to
implement the common spread-sheet functions that compute sums and
products. This function works by seeding a result with an initial
value. It then calls the user-supplied function with this result and
each value of the sequence. This is remarkably common, but it can't be
done as a list comprehension.
The function call reduce
(
aFunction
,
aSequence
, [,
init
]) behaves as if it had the
following definition.
def reduce( aFunction, aSequence, init= 0 ):
r= init
for s in aSequence:
r= aFunction( r, s )
return r
The important thing to note is that the function is applied to the
internal value, r, and each element of the list to
compute a new internal value.
For example:
>>>def add(a,b):...return a+b...>>>reduce( add, range(10) )45
This expression computes the sum of the 10 numbers from zero
through nine. The function we defined, add, adds
the previous result and the next sequence value together.
Here's an interesting example that combines
reduce and map. This uses two
functions defined in earlier examples, add and
oddn.
for i in range(10):
sq=reduce( add, map(oddn, range(i)), 0 )
print i, sq
Let's look at the evaluation from innermost to outermost. The
range(
i
) generates a
list of numbers from 0 to i-1.
The map applies the oddn
function form to create a sequence of i odd numbers
from 1 to 2i+1. The reduce then
adds this sequence of odd numbers. Interestingly, these sums add to the
square of i.
The zip Function. The zip function interleaves values from
two sequences to create a new sequence. The new sequence is a sequence
of tuples. Each item of a
tuple is the corresponding values from from
each sequence.
>>>zip( range(5), range(1,20,2) )[(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)]
In this example, we zipped two sequences together. The first
sequence was range(5), which has five values. The second
sequence was range(1,20,2) which has 10 odd numbers from 1 to 19. Since
zip truncates to the shorter list, we get five
tuples, each of which has the matching values from both lists.
The map function behaves a little like
zip when there is no function provided, just
sequences. However, map does not truncate, it fills
the shorter list with None values.
>>>map( None, range(5), range(1,20,2) )[(0, 1), (1, 3), (2, 5), (3, 7), (4, 9), (None, 11), (None, 13), (None, 15), (None, 17), (None, 19)]
| Published under the terms of the Open Publication License | Design by Interspire |