Sequence Processing Functions: map, filter, reduce and zip

The map, filter, and reduce built-in functions are handy functions for processing sequences. These owe much to the world of functional programming languages. The idea is to take a small function you write and apply it to all the elements of a sequence. This saves you writing an explicit loop. The implicit loop within each of these functions may be faster than an explicit for or while loop.

Additionally, each of these is a pure function, returning a result value. This allows the results of the functions to be combined into complex expressions relatively easily.

It is common to have multiple-step processes on lists of values. For instance, filtering a large set of data to locate useful samples, transforming those samples and then computing a sum or average. Rather than write three explicit loops, the result can be computed in a single expression.

Let's say we have two sequences and we have a multi-step process like this:

  1. for i in seq1:.... apply function f1 and create sequence r1
  2. for i in seq2:.... apply function f2 and create sequence r2
  3. for i in len(r1):.... apply function f3 to r1[i] and r2[i]

Instead of these lengthy explicit loops, we can take a functional approach that look like this:

f= reduce( f3, zip( map(f1,seq1), map(f2,seq2) ) )

Where f1, f2, and f3 are functions that define the body of the each of the above loops.

Definitions. These functions transform lists. The map and filter each apply some function to a sequence to create a new sequence. The reduce function applies a function which will reduce the sequence to a single value. The zip function interleaves values from lists to create a list of tuples.

The map, zip and filter functions have no internal state, they simply apply the function to each individual value of the sequence. The reduce function, in contrast, maintains an internal state which is seeded from an initial value, passed to the function along with each value of the sequence and returned as the final result.

Here are the formal definitions.

map ( function , sequence , [ sequence... ] ) → list

Create a new list from the results of applying the given function to the items of the the given sequence . If more than one sequence is given, the function is called with multiple arguments, consisting of the corresponding item of each sequence. If any sequence is too short, None is used for missing value. If the function is None, map will create tuples from corresponding items in each list, much like the zip function.

filter ( function , sequence ) → list

Return a list containing those items of sequence for which function ( item ) is true. If function is None, return a list of items that are equivalent to True.

reduce ( function , sequence , [ initial ]) → value

Apply a function of two arguments cumulatively to the items of a sequence, from left to right, so as to reduce the sequence to a single value. For example, reduce( lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). If initial is present, it is placed before the items of the sequence in the calculation, and serves as a default when the sequence is empty.

zip ( seq1 , [ seq2,... ] ) → [ ( seq1 [0], seq2 [0],...), ... ]

Return a list of tuples, where each tuple contains the matching element from each of the argument sequences. The returned list is truncated in length to the length of the shortest argument sequence.

Costs and Benefits. What are the advantages? First, the functional version can be clearer. It's a single line of code that summarizes the processing. Second, and more important, Python can execute the sequence processing functions far faster than the equivalent explicit loop.

You can see that map and filter are equivalent to simple list comprehensions. This gives you two ways to specify these operations, both of which have approximately equivalent performance. This means that map and filter aren't essential to Python, but hey are widely used.

The reduce function is a bit of a problem. It can have remarkably bad performance if it is misused. Consequently, there is some debate about the value of having this function. We'll present the function along with the caveat that it can lead to remarkable slowness.

The map Function. The map function transforms a sequence into another sequence by applying a function to each item in the sequence. The idea is to apply a mapping tranformation to a sequence. This is is a common design pattern within numerous kinds of programs. Generally, the transformation is the interesting part of the programming, and the loop is just another boring old loop.

The function call map( aFunction , aSequence ) behaves as if it had the following definition.

def map( aFunction, aSequence ):
    return [ aFunction(v) for v in aSequence ]

For example:

>>> 

map( int, [ "10", "12", "14", 3.1415926, 5L ] )

[10, 12, 14, 3, 5]
            

This applies the int function to each element of the input sequence (a list that contains some strings, a floating point value and a long integer value) to create the output sequence (a list of integers).

The function used in map can be a built-in function, or a user-defined function created with the def statement (see Chapter 9, Functions ).

>>> 
def oddn(x):

... 
    return x*2+1

...
>>> 
map( oddn, range(6) )

[1, 3, 5, 7, 9, 11]

This example defines a function oddn, which creates an odd number from the input. The map function applies our oddn function to each value of the sequence created by range( 6 ). We get the first 6 odd numbers.

The filter Function. The filter function chooses elements from the input sequence where a supplied function is True. Elements for which the supplied function is False are discarded.

The function call filter ( aFunction , aSequence ) behaves as if it had the following definition.

def filter( aFunction, aSequence ):
    return [ v for v in aSequence if aFunction(v) ]

For example:

>>> 
def gt2( a ):

... 
    return a > 2

...
>>> 
filter( gt2, range(8) )

[3, 4, 5, 6, 7]
       

This example uses a function, gt2, which returns True for inputs greater than 2. We create a range from 0 to 7, but only keep the values for which the filter function returns True.

Here's another example that keeps all numbers that are evenly divisible by 3.

>>> 
def div3( a ):

... 
    return a % 3 == 0

...
>>> 
filter( div3, range(10) )

[0, 3, 6, 9]

Our function, div3, returns True when the remainder of dividing a number by 3 is exactly 0. This will keep numbers that are evenly divisible by 3. We create a range, apply the function to each value in the range, and keep only the values where the filter is True.

The reduce Function. The reduce function can be used to implement the common spread-sheet functions that compute sums and products. This function works by seeding a result with an initial value. It then calls the user-supplied function with this result and each value of the sequence. This is remarkably common, but it can't be done as a list comprehension.

The function call reduce ( aFunction , aSequence , [, init ]) behaves as if it had the following definition.

def reduce( aFunction, aSequence, init= 0 ):
    r= init
    for s in aSequence:
        r= aFunction( r, s )
    return r

The important thing to note is that the function is applied to the internal value, r, and each element of the list to compute a new internal value.

For example:

>>> 
def add(a,b):

... 
    return a+b

...
>>> 
reduce( add, range(10) )

45

This expression computes the sum of the 10 numbers from zero through nine. The function we defined, add, adds the previous result and the next sequence value together.

Here's an interesting example that combines reduce and map. This uses two functions defined in earlier examples, add and oddn.

for i in range(10):
    sq=reduce( add, map(oddn, range(i)), 0 )
    print i, sq

Let's look at the evaluation from innermost to outermost. The range( i ) generates a list of numbers from 0 to i-1. The map applies the oddn function form to create a sequence of i odd numbers from 1 to 2i+1. The reduce then adds this sequence of odd numbers. Interestingly, these sums add to the square of i.

The zip Function. The zip function interleaves values from two sequences to create a new sequence. The new sequence is a sequence of tuples. Each item of a tuple is the corresponding values from from each sequence.

>>> 
zip( range(5), range(1,20,2) )

[(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)]

In this example, we zipped two sequences together. The first sequence was range(5), which has five values. The second sequence was range(1,20,2) which has 10 odd numbers from 1 to 19. Since zip truncates to the shorter list, we get five tuples, each of which has the matching values from both lists.

The map function behaves a little like zip when there is no function provided, just sequences. However, map does not truncate, it fills the shorter list with None values.

>>> 
map( None, range(5), range(1,20,2) )

[(0, 1), (1, 3), (2, 5), (3, 7), (4, 9), (None, 11), (None, 13), (None, 15), 
(None, 17), (None, 19)]

   
  Published under the terms of the Open Publication License Design by Interspire  

Quantcast