List Comprehensions and Generator Expressions in Python

List comprehensions and generator expressions perform similar computations and are almost identical in syntax. The difference between them is subtle but very important.

First let's look at their syntax.

lc = [i for i in range(5)]  # list comprehension
ge = (i for i in range(5))  # generator expression

As we can see, their syntax is almost identical. The only difference is that a list comprehension is enclosed in square brackets, while a generator expression is enclosed in parentheses.

If we use the resulting objects in a for loop, or in any other statement that iterates over their elements one at a time, we won't notice any difference between them.

for n in lc:
    print(n, end=' ')
# prints
# 0 1 2 3 4

for n in ge:
    print(n, end=' ')
# prints
# 0 1 2 3 4

The difference between a list comprehension and a generator expression lies in the manner in which they produce their results. A list comprehension performs its computation and creates a list that contains the resulting data. But a generator expression doesn't perform the computation immediately. Instead it returns a generator object which can produce the result data on demand.

This can be seen in the following example:

def square(n):
    print('square of %s' % n)
    return n * n

# List Comprehension
lc = [square(n) for n in range(5)]
# prints
# square of 0
# square of 1
# square of 2
# square of 3
# square of 4

# Generator Expression
ge = (square(n) for n in range(5))
# prints nothing

next(ge)
# prints
# square of 0
# and returns 0

next(ge)
# prints
# square of 1
# and returns 1

next(ge)
# prints
# square of 2
# and returns 4

# and so on...

print(type(lc))  # prints <class 'list'>
print(type(ge))  # prints <class 'generator'>

Advantages of Generators

Because they do not build a list up front, generator expressions can conserve memory, and they can often improve performance when processing large data sets, since no time is spent computing results that are never used.
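As a rough illustration of the memory claim, the sketch below compares the two objects with sys.getsizeof. Note that getsizeof reports only the size of the container object itself, not of the elements it refers to, and the exact numbers vary by Python version and platform. The size of one million items is an arbitrary choice for this example.

```python
import sys

n = 1_000_000  # arbitrary size chosen for illustration

squares_list = [i * i for i in range(n)]  # computes and stores every value now
squares_gen = (i * i for i in range(n))   # computes nothing yet

list_size = sys.getsizeof(squares_list)   # grows with n
gen_size = sys.getsizeof(squares_gen)     # small, and does not depend on n

print(list_size > gen_size)
# prints
# True
```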

Consider an application that reads a large file line by line and prints lines that contain some keyword.

logfile = open('access.log')
matches = (line for line in logfile if 'python' in line)

for line in matches:
    print(line, end='')  # each line already ends with a newline

Since we used a generator expression here, the lines of the file are read one by one as the for loop runs, not when the generator expression is created. If a list comprehension had been used instead, the entire file would have been read and all the matching lines stored in a list before the for loop started.

A generator expression is a highly efficient way to perform this operation. It provides the following advantages over list comprehensions in this scenario.

  1. It conserves memory because the matching lines are never all held in a list at once.
  2. It starts printing out the matches immediately without waiting for the entire file to be processed first.
  3. It avoids processing the entire file in case we need to break out of the loop after partially processing the file.
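The third point can be checked with a small sketch. Here an io.StringIO object stands in for the log file, and its four lines are made up for this example. After breaking out of the loop at the first match, whatever is still unread in the file shows how far the generator actually got.

```python
import io

# Hypothetical stand-in for a large log file.
logfile = io.StringIO(
    'GET /index.html\n'
    'GET /python/tutorial\n'
    'GET /about.html\n'
    'GET /python/docs\n'
)

matches = (line for line in logfile if 'python' in line)

for line in matches:
    print(line, end='')
    break  # stop after the first match
# prints
# GET /python/tutorial

# The lines after the first match were never read by the generator.
unread = logfile.read()
print(unread, end='')
# prints
# GET /about.html
# GET /python/docs
```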

As generator expressions do not create a list, the resulting object cannot be indexed or sliced and none of the standard list operations like sort or append will work. However, if required, a generator expression can be easily converted into a list using the built-in list() function.

matches = list(matches)
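A short sketch of both halves of that statement, using a small generator of squares instead of the log file: indexing the generator raises a TypeError, while the list built from it supports slicing and sorting as usual.

```python
ge = (n * n for n in range(5))

try:
    ge[0]  # generators do not support indexing
except TypeError:
    indexable = False
else:
    indexable = True
print(indexable)
# prints
# False

squares = list(ge)  # consume the generator into a list
print(squares[1:3])
# prints
# [1, 4]

squares.sort(reverse=True)
print(squares)
# prints
# [16, 9, 4, 1, 0]
```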

When to Use What

Now that we know the difference between list comprehensions and generator expressions, how do we decide which one to use?

Use list comprehensions when ...

  • We want a list as the final result.
  • We want to iterate over the result multiple times.
  • We want to slice or index into the resulting list.
  • We want to use any other list operations on the resulting list.
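The point about iterating multiple times deserves a quick demonstration: a generator is exhausted after a single pass, so a second pass over it yields nothing. A minimal sketch:

```python
lc = [n * n for n in range(3)]
ge = (n * n for n in range(3))

list_first, list_second = sum(lc), sum(lc)
print(list_first, list_second)
# prints
# 5 5

gen_first, gen_second = sum(ge), sum(ge)  # the second pass finds nothing left
print(gen_first, gen_second)
# prints
# 5 0
```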

Use generator expressions when ...

  • The generated result is intermediate, that is, it is consumed once by another function or loop and then discarded.
  • The data being iterated over is large or infinite.
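Both cases can be sketched briefly. The first passes a generator expression straight to sum(), so no intermediate list is built. The second draws from itertools.count(), which never ends; itertools.islice is used here as one way to take a finite number of items from it.

```python
from itertools import count, islice

# Intermediate result: the generator feeds sum() directly.
total = sum(n * n for n in range(5))
print(total)
# prints
# 30

# Infinite source: a list comprehension over count() would never finish,
# but the generator expression only computes what is asked for.
even_squares = (n * n for n in count() if n % 2 == 0)
first_three = list(islice(even_squares, 3))
print(first_three)
# prints
# [0, 4, 16]
```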