Copyright 2017-2024 Jason Ross, All Rights Reserved

Image by Sergey Isaev from Pixabay

Many languages have the concept of "generators", structures which can be used to produce, or generate, objects of another type. Normally these objects are returned one after another as some sort of iterable collection.

If all of that sounds a little too vague, let's make it more specific: In Python, a generator is a function or object that returns a series of values.

Because the values are returned one after another, they're usually used in loops or, in Python at least, list comprehensions. (Some of us allege that list comprehensions are just syntactic sugar for for loops, but saying that angers people, so we'll steer clear of that for the moment!)

Something that returns a sequence of values might sound familiar, especially if you've already read Iteration and Iterators. If you haven't, I'd recommend it.

In any case, generators look a lot like iterators because generators ARE iterators. Let's take some example code from Iteration and Iterators:


class Squares:

    def __init__(self, max_count):
        """
        Create an instance of the class returning the squares of
        the first max_count natural numbers.

        :param max_count: The number of squares to return.
        """
        self.max_count = max_count
        self.value = 0

    def __iter__(self):
        """
        Return an iterator for this object. This object represents an
        iterable collection, but is also an iterator because it
        implements the __next__() method.

        :return: An iterator for this object.
        """

        return self

    def __next__(self):
        """
        Calculate and return the next value in the series.

        :return: The next value, or raise a StopIteration exception
        if there are no further values.
        """

        if self.value < self.max_count:
            self.value += 1
            return self.value * self.value
        else:
            raise StopIteration

With iterators, you generally expect to see them running over a collection, but there doesn’t appear to be one in this example. Well, there is, but you can't see it because it uses lazy evaluation, with each value being calculated only when it's requested by the caller. The collection that’s being iterated across is "the set of squares of natural numbers".

So, although the previous article didn't mention this, the Squares iterator class is also a generator!
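To see it behave that way, here's a minimal sketch using the Squares class, repeated here without its docstrings so the snippet stands alone:

```python
class Squares:
    # The iterator class from above, condensed: lazily produces the
    # squares of the first max_count natural numbers.
    def __init__(self, max_count):
        self.max_count = max_count
        self.value = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.value < self.max_count:
            self.value += 1
            return self.value * self.value
        raise StopIteration


# Each square is calculated only when the loop asks for it.
print(list(Squares(4)))  # [1, 4, 9, 16]
```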

In a way that seems typical of Python though, that's not quite the whole story. In most languages this example would be a "generator class", but in Python it's a "custom iterator". Both are technically correct descriptions. The thing that should concern you as a software engineer is: that's an awful lot of code for something that returns results from a simple formula. If your generator needs to store a lot of state information, or is generally hideously complex, then using a "custom iterator" is the way to go. What if that's not the case though? There must be a simpler way...

There's A Simpler Way! (in Python at least.)

If all you want is a simple generator that you instantiate, run through once, and that doesn’t need to carry a lot of state, you can use a function. These functions are slightly different from regular Python functions, in that they use the yield keyword instead of return, but apart from that they’re the same. Unsurprisingly, these functions are called generator functions. An example is shown below:

def square_generator(count):
	for number in range(1, count + 1):
		yield number * number

So isn’t this just a regular function then?

Yes, and no. If you call this function, it looks like you might get the first item in the series, but you don’t:
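A quick check shows what actually comes back (the generator function from above is repeated so the sketch runs on its own):

```python
def square_generator(count):
    for number in range(1, count + 1):
        yield number * number


result = square_generator(4)
print(type(result).__name__)  # generator - no squares calculated yet
print(next(result))           # 1 - the first value appears only on request
```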

The result of calling the function is a generator object. That might seem strange, until you remember that the function is used in the same way you’d use an iterator:


>>> for i in square_generator(4):
	print(i)
	
1
4
9
16

What’s going on in this loop is the same as when you use an iterator: an iterator object is created, and then the built-in next() function is called on it repeatedly until it raises a StopIteration exception, which happens when the generator function returns at the end of its code.
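That protocol can be written out explicitly; the while loop in this sketch is roughly what the for loop does behind the scenes:

```python
def square_generator(count):
    for number in range(1, count + 1):
        yield number * number


# Roughly what 'for i in square_generator(4)' expands to.
iterator = iter(square_generator(4))
results = []
while True:
    try:
        value = next(iterator)   # run the generator up to its next yield
    except StopIteration:        # raised when the generator function returns
        break
    results.append(value)

print(results)  # [1, 4, 9, 16]
```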

So, what is this yield keyword?

The yield keyword is a replacement for the return keyword in generator functions. If a function contains the yield keyword, the Python interpreter treats it as a generator function instead of a regular function. This is all a little weird really – why would the presence of one, or more, instances of a keyword change the behaviour of what looks like a function into something different? Why not specify that a function is a generator in its header? That’s something that you’ll need to ask the designers of the Python language.

You mentioned a difference in behaviour between the function types – what’s that?

A normal Python function is called, executes, and returns a value. The initial call to a generator function doesn’t actually execute its body – it just creates and returns a generator object. It’s only when you call the built-in next() function on that object that the code in the function actually executes, up to and including the next yield statement.

This is very similar to an iterator, because generators ARE iterators remember: you create the iterator, then repeatedly call next() on it.

In Python, as in some other languages, yield doesn't just mean return the result. It means return this result, freeze the state of this generator, and resume at the next instruction the next time the next() function is called:

>>> g = square_generator(4)
>>> next(g)
1
>>> next(g)
4
>>> next(g)
9
>>> next(g)
16
>>> next(g)
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    next(g)
StopIteration

This part is important to remember: once the generator executes yield, the value is passed back to the caller and everything in the generator just stops until next() is called again by the calling code. Any variables in the generator remain unchanged. If you've opened a file in the generator, it stays open. If you've started a separate process, that will carry on because it's independent. The code below illustrates the operation of the yield keyword:


>>> def generate_squares(count):
	for number in range(1, count + 1):
		print(f'Calculating value {number}')
		yield number * number
		print(f'Just after yield of square of {number}')

		
>>> for i in generate_squares(4):
	print(f'Started loop for {i}')

	
Calculating value 1
Started loop for 1
Just after yield of square of 1
Calculating value 2
Started loop for 4
Just after yield of square of 2
Calculating value 3
Started loop for 9
Just after yield of square of 3
Calculating value 4
Started loop for 16
Just after yield of square of 4

List Comprehensions And Generators

Python list comprehensions can be very useful, and it seems you can’t read much about Python without their being mentioned. They have their disadvantages though, especially with large amounts of data.

For example, if you wanted to calculate the sum of the squares of all positive integers less than 500,000,000 that are divisible by 3 or 5, you might decide to write a list comprehension along the lines of:

squares = [s * s for s in range(1, 500000000) if s % 3 == 0 or s % 5 == 0]
sum(squares)

This does the job as you’d expect. It creates a list, puts all of the required numbers in it, then calculates the sum of those numbers.

What could be wrong with that?

Try starting a performance monitor and then running this code interactively, and you’ll soon see exactly what’s wrong with it: memory usage. Every number in the list has to be stored, and as they’re created during the initialization of the list the memory usage increases dramatically. If you’re lucky, your system will have enough memory to hold all of these numbers. If you’re not, the Python process will run out of memory and crash, taking any unsaved data with it.

An alternative way to solve the problem is to use a generator, like this:

squares = (s * s for s in range(1, 500000000) if s % 3 == 0 or s % 5 == 0)
sum(squares)

The only syntactic difference between the examples is that the list comprehension uses square brackets to create a list, whereas the second example uses regular parentheses to create a generator expression.

Simply changing the types of brackets in this case results in a complete change of behaviour. Instead of using a huge amount of memory, the generator simply returns values as they are requested by the caller. This is much slower than using a list in this example, but it’s balanced by using very little memory. As with any other software, you need to balance out speed vs memory to decide which option to use.
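You can see the memory difference directly with sys.getsizeof, which reports the size of the container object itself. This sketch uses a much smaller range so it runs quickly, but the pattern is the same at any scale:

```python
import sys

limit = 100000
squares_list = [s * s for s in range(1, limit) if s % 3 == 0 or s % 5 == 0]
squares_gen = (s * s for s in range(1, limit) if s % 3 == 0 or s % 5 == 0)

# The list stores every value; the generator stores only its frozen state.
print(sys.getsizeof(squares_list))  # grows with the range
print(sys.getsizeof(squares_gen))   # small and constant, whatever the range
print(sum(squares_gen) == sum(squares_list))  # True - same values either way
```

Note that sys.getsizeof only measures the container, not the integers it refers to, so the true gap in total memory use is even larger than these numbers suggest.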

What About Asynchronous Code?

There’s always someone who asks about asynchronous code, but with generators it’s easy. If you want to create an asynchronous generator function, since Python 3.6 all you have to do is use yield inside an async def function, in the same way as with a synchronous generator. This means you can write generators that work with asynchronous code easily and effectively.
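Here’s a minimal sketch of an asynchronous generator; the asyncio.sleep(0) call is just a stand-in for real asynchronous work such as a network request:

```python
import asyncio


async def async_squares(count):
    # 'yield' inside an 'async def' makes this an asynchronous generator.
    for number in range(1, count + 1):
        await asyncio.sleep(0)  # stand-in for real asynchronous work
        yield number * number


async def collect_squares(count):
    # Asynchronous generators are consumed with 'async for'.
    return [square async for square in async_squares(count)]


print(asyncio.run(collect_squares(4)))  # [1, 4, 9, 16]
```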

Summary

Generators are a very useful concept, especially in Python. They save memory and processing power because you usually use lazy evaluation with them, so values are only calculated when you need them.

They can be implemented using iterators or generator functions – whichever suits your requirements best. Once you start using them, they’ll make your code much simpler and easier to understand.

Made In YYC


Hosted in Canada by CanSpace Solutions