If you need to generator example click Generator Examples
I was going to post "read page 19 of Beazley's 'Python: Essential Reference' for a quick description of generators", but so many others have posted good descriptions already.
Also, note that yield can be used in coroutines as the dual of their use in generator functions. Although it isn't the same use as your code snippet, (yield) can be used as an expression in a function. When a caller sends a value to the method using the send() method, then the coroutine will execute until the next (yield) statement is encountered.
Generators and coroutines are a cool way to set up data-flow type applications. I thought it would be worthwhile knowing about the other use of the yield statement in functions.
It's returning a generator. I'm not particularly familiar with Python, but I believe it's the same kind of thing as C#'s iterator blocks if you're familiar with those.
There's an IBM article which explains it reasonably well (for Python) as far as I can see.
The key idea is that the compiler/interpreter/whatever does some trickery so that as far as the caller is concerned, they can keep calling next() and it will keep returning values - as if the generator method was paused. Now obviously you can't really "pause" a method, so the compiler builds a state machine for you to remember where you currently are and what the local variables etc look like. This is much easier than writing an iterator yourself.
I feel like I post a link to this presentation every day: David M. Beazly's Generator Tricks for Systems Programmers. If you're a Python programmer and you're not extremely familiar with generators, you should read this. It's a very clear explanation of what generators are, how they work, what the yield statement does, and it answers the question "Do you really want to mess around with this obscure language feature?"
SPOILER ALERT. The answer is: Yes. Yes, you do.
The yield keyword is reduced to two simple facts:
If the compiler detects the yield keyword anywhere inside a function, that function no longer returns via the return statement. Instead, it immediately returns a lazy "pending list" object called a generator
A generator is iterable. What is an iterable? It's anything like a list or set or range or dict-view, with a built-in protocol for visiting each element in a certain order.
In a nutshell: a generator is a lazy, incrementally-pending list, and yield statements allow you to use function notation to program the list values the generator should incrementally spit out.
generator = myYieldingFunction(...)
x = list(generator)
[x, ..., ???]
[x, x, ..., ???]
[x, x, x, ..., ???]
[x, x, x] done
list==[x, x, x]
Let's define a function makeRange that's just like Python's range. Calling makeRange(n) RETURNS A GENERATOR:
# return 0,1,2,...,n-1
i = 0
while i < n:
i += 1
<generator object makeRange at 0x19e4aa0>
To force the generator to immediately return its pending values, you can pass it into list() (just like you could any iterable):
[0, 1, 2, 3, 4]
Comparing example to "just returning a list"
The above example can be thought of as merely creating a list which you append to and return:
# list-version # # generator-version
def makeRange(n): # def makeRange(n):
"""return [0,1,2,...,n-1]""" #~ """return 0,1,2,...,n-1"""
TO_RETURN =  #>
i = 0 # i = 0
while i < n: # while i < n:
TO_RETURN += [i] #~ yield i
i += 1 # i += 1
return TO_RETURN #>
[0, 1, 2, 3, 4]
There is one major difference though; see the last section.
How you might use generators
An iterable is the last part of a list comprehension, and all generators are iterable, so they're often used like so:
>>> [x+10 for x in makeRange(5)]
[10, 11, 12, 13, 14]
To get a better feel for generators, you can play around with the itertools module (be sure to use chain.from_iterable rather than chain when warranted). For example, you might even use generators to implement infinitely-long lazy lists like itertools.count(). You could implement your own def enumerate(iterable): zip(count(), iterable), or alternatively do so with the yield keyword in a while-loop.
Please note: generators can actually be used for many more things, such as implementing coroutines or non-deterministic programming or other elegant things. However, the "lazy lists" viewpoint I present here is the most common use you will find.
Behind the scenes
This is how the "Python iteration protocol" works. That is, what is going on when you do list(makeRange(5)). This is what I describe earlier as a "lazy, incremental list".
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
The built-in function next() just calls the objects .next() function, which is a part of the "iteration protocol" and is found on all iterators. You can manually use the next() function (and other parts of the iteration protocol) to implement fancy things, usually at the expense of readability, so try to avoid doing that...
Normally, most people would not care about the following distinctions and probably want to stop reading here.
In Python-speak, an iterable is any object which "understands the concept of a for-loop" like a list [1,2,3], and an iterator is a specific instance of the requested for-loop like [1,2,3].__iter__(). A generator is exactly the same as any iterator, except for the way it was written (with function syntax).
When you request an iterator from a list, it creates a new iterator. However, when you request an iterator from an iterator (which you would rarely do), it just gives you a copy of itself.
Thus, in the unlikely event that you are failing to do something like this...
> x = myRange(5)
[0, 1, 2, 3, 4]
... then remember that a generator is an iterator; that is, it is one-time-use. If you want to reuse it, you should call myRange(...) again. Those who absolutely need to clone a generator (for example, who are doing terrifyingly hackish metaprogramming) can use itertools.tee if absolutely necessary, since the copyable iterator Python PEP standards proposal has been deferred.
There's one extra thing to mention: a function that yields doesn't actually have to terminate. I've written code like this:
last, cur = 0, 1
last, cur = cur, last + cur
Then I can use it in other code like this:
for f in fib():
if some_condition: break
It really helps simplify some problems, and makes some things easier to work with.
yield is just like return. It returns whatever you tell it to. The only difference is that the next time you call the function, execution starts from the last call to the yield statement.
In the case of your code, the function get_child_candidates is acting like an iterator so that when you extend your list, it adds one element at a time to the new list.
list.extend calls an iterator until it's exhausted. In the case of the code sample you posted, it would be much clearer to just return a tuple and append that to the list.
An example in plain language. I will provide a correspondence between high-level human concepts to low-level python concepts.
I want to operate on a sequence of numbers, but I don't want to bother my self with the creation of that sequence, I want only to focus on the operation I want to do. So, I do the following:
I call you and tell you that I want a sequence of numbers which is produced in a specific way, and I let you know what the algorithm is.This step corresponds to defining the generator function, i.e. the function containing a yield.
Sometime later, I tell you, "ok, get ready to tell me the sequence of numbers".This step corresponds to calling the generator function which returns a generator object. Note that you don't tell me any numbers yet, you just grab your paper and pencil.
I ask you, "tell me the next number", and you tell me the first number; after that, you wait for me to ask you for the next number. It's your job to remember where you were, what numbers you have already said, what is the next number. I don't care about the details.This step corresponds to calling .next() on the generator object.
â¦ repeat previous step, untilâ¦
eventually, you might come to an end. You don't tell me a number, you just shout, "hold your horses! I'm done! No more numbers!"This step corresponds to the generator object ending its job, and raising a StopIteration exception The generator function does not need to raise the exception, it's raised automatically when the function ends or issues a return.
This is what a generator does (a function that contains a yield); it starts executing, pauses whenever it does a yield, and when asked for a .next() value it continues from the point it was last. It fits perfectly by design with the iterator protocol of python, which describes how to sequentially request for values.
The most famous user of the iterator protocol is the for command in python. So, whenever you do a:
for item in sequence:
it doesn't matter if sequence is a list, a string, a dictionary or a generator object like described above; the result is the same: you read items off a sequence one by one.
Note that defining a function which contains a yield keyword is not the only way to create a generator; it's just the easiest way to create one.
For more accurate information, read about iterator types, the yield statement and generators in the Python documentation.
Think of it this way:
An iterator is just a fancy sounding term for an object that has a next() method. So a yield-ed function ends up being something like this:
for i in xrange(4):
for i in some_function():
This is basically what the python interpreter does with the above code:
#start at -1 so that we get 0 when we add 1 below.
self.count = -1
#the __iter__ method will be called once by the for loop.
#the rest of the magic happens on the object returned by this method.
#in this case it is the object itself.
#the next method will be called repeatedly by the for loop
#until it raises StopIteration.
self.count += 1
if self.count < 4:
#a StopIteration exception is raised
#to signal that the iterator is done.
#This is caught implicitly by the for loop.
for i in some_func():
For more insight as to what's happening behind the scenes, the for loop can be rewritten to this:
iterator = some_func()
Does that make more sense or just confuse you more? :)
EDIT: I should note that this IS an oversimplification for illustrative purposes. :)
EDIT 2: Forgot to throw the StopIteration exception
Yield gives you a generator.
return range(1, i, 2)
for x in range(1, i, 2):
foo = get_odd_numbers(10)
bar = yield_odd_numbers(10)
[1, 3, 5, 7, 9]
<generator object yield_odd_numbers at 0x1029c6f50>
As you can see, in the first case foo holds the entire list in memory at once. It's not a big deal for a list with 5 elements, but what if you want a list of 5 million? Not only is this a huge memory eater, it also costs a lot of time to build at the time that the function is called. In the second case, bar just gives you a generator. A generator is an iterable--which means you can use it in a for loop, etc, but each value can only be accessed once. All the values are also not stored in memory at the same time; the generator object "remembers" where it was in the looping the last time you called it--this way, if you're using an iterable to (say) count to 50 billion, you don't have to count to 50 billion all at once and store the 50 billion numbers to count through. Again, this is a pretty contrived example, you probably would use itertools if you really wanted to count to 50 billion. :)
This is the most simple use case of generators. As you said, it can be used to write efficient permutations, using yield to push things up through the call stack instead of using some sort of stack variable. Generators can also be used for specialized tree traversal, and all manner of other things.
For those who prefer a minimal working example, meditate on this interactive Python session:
>>> def f():
... yield 1
... yield 2
... yield 3
>>> g = f()
>>> for i in g:
... print i
>>> for i in g:
... print i
>>> # Note that this time nothing was printed
There is one type of answer that I don't feel has been given yet, among the many great answers that describe how to use generators. Here is the PL theory answer:
The yield statement in python returns a generator. A generator in python is a function that returns continuations (and specifically a type of coroutines, but continuations represent the more general mechanism to understand what is going on).
Continuations in programming languages theory are a much more fundamental kind of computation than their utility in practice would give credit for. This is perhaps because they are extremely hard to reason about in the general case and also very difficult to implement. But the idea of what a continuation is, is straightforward: it is the state of a computation that has not yet finished.
To use a continuation basically entails that you start out at one point in your program, where the state of all variables looks like one thing and define the continuation. Then you make some changes, call some functions. Upon applying the continuation, your original state is restored.
Whenever yield is called, what it does is tell the function to return a continuation. When the function is called again, the function starts off in the place that it left off. So the function returns some value, and rebinds the function name to have the value of a continuation that starts off at some arbitrary point in the middle of the function again.
In other words, yield is sort of short for:
where g represents the rest of the generator (e.g. additional instances of yield) and function application actually is syntactic sugar for rebinding the rest of the function to the resulting generator.
as such the function returns both the current value, and the remainder of the computation to be performed. This design pattern is an example of continuation passing style, where the remaining computation is passed as a return argument. I use this notation instead of call/cc and mutation -- where the same behavior is achieved with control flow magic and global variables because it is significantly easier to understand and corresponds more closely to existing language constructs in python.
As a Python generator:
from itertools import islice
a, b = 1, 1
a, b = b, a + b
assert [1, 1, 2, 3, 5] == list(islice(fib_gen(), 5))
Using lexical closures instead of generators
def ftake(fnext, last):
return [fnext() for _ in xrange(last)]
#funky scope due to python2.x workaround
#for python 3.x use nonlocal
_.a, _.b = _.b, _.a + _.b
_.a, _.b = 0, 1
assert [1,1,2,3,5] == ftake(fib_gen2(), 5)
Using object closures instead of generators (because ClosuresAndObjectsAreEquivalent)
self.a, self.b = 1, 1
r = self.a
self.a, self.b = self.b, self.a + self.b
assert [1,1,2,3,5] == ftake(fib_gen3(), 5)
Shortcut to Grokking yield
When you see a function with yield statements, apply this easy trick to understand what will happen:
Insert a line result =  at the start of the function.
Replace each yield expr with result.append(expr).
Insert a line return result at the bottom of the function.
Yay - no more yield statements! Read and figure out code.
Revert function to original definition.
This trick may give you an idea of the logic behind the function, but what actually happens with yield is significantly different that what happens in the list based approach. In many cases the yield approach will be a lot more memory efficient and faster too. In other cases this trick will get you stuck in an infinite loop, even though the original function works just fine. Read on to learn more...
Don't confuse your Iterables, Iterators and Generators
First, the iterator protocol - when you write
for x in mylist:
Python performs the following two steps:
Gets an iterator for mylist:
Call iter(mylist) -> this returns an object with a next() method (or __next__() in Python 3).
[This is the step most people forget to tell you about]
Uses the iterator to loop over items:
Keep calling the next() method on the iterator returned from step 1. The return value from next() is assigned to x and the loop body is executed. If an exception StopIteration is raised from within next(), it means there are no more values in the iterator and the loop is exited.
The truth is Python performs the above two steps anytime it wants to loop over the contents of an object - so it could be a for loop, but it could also be code like otherlist.extend(mylist) (where otherlist is a Python list).
Here mylist is an iterable because it implements the iterator protocol. In a user defined class, you can implement the __iter__() method to make instances of your class iterable. This method should return an iterator. An iterator is an object with a next() method. It is possible to implement both __iter__() and next() on the same class, and have __iter__() return self. This will work for simple cases, but not when you want two iterators looping over the same object at the same time.
So that's the iterator protocol, many objects implement this protocol:
Built-in lists, dictionaries, tuples, sets, files.
User defined classes that implement __iter__().
Note that a for loop doesn't know what kind of object it's dealing with - it just follows the iterator protocol, and is happy to get item after item as it calls next(). Built-in lists return their items one by one, dictionaries return the keys one by one, files return the lines one by one, etc. And generators return... well that's where yield comes in:
for item in f123():
Instead of yield statements, if you had three return statements in f123() only the first would get executed, and the function would exit. But f123() is no ordinary function. When f123() is called, it does not return any of the values in the yield statements! It returns a generator object. Also, the function does not really exit - it goes into a suspended state. When the for loop tries to loop over the generator object, the function resumes from its suspended state, runs until the next yield statement and returns that as the next item. This happens until the function exits, at which point the generator raises StopIteration, and the loop exits.
So the generator object is sort of like an adapter - at one end it exhibits the iterator protocol, by exposing __iter__() and next() methods to keep the for loop happy. At the other end however, it runs the function just enough to get the next value out of it, and puts it back in suspended mode.
Why Use Generators?
Usually you can write code that doesn't use generators but implements the same logic. One option is to use the temporary list 'trick' I mentioned before. That will not work in all cases, for e.g. if you have infinite loops, or it may make inefficient use of memory when you have a really long list. The other approach is to implement a new iterable class SomethingIter that keeps state in instance members and performs the next logical step in it's next() (or __next__() in Python 3) method. Depending on the logic, the code inside the next() method may end up looking very complex and be prone to bugs. Here generators provide a clean and easy solution.
Here is a mental image of what yield does.
I like to think of a thread as having a stack (even if it's not implemented that way).
When a normal function is called, it puts its local variables on the stack, does some computation, returns and clears the stack. The values of its local variables are never seen again.
With a yield function, when it's called first, it similarly adds its local variables to the stack, but then takes its local variables to a special hideaway instead of clearing them, when it returns via yield. A possible place to put them would be somewhere in the heap.
Note that it's not the function any more, it's a kind of an imprint or ghost of the function that the for loop is hanging onto.
When it is called again, it retrieves its local variables from its special hideaway and puts them back on the stack and computes, then hides them again in the same way.
From a programming viewpoint, the iterators are implemented as thunks
To implement thunks (also called anonymous functions), one uses messages sent to a closure object, which has a dispatcher, and the dispatcher answers to "messages".
"next" is a message sent to a closure, created by "iter" call.
To understand what yield does, you must understand what generators are. And before generators come iterables.
When you create a list, you can read its items one by one, and it's called iteration:
>>> mylist = [1, 2, 3]
>>> for i in mylist:
Mylist is an iterable. When you use a list comprehension, you create a list, and so an iterable:
>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
Everything you can use "for... in..." on is an iterable: lists, strings, files...
These iterables are handy because you can read them as much as you wish, but you store all the values in memory and it's not always what you want when you have a lot of values.
Generators are iterators, but you can only iterate over them once. It's because they do not store all the values in memory, they generate the values on the fly:
>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
It is just the same except you used () instead of . BUT, you can not perform for i in mygenerator a second time since generators can only be used once: they calculate 0, then forget about it and calculate 1, and end calculating 4, one by one.
Yield is a keyword that is used like return, except the function will return a generator.
>>> def createGenerator():
... mylist = range(3)
... for i in mylist:
... yield i*i
>>> mygenerator = createGenerator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object createGenerator at 0xb7555c34>
>>> for i in mygenerator:
Here it's a useless example, but it's handy when you know your function will return a huge set of values that you will only need to read once.
To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object, this is a bit tricky :-)
Then, your code will be run each time the for uses the generator.
Now the hard part:
The first time the for calls the generator object created from your function, it will run the code in your function from the beginning until it hits yield, then it'll return the first value of the loop. Then, each other call will run the loop you have written in the function one more time, and return the next value, until there is no value to return.
The generator is considered empty once the function runs but does not hit yield anymore. It can be because the loop had come to an end, or because you do not satisfy a "if/else" anymore.
Your code explained
# Here you create the method of the node object that will return the generator
def node._get_child_candidates(self, distance, min_dist, max_dist):
# Here is the code that will be called each time you use the generator object:
# If there is still a child of the node object on its left
# AND if distance is ok, return the next child
if self._leftchild and distance - max_dist < self._median:
# If there is still a child of the node object on its right
# AND if distance is ok, return the next child
if self._rightchild and distance + max_dist >= self._median:
# If the function arrives here, the generator will be considered empty
# there is no more than two values: the left and the right children
# Create an empty list and a list with the current object reference
result, candidates = list(), [self]
# Loop on candidates (they contain only one element at the beginning)
# Get the last candidate and remove it from the list
node = candidates.pop()
# Get the distance between obj and the candidate
distance = node._get_dist(obj)
# If distance is ok, then you can fill the result
if distance <= max_dist and distance >= min_dist:
# Add the children of the candidate in the candidates list
# so the loop will keep running until it will have looked
# at all the children of the children of the children, etc. of the candidate
candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
This code contains several smart parts:
The loop iterates on a list but the list expands while the loop is being iterated :-) It's a concise way to go through all these nested data even if it's a bit dangerous since you can end up with an infinite loop. In this case, candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) exhausts all the values of the generator, but while keeps creating new generator objects which will produce different values from the previous ones since it's not applied on the same node.
The extend() method is a list object method that expects an iterable and adds its values to the list.
Usually we pass a list to it:
>>> a = [1, 2]
>>> b = [3, 4]
[1, 2, 3, 4]
But in your code it gets a generator, which is good because:
You don't need to read the values twice.
You can have a lot of children and you don't want them all stored in memory.
And it works because Python does not care if the argument of a method is a list or not. Python expects iterables so it will work with strings, lists, tuples and generators! This is called duck typing and is one of the reason why Python is so cool. But this is another story, for another question...
You can stop here, or read a little bit to see a advanced use of generator:
Controlling a generator exhaustion
>>> class Bank(): # let's create a bank, building ATMs
... crisis = False
... def create_atm(self):
... while not self.crisis:
... yield "$100"
>>> hsbc = Bank() # when everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print([corner_street_atm.next() for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # crisis is coming, no more money!
>>> wall_street_atm = hsbc.create_atm() # it's even true for new ATMs
>>> hsbc.crisis = False # trouble is, even post-crisis the ATM remains empty
>>> brand_new_atm = hsbc.create_atm() # build a new one to get back in business
>>> for cash in brand_new_atm:
... print cash
It can be useful for various things like controlling access to a resource.
Itertools, your best friend
The itertools module contains special functions to manipulate iterables. Ever wish to duplicate a generator?
Chain two generators? Group values in a nested list with a one liner? Map / Zip without creating another list?
Then just import itertools.
An example? Let's see the possible orders of arrival for a 4 horse race:
>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
<itertools.permutations object at 0xb754f1dc>
[(1, 2, 3, 4),
(1, 2, 4, 3),
(1, 3, 2, 4),
(1, 3, 4, 2),
(1, 4, 2, 3),
(1, 4, 3, 2),
(2, 1, 3, 4),
(2, 1, 4, 3),
(2, 3, 1, 4),
(2, 3, 4, 1),
(2, 4, 1, 3),
(2, 4, 3, 1),
(3, 1, 2, 4),
(3, 1, 4, 2),
(3, 2, 1, 4),
(3, 2, 4, 1),
(3, 4, 1, 2),
(3, 4, 2, 1),
(4, 1, 2, 3),
(4, 1, 3, 2),
(4, 2, 1, 3),
(4, 2, 3, 1),
(4, 3, 1, 2),
(4, 3, 2, 1)]
Understanding the inner mechanisms of iteration
Iteration is a process implying iterables (implementing the __iter__() method) and iterators (implementing the __next__() method).
Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate on iterables.
More about it in this article about how does the for loop work.