yield, trees and recursion
16 May 2010I have been working with trees quite a bit lately, because I am coding something which involves probability trees: based on the state of the system, there is a number of things which can happen, each with a certain probability.
I ended up writing a simple generic Node
class, which can contain anything, and can have multiple children, along these lines:
public class Node<T>
{
public Node()
{
this.Children = new List<Node<T>>();
}
public T Content
{
get;
set;
}
public List<Node<T>> Children
{
get;
private set;
}
public bool IsLeaf
{
get
{
return (this.Children.Count() == 0);
}
}
}
Pretty quickly, I realized I would need to get the list of all nodes under a certain node, as well as the list of its leaves (a leaf being a node that has no children, i.e. an endpoint of the tree). This is a job tailor-made for recursion: if a node is a leaf, return it, otherwise, search further in all his children.
Something like this would get the job done:
public static IEnumerable<Node<T>> ClassicAllNodes(Node<T> root, IList<Node<T>> nodes)
{
nodes.Add(root);
foreach (var child in root.Children)
{
ClassicAllNodes(child, nodes);
}
return nodes;
}
With my recent interest in things LINQ-related, I wondered if there was a way to use the yield keyword here. Yield provides a way to return items as you are iterating through them, without having to explicitly create a collection to collect them, and can make for more elegant code.
public static IEnumerable<int> FactorialsUnder(int max)
{
var i = 1;
var fact = 1;
while (i <= max)
{
yield return fact;
i++;
fact *= i;
}
}
My first attempt looked like this:
public static IEnumerable<Node<T>> AllNodesFails(Node<T> root)
{
yield return root;
foreach (var child in root.Children)
{
yield return child;
AllNodesFails(child);
}
}
I thought this code looked pretty elegant. Unfortunately, it has a major drawback: it completely fails to enumerate all the nodes (as you probably guessed by the name of the method). You get the root, its children, and that’s it.
After some tinkering, I ended up writing this:
public static IEnumerable<Node<T>> AllNodesWorks(Node<T> root)
{
yield return root;
foreach (var node in root.Children)
{
foreach (var further in AllNodesWorks(node))
{
yield return further;
}
}
}
The good news is that as far as I can tell, this works. The bad news is that I don’t find this any clearer or more elegant than the original code; arguably, the nested foreach loops are awkward looking. In fact, while I have empirical reasons to believe it works, I still have a hard time following what’s happening. Furthermore, searching into StackOverflow (another recursion) yielded interesting discussions like this one and that one; the gist of it being that the “straighforward” approach is inefficient. The trees I am working with right now are fairly small, which makes that issue largely irrelevant. However, replacing clear code by something potentially confusing and inefficient doesn’t seem like such a great deal, so I chalked one up for “Sometimes, doing things the old school way is just fine”.