Functional Iteration in Go

Go provides a built in way to iterate easily over built-in collection types: maps, slices and chans. Iterating with these types is much like iterating using for-each type loops in languages like Python, Ruby and Java.

for key, value := range map {
    // key, value
}

for index, item := range slice {
    // index, item
}

for item := range channel {
    // item
}

Unfortunately, Go does not provide a way to iterate over user defined collections using range. I have been experimenting with how to solve this problem since the initial public release of Go and I have finally settled on a solution I like. This new solution uses a functional programming style.

Functional Iterators

type Iterator func()(item interface{}, next Iterator)

This iterator is a function. Calling the function yields the next item in the collection and a continuation function pointer. When the continuation function pointer becomes nil, there are no more items left in the collection. Here's an example of how to use a functional iterator from a test case in my data-structures repository:

var data []int = []int{
    1, 5, 7, 9, 12, 13, 17, 18, 19, 20,
}

j := 0
for k, v, next := tree.Iterate()(); next != nil; k, v, next = next() {
    if !k.Equals(types.Int(data[j])) {
        t.Error("Wrong key")
    }
    if v.(int) != j {
        t.Error("Wrong value")
    }
    j += 1
}

The functional approach allows the user to have a nice "linked-list" style interface while avoiding the problems with using channels to implement iteration. Using channels make iterators quite easy to write, consider this post-order traversal:

package main

import "fmt"

type Node struct {
    label int
    children []Node
}

func (self *Node) String() string {
    return fmt.Sprintf("<%v %v>", self.label, self.children)
}

func (self *Node) PostOrder() (<-chan int) {
    ch := make(chan int)
    go func(yield chan<- int) {
        var iter func(node *Node)
        iter = func(node *Node) {
            if node == nil {
                return
            }
            for _, child := range node.children {
                iter(&child)
            }
            yield<-node.label
        }
        iter(self)
        close(yield)
    }(ch)
    return ch
}

func main() {
    root := &Node{5, []Node{Node{3, []Node{Node{1, nil}, Node{4, nil}}}, Node{7, []Node{Node{6, nil}, Node{10, []Node{Node{8, nil}, Node{12, nil}}}}}}}
    fmt.Println(root)
    for i := range root.PostOrder() {
        fmt.Println(i)
    }
}

(try it on the playground http://play.golang.org/p/kUGDQvWhkb)

However, what happens if the consumer of the iterator needs to bail out early? Like so:

for i := range root.PostOrder() {
    fmt.Println(i)
    if i % 2 == 0 {
        break
    }
}

In this case the go-routine PostOrder is running in will leak. Since, it will be blocked on sending to the consumer who will never consume the item. The fix for this is to make the API more complicated by adding two way communication such that the consuming thread can indicate to the producing thread it no longer wants any more items. This can be a significant complication as it would be in the cute little post-order traversal presented above.

Functional Iteration to the Rescue (almost)

Functional iteration solves several of the problems with iterating using channels while maintaining an easy to use interface. Because all the code runs on the same go-routine no go-routines can leak. If a consumer stops using an iterator the garbage collector can clean it up. In the PostOrder traversal above the channel can leak because the producing go-routine maintains a reference. However, functional iterators are a bit tricky to write. Let's look at a few from the data-structures repository.

Hash Table Iteration.

func (self *Hash) Iterate() KVIterator {
    table := self.table
    i := -1
    var e *entry
    var kv_iterator KVIterator
    kv_iterator = func()(key Equatable, val interface{}, next KVIterator) {
        for e == nil {
            i++
            if i >= len(table) {
                return nil, nil, nil
            }
            e = table[i]
        }
        key = e.key
        val = e.value
        e = e.next
        return key, val, kv_iterator
    }
    return kv_iterator
}

How does the function work? The first thing to notice is the Iterate function returns a function kv_iterator which closes over some state: table, i and e. The table variable is the array backing the hash table. The i variable refers to which bucket is currently being examined. Finally, the e variable refers to which entry in the bucket is next to be returned to the consumer.

When the consumer calls the kv_iterator function for the first time e will be nil and the top loop will run until and entry is found or it has run through the entire table. When and entry is found, the key and value are stored and e is set to e.next. This assignment modifies the closed state of kv_iterator and will be preserved on the next call to kv_iterator. Finally, key, val and kv_iterator are returned to the consumer.

The main thing to understand about the kv_iterator function and all iterators of this style is they implement a tail-recursive approach to iteration. Thus, just like writing a tail-recursive function in Scheme or Lisp one must write the iterator such that no actions need to be taken after the data is returned to the user. This sometimes gets tricky, for instance in post-order iteration.

Post Order Iteration

package main

import "fmt"

type Node struct {
    label int
    children []Node
}

type IntIterator func()(label int, next IntIterator)

func (self *Node) String() string {
    return fmt.Sprintf("<%v %v>", self.label, self.children)
}

func (self *Node) PostOrder() IntIterator {
    type entry struct {
        tn *Node
        i int
    }

    pop := func (stack []entry) ([]entry, *Node, int) {
        if len(stack) <= 0 {
            return stack, nil, 0
        } else {
            e := stack[len(stack)-1]
            return stack[0:len(stack)-1], e.tn, e.i
        }
    }

    stack := append(make([]entry, 0, 10), entry{self, 0})

    var iterator IntIterator
    iterator = func()(label int, next IntIterator) {
        var i int
        var tn *Node

        if len(stack) <= 0 {
            return 0, nil
        }

        stack, tn, i = pop(stack)
        if tn == nil {
            return 0, nil
        }
        for i < len(tn.children) {
            kid := &tn.children[i]
            stack = append(stack, entry{tn, i+1})
            tn = kid
            i = 0
        }
        return tn.label, iterator
    }
    return iterator

}

func main() {
    root := &Node{5, []Node{Node{3, []Node{Node{1, nil}, Node{4, nil}}}, Node{7, []Node{Node{6, nil}, Node{10, []Node{Node{8, nil}, Node{12, nil}}}}}}}
    fmt.Println(root)
    for i, next := root.PostOrder()(); next != nil; i, next = next() {
        fmt.Println(i)
    }
}

(try it out yourself on the playground http://play.golang.org/p/M4ivyGL7GM)

This time, the iterator must keep a stack of items in order to function. The top entry contains the next item to process and location in its child list next to process. The function pops a node, tn, and child index, i, off of the stack. If the node, tn, has unprocessed children it gets the next child, and pushes tn back onto the stack (and increments i). The child then becomes tn. It continues traversing down the left most side of the tree until it gets to a leaf node. At that point, the function exits the loop and returns the final value for tn and the tn_iterator.

The next time the function is called, it will pop the stack and find the parent of the last node it returned to the consumer. If there are more children of parent left to processed it will repeat the process described above. Otherwise, the parent gets returned to the consumer and the stack shrinks.

Conclusion

Functional iterators represent a flexible way to implement generic iterators in Go. They are easy to use and do not leak resources. There only draw back in comparison to other approaches is greater difficulty in constructing the iterators. In order solve this libraries of data-structures and iterator generators should be constructed.

Hackthology

The Discourses of Tim Henderson