Thursday, January 27, 2011

Clojure's PersistentVector in F#

Clojure rocks. One of the attractive features are its efficient pure functional data structures. Most operations are effectively O(1).

Since F# is making more headway in my industry than Clojure is, I'm spending my sparse spare time studying F#. F# is great but the data structures aren't as great as the ones in Clojure. I'm sure that the next iteration of the powerpack or some extra library from Microsoft will have like super-awesome pure functional data structures. But for now, F# folks like me have to work around the lack of random access in linked lists or settle for the O(logn) with the built-in map.

As an effort to gain deeper understanding in both Clojure and F#, I decided to try implementing the Clojure data structures in F#. I'm starting with the persistent vector and I've put my initial work on github. So I started by googling to find out more about the design of the Clojure vector and finding out if anyone has already done this in some functional language. Here are some of the useful links I've found:

  • Understanding Clojure’s PersistentVector implementation
  • 20 Days of Clojure : Day 7
  • PersistentVector source code

    One departure from the Rich's original implementation is that I've implemented the trie/tree with discriminated union instead of an object array of object arrays.

    module Vector =
        // discriminated union
        type Tree<'a> = Leaf of 'a[] | Branch of Tree<'a>[]
        // ...
        type PersistentVector<'T>(cnt: int, shift: int, root: Tree<'T>, tail: 'T[]) =
        // ...
    

    Compare that to top of Rich's original and it gives you an idea of how to interpret the F# port.

    public class PersistentVector extends APersistentVector{
    final int cnt;
    final int shift;
    final Object[] root;
    final Object[] tail;
    // ...
    
    I also set index map width to a small value so that I could verify that structure is getting built properly.

    let blockSizeShift = 2 // is 5 in Rich's original implementation
        let blockSize = 1 <<< blockSizeShift
        let blockIndexMask = blockSize - 1
    
    The pushTail method was made into a function that returns a tuple with both return values. The original returned the new root and modified a simulated ref parameter with an expansion node/leaf if any.
    let rec pushTail<'T> level (root : Tree<'T>) (tail : 'T[]) : Tree<'T> * Tree<'T> option =
            match root with
                | Branch(arr) ->
                    if level = 0 then
                        addChild arr (Leaf(tail))
                    else
                        let lastchild = arr.[arr.Length - 1]
                        let (newlastchild, expansion) = pushTail (level - blockSizeShift) lastchild tail
                        match expansion with
                            | Some(xs) -> addChild arr xs
                            | _ -> 
                                let newarr = Array.copy arr
                                newarr.[newarr.Length - 1] <- newlastchild
                                (Branch(newarr), None)
                | _ -> failwith "root must be a Branch"
    
    Here's a short demo of what's been implemented so far.
    > open Clojurish.Vector;;
    > vector [1..16];;
    val it : int vector = seq [1; 2; 3; 4; ...]
    > let v = vector [0..255];;
    
    val v : int vector
    
    > v.[25];;
    val it : int = 25
    > v.assocN(25,52).[25];;
    val it : int = 52
    > v.assocN(0,111);;
    val it : PersistentVector< int> = seq [111; 1; 2; 3; ...]
    > v.Tail;;
    val it : int [] = [|252; 253; 254; 255|]
    

    I think it's time I found a unit testing library for F# so I can give this some proper testing. But it looks like it's doing the right thing so far. I'll probably implement more methods on it before I move onto the persistent hash map.
  • No comments:

    Post a Comment