ℂDuce: Documentation: Tutorial: Polymorphism in ℂDuce (development only)

Previous page: Higher-order functions Next page: Exercises

What follows is code that can be executed in the current development version but it is not included in the stable version yet. The syntax is not finalized and may change before inclusion in a stable release.

Deep flatten

In ℂDuce there is a flatten operator that transforms a sequence of sequences into a sequence. To show the power of the polymorphic typing of ℂDuce let us define the deep variant of flatten, that we call flatten_d, which recursively flattens all nested sequences into a single flat one. Nested sequences are a convenient representation of trees, whose leaves are values that are not sequences. So let us first define the type of α-trees, that is of trees whose leaves are of type α:
type Tree('a) = ('a\[Any*]) | [ (Tree('a))* ] 
Type variables are identifiers prefixed by a quote (as in OCaml), so here 'a is a type variable (read "alpha") and Tree('a) is the type of trees whose leaves are of type 'a. These are either a leaf (i.e., any 'a value that is not a sequence: 'a\[Any*]) or a (possibly empty) sequence of Tree('a). The deep flatten function can then be defined as
let flatten_d ( (Tree('a)) -> ['a*] )
   | [] -> []
   | [h ; t] -> (flatten_d h)@(flatten_d t)
   | x -> [x]
that is a function that maps any α-tree into the sequence of its leaves (that is, into an α-sequence).
ℂDuce is the only language that can type this function truly polymorphically, in the sense that the function can be applied to any well typed expression. Indeed, notice that it is possible to unify the type Tree('a) with every type (in particular Any is a subtype of Tree(Any)). Therefore flatten_d can be applied to every well-typed expression. So for instance we have:
# flatten_d [ 3 'r' [ 4 [ `true 5 ]] [ "quo" [[ `false ] "stop" ] ] ];;
- : [ (Bool | 3--5 | 'o'--'u')* ] = [ 3 'r' 4 true 5 'quo' false 'stop' ]
notice how 'a was unified with the union of all the types of the leaves of the tree, that is [ (Bool | 3--5 | 'o'--'u')* ]. Without polymorphism the best type we can give to flatten_d would be Any->[(Any/[Any*])*], which is not very useful (it just states that the function returns a sequence that does not contain other sequences).
If we want to define a version of deep flatten that can be applied only to sequences, then this can be done by the following further definition
let flatten_tree (x: [(Tree('a))*]) : ['a*] = flatten_d x;; 
which restricts the applicability of the function only to sequences of α-trees.

Partial applications

ℂDuce type inference is pretty sofisticated. While in ML-like languages unification suffices to type the application of a function to an argument, to type the same application in ℂDuce requires to combine unification with subtyping and expansion. As an example consider the following definition
let even (Int -> Bool; ('a \ Int) -> ('a \ Int)) 
   | x&Int -> x mod 2 = 0 
   | x -> x;;
even is a function that applied to an integer it returns a Boolean (i.e., whether the integer is even or not) and applied to an argument of any other type, returns a result of the same type (i.e., the argument itself). Now take the classic functional map function which can be defined as follows (alternatively in ℂDuce you can call Stdlib.List.map function of the OCaml library):
let fmap (f : 'a -> 'b) (l : ['a*]) : ['b*] = 
  match l with 
    | [] -> [] 
    | (a,b) -> (f a, fmap f b)
of type ('a -> 'b) -> ['a*] -> ['b*]. For the partial application fmap even ℂDuce returns the following type:
# fmap even;;
- : ([ Int* ] -> [ Bool* ]) &
    ([ ('a \ Int)* ] -> [ ('a \ Int)* ]) &
    ([  ] -> [  ]) &
    ([ (Int | ('b \ Int))* ] -> [ (Bool | ('b \ Int))* ]) = <fun> 
This type states that (fmap even) is a function that (in the order): applied to a list of integers it returns a list of Booleans; applied to a list that does not contain integers it returns a list of the same type (actually, the same list); applied to the empty list returns the empty list; applied to a list that may contain some integers and other elements that are not integers (of type 'a \Int) it returns a list that contains some Booleans (that replaced all the integers of the list) and the other elements that are not integers. To obtain this high degree of precision the type system of ℂDuce has instantiated type type of fmap in four different ways (corresponding to all the possibilities of instantiation of the type 'a -> 'b with the type of even), computed the results of the applications, and taken their intersection.

While the definition of even is somewhat artificial, we show next that its type is not strange, since it follows a common pattern that, for instance, is used to type the balance function we present next.

Red-Black trees

As a second example we show that the use of polymorphic set-theoretic types yields a better definition of Okasaki's implementation of red-black trees that can be found in his excellent monograph [1].
A red and black tree is a colored binary search tree in which all nodes are colored either black or red and that satisfies 4 invariants:
  1. the root of the tree is black;
  2. the leaves of the tree are black;
  3. no red node has a red child;
  4. very path from the root to a leaf contains the same number of black nodes.
Thanks to our type system (and contrary to Okasaki's version) the implementation below ensures by typing that the operations on red-black trees (notably, the insertion) satisfy the first three invariants, as well as, that the ins_aux function, local to insertion, never returns empty trees (yet another important property that, in ML/Haskell Okasaki's version, types cannot ensure). We use α and β to pretty print the type variables'a and 'b respectively. Also we have slightly simplified the code. For the executable code see this file red-black.cd.
type RBtree(α) = Btree(α) | Rtree(α)

(* Black rooted RB tree: *)
type Btree(α) = [] | <black elem=α>[ RBtree(α) RBtree(α) ]

(* Red rooted RB tree: *)
type Rtree(α) = <red elem=α>[ Btree(α) Btree(α) ]

type Wrongtree(α) =    <red elem=α>( [ Rtree(α) Btree(α) ]
                                   | [ Btree(α) Rtree(α) ])

type Unbalanced(α) = <black elem=α>( [ Wrongtree(α) RBtree(α) ]
                                   | [ RBtree(α) Wrongtree(α) ])

let balance ( Unbalanced(α) -> Rtree(α) ; β\Unbalanced(α) -> β\Unbalanced(α) )
  | <black (z)>[ <red (y)>[ <red (x)>[ a b ] c ] d ]
  | <black (z)>[ <red (x)>[ a <red (y)>[ b c ] ] d ]
  | <black (x)>[ a <red (z)>[ <red (y)>[ b c ] d ] ]
  | <black (x)>[ a <red (y)>[ b <red (z)>[ c d ] ] ] ->
        <red (y)>[ <black (x)>[ a b ] <black (z)>[ c d ] ]
  | x -> x

let insert (x : α) (t : Btree(α)) : Btree(α) =
  let ins_aux ( [] -> Rtree(α);
                Btree(α)\[] -> RBtree(α)\[];
                Rtree(α) -> Rtree(α)|Wrongtree(α) )
    | [] -> <red elem=x>[ [] [] ]
    | (<(color) elem=y>[ a b ]) & z ->
           if x << y then balance <(color) elem=y>[ (ins_aux a) b ]
	   else if x >> y then balance <(color) elem=y>[ a (ins_aux b) ]
	   else z
    in match ins_aux t with
       | <_ (y)>[ a b ] -> <black (y)>[ a b ]
We invite the reader to refer to the excellent Okasaki's monograph for details about Okasaki's algorithm — that our code faithfully follows. Let us instead spend some words to comment the types, since they are the real novelty and the added value of our definition. First, notice that we used the full palette of our types: unions, intersections, negations (difference), and type variables. Red-black trees (Btrees) are black-rooted RBtrees (invariant 1), which are themselves black-rooted trees or red-rooted trees. The difference between the last two is that the latter cannot be leaves (invariant 2) and their children can only be black-rooted trees (invariant 3).
The insert function takes an element x of type α, and returns a function that maps red-black trees into red-black trees.
insert :: α -> Btree(α) -> Btree(α)
If the argument tree is empty a red-rooted tree is returned, otherwise the element is inserted in the appropriate subtree and the whole tree is then balanced by the function balance. This has the following type (which follows the same typing pattern as the function even defined in the introduction):
balance :: ( Unbalanced(α) -> Rtree(α) ) & ( β\Unbalanced(α) -> β\Unbalanced(α) )
This type states that balance transforms an unbalanced tree into a (balanced) red-rooted tree and leaves all other trees (in particular the balanced ones) unchanged. The core of our definition is the type of ins_aux:
ins_aux :: ( [] -> Rtree(α) )
         & ( Btree(α)\[] -> RBtree(α)\[] )
         & ( Rtree(α) -> Rtree(α)|Wrongtree(α) )
which precisely describes the behaviour of the function. Notice that the domain of ins_aux (which is the union of the three domains of the arrows forming its intersection type) is exactly RBtree. The intersection type describes the behaviour of ins_aux for each form of an RBtreeie, empty, black-rooted, and red-rooted—. The type system needs the full precision of this type to infer whether the calls to balance in the body of ins_aux are applied to a balanced or an unbalanced tree: even a slight approximation of this type, such as
ins_aux :: ( Btree(α)\[] -> RBtree(α)\[] )
         & ( Rtree(α)|[] -> Rtree(α)|Wrongtree(α) )
makes type-checking fail. By examining the type of ins_aux it is easy to see that ins_aux always returns either a (balanced) black-rooted tree or a tree with a red root in which one of the children may be a Rtree. In case of a tree with a red root, a balanced red-black tree is then obtained by changing the color of the root to black, as it is done in the last line of insert.
The implementation above must be compared with the corresponding version in monomorphic ℂDuce:
type RBtree = Btree | Rtree;;
type Btree = [] | <black elem=Int>[ RBtree RBtree ];;
type Rtree = <red elem=Int>[ Btree Btree ];;
type Wrongtree = Wrongleft | Wrongright;;
type Wrongleft = <red elem=Int>[ Rtree Btree ];;
type Wrongright = <red elem=Int>[ Btree Rtree ];;
type Unbalanced = <black elem=Int>([ Wrongtree RBtree ] | [ RBtree Wrongtree ])
let balance ( Unbalanced -> Rtree ; Rtree -> Rtree ; Btree\[] -> Btree\[] ;
              [] -> [] ; Wrongleft -> Wrongleft ; Wrongright -> Wrongright)
  | <black (z)>[ <red (y)>[ <red (x)>[ a b ] c ] d ]
  | <black (z)>[ <red (x)>[ a <red (y)>[ b c ] ] d ]
  | <black (x)>[ a <red (z)>[ <red (y)>[ b c ] d ] ]
  | <black (x)>[ a <red (y)>[ b <red (z)>[ c d ] ] ] ->
        <red (y)>[ <black (x)>[ a b ] <black (z)>[ c d ] ]
  | x -> x

let insert (x : Int) (t : Btree) : Btree =
let ins_aux ( [] -> Rtree ; Btree\[] -> RBtree\[]; Rtree -> Rtree|Wrongtree)
  | [] -> <red elem=x>[ [] [] ]
  | (<(color) elem=y>[ a b ]) & z ->
         if x << y then balance <(color) elem=y>[ (ins_aux a) b ]
	 else if x >> y then balance <(color) elem=y>[ a (ins_aux b) ]
	 else z
  in match ins_aux t with
     | <_ (y)>[ a b ] -> <black (y)>[ a b ]
which, besides being monomorphic, requires the introduction of several intermediate types (in particular Wrongleft and Wrongright) in order to describe the polymorphic behavior of balance —whose type results, thus, much more obscure—. As far as we know ours types are the only system that can statically ensure the invariants above simply by decorating (with types) the original Okasaki's code without any further modification.

[1] Purely Functional Data Structures, by Chris Okasaki. Cambridge University Press, 1998.

ℂDuce: Documentation: Tutorial: Polymorphism in ℂDuce (development only)

Previous page: Higher-order functions Next page: Exercises