First functions

A first example of transformation is names, which extracts the sequences of all names of parents in a ParentBook element:

let names (ParentBook -> [Name*])
    <parentbook>x -> (map x with <person ..>[ n  _*] -> n)

The name of the transformation is followed by an interface that states that names is a function from ParentBook elements to (possibly empty) sequences of Name elements. This is obtained by matching the argument of the function against the pattern

<parentbook>x 

which binds x to the sequence of person elements forming the parentbook. The operator map applies to each element of a sequence (in this case x) the transformation defined by the subsequent pattern matching. Here map returns the sequence obtained by replacing each person in x by its Name element. Note that we use the pattern

<person ..>[ n _*], 

to match the person elements: n matches (and captures) the Name element-that is, the first element of the sequence-, _* matches (and discards) the sequence of elements that follow, and person matches the tag of the person. Since elements of type Person contain attributes (actually, just the attribute gender) then we use .. to match (and discard) them. This is not necessary for the parentbook elements, but we could have specified it as well as <parentbook ..>x since .. matches any sequence of attibutes, the empty one as well.

The interface and the type definitions ensure that the tags will be the expected ones, so we could optimize the code by defining a body that skips the check of the tags:

<_> x -> (map x with <_ ..>[ n _*] -> n)

However this optimization would be useless since it is already done by the implementation (for technical details see this paper) and, of course, it would make the code less readable. If instead of extracting the list of all parents we wanted to extract the sublist containing only parents with exactly two children, then we had to replace transform for map:

let names2 (ParentBook -> [Name*])
   <parentbook> x -> 
      transform x with <person ..>[ n <children>[Person Person] _*] -> [n]

While map must be applicable to all the elements of a sequence, transform filters only those that make its pattern succeed. The right-hand sides return sequences which are concatenated in the final result. In this case transform returns the names only of those persons that match the pattern <person ..>[ n <children>[Person Person] _*]. Here again, the implementation compiles this pattern exactly as <_ ..>[ n <_>[_ _] _*], and in particular avoids checking that sub-elements of <children> are of type Person when static-typing enforces this property.

These first examples already show the essence of CDuce's patterns: all a pattern can do is to decompose values into subcomponents that are either captured by a variable or checked against a type.

The previous functions return only the names of the outer persons of a ParentBook element. If we want to capture all the name elements in it we have to recursively apply names to the sequence of children:

let names (ParentBook -> [Name*])
   <parentbook> x -> transform x with 
         <person ..> [ n  <children>c  _*] -> [n]@(names <parentbook>c)

where @ denotes the concatenation of sequences. Note that in order to recursively call the function on the sequence of children we have to include it in a ParentBook element. A more elegant way to obtain the same behavior is to specify that names can be applied both to ParentBook elements and to Children elements, that is, to the union of the two types denoted by (ParentBook|Children):

let names ( ParentBook|Children -> [Name*] )
   <_>x -> transform x with <person ..>[ n  c  _*] -> [n]@(names c)

Note here the use of the pattern <_> at the beginning of the body which makes it possible for the function to work both on ParentBook and on Children elements.

Regular Expressions

In all these functions we have used the pattern _* to match, and thus discard, the rest of a sequence. This is nothing but a particular regular expression over types. Type regexps can be used in patterns to match subsequences of a value. For instance the pattern <person ..>[ _ _ Tel+] matches all person elements that specify no Email element and at least one Tel element. It may be useful to bind the sequence captured by a (pattern) regular expression to a variable. But since a regexp is not a type, we cannot write, say, x&Tel+. So we introduce a special notation x::R to bind x to the sequence matched by the type regular expression R. For instance:

let domain (Email ->String) <_>[ _*?  d::(Echar+ '.' Echar+) ] -> d

returns the last two parts of the domain of an e-mail (the *? is an ungreedy version of *, see regular expressions patterns). If these ::-captures are used inside the scope of the regular expression operators * or +, or if the same variable appears several times in a regular expression, then the variable is bound to the concatenation of all the corresponding matches. This is one of the distinctive and powerful characteristics of CDuce, since it allows to define patterns that in a single match capture subsequences of non-consecutive elements. For instance:

type PhoneItem = {name = String; phones = [String*] }
let agendaitem (Person -> PhoneItem)
    <person ..>[<name>n  _  (t::Tel | _)*] ->
        { name = n ; phones = map t with <tel ..> s ->s }

transforms a person element into a record value with two fields containing the element's name and the list of all the phone numbers. This is obtained thanks to the pattern (t::Tel | _)* that binds to t the sequence of all Tel elements appearing in the person. By the same rationale the pattern

( w::<tel kind="work">_ | t::<tel kind=?"home">_ | e::<email>_ )*

partitions the (Tel | Email)* sequence into three subsequences, binding the list of work phone numbers to w, the list of other numbers to t, and the list of e-mails to e. Alternative patterns | follow a first match policy (the second pattern is matched only if the first fails). Thus we can write a shorter pattern that (applied to (Tel|Email)* sequences) is equivalent:

( w::<tel kind="work">_ | t::Tel | e::_ )*

Both patterns are compiled into

( w::<tel kind="work">_ | t::<tel ..>_ | e::_)*

since checking the tag suffices to determine if the element is of type Tel.

Storing phone numbers in integers rather than in strings requires minimal modifications. It suffices to use a pattern regular expression to strip off the possible occurrence of a dash:

let agendaitem2 (Person -> {name=String; phones=[Int*]})
  <person ..>[ <name>n  _  (t::Tel|_)* ] ->
      { name = n; phones = map t with <tel ..>[(s::'0'--'9'|_)*] -> int_of s }

In this case s extracts the subsequence formed only by numerical characters, therefore int_of s cannot fail because s has type [ '0'--'9'+ ] (otherwise, the system would have issued a warning) (Actually the type system deduces for s the following type [ '0'--'9'+ '0'--'9'+] (subtype of the former) since there always are at least two digits).

First use of overloading

Consider the type declaration

type PhoneBook = <phonebook>[PhoneItem*]

If we add a new pattern matching branch in the definition of the function names, we make it work both with ParentBook and PhoneBook elements. This yields the following overloaded function:

let names3 (ParentBook -> [Name*] ; PhoneBook -> [String*])    
      | <parentbook> x -> (map x with <person ..>[ n  _* ] -> n)
      | <phonebook> x -> (map x with { name=n ; .. } -> n) 

The overloaded nature of names3 is expressed by its interface, which states that when the function is applied to a ParentBook element it returns a list of names, while if applied to a PhoneBook element it returns a list of strings (in the pattern { name=n ; .. } the two points play the same role as in tags: they match and discard any other field of the record). We can factorize the two branches in a unique alternative pattern:

let names4 (ParentBook -> [Name*] ; PhoneBook -> [String*])    
     <_> x -> map x with ( <person ..>[ n  _* ] | { name=n ; ..} ) -> n

The interface ensures that the two representations will never mix.

If you try to enter the code for functions name3 and name4 in the online interpreter (after having entered the corresponing type declarations) the system will answer

val names4 : PhoneBook -> [ String* ] & ParentBook -> [ Name* ] = <fun>
val names3 : PhoneBook -> [ String* ] & ParentBook -> [ Name* ] = <fun>

Ok.

In words, the system confirms that both functions have the intersection type(PhoneBook -> [ String* ]) & (ParentBook -> [ Name* ]), that is, that they have both the types that compose the intersection, since when they are applied to a PhoneBook expressions return a possibly empty sequence of strings and when they are applied to a ParentBook expression they returns a possible empty sequence of Name values.

You can cut and paste the code on this page and test it on the online interpreter.