Value constructors expressions

The page Types and patterns presents the different kinds of values: scalar constants (integers, characters, atoms), structured values (pairs, records, sequences, XML elements), and functional values (abstractions). Values themselves are expressions, and the value constructors for structured values also operate on expressions.

This page presents the other kinds of expressions in the language.

Pattern matching

A fundamental operation in CDuce is pattern matching:

match e with
 | p1 -> e1
...
 | pn -> en

The first vertical bar | can be omitted. The semantics is to match the result of the evaluation of e against each pattern pi in turn. The first matching pattern triggers the corresponding expression on the right-hand side, which can use the variables bound by the pattern. Note that this is a first-match policy, as for disjunction patterns.

The static type system ensures that the pattern matching is exhaustive: the type computed for e must be a subtype of the union of the types accepted by all the patterns.

Local definition is a lighter notation for a pattern matching with a single branch:

let p = e1 in e2

is equivalent to:

match e1 with p -> e2

Note that the pattern p need not be a simple capture variable.
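As an illustration, here is a minimal sketch of both forms. The names classify and total are hypothetical and chosen only for this example. The first definition relies on the first-match policy (the constant pattern 0 is tried before the type pattern Int), and its three branches cover the whole input type Int | String, so the match is exhaustive. The second uses a pair pattern in a local definition.

```cduce
(* Hypothetical example: branches are tried in order, first match wins. *)
let classify (x : Int | String) : String =
  match x with
  | 0 -> "zero"
  | Int -> "a non-zero integer"
  | String -> "a string" ;;

(* The pattern of a let need not be a variable: here it is a pair pattern. *)
let total = let (a, b) = (3, 4) in a + b ;;   (* total is 7 *)
```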

Functions

Abstraction

The general form for a function expression is:

fun f (t1 -> s1; ...; tn -> sn)
 | p1 -> e1
...
 | pm -> em

The first line is the interface of the function, and the remainder is the body, which is a form of pattern matching (the first vertical bar | can thus be omitted).

The identifier f is optional; it is useful to define a recursive function (the body of the function can use this identifier to refer to the function itself).

The interface of the function specifies some constraints on the behavior of the function. Namely, when the function receives an argument of type, say, ti, the result (if any) must be of type si. The type system ensures this property by type-checking the body once for each constraint.

The function operates by pattern-matching the argument (which is a value) exactly as for standard pattern matching. Actually, it is always possible to add a line x -> match x with between the interface and the body without changing the semantics.
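A function with several constraints in its interface can be sketched as follows. The name add is hypothetical. The body is type-checked once against (Int,Int) -> Int and once against (String,String) -> String; for each constraint, only the branch that can actually match the argument type contributes to the result type.

```cduce
(* Hypothetical overloaded function: one body, two interface constraints. *)
let add ((Int,Int) -> Int; (String,String) -> String)
  | (x & Int, y & Int) -> x + y
  | (x & String, y & String) -> x @ y ;;

add (1, 2) ;;         (* evaluates to 3 *)
add ("ab", "cd") ;;   (* evaluates to "abcd" *)
```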

When there is a single constraint in the interface, there is an alternative notation, which is lighter for several arguments (that is, when the argument is a tuple):

fun f (p1 : t1, ..., pn : tn) : s = e

(note the blank spaces around the colons which are mandatory when the pattern is a variable [1]) which is strictly equivalent to:

fun f ((t1,...,tn) -> s) (p1,...,pn) -> e

It is also possible to define currified functions with this syntax:

fun f (p1 : t1, ..., pn : tn) (q1 : s1, ..., qm : sm) ... : s = e

which is strictly equivalent to:

fun f ((t1,...,tn) -> (s1,...,sm) -> ... -> s) 
 (p1,...,pn) -> 
  fun ((s1,...,sm) -> ... -> s)
   (q1,...,qm) -> 
     ...
     e
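For instance, a currified function and its partial application might look like this (plus and add_ten are hypothetical names introduced for the example):

```cduce
(* Hypothetical currified function: it takes x, then returns a function of y. *)
let plus (x : Int) (y : Int) : Int = x + y ;;

(* Partial application yields a function of type Int -> Int. *)
let add_ten = plus 10 ;;

add_ten 5 ;;   (* evaluates to 15 *)
```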

The standard notation for locally binding a function is:

let f = fun g (...) ... in ...

Here, f is the "external" name for the function, and g is the "internal" name (used when the function needs to call itself recursively, for instance). When the two names coincide (or when you don't need an internal name), there are lighter notations:

let fun f (...) ... in ...
let f (...) ... in ...

Application

The only way to use a function is ultimately to apply it to an argument. The notation is simply a juxtaposition of the function and its argument. E.g.:

(fun f (x : Int) : Int = x + 1) 10

evaluates to 11. The static type system ensures that applications cannot fail.

Note that even if there is no functional "pattern" in CDuce, it is possible to use in a pattern a type constraint with a functional type, as in:

fun (Any -> Int)
 | f & (Int -> Int) -> f 5 
 | x & Int -> x
 | _ -> 0

Exceptions

The following construction raises an exception:

raise e

The result of the evaluation of e is the argument of the exception.

It is possible to catch an exception with an exception handler:

try e with
 | p1 -> e1
...
 | pn -> en

Whenever the evaluation of e raises an exception, the handler tries to match the argument of the exception with the patterns (following a first-match policy). If no pattern matches, the exception is propagated.

Note that contrary to ML, there is no exception name: the only information carried by the exception is its argument. Consequently, it is the responsibility of the programmer to put enough information in the argument to recognize the correct exceptions. Note also that a branch (`A,x) -> e in an exception handler gives no static information about the capture variable x (its type is Any). Note: it is possible that the support for exceptions will change in the future to match ML-like named exceptions.
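Since exceptions carry no name, a common convention is to tag the argument with an atom. The sketch below assumes such a convention; the function name guarded and the atom `Negative are hypothetical. The type constraint k & Int in the handler is needed because the handler receives no static information about the exception argument.

```cduce
(* Hypothetical example: the exception argument is a pair (tag, payload). *)
let guarded (n : Int) : String =
  try
    if n << 0 then raise (`Negative, n) else "ok"
  with
   | (`Negative, k & Int) -> "negative input: " @ (string_of k)
   | _ -> "unexpected exception" ;;
```

The last branch catches every other exception; without it, any exception whose argument does not match the first pattern would be propagated.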

Record operators

There are three kinds of operators on records:

  • Field projection:
    e.l
    where l is the name of a label which must be present in the result of the evaluation of e. This construction is equivalent to: match e with { l = x } -> x. It is necessary to put whitespace between the expression and the dot when the expression is an identifier.
  • Record concatenation:
    e1 + e2
    The two expressions must evaluate to records, which are merged together. If both have a field with the same name, the one on the right has precedence. Note that the operator + is overloaded: it also operates on integers.
  • Field suppression:
    e \ l
    deletes the field l in the record resulting from the evaluation of e whenever it is present.
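The three operators can be tried on a small record. This is only a sketch: the record r and its fields are hypothetical, and the comments describe the resulting values rather than the exact output of the interpreter.

```cduce
let r = { x = 1; y = "a" } ;;

r . x ;;                    (* projection: 1 (note the space before the dot) *)
r + { y = "b"; z = 3 } ;;   (* concatenation: x = 1, y = "b", z = 3; the right-hand y wins *)
r \ x ;;                    (* suppression: only the field y = "a" remains *)
```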

Arithmetic operators

Binary arithmetic operators on integers: +,-,*,div,mod. Note that / is used for projection and not for division.

The operators +, - and * are typed using simple interval arithmetic. The operators div and mod produce a warning at compile time if the type of their second argument includes the integer 0.

The type Float represents floating point numbers. An operator float_of: String -> Float is provided to create values of this type. Currently, no other operators are provided for this type (but you can use OCaml functions to work on floats).

Generic comparisons, if-then-else

Binary comparison operators (returning booleans): =,<<,<=,>>,>=. Note that < is used for XML elements and is thus not available for comparison.

The semantics of the comparison is not specified when the values contain functions. Otherwise, the comparison gives a total ordering on CDuce values. The result type for all the comparison operators is Bool, except for equality when the arguments are known statically to be different (their types are disjoint); in this case, the result type is the singleton `false.

The if-then-else construction is standard:

if e1 then e2 else e3

and is equivalent to:

match e1 with `true -> e2 | `false -> e3

Note that the else-clause is mandatory.

The infix operators || and && denote respectively the logical or and the logical and. The prefix operator not denotes the logical negation.

Upward coercions

It is possible to "forget" that an expression has a precise type, and give it a super-type:

(e : t)

The type of this expression is t, and e must provably have this type (it can have a subtype). This "upward coercion" can be combined with the local let binding:

let p : t = e in ...

which is equivalent to:

let p = (e : t) in ...

Note that the upward coercion allows earlier detection of type errors, better localization in the program, and more informative messages.

CDuce also has a dynamic type-check construction:

(e :? t)
let p :? t = e in ...

If the value resulting from the evaluation of e does not have type t, an exception whose argument (of type Latin1) explains the reason of the mismatch is raised.

Sequences

The concatenation operator is written @. There is also a flatten operator which takes a sequence of sequences and returns their concatenation.

There are two built-in constructions to iterate over a sequence. Both have a very precise typing which takes into account the position of elements in the input sequence as given by its static type. The map construction is:

map e with
 | p1 -> e1
...
 | pn -> en

Note the syntactic similarity with pattern matching. Actually, map is a pattern matching form, where the branches are applied in turn to each element of the input sequence (the result of the evaluation of e). The semantics is to return a sequence of the same length, where each element in the input sequence is replaced by the result of the matching branch.

Contrary to map, the transform construction can return a sequence of a different length. This is achieved by letting each branch return a sequence instead of a single element. The syntax is:

transform e with
 | p1 -> e1
...
 | pn -> en

There is always an implicit default branch _ -> [] at the end of transform, which means that unmatched elements of the input sequence are simply discarded.

Note that map can be simulated by transform by replacing each expression ei with [ ei ].

Conversely, transform can be simulated by map by using the flatten operator. Indeed, we can rewrite transform e with ... as flatten (map e with ... | _ -> []).
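The following sketch contrasts the two constructions on a hypothetical sequence xs. The last expression spells out the simulation of transform by map and flatten described above.

```cduce
let xs = [ 1 2 3 4 ] ;;

(* map: same length, each element replaced by the result of its branch. *)
map xs with x -> x * 2 ;;                       (* [ 2 4 6 8 ] *)

(* transform: each branch returns a sequence; unmatched elements are dropped. *)
transform xs with (x & (2 | 4)) -> [ x x ] ;;   (* [ 2 2 4 4 ] *)

(* The same result, with the default branch made explicit. *)
flatten (map xs with (x & (2 | 4)) -> [ x x ] | _ -> []) ;;
```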

XML-specific constructions

Loading XML documents

The load_xml: Latin1 -> AnyXml built-in function parses an XML document on the local file system. The argument is the filename. The result type AnyXml is defined as:

type AnyXml = <(Atom) (Record)>[ (AnyXml|Char)* ]

If the support for netclient or curl is available, it is also possible to fetch an XML file from a URL, e.g.: load_xml "http://...". A special scheme string: is always supported: the string following the scheme is parsed as is.

There is also a load_html: Latin1 -> [Any*] built-in function to parse in a permissive way HTML documents.

Pretty-printing XML documents

Two built-in functions can be used to produce a string from an XML document:

print_xml: Any -> Latin1
print_xml_utf8: Any -> String

They fail if the argument is not an XML document (this isn't checked statically). The first operator print_xml prepares the document to be dumped to an ISO-8859-1 encoded XML file: Unicode characters outside Latin1 are escaped accordingly, and the operator fails if the document contains tag or attribute names which cannot be represented in ISO-8859-1. The second operator print_xml_utf8 always succeeds and produces a string suitable for being dumped to a UTF-8 encoded file. See the variants of the dump_to_file operator in the section on Input/output.

In both cases, the resulting string does not contain the XML prefix "<?xml ...>".

dump_xml: Any -> []
dump_xml_utf8: Any -> []

These functions behave as print_xml and print_xml_utf8 but send the result to the standard output.

Projection

The projection takes a sequence of XML elements and returns the concatenation of all their children with a given type. The syntax is:

e/t

which is equivalent to:

transform e with <_>[ (x::t | _)* ] -> x

For instance, the expression [ <a>[ <x>"A" <y>"B" ] <b>[ <y>"C" <x>"D"] ] / <x>_ evaluates to [ <x>"A" <x>"D" ] .

There is another form of projection to extract attributes:

e/@l

which is equivalent to:

transform e with <_ l=l>_ -> l

The dot notation can also be used to extract the value of the attribute for one XML element:

# <a x=3>[].x;;
- : 3 = 3

Iteration over XML trees

Another XML-specific construction is xtransform which is a generalization of transform to XML trees:

xtransform e with
 | p1 -> e1
...
 | pn -> en

Here, when an XML element in the input sequence is not matched by a pattern, the element is copied except that the transformation is applied recursively to its content. Elements in the input sequence which are not matched and are not XML elements are copied verbatim.
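As a sketch, the following renames every <b> element, at any depth, into <strong>. The document doc and its tag names are hypothetical.

```cduce
let doc = <doc>[ <b>"x" <p>[ <b>"y" <i>"z" ] ] ;;

(* <doc>, <p> and <i> are not matched: they are copied, and the
   transformation is applied recursively to their content. *)
xtransform [ doc ] with <b>c -> [ <strong>c ] ;;
(* a one-element sequence: <doc>[ <strong>"x" <p>[ <strong>"y" <i>"z" ] ] *)
```

With transform instead of xtransform, the unmatched <doc> element would simply be discarded and the result would be the empty sequence.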

Unicode Strings

Strings are nothing but sequences of characters, but in view of their importance when dealing with XML we introduced the standard double quote notation. So [ 'F' 'r' 'a' 'n' 'ç' 'e' ] can be written as "Françe". Within double quotes all the values of type Char can be used: so besides Unicode chars we can also double-quote codepoint-defined characters (\xh; \d; where h and d are hexadecimal and decimal integers respectively), and backslash-escaped characters (\t tab, \n newline, \r return, \\ backslash). However, we cannot use character expressions that are not values. For instance, for characters there is the built-in function char_of_int : Int -> Char which returns the character corresponding to the given Unicode codepoint (or raises an exception for a non-existent codepoint), and this can only be used with the regular sequence notation, thus "Françe", "Fran"@[(char_of_int 231)]@"e", and "Fran\231;e" are equivalent expressions.

Converting to and from string

Pretty-printing a value

The built-in function string_of: Any -> Latin1 converts any value to a string, using the same pretty-printing function as the CDuce interpreter itself.

Creating and decomposing atoms from strings

The built-in functions split_atom: Atom -> (String,String) and make_atom: (String,String) -> Atom convert between atoms and pairs of strings (namespace,local name).

Creating integers from strings

The operator int_of converts a string to an integer. The string is read in decimal (by default) or in hexadecimal (if it begins with 0x or 0X), octal (if it begins with 0o or 0O), or binary (if it begins with 0b or 0B). It fails if the string is not a decimal representation of an integer or if, in the case of a hexadecimal, octal, or binary representation, the integer cannot be contained in 64 bits. There is a type-checking warning when the argument cannot be proved to be of type [ '-'? '0'--'9'+ ] | ['-'? '0'('b'|'B') '0'--'1'+ ] | ['-'? '0'('o'|'O') '0'--'7'+ ] | ['-'? '0'('x'|'X') ('0'--'9'|'a'--'f'|'A'--'F')+] .

Creating characters from integers and integers from characters

Besides the built-in function string_of: Any -> Latin1, it is also possible to create characters, hence strings, from their codepoints: either by enclosing their code within a backslash (\x for hexadecimal code) and a semicolon, or by applying the built-in function char_of_int : Int -> Char. The reverse function int_of_char : Char -> Int is available as well.

Input-output

Displaying a string

To print a string to standard output, you can use one of the built-in functions print: Latin1 -> [] or print_utf8: String -> [].

Loading files

There are two built-in functions available to load a file into a CDuce string:

load_file: Latin1 -> Latin1
load_file_utf8: Latin1 -> String

The first one loads an ISO-8859-1 encoded file, whereas the second one loads a UTF-8 encoded file.

If the support for netclient or curl is available, it is also possible to fetch a file from a URL, e.g.: load_file "http://...".

Dumping to files

There are two operators available to dump a CDuce string to a file:

dump_to_file e1 e2
dump_to_file_utf8 e1 e2

The first one creates an ISO-8859-1 encoded file (it fails when the CDuce string contains non Latin1 characters), whereas the second one creates a UTF-8 encoded file. In both cases, the first argument is the filename and the second one is the string to dump.

System

Running external commands

The predefined function system executes an external command (passed to /bin/sh) and returns its standard output and standard error channels and its exit code. The type for system is:

Latin1 -> { stdout = Latin1; stderr = Latin1; 
             status = (`exited,Int) | (`stopped,Int) | (`signaled,Int) }

Terminating the program

The predefined function exit: 0--255 -> Empty terminates the current process. The argument is the exit code.

Accessing the environment

The built-in function getenv: Latin1 -> Latin1 queries the system environment for an environment variable. If the argument does not refer to an existing variable, the function raises the exception `Not_found.

Command line arguments

The built-in function argv: [] -> [ String* ] returns the sequence of command line arguments given to the current program.

Namespaces

It is possible in expression position to define a local prefix-namespace binding or to set a local default namespace.

namespace p = "..." in e
namespace "..." in e

See XML Namespaces for more details.

Imperative features

The construction ref T e is used to build a reference initialized with the result of the expression e; later, the reference can receive any value of type T. The reference is actually a value of type { get = [] -> T ; set = T -> [] }.

Two syntactic sugar constructions are provided to facilitate the use of references:

!e        ===  e.get []            Dereferencing 
e1 := e2  ===  e1.set e2           Assignment 

An expression of type [] is often considered as a command and followed by another expression. The sequencing operator gives a syntax for that:

e1 ; e2   ===  let [] = e1 in e2   Sequencing
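A minimal sketch of a mutable counter follows. The name counter is hypothetical, and the parentheses around the right-hand side of := are purely defensive.

```cduce
(* A reference that can hold any Int, initialized to 0. *)
let counter = ref Int 0 ;;

(* Each assignment has type []; it reads the current value and stores its successor. *)
counter := (!counter + 1) ;;
counter := (!counter + 1) ;;

!counter ;;   (* evaluates to 2 *)
```

Note that the declared type is Int and not the singleton type of the initial value 0: this is what allows later assignments of other integers.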

Queries

CDuce is endowed with a select_from_where syntax to perform some SQL-like queries. The general form of select expressions is

select e from
   p1 in e1,
   p2 in e2,
       :
   pn in en
where c

where e is an expression, c a boolean expression, the pi's are patterns, and the ei's are sequence expressions.

It works exactly as a standard SQL select expression, with the difference that relations (that is, sequences of tuples) after the in keyword can here be generic sequences, and that generic patterns, instead of just capture variables, can be used before the in keyword. So the result is the sequence of all values obtained by calculating e in the sequence of environments in which the free variables of e are bound by iteratively matching each pattern pi with every element of the sequence ei, provided that the condition c is satisfied. In other words, the first element of the result is obtained by calculating e in the environment obtained by matching p1 against the first element of e1, p2 against the first element of e2, ... , and pn against the first element of en; the second element of the result is obtained by calculating e in the environment obtained by matching p1 against the first element of e1, p2 against the first element of e2, ..., and pn against the second element of en, ... ; and so on.

Formally, the semantics of the select expression above is defined as:

transform e1 with p1 ->
   transform e2 with p2 ->
         ...
       transform en with pn ->
          if c then  [e] else []

A select expression works like a set of nested transform expressions. The advantage of using select rather than transform is that queries are automatically optimized by applying classical SQL logical optimization techniques (this automatic optimization can be disabled).

The built-in optimizer is free to move boolean conditions around to evaluate them as soon as possible. A warning is issued if a condition does not depend on any of the variables captured by the patterns.
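As a small illustration, consider a hypothetical sequence people of (name, age) pairs. The select expression and the transform expression below compute the same result; the second is the translation given by the formal semantics above for a single from clause.

```cduce
let people = [ ("Ann", 31) ("Bob", 17) ("Cid", 45) ] ;;

(* A pair pattern is used before the in keyword. *)
select name from (name, age) in people where age >= 18 ;;

(* Equivalent formulation with transform. *)
transform people with (name, age) -> if age >= 18 then [ name ] else [] ;;
```

Both expressions evaluate to the sequence containing the two strings "Ann" and "Cid".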

[1] The reason why the blank spaces are mandatory with variables is that the XML recommendation allows colons to occur in variables ("names" in XML terminology: see section on XML Namespaces), so the blanks disambiguate the variables. Actually only the blank on the right hand side is necessary: CDuce accepts fun f (x1 :t1, ..., xn :tn):s = e, as well (see also this paragraph on let declarations in the tutorial).