Identifiers

  • Type and pattern identifiers: words formed of Unicode letters and the underscore "_" character, starting with an uppercase letter.
  • Value identifiers: words formed of Unicode letters and the underscore "_" character, starting with a lowercase letter or an underscore.

Scalars

  • Large integers:
    • Values: 0,1,2,3,...
    • Types: intervals -*--10, 20--30, 50--*, ..., singletons 0,1,2,3,...
    • Operators: +,-,/,*,div,mod, int_of
  • Floats:
    • Values: none built-in.
    • Types: only Float.
    • Operators: float_of : String -> Float
  • Unicode characters:
    • Values: quoted characters ('a', 'b', 'c', ...,'あ', 'い', ... , '私', ... , '⊆', ...), codepoint-defined characters ('\xh;' '\d;' where h and d are hexadecimal and decimal integers respectively), and backslash-escaped characters ('\t' tab, '\n' newline, '\r' return, '\\' backslash).
    • Types: intervals 'a'--'z', '0'--'9', singletons 'a','b','c',...
    • Operators: char_of_int : Int -> Char, int_of_char : Char -> Int
  • Symbolic atoms:
    • Values: `A, `B, `a, `b, `true, `false, ...
    • Types: singletons `A, `B, ...
    • Operators: make_atom : (String,String) -> Atom, split_atom : Atom -> (String,String)
    • CDuce also supports XML Namespaces
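
The scalar types and conversions above can be sketched as follows. This is a minimal illustration, not taken from the manual: the names Byte, Digit, Flag, n, c, k and d are made up for the example.

```cduce
(* Interval types over integers and characters, and a union of atom singletons *)
type Byte  = 0--255
type Digit = '0'--'9'
type Flag  = `on | `off

(* Conversions listed above *)
let n = int_of "42"        (* parse a string as an integer *)
let c = char_of_int 97     (* character with the given code point *)
let k = int_of_char 'a'    (* code point of a character *)

(* Integer arithmetic: div is integer division, mod the remainder *)
let d = (n + 8) div 5
```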

Operators, built-in functions

  • Infix:
    @ : concatenation of sequences
    +,*,-,div,mod : Integer,Integer -> Integer
    =, <<, <=, >>, >= : t,t -> Bool, where Bool = `true | `false (for any non-functional type t)
    ||, && : Bool,Bool -> Bool
    not: Bool -> Bool
  • Prefix:
    load_xml : Latin1 -> AnyXml,
    load_html : Latin1 -> [ Any* ],
    load_file : Latin1 -> Latin1,
    load_file_utf8 : Latin1 -> String,
    dump_to_file : Latin1 -> String -> [],
    dump_to_file_utf8 : Latin1 -> String -> [],
    print_xml : Any -> Latin1,
    print_xml_utf8 : Any -> String,
    print : Latin1 -> [],
    print_utf8 : String -> [],
    dump_xml : Any -> [],
    dump_xml_utf8 : Any -> [],
    int_of : String -> Int,
    float_of : String -> Float,
    string_of : Any -> Latin1,
    char_of_int : Int -> Char,
    make_atom : (String,String) -> Atom,
    split_atom : Atom -> (String,String),
    system : Latin1 -> { stdout = Latin1; stderr = Latin1; status = (`exited,Int) | (`stopped,Int) | (`signaled,Int) },
    exit : 0--255 -> Empty,
    getenv : Latin1 -> Latin1,
    argv : [] -> [ String* ],
    raise : Any -> Empty

Pairs

  • Expressions: (e1,e2)
  • Types and patterns: (t1,t2)
  • Note: tuples are right-associative pairs; e.g.: (1,2,3)=(1,(2,3))
  • When a capture variable appears on both sides of a pair pattern, the two captured values are paired together (e.g. match (1,2,3) with (x,(_,x)) -> x ==> (1,3)).
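
The two notes above, written out as toplevel bindings. The names t and r are chosen for the example only.

```cduce
(* Tuples nest to the right, so (1,2,3) is the pair (1,(2,3)) *)
let t = (1,2,3)

(* x occurs on both sides of the pair pattern, so the two captured
   values (1 and 3) are paired together *)
let r = match t with (x,(_,x)) -> x
(* r is (1,3), as stated in the note above *)
```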

Sequences

  • Expressions: [ 1 2 3 ], which is syntactic sugar for (1,(2,(3,`nil)))
  • A sub-sequence can be escaped by !: [ 1 2 ![ 3 4 ] 5 ] is then equal to [ 1 2 3 4 5 ] .
  • Types and patterns : [ R ] where R is a regular expression built on types and patterns:
    • A type or a pattern is a regexp by itself, matching a single element of the sequence
    • Postfix repetition operators: *, +, ? and the ungreedy variants (for patterns) *?, +?, ??
    • Concatenation of regexps
    • For patterns, sequence capture variable x::R
  • It is possible to specify a tail, for expressions, types, and patterns; e.g.: [ x::Int*; q ]
  • Map: map e with p1 -> e1 | ... | pn -> en. Each element of e must be matched.
  • Transform: transform e with p1 -> e1 | ... | pn -> en. Unmatched elements are discarded; each branch returns a sequence and all the resulting sequences are concatenated together.
  • Selection: select e from p1 in e1 ... pn in en where e'. SQL-like selection with the possibility of using CDuce patterns instead of variables. e1 ... en must be sequences and e' a boolean expression.
  • Operators: concatenation e1 @ e2 = [ !e1 !e2 ], flattening flatten e = transform e with x -> x.
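
A short sketch of the sequence constructs above. The values and binding names are illustrative assumptions, not examples from the manual.

```cduce
(* A sub-sequence escaped by ! is spliced in: s is the same as [ 1 2 3 4 5 ] *)
let s = [ 1 2 ![ 3 4 ] 5 ]

(* map: every element must be matched by some branch *)
let doubled = map s with x -> x * 2

(* transform: each branch returns a sequence; elements that match no
   branch (here the characters) are discarded *)
let ints = transform [ 1 'a' 2 'b' ] with (x & Int) -> [ x ]

(* Sequence capture with a tail: x collects the leading integers,
   q is bound to the rest of the sequence *)
let split = match [ 1 2 'a' 'b' ] with [ x::Int*; q ] -> (x, q)

(* Concatenation and flattening *)
let joined = [ 1 2 ] @ [ 3 4 ]
let flat   = flatten [ [ 1 2 ] [ 3 4 ] ]
```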

Records

  • Record literals: { l1 = e1; ...; ln = en }
  • Types: { l1 = t1; ...; ln = tn } (closed, no more fields allowed), { l1 = t1; ...; ln = tn; .. } (open, any other field allowed). Optional fields: li =? ti instead of li = ti. Semicolons are optional.
  • Record concatenation: e1 + e2 (priority to the fields from the right argument)
  • Field removal: e1 \ l (does nothing if the field l is not present)
  • Field access: e1.l
  • Labels are in fact Qualified Names (see XML Namespaces)
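
The record operations above can be sketched like this. The field names and values are invented for the example.

```cduce
(* Closed record type with one optional field, and an open record type *)
type Person = { name = String; age = Int; city =? String }
type Named  = { name = String; .. }

let r = { name = "Ada"; age = 36 }

(* Concatenation: the field age from the right argument takes priority *)
let r2 = r + { age = 37; city = "London" }

(* Field removal and field access *)
let r3 = r2 \ city
let n  = r3.name
```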

Strings

  • Strings are actually sequences of characters.
  • Expressions: "abc", [ 'abc' ], [ 'a' 'b' 'c' ].
  • Operators: string_of, print, dump_to_file
  • PCDATA means Char* inside regular expressions
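
Since strings are sequences of characters, the sequence operators apply to them directly. A small sketch with made-up binding names:

```cduce
(* Three notations for the same string *)
let s1 = "abc"
let s2 = [ 'abc' ]
let s3 = [ 'a' 'b' 'c' ]
let same = (s1 = s2) && (s2 = s3)

(* Sequence concatenation works on strings *)
let greeting = "Hello, " @ "world"
let _ = print (greeting @ "\n")

(* PCDATA inside a regular expression stands for Char* *)
type Text = [ PCDATA ]
```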

XML elements

  • Expressions: <(tag) (attr)>content
  • If the tag is an atom `X, it can be written X (without the (..)). Similarly, parentheses and curly braces may be omitted when attr is a record l1=e1;...;ln=en (semicolons can also be omitted in this case). E.g.: <a href="abc">[ 'abc' ].
  • Types and patterns: same notations.
  • XPath-like projection: e/t. For every XML tree in e, it returns the sequence of children of type t.
  • Tree transformation: xtransform e with p1 -> e1 | ... | pn -> en. Applies to sequences of XML trees. Unmatched elements are left unchanged and the transformation is recursively applied to the sequence of children of the unmatched element; as for transform, each branch returns a sequence and all the resulting sequences are concatenated together.
  • Operators: load_xml : Latin1 -> AnyXml; print_xml : Any -> Latin1; dump_xml : Any -> []
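
A hedged sketch of building and querying a small XML value. The document content (book, author, year and their text) is invented for illustration, and the exact results depend on the CDuce version in use.

```cduce
(* An element with one attribute and three child elements *)
let doc =
  <book title="CDuce">[
    <author>"Alice"
    <author>"Bob"
    <year>"2003"
  ]

(* Projection: the children of doc that are author elements *)
let authors = [ doc ] / <author>_

(* xtransform: matched year elements are replaced by the empty sequence,
   everything else is left unchanged *)
let no_year = xtransform [ doc ] with <year>_ -> []

(* Patterns use the same notation as expressions *)
let title = match doc with <book title=t>_ -> t
```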

Functions

  • Expressions:
    • General form: fun f (t1->s1;...;tn->sn) p1 -> e1 | ... | pm -> em (f is optional)
    • Simple function: fun f (p : t) : s = e, equivalent to fun f (t -> s) p -> e
    • Multiple arguments: fun f (p1 : t1, p2 : t2,...) : s = e, equivalent to fun f ((p1,p2,...):(t1,t2,...)) : s = e (note the blank spaces around colons to avoid ambiguity with namespaces)
    • Curried function: fun f (p1 : t1) (p2 : t2) ... : s = e (can be combined with the multiple arguments syntax).
  • Types: t -> s
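
The four function forms above, bound to names so that they can be applied. This is a sketch under the assumption that the optional function name can be left out, as the general form states; all names are hypothetical.

```cduce
(* Simple function: one pattern with its input and output types *)
let succ = fun (x : Int) : Int = x + 1

(* Multiple arguments: really a single argument that is a pair *)
let add = fun (x : Int, y : Int) : Int = x + y

(* Curried form: takes its arguments one at a time *)
let add_c = fun (x : Int) (y : Int) : Int = x + y

(* General form: an interface with two arrows and one branch per case *)
let neg = fun (Int -> Int; Bool -> Bool)
    x & Int -> 0 - x
  | `true -> `false
  | `false -> `true

let a = succ 41
let b = add (1, 2)
let c = add_c 1 2
let d = neg `true
```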

Pattern matching, exceptions, ...

  • Type restriction: (e : t) (forgets any more precise type for e; note the blank spaces around colons to avoid ambiguity with namespaces)
  • Pattern matching: match e with p1 -> e1 | ... | pn -> en
  • Local binding: let p = e1 in e2, equivalent to match e1 with p -> e2; let p : t = e1 in e2 equivalent to let p = (e1 : t) in e2
  • If-then-else: if e1 then e2 else e3, equivalent to match e1 with `true -> e2 | `false -> e3
  • Exceptions:
    • Raise exception: raise e
    • Handle exception: try e with p1 -> e1 | ... | pn -> en
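
The constructs above combine as in the following sketch. The function names and the string used as an exception value are assumptions made for the example; any value can be raised.

```cduce
(* Type restriction: n is given type Int instead of the singleton type 3 *)
let n = (3 : Int)

(* Pattern matching on types *)
let describe = fun (x : (Int | String)) : String =
  match x with Int -> "an integer" | String -> "a string"

(* Local binding with a pair pattern *)
let total = let (q, r) = (7 div 2, 7 mod 2) in q + r

(* Raising and handling an exception *)
let safe_div = fun (x : Int, y : Int) : Int =
  if y = 0 then raise "division by zero" else x div y

(* The handler matches the raised string and returns a fallback value *)
let result = try safe_div (7, 0) with (msg & String) -> 0
```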

More about types and patterns

  • Boolean connectives: &,|,\ (| is first-match).
  • Empty and universal types: Empty,Any or _.
  • Recursive types and patterns: t where T1 = t1 and ... and Tn = tn.
  • Capture variable: x.
  • Default values: (x := c).

References

  • Type: ref T.
  • Construction: ref T e.
  • Dereferencing: !e1.
  • Assignment: e1 := e2.
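
The four reference operations above in sequence, as a small sketch with a hypothetical counter:

```cduce
(* Construction: a reference holding an Int, initialised to 0 *)
let counter = ref Int 0

(* Assignment, reading the current content with ! *)
let _ = counter := (!counter) + 1
let _ = counter := (!counter) + 1

(* Dereferencing *)
let now = !counter
```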

Toplevel statements

  • Global expression to evaluate.
  • Global let-binding.
  • Global function declaration.
  • Type declarations: type T = t.
  • Global namespace: namespace p = "...", namespace "...".
  • Source inclusion: include filename_string.
  • Debug directives: debug directive argument
    where directive is one of the following: accept, subtype, compile, sample, filter.
  • Toplevel directives: #env, #quit, #reinit_ns.