ℂDuce: Documentation: User's manual: Types and patterns

## Introduction

In CDuce, a type denotes a set of values, and a pattern extracts sub-values from a value. Syntactically, patterns and types are very akin with two notable exceptions: type variables cannot occur in patterns and capture (expression) variables do not occur in types. Indeed, any closed type (i.e., a type without type variables) can be seen as a pattern which accepts any value of that type and extracts nothing, and a pattern without any capture variable is nothing but a closed type.

Moreover, values
also share a common syntax with types and patterns. This is motivated
by the fact that basic and constructed values (that is, any values without
functional values inside) are themselves singleton types.
For instance ** (1,2)** is both a value, a type and a pattern.
As a type, it can be interpreted as a singleton type,
or as a pair type made of two singleton types.
As a pattern, it can be interpreted as a type constraint,
or as a pair pattern of two type constraints.

In this page, we present all the types and patterns that CDuce recognizes. It is also the occasion to present the CDuce values themselves, the corresponding expression constructions, and fundamental operations on them.

## Type variables, instances, and subtyping

Since version 1.0 type variables may occur in types. A type variable is written ** 'xxx** where “

**” is the single quote character and**

`'`**follows the rules for CDuce identifiers. E.g.:**

*xxx***.**

`'a, 'my-type, 'This`
A type **s** is an

*instance*of a type

**if**

*t***can be obtained from**

*s***by applying one or more type substitutions to it, that is, by replacing a given type for all occurrences of a type variable in**

*t***.**

*t*
A type is *closed* if no type variable occurs in it. A type is *open* if it is not closed. Intuitively, an open type denotes the set of all its closed instances and if an expression has an open type **t** it means that it is has all the types that are instances of

**(and, by subsumption, all their super-types, too: see below).**

*t*Type variables cannot occur in patterns.

As any other type, type variables can occur in expressions. For the scoping rules of type variables see Polymorphism, type variables scope.

A type **s** is a

*subtype*of a type

**(equivalently,**

*t***is a**

*t**supertype*of

**) if every value of type**

*s***is also a value of type**

*s***. In that case we say that**

*t***is a supertype of**

*t***. Since a type is a set of values (i.e., the set of all values that have that type), then subtyping corresponds to set containment on types as sets of values. Intuitively, an expression of a given type can be safely used wherever an expression of a super-type is expected: this property is called**

*s**subsumption*.

Type substitutions preserve subtyping: if two types are in a subtyping relation so are all the pair of instances obtained by applying a same type susbtitutions to the two types.

## Capture variables and default patterns

A value identifier inside a pattern behaves as a capture variable: it accepts and binds any value.

Another form of capture variable is the default value pattern
** ( x := c )** where

**is a capture variable (that is, an identifier), and**

*x***is a scalar constant. The semantics of this pattern is to bind the capture variable to the constant, disregarding the matched value (and accepting any value).**

*c*
Such a pattern is useful in conjunction with the first match policy
(see below) to define "default cases". For instance, the pattern
** ((x & Int) | (x := 0), (y & Int) | (y := 0))**
accepts any pair and bind

**to the left component if it is an integer (and**

`x`**otherwise), and similarly for**

`0`**with the right component of the pair.**

`y`## Boolean connectives

CDuce recognizes the full set of boolean connectives, whose interpretation is purely set-theoretic.

denotes the empty type (it contains no value).`Empty`and`Any`denote the universal type (all the values); the preferred notation is`_`for types and`Any`for patterns, but they are strictly equivalent.`_`is the conjunction boolean connective. The type`&`has all the values that belongs to*t1*&*t2*and to*t1*. Similarly, the pattern*t2*accepts all the values accepted by both sub-patterns; a capture variable cannot appear on both side of this pattern.*p1*&*p2*is the disjunction boolean connective. The type`|`has all the values that belongs either to*t1*|*t2*or to*t1*. Similarly, the pattern*t2*accepts all the values accepted by any of the two sub-patterns; if both match, the first match policy applies, and*p1*|*p2*dictates how to capture sub-values. The two sub-patterns must have the same set of capture variables.*p1*is the difference boolean connective. The left hand-side can be a type or a pattern, but the right-hand side is necessarily a type (no capture variable).`\`

**Priorities**

For Boolean connectives we have that the priority of intersection is hiher than unions and lower than differences. The priority of all Boolean connectives is higher than the function type constructor ** ->** and lower than the XML type constructor. Therefore

**is parsed as**

`Int -> Int & Bool -> Bool`**while**

`Int -> (Int & Bool) -> Bool`**is parsed as**

`<a>[Int*]|String/Bool`

`(<a>[Int*])|(String/Bool)`A type with type variables is empty if and only if all its instances are empty

## Type declarations, recursive types and recursive patterns

A set of mutually recursive types can be defined by toplevel type declarations, as in:

type T1 = <a>[ T2* ] type T2 = <b>[ T1 T1 ]

It is also possible to use the syntax
**T where T1 = t1 and ... and Tn = tn**
where

**and the**

*T***are type identifiers and the**

*Ti***are type expressions. The same notation works for recursive patterns (for which there is no toplevel declaration).**

*ti*
There is an important restriction concerning recursive types:
any cycle must cross a *type constructor* (pairs, records, XML
elements, arrows). Boolean connectives do *not* count as type
constructors! The code sample above is a correct definition.
The one below is invalid, because there is an unguarded cycle
between ** T** and

**.**

`S`type T = S | (S,S) (* INVALID! *) type S = T (* INVALID! *)

**(What follows is for the development version only:)**

Type declarations can be parametrized by one or more type variables. The general syntax is:

`type name('a1,...,'an) = t`

where **name** is a type identifier,

*immediately followed by an opening parenthesis*and a list of type variables

**that must occur in**

`'a1, ..., 'an`

*t*For instance, a possible definition for associative maps parametric in types of the pairs is:

type Map('a,'b) = ( ('a,'b) , Map('a,'b) ) | `Nil

The same restriction for cycles applies also in this case for the
whole parametric type. In this case the recursion variable is given
by the name *and* the type parameters which must always appear in the same order (e.g., in the definition above
** Map('a,'b)**). So for instance a definition such as

type Map('a,'b) = ( ('a,'b) , Map('b,'a) ) | `Nil (* ERROR: Invalid instantiation of type 'Map' *)

is rejected since ** Map('a,'b)** and

**are not textually the same.**

`Map('b,'a)`Parametric type variables can then be instantiated by any type, such as

**. The parametric notation is just a shorthand for the whole definition, therefore an instance is completely equivalent to using the right hand-side of the definition of the parametric types where the type variables are replaced by the specified types. This also means that parameters that are not used in the definition are useless.**

`Map(String , ('a,'a) -> Bool)`## Scalar types

CDuce has three kinds of atomic (scalar) values: integers, characters, and atoms. To each kind corresponds a family of types.

**Integers**.

CDuce integers are arbitrarily large. An integer literal is a sequence of decimal digits, plus an optional leading unary minus () character.`-`: all the integers.`Int`(where*i*--*j*and*i*are integer literals, or*j*for infinity): integer interval. E.g.:`*`,`100--*`[1] (note that`*--0`stands both for plus and minus infinity).`*`(where*i*is an integer literal): integer singleton type.*i*

**Floats**.

CDuce provider minimal features for floats. The only way to construct a value of typeis by the function`Float``float_of : String -> Float`**Characters**.

CDuce manipulates Unicode characters. A character literal is enclosed in single quotes, e.g.. The single quote and the backslash character must be escaped by a backslash:`'a', 'b', 'c'`,`'\''`. The double quote can also be escaped, but this is not mandatory. The usual`'\\'`are recognized. Arbitrary Unicode codepoints can be written in decimal`'\n', '\t', '\r'`(`'\`*i*;'is an decimal integer; note that the code is ended by a semicolon) or in hexadecimal*i*. Any other occurrence of a backslash character is prohibited.`'\x`*i*;': all the Unicode character set.`Char`(where*c*--*d*and*c*are character literals): interval of Unicode character set. E.g.:*d*.`'a'--'z'`(where*c*is a character literal): character singleton type.*c*: all the Latin1 character set (equivalent to`Byte`).`'\0;'--'\255;'`

**Atoms**.

Atoms are symbolic elements. They are used in particular to denote XML tag names, and also to simulate ML sum type constructors and exceptions names. An atom is writtenwhere “```*xxx*” is the backquote character and```follows the rules for CDuce identifiers) E.g.:*xxx*. The atom``yes, `No, `my-name`is used to denote empty sequences.``nil`: all the atoms.`Atom`(where*a*is an atom literal): atom singleton type.*a*: the two atoms`Bool`and``true`.``false`- See also: XML Namespaces.

## Pairs

The notion of Pairs is fundamental in CDuce, since pairs constitute a building block for sequences. Even if syntactic sugar somehow hides pairs when you use sequences, it is important to know that the underlying representation of sequences are nested pairs.

A pair expression is written ** (e1,e2)**
where

**and**

*e1***are expressions.**

*e2*
Similarly, pair types and patterns are written
** (t1,t2)** where

**and**

*t1***are types or patterns. E.g.:**

*t2***.**

`(Int,Char)`
When a capture variable **x** appears on both
side of a pair pattern

**, the semantics is the following one: when a value match**

*p*= (*p1*,*p2*)**, if**

*p***is bound to**

*x***by**

*v1***and to**

*p1***by**

*v2***, then**

*p2***is bound to the pair**

*x***by**

*(v1,v2)***.**

*p*
Tuples are syntactic sugar for pairs. For instance,
** (1,2,3,4)** denotes

**.**

`(1,(2,(3,4)))`** Pair**: the type of all pairs of values (it is equivalent to

**).**

`(Any,Any)`## Sequences

### Values and expressions

Sequences are a key ingredient of CDuce. They represent both the content of XML elements and strings of characters. Actually, as said before, they are only syntactic sugar over pairs.

Sequences expressions are written inside square brackets; element
are simply separated by whitespaces:
** [ e1 e2 ... en ]**.
Such an expression is syntactic sugar for:

**. E.g.:**

`(`*e1*,(*e2*,*...*(*en*,`nil)*...*))**.**

`[ 1 2 3 4 ]`
The binary operator ** @** denotes sequence concatenation.
E.g.:

**evaluates to**

`[ 1 2 3 ] @ [ 4 5 6 ]`**.**

`[ 1 2 3 4 5 6 ]`
It is possible to specify a terminator different from ** `nil**;
for instance

**denotes**

`[ 1 2 3 4 ;`*q*]**, and is equivalent to**

`(1,(2,(3,(4,`*q*))))**.**

`[ 1 2 3 4 ] @`*q*
Inside the square brackets of a sequence expression, it is possible
to have elements of the form ** ! e** (which is not
an expression by itself), where

**is an expression which should evaluate to a sequence. The semantics is to "open"**

*e***. For instance:**

*e***evaluates to**

`[ 1 2 ![ 3 4 ] 5 ]`**. Consequently, the concatenation of two sequences**

`[ 1 2 3 4 5 ]`**can also be written**

*e1*@*e2***or**

`[ !`*e1*!*e2*]**.**

`[ !`*e1*;*e2*]### Types and patterns

In CDuce, a sequence can be heterogeneous: the element can all have
different types. Types and patterns for sequences are specified
by regular expressions over types or patterns. The syntax is
** [ R ]** where

**is a regular expression, which can be:**

*R*- A type or a pattern, which correspond to a single element in the
sequence (in particular,
represents sequences of length 1,`[ _ ]`*not*arbitrary sequences). - A juxtaposition of regular expressions
which represents concatenation.*R1**R2* - A union of regular expressions
.*R1*|*R2* - A postfix repetition operator; the greedy operators are
,*R*?,*R*+, and the ungreedy operators are:*R**,*R*??,*R*+?. For types, there is no distinction in semantics between greedy and ungreedy.*R**? - A sequence capture variable
(only for patterns, of course). The semantics is to capture in*x*::*R*the subsequence matched by*x*. The same sequence capture variable can appear several times inside a regular expression, including under repetition operators; in that case, all the corresponding subsequences are concatenated together. Two instances of the same sequence capture variable cannot be nested, as in*R*.`[x :: (1 x :: Int)]`

Note the difference betweenand`[ x::Int ]`. Both accept sequences made of a single integer, but the first one binds`[ (x & Int) ]`to a sequence (of a single integer), whereas the second one binds it to the integer itself.`x` -
Grouping
. E.g.:`(`*R*).`[ x::(Int Int) y ]` -
Tail predicate
. The type/pattern`/`*p*applies to the current tail of the sequence (the subsequence starting at the current position) without consuming it. E.g.:*p*will bind`[ (Int /(x:=1) | /(x:=2)) _* ]`to`x`if the sequence starts with an integer and`1`otherwise (notice the presence of`2`to consume the tail of the sequence).`_*` -
Repetition
where*R****n*is a positive integer constant, which is just a shorthand for the concatenation of*n*copies of*n*.*R*

Sequence types and patterns also accepts the ** [ ...; ... ]**
notation. This is a convenient way to discard the tail of a sequence
in a pattern, e.g.:

**, which is equivalent to**

`[ x::Int* ; _ ]`**.**

`[ x::Int* _* ]`
It is possible to use the ** @**
operator (sequence concatenation) on types, including in recursive
definitions. E.g.:

type t = [ <a>(t @ t) ? ] (* [s?] where s=<a>[ s? s? ] *) type x = [ Int* ] type y = x @ [ Char* ] (* [ Int* Char* ] *) type t = ([Int] @ t) | [] (* [ Int* ] *)

however when used in recursive definitions ** @** but must be right linear so for instance the following definition are not allowed:

type t = (t @ [Int]) | [] (* ERROR: Ill-formed concatenation loop *) type t = t @ t (* ERROR: Ill-formed concatenation loop *)

## Strings

In CDuce, character strings are nothing but sequences of characters.
The type ** String** is pre-defined as

**. This makes it possible to use the full power of regular expression pattern matching with strings.**

`[ Char* ]`
Inside a regular expression type or pattern, it is possible
to use ** PCDATA** instead of

**(note that both are not types on their own, they only make sense inside square brackets, contrary to**

`Char*`**).**

`String`
The type ** Latin1** is the subtype of

**defined as**

`String`**; it denotes strings that can be represented in the ISO-8859-1 encoding, that is, strings made only of characters from the Latin1 character set.**

`[ Byte* ]`
Several consecutive characters literal in a sequence can be
merged together between two single quotes:
** [ 'abc' ]** instead of

**. Also it is possible to avoid square brackets by using double quotes:**

`[ 'a' 'b' 'c' ]`**. The same escaping rules applies inside double quotes, except that single quotes may be escaped (but must not), and double quotes must be.**

`"abc"`## Records

Records are finite sets of (name,value) bindings. They are used in particular to represent XML attribute sets. Names are actually Qualified Names (see XML Namespaces).

The syntax of a record expression is
** { l1=e1; ...; ln=en }**
where the

**are label names (same lexical conventions as for identifiers), and the**

*li***are expressions. When an expression**

*vi***is simply a variable whose name match the field label**

*ei***, it is possible to omit it. E.g.:**

*li***is equivalent to**

`{ x; y = 10; z }`**. The semi-colons between fields are optional.**

`{ x = x; y = 10; z = z }`
They are two kinds of record types. Open record types
are written ** { l1=t1; ...; ln=tn; ..
}**, and closed record types are written

**. Both denote all the record values where the labels**

`{`*l1*=*t1*;*...*;*ln*=*tn*}**are present and the associated values are in the corresponding type. The semi-colon between fields is optional. The distinction is that that open type allows extra fields, whereas the closed type gives a strict enumeration of the possible fields. As a consequence**

*li***is the type of all record values.**

`{..}`
Additionally, both for open and close record types,
it is possible to specify optional fields by using ** =?**
instead of

**between a label and a type. For instance,**

`=`**represents records with a**

`{ x =? Int; y = Bool }`**field of type**

`y`**, and an optional field**

`Bool`**(that when it is present, has type**

`y`**), and no other field.**

`Int`
The syntax is the same for patterns. Note that capture variables
cannot appear in an optional field. A common idiom is to bind
default values to replace missing optional fields: **
({ x = a } | (a := 1)) & { y = b }**. A special syntax
makes this idiom more convenient:

**.**

`{ x = a else (a:=1); y = b }`
As for record expressions, when the pattern
is simply a capture variable whose name match the field label,
it is possible to omit it. E.g.: ** { x; y = b; z }**
is equivalent to

**.**

`{ x = x; y = b; z = z }`
The ** +** operator (record concatenation, with priority given
to the right argument in case of overlapping) is available on record
types and patterns. This operator can be used to make a close
record type/pattern open, or to add fields:

type t = { a=Int b=Char } type s = t + {..} (* { a=Int b=Char .. } type u = s + { c=Float } (* { a=Int b=Char c=Float .. } *) type v = t + { c=Float } (* { a=Int b=Char c=Float } *)

## XML elements

In CDuce, the general form of an XML element is
** <(tag) (attr)>content** where

**,**

*tag***and**

*attr***are three expressions. Usually,**

*content***is a tag literal**

*tag***, and in this case, instead of writing**

```*xxx***, you can write:**

`<(``*tag*)>**. Similarly, when**

`<`*tag*>**is a record literal, you can omit the surrounding**

*attr***, and also the semicolon between attributes, E.g:**

`({...})`**.**

`<a href="http://..." dir="ltr">[]`
The syntax for XML elements types and patterns follows closely
the syntax for expressions:
** <(tag) (attr)>content**
where

**,**

*tag***and**

*attr***are three types or patterns. As for expressions, it is possible to simplify the notations for tags and attributes. For instance,**

*content***can be written:**

`<(`a) ({ href=String })>[]`**.**

`<a href=String>[]`The following sample shows several way to write XML types.

type A = <a x=String y=String ..>[ A* ] type B = <(`x | `y) ..>[ ] type C = <c x = String; y = String>[ ] type U = { x = String y =? String ..} type V = [ W* ] type W = <v (U)>V

** AnyXML**: the type of all XML values (it is equivalent to

**).**

`<(Atom) ({..})>[Any*]`## Functions

CDuce is an higher-order functional language: functions are first-class citizen values and, as such, they can be passed as argument or returned as result, stored in data structures, etc...

A functional type has the form **t -> s**
where

**and**

*t***are types. Intuitively, this type corresponds to functions that accept (at least) any argument of type**

*s***, and for such an argument on which they terminate they return a value of type**

*t***. For instance, the type**

*s***denotes functions that maps every pair of integers to an integer, and every pair of characters to a character.**

`((Int,Int) -> Int) & ((Char,Char) -> Char)`
The explanation above gives the intuition behind the interpretation
of functional types. It is sufficient to understand which
subtyping relations and equivalences hold between (boolean
combination) of functional types. For instance,
** (Int -> Int) & (Char -> Char)** is a subtype
of

**because with the intuition above, a function of the first type, when given a value of type**

`(Int|Char) -> (Int|Char)`**returns a value of type**

`Int|Char`**or of type**

`Int`**(depending on the argument).**

`Char`
Formally, the type **t -> s** denotes
the set of all CDuce abstractions

**such that**

`fun (`*t1*->*s1*;*...*;*tn*->*sn*)...**is a subtype of**

`(`*t1*->*s1*) &*...*& (*tn*->*sn*)**.**

*t*->*s*** Arrow**: the type of all functions (it is equivalent to

**).**

`Empty -> Any`Functional types have no counterpart in patterns.

## References

References are mutable memory cells. CDuce has no built-in
reference type. Instead, references are implemented
in an object-oriented way. The type ** ref t**
denotes references of values of type

**. It is only syntactic sugar for the type**

*t***.**

`{ get = [] ->`*t*; set =*t*-> [] }
Currently polymorphic reference types are not supported. Therefore, CDuce accepts only reference types ** ref t** where

**is a closed type.**

*t*## OCaml abstract types

The notation ** !t** is used by the
CDuce/OCaml interface
to denote the OCaml abstract type

**.**

`t`**TO BE DONE**

## Type syntax outline

Below we give a brief and incomplete summary of the syntax of types. For the complete syntax, please refer to the preceding sections.

TYPESt::=vsingleton type (v is a scalar value)| `satom (s is a string)| Atomall the atoms| Boolbooleans (equivalent to`true |`false)| Charcharacters|c--dUnicode intervals (c, d character literals)| ByteLatin1 characters| Intintegers|i--jinteger intervals (i, j integer literals or *)|t->tfunctions| (t,t)pairs| {l=t;...;l=t}closed record types| {l=t;...;l=t; ..}open record types|t+trecord type concatenation| <tt>tXML types| [R]sequences (where R is a type regexp)|t@tsequence concatenation| Stringstrings (equivalent to[Char*])| Latin1Latin1 strings (equivalent to[Byte*])| reftreferences (equivalent to{ get = []->t; set =t->[] })|t&tintersection|t|tunion|t\tdifference| Emptyempty type| Anytop type|Xtype recursion variable|twhereX=trecursive typesandX=t: andX=t|'atype variables (development version only)REGULAR EXPRESSIONS ON TYPESR::=RRconcatenation|R|Runion|R+one or more times|R*zero or more times|R?zero or one time

ℂDuce: Documentation: User's manual: Types and patterns