Contents of /web/manual/types_patterns.xml

Revision 1404
Tue Jul 10 18:46:42 2007 UTC (5 years, 10 months ago) by abate
File MIME type: text/xml
File size: 19780 byte(s)
[r2005-01-03 15:10:20 by afrisch] Doc

Original author: afrisch
Date: 2005-01-03 15:10:21+00:00
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<page name="manual_types_patterns">

<title>Types and patterns</title>

<box title="Types and patterns" link="gen">

<p>
In CDuce, a type denotes a set of values, and a pattern
extracts sub-values from a value. Syntactically, types and patterns
are very close. Indeed, any type can be seen as a pattern
(which accepts exactly the values of that type and extracts nothing),
and a pattern without any capture variable is nothing but a type.
</p>

<p>
Moreover, values
also share a common syntax with types and patterns. This is motivated
by the fact that basic and constructed values (that is, any values without
functional values inside) are themselves singleton types.
For instance, <code>(1,2)</code> is at once a value, a type, and a pattern.
As a type, it can be interpreted as a singleton type,
or as a pair type made of two singleton types.
As a pattern, it can be interpreted as a type constraint,
or as a pair pattern of two type constraints.
</p>

<p>
In this page, we present all the types and patterns that CDuce recognizes.
We also take the opportunity to present the CDuce values themselves, the
corresponding expression constructions, and fundamental operations on them.
</p>

</box>

<box title="Capture variables and default patterns" link="capture">

<p>
A value identifier inside a pattern behaves as a capture variable:
it accepts and binds any value.
</p>

<p>
Another form of capture variable is the default value pattern
<code>( %%x%% := %%c%% )</code> where <code>%%x%%</code>
is a capture variable (that is, an identifier),
and <code>%%c%%</code> is a scalar constant.
The semantics of this pattern is to bind the capture variable
to the constant, disregarding the matched value (and accepting
any value).
</p>

<p>
Such a pattern is useful in conjunction with the first match policy
(see below) to define "default cases". For instance, the pattern
<code>((x &amp; Int) | (x := 0), (y &amp; Int) | (y := 0))</code>
accepts any pair and binds <code>x</code> to the left component
if it is an integer (and to <code>0</code> otherwise), and similarly
for <code>y</code> with the right component of the pair.
</p>
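
<p>
As a small sketch of this idiom, the expression below applies the pattern
above to a pair whose right component is not an integer. The input value is
chosen only for illustration; following the semantics just described,
<code>x</code> should be bound to <code>3</code> and <code>y</code> to the
default <code>0</code>.
</p>

<sample><![CDATA[
match (3, 'a') with
  ((x & Int) | (x := 0), (y & Int) | (y := 0)) -> (x,y)
(* x is the left component 3; y defaults to 0 since 'a' is not an integer *)
]]></sample>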

</box>

<box title="Boolean connectives" link="bool">
<p>
CDuce recognizes the full set of boolean connectives, whose
interpretation is purely set-theoretic.
</p>
<ul>
<li><code>Empty</code> denotes the empty type (no value).</li>
<li><code>Any</code> and <code>_</code> denote the universal type (all the values); the preferred notation is <code>Any</code> for types
and <code>_</code> for patterns, but they are strictly equivalent.
</li>
<li><code>&amp;</code> is the conjunction boolean connective.
The type <code>%%t1%% &amp; %%t2%%</code> has all the values
that belong both to <code>%%t1%%</code> and to <code>%%t2%%</code>.
Similarly, the pattern <code>%%p1%% &amp; %%p2%%</code> accepts
all the values accepted by both sub-patterns; a capture variable
cannot appear on both sides of this pattern.
</li>
<li><code>|</code> is the disjunction boolean connective.
The type <code>%%t1%% | %%t2%%</code> has all the values
that belong either to <code>%%t1%%</code> or to <code>%%t2%%</code>.
Similarly, the pattern <code>%%p1%% | %%p2%%</code> accepts
all the values accepted by either of the two sub-patterns;
if both match, the first match policy applies, and <code>%%p1%%</code>
dictates how to capture sub-values. The two sub-patterns
must have the same set of capture variables.</li>
<li><code>\</code> is the difference boolean connective.
The left-hand side can be a type or a pattern, but the right-hand side
is necessarily a type (no capture variable).</li>
</ul>
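
<p>
The declarations below sketch how the connectives combine on types.
The type names are hypothetical and serve only as illustration; the
comments restate the set-theoretic reading given in the list above.
</p>

<sample><![CDATA[
type NonZero = Int \ 0        (* every integer except 0 *)
type Scalar  = Int | Char     (* integers together with characters *)
type Nothing = Int & Char     (* no value is both: same values as Empty *)
]]></sample>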
</box>

<box title="Recursive types and patterns" link="recurs">
<p>
A set of mutually recursive types can be defined
by toplevel type declarations, as in:
</p>

<sample><![CDATA[
type T1 = <a>[ T2* ]
type T2 = <b>[ T1 T1 ]
]]></sample>

<p>
It is also possible to use the syntax
<code>%%T%% where %%T1%% = %%t1%% and ... and %%Tn%% = %%tn%%</code>
where <code>%%T%%</code> and the <code>%%Ti%%</code> are type identifiers
and the <code>%%ti%%</code> are type expressions. The same notation
works for recursive patterns (for which there are no toplevel declarations).
</p>
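
<p>
As a sketch of the <code>where</code> notation, the declaration below is
meant to denote the same type as <code>T1</code> in the first sample of this
section, with the auxiliary definitions kept local. The names
<code>Doc</code>, <code>A</code> and <code>B</code> are hypothetical.
</p>

<sample><![CDATA[
type Doc = A where A = <a>[ B* ] and B = <b>[ A A ]
]]></sample>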

<p>
There is an important restriction concerning recursive types:
any cycle must cross a <em>type constructor</em> (pairs, records, XML
elements, arrows). Boolean connectives do <em>not</em> count as type
constructors! The code sample above is a correct definition.
The one below is invalid, because there is an unguarded cycle
between <code>T</code> and <code>S</code>.
</p>

<sample><![CDATA[
type T = S | (S,S) (* INVALID! *)
type S = T (* INVALID! *)
]]></sample>

</box>


<box title="Scalar types" link="basic">

<p>
CDuce has three kinds of atomic (scalar) values:
integers, characters, and atoms. To each kind corresponds a family of types.
</p>

<ul>
<li><b>Integers</b>.
<br/>CDuce integers are arbitrarily large. An integer
literal is a sequence of decimal digits, plus an optional leading unary
minus (<code>-</code>) character.
<ul>
<li><code>Int</code>: all the integers.</li>
<li><code>%%i%%--%%j%%</code> (where <code>%%i%%</code> and
<code>%%j%%</code> are integer literals, or <code>*</code>
for infinity): integer interval. E.g.: <code>100--*</code>. </li>
<li><code>%%i%%</code> (where <code>%%i%%</code> is an integer
literal): integer singleton type.</li>
</ul>
</li>

<li><b>Characters</b>.
<br/>CDuce manipulates Unicode characters. A character
literal is enclosed in single quotes, e.g. <code>'a', 'b', 'c'</code>.
The single quote and the backslash character must be escaped
by a backslash: <code>'\''</code>, <code>'\\'</code>. The double
quote can also be escaped, but this is not mandatory.
The usual <code>'\n', '\t', '\r'</code> are recognized.
Arbitrary Unicode codepoints can be written in decimal
<code>'\%%i%%;'</code> (<code>%%i%%</code> is a decimal integer) or
in hexadecimal <code>'\x%%i%%;'</code> (<code>%%i%%</code> is a
hexadecimal integer). Any other occurrence of
a backslash character is prohibited.

<ul>
<li><code>Char</code>: the whole Unicode character set.</li>
<li><code>%%c%%--%%d%%</code> (where <code>%%c%%</code> and
<code>%%d%%</code> are character literals):
interval of Unicode characters. E.g.: <code>'a'--'z'</code>. </li>
<li><code>%%c%%</code> (where <code>%%c%%</code> is a character
literal): character singleton type.</li>
<li><code>Byte</code>: the whole Latin1 character set
(equivalent to <code>'\0;'--'\255;'</code>).</li>
</ul>
</li>

<li><b>Atoms</b>.
<br/>Atoms are symbolic elements. They are used in particular
to denote XML tag names, and also to simulate ML sum type
constructors and exception names.
An atom is written <code>`%%xxx%%</code> where
<code>%%xxx%%</code> follows the rules for CDuce identifiers.
E.g.: <code>`yes, `No, `my-name</code>. The atom <code>`nil</code>
is used to denote empty sequences.
<ul>
<li><code>Atom</code>: all the atoms.</li>
<li><code>%%a%%</code> (where <code>%%a%%</code> is an atom
literal): atom singleton type.</li>
<li><code>Bool</code>: the two atoms <code>`true</code> and
<code>`false</code>.</li>
<li>See also: <local href="namespaces"/>.</li>
</ul>
</li>
</ul>
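
<p>
The following declarations sketch each family of scalar types listed above.
All type names are hypothetical; the two escapes in the last declaration are
intended to denote the same codepoint, written in hexadecimal and in decimal
(20AC in hexadecimal is 8364 in decimal).
</p>

<sample><![CDATA[
type Positive = 1--*                  (* integer interval, unbounded above *)
type Digit    = '0'--'9'              (* character interval *)
type Newline  = '\n' | '\r'           (* two character singleton types *)
type Answer   = `yes | `no            (* two atom singleton types *)
type Euro     = '\x20AC;' | '\8364;'  (* one codepoint, written twice *)
]]></sample>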
</box>

<box title="Pairs" link="pairs">
<p>
Pairs are a fundamental notion in CDuce, as they constitute a building
block for sequences. Even if syntactic sugar somewhat hides
pairs when you use sequences, it is good to know that pairs exist.
</p>

<p>
A pair expression is written <code>(%%e1%%,%%e2%%)</code>
where <code>%%e1%%</code> and <code>%%e2%%</code> are expressions.
</p>

<p>
Similarly, pair types and patterns are written
<code>(%%t1%%,%%t2%%)</code> where <code>%%t1%%</code> and
<code>%%t2%%</code> are types or patterns. E.g.: <code>(Int,Char)</code>.
</p>

<p>
When a capture variable <code>%%x%%</code> appears on both
sides of a pair pattern <code>%%p%% = (%%p1%%,%%p2%%)</code>, the semantics
is the following: when a value matches <code>%%p%%</code>,
if <code>%%x%%</code> is bound to <code>%%v1%%</code> by
<code>%%p1%%</code> and to <code>%%v2%%</code> by
<code>%%p2%%</code>,
then <code>%%x%%</code> is bound to the pair <code>(%%v1%%,%%v2%%)</code> by
<code>%%p%%</code>.
</p>

<p>
Tuples are syntactic sugar for pairs. For instance,
<code>(1,2,3,4)</code> denotes <code>(1,(2,(3,4)))</code>.
</p>
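
<p>
The two bindings below sketch these rules on literal values. The names
<code>both</code> and <code>last</code> are hypothetical; the comments
restate what the rules above predict.
</p>

<sample><![CDATA[
(* x appears on both sides, so it is bound to the pair (1,2) *)
let both = match (1,2) with (x,x) -> x

(* (1,2,3) denotes (1,(2,3)), so c is bound to 3 *)
let last = match (1,2,3) with (_,(_,c)) -> c
]]></sample>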
</box>

<box title="Sequences" link="seq">

<section title="Values and expressions">

<p>
Sequences are fundamental in CDuce. They represent
the content of XML elements, and also character strings.
Actually, they are only syntactic sugar over pairs.
</p>

<p>
Sequence expressions are written inside square brackets; elements
are simply separated by whitespace:
<code>[ %%e1%% %%e2%% %%...%% %%en%% ]</code>.
Such an expression is syntactic sugar for:
<code>(%%e1%%,(%%e2%%, %%...%% (%%en%%,`nil) %%...%%))</code>.
E.g.: <code>[ 1 2 3 4 ]</code>.
</p>

<p>
The binary operator <code>@</code> denotes sequence concatenation.
E.g.: <code>[ 1 2 3 ] @ [ 4 5 6 ]</code> evaluates to
<code>[ 1 2 3 4 5 6 ]</code>.
</p>

<p>
It is possible to specify a terminator different from <code>`nil</code>;
for instance
<code>[ 1 2 3 4 ; %%q%% ]</code> denotes <code>(1,(2,(3,(4,%%q%%))))</code>,
and is equivalent to (but more efficient than):
<code>[ 1 2 3 4 ] @ %%q%%</code>.
Consequently, a pair <code>(%%e1%%,%%e2%%)</code> can also
be written <code>[ %%e1%%; %%e2%% ]</code>.
</p>

<p>
Inside the square brackets of a sequence expression, it is possible
to have elements of the form <code>! %%e%%</code> (which is not
an expression by itself), where <code>%%e%%</code> is an expression
which should evaluate to a sequence. The semantics is
to "open" <code>%%e%%</code>. For instance:
<code>[ 1 2 ![ 3 4 ] 5 ]</code>
evaluates to
<code>[ 1 2 3 4 5 ]</code>.
Consequently, the concatenation of two sequences <code>%%e1%% @ %%e2%%</code>
can also be written <code>[ !%%e1%% !%%e2%% ]</code>
or <code>[ !%%e1%% ; %%e2%% ]</code>.
</p>
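
<p>
The bindings below put the three notations side by side. The names
<code>s1</code> to <code>s4</code> are hypothetical, and each comment gives
the value that the rules above predict.
</p>

<sample><![CDATA[
let s1 = [ 1 2 3 ] @ [ 4 5 6 ]    (* [ 1 2 3 4 5 6 ] *)
let s2 = [ 1 2 ; [ 3 4 ] ]        (* (1,(2,[ 3 4 ])), that is [ 1 2 3 4 ] *)
let s3 = [ 1 2 ![ 3 4 ] 5 ]       (* [ 1 2 3 4 5 ] *)
let s4 = [ !s1 ; s2 ]             (* same as s1 @ s2 *)
]]></sample>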

</section>

<section title="Types and patterns">

<p>
In CDuce, a sequence can be heterogeneous: the elements can all have
different types. Types and patterns for sequences are specified
by regular expressions over types or patterns. The syntax is
<code>[ %%R%% ]</code> where <code>%%R%%</code> is a regular expression, which
can be:
</p>
<ul>
<li>A type or a pattern, which corresponds to a single element in the
sequence (in particular, <code>[ _ ]</code> represents
sequences of length 1, <em>not</em> arbitrary sequences).</li>
<li>A juxtaposition of regular expressions <code>%%R1%% %%R2%%</code>
which represents concatenation.
</li>
<li>A postfix repetition operator; the greedy operators are
<code>%%R%%?</code>,
<code>%%R%%+</code>,
<code>%%R%%*</code>, and the ungreedy operators are:
<code>%%R%%??</code>,
<code>%%R%%+?</code>,
<code>%%R%%*?</code>. For types, there is no distinction in semantics between
greedy and ungreedy. </li>
<li>A sequence capture variable <code>%%x%%::%%R%%</code>
(only for patterns, of course).
The semantics is to capture in <code>%%x%%</code> the subsequence
matched by <code>%%R%%</code>. The same sequence capture variable
can appear several times inside a regular expression, including
under repetition operators; in that case, all the corresponding
subsequences are concatenated together. Two instances of the
same sequence capture variable cannot be nested, as in
<code>[x :: (1 x :: Int)]</code>.
<br/>
Note the difference between <code>[ x::Int ]</code> and
<code>[ (x &amp; Int) ]</code>. Both accept sequences made of a single
integer, but the first one binds <code>x</code> to a sequence
(of a single integer), whereas the second one binds it to
the integer itself.</li>
<li>
Grouping <code>(%%R%%)</code>. E.g.: <code>[ x::(Int Int) y ]</code>.
</li>
<li>
Tail predicate <code>/p</code>. The type/pattern <code>p</code>
applies to the current tail of the sequence (the subsequence
starting at the current position). E.g.:
<code>[ (Int /(x:=1) | /(x:=2)) _* ]</code> will bind
<code>x</code> to <code>1</code> if the sequence starts
with an integer and to <code>2</code> otherwise.
</li>
</ul>

<p>
Sequence types and patterns also accept the <code>[ %%...%%; %%...%% ]</code>
notation. This is a convenient way to discard the tail of a sequence
in a pattern, e.g.: <code>[ x::Int* ; _ ]</code>, which
is equivalent to <code>[ x::Int* _* ]</code>.
</p>
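
<p>
The bindings below sketch sequence capture variables, the tail notation and
the greedy/ungreedy distinction on literal sequences. The names are
hypothetical, and the comments give the bindings expected from the
description above rather than a transcript of an actual session.
</p>

<sample><![CDATA[
(* x is captured under a repetition: the integers are concatenated,
   giving [ 1 2 3 ]; the other elements are matched by _ *)
let ints = match [ 1 'a' 2 'b' 3 ] with [ (x::Int | _)* ] -> x

(* x takes the leading integers, rest is the remaining tail *)
let parts = match [ 1 2 'a' 'b' ] with [ x::Int* ; rest ] -> (x,rest)

(* ungreedy *? takes as few elements as possible: x is [ ], y is [ 1 2 3 ] *)
let lazy = match [ 1 2 3 ] with [ x::Int*? y::Int* ] -> (x,y)
]]></sample>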

</section>

</box>

<box title="Strings" link="string">

<p>
In CDuce, character strings are nothing but sequences of characters.
The type <code>String</code> is pre-defined as <code>[ Char* ]</code>.
This makes it possible to use the full power of regular expression
pattern matching with strings.
</p>

<p>
Inside a regular expression type or pattern, it is possible
to use <code>PCDATA</code> instead of <code>Char*</code>
(note that neither is a type on its own: they only make sense
inside square brackets, contrary to <code>String</code>).
</p>

<p>
The type <code>Latin1</code> is the subtype of <code>String</code>
defined as <code>[ Byte* ]</code>; it denotes strings that can
be represented in the ISO-8859-1 encoding, that is, strings made only
of characters from the Latin1 character set.
</p>

<p>
Several consecutive character literals in a sequence can be
merged together between two single quotes:
<code>[ 'abc' ]</code> instead of <code>[ 'a' 'b' 'c' ]</code>.
It is also possible to avoid square brackets by using
double quotes: <code>"abc"</code>. The same escaping rules apply
inside double quotes, except that single quotes may be escaped (but
need not be), and double quotes must be.
</p>
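
<p>
Since strings are sequences, the sequence notations apply to them directly.
The sketch below uses hypothetical names and a made-up input string; the
comments describe the bindings expected from the rules above.
</p>

<sample><![CDATA[
type Word = [ ('a'--'z')+ ]       (* non-empty strings of lowercase letters *)

let s = "abc" @ [ 'def' ]         (* the string "abcdef" *)

(* split around the first '=': k is "key" and v is "value" *)
let kv = match "key=value" with
  [ k::(Char \ '=')* '=' v::Char* ] -> (k,v)
]]></sample>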

</box>

<box title="Records" link="record">

<p>
Records are finite sets of (name,value) bindings. They are used
in particular to represent XML attribute sets. Names are
actually Qualified Names (see <local href="namespaces"/>).
</p>

<p>
The syntax of a record expression is
<code>{ %%l1%% = %%e1%%; %%...%%; %%ln%% = %%en%% }</code>
where the <code>%%li%%</code> are label names (same lexical
conventions as for identifiers), and the <code>%%ei%%</code>
are expressions. When an expression <code>%%ei%%</code>
is simply a variable whose name matches the field label
<code>%%li%%</code>, it is possible to omit it.
E.g.: <code>{ x; y = 10; z }</code>
is equivalent to <code>{ x = x; y = 10; z = z }</code>.
</p>



<p>
There are two kinds of record types. Open record types
are written <code>{ %%l1%% = %%t1%%; %%...%%; %%ln%% = %%tn%%
}</code>, and closed record types are written
<code>{| %%l1%% = %%t1%%; %%...%%; %%ln%% = %%tn%%
|}</code>.
Both denote all the record values where
the labels <code>%%li%%</code> are present and the associated values
are in the corresponding type. The distinction is that the open
type allows extra fields, whereas the closed type gives a strict
enumeration of the possible fields.
</p>

<p>
Additionally, for both open and closed record types,
it is possible to specify optional fields by using <code>=?</code>
instead of <code>=</code> between a label and a type.
For instance, <code>{| x = Int; y =? Bool |}</code>
represents records with an <code>x</code> field of type
<code>Int</code>, an optional field <code>y</code> (when it is
present, it has type <code>Bool</code>), and no other field.
</p>

<p>
Note that the value <code>{ x = 1; y = 2 }</code>
actually has the type <code>{| x = 1; y = 2 |}</code>,
which is more precise than <code>{ x = 1; y = 2 }</code>. This is
the only situation where the singleton type corresponding to a constructed
value is not syntactically equal to this value.
</p>

<p>
The syntax is the same for patterns. Note that capture variables
cannot appear in an optional field. A common idiom is to bind
default values to replace missing optional fields: <code>
({ x = a } | (a := 1)) &amp; { y = b }</code>. A special syntax
makes this idiom more convenient:
<code>{ x = a else (a:=1); y = b }</code>.
</p>

<p>
As for record expressions, when the pattern
is simply a capture variable whose name matches the field label,
it is possible to omit it. E.g.: <code>{ x; y = b; z }</code>
is equivalent to <code>{ x = x; y = b; z = z }</code>.
</p>
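
<p>
The sketch below contrasts open and closed record types and applies the
<code>else</code> idiom to a record that lacks the field <code>x</code>.
The type and variable names are hypothetical; the comments restate the
semantics described in this section.
</p>

<sample><![CDATA[
type Point = {| x = Int; y = Int |}     (* exactly the fields x and y *)
type HasX  = { x = Int }                (* a field x, possibly other fields *)
type Opt   = {| x = Int; y =? Bool |}   (* x, an optional y, nothing else *)

(* the field x is missing, so a defaults to 0; b is bound to 2 *)
let d = match { y = 2 } with { x = a else (a := 0); y = b } -> (a,b)
]]></sample>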

</box>

<box title="XML elements" link="xml">

<p>
In CDuce, the general form of an XML element expression is
<code>&lt;(%%tag%%) (%%attr%%)>%%content%%</code> where
<code>%%tag%%</code>,
<code>%%attr%%</code> and
<code>%%content%%</code> are three expressions.
Usually, <code>%%tag%%</code> is a tag literal <code>`%%xxx%%</code>, and
in this case, instead of writing <code>&lt;(`%%tag%%)></code>,
you can write: <code>&lt;%%tag%%></code>.
Similarly, when <code>%%attr%%</code> is a record literal, you can
omit the surrounding <code>({...})</code>, and also the semicolon
between attributes.
E.g.: <code>&lt;a href="http://..." dir="ltr">[]</code>.
</p>

<p>
The syntax for XML element types and patterns closely follows
the syntax for expressions:
<code>&lt;(%%tag%%) (%%attr%%)>%%content%%</code>
where
<code>%%tag%%</code>,
<code>%%attr%%</code> and
<code>%%content%%</code> are three types or patterns.
As for expressions, it is possible to simplify the notations
for tags and attributes. For instance,
<code>&lt;(`a) ({ href=String })>[]</code>
can be written:
<code>&lt;a href=String>[]</code>.
</p>

<p>
The following sample shows several ways to write XML types.
</p>

<sample><![CDATA[
type A = <a x=String y=String>[ A* ]
type B = <(`x | `y)>[ ]
type C = <c {| x = String; y = String |}>[ ]
type U = { x = String; y =? String }
type V = [ W* ]
type W = <v (U)>V
]]></sample>
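
<p>
On the expression and pattern side, the sketch below builds an element and
extracts one of its attributes. The names <code>link</code> and
<code>target</code> and the attribute values are only illustrative; the
attribute pattern is an open record pattern, so the extra <code>dir</code>
attribute is accepted.
</p>

<sample><![CDATA[
let link = <a href="http://www.cduce.org/" dir="ltr">[ 'CDuce' ]

(* h is bound to the value of the href attribute; the content is ignored *)
let target = match link with <a href=h>_ -> h
]]></sample>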

</box>


<box title="Functions" link="fun">

<p>
CDuce is a higher-order functional language: functions are
first-class values, and can be passed as arguments, returned
as results, stored in data structures, etc.
</p>

<p>
A functional type has the form <code>%%t%% -> %%s%%</code>
where <code>%%t%%</code> and <code>%%s%%</code> are types.
Intuitively, this type corresponds to functions that accept
(at least) any argument of type <code>%%t%%</code>, and for
such an argument, return a value of type <code>%%s%%</code>.
For instance, the type <code>(Int,Int) -> Int &amp; (Char,Char) -> Char</code>
denotes functions that map any pair of integers to an integer,
and any pair of characters to a character.
</p>

<p>
The explanation above gives the intuition behind the interpretation
of functional types. It is sufficient to understand which
subtyping relations and equivalences hold between (boolean
combinations of) functional types. For instance,
<code>Int -> Int &amp; Char -> Char</code> is a subtype
of <code>(Int|Char) -> (Int|Char)</code> because
with the intuition above, a function of the first type,
when given a value of type <code>Int|Char</code>, returns
a value of type <code>Int</code> or of type <code>Char</code>
(depending on the argument).
</p>
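
<p>
As a sketch of this subtyping relation, the function below is declared with
the two arrows <code>Int -> Int</code> and <code>Char -> Char</code>, and is
then used at the less precise type <code>(Int|Char) -> (Int|Char)</code>.
The names <code>f</code> and <code>g</code> are hypothetical, and the
example assumes the <code>fun</code> abstraction syntax with an explicit
interface and the type constraint expression <code>(e : t)</code>.
</p>

<sample><![CDATA[
(* an overloaded function: integers go to integers, characters to characters *)
let f = fun f (Int -> Int; Char -> Char)
    (x & Int) -> x + 1
  | (c & Char) -> c

(* accepted because the type of f is a subtype of (Int|Char) -> (Int|Char) *)
let g = (f : (Int|Char) -> (Int|Char))
]]></sample>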

<p>
Formally, the type <code>%%t%% -> %%s%%</code> denotes
CDuce abstractions
<code>fun (%%t1%% -> %%s1%%; %%...%%; %%tn%% -> %%sn%%)...</code>
such that <code>%%t1%% -> %%s1%% &amp; %%...%% &amp; %%tn%% ->
%%sn%%</code> is a subtype of <code>%%t%% -> %%s%%</code>.
</p>

<p>
Functional types have no counterpart in patterns.
</p>

</box>

<box title="References" link="ref">

<p>
References are mutable memory cells. CDuce has no built-in
reference type. Instead, references are implemented
in an object-oriented way. The type <code>ref %%T%%</code>
denotes references to values of type <code>%%T%%</code>. It
is only syntactic sugar for the type
<code>{| get = [] -> %%T%% ; set = %%T%% -> [] |}</code>.
</p>

</box>

<box title="OCaml abstract types" link="abstr">
<p>
The notation <code>!t</code> is used by the
<local href="manual_interfacewithocaml">CDuce/OCaml interface</local>
to denote the OCaml abstract type <code>t</code>.
</p>
</box>

</page>
