
Contents of /web/manual/types_patterns.xml



Revision 384 - (show annotations)
Tue Jul 10 17:30:36 2007 UTC (5 years, 10 months ago) by abate
File MIME type: text/xml
File size: 17319 byte(s)
[r2003-05-21 21:24:56 by cvscast] Manual

Original author: cvscast
Date: 2003-05-21 21:24:56+00:00
1 <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
2 <page name="manual_types_patterns">
3
4 <title>Types and patterns</title>
5
6 <box title="Types and patterns" link="gen">
7
8 <p>
9 In CDuce, a type denotes a set of values, and a pattern
10 extracts sub-values from a value. Syntactically, types and patterns
11 are very close. Indeed, any type can be seen as a pattern
12 (which accepts the values of that type and extracts nothing), and a pattern
13 without any capture variable is nothing but a type.
14 </p>
15
16 <p>
17 Moreover, values
18 also share a common syntax with types and patterns. This is motivated
19 by the fact that basic and constructed values (that is, any values without
20 functional values inside) are themselves singleton types.
21 For instance, <code>(1,2)</code> is at once a value, a type, and a pattern.
22 As a type, it can be interpreted as a singleton type,
23 or as a pair type made of two singleton types.
24 As a pattern, it can be interpreted as a type constraint,
25 or as a pair pattern of two type constraints.
26 </p>
27
28 <p>
29 In this page, we present all the types and patterns that CDuce recognizes.
30 It is also the occasion to present the CDuce values themselves, the
31 corresponding expression constructions, and fundamental operations on them.
32 </p>
33
34 </box>
35
36 <box title="Capture variable" link="capture">
37
38 <p>
39 A value identifier inside a pattern behaves as a capture variable:
40 it accepts and binds any value.
41 </p>
42
43 <p>
44 Another form of capture variable is the default value pattern
45 <code>( %%x%% := %%c%% )</code> where <code>%%x%%</code>
46 is a capture variable (that is, an identifier),
47 and <code>%%c%%</code> is a scalar constant.
48 The semantics of this pattern is to bind the capture variable
49 to the constant, disregarding the matched value (and accepting
50 any value).
51 </p>
52
53 <p>
54 Such a pattern is useful in conjunction with the first match policy
55 (see below) to define "default cases". For instance, the pattern
56 <code>((x &amp; Int) | (x := 0), (y &amp; Int) | (y := 0))</code>
57 accepts any pair and binds <code>x</code> to the left component
58 if it is an integer (and <code>0</code> otherwise), and similarly
59 for <code>y</code> with the right component of the pair.
60 </p>
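<p>
As a minimal sketch of this pattern in use (the name <code>r</code> and the
input pair are chosen here purely for illustration), the following
match falls back to the default for the right component, which is not an integer:
</p>

<sample><![CDATA[
let r = match (3,'a') with
  ((x & Int) | (x := 0), (y & Int) | (y := 0)) -> (x,y);;
(* x is bound to 3; 'a' is not an integer, so y is bound to 0 *)
]]></sample>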
61
62 </box>
63
64 <box title="Boolean connectives" link="bool">
65 <p>
66 CDuce recognizes the full set of boolean connectives, whose
67 interpretation is purely set-theoretic.
68 </p>
69 <ul>
70 <li><code>Empty</code> denotes the empty type (no value).</li>
71 <li><code>Any</code> and <code>_</code> denote the universal type (all the values); the preferred notation is <code>Any</code> for types
72 and <code>_</code> for patterns, but they are strictly equivalent.
73 </li>
74 <li><code>&amp;</code> is the conjunction boolean connective.
75 The type <code>%%t1%% &amp; %%t2%%</code> has all the values
76 that belong to <code>%%t1%%</code> and to <code>%%t2%%</code>.
77 Similarly, the pattern <code>%%p1%% &amp; %%p2%%</code> accepts
78 all the values accepted by both sub-patterns; a capture variable
79 cannot appear on both sides of this pattern.
80 </li>
81 <li><code>|</code> is the disjunction boolean connective.
82 The type <code>%%t1%% | %%t2%%</code> has all the values
83 that belong either to <code>%%t1%%</code> or to <code>%%t2%%</code>.
84 Similarly, the pattern <code>%%p1%% | %%p2%%</code> accepts
85 all the values accepted by any of the two sub-patterns;
86 if both match, the first match policy applies, and <code>%%p1%%</code>
87 dictates how to capture sub-values. The two sub-patterns
88 must have the same set of capture variables.</li>
89 <li><code>\</code> is the difference boolean connective.
90 The left-hand side can be a type or a pattern, but the right-hand side
91 is necessarily a type (no capture variable).</li>
92 </ul>
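<p>
For illustration, here are a few type declarations built only from the
connectives above; the type names are hypothetical and chosen for this sketch:
</p>

<sample><![CDATA[
type IntOrChar = Int | Char;;          (* union: integers and characters *)
type NonZero   = Int \ 0;;             (* difference: every integer except 0 *)
type Other     = Any \ (Int | Char);;  (* every value that is neither *)
type Nothing   = Int & Char;;          (* intersection: no value, same as Empty *)
]]></sample>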
93 </box>
94
95 <box title="Recursive types and patterns" link="recurs">
96 <p>
97 A set of mutually recursive types can be defined
98 by toplevel type declarations, as in:
99 </p>
100
101 <sample><![CDATA[
102 type T1 = <a>[ T2* ];;
103 type T2 = <b>[ T1 T1 ];;
104 ]]></sample>
105
106 <p>
107 It is also possible to use the syntax
108 <code>%%T%% where %%T1%% = %%t1%% and ... and %%Tn%% = %%tn%%</code>
109 where <code>%%T%%</code> and the <code>%%Ti%%</code> are type identifiers
110 and the <code>%%ti%%</code> are type expressions. The same notation
111 works for recursive patterns (for which there are no toplevel declarations).
112 </p>
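<p>
As a sketch of the <code>where</code> notation (the names <code>IntList</code>
and <code>L</code> are illustrative assumptions), a type of integer lists encoded
with nested pairs could be written:
</p>

<sample><![CDATA[
type IntList = L where L = (Int, L) | `nil;;
]]></sample>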
113
114 <p>
115 There is an important restriction concerning recursive types:
116 any cycle must cross a <em>type constructor</em> (pairs, records, XML
117 elements, arrows). Boolean connectives do <em>not</em> count as type
118 constructors! The code sample above is a correct definition.
119 The one below is invalid, because there is an unguarded cycle
120 between <code>T</code> and <code>S</code>.
121 </p>
122
123 <sample><![CDATA[
124 type T = S | (S,S);; (* INVALID ! *)
125 type S = T;;
126 ]]></sample>
127
128 </box>
129
130
131 <box title="Scalar types" link="basic">
132
133 <p>
134 CDuce has three kinds of atomic (scalar) values:
135 integers, characters, and atoms. To each kind corresponds a family of types.
136 </p>
137
138 <ul>
139 <li><b>Integers</b>.
140 <br/>CDuce integers are arbitrarily large. An integer
141 literal is a sequence of decimal digits, plus an optional leading unary
142 minus (<code>-</code>) character.
143 <ul>
144 <li><code>Int</code>: all the integers.</li>
145 <li><code>%%i%%--%%j%%</code> (where <code>%%i%%</code> and
146 <code>%%j%%</code> are integer literals, or <code>*</code>
147 for infinity): integer interval. E.g.: <code>100--*</code>. </li>
148 <li><code>%%i%%</code> (where <code>%%i%%</code> is an integer
149 literal): integer singleton type.</li>
150 </ul>
151 </li>
152
153 <li><b>Characters</b>.
154 <br/>CDuce manipulates Unicode characters. A character
155 literal is enclosed in single quotes, e.g. <code>'a', 'b', 'c'</code>.
156 The single quote and the backslash character must be escaped
157 by a backslash: <code>'\''</code>, <code>'\\'</code>. The double
158 quote can also be escaped, but this is not mandatory.
159 The usual <code>'\n', '\t', '\r'</code> are recognized.
160 Arbitrary Unicode codepoints can be written in decimal
161 <code>'\%%i%%;'</code> (<code>%%i%%</code> is a decimal integer) or
162 in hexadecimal <code>'\x%%i%%;'</code>. Any other occurrence of
163 a backslash character is prohibited.
164
165 <ul>
166 <li><code>Char</code>: the whole Unicode character set.</li>
167 <li><code>%%c%%--%%d%%</code> (where <code>%%c%%</code> and
168 <code>%%d%%</code> are character literals):
169 interval of Unicode characters. E.g.: <code>'a'--'z'</code>. </li>
170 <li><code>%%c%%</code> (where <code>%%c%%</code> is a character
171 literal): character singleton type.</li>
172 </ul>
173 </li>
174
175 <li><b>Atoms</b>.
176 <br/>Atoms are symbolic elements. They are used in particular
177 to denote XML tag names, and also to simulate ML sum type
178 constructors and exception names.
179 An atom is written <code>`%%xxx%%</code> where
180 <code>%%xxx%%</code> follows the rules for CDuce identifiers.
181 E.g.: <code>`yes, `No, `my-name</code>. The atom <code>`nil</code>
182 is used to denote empty sequences.
183 <ul>
184 <li><code>Atom</code>: all the atoms.</li>
185 <li><code>%%a%%</code> (where <code>%%a%%</code> is an atom
186 literal): atom singleton type.</li>
187 <li><code>Bool</code>: the two atoms <code>`true</code> and
188 <code>`false</code>.</li>
189 </ul>
190 </li>
191 </ul>
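<p>
The scalar types above can be combined in ordinary type declarations.
The following sketch uses hypothetical type names to show one declaration
per family:
</p>

<sample><![CDATA[
type Digit    = 0--9;;        (* integer interval *)
type Positive = 1--*;;        (* interval with no upper bound *)
type Lower    = 'a'--'z';;    (* character interval *)
type Answer   = `yes | `no;;  (* union of two atom singleton types *)
]]></sample>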
192 </box>
193
194 <box title="Pairs" link="pairs">
195 <p>
196 Pairs are a fundamental notion in CDuce, as they constitute a building
197 block for sequences. Even if syntactic sugar somewhat hides
198 pairs when you use sequences, it is good to know that they exist.
199 </p>
200
201 <p>
202 A pair expression is written <code>(%%e1%%,%%e2%%)</code>
203 where <code>%%e1%%</code> and <code>%%e2%%</code> are expressions.
204 </p>
205
206 <p>
207 Similarly, pair types and patterns are written
208 <code>(%%t1%%,%%t2%%)</code> where <code>%%t1%%</code> and
209 <code>%%t2%%</code> are types or patterns. E.g.: <code>(Int,Char)</code>.
210 </p>
211
212 <p>
213 When a capture variable <code>%%x%%</code> appears on both
214 sides of a pair pattern <code>%%p%% = (%%p1%%,%%p2%%)</code>, the semantics
215 is as follows: when a value matches <code>%%p%%</code>,
216 if <code>%%x%%</code> is bound to <code>%%v1%%</code> by
217 <code>%%p1%%</code> and to <code>%%v2%%</code> by
218 <code>%%p2%%</code>,
219 then <code>%%x%%</code> is bound to the pair <code>(%%v1%%,%%v2%%)</code> by
220 <code>%%p%%</code>.
221 </p>
222
223 <p>
224 Tuples are syntactic sugar for pairs. For instance,
225 <code>(1,2,3,4)</code> denotes <code>(1,(2,(3,4)))</code>.
226 </p>
227 </box>
228
229 <box title="Sequences" link="seq">
230
231 <section title="Values and expressions">
232
233 <p>
234 Sequences are fundamental in CDuce. They represent
235 the content of XML elements, and also character strings.
236 Actually, they are only syntactic sugar over pairs.
237 </p>
238
239 <p>
240 Sequence expressions are written inside square brackets; elements
241 are simply separated by whitespace:
242 <code>[ %%e1%% %%e2%% %%...%% %%en%% ]</code>.
243 Such an expression is syntactic sugar for:
244 <code>(%%e1%%,(%%e2%%, %%...%% (%%en%%,`nil) %%...%%))</code>.
245 E.g.: <code>[ 1 2 3 4 ]</code>.
246 </p>
247
248 <p>
249 The binary operator <code>@</code> denotes sequence concatenation.
250 E.g.: <code>[ 1 2 3 ] @ [ 4 5 6 ]</code> evaluates to
251 <code>[ 1 2 3 4 5 6 ]</code>.
252 </p>
253
254 <p>
255 It is possible to specify a terminator different from <code>`nil</code>;
256 for instance
257 <code>[ 1 2 3 4 ; %%q%% ]</code> denotes <code>(1,(2,(3,(4,%%q%%))))</code>,
258 and is equivalent to (but more efficient than):
259 <code>[ 1 2 3 4 ] @ %%q%%</code>.
260 Consequently, a pair <code>(%%e1%%,%%e2%%)</code> can also
261 be written <code>[ %%e1%%; %%e2%% ]</code>.
262 </p>
263
264 <p>
265 Inside the square brackets of a sequence expression, it is possible
266 to have elements of the form <code>! %%e%%</code> (which is not
267 an expression by itself), where <code>%%e%%</code> is an expression
268 which should evaluate to a sequence. The semantics is
269 to "open" <code>%%e%%</code>. For instance:
270 <code>[ 1 2 ![ 3 4 ] 5 ]</code>
271 evaluates to
272 <code>[ 1 2 3 4 5 ]</code>.
273 Consequently, the concatenation of two sequences <code>%%e1%% @ %%e2%%</code>
274 can also be written <code>[ !%%e1%% !%%e2%% ]</code>
275 or <code>[ !%%e1%% ; %%e2%% ]</code>.
276 </p>
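<p>
The notations of this section can be compared side by side. In the sketch
below, the names <code>s1</code> to <code>s3</code> are illustrative, and the comments
restate the results given in the text above:
</p>

<sample><![CDATA[
let s1 = [ 1 2 3 ] @ [ 4 5 6 ];;   (* [ 1 2 3 4 5 6 ] *)
let s2 = [ 1 2 ![ 3 4 ] 5 ];;      (* [ 1 2 3 4 5 ] *)
let s3 = [ 1 2 ; [ 3 4 ] ];;       (* [ 1 2 3 4 ]: the terminator is a sequence *)
]]></sample>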
277
278 </section>
279
280 <section title="Types and patterns">
281
282 <p>
283 In CDuce, a sequence can be heterogeneous: the elements can all have
284 different types. Types and patterns for sequences are specified
285 by regular expressions over types or patterns. The syntax is
286 <code>[ %%R%% ]</code> where <code>%%R%%</code> is a regular expression, which
287 can be:
288 </p>
289 <ul>
290 <li>A type or a pattern, which corresponds to a single element in the
291 sequence (in particular, <code>[ _ ]</code> represents
292 sequences of length 1, <em>not</em> arbitrary sequences).</li>
293 <li>A juxtaposition of regular expressions <code>%%R1%% %%R2%%</code>
294 which represents concatenation.
295 </li>
296 <li>A postfix repetition operator; the greedy operators are
297 <code>%%R%%?</code>,
298 <code>%%R%%+</code>,
299 <code>%%R%%*</code>, and the ungreedy operators are:
300 <code>%%R%%??</code>,
301 <code>%%R%%+?</code>,
302 <code>%%R%%*?</code>. For types, there is no distinction in semantics between
303 greedy and ungreedy. </li>
304 <li>A sequence capture variable <code>%%x%%::%%R%%</code>.
305 The semantics is to capture in <code>%%x%%</code> the subsequence
306 matched by <code>%%R%%</code>. The same sequence capture variable
307 can appear several times inside a regular expression, including
308 under repetition operators; in that case, all the corresponding
309 subsequences are concatenated together.
310 <br/>
311 Note the difference between <code>[ x::Int ]</code> and
312 <code>[ (x &amp; Int) ]</code>. Both accept sequences made of a single
313 integer, but the first one binds <code>x</code> to a sequence
314 (of a single integer), whereas the second one binds it to
315 the integer itself.</li>
316 </ul>
317
318 <p>
319 Sequence types and patterns also accept the <code>[ %%...%%; %%...%% ]</code>
320 notation. This is a convenient way to discard the tail of a sequence
321 in a pattern, e.g.: <code>[ x::Int* ; _ ]</code>.
322 </p>
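<p>
To make the difference between these capture forms concrete, here is a small
sketch; the names <code>a</code>, <code>b</code> and <code>c</code> and the input
sequences are assumptions made for the example:
</p>

<sample><![CDATA[
let a = match [ 1 2 'a' 'b' ] with [ x::Int* ; _ ] -> x;;
(* x captures the leading integers as a sequence: [ 1 2 ] *)

let b = match [ 7 ] with [ x::Int ] -> x;;
(* sequence capture: x is the one-element sequence [ 7 ] *)

let c = match [ 7 ] with [ (x & Int) ] -> x;;
(* ordinary capture: x is the integer 7 itself *)
]]></sample>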
323
324 </section>
325
326 </box>
327
328 <box title="Strings" link="string">
329
330 <p>
331 In CDuce, character strings are nothing but sequences of characters.
332 The type <code>String</code> is pre-defined as <code>[ Char* ]</code>.
333 This makes it possible to use the full power of regular expression
334 pattern matching with strings.
335 </p>
336
337 <p>
338 Inside a regular expression type or pattern, it is possible
339 to use <code>PCDATA</code> instead of <code>Char*</code>
340 (note that neither is a type on its own; they only make sense
341 inside square brackets, contrary to <code>String</code>).
342 </p>
343
344 <p>
345 Several consecutive character literals in a sequence can be
346 merged together between two single quotes:
347 <code>[ 'abc' ]</code> instead of <code>[ 'a' 'b' 'c' ]</code>.
348 Also it is possible to avoid square brackets by using
349 double quotes: <code>"abc"</code>. The same escaping rules apply
350 inside double quotes, except that single quotes may be escaped (but
351 need not be), and double quotes must be.
352 </p>
353
354 </box>
355
356 <box title="Records" link="record">
357
358 <p>
359 Records are finite sets of (name,value) bindings. They are used
360 in particular to represent XML attribute sets.
361 </p>
362
363 <p>
364 The syntax of a record expression is
365 <code>{ %%l1%% = %%e1%%; %%...%%; %%ln%% = %%en%% }</code>
366 where the <code>%%li%%</code> are label names (same lexical
367 conventions as for identifiers), and the <code>%%ei%%</code>
368 are expressions.
369 </p>
370
371 <p>
372 There are two kinds of record types. Open record types
373 are written <code>{ %%l1%% = %%t1%%; %%...%%; %%ln%% = %%tn%%
374 }</code>, and closed record types are written
375 <code>{| %%l1%% = %%t1%%; %%...%%; %%ln%% = %%tn%%
376 |}</code>.
377 Both denote all the record values where
378 the labels <code>%%li%%</code> are present and the associated values
379 are in the corresponding type. The distinction is that open
380 types allow extra fields, whereas closed types give a strict
381 enumeration of the possible fields.
382 </p>
383
384 <p>
385 Additionally, both for open and closed record types,
386 it is possible to specify optional fields by using <code>=?</code>
387 instead of <code>=</code> between a label and a type.
388 For instance, <code>{| x = Int; y =? Bool |}</code>
389 represents records with an <code>x</code> field of type
390 <code>Int</code>, an optional field <code>y</code> (when it is
391 present, it has type <code>Bool</code>), and no other field.
392 </p>
393
394 <p>
395 Note that the value <code>{ x = 1; y = 2 }</code>
396 actually has the type <code>{| x = 1; y = 2 |}</code>
397 which is more precise than <code>{ x = 1; y = 2 }</code>. This is
398 the only situation where the singleton type corresponding to a constructed
399 value is not syntactically equal to this value.
400 </p>
401
402 <p>
403 The syntax is the same for patterns. Note that capture variables
404 cannot appear in an optional field.
405 </p>
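<p>
The following sketch contrasts the two kinds of record types, using the
notation of this page; the names <code>Open</code>, <code>Closed</code> and
<code>r</code> are hypothetical:
</p>

<sample><![CDATA[
type Open   = { x = Int; y =? Bool };;
type Closed = {| x = Int; y =? Bool |};;

let r = { x = 1; z = 'a' };;
(* r belongs to Open, which tolerates the extra field z,
   but not to Closed, which allows only x and an optional y *)
]]></sample>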
406
407 </box>
408
409 <box title="XML elements" link="xml">
410
411 <p>
412 In CDuce, the general form of an XML element is
413 <code>&lt;(%%tag%%) (%%attr%%)>%%content%%</code> where
414 <code>%%tag%%</code>,
415 <code>%%attr%%</code> and
416 <code>%%content%%</code> are three expressions.
417 Usually, <code>%%tag%%</code> is a tag literal <code>`%%xxx%%</code>, and
418 in this case, instead of writing <code>&lt;(`%%tag%%)></code>,
419 you can write: <code>&lt;%%tag%%></code>.
420 Similarly, when <code>%%attr%%</code> is a record literal, you can
421 omit the surrounding <code>({...})</code>.
422 E.g.: <code>&lt;a href="http://...">[]</code>.
423 </p>
424
425 <p>
426 The syntax for XML elements types and patterns follows closely
427 the syntax for expressions:
428 <code>&lt;(%%tag%%) (%%attr%%)>%%content%%</code>
429 where
430 <code>%%tag%%</code>,
431 <code>%%attr%%</code> and
432 <code>%%content%%</code> are three types or patterns.
433 As for expressions, it is possible to simplify the notations
434 for tags and attributes. For instance,
435 <code>&lt;(`a) ({ href=String })>[]</code>
436 can be written:
437 <code>&lt;a href=String>[]</code>.
438 </p>
439
440 <p>
441 The following sample shows several ways to write XML types.
442 </p>
443
444 <sample><![CDATA[
445 type A = <a x = String; y = String>[ A* ];;
446 type B = <(`x | `y)>[ ];;
447 type C = <c {| x = String; y = String |}>[ ];;
448 type U = { x = String; y =? String };;
449 type V = [ W* ];;
450 type W = <v (U)>V;;
451 ]]></sample>
452
453 </box>
454
455
456 <box title="Functions" link="fun">
457
458 <p>
459 CDuce is a higher-order functional language: functions are
460 first-class values, and can be passed as arguments, returned
461 as results, stored in data structures, etc.
462 </p>
463
464 <p>
465 A functional type has the form <code>%%t%% -> %%s%%</code>
466 where <code>%%t%%</code> and <code>%%s%%</code> are types.
467 Intuitively, this type corresponds to functions that accept
468 (at least) any argument of type <code>%%t%%</code>, and for
469 such an argument, return a value of type <code>%%s%%</code>.
470 For instance, the type <code>(Int,Int) -> Int &amp; (Char,Char) -> Char</code>
471 denotes functions that map any pair of integers to an integer,
472 and any pair of characters to a character.
473 </p>
474
475 <p>
476 The explanation above gives the intuition behind the interpretation
477 of functional types. It is sufficient to understand which
478 subtyping relations and equivalences hold between (boolean
479 combinations of) functional types. For instance,
480 <code>Int -> Int &amp; Char -> Char</code> is a subtype
481 of <code>(Int|Char) -> (Int|Char)</code> because
482 with the intuition above, a function of the first type,
483 when given a value of type <code>Int|Char</code> returns
484 a value of type <code>Int</code> or of type <code>Char</code>
485 (depending on the argument).
486 </p>
487
488 <p>
489 Formally, the type <code>%%t%% -> %%s%%</code> denotes
490 CDuce abstractions
491 <code>fun (%%t1%% -> %%s1%%; %%...%%; %%tn%% -> %%sn%%)...</code>
492 such that <code>%%t1%% -> %%s1%% &amp; %%...%% &amp; %%tn%% ->
493 %%sn%%</code> is a subtype of <code>%%t%% -> %%s%%</code>.
494 </p>
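<p>
As a rough sketch of such an abstraction (the name <code>f</code> and the two
branches are assumptions made for illustration, and the exact concrete syntax may
differ between CDuce versions), a function with the interface
<code>Int -> Int; Char -> Char</code> could be written:
</p>

<sample><![CDATA[
let f = fun (Int -> Int; Char -> Char)
    (x & Int)  -> x + 1
  | (c & Char) -> c;;
(* by the subtyping example above, f can also be used
   where a value of type (Int|Char) -> (Int|Char) is expected *)
]]></sample>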
495
496 <p>
497 Functional types have no counterpart in patterns.
498 </p>
499
500 </box>
501
502 </page>
