Contents of /web/ocaml.xml

Revision 1791
Tue Jul 10 19:23:09 2007 UTC (5 years, 10 months ago) by abate
File MIME type: text/xml
File size: 33696 byte(s)
[r2005-07-30 20:14:39 by afrisch] Empty log message

Original author: afrisch
Date: 2005-07-30 20:14:39+00:00
1 <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
2 <page name="ocaml">
3
4 <title>OCamlDuce</title>
5
6 <left>
7 <local-links href="index,documentation"/>
8 <p>On this page:</p>
9 <boxes-toc/>
10 </left>
11
12 <box>
13
14 <p>
15 OCamlDuce is a merger between <a
16 href="http://caml.inria.fr/">OCaml</a> and
17 <local href="index">CDuce</local>. It comes as a modified
18 version of OCaml which integrates CDuce features: expressions, types,
19 patterns.
20 </p>
21
22 <p>
23 OCamlDuce is distributed under the same licenses as Objective Caml:
24 the Q Public License version 1.0 for the Compiler, and the LGPL
25 version 2 for the Library. The extension has been written by Alain
26 Frisch. Parts of the CDuce implementation, by the same author, have
27 been reused.
28 </p>
29
30 </box>
31
32 <box title="Download and installation" link="install">
33
34 <p>
35 The build procedure for OCamlDuce is exactly the same as for OCaml:
36 <tt>configure, make world, make install</tt>. The names of the tools
37 are unchanged: <tt>ocaml,ocamlc,ocamlopt</tt>. Currently, OCamlDuce
38 is based on CVS snapshots of OCaml (between 3.08.3 and the current
39 <tt>release308</tt> branch) and CDuce (between 0.3.91 and the head).
40 </p>
41
42 <ul>
43 <li><a
44 href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/cduce-ocaml-0.0.5.tar.gz">Compiler,
45 version 0.0.5</a></li>
46 <!--<li><a
47 href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/xml-support-0.0.4.tar.gz">Support
48 library, version 0.0.4</a></li>-->
49 </ul>
50
51 <p>
52 GODI users can upgrade an existing installation by adding this
53 line to their <tt>etc/godi.conf</tt> file:
54 </p>
55 <sample>
56 GODI_BUILD_SITES += http://pauillac.inria.fr/~frisch/ocamlcduce/godi
57 </sample>
58 <p>
59 and by forcing a recompilation of the <tt>godi-ocaml-src</tt>
60 and <tt>godi-ocaml</tt> packages. <!--They should also build
61 the <tt>godi-xml-support</tt> library.-->
62 </p>
63
64 <!--
65 <p>
66 Some simple examples can be found <a -->
67 <!--href="http://pauillac.inria.fr/~frisch/ocamlcduce/tests/">here</a>.</p>
68 -->
69
70 </box>
71
72 <box title="Overview" link="overview">
73
74 <p>
75 The goal of the OCamlDuce project is to extend the OCaml language with features
76 to make it easier to write safe and efficient complex applications
77 that need to deal with XML documents. In particular, it relies
78 on a notion of types and patterns to guarantee statically
79 that all the possible input documents are correctly processed, and
80 that only valid output documents are produced.
81 </p>
82
83 <p>
84 In a nutshell, OCamlDuce extends OCaml with a new kind of values
85 (<em>x-values</em>) to represent XML documents, fragments, tags, Unicode
86 strings. In order to describe these values, it also extends the type algebra
87 with so-called <em>x-types</em>. The philosophy behind these types is that they
88 represent <em>sets of x-values</em>. They can be very precise: indeed,
89 each value can be seen as a singleton type (a set with a single
90 value), and it is possible to form Boolean combinations of x-types
91 (intersection, union, difference).
92 </p>
93
94 <p>
95 OCamlDuce's type system can be understood as a refinement of OCaml's.
96 For each sub-expression which is inferred to be of the x-kind (using
97 OCaml's unification-based type system), OCamlDuce tries to infer the
98 best possible sound x-type. Here, best means smallest for the natural
99 subtyping relation (set inclusion). The inference algorithm is
100 actually a data-flow analysis: the x-type collects all the values
101 that can be produced by the expression, considering all the possible
102 data flows in the program. It is sometimes necessary to provide
103 explicit type annotations to help the type checker infer this type, in
104 particular when you define recursive functions or when you use
105 iterators.
106 </p>
107
108 <p>
109 Subtyping is implicit for x-types: if an expression is inferred to be
110 of x-type <code>t</code>, which is a subtype of <code>s</code>, then
111 it is possible to use this expression in any context which expects a
112 value of type <code>s</code>.
113 </p>
114
115 </box>
116
117 <box title="Getting started" link="start">
118
119 <p>
120 Most of the new language features are enclosed within double curly braces
121 <code>{{ON}}{{...}}</code>. For instance, the following code sample
122 defines a value <code>x</code> as an XML element (with tag
123 <code>a</code>, an attribute <code>href</code>, and a simple
124 string as content):
125 </p>
126
127 <sample><![CDATA[{{ON}}
128 # let x = {{ <a href="http://www.cduce.org">['CDuce'] }};;
129 val x : {{<a href=[ 'http://www.cduce.org' ]>[ 'CDuce' ]}} =
130 {{<a href="http://www.cduce.org">[ 'CDuce' ]}}
131 ]]></sample>
132
133 <p>
134 What appears between the curly braces is called an x-expression.
135 Similarly, there are x-types (as seen above), and also x-patterns.
136 The delimiters <code>{{ON}}{{...}}</code> are only used
137 for syntactical reasons, to avoid clashes between OCaml and CDuce
138 syntaxes and lexical conventions. As a matter of fact,
139 an OCaml expression need not be a syntactical x-expression
140 (delimited by double curly braces) to evaluate to an x-value.
141 For instance, once <code>x</code> has been declared as above,
142 the expression <code>x</code> evaluates to an x-value.
143 </p>
144
145
146 <p>
147 It is possible to use an arbitrary
148 OCaml expression as part of an x-expression: it must simply be
149 protected by a new pair of double curly braces. For instance, there is
150 no <code>if-then-else</code> construction for x-expressions, but you
151 can write:
152 </p>
153
154 <sample><![CDATA[{{ON}}
155 # {{ <a href={{if true then {{"a"}} else {{"z"}}}}>[] }};;
156 - : {{<a href=[ 'a' | 'z' ]>[ ]}} = {{<a href="a">[ ]}}
157 ]]></sample>
158
159 <p>
160 Only the highlighted parts are parsed as x-expressions. The
161 <code>if-then-else</code> sub-expression is parsed as an OCaml
162 expression, but its type is an x-type (namely <code>{{ON}}{{[ 'a' |
163 'z' ]}}</code>).
164 </p>
165
166 </box>
167
168 <box title="X-values" link="values">
169
170 <p>
171 X-values are intended to represent XML documents and fragments
172 thereof: elements, tags, text, sequences. In this section, we
173 present the x-value algebra, the syntax of the corresponding
174 x-expression constructors and the associated x-types.
175 </p>
176
177 <p>
178 There are three kinds of atomic x-values:
179 </p>
180 <ul>
181 <li>Unicode characters;</li>
182 <li>qualified names;</li>
183 <li>arbitrarily large integers.</li>
184 </ul>
185
186 <section title="Characters">
187
188 <p>
189 X-characters are different from OCaml characters. They can represent
190 the range of Unicode codepoints defined in the XML specification.
191 Character literals are delimited by single quotes. The escape
192 sequences \n, \r, \t, \b, \', \&quot;, \\ are recognized as usual. The
193 numerical escape sequences are written <code>\n;</code> where n is an integer
194 literal (note the extra semi-colon). The source code is interpreted as
195 being encoded in iso-8859-1. As a consequence, Unicode characters which are not
196 part of the Latin1 character set must be introduced with this
197 numerical escape mechanism. The x-types for x-characters are:
198 </p>
199 <ul>
200 <li>singletons;</li>
201 <li>intervals, written <code>c -- d</code>, where <code>c</code> and
202 <code>d</code> are literals (example: <code>{{ON}}type t = {{ 'a'--'z'
203 }}</code>);</li>
204 <li>the type of all x-characters, written <code>Char</code>;</li>
205 <li>the type of all Latin1 characters, written <code>Latin1Char</code>
206 (defined as <code>\0; -- \255;</code>).</li>
207 </ul>
208
209 </section>
210
211 <section title="Integers">
212
213 <p>
214 X-integers are arbitrarily large. Literals must be written in decimal.
215 Negative literals must be in parentheses, e.g. <code>(-3)</code>.
216 The x-types for x-integers are:
217 </p>
218 <ul>
219 <li>singletons;</li>
220 <li>intervals, written <code>i -- j</code>, where <code>i</code> and
221 <code>j</code> are literals (example: <code>{{ON}}type t = {{ 10--20
222 }}</code>); it is possible to replace <code>i</code> or <code>j</code>
223 with <code>**</code> to define open-ended intervals, e.g.
224 <code>{{ON}}type pos = {{ 1 -- ** }}</code>;
225 </li>
226 <li>the type of all x-integers, written <code>Int</code>;</li>
227 <li>the type of all the integers which can be represented by a
228 signed 32 (resp. 64) bit machine word, written <code>Int32</code> (resp.
229 <code>Int64</code>).</li>
230 </ul>
231
232 </section>
233
234 <section title="Qualified names">
235
236 <p>
237 Qualified names are intended to represent XML tag names. Conceptually,
238 they are made of a namespace URI and a local name. Since URIs tend
239 to be long, literals are of the form <code>`prefix:local</code>
240 where <code>local</code> is the local name and <code>prefix</code>
241 is a <em>namespace prefix</em> bound to some URI (in the scope of the
242 literal). The local name follows the definitions from
243 the XML Namespaces specification; a dot character must be protected
244 by a backslash and non-Latin1 characters are written with the
245 numerical escape <code>\n;</code>. <a href="#ns">See below</a> for an
246 explanation of how to bind prefixes to URIs. To refer
247 to the default namespace (or the absence of namespace if no default
248 has been defined), the syntax is simply <code>`local</code>.
249 The x-types for qualified names are:
250 </p>
251 <ul>
252 <li>singletons;</li>
253 <li>the type of all qualified names, written <code>Atom</code>;</li>
254 <li>the type of all qualified names from a specified namespace,
255 written <code>`ns:*</code>.</li>
256 </ul>
257 </section>
258
259 <section title="Records">
260
261 <p>
262 X-records are mainly used to represent the set of attributes of an XML
263 element. An x-record is a binding from a finite set of <em>labels</em>
264 to x-values. Labels follow the same syntax as qualified names
265 without the leading backquote. However, if the namespace prefix is not
266 given, the default namespace does not apply (the namespace URI is
267 empty). The syntax for record x-expressions is <code> { l1=e1
268 ... ln=en }</code> where the <code>li</code> are labels and the
269 <code>ei</code> are x-expressions. Fields can also be separated with a
270 semi-colon. It is legal to omit the expression for a field; the label is then
271 taken as the content of the field (a value with this name must be
272 defined in the current scope), e.g.: <code>{{ON}}let x = ... and y = ...
273 in {{ {x y z=3} }}</code> is equivalent to <code>{{ON}}let x = ... and
274 y = ... in {{ {x=x y=y z=3} }}</code>. The types for x-records specify
275 which labels are authorized/mandatory, and what the types of the
276 corresponding fields are. There are two kinds of record x-types:
277 </p>
278
279 <ul>
280 <li>
281 Closed record types, which only allow a finite number of fields:
282 <code>{ l1=t1 ... ln=tn }</code>;
283 </li>
284 <li>
285 Open record types, which allow additional fields (with arbitrary
286 type):
287 <code>{ l1=t1 ... ln=tn .. }</code> (the final two dots are
288 part of the syntax).
289 </li>
290 </ul>
291
292 <p>
293 In both cases, it is possible to make one of
294 the fields optional by changing <code>=</code> to <code>=?</code>.
295 </p>
296
297 <p>
298 The x-type of all x-records is thus <code>{ .. }</code>,
299 and the x-type of x-records with maybe a field <code>l</code>
300 of type <code>Int</code> and maybe arbitrary other fields is
301 <code>{ l=?Int .. }</code>.
302 </p>
303
304 </section>
305
306 <section title="Sequences">
307
308 <p>
309 X-sequences are finite and ordered collections of x-values.
310 The syntax for a sequence x-expression is
311 <code>[ e1 ... en ]</code> (note that elements are <em>not</em> separated
312 by semi-colons as in OCaml lists). Each item <code>ei</code>
313 can either be:
314 </p>
315 <ul>
316 <li>an x-expression;</li>
317 <li><code>!e</code> where <code>e</code> is an x-expression which
318 evaluates to a sequence (whose content is inserted in the sequence
319 which is currently defined); e.g.
320 <code>let x = [ 2 3 ] in [ 1 !x 4 ]</code> is equivalent to
321 <code>[ 1 2 3 4 ]</code>;</li>
322 <li>a string literal delimited by single quotes; e.g.
323 <code>[ 'abc' ]</code> is equivalent to <code>[ 'a' 'b' 'c' ]</code>.</li>
324 </ul>
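<p>
The three kinds of items above can be modelled in plain OCaml. The
sketch below is only an illustration of the flattening behaviour, not
OCamlDuce code: the types <code>xvalue</code> and <code>item</code> and
the function <code>build</code> are hypothetical names introduced here.
</p>
<sample><![CDATA[
```ocaml
(* Plain-OCaml model of sequence x-expressions (hypothetical names). *)
type xvalue = Int of int | Chr of char

(* The three kinds of items allowed inside a sequence x-expression. *)
type item =
  | Value of xvalue        (* a plain x-expression *)
  | Splice of xvalue list  (* !e : the content of e is inserted in place *)
  | Text of string         (* a quoted string: one element per character *)

let chars_of_string s =
  List.init (String.length s) (fun i -> Chr (String.get s i))

(* Flatten the items, in order, into the resulting sequence. *)
let build (items : item list) : xvalue list =
  List.concat
    (List.map
       (function
         | Value v -> [ v ]
         | Splice vs -> vs
         | Text s -> chars_of_string s)
       items)

(* Models: let x = [ 2 3 ] in [ 1 !x 4 ] *)
let x = [ Int 2; Int 3 ]
let example = build [ Value (Int 1); Splice x; Value (Int 4) ]
```
]]></sample>
<p>
In this model, <code>example</code> is the four-element sequence
corresponding to <code>[ 1 2 3 4 ]</code>.
</p>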
325
326 <p>
327 X-types for sequences are of the form <code>[R]</code>
328 where <code>R</code> is a regular expression over x-types which
329 describes the possible contents of the sequence. The possible
330 forms of regular expressions are:
331 </p>
332
333 <ul>
334 <li><code>t</code> (one single element of x-type <code>t</code>)</li>
335 <li><code>R*</code> (zero or more repetitions)</li>
336 <li><code>R+</code> (one or more repetitions)</li>
337 <li><code>R?</code> (zero or one repetition)</li>
338 <li><code>R1 R2</code> (sequence)</li>
339 <li><code>R1|R2</code> (alternation)</li>
340 <li><code>(R)</code> (grouping)</li>
341 <li><code>/t</code> (guard: the tail of the sequence must comply with
342 <code>t</code>).</li>
343 <li><code>PCDATA</code> (equivalent to Char*).</li>
344 </ul>
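<p>
To make the meaning of these regular expressions concrete, here is a
small backtracking matcher written in plain OCaml. It is a sketch under
simplifying assumptions: element types are modelled as predicates, the
guard <code>/t</code> and <code>PCDATA</code> are left out, and all names
(<code>re</code>, <code>go</code>, <code>matches</code>) are hypothetical.
It is not how OCamlDuce implements sequence types.
</p>
<sample><![CDATA[
```ocaml
(* Regular expressions over element predicates (hypothetical model). *)
type 'a re =
  | Elem of ('a -> bool)  (* t : one element of type t *)
  | Seq of 'a re * 'a re  (* R1 R2 *)
  | Alt of 'a re * 'a re  (* R1|R2 *)
  | Star of 'a re         (* R* *)
  | Eps                   (* the empty sequence *)

let plus r = Seq (r, Star r) (* R+ *)
let opt r = Alt (r, Eps)     (* R? *)

(* [go r l k] matches a prefix of [l] against [r], then calls the
   continuation [k] on the remaining elements. *)
let rec go r l k =
  match r with
  | Eps -> k l
  | Elem p -> (match l with x :: tl when p x -> k tl | _ -> false)
  | Seq (a, b) -> go a l (fun rest -> go b rest k)
  | Alt (a, b) -> go a l k || go b l k
  | Star a ->
      k l
      || go a l (fun rest ->
             (* demand progress so that a nullable body terminates *)
             List.length rest < List.length l && go r rest k)

(* A sequence has type [R] when R matches it entirely. *)
let matches r l = go r l (fun rest -> rest = [])

(* Models a type of the shape [ P+ Z? ] over integers. *)
let pos = Elem (fun n -> n > 0)
let zero = Elem (fun n -> n = 0)
let r = Seq (plus pos, opt zero)
```
]]></sample>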
345
346 <note>Sequences are actually encoded with embedded pairs and a
347 terminator, and sequence types are encoded with product types and
348 recursive types. The encoding is available to the programmer
349 but not described in this manual.
350 </note>
351
352 </section>
353
354 <section title="Strings">
355
356 <p>
357 Strings are nothing but sequences of characters. There are two
358 predefined types <code>String</code> and <code>Latin1</code>
359 (defined as <code>[ Char* ]</code> and <code>[ Latin1Char* ]</code>).
360 </p>
361
362 <p>
363 A string literal <code>[ '...' ]</code> can also be written
364 <code>"..." </code> (without the square brackets). Note that single
365 (resp. double) quotes need to be escaped only when the string is
366 delimited with single (resp. double) quotes.
367 </p>
368
369 </section>
370
371 <section title="XML elements">
372
373 <p>
374 An XML element is a triple of x-values. The syntax for
375 the corresponding x-expression constructor is
376 <code><![CDATA[<(e1) (e2)>e3]]></code>. When <code>e1</code> is a
377 qualified name literal, it is possible to omit the leading
378 backquote and the surrounding parentheses. Similarly,
379 when <code>e2</code> is an x-record literal, it is possible
380 to omit the curly braces and the parentheses. For instance,
381 one can simply write <code><![CDATA[<a href="abc">['def']]]></code>
382 instead of <code><![CDATA[<(`a) ({href="abc"})>['def']]]></code>.
383 </p>
384
385 <p>
386 XML element x-types are written <code><![CDATA[<(t1) (t2)>t3]]></code>,
387 and the same simplifications apply. For instance, if
388 the namespace prefix <code>ns</code> has been defined,
389 the following is a legal x-type: <code><![CDATA[<ns:* ..>[]]]></code>;
390 it describes XML elements whose tag is in the namespace bound to
391 <code>ns</code>, with an empty content, and with an arbitrary set of
392 attributes. An underscore in place of <code>(t1)</code> is
393 equivalent to <code>(Atom)</code> (any tag).
394 </p>
395
396 </section>
397
398 </box>
399
400 <box title="X-expressions" link="expr">
401
402 <p>
403 In the previous section, we have seen the syntax for x-value
404 constructors (constant literals, sequence, record, element constructors).
405 In this section, we describe the other kinds of x-expressions.
406 </p>
407
408 <section title="Binary infix operators">
409
410 <p>
411 The arithmetic operators on integers follow the usual precedence.
412 They are written <code>+,*,-,div,mod</code> (they are all infix).
413 </p>
414
415 <p>
416 Record concatenation: <code>e1 ++ e2</code>. The x-expressions
417 <code>e1</code> and <code>e2</code> must evaluate to x-records.
418 The result is obtained by concatenating them. If a field with the same
419 label is present in both records, the right-most one is selected.
420 </p>
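<p>
The right-biased behaviour of record concatenation can be pictured with
association lists in plain OCaml. This is a minimal sketch, assuming
records are modelled as lists of (label, value) pairs; the names
<code>record</code> and <code>concat</code> are hypothetical.
</p>
<sample><![CDATA[
```ocaml
(* A record modelled as an association list from labels to values. *)
type 'v record = (string * 'v) list

(* Models e1 ++ e2: fields of r2 win over fields of r1 with the same label. *)
let concat (r1 : 'v record) (r2 : 'v record) : 'v record =
  List.filter (fun (l, _) -> not (List.mem_assoc l r2)) r1 @ r2

let merged = concat [ ("x", 1); ("y", 2) ] [ ("y", 3); ("z", 4) ]
```
]]></sample>
<p>
Here the field <code>y</code> of the result comes from the right-hand
record, while <code>x</code> and <code>z</code> are kept as they are.
</p>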
421
422 <p>
423 Sequence concatenation: <code>e1 @ e2</code>, equivalent
424 to <code>[!e1 !e2]</code>.
425 </p>
426
427 </section>
428
429 <section title="Projections, filtering">
430
431 <p>
432 If the x-expression <code>e</code> evaluates to a record or an XML
433 element, the construction <code>e.l</code> will extract the value of
434 field or attribute <code>l</code>. Similarly, the construction
435 <code>e.?l</code> will extract the value of field or attribute
436 <code>l</code> if present, and return the empty sequence
437 <code>[]</code> otherwise.
438 </p>
439
440 <p>
441 If the x-expression <code>e</code> evaluates to a record,
442 the construction <code>e -. l</code> will produce a new record
443 where the field <code>l</code> has been removed (if present).
444 </p>
445
446 <p>
447 If the x-expression <code>e</code> evaluates to an x-sequence,
448 the construction <code>e/</code> will result in a new x-sequence
449 obtained by taking in order all the children of the XML elements
450 from the sequence <code>e</code>. For instance, the x-expression
451 <code><![CDATA[[<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ]/]]></code>
452 evaluates to the x-value <code>[ 1 2 3 6 7 8 ]</code>.
453 </p>
454
455 <p>
456 If the x-expression <code>e</code> evaluates to an x-sequence,
457 the construction <code>e.(t)</code> (where <code>t</code> is an
458 x-type) will result in a new x-sequence
459 obtained by filtering <code>e</code> to keep only the elements
460 of type <code>t</code>. For instance, the x-expression
461 <code><![CDATA[[<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ].(Int)]]></code>
462 evaluates to the x-value <code>[ 4 5 ]</code>.
463 </p>
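<p>
The two sequence operations above behave like the following plain OCaml
functions. This is only a model: attributes are ignored, x-types are
represented by predicates, and the names <code>children</code> and
<code>keep</code> are hypothetical.
</p>
<sample><![CDATA[
```ocaml
(* Simplified x-values: integers and elements (attributes omitted). *)
type xvalue = Int of int | Elem of string * xvalue list

(* Models e/ : the children of every element of the sequence, in order.
   Items that are not elements contribute nothing. *)
let children seq =
  List.concat (List.map (function Elem (_, c) -> c | Int _ -> []) seq)

(* Models e.(t) : keep only the items that belong to the type t. *)
let keep (t : xvalue -> bool) seq = List.filter t seq

let is_int = function Int _ -> true | Elem _ -> false

(* The sequence used in the two examples above. *)
let input =
  [ Elem ("a", [ Int 1; Int 2; Int 3 ]); Int 4; Int 5;
    Elem ("b", [ Int 6; Int 7; Int 8 ]) ]
```
]]></sample>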
464 </section>
465
466 <section title="Dynamic type checking">
467
468 <p>
469 If <code>e</code> is an x-expression and <code>t</code> is an x-type,
470 the construction <code>(e :? t)</code> returns the same
471 result as <code>e</code> if it has type <code>t</code>, and otherwise
472 raises a <code>Failure</code> exception whose argument explains
473 why this is not the case.
474 </p>
475
476 <sample><![CDATA[{{ON}}
477 # let f (x : {{ Any }}) = {{ (x :? <a>[ Int* ] ) }} in
478 f {{ <a>[ 1 2 '3' ] }};;
479 Exception:
480 Failure
481 "Value <a>[ 1 2 '3' ] does not match type <a>[ Int* ]\nValue '3' does not match type Int\n".
482 ]]></sample>
483 </section>
484
485 <section title="Pattern matching">
486
487 <p>
488 OCamlDuce comes with a powerful pattern matching operation.
489 X-patterns are described <a href="#patterns">below</a>.
490 The syntax for the pattern matching operation is:
491 <code>match e with p1 -> e1 | ... | pn -> en</code>.
492 The type system ensures exhaustiveness of the pattern matching
493 and infers precise types for the capture variables.
494 It is also possible to use x-pattern matching as a regular
495 OCaml expression; x-patterns must then be surrounded by <code>{{...}}</code>, e.g.:
496 <code>match e with {{p1}} -> e1 | ... | {{pn}} -> en</code> or
497 <code>function {{p1}} -> e1 | ... | {{pn}} -> en</code>.
498 </p>
499
500 <note>
501 Currently, it is impossible to mix normal OCaml patterns and x-patterns
502 in a single pattern matching.
503 </note>
504
505 </section>
506
507 <section title="Local binding">
508
509 <p>
510 The x-expression <code>let p=e1 in e2</code> is equivalent to
511 <code>match e1 with p -> e2</code>. There is also a local binding
512 with an x-pattern in OCaml expressions: <code>let {{p}}=e1 in
513 e2</code>.
514 </p>
515
516 </section>
517
518
519 <section title="Iterators">
520
521 <p>
522 OCamlDuce comes with a sequence iterator
523 <code>map e with p1 -> e1 | ... | pn -> en</code> and
524 a tree iterator
525 <code>map* e with p1 -> e1 | ... | pn -> en</code>.
526 </p>
527
528 <p>
529 For both constructions, the argument must evaluate to a sequence.
530 The <code>map</code> iterator applies the patterns to each element
531 of this sequence in turn and produces a new sequence by concatenating
532 all the results (all the right-hand sides must thus produce a
533 sequence). The set of patterns must be exhaustive for all the possible
534 elements of the input sequence.
535 </p>
536
537 <p>
538 The tree iterator is similar except that the patterns need not be
539 exhaustive. If some element of the input sequence is not matched,
540 it is simply copied into the result unless it is an XML element. In
541 this case, the transformation is applied recursively to its content.
542 </p>
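<p>
The difference between the two iterators can be sketched in plain
OCaml. In this model (hypothetical names, attributes omitted), the
branches of <code>map</code> are a total function, while the branches
of <code>map*</code> are a partial function that returns
<code>None</code> when no pattern matches.
</p>
<sample><![CDATA[
```ocaml
(* Simplified x-values: integers and elements (attributes omitted). *)
type xvalue = Int of int | Elem of string * xvalue list

(* Models map: every item is transformed; results are concatenated. *)
let map_seq (f : xvalue -> xvalue list) seq = List.concat (List.map f seq)

(* Models map*: unmatched elements are traversed recursively,
   other unmatched items are copied unchanged. *)
let rec map_tree (f : xvalue -> xvalue list option) seq =
  List.concat
    (List.map
       (fun v ->
         match f v, v with
         | Some result, _ -> result
         | None, Elem (tag, content) -> [ Elem (tag, map_tree f content) ]
         | None, Int _ -> [ v ])
       seq)

(* A single branch that only matches integers. *)
let double = function Int n -> Some [ Int (2 * n) ] | Elem _ -> None

let out = map_tree double [ Int 1; Elem ("a", [ Int 2; Elem ("b", [ Int 3 ]) ]) ]
```
]]></sample>
<p>
Since <code>double</code> matches no element, the tree structure is
preserved and every integer is doubled at any depth.
</p>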
543
544 </section>
545
546 <section title="OCaml constructions">
547
548 <p>
549 As a convenience, some of the OCaml expression constructors
550 are allowed as x-expressions (without a need to go back to OCaml
551 with double curly braces): (unqualified) value identifiers and
552 function calls.
553 </p>
554
555 </section>
556
557 </box>
558
559 <box title="More on x-types" link="types">
560
561 <p>
562 We have seen how to write simple x-types. We can then combine
563 them with Boolean connectives:
564 </p>
565
566 <ul>
567 <li><code>t1 &amp; t2</code>: intersection;</li>
568 <li><code>t1 | t2</code>: union;</li>
569 <li><code>t1 - t2</code>: difference.</li>
570 </ul>
571
572 <p>
573 The empty x-type is written <code>Empty</code> (it contains no value),
574 and the universal x-type is written <code>Any</code> (it contains
575 all the x-values) or <code>_</code>.
576 </p>
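<p>
Since x-types denote sets of x-values, the Boolean connectives can be
pictured with membership predicates in plain OCaml. The sketch below
uses hypothetical names and only models membership tests; the real
type checker decides subtyping on the types themselves, which this
model cannot do.
</p>
<sample><![CDATA[
```ocaml
(* An x-type modelled as the set of values it contains. *)
type xvalue = Int of int | Chr of char
type xtype = xvalue -> bool

let any : xtype = fun _ -> true    (* models Any *)
let empty : xtype = fun _ -> false (* models Empty *)

let inter (t1 : xtype) (t2 : xtype) : xtype = fun v -> t1 v && t2 v
let union (t1 : xtype) (t2 : xtype) : xtype = fun v -> t1 v || t2 v
let diff (t1 : xtype) (t2 : xtype) : xtype = fun v -> t1 v && not (t2 v)

(* Models the interval type i -- j and the type Int. *)
let int_range i j : xtype = function
  | Int n -> i <= n && n <= j
  | Chr _ -> false

let all_int : xtype = function Int _ -> true | Chr _ -> false

(* Models Int - (0--9): all integers except single digits. *)
let not_digit = diff all_int (int_range 0 9)
```
]]></sample>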
577
578 <p>
579 When an x-type has been bound to some OCaml identifier
580 (<code>{{ON}}type t = {{...}}</code>), it is possible to use
581 this identifier in another x-type. Recursive definitions
582 are allowed:
583 </p>
584
585 <sample><![CDATA[{{ON}}
586 type t1 = {{ <a>[ t2* ] }}
587 and t2 = {{ <b>[ t1* ] }}
588 ]]></sample>
589
590 <p>
591 Note that x-values are always finite and acyclic. The type checker
592 detects type definitions which would yield empty types:
593 </p>
594
595 <sample><![CDATA[{{ON}}
596 # type t = {{ <a>[ t+ ] }};;
597 This definition yields an empty type
598 ]]></sample>
599
600 <p>
601 If <code>t1</code> and <code>t2</code> are record x-types,
602 we can combine them with the infix <code>++</code> operator, which
603 mimics the corresponding operator on expressions (record
604 concatenation). Similarly, we can use the infix <code>@</code>
605 concatenation operator on sequence x-types.
606 </p>
607
608 </box>
609
610 <box title="X-patterns" link="patterns">
611
612 <p>
613 X-patterns follow the same syntax as X-types. In particular,
614 any X-type is a valid X-pattern. In addition to X-types constructors,
615 X-patterns can have:
616 </p>
617
618 <ul>
619 <li>capture variables (lowercase OCaml identifiers);</li>
620 <li>constant bindings <code>(x := c)</code> where x is a capture
621 variable and c is
622 a literal x-constant (this pattern always succeeds and returns the
623 binding x->c).</li>
624 </ul>
625
626 <p>
627 In record x-patterns, it is possible to omit the <code>=p</code> part of a field.
628 The content is then replaced with the label name considered as
629 a capture variable. E.g. <code>{ x y=p }</code> is equivalent to
630 <code>{ x=x y=p }</code>.</p>
631
632 <p>It is also possible to add an "else" clause:
633 <code>{ x = (a,_)|(a:=3) }</code>
634 will accept any record with at most the field <code>x</code>. If the content
635 is a pair, the capture variable <code>a</code> will be bound to its first component;
636 otherwise, it is set to <code>3</code>.</p>
637
638 <p>
639 In regular expressions, it is possible to extract whole subsequences
640 with the notation <code>x::R</code>, e.g.: <code>[ _* x::Int+ _* ]</code>.
641 </p>
642
643 <p>
644 If the same sequence capture variable appears several times (or below a
645 repetition) in a regexp, it is bound to the concatenation of all
646 matched subsequences. E.g.: <code>[ (x::Int | _)* ]</code> will
647 collect in <code>x</code> all the elements of type <code>Int</code> from
648 a sequence.</p>
649
650 <p>
651 The regexp operators <code>+,*,?</code> are greedy by default (they match as much
652 as possible). They admit non-greedy variants <code>+?,*?,??</code>.
653 </p>
654 </box>
655
656 <box title="Namespace bindings" link="ns">
657
658 <p>
659 The binding of namespace prefixes to URIs
660 can be done either by toplevel phrases (structure items) or
661 by local declarations:
662 </p>
663
664 <sample>{{ON}}
665 # {{ namespace ns = "http://..." }};;
666 # let x = {{ `ns: x }};;
667 val x : {{`ns:x}} = {{`ns:x}}
668 # let x = {{ let namespace ns = "http://..." in `ns:x }};;
669 val x : {{`ns:x}} = {{`ns:x}}
670 </sample>
671
672 <p>The toplevel definitions can also appear in module interfaces
673 (signatures). A toplevel prefix binding is not exported by a module: its scope
674 is limited to the current structure or signature. It is possible
675 to specify a default namespace, and to reset it:
676 </p>
677
678 <sample>{{ON}}
679 # {{ namespace "http://..." }};;
680 # {{ `x }};;
681 - : {{`ns1:x}} = {{`ns1:x}}
682 # {{ namespace "" }};;
683 # {{ `x }};;
684 - : {{`x}} = {{`x}}
685 </sample>
686
687 <p>
688 Note that the value pretty-printer invented some prefix
689 for the namespace URI. The default namespace declaration also has a
690 local form <code> let namespace "..." in ... </code>.
691 </p>
692
693 </box>
694
695 <box title="More on type-checking" link="typecheck">
696
697 <section title="Type inference">
698
699 <p>
700 As we said above, the programmer is sometimes required to provide type
701 annotations. To know where to put these annotations, it is necessary to
702 get a basic understanding of how type-checking works.
703 </p>
704
705 <p>
706 The OCaml type-checker is run first to detect which sub-expressions
707 are of the x-kind. A second ML type-checking pass is then done to
708 introduce subsumption (implicit subtyping) steps where allowed. After
709 these two passes, the OCamlDuce type checker obtains a data-flow summary of
710 x-values in the whole compilation unit. This is a directed graph,
711 whose edges represent either simple data flow or complex operations
712 on x-values. The nodes of the graph can be thought of as x-type
713 variables. A data-flow edge corresponds to a subtyping constraint,
714 and an operation edge corresponds to a symbolic constraint which
715 mimics the corresponding operation on values.
716 </p>
717
718 <p>
719 Some of the nodes are given an explicit type by the programmer,
720 through type annotations (on expressions or function arguments)
721 or the other usual mechanisms in ML (data type declarations,
722 signatures, ...).
723 </p>
724
725 <p>
726 Also, if there is a loop with only subtyping edges in the graph,
727 all the nodes on the loop are merged together.
728 </p>
729
730 <p>
731 After this operation, the graph is required to be acyclic (assuming
732 that the nodes with an explicit type are removed from the graph). It
733 is the responsibility of the programmer to provide enough type
734 annotations to achieve this property. Otherwise, a type error
735 is issued.
736 </p>
737
738 <sample><![CDATA[{{ON}}
739 # let rec f x = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
740 Cycle detected: cannot type-check
741 # let rec f x : {{ String }} = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
742 val f : int -> {{String}} = <fun>
743 ]]></sample>
744
745 <p>
746 In the example above, there is a cycle between the result type for
747 <code>f</code> and the type for the sub-expression <code>{{ON}}f
748 {{n-1}}</code>. It is here broken with a type annotation on the result; it could
749 have been broken by a type annotation on the expression <code>{{ON}}f
750 {{n-1}}</code>, or on the function <code>f</code> itself, or by a
751 module signature.
752 </p>
753
754 <p>
755 Let us study another simple example:
756 </p>
757
758 <sample>{{ON}}
759 # let f x = {{ x + 1 }} in f {{ 2 }}, f {{ 3 }};;
760 - : {{3--4}} * {{3--4}} = ({{3}}, {{4}})
761 </sample>
762
763 <p>
764 The type checker detects that the two x-values <code>2</code> and
765 <code>3</code> can flow to the argument of <code>f</code>. Its body
766 is thus type-checked with the assumption that <code>x</code> has type
767 <code>2--3</code>. The computed result type is then <code>3--4</code>.
768 </p>
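<p>
The computation of these two types can be replayed with a toy interval
domain in plain OCaml. This is a sketch under a strong assumption: it
only models integer intervals (real x-types can be arbitrary unions),
and the names <code>itv</code>, <code>join</code> and <code>add</code>
are hypothetical.
</p>
<sample><![CDATA[
```ocaml
(* An integer interval lo--hi. *)
type itv = { lo : int; hi : int }

(* The singleton type of a literal. *)
let single n = { lo = n; hi = n }

(* Smallest interval containing two intervals: several values flow
   to the same node of the data-flow graph. *)
let join a b = { lo = min a.lo b.lo; hi = max a.hi b.hi }

(* Abstract counterpart of the + operator on intervals. *)
let add a b = { lo = a.lo + b.lo; hi = a.hi + b.hi }

(* Both 2 and 3 flow to the argument x of f. *)
let arg_type = join (single 2) (single 3)

(* The body x + 1 is typed under that assumption. *)
let result_type = add arg_type (single 1)
```
]]></sample>
<p>
In this model <code>arg_type</code> stands for <code>2--3</code> and
<code>result_type</code> for <code>3--4</code>, which is why both calls
in the session above receive the same result type.
</p>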
769
770
771 <p>
772 The type-inference process described above is global by nature. The
773 acyclicity condition is only imposed after a whole compilation unit
774 has been type-checked by OCaml (and the information from the module
775 interface has been integrated). When a type variable is inferred to
776 be of the x-kind, it is never generalized. As a consequence, there
777 is no parametric polymorphism on x-types.
778 </p>
779
780 <p>
781 In the toplevel, type-checking is done after each phrase. Consider
782 the following session:
783 </p>
784
785 <sample><![CDATA[{{ON}}
786 # let f x = {{ x + 1 }};;
787 val f : {{Empty}} -> {{Empty}} = <fun>
788 # let a = f {{ 2 }};;
789 Subtyping failed 2 <= Empty
790 Sample:
791 2
792 ]]></sample>
793
794 <p>
795 The function <code>f</code> is inferred to have type
796 <code>{{ON}}{{Empty}} -> {{Empty}}</code> because when the first
797 phrase is type-checked, the data-flow graph says that no value
798 can flow to <code>x</code>, and thus the input type is empty
799 (and similarly for the result type). If the two phrases
800 were type-checked together (which would be the case if they had
801 been compiled by the compiler, not in the toplevel), the type checker
802 would have correctly inferred that the input type for <code>f</code>
803 must contain <code>2</code>.
804 </p>
805
806 </section>
807
808 <section title="Implicit subtyping">
809
810 <p>
811 Coercion from an x-type to a super type is automatic in OCamlDuce.
812 However, this automatic subsumption does not carry over to OCaml
813 type constructors, even if they are covariant. Consider:
814 </p>
815
816 <sample><![CDATA[{{ON}}
817 # let f (x : {{ Int }} * {{ Int }}) = 1;;
818 val f : {{Int}} * {{Int}} -> int = <fun>
819 # let g (x : {{ 0 }} * {{ 0 }}) = f x;;
820 This expression has type {{0}} * {{0}} but is here used with type
821 {{Int}} * {{Int}}
822 # let g (x : {{ 0 }} * {{ 0 }}) = let a,b = x in f (a,b);;
823 val g : {{0}} * {{0}} -> int = <fun>
824 # let g (x : {{ 0 }} * {{ 0 }}) = f (x :> {{ Int }} * {{ Int }});;
825 val g : {{0}} * {{0}} -> int = <fun>
826 ]]></sample>
827
828 <p>
829 The first attempt to define <code>g</code> fails because the type for
830 <code>x</code> is not an x-type and thus subsumption does not
831 apply. In the second attempt, we extract the two components of the
832 pair; since they are inferred to be x-values, subtyping applies to
833 both of them. Thus, when the pair <code>(a,b)</code> is reconstructed,
834 it is legal to unify its type with the input type of <code>f</code>.
835 The third definition for <code>g</code> gives an alternative solution:
836 using explicit OCaml type coercions.
837 </p>
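<p>
The explicit coercion used in the third definition is the ordinary
OCaml <code>:></code> operator. As an analogy only (this is plain
OCaml with polymorphic variants, not OCamlDuce, and the type names
<code>small</code> and <code>big</code> are hypothetical), the same
situation can be reproduced as follows:
</p>
<sample><![CDATA[
```ocaml
(* small is a subtype of big, like {{0}} is a subtype of {{Int}}. *)
type small = [ `Zero ]
type big = [ `Zero | `One ]

let f (_ : big * big) = 1

(* Writing [f x] with x : small * small is rejected by plain OCaml:
   unification does not apply subtyping under the pair constructor. *)

(* Coerce each component after taking the pair apart. Unlike OCamlDuce,
   plain OCaml needs the coercions on a and b to be written explicitly. *)
let g1 (x : small * small) =
  let a, b = x in
  f ((a :> big), (b :> big))

(* Coerce the whole pair at once; pairs are covariant. *)
let g2 (x : small * small) = f (x :> big * big)
```
]]></sample>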
838
839 </section>
840
841 </box>
842
843 <box title="Exchanging values" link="transl">
844
845 <p>
846 OCamlDuce strongly separates regular OCaml values from the new
847 x-values. They have different syntax, expressions, types, patterns,
848 and even type-checking algorithms. This strong segregation is a key point
849 which allowed a simple integration between very different type
850 systems.
851 </p>
852
853 <p>
854 At some point, it is still necessary to cross the frontier and
855 translate OCaml values to x-values or the opposite.
856 </p>
857
858 <p>
859 Fortunately, OCamlDuce provides automatic translations in both
860 directions. Instead of double curly braces, you can
861 enclose x-expressions in curly brace+colon <code>{: ... :}</code>
862 (here, the <code>...</code> is an x-expression).
863 The effect is to translate the result of the x-expression
864 (which must be an x-value) to an OCaml value. Similarly,
865 in an x-expression, you can obtain the x-translation of
866 an OCaml value with the same syntax <code>{: ... :}</code>
867 (here, the <code>...</code> is an OCaml expression).
868 </p>
869
870 <p>
871 Here is how the translation works. To each OCaml type <code>t</code>,
872 we associate an x-type <code>T(t)</code> and a pair of translation
873 functions between <code>t</code> and <code>T(t)</code>.
874 However, not all OCaml type features are supported. For instance,
875 free type variables, abstract types, object types, and non-regular
876 recursive types cannot be translated. In particular, since
877 type variables are not allowed, the OCaml type must be fully known.
878 </p>
879
880 <p>
881 The translation for an OCaml type <code>t</code> is defined by structural
882 induction on <code>t</code>. Sum types are
883 translated to union types: a constant constructor <code>A</code> is
884 translated to the qualified name <code>`A</code>; a non-constant
885 constructor <code>A of t1 * ... * tn</code> is translated to
886 <code>&lt;A>[ T(t1) ... T(tn) ]</code>. Closed polymorphic variants
887 have the same translation. Record types are translated to closed
888 record x-types. Some other translations:
889 </p>
890
891 <table border="1">
892 <tr><th>OCaml type t</th> <th>X-type T(t)</th></tr>
893 <tr><td><code>int</code></td> <td><code>Int</code></td></tr>
894 <tr><td><code>int32</code></td> <td><code>Int32</code></td></tr>
895 <tr><td><code>int64</code></td> <td><code>Int64</code></td></tr>
896 <tr><td><code>string</code></td> <td><code>Latin1</code></td></tr>
897 <tr><td><code>t list</code></td> <td><code>[T(t)*]</code></td></tr>
898 <tr><td><code>t array</code></td> <td><code>[T(t)*]</code></td></tr>
899 <tr><td><code>unit</code></td> <td><code>[]</code></td></tr>
900 <tr><td><code>char</code></td> <td><code>Latin1Char</code></td></tr>
901 <tr><td><code>{{t}}</code></td> <td><code>t</code></td></tr>
902 </table>
903
904 <p>
905 Here is an example:
906 </p>
907
908 <sample>{{ON}}
909 # let f (x : {{ Int }}) = {{ x + 1 }} in List.map f {: [ 1 2 3 ] :};;
910 - : {{Int}} list = [{{2}}; {{3}}; {{4}}]
911 </sample>
912
913 <p>
914 In this example, the result type of the translation is inferred
915 to be <code>{{ON}}{{ Int }} list</code> (because the type for
916 <code>f</code> is given). The corresponding x-type
917 is <code>{{ON}}{{ [Int*] }}</code>.
918 </p>
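
<p>
As a rough illustration of the structural induction described above,
the following plain OCaml sketch models a fragment of the translation
and renders <code>T(t)</code> as a string. The names <code>ty</code>,
<code>xtype</code>, <code>atom</code> and <code>constr</code> are
hypothetical and chosen for this example only; this is not the way
OCamlDuce implements the translation, and records, polymorphic variants
and x-types embedded in OCaml types are left out.
</p>

<sample><![CDATA[
(* Hypothetical model of a fragment of OCaml types. These names are
   illustrative only; this is not the representation used by OCamlDuce. *)
type ty =
  | Int | Int32 | Int64 | String | Char | Unit
  | List of ty
  | Array of ty
  | Sum of (string * ty list) list   (* constructors with their arguments *)

(* T(t), rendered as a string, by structural induction on t. *)
let rec xtype = function
  | Int -> "Int"
  | Int32 -> "Int32"
  | Int64 -> "Int64"
  | String -> "Latin1"
  | Char -> "Latin1Char"
  | Unit -> "[]"
  | List t | Array t -> "[" ^ atom t ^ "*]"
  | Sum cases -> String.concat " | " (List.map constr cases)

(* Parenthesize a union when it appears inside a sequence. *)
and atom t =
  match t with
  | Sum (_ :: _ :: _) -> "(" ^ xtype t ^ ")"
  | _ -> xtype t

(* A constant constructor becomes a qualified name; a constructor with
   arguments becomes an element whose content is the translated arguments. *)
and constr (name, args) =
  match args with
  | [] -> "`" ^ name
  | _ -> "<" ^ name ^ ">[ " ^ String.concat " " (List.map atom args) ^ " ]"
]]></sample>

<p>
For instance, <code>xtype (List Int)</code> evaluates to the string
<code>[Int*]</code>, matching the row for <code>t list</code> in the
table above.
</p>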
919
920 </box>
921
922 <box title="The standard library" link="stdlib">
923
924 <p>
925 In OCamlDuce, the Num library from OCaml is included in the standard
926 library. In addition, there are two new modules called
927 <code>Ocamlduce</code> and <code>Cduce_types</code> in the standard library.
928 </p>
929
930 <p>
931 The module <code>Cduce_types</code> gives access to the internal
932 representation of x-values. It is currently undocumented.
933 </p>
934
935 <p>
936 The module <code>Ocamlduce</code> provides several useful
937 functions on x-values. See the <a href="http://yquem.inria.fr/~frisch/ocamlcduce/doc/ocamlduce/Ocamlduce.html">ocamldoc</a> generated
938 documentation for a description of its interface.
939 </p>
940
941 </box>
942
943 <box title="Code samples" link="code">
944
945
946 <section title="Parsing XML files">
947
948 <p>
949 OCamlDuce does not come with any built-in XML parser. However,
950 the <a href="http://yquem.inria.fr/~frisch/ocamlcduce/doc/ocamlduce/Ocamlduce.Load.html"><code>Ocamlduce.Load</code></a> module in the standard library
951 makes it easy to plug in existing XML parsers. Here is some
952 code which demonstrates how to do that with three of
953 the most popular OCaml XML parser libraries:
954 </p>
955
956 <ul>
957 <li><a
958 href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/pxp/">PXP</a></li>
959 <li><a
960 href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/expat/">Expat</a></li>
961 <li><a href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/xmllight/">Xml-light</a></li>
962 </ul>
963
964 </section>
965
966 <section title="Converting DTD to OCamlDuce types">
967
968 <p>
969 This <a href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/dtd2types/">tool</a> produces a set of OCamlDuce type declarations
970 from a DTD. It requires PXP.
971 </p>
972
973 <note>This application does not use any of the new features, but it
974 can be useful in the development of OCamlDuce applications.
975 </note>
976
977 </section>
978
979 <section title="Parsing XML Schema, producing valid XHTML output">
980
981 <p>
982 This <a
983 href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/schema/">application</a>
984 parses XML Schema Definitions (.xsd files), and produces summaries
985 (toplevel declaration names) in XHTML. The OCamlDuce type system ensures
986 that the parser is coherent with the input XML type (any valid XML
987 Schema is accepted) and that the printer is coherent with the output
988 XML type (it is necessarily a valid XHTML document).
989 </p>
990
991 <p>
992 Of course, for such a simple transformation, parsing the XML document
993 into an internal representation is not necessary. A direct XML-to-XML
994 transformation would be easy to write. We wanted to illustrate
995 a complex parsing of XML.
996 </p>
997
998 <p>
999 It is interesting to introduce errors in the parser
1000 <code>schema_loader.ml</code> or the printer
1001 <code>dump_schema.ml</code> and see how the type system catches them.
1002 </p>
1003
1004 <note>
1005 The application uses XML Light to parse XML documents.
1006 </note>
1007
1008 <note>
1009 Some features of XML Schema are not parsed, such as
1010 <code>redefine</code> elements or substitution groups.
1011 </note>
1012
1013 </section>
1014
1015 </box>
1016
1017 </page>
