Contents of /web/ocaml.xml

Revision 1789 - (show annotations)
Tue Jul 10 19:23:03 2007 UTC (5 years, 11 months ago) by abate
File MIME type: text/xml
File size: 33014 byte(s)
[r2005-07-30 19:49:01 by afrisch] Empty log message

Original author: afrisch
Date: 2005-07-30 19:49:01+00:00
1 <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
2 <page name="ocaml">
3
4 <title>OCamlDuce</title>
5
6 <left>
7 <local-links href="index,documentation"/>
8 <p>On this page:</p>
9 <boxes-toc/>
10 </left>
11
12 <box>
13
14 <p>
15 OCamlDuce is a merger between <a
16 href="http://caml.inria.fr/">OCaml</a> and
17 <local href="index">CDuce</local>. It comes as a modified
18 version of OCaml which integrates CDuce features: expressions, types,
19 patterns.
20 </p>
21
22 </box>
23
24 <box title="Download and installation" link="install">
25
26 <p>
27 The build procedure for OCamlDuce is exactly the same as for OCaml:
28 <tt>configure, make world, make install</tt>. The names of the tools
29 are unchanged: <tt>ocaml,ocamlc,ocamlopt</tt>. Currently, OCamlDuce
30 is based on CVS snapshots of OCaml (between 3.08.3 and the current
31 <tt>release308</tt> branch) and CDuce (between 0.3.91 and the head).
32 </p>
33
34 <ul>
35 <li><a
36 href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/cduce-ocaml-0.0.5.tar.gz">Compiler,
37 version 0.0.5</a></li>
38 <!--<li><a
39 href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/xml-support-0.0.4.tar.gz">Support
40 library, version 0.0.4</a></li>-->
41 </ul>
42
43 <p>
44 GODI users can upgrade an existing installation by adding this
45 line to their <tt>etc/godi.conf</tt> file:
46 </p>
47 <sample>
48 GODI_BUILD_SITES += http://pauillac.inria.fr/~frisch/ocamlcduce/godi
49 </sample>
50 <p>
51 and by forcing a recompilation of the <tt>godi-ocaml-src</tt>
52 and <tt>godi-ocaml</tt> packages. <!--They should also build
53 the <tt>godi-xml-support</tt> library.-->
54 </p>
55
56 <!--
57 <p>
58 Some simple examples can be found <a -->
59 <!--href="http://pauillac.inria.fr/~frisch/ocamlcduce/tests/">here</a>.</p>
60 -->
61
62 </box>
63
64 <box title="Overview" link="overview">
65
66 <p>
67 In a nutshell, OCamlDuce extends OCaml with a new kind of values
68 (<em>x-values</em>) to represent XML documents, fragments, tags, Unicode
69 strings. In order to describe these values, it also extends the type algebra
70 with so-called <em>x-types</em>. The philosophy behind these types is that they
71 represent <em>sets of x-values</em>. They can be very precise: indeed,
72 each value can be seen as a singleton type (a set with a single
73 value), and it is possible to form Boolean combinations of x-types
74 (intersection, union, difference).
75 </p>
76
77 <p>
78 OCamlDuce's type system can be understood as a refinement of OCaml.
79 For each sub-expression which is inferred to be of the x-kind (using
80 OCaml's unification-based type system), OCamlDuce will try to infer the
81 best possible sound x-type. Here, best means smallest for the natural
82 subtyping relation (set inclusion). The inference algorithm is
83 actually a data-flow analysis: the x-type will collect all the values
84 that can be produced by the expression, considering all the possible
85 data-flow in the program. It is sometimes necessary to provide
86 explicit type annotations to help the type checker infer this type, in
87 particular when you define recursive functions or when you use
88 iterators.
89 </p>
90
91 <p>
92 Subtyping is implicit for x-types: if an expression is inferred to be
93 of x-type <code>t</code>, which is a subtype of <code>s</code>, then
94 it is possible to use this expression in any context which expects a
95 value of type <code>s</code>.
96 </p>
97
98 </box>
99
100 <box title="Getting started" link="start">
101
102 <p>
103 Most of the new language features are enclosed within double curly braces
104 <code>{{ON}}{{...}}</code>. For instance, the following code sample
105 defines a value <code>x</code> as an XML element (with tag
106 <code>a</code>, an attribute <code>href</code>, and a simple
107 string as content):
108 </p>
109
110 <sample><![CDATA[{{ON}}
111 # let x = {{ <a href="http://www.cduce.org">['CDuce'] }};;
112 val x : {{<a href=[ 'http://www.cduce.org' ]>[ 'CDuce' ]}} =
113 {{<a href="http://www.cduce.org">[ 'CDuce' ]}}
114 ]]></sample>
115
116 <p>
117 What appears between the curly braces is called an x-expression.
118 Similarly, there are x-types (as seen above), and also x-patterns.
119 The delimiters <code>{{ON}}{{...}}</code> are only used
120 for syntactical reasons, to avoid clashes between OCaml and CDuce
121 syntaxes and lexical conventions. As a matter of fact,
122 an OCaml expression need not be a syntactical x-expression
123 (delimited by double curly braces) to evaluate to an x-value.
124 For instance, once <code>x</code> has been declared as above,
125 the expression <code>x</code> evaluates to an x-value.
126 </p>
127
128
129 <p>
130 It is possible to use an arbitrary
131 OCaml expression as part of an x-expression: it must simply be
132 protected by a new pair of double curly braces. For instance, there is
133 no <code>if-then-else</code> construction for x-expressions, but you
134 can write:
135 </p>
136
137 <sample><![CDATA[{{ON}}
138 # {{ <a href={{if true then {{"a"}} else {{"z"}}}}>[] }};;
139 - : {{<a href=[ 'a' | 'z' ]>[ ]}} = {{<a href="a">[ ]}}
140 ]]></sample>
141
142 <p>
143 Only the highlighted parts are parsed as x-expressions. The
144 <code>if-then-else</code> sub-expression is parsed as an OCaml
145 expression, but its type is an x-type (namely <code>{{ON}}{{[ 'a' |
146 'z' ]}}</code>).
147 </p>
148
149 </box>
150
151 <box title="X-values" link="values">
152
153 <p>
154 X-values are intended to represent XML documents and fragments
155 thereof: elements, tags, text, sequences. In this section, we
156 present the x-value algebra, the syntax of the corresponding
157 x-expression constructors and the associated x-types.
158 </p>
159
160 <p>
161 There are three kinds of atomic x-values:
162 </p>
163 <ul>
164 <li>Unicode characters;</li>
165 <li>qualified names;</li>
166 <li>arbitrarily large integers.</li>
167 </ul>
168
169 <section title="Characters">
170
171 <p>
172 X-characters are different from OCaml characters. They can represent
173 the range of Unicode codepoints defined in the XML specification.
174 Character literals are delimited by single quotes. The escape
175 sequences \n, \r, \t, \b, \', \&quot;, \\ are recognized as usual. The
176 numerical escape sequences are written <code>\n;</code> where n is an integer
177 literal (note the extra semi-colon). The source code is interpreted as
178 being encoded in iso-8859-1. As a consequence, Unicode characters which are not
179 part of the Latin1 character set must be introduced with this
180 numerical escape mechanism. The x-types for x-characters are:
181 </p>
182 <ul>
183 <li>singletons;</li>
184 <li>intervals, written <code>c -- d</code>, where <code>c</code> and
185 <code>d</code> are literals (example: <code>{{ON}}type t = {{ 'a'--'z'
186 }}</code>);</li>
187 <li>the type of all x-characters, written <code>Char</code>;</li>
188 <li>the type of all Latin1 characters, written <code>Latin1Char</code>
189 (defined as <code>\0; -- \255;</code>).</li>
190 </ul>
191
192 </section>
193
194 <section title="Integers">
195
196 <p>
197 X-integers are arbitrarily large. Literals must be written in decimal.
198 Negative literals must be in parentheses, e.g. <code>(-3)</code>.
199 The x-types for x-integers are:
200 </p>
201 <ul>
202 <li>singletons;</li>
203 <li>intervals, written <code>i -- j</code>, where <code>i</code> and
204 <code>j</code> are literals (example: <code>{{ON}}type t = {{ 10--20
205 }}</code>); it is possible to replace <code>i</code> or <code>j</code>
206 with <code>**</code> to define open-ended intervals, e.g.
207 <code>{{ON}}type pos = {{ 1 -- ** }}</code>;
208 </li>
209 <li>the type of all x-integers, written <code>Int</code>;</li>
210 <li>the type of all the integers which can be represented by a
211 signed 32 (resp. 64) bit machine word, written <code>Int32</code> (resp.
212 <code>Int64</code>).</li>
213 </ul>
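
<p>
The interval x-types above can be read as sets of integers, with subtyping
as set inclusion. The following is only a rough model in plain OCaml, not
OCamlDuce code: the names <code>interval</code>, <code>mem</code> and
<code>subtype</code> are made up for this sketch, native <code>int</code>
stands in for arbitrarily large integers, and <code>None</code> stands for
the open bound <code>**</code>.
</p>

<sample><![CDATA[
```ocaml
(* Plain-OCaml model of integer interval x-types; None stands for "**". *)
type interval = { lo : int option; hi : int option }

(* Membership: does the value v belong to the interval? *)
let mem v { lo; hi } =
  (match lo with None -> true | Some l -> l <= v)
  && (match hi with None -> true | Some h -> v <= h)

(* Subtyping is set inclusion: every value of a is also a value of b.
   Intervals are assumed to be non-empty. *)
let subtype a b =
  (match b.lo, a.lo with
   | None, _ -> true
   | Some _, None -> false
   | Some bl, Some al -> bl <= al)
  && (match b.hi, a.hi with
      | None, _ -> true
      | Some _, None -> false
      | Some bh, Some ah -> ah <= bh)

let t = { lo = Some 10; hi = Some 20 }   (* models 10--20 *)
let pos = { lo = Some 1; hi = None }     (* models 1 -- ** *)
let all = { lo = None; hi = None }       (* models Int *)
```
]]></sample>

<p>
In this model, <code>subtype t pos</code> holds while
<code>subtype pos t</code> does not, which mirrors the fact that a value of
type <code>10--20</code> can be used where a <code>1 -- **</code> is expected.
</p>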
214
215 </section>
216
217 <section title="Qualified names">
218
219 <p>
220 Qualified names are intended to represent XML tag names. Conceptually,
221 they are made of a namespace URI and a local name. Since URIs tend
222 to be long, literals are of the form <code>`prefix:local</code>
223 where <code>local</code> is the local name and <code>prefix</code>
224 is a <em>namespace prefix</em> bound to some URI (in the scope of the
225 literal). The local name follows the definitions from
226 the XML Namespaces specification; a dot character must be protected
227 by a backslash and non-Latin1 characters are written as character
228 literals <code>\n;</code>. <a href="#ns">See below</a> for an
229 explanation of how to bind prefixes to URIs. To refer
230 to the default namespace (or the absence of namespace if no default
231 has been defined), the syntax is simply <code>`local</code>.
232 The x-types for qualified names are:
233 </p>
234 <ul>
235 <li>singletons;</li>
236 <li>the type of all qualified names, written <code>Atom</code>;</li>
237 <li>the type of all qualified names from a specified namespace,
238 written <code>`ns:*</code>.</li>
239 </ul>
240 </section>
241
242 <section title="Records">
243
244 <p>
245 X-records are mainly used to represent the set of attributes of an XML
246 element. An x-record is a binding from a finite set of <em>labels</em>
247 to x-values. Labels follow the same syntax as qualified names
248 without the leading backquote. However, if the namespace prefix is not
249 given, the default namespace does not apply (the namespace URI is
250 empty). The syntax for record x-expressions is <code> { l1=e1
251 ... ln=en }</code> where the <code>li</code> are labels and the
252 <code>ei</code> are x-expressions. Fields can also be separated with a
253 semi-colon. It is legal to omit the expression for a field; the label is then
254 taken as the content of the field (a value with this name must be
255 defined in the current scope), e.g.: <code>{{ON}}let x = ... and y = ...
256 in {{ {x y z=3} }}</code> is equivalent to <code>{{ON}}let x = ... and
257 y = ... in {{ {x=x y=y z=3} }}</code>. The types for x-records specify
258 which labels are authorized/mandatory, and what the types of the
259 corresponding fields are. There are two kinds of record x-types:
260 </p>
261
262 <ul>
263 <li>
264 Closed record types, which only allow a finite number of fields:
265 <code>{ l1=t1 ... ln=tn }</code>;
266 </li>
267 <li>
268 Open record types, which allow additional fields (with arbitrary
269 type):
270 <code>{ l1=t1 ... ln=tn .. }</code> (the final two dots are
271 part of the syntax).
272 </li>
273 </ul>
274
275 <p>
276 In both cases, it is possible to make one of
277 the fields optional by changing = to =?.
278 </p>
279
280 <p>
281 The x-type of all x-records is thus <code>{ .. }</code>,
282 and the x-type of x-records with maybe a field <code>l</code>
283 of type <code>Int</code> and maybe arbitrary other fields is
284 <code>{ l=?Int .. }</code>.
285 </p>
286
287 </section>
288
289 <section title="Sequences">
290
291 <p>
292 X-sequences are finite and ordered collections of x-values.
293 The syntax for a sequence x-expression is
294 <code>[ e1 ... en ]</code> (note that elements are <em>not</em> separated
295 by semi-colons as in OCaml lists). Each item <code>ei</code>
296 can either be:
297 </p>
298 <ul>
299 <li>an x-expression;</li>
300 <li><code>!e</code> where <code>e</code> is an x-expression which
301 evaluates to a sequence (whose content is inserted in the sequence
302 which is currently defined); e.g.
303 <code>let x = [ 2 3 ] in [ 1 !x 4 ]</code> is equivalent to
304 <code>[ 1 2 3 4 ]</code>;</li>
305 <li>a string literal delimited by single quotes; e.g.
306 <code>[ 'abc' ]</code> is equivalent to <code>[ 'a' 'b' 'c' ]</code>.</li>
307 </ul>
308
309 <p>
310 X-types for sequences are of the form <code>[R]</code>
311 where <code>R</code> is a regular expression over x-types which
312 describes the possible contents of the sequences. The possible
313 forms of regular expressions are:
314 </p>
315
316 <ul>
317 <li><code>t</code> (one single element of x-type <code>t</code>)</li>
318 <li><code>R*</code> (zero or more repetitions)</li>
319 <li><code>R+</code> (one or more repetitions)</li>
320 <li><code>R?</code> (zero or one repetition)</li>
321 <li><code>R1 R2</code> (sequence)</li>
322 <li><code>R1|R2</code> (alternation)</li>
323 <li><code>(R)</code></li>
324 <li><code>/t</code> (guard: the tail of the sequence must comply with
325 <code>t</code>).</li>
326 <li><code>PCDATA</code> (equivalent to Char*).</li>
327 </ul>
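
<p>
To make the meaning of these regular expressions concrete, here is a small
backtracking matcher written in plain OCaml. It is only an illustrative
model, not how OCamlDuce implements sequence types: x-types are approximated
by predicates, only four of the forms above are covered (single element,
sequence, alternation, star), and all names (<code>value</code>,
<code>re</code>, <code>go</code>, <code>matches</code>) are made up for
this sketch.
</p>

<sample><![CDATA[
```ocaml
type value = Int of int | Char of char

type re =
  | Elem of (value -> bool)   (* t: one element of a given type *)
  | Seq of re * re            (* R1 R2 *)
  | Alt of re * re            (* R1|R2 *)
  | Star of re                (* R* *)

(* [go r xs k] tries to match a prefix of xs against r, then calls the
   continuation k on the remaining elements. *)
let rec go r xs k =
  match r with
  | Elem p -> (match xs with x :: tl when p x -> k tl | _ -> false)
  | Seq (a, b) -> go a xs (fun rest -> go b rest k)
  | Alt (a, b) -> go a xs k || go b xs k
  | Star a ->
    k xs
    || go a xs (fun rest ->
           (* require progress so that the loop terminates *)
           List.length rest < List.length xs && go r rest k)

(* The whole sequence must be consumed. *)
let matches r xs = go r xs (fun rest -> rest = [])

let plus r = Seq (r, Star r)   (* R+ *)
let is_int = function Int _ -> true | Char _ -> false
let is_char = function Char _ -> true | Int _ -> false

(* Models the x-type [ Int* Char+ ] *)
let t = Seq (Star (Elem is_int), plus (Elem is_char))
```
]]></sample>

<p>
With these definitions, <code>matches t [Int 1; Int 2; Char 'a']</code> is
true, whereas a sequence with an integer after a character is rejected.
</p>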
328
329 <note>Sequences are actually encoded with embedded pairs and a
330 terminator, and sequence types are encoded with product types and
331 recursive types. The encoding is available to the programmer
332 but not described in this manual.
333 </note>
334
335 </section>
336
337 <section title="Strings">
338
339 <p>
340 Strings are nothing but sequences of characters. There are two
341 predefined types <code>String</code> and <code>Latin1</code>
342 (defined as <code>[ Char* ]</code> and <code>[ Latin1Char* ]</code>).
343 </p>
344
345 <p>
346 A string literal <code>[ '...' ]</code> can also be written
347 <code>"..." </code> (without the square brackets). Note that single
348 (resp. double) quotes need to be escaped only when the string is
349 delimited with single (resp. double) quotes.
350 </p>
351
352 </section>
353
354 <section title="XML elements">
355
356 <p>
357 An XML element is a triple of x-values. The syntax for
358 the corresponding x-expression constructor is
359 <code><![CDATA[<(e1) (e2)>e3]]></code>. When <code>e1</code> is a
360 qualified name literal, it is possible to omit the leading
361 backquote and the surrounding parentheses. Similarly,
362 when <code>e2</code> is an x-record literal, it is possible
363 to omit the curly braces and the parentheses. For instance,
364 one can simply write <code><![CDATA[<a href="abc">['def']]]></code>
365 instead of <code><![CDATA[<(`a) ({href="abc"})>['def']]]></code>.
366 </p>
367
368 <p>
369 XML element x-types are written <code><![CDATA[<(t1) (t2)>t3]]></code>,
370 and the same simplifications apply. For instance, if
371 the namespace prefix <code>ns</code> has been defined,
372 the following is a legal x-type <code><![CDATA[<ns:* ..>[]]]></code>;
373 it describes XML elements whose tag is in the namespace bound to
374 <code>ns</code>, with an empty content, and with an arbitrary set of
375 attributes. An underscore in place of <code>(t1)</code> is
376 equivalent to <code>(Atom)</code> (any tag).
377 </p>
378
379 </section>
380
381 </box>
382
383 <box title="X-expressions" link="expr">
384
385 <p>
386 In the previous section, we have seen the syntax for x-value
387 constructors (constant literals, sequence, record, element constructors).
388 In this section, we describe the other kinds of x-expressions.
389 </p>
390
391 <section title="Binary infix operators">
392
393 <p>
394 The arithmetic operators on integers follow the usual precedence.
395 They are written <code>+,*,-,div,mod</code> (they are all infix).
396 </p>
397
398 <p>
399 Record concatenation: <code>e1 ++ e2</code>. The x-expressions
400 <code>e1</code> and <code>e2</code> must evaluate to x-records.
401 The result is obtained by concatenating them. If a field with the same
402 label is present in both records, the right-most one is selected.
403 </p>
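
<p>
The right-biased behaviour of record concatenation can be modelled in plain
OCaml with association lists. This is only a sketch of the semantics
described above; the function name <code>concat</code> and the
representation of records as lists of pairs are assumptions made for the
example.
</p>

<sample><![CDATA[
```ocaml
(* Records modelled as association lists from labels to values. *)
let concat r1 r2 =
  (* Keep the fields of r1 that r2 does not redefine, then add all of r2. *)
  List.filter (fun (l, _) -> not (List.mem_assoc l r2)) r1 @ r2

(* The field y is present on both sides: the right-most one wins. *)
let r = concat [ ("x", 1); ("y", 2) ] [ ("y", 3); ("z", 4) ]
```
]]></sample>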
404
405 <p>
406 Sequence concatenation: <code>e1 @ e2</code>, equivalent
407 to <code>[!e1 !e2]</code>.
408 </p>
409
410 </section>
411
412 <section title="Projections, filtering">
413
414 <p>
415 If the x-expression <code>e</code> evaluates to a record or an XML
416 element, the construction <code>e.l</code> will extract the value of
417 field or attribute <code>l</code>. Similarly, the construction
418 <code>e.?l</code> will extract the value of field or attribute
419 <code>l</code> if present, and return the empty sequence
420 <code>[]</code> otherwise.
421 </p>
422
423 <p>
424 If the x-expression <code>e</code> evaluates to a record,
425 the construction <code>e -. l</code> will produce a new record
426 where the field <code>l</code> has been removed (if present).
427 </p>
428
429 <p>
430 If the x-expression <code>e</code> evaluates to an x-sequence,
431 the construction <code>e/</code> will result in a new x-sequence
432 obtained by taking in order all the children of the XML elements
433 from the sequence <code>e</code>. For instance, the x-expression
434 <code><![CDATA[[<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ]/]]></code>
435 evaluates to the x-value <code>[ 1 2 3 6 7 8 ]</code>.
436 </p>
437
438 <p>
439 If the x-expression <code>e</code> evaluates to an x-sequence,
440 the construction <code>e.(t)</code> (where <code>t</code> is an
441 x-type) will result in a new x-sequence
442 obtained by filtering <code>e</code> to keep only the elements
443 of type <code>t</code>. For instance, the x-expression
444 <code><![CDATA[[<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ].(Int)]]></code>
445 evaluates to the x-value <code>[ 4 5 ]</code>.
446 </p>
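
<p>
The two sequence operations above can be mimicked in plain OCaml on a toy
representation of x-values. This is a model for intuition only: the type
<code>value</code>, the functions <code>children</code> and
<code>filter_type</code>, and the use of a predicate in place of an x-type
are all assumptions of the sketch (<code>List.concat_map</code> requires
OCaml 4.10 or later).
</p>

<sample><![CDATA[
```ocaml
type value = Int of int | Elem of string * value list

(* e/ : concatenate, in order, the children of the XML elements of e *)
let children seq =
  List.concat_map (function Elem (_, c) -> c | Int _ -> []) seq

(* e.(t) : keep only the items that belong to type t (here a predicate) *)
let filter_type t seq = List.filter t seq

let is_int = function Int _ -> true | Elem _ -> false

(* Models [ <a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ] *)
let e =
  [ Elem ("a", [ Int 1; Int 2; Int 3 ]); Int 4; Int 5;
    Elem ("b", [ Int 6; Int 7; Int 8 ]) ]
```
]]></sample>

<p>
Here <code>children e</code> models <code>e/</code> and
<code>filter_type is_int e</code> models <code>e.(Int)</code>.
</p>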
447 </section>
448
449 <section title="Dynamic type checking">
450
451 <p>
452 If <code>e</code> is an x-expression and <code>t</code> is an x-type,
453 the construction <code>(e :? t)</code> returns the same
454 result as <code>e</code> if it has type <code>t</code>, and otherwise
455 raises a <code>Failure</code> exception whose argument explains
456 why this is not the case.
457 </p>
458
459 <sample><![CDATA[{{ON}}
460 # let f (x : {{ Any }}) = {{ (x :? <a>[ Int* ] ) }} in
461 f {{ <a>[ 1 2 '3' ] }};;
462 Exception:
463 Failure
464 "Value <a>[ 1 2 '3' ] does not match type <a>[ Int* ]\nValue '3' does not match type Int\n".
465 ]]></sample>
466 </section>
467
468 <section title="Pattern matching">
469
470 <p>
471 OCamlDuce comes with a powerful pattern matching operation.
472 X-patterns are described <a href="#patterns">below</a>.
473 The syntax for the pattern matching operation is:
474 <code>match e with p1 -> e1 | ... | pn -> en</code>.
475 The type system ensures exhaustiveness of the pattern matching
476 and infers precise types for the capture variables.
477 It is also possible to use x-pattern matching as a regular
478 OCaml expression; x-patterns must be surrounded by {{..}}, e.g.:
479 <code>match e with {{p1}} -> e1 | ... | {{pn}} -> en</code> or
480 <code>function {{p1}} -> e1 | ... | {{pn}} -> en</code>.
481 </p>
482
483 <note>
484 currently it is impossible to mix normal OCaml patterns and x-patterns
485 in a single pattern matching.
486 </note>
487
488 </section>
489
490 <section title="Local binding">
491
492 <p>
493 The x-expression <code>let p=e1 in e2</code> is equivalent to
494 <code>match e1 with p -> e2</code>. There is also a local binding
495 with an x-pattern in OCaml expressions: <code>let {{p}}=e1 in
496 e2</code>.
497 </p>
498
499 </section>
500
501
502 <section title="Iterators">
503
504 <p>
505 OCamlDuce comes with a sequence iterator
506 <code>map e with p1 -> e1 | ... | pn -> en</code> and
507 a tree iterator
508 <code>map* e with p1 -> e1 | ... | pn -> en</code>.
509 </p>
510
511 <p>
512 For both constructions, the argument must evaluate to a sequence.
513 The <code>map</code> iterator applies the patterns to each element
514 of this sequence in turn and produces a new sequence by concatenating
515 all the results (all the right-hand sides must thus produce a
516 sequence). The set of patterns must be exhaustive for all the possible
517 elements of the input sequence.
518 </p>
519
520 <p>
521 The tree iterator is similar except that the patterns need not be
522 exhaustive. If some element of the input sequence is not matched,
523 it is simply copied into the result unless it is an XML element. In
524 this case, the transformation is applied recursively to its content.
525 </p>
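
<p>
The behaviour of the tree iterator can be sketched in plain OCaml as
follows. This is a model under simplifying assumptions, not the actual
implementation: the pattern branches are represented by a function
returning <code>Some</code> result when a branch matches and
<code>None</code> otherwise, and the names <code>value</code>,
<code>map_tree</code> and <code>double</code> are invented for the example.
</p>

<sample><![CDATA[
```ocaml
type value = Int of int | Elem of string * value list

(* [f] plays the role of the pattern branches: Some result when a branch
   matches, None when no pattern applies. *)
let rec map_tree f seq =
  List.concat_map
    (fun v ->
       match f v with
       | Some result -> result
       | None ->
         (match v with
          (* unmatched element: transform its content recursively *)
          | Elem (tag, content) -> [ Elem (tag, map_tree f content) ]
          (* unmatched non-element: copy it unchanged *)
          | Int _ -> [ v ]))
    seq

(* Example: double every integer, at any depth. *)
let double = function Int n -> Some [ Int (2 * n) ] | Elem _ -> None

let result =
  map_tree double [ Int 1; Elem ("a", [ Int 2; Elem ("b", [ Int 3 ]) ]) ]
```
]]></sample>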
526
527 </section>
528
529 <section title="OCaml constructions">
530
531 <p>
532 As a convenience, some of the OCaml expression constructors
533 are allowed as x-expressions (without a need to go back to OCaml
534 with double curly braces): (unqualified) value identifiers and
535 function calls.
536 </p>
537
538 </section>
539
540 </box>
541
542 <box title="More on x-types" link="types">
543
544 <p>
545 We have seen how to write simple x-types. We can then combine
546 them with Boolean connectives:
547 </p>
548
549 <ul>
550 <li><code>t1 &amp; t2</code>: intersection;</li>
551 <li><code>t1 | t2</code>: union;</li>
552 <li><code>t1 - t2</code>: difference.</li>
553 </ul>
554
555 <p>
556 The empty x-type is written <code>Empty</code> (it contains no value),
557 and the universal x-type is written <code>Any</code> (it contains
558 all the x-values) or <code>_</code>.
559 </p>
560
561 <p>
562 When an x-type has been bound to some OCaml identifier
563 (<code>{{ON}}type t = {{...}}</code>), it is possible to use
564 this identifier in another x-type. Recursive definitions
565 are allowed:
566 </p>
567
568 <sample><![CDATA[{{ON}}
569 type t1 = {{ <a>[ t2* ] }}
570 and t2 = {{ <b>[ t1* ] }}
571 ]]></sample>
572
573 <p>
574 Note that x-values are always finite and acyclic. The type checker
575 detects type definitions which would yield empty types:
576 </p>
577
578 <sample><![CDATA[{{ON}}
579 # type t = {{ <a>[ t+ ] }};;
580 This definition yields an empty type
581 ]]></sample>
582
583 <p>
584 If <code>t1</code> and <code>t2</code> are record x-types,
585 we can combine them with the infix <code>++</code> operator, which
586 mimics the corresponding operator on expressions (record
587 concatenation). Similarly, we can use the infix <code>@</code>
588 concatenation operator on sequence x-types.
589 </p>
590
591 </box>
592
593 <box title="X-patterns" link="patterns">
594
595 <p>
596 X-patterns follow the same syntax as x-types. In particular,
597 any x-type is a valid x-pattern. In addition to x-type constructors,
598 x-patterns can have:
599 </p>
600
601 <ul>
602 <li>capture variables (lowercase OCaml identifiers);</li>
603 <li>constant bindings <code>(x := c)</code> where x is a capture
604 variable and c is
605 a literal x-constant (this pattern always succeeds and returns the
606 binding x->c).</li>
607 </ul>
608
609 <p>
610 In record x-patterns, it is possible to omit the <code>=p</code> part of a field.
611 The content is then replaced with the label name considered as
612 a capture variable. E.g. <code>{ x y=p }</code> is equivalent to
613 <code>{ x=x y=p }</code>.</p>
614
615 <p>It is also possible to add an "else" clause:
616 <code>{ x = (a,_)|(a:=3) }</code>
617 will accept any record with at most the field <code>x</code>. If the content
618 is a pair, the capture variable a will be bound to its first component;
619 otherwise, it is set to <code>3</code>.</p>
620
621 <p>
622 In regular expressions, it is possible to extract whole subsequences
623 with the notation <code>x::R</code>, e.g.: <code>[ _* x::Int+ _* ]</code>
624 </p>
625
626 <p>
627 If the same sequence capture variable appears several times (or below a
628 repetition) in a regexp, it is bound to the concatenation of all
629 matched subsequences. E.g.: <code>[ (x::Int | _)* ]</code> will
630 collect in <code>x</code> all the elements of type <code>Int</code> from
631 a sequence.</p>
632
633 <p>
634 The regexp operators <code>+,*,?</code> are greedy by default (they match as long
635 as possible). They admit non-greedy variants <code>+?,*?,??</code>.
636 </p>
637 </box>
638
639 <box title="Namespace bindings" link="ns">
640
641 <p>
642 The binding of namespace prefixes to URIs
643 can be done either by toplevel phrases (structure items) or
644 by local declarations:
645 </p>
646
647 <sample>{{ON}}
648 # {{ namespace ns = "http://..." }};;
649 # let x = {{ `ns: x }};;
650 val x : {{`ns:x}} = {{`ns:x}}
651 # let x = {{ let namespace ns = "http://..." in `ns:x }};;
652 val x : {{`ns:x}} = {{`ns:x}}
653 </sample>
654
655 <p>The toplevel definitions can also appear in module interfaces
656 (signatures). A toplevel prefix binding is not exported by a module: its scope
657 is limited to the current structure or signature. It is possible
658 to specify a default namespace, and to reset it:
659 </p>
660
661 <sample>{{ON}}
662 # {{ namespace "http://..." }};;
663 # {{ `x }};;
664 - : {{`ns1:x}} = {{`ns1:x}}
665 # {{ namespace "" }};;
666 # {{ `x }};;
667 - : {{`x}} = {{`x}}
668 </sample>
669
670 <p>
671 Note that the value pretty-printer invented some prefix
672 for the namespace URI. The default prefix declaration also has a
673 local form <code> let namespace "..." in ... </code>.
674 </p>
675
676 </box>
677
678 <box title="More on type-checking" link="typecheck">
679
680 <section title="Type inference">
681
682 <p>
683 As we said above, the programmer is sometimes required to provide type
684 annotations. To know where to put these annotations, it is necessary to
685 get a basic understanding of how type-checking works.
686 </p>
687
688 <p>
689 The OCaml type-checker is run first to detect which sub-expressions
690 are of the x-kind. A second ML type-checking pass is then done to
691 introduce subsumption (implicit subtyping) steps where allowed. After
692 these two passes, the OCamlDuce type checker obtains a data-flow summary of
693 x-values in the whole compilation unit. This is a directed graph,
694 whose edges represent either simple data-flow or complex operations
695 on x-values. The nodes of the graph can be thought of as x-type
696 variables. A data-flow edge corresponds to a subtyping constraint,
697 and an operation edge corresponds to a symbolic constraint which
698 mimics the corresponding operation on values.
699 </p>
700
701 <p>
702 Some of the nodes are given an explicit type by the programmer,
703 through type annotations (on expressions or function arguments)
704 or the other usual mechanisms in ML (data type declarations,
705 signatures, ...).
706 </p>
707
708 <p>
709 Also, if there is a loop with only subtyping edges in the graph,
710 all the nodes on the loop are merged together.
711 </p>
712
713 <p>
714 After this operation, the graph is required to be acyclic (assuming
715 that the nodes with an explicit type are removed from the graph). It
716 is the responsibility of the programmer to provide enough type
717 annotations to achieve this property. Otherwise, a type error
718 is issued.
719 </p>
720
721 <sample><![CDATA[{{ON}}
722 # let rec f x = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
723 Cycle detected: cannot type-check
724 # let rec f x : {{ String }} = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
725 val f : int -> {{String}} = <fun>]]>
726 </sample>
727
728 <p>
729 In the example above, there is a cycle between the result type for
730 <code>f</code> and the type for the sub-expression <code>{{ON}}f
731 {{n-1}}</code>. It is here broken with a type annotation on the result; it could
732 have been broken by a type annotation on the expression <code>{{ON}}f
733 {{n-1}}</code>, or on the function <code>f</code> itself, or by a
734 module signature.
735 </p>
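
<p>
The acyclicity requirement can be pictured with a small graph check in
plain OCaml. This is a simplified sketch and not the algorithm used by the
type checker: the graph is an association list, annotated nodes are simply
skipped, and the node names (<code>result_of_f</code>, <code>call_f</code>,
<code>concat</code>) are hypothetical labels for the example above.
</p>

<sample><![CDATA[
```ocaml
(* A graph is an association list from each node to its successors. *)
let has_cycle (graph : (string * string list) list) annotated =
  let succ n = try List.assoc n graph with Not_found -> [] in
  (* [visit path n] is true when a cycle is reachable from n; [path] holds
     the nodes on the current depth-first path. *)
  let rec visit path n =
    if List.mem n annotated then false      (* annotated nodes are removed *)
    else if List.mem n path then true       (* back to a node on the path *)
    else List.exists (visit (n :: path)) (succ n)
  in
  List.exists (fun (n, _) -> visit [] n) graph

(* Hypothetical data-flow graph for the recursive function f. *)
let graph =
  [ ("result_of_f", [ "call_f" ]);
    ("call_f", [ "concat" ]);
    ("concat", [ "result_of_f" ]) ]
```
]]></sample>

<p>
Without annotations, <code>has_cycle graph []</code> is true; annotating
any node on the loop, for instance
<code>has_cycle graph ["result_of_f"]</code>, breaks the cycle.
</p>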
736
737 <p>
738 Let us study another simple example:
739 </p>
740
741 <sample>{{ON}}
742 # let f x = {{ x + 1 }} in f {{ 2 }}, f {{ 3 }};;
743 - : {{3--4}} * {{3--4}} = ({{3}}, {{4}})
744 </sample>
745
746 <p>
747 The type checker detects that the two x-values <code>2</code> and
748 <code>3</code> can flow to the argument of <code>f</code>. Its body
749 is thus type-checked with the assumption that <code>x</code> has type
750 <code>2--3</code>. The computed result type is then <code>3--4</code>.
751 </p>
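
<p>
The computation of <code>2--3</code> and <code>3--4</code> can be
reproduced with a toy interval analysis in plain OCaml. This is a
simplification for illustration: real x-types are arbitrary sets of values
rather than single intervals, and the names <code>interval</code>,
<code>join</code> and <code>add</code> are assumptions of the sketch.
</p>

<sample><![CDATA[
```ocaml
type interval = { lo : int; hi : int }

let singleton n = { lo = n; hi = n }

(* Smallest interval containing both a and b: what can flow to a node
   that receives values from two sources. *)
let join a b = { lo = min a.lo b.lo; hi = max a.hi b.hi }

(* Interval of all possible sums x + y with x in a and y in b. *)
let add a b = { lo = a.lo + b.lo; hi = a.hi + b.hi }

let arg = join (singleton 2) (singleton 3)   (* models 2--3 *)
let res = add arg (singleton 1)              (* models 3--4 *)
```
]]></sample>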
752
753
754 <p>
755 The type-inference process described above is global by nature. The
756 acyclicity condition is only imposed after a whole compilation unit
757 has been type-checked by OCaml (and the information from the module
758 interface has been integrated). When a type variable is inferred to
759 be of the x-kind, it is never generalized. As a consequence, there
760 is no parametric polymorphism on x-types.
761 </p>
762
763 <p>
764 In the toplevel, type-checking is done after each phrase. Consider
765 the following session:
766 </p>
767
768 <sample><![CDATA[{{ON}}
769 # let f x = {{ x + 1 }};;
770 val f : {{Empty}} -> {{Empty}} = <fun>
771 # let a = f {{ 2 }};;
772 Subtyping failed 2 <= Empty
773 Sample:
774 2
775 ]]></sample>
776
777 <p>
778 The function <code>f</code> is inferred to have type
779 <code>{{ON}}{{Empty}} -> {{Empty}}</code> because when the first
780 phrase is type-checked, the data-flow graph says that no value
781 can flow to <code>x</code>, and thus the input type is empty
782 (and similarly for the result type). If the two phrases
783 were type-checked together (which would be the case if they had
784 been compiled by the compiler, not in the toplevel), the type checker
785 would have correctly inferred that the input type for <code>f</code>
786 must contain <code>2</code>.
787 </p>
788
789 </section>
790
791 <section title="Implicit subtyping">
792
793 <p>
794 Coercion from an x-type to a super type is automatic in OCamlDuce.
795 However, this automatic subsumption does not carry over to OCaml
796 type constructors, even if they are covariant. Consider:
797 </p>
798
799 <sample><![CDATA[{{ON}}
800 # let f (x : {{ Int }} * {{ Int }}) = 1;;
801 val f : {{Int}} * {{Int}} -> int = <fun>
802 # let g (x : {{ 0 }} * {{ 0 }}) = f x;;
803 This expression has type {{0}} * {{0}} but is here used with type
804 {{Int}} * {{Int}}
805 # let g (x : {{ 0 }} * {{ 0 }}) = let a,b = x in f (a,b);;
806 val g : {{0}} * {{0}} -> int = <fun>
807 # let g (x : {{ 0 }} * {{ 0 }}) = f (x :> {{ Int }} * {{ Int }});;
808 val g : {{0}} * {{0}} -> int = <fun>
809 ]]></sample>
810
811 <p>
812 The first attempt to define <code>g</code> fails because the type for
813 <code>x</code> is not an x-type and thus subsumption does not
814 apply. In the second attempt, we extract the two components of the
815 pair; since they are inferred to be x-values, subtyping applies to
816 both of them. Thus, when the pair <code>(a,b)</code> is reconstructed,
817 it is legal to unify its type with the input type of <code>f</code>.
818 The third definition for <code>g</code> gives an alternative solution:
819 using explicit OCaml type coercions.
820 </p>
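
<p>
A loose analogy in plain OCaml, using closed polymorphic variants in place
of x-types, may help to see why the coercion on the pair is needed. The
types <code>zero</code> and <code>digit</code> and the functions below are
made up for this sketch. Note one difference with OCamlDuce: plain OCaml has
no implicit subsumption at all, so even the components must be coerced
explicitly.
</p>

<sample><![CDATA[
```ocaml
type zero = [ `Zero ]
type digit = [ `Zero | `One ]

let f (_ : digit * digit) = 1

(* Writing  f x  with x of type zero * zero is rejected by the type
   checker: the two tuple types do not unify. *)

(* Rebuild the pair from coerced components. *)
let g1 (x : zero * zero) = let a, b = x in f ((a :> digit), (b :> digit))

(* Coerce the whole pair at once: tuples are covariant. *)
let g2 (x : zero * zero) = f (x :> digit * digit)
```
]]></sample>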
821
822 </section>
823
824 </box>
825
826 <box title="Exchanging values" link="transl">
827
828 <p>
829 OCamlDuce strongly separates regular OCaml values from the new
830 x-values. They have different syntax, expressions, types, patterns,
831 and even type-checking algorithms. This strong segregation is the key point
832 which allowed a simple integration between very different type
833 systems.
834 </p>
835
836 <p>
837 At some point, it is still necessary to cross the frontier and
838 translate OCaml values to x-values or the opposite.
839 </p>
840
841 <p>
842 Fortunately, OCamlDuce provides automatic translations in both
843 directions. Instead of double curly braces, you can
844 enclose x-expressions in curly brace+colon <code>{: ... :}</code>
845 (here, the <code>...</code> is an x-expression).
846 The effect is to translate the result of the x-expression
847 (which must be an x-value) to an OCaml value. Similarly,
848 in an x-expression, you can obtain the x-translation of
849 an OCaml value with the same syntax <code>{: ... :}</code>
850 (here, the <code>...</code> is an OCaml expression).
851 </p>
852
853 <p>
854 Here is how the translation works. To each OCaml type <code>t</code>,
855 we associate an x-type <code>T(t)</code> and a pair of translation
856 functions between <code>t</code> and <code>T(t)</code>.
857 Actually, not all the features are supported. For instance,
858 free type variables, abstract types, object types, non-regular
859 recursive types cannot be translated. In particular, since
860 type variables are not allowed, the OCaml type must be fully known.
861 </p>
862
863 <p>
864 The translation for an OCaml type <code>t</code> is defined by structural
865 induction on <code>t</code>. Sum types are
866 translated to union types: a constant constructor <code>A</code> is
867 translated to the qualified name <code>`A</code>; a non-constant
868 constructor <code>A of t1 * ... * tn</code> is translated to
869 <code>&lt;A>[ T(t1) ... T(tn) ]</code>. Closed polymorphic variants
870 have the same translation. Record types are translated to closed
871 record x-types. Some other translations:
872 </p>
873
874 <table border="1">
875 <tr><th>OCaml type t</th> <th>X-type T(t)</th></tr>
876 <tr><td><code>int</code></td> <td><code>Int</code></td></tr>
877 <tr><td><code>int32</code></td> <td><code>Int32</code></td></tr>
878 <tr><td><code>int64</code></td> <td><code>Int64</code></td></tr>
879 <tr><td><code>string</code></td> <td><code>Latin1</code></td></tr>
880 <tr><td><code>t list</code></td> <td><code>[T(t)*]</code></td></tr>
881 <tr><td><code>t array</code></td> <td><code>[T(t)*]</code></td></tr>
882 <tr><td><code>unit</code></td> <td><code>[]</code></td></tr>
883 <tr><td><code>char</code></td> <td><code>Latin1Char</code></td></tr>
884 <tr><td><code>{{t}}</code></td> <td><code>t</code></td></tr>
885 </table>
886
887 <p>
888 Here is an example:
889 </p>
890
891 <sample>{{ON}}
892 # let f (x : {{ Int }}) = {{ x + 1 }} in List.map f {: [ 1 2 3 ] :};;
893 - : {{Int}} list = [{{2}}; {{3}}; {{4}}]
894 </sample>
895
896 <p>
897 In this example, the result type of the translation is inferred
898 to be <code>{{ON}}{{ Int }} list</code> (because the type for
899 <code>f</code> is given). The corresponding x-type
900 is <code>{{ON}}{{ [Int*] }}</code>.
901 </p>
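
<p>
The structural translation described in this section can be modelled
in plain OCaml. The following is only a minimal sketch, not part of
OCamlDuce: it covers a small subset of OCaml types (no records,
no x-types) and prints the x-type <code>T(t)</code> as a string.
The names <code>ml_type</code>, <code>xtype</code>, <code>atom</code>
and <code>xcase</code> are hypothetical and chosen for illustration;
the real translation is performed by the OCamlDuce type checker.
</p>

<sample>
```ocaml
(* Toy model of a subset of OCaml types. Hypothetical: this is not
   the representation used inside OCamlDuce. *)
type ml_type =
  | TInt | TInt32 | TInt64 | TString | TChar | TUnit
  | TList of ml_type
  | TArray of ml_type
  (* Sum type: each constructor name with its argument types. *)
  | TSum of (string * ml_type list) list

(* Print T(t) by structural induction on t. *)
let rec xtype = function
  | TInt -> "Int"
  | TInt32 -> "Int32"
  | TInt64 -> "Int64"
  | TString -> "Latin1"
  | TChar -> "Latin1Char"
  | TUnit -> "[]"
  | TList t | TArray t -> "[" ^ atom t ^ "*]"
  | TSum cases -> String.concat " | " (List.map xcase cases)

(* Parenthesize a union of several cases before applying the star. *)
and atom t =
  match t with
  | TSum (_ :: _ :: _) -> "(" ^ xtype t ^ ")"
  | _ -> xtype t

(* Constant constructors become atoms; the others become tagged
   sequences. "\060" is the opening angle bracket, written as an
   escape to keep this listing valid inside XML. *)
and xcase (name, args) =
  match args with
  | [] -> "`" ^ name
  | _ -> "\060" ^ name ^ ">[ " ^ String.concat " " (List.map xtype args) ^ " ]"

let () =
  print_endline (xtype (TList TInt));
  print_endline (xtype (TSum [ ("A", []); ("B", [ TInt; TString ]) ]))
```
</sample>

<p>
Under these assumptions, the two calls print <code>[Int*]</code> and
<code>`A | &lt;B>[ Int Latin1 ]</code>, matching the table and the rule
for sum types given above.
</p>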
902
903 </box>
904
905 <box title="The standard library" link="stdlib">
906
907 <p>
908 In OCamlDuce, the Num library from OCaml is included in the standard
909 library. In addition, there are two new modules called
910 <code>Ocamlduce</code> and <code>Cduce_types</code> in the standard library.
911 </p>
912
913 <p>
914 The module <code>Cduce_types</code> gives access to the internal
915 representation of x-values. It is currently undocumented.
916 </p>
917
918 <p>
919 The module <code>Ocamlduce</code> provides useful
920 functionality for x-values. See the <a href="http://yquem.inria.fr/~frisch/ocamlcduce/doc/ocamlduce/Ocamlduce.html">ocamldoc</a> generated
921 documentation for a description of its interface.
922 </p>
923
924 </box>
925
926 <box title="Code samples" link="code">
927
928
929 <section title="Parsing XML files">
930
931 <p>
932 OCamlDuce does not come with any built-in XML parser. However,
933 the <a href="http://yquem.inria.fr/~frisch/ocamlcduce/doc/ocamlduce/Ocamlduce.Load.html"><code>Ocamlduce.Load</code></a> module in the standard library
934 makes it easy to plug in existing XML parsers. Here is some
935 code which demonstrates how to do that with three of
936 the most popular OCaml XML parser libraries:
937 </p>
938
939 <ul>
940 <li><a
941 href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/pxp/">PXP</a></li>
942 <li><a
943 href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/expat/">Expat</a></li>
944 <li><a href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/xmllight/">Xml-light</a></li>
945 </ul>
946
947 </section>
948
949 <section title="Converting DTD to OCamlDuce types">
950
951 <p>
952 This <a href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/dtd2types/">tool</a> produces a set of OCamlDuce type declarations
953 from a DTD. It requires PXP.
954 </p>
955
956 <note>This application does not use any of the new features, but it
957 can be useful in the development of OCamlDuce applications.
958 </note>
959
960 </section>
961
962 <section title="Parsing XML Schema, producing valid XHTML output">
963
964 <p>
965 This <a
966 href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/schema/">application</a>
967 parses XML Schema Definitions (.xsd files), and produces summaries
968 (toplevel declaration names) in XHTML. The OCamlDuce type system ensures
969 that the parser is coherent with the input XML type (any valid XML
970 Schema is accepted) and that the printer is coherent with the output
971 XML type (it is necessarily a valid XHTML document).
972 </p>
973
974 <p>
975 Of course, for such a simple transformation, parsing the XML document
976 into an internal representation is not necessary. A direct XML-to-XML
977 transformation would be easy to write. Our aim here was to illustrate
978 complex parsing of XML.
979 </p>
980
981 <p>
982 It is interesting to introduce errors in the parser
983 <code>schema_loader.ml</code> or the printer
984 <code>dump_schema.ml</code> and see how the type system catches them.
985 </p>
986
987 <note>
988 The application uses XML Light to parse XML documents.
989 </note>
990
991 <note>
992 Some features of XML Schema are not parsed, such as
993 <code>redefine</code> elements or substitution groups.
994 </note>
995
996 </section>
997
998 </box>
999
1000 </page>
