Contents of /web/ocaml.xml

Revision 1793
Tue Jul 10 19:23:16 2007 UTC (5 years, 10 months ago) by abate
File MIME type: text/xml
File size: 38988 byte(s)
[r2005-07-31 23:01:13 by afrisch] Empty log message

Original author: afrisch
Date: 2005-07-31 23:01:13+00:00
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<page name="ocaml">

<title>OCamlDuce</title>

<left>
<local-links href="index,documentation"/>
<p>On this page:</p>
<boxes-toc/>
</left>

<box>

<p>
OCamlDuce is a merger between <a
href="http://caml.inria.fr/">OCaml</a> and
<local href="index">CDuce</local>. It comes as a modified
version of OCaml which integrates CDuce features: expressions, types,
patterns.
</p>

<p>
OCamlDuce is distributed under the same licenses as Objective Caml:
the Q Public License version 1.0 for the Compiler, and the LGPL
version 2 for the Library. The extension has been written by Alain
Frisch. Parts of the CDuce implementation, by the same author, have
been reused.
</p>

</box>

<box title="Download and installation" link="install">

<p>
The build procedure for OCamlDuce is exactly the same as for OCaml:
<tt>configure, make world, make install</tt>. The names of the tools
are unchanged: <tt>ocaml,ocamlc,ocamlopt</tt>. Currently, OCamlDuce
is based on CVS snapshots of OCaml (between 3.08.3 and the current
<tt>release308</tt> branch) and CDuce (between 0.3.91 and the head).
</p>

<ul>
<li><a
href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/cduce-ocaml-0.0.5.tar.gz">Compiler,
version 0.0.5</a></li>
<!--<li><a
href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/xml-support-0.0.4.tar.gz">Support
library, version 0.0.4</a></li>-->
</ul>

<p>
GODI users can upgrade an existing installation by adding this
line to their <tt>etc/godi.conf</tt> file:
</p>
<sample>
GODI_BUILD_SITES += http://pauillac.inria.fr/~frisch/ocamlcduce/godi
</sample>
<p>
and by forcing a recompilation of the <tt>godi-ocaml-src</tt>
and <tt>godi-ocaml</tt> packages. <!--They should also build
the <tt>godi-xml-support</tt> library.-->
</p>

<!--
<p>
Some simple examples can be found <a -->
<!--href="http://pauillac.inria.fr/~frisch/ocamlcduce/tests/">here</a>.</p>
-->

</box>

<box title="Overview" link="overview">

<p>
The goal of the OCamlDuce project is to extend the OCaml language with features
to make it easier to write safe and efficient complex applications
that need to deal with XML documents. In particular, it relies
on a notion of types and patterns to guarantee statically
that all the possible input documents are correctly processed, and
that only valid output documents are produced.
</p>

<p>
In a nutshell, OCamlDuce extends OCaml with a new kind of values
(<em>x-values</em>) to represent XML documents, fragments, tags, and Unicode
strings. In order to describe these values, it also extends the type algebra
with so-called <em>x-types</em>. The philosophy behind these types is that they
represent <em>sets of x-values</em>. They can be very precise: indeed,
each value can be seen as a singleton type (a set with a single
value), and it is possible to form Boolean combinations of x-types
(intersection, union, difference).
</p>

<p>
OCamlDuce's type system can be understood as a refinement of OCaml's.
For each sub-expression which is inferred to be of the x-kind (using
OCaml's unification-based type system), OCamlDuce tries to infer the
best possible sound x-type. Here, best means smallest for the natural
subtyping relation (set inclusion). The inference algorithm is
actually a data-flow analysis: the x-type collects all the values
that can be produced by the expression, considering all the possible
data flows in the program. It is sometimes necessary to provide
explicit type annotations to help the type checker infer this type, in
particular when you define recursive functions or when you use
iterators.
</p>

<p>
Subtyping is implicit for x-types: if an expression is inferred to be
of x-type <code>t</code>, which is a subtype of <code>s</code>, then
it is possible to use this expression in any context which expects a
value of type <code>s</code>.
</p>

</box>

<box title="Getting started" link="start">

<p>
Most of the new language features are enclosed within double curly braces
<code>{{ON}}{{...}}</code>. For instance, the following code sample
defines a value <code>x</code> as an XML element (with tag
<code>a</code>, an attribute <code>href</code>, and a simple
string as content):
</p>

<sample><![CDATA[{{ON}}
# let x = {{ <a href="http://www.cduce.org">['CDuce'] }};;
val x : {{<a href=[ 'http://www.cduce.org' ]>[ 'CDuce' ]}} =
{{<a href="http://www.cduce.org">[ 'CDuce' ]}}
]]></sample>

<p>
What appears between the double curly braces is called an x-expression.
Similarly, there are x-types (as seen above), and also x-patterns.
The delimiters <code>{{ON}}{{...}}</code> are only used
for syntactical reasons, to avoid clashes between OCaml and CDuce
syntaxes and lexical conventions. As a matter of fact,
an OCaml expression need not be a syntactical x-expression
(delimited by double curly braces) to evaluate to an x-value.
For instance, once <code>x</code> has been declared as above,
the expression <code>x</code> evaluates to an x-value.
</p>


<p>
It is possible to use an arbitrary
OCaml expression as part of an x-expression: it must simply be
protected by a new pair of double curly braces. For instance, there is
no <code>if-then-else</code> construction for x-expressions, but you
can write:
</p>

<sample><![CDATA[{{ON}}
# {{ <a href={{if true then {{"a"}} else {{"z"}}}}>[] }};;
- : {{<a href=[ 'a' | 'z' ]>[ ]}} = {{<a href="a">[ ]}}
]]></sample>

<p>
Only the highlighted parts are parsed as x-expressions. The
<code>if-then-else</code> sub-expression is parsed as an OCaml
expression, but its type is an x-type (namely <code>{{ON}}{{[ 'a' |
'z' ]}}</code>).
</p>

</box>

<box title="X-values" link="values">

<p>
X-values are intended to represent XML documents and fragments
thereof: elements, tags, text, sequences. In this section, we
present the x-value algebra, the syntax of the corresponding
x-expression constructors and the associated x-types.
</p>

<p>
There are three kinds of atomic x-values:
</p>
<ul>
<li>Unicode characters;</li>
<li>qualified names;</li>
<li>arbitrarily large integers.</li>
</ul>

<section title="Characters">

<p>
X-characters are different from OCaml characters. They can represent
the range of Unicode code points defined in the XML specification.
Character literals are delimited by single quotes. The escape
sequences \n, \r, \t, \b, \', \&quot;, \\ are recognized as usual. The
numerical escape sequences are written <code>\n;</code> where n is an integer
literal (note the extra semicolon). The source code is interpreted as
being encoded in ISO-8859-1. As a consequence, Unicode characters which are not
part of the Latin1 character set must be introduced with this
numerical escape mechanism. The x-types for x-characters are:
</p>
<ul>
<li>singletons;</li>
<li>intervals, written <code>c -- d</code>, where <code>c</code> and
<code>d</code> are literals (example: <code>{{ON}}type t = {{ 'a'--'z'
}}</code>);</li>
<li>the type of all x-characters, written <code>Char</code>;</li>
<li>the type of all Latin1 characters, written <code>Latin1Char</code>
(defined as <code>\0; -- \255;</code>).</li>
</ul>
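<p>
The rules above determine when a character has to be written with the
numerical escape. The following plain OCaml sketch (not OCamlDuce code)
renders a Unicode code point as the body of a single-quoted x-character
literal. The names <code>literal_of_codepoint</code> and <code>is_latin1</code>
are hypothetical. The sketch conservatively uses the decimal <code>\n;</code>
form for everything outside printable ASCII, including Latin1 characters
that could also be written directly in ISO-8859-1 source. This assumes
that the numerical escape is accepted for any code point.
</p>
<sample><![CDATA[
```ocaml
(* Plain OCaml sketch, not OCamlDuce: hypothetical helpers illustrating
   the lexical rules for x-character literals. *)

(* Body of a single-quoted x-character literal for code point [cp].
   Printable ASCII is kept as-is; the quote and the backslash use the
   usual backslash escapes; anything else (control characters, Latin1
   characters above 126, non-Latin1 characters) uses the decimal
   numerical escape \n; with its trailing semicolon. *)
let literal_of_codepoint (cp : int) : string =
  if cp = Char.code '\'' then "\\'"
  else if cp = Char.code '\\' then "\\\\"
  else if cp >= 32 && cp <= 126 then String.make 1 (Char.chr cp)
  else Printf.sprintf "\\%d;" cp

(* Membership in Latin1Char, i.e. the interval \0; -- \255; *)
let is_latin1 (cp : int) : bool = 0 <= cp && cp <= 255

let () =
  print_endline (literal_of_codepoint 97);  (* a *)
  print_endline (literal_of_codepoint 945)  (* \945; (Greek small alpha) *)
```
]]></sample>
<p>
With this conservative rule the generated literals are always plain ASCII,
so they do not depend on the source encoding.
</p>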

</section>

<section title="Integers">

<p>
X-integers are arbitrarily large. Literals must be written in decimal.
Negative literals must be in parentheses, e.g. <code>(-3)</code>.
The x-types for x-integers are:
</p>
<ul>
<li>singletons;</li>
<li>intervals, written <code>i -- j</code>, where <code>i</code> and
<code>j</code> are literals (example: <code>{{ON}}type t = {{ 10--20
}}</code>); it is possible to replace <code>i</code> or <code>j</code>
with <code>**</code> to define open-ended intervals, e.g.
<code>{{ON}}type pos = {{ 1 -- ** }}</code>;
</li>
<li>the type of all x-integers, written <code>Int</code>;</li>
<li>the type of all the integers which can be represented by a
signed 32 (resp. 64) bit machine word, written <code>Int32</code> (resp.
<code>Int64</code>).</li>
</ul>
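<p>
Since x-types denote sets of values, an integer interval type can be read as
a membership test, and subtyping between intervals is set inclusion. The
plain OCaml sketch below (not OCamlDuce code) models this, using
<code>None</code> for an open bound written <code>**</code>. The type
<code>interval</code> and the functions <code>mem</code> and
<code>subtype</code> are hypothetical. Native OCaml integers stand in for
arbitrarily large x-integers, and empty intervals (lower bound above upper
bound) are not handled.
</p>
<sample><![CDATA[
```ocaml
(* Plain OCaml model, not OCamlDuce: an integer interval x-type.
   [None] plays the role of an open bound; native ints approximate
   arbitrarily large x-integers. *)
type interval = { lo : int option; hi : int option }

(* Does the integer [n] belong to the interval? *)
let mem (i : interval) (n : int) : bool =
  (match i.lo with None -> true | Some l -> l <= n)
  && (match i.hi with None -> true | Some h -> n <= h)

(* Subtyping as set inclusion, assuming both intervals are non-empty:
   [a] is included in [b] when each bound of [b] is at least as loose. *)
let subtype (a : interval) (b : interval) : bool =
  let lo_ok =
    match a.lo, b.lo with
    | _, None -> true
    | None, Some _ -> false
    | Some x, Some y -> y <= x
  in
  let hi_ok =
    match a.hi, b.hi with
    | _, None -> true
    | None, Some _ -> false
    | Some x, Some y -> x <= y
  in
  lo_ok && hi_ok

let t = { lo = Some 10; hi = Some 20 }  (* 10--20 *)
let pos = { lo = Some 1; hi = None }    (* 1 with an open upper bound *)
```
]]></sample>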

</section>

<section title="Qualified names">

<p>
Qualified names are intended to represent XML tag names. Conceptually,
they are made of a namespace URI and a local name. Since URIs tend
to be long, literals are of the form <code>`prefix:local</code>
where <code>local</code> is the local name and <code>prefix</code>
is a <em>namespace prefix</em> bound to some URI (in the scope of the
literal). The local name follows the definitions from
the XML Namespaces specification; a dot character must be protected
by a backslash and non-Latin1 characters are written as character
literals <code>\n;</code>. <a href="#ns">See below</a> for an
explanation of how to bind prefixes to URIs. To refer
to the default namespace (or to the absence of namespace if no default
has been defined), the syntax is simply <code>`local</code>.
The x-types for qualified names are:
</p>
<ul>
<li>singletons;</li>
<li>the type of all qualified names, written <code>Atom</code>;</li>
<li>the type of all qualified names from a specified namespace,
written <code>`ns:*</code>.</li>
</ul>
</section>

<section title="Records">

<p>
X-records are mainly used to represent the set of attributes of an XML
element. An x-record is a binding from a finite set of <em>labels</em>
to x-values. Labels follow the same syntax as qualified names,
without the leading backquote. However, if the namespace prefix is not
given, the default namespace does not apply (the namespace URI is
empty). The syntax for record x-expressions is <code> { l1=e1
... ln=en }</code> where the <code>li</code> are labels and the
<code>ei</code> are x-expressions. Fields can also be separated with a
semicolon. It is legal to omit the expression for a field; the label is then
taken as the content of the field (a value with this name must be
defined in the current scope), e.g.: <code>{{ON}}let x = ... and y = ...
in {{ {x y z=3} }}</code> is equivalent to <code>{{ON}}let x = ... and
y = ... in {{ {x=x y=y z=3} }}</code>. The types for x-records specify
which labels are allowed or mandatory, and what the types of the
corresponding fields are. There are two kinds of record x-types:
</p>

<ul>
<li>
Closed record types, which allow only the listed fields:
<code>{ l1=t1 ... ln=tn }</code>;
</li>
<li>
Open record types, which allow additional fields (with arbitrary
type):
<code>{ l1=t1 ... ln=tn .. }</code> (the final two dots are
part of the syntax).
</li>
</ul>

<p>
In both cases, it is possible to make a field optional
by writing <code>=?</code> instead of <code>=</code>.
</p>

<p>
The x-type of all x-records is thus <code>{ .. }</code>,
and the x-type of x-records with an optional field <code>l</code>
of type <code>Int</code> and arbitrary other fields is
<code>{ l=?Int .. }</code>.
</p>
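<p>
The membership rules for closed, open, and optional fields can be
summarized by a small model. In the plain OCaml sketch below (not OCamlDuce
code), x-records are association lists from labels to values. A record
x-type is a list of hypothetical <code>field_spec</code> entries plus an
<code>is_open</code> flag standing for <code>..</code>, and each field type
is abstracted as a predicate. Labels are assumed to appear at most once in a
record.
</p>
<sample><![CDATA[
```ocaml
(* Plain OCaml model, not OCamlDuce: record x-types as field
   specifications. Field types are abstracted as predicates. *)
type 'v field_spec = {
  label : string;
  optional : bool;     (* true for l=?t, false for l=t *)
  check : 'v -> bool;  (* membership test for the field type t *)
}

type 'v record_type = { fields : 'v field_spec list; is_open : bool }

let has_type (rt : 'v record_type) (r : (string * 'v) list) : bool =
  (* Every listed field must be present (unless optional) and well-typed. *)
  let listed_ok =
    List.for_all
      (fun spec ->
        match List.assoc_opt spec.label r with
        | Some v -> spec.check v
        | None -> spec.optional)
      rt.fields
  in
  (* A closed type rejects labels it does not mention; ".." accepts them. *)
  let extra_ok =
    rt.is_open
    || List.for_all
         (fun (l, _) -> List.exists (fun s -> s.label = l) rt.fields)
         r
  in
  listed_ok && extra_ok

(* { .. } : all records *)
let any_record = { fields = []; is_open = true }

(* { l=?Int .. } with integer values standing in for x-values *)
let l_opt_int : int record_type =
  { fields = [ { label = "l"; optional = true; check = (fun _ -> true) } ];
    is_open = true }

(* A closed record type with one mandatory, positive field x *)
let closed_x : int record_type =
  { fields = [ { label = "x"; optional = false; check = (fun n -> n >= 1) } ];
    is_open = false }
```
]]></sample>
<p>
For instance, <code>has_type closed_x</code> rejects a record that also has a
field <code>y</code>, while <code>has_type l_opt_int</code> accepts the empty
record.
</p>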

</section>

<section title="Sequences">

<p>
X-sequences are finite and ordered collections of x-values.
The syntax for a sequence x-expression is
<code>[ e1 ... en ]</code> (note that elements are <em>not</em> separated
by semicolons as in OCaml lists). Each item <code>ei</code>
can be one of:
</p>
<ul>
<li>an x-expression;</li>
<li><code>!e</code> where <code>e</code> is an x-expression which
evaluates to a sequence (whose content is inserted in the sequence
being defined); e.g.
<code>let x = [ 2 3 ] in [ 1 !x 4 ]</code> is equivalent to
<code>[ 1 2 3 4 ]</code>;</li>
<li>a string literal delimited by single quotes; e.g.
<code>[ 'abc' ]</code> is equivalent to <code>[ 'a' 'b' 'c' ]</code>.</li>
</ul>
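<p>
The three kinds of sequence items can be modelled in plain OCaml (this is not
OCamlDuce code). A single value contributes itself, a spliced
<code>!e</code> contributes its whole content, and a single-quoted string
literal contributes one character per letter. The constructor names below are
hypothetical.
</p>
<sample><![CDATA[
```ocaml
(* Plain OCaml model, not OCamlDuce: building an x-sequence from items. *)
type xv = I of int | C of char

type item =
  | One of xv          (* an x-expression *)
  | Splice of xv list  (* !e, where e evaluates to a sequence *)
  | Str of string      (* a string literal in single quotes *)

let build (items : item list) : xv list =
  List.concat_map
    (function
      | One v -> [ v ]
      | Splice s -> s
      | Str s -> List.map (fun c -> C c) (List.of_seq (String.to_seq s)))
    items

(* let x = [ 2 3 ] in [ 1 !x 4 ]  gives  [ 1 2 3 4 ] *)
let example = build [ One (I 1); Splice [ I 2; I 3 ]; One (I 4) ]
```
]]></sample>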

<p>
X-types for sequences are of the form <code>[R]</code>
where <code>R</code> is a regular expression over x-types which
describes the possible contents of the sequences. The possible
forms of regular expressions are:
</p>

<ul>
<li><code>t</code> (one single element of x-type <code>t</code>)</li>
<li><code>R*</code> (zero or more repetitions)</li>
<li><code>R+</code> (one or more repetitions)</li>
<li><code>R?</code> (zero or one repetition)</li>
<li><code>R1 R2</code> (sequence)</li>
<li><code>R1|R2</code> (alternation)</li>
<li><code>(R)</code> (grouping)</li>
<li><code>/t</code> (guard: the tail of the sequence must comply with
<code>t</code>).</li>
<li><code>PCDATA</code> (equivalent to <code>Char*</code>).</li>
</ul>

<note>sequences are actually encoded with embedded pairs and a
terminator, and sequence types are encoded with product types and
recursive types. The encoding is available to the programmer
but not described in this manual.
</note>

</section>

<section title="Strings">

<p>
Strings are nothing but sequences of characters. There are two
predefined types <code>String</code> and <code>Latin1</code>
(defined as <code>[ Char* ]</code> and <code>[ Latin1Char* ]</code>).
</p>

<p>
A string literal <code>[ '...' ]</code> can also be written
<code>"..."</code> (without the square brackets). Note that single
(resp. double) quotes need to be escaped only when the string is
delimited with single (resp. double) quotes.
</p>

</section>

<section title="XML elements">

<p>
An XML element is a triple of x-values (tag, attributes, content). The syntax for
the corresponding x-expression constructor is
<code><![CDATA[<(e1) (e2)>e3]]></code>. When <code>e1</code> is a
qualified name literal, it is possible to omit the leading
backquote and the surrounding parentheses. Similarly,
when <code>e2</code> is an x-record literal, it is possible
to omit the curly braces and the parentheses. For instance,
one can simply write <code><![CDATA[<a href="abc">['def']]]></code>
instead of <code><![CDATA[<(`a) ({href="abc"})>['def']]]></code>.
</p>

<p>
XML element x-types are written <code><![CDATA[<(t1) (t2)>t3]]></code>,
and the same simplifications apply. For instance, if
the namespace prefix <code>ns</code> has been defined,
the following is a legal x-type: <code><![CDATA[<ns:* ..>[]]]></code>;
it describes XML elements whose tag is in the namespace bound to
<code>ns</code>, with an empty content, and with an arbitrary set of
attributes. An underscore in place of <code>(t1)</code> is
equivalent to <code>(Atom)</code> (any tag).
</p>

</section>

</box>

<box title="X-expressions" link="expr">

<p>
In the previous section, we have seen the syntax for x-value
constructors (constant literals, sequence, record, element constructors).
In this section, we describe the other kinds of x-expressions.
</p>

<section title="Binary infix operators">

<p>
The arithmetic operators on integers follow the usual precedence.
They are written <code>+,*,-,div,mod</code> (they are all infix).
</p>

<p>
Record concatenation: <code>e1 ++ e2</code>. The x-expressions
<code>e1</code> and <code>e2</code> must evaluate to x-records.
The result is obtained by concatenating them. If a field with the same
label is present in both records, the right-most one is selected.
</p>
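<p>
The right-most-wins rule of <code>++</code> can be sketched in plain OCaml
(not OCamlDuce code) on association lists. The name
<code>concat_records</code> is hypothetical. Because x-records are unordered,
the order of the resulting list carries no meaning.
</p>
<sample><![CDATA[
```ocaml
(* Plain OCaml model, not OCamlDuce: e1 ++ e2 on records represented as
   association lists. Fields of [r1] whose label also occurs in [r2] are
   dropped, so the right-most binding wins. *)
let concat_records (r1 : (string * 'v) list) (r2 : (string * 'v) list) :
    (string * 'v) list =
  List.filter (fun (l, _) -> not (List.mem_assoc l r2)) r1 @ r2

let r = concat_records [ ("a", 1); ("b", 2) ] [ ("b", 3); ("c", 4) ]
(* r = [ ("a", 1); ("b", 3); ("c", 4) ] *)
```
]]></sample>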

<p>
Sequence concatenation: <code>e1 @ e2</code>, equivalent
to <code>[!e1 !e2]</code>.
</p>

</section>

<section title="Projections, filtering">

<p>
If the x-expression <code>e</code> evaluates to a record or an XML
element, the construction <code>e.l</code> will extract the value of
field or attribute <code>l</code>. Similarly, the construction
<code>e.?l</code> will extract the value of field or attribute
<code>l</code> if present, and return the empty sequence
<code>[]</code> otherwise.
</p>

<p>
If the x-expression <code>e</code> evaluates to a record,
the construction <code>e -. l</code> will produce a new record
where the field <code>l</code> has been removed (if present).
</p>
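<p>
Here is a plain OCaml model (not OCamlDuce code) of the three record
operations. The <code>xv</code> type and the function names are hypothetical,
and only records are modelled, not XML elements. When <code>e.l</code> finds
no field, this dynamic model simply raises <code>Not_found</code>. It says
nothing about how OCamlDuce's static checks treat that case.
</p>
<sample><![CDATA[
```ocaml
(* Plain OCaml model, not OCamlDuce: records as association lists. *)
type xv =
  | Int of int
  | Seq of xv list
  | Record of (string * xv) list

let fields = function
  | Record fs -> fs
  | _ -> invalid_arg "expected a record"

(* e.l : raises Not_found in this model when l is absent *)
let proj (e : xv) (l : string) : xv = List.assoc l (fields e)

(* e.?l : the empty sequence when l is absent *)
let proj_opt (e : xv) (l : string) : xv =
  match List.assoc_opt l (fields e) with Some v -> v | None -> Seq []

(* e -. l : remove the field l if present *)
let remove (e : xv) (l : string) : xv = Record (List.remove_assoc l (fields e))

let r = Record [ ("href", Int 1); ("id", Int 2) ]
```
]]></sample>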

<p>
If the x-expression <code>e</code> evaluates to an x-sequence,
the construction <code>e/</code> will result in a new x-sequence
obtained by taking in order all the children of the XML elements
from the sequence <code>e</code>. For instance, the x-expression
<code><![CDATA[[<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ]/]]></code>
evaluates to the x-value <code>[ 1 2 3 6 7 8 ]</code>.
</p>

<p>
If the x-expression <code>e</code> evaluates to an x-sequence,
the construction <code>e.(t)</code> (where <code>t</code> is an
x-type) will result in a new x-sequence
obtained by filtering <code>e</code> to keep only the elements
of type <code>t</code>. For instance, the x-expression
<code><![CDATA[[<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ].(Int)]]></code>
evaluates to the x-value <code>[ 4 5 ]</code>.
</p>
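<p>
Once x-values are modelled as an OCaml variant, both operations reduce to list
functions. The sketch below is plain OCaml, not OCamlDuce code. Attributes are
omitted from elements, and the x-type <code>t</code> in <code>e.(t)</code> is
abstracted as a predicate. It reproduces the two examples above.
</p>
<sample><![CDATA[
```ocaml
(* Plain OCaml model, not OCamlDuce: a few x-values. *)
type xv =
  | Int of int
  | Elem of string * xv list  (* tag and content; attributes omitted *)

(* e/ : concatenate, in order, the contents of the XML elements of e;
   non-element items contribute nothing. *)
let children (e : xv list) : xv list =
  List.concat_map (function Elem (_, content) -> content | Int _ -> []) e

(* e.(t) : keep the items satisfying the predicate standing for t. *)
let filter_type (t : xv -> bool) (e : xv list) : xv list = List.filter t e

let is_int = function Int _ -> true | Elem _ -> false

(* The sequence [<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ] *)
let e =
  [ Elem ("a", [ Int 1; Int 2; Int 3 ]); Int 4; Int 5;
    Elem ("b", [ Int 6; Int 7; Int 8 ]) ]

let () =
  assert (children e = [ Int 1; Int 2; Int 3; Int 6; Int 7; Int 8 ]);
  assert (filter_type is_int e = [ Int 4; Int 5 ])
```
]]></sample>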
</section>

<section title="Dynamic type checking">

<p>
If <code>e</code> is an x-expression and <code>t</code> is an x-type,
the construction <code>(e :? t)</code> returns the same
result as <code>e</code> if it has type <code>t</code>, and otherwise
raises a <code>Failure</code> exception whose argument explains
why this is not the case.
</p>

<sample><![CDATA[{{ON}}
# let f (x : {{ Any }}) = {{ (x :? <a>[ Int* ] ) }} in
f {{ <a>[ 1 2 '3' ] }};;
Exception:
Failure
"Value <a>[ 1 2 '3' ] does not match type <a>[ Int* ]\nValue '3' does not match type Int\n".
]]></sample>
</section>

<section title="Pattern matching">

<p>
OCamlDuce comes with a powerful pattern matching operation.
X-patterns are described <a href="#patterns">below</a>.
The syntax for the pattern matching operation is:
<code>match e with p1 -> e1 | ... | pn -> en</code>.
The type system ensures exhaustivity for the pattern matching
and infers precise types for the capture variables.
It is also possible to use x-pattern matching as a regular
OCaml expression; the x-patterns must then be enclosed in double curly
braces, e.g. <code>match e with {{p1}} -> e1 | ... | {{pn}} -> en</code>
or <code>function {{p1}} -> e1 | ... | {{pn}} -> en</code>.
</p>

<p>
Pattern matching follows a first-match policy: the first pattern
that succeeds triggers the corresponding branch.
</p>

<note>
currently it is impossible to mix normal OCaml patterns and x-patterns
in a single pattern matching.
</note>

</section>

<section title="Local binding">

<p>
The x-expression <code>let p=e1 in e2</code> is equivalent to
<code>match e1 with p -> e2</code>. There is also a local binding
with an x-pattern in OCaml expressions: <code>let {{p}}=e1 in
e2</code>.
</p>

</section>


<section title="Iterators">

<p>
OCamlDuce comes with a sequence iterator
<code>map e with p1 -> e1 | ... | pn -> en</code> and
a tree iterator
<code>map* e with p1 -> e1 | ... | pn -> en</code>.
</p>

<p>
For both constructions, the argument must evaluate to a sequence.
The <code>map</code> iterator applies the patterns to each element
of this sequence in turn and produces a new sequence by concatenating
all the results (all the right-hand sides must thus produce a
sequence). The set of patterns must be exhaustive for all the possible
elements of the input sequence.
</p>

<p>
The tree iterator is similar except that the patterns need not be
exhaustive. If some element of the input sequence is not matched,
it is simply copied into the result unless it is an XML element. In
this case, the transformation is applied recursively to its content.
</p>
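<p>
The difference between the two iterators is easiest to see in a model. In the
plain OCaml sketch below (not OCamlDuce code), a set of branches is a
function. For <code>map</code> it must handle every item. For
<code>map*</code> it returns <code>None</code> when no pattern matches, and
then the item is copied or, if it is an XML element, its content is processed
recursively. In this model a matched element is replaced by the branch result
and not descended into. The names are hypothetical.
</p>
<sample><![CDATA[
```ocaml
(* Plain OCaml model, not OCamlDuce: two iterators over x-sequences. *)
type xv =
  | Int of int
  | Elem of string * xv list  (* tag and content; attributes omitted *)

(* map e with ... : exhaustive branches, each producing a sequence;
   the results are concatenated. *)
let map_seq (f : xv -> xv list) (e : xv list) : xv list = List.concat_map f e

(* map* e with ... : non-exhaustive branches ([None] = no match).
   Unmatched items are copied, except XML elements, whose content is
   transformed recursively. *)
let rec map_tree (f : xv -> xv list option) (e : xv list) : xv list =
  List.concat_map
    (fun v ->
      match f v with
      | Some result -> result
      | None -> (
          match v with
          | Elem (tag, content) -> [ Elem (tag, map_tree f content) ]
          | Int _ -> [ v ]))
    e

(* Multiply every integer by 10, wherever it appears in the tree. *)
let scale = function Int n -> Some [ Int (n * 10) ] | Elem _ -> None

let doc = [ Int 1; Elem ("a", [ Int 2; Elem ("b", [ Int 3 ]) ]) ]
```
]]></sample>
<p>
With these definitions, <code>map_tree scale doc</code> rewrites the integers
at every depth, whereas a <code>map_seq</code> branch would see only the two
top-level items.
</p>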

</section>

<section title="OCaml constructions">

<p>
As a convenience, some of the OCaml expression constructors
are allowed as x-expressions (without a need to go back to OCaml
with double curly braces): (unqualified) value identifiers and
function calls.
</p>

</section>

</box>

<box title="More on x-types" link="types">

<p>
We have seen how to write simple x-types. We can then combine
them with Boolean connectives:
</p>

<ul>
<li><code>t1 &amp; t2</code>: intersection;</li>
<li><code>t1 | t2</code>: union;</li>
<li><code>t1 - t2</code>: difference.</li>
</ul>

<p>
The empty x-type is written <code>Empty</code> (it contains no value),
and the universal x-type is written <code>Any</code> (it contains
all the x-values) or <code>_</code>.
</p>

<p>
When an x-type has been bound to some OCaml identifier
(<code>{{ON}}type t = {{...}}</code>), it is possible to use
this identifier in another x-type. Recursive definitions
are allowed:
</p>

<sample><![CDATA[{{ON}}
type t1 = {{ <a>[ t2* ] }}
and t2 = {{ <b>[ t1* ] }}
]]></sample>

<p>
Note that x-values are always finite and acyclic. The type checker
detects type definitions which would yield empty types:
</p>

<sample><![CDATA[{{ON}}
# type t = {{ <a>[ t+ ] }};;
This definition yields an empty type
]]></sample>

<p>
If <code>t1</code> and <code>t2</code> are record x-types,
we can combine them with the infix <code>++</code> operator, which
mimics the corresponding operator on expressions (record
concatenation). Similarly, we can use the infix <code>@</code>
concatenation operator on sequence x-types.
</p>

</box>

<box title="X-patterns" link="patterns">

<p>
X-patterns follow the same syntax as x-types. In particular,
any x-type is a valid x-pattern. In addition to x-type constructors,
x-patterns can have:
</p>

<ul>
<li>capture variables (lowercase OCaml identifiers);</li>
<li>constant bindings <code>(x := c)</code> where <code>x</code> is a capture
variable and <code>c</code> is
a literal x-constant (this pattern always succeeds and returns the
binding <code>x -> c</code>).</li>
</ul>

<p>
Here is a brief description of the semantics of patterns. Given
an input value, a pattern can either succeed or fail. If it succeeds,
it also produces a binding from the capture variables in the pattern
to x-values.
</p>

<ul>

<li>A pattern which is just a type (no capture variable) succeeds if
and only if the value has the type.</li>

<li>A pattern <code>p1 | p2</code> succeeds if either <code>p1</code>
or <code>p2</code> succeeds, and returns the corresponding binding; if
both patterns succeed, <code>p1</code> wins. It is required that
<code>p1</code> and <code>p2</code> have the same sets of capture
variables. </li>

<li>A pattern <code>p1 &amp; p2</code> succeeds if both <code>p1</code>
and <code>p2</code> succeed, and returns the concatenation of the two
bindings. It is required that <code>p1</code> and <code>p2</code> have
<em>disjoint</em> sets of capture variables. </li>

</ul>
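<p>
These rules can be written down as a small matcher. The plain OCaml sketch
below (not OCamlDuce code) covers types used as patterns, capture variables,
constant bindings, <code>|</code>, and <code>&amp;</code>. A type is
abstracted as a predicate, and a lone capture variable is assumed to accept
any value. The well-formedness conditions on capture variables are not
checked, and all names are hypothetical.
</p>
<sample><![CDATA[
```ocaml
(* Plain OCaml model, not OCamlDuce: a tiny pattern language. *)
type xv = Int of int | Str of string

type pat =
  | Type of (xv -> bool)  (* an x-type used as a pattern, no capture *)
  | Var of string         (* capture variable: accepts any value *)
  | Const of string * xv  (* x := c : always succeeds, binds x to c *)
  | Or of pat * pat       (* p1 | p2 : p1 wins if both succeed *)
  | And of pat * pat      (* p1 & p2 : both must succeed *)

(* [Some bindings] on success, [None] on failure. *)
let rec matches (p : pat) (v : xv) : (string * xv) list option =
  match p with
  | Type t -> if t v then Some [] else None
  | Var x -> Some [ (x, v) ]
  | Const (x, c) -> Some [ (x, c) ]
  | Or (p1, p2) -> (
      match matches p1 v with Some b -> Some b | None -> matches p2 v)
  | And (p1, p2) -> (
      match matches p1 v, matches p2 v with
      | Some b1, Some b2 -> Some (b1 @ b2)
      | _ -> None)

let is_int = function Int _ -> true | Str _ -> false

(* (Int & x) | (x := 0) : capture an integer, default to 0 otherwise. *)
let p = Or (And (Type is_int, Var "x"), Const ("x", Int 0))
```
]]></sample>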

<p>
In record x-patterns, it is possible to omit the <code>=p</code> part
of a field. The content is then replaced with the label name
considered as a capture variable. E.g. <code>{ x y=p }</code> is
equivalent to <code>{ x=x y=p }</code>.</p>

<p>It is also possible to add an "else" clause:
<code>{ x = (a,_)|(a:=3) }</code>
will accept any record with at most the field <code>x</code>. If the content
is a pair, the capture variable <code>a</code> will be bound to its first component;
otherwise, it is set to <code>3</code>.</p>

<p>
In regular expressions, it is possible to extract whole subsequences
with the notation <code>x::R</code>, e.g.: <code>[ _* x::Int+ _* ]</code>.
</p>

<p>
If the same sequence capture variable appears several times (or below a
repetition) in a regexp, it is bound to the concatenation of all
matched subsequences. E.g.: <code>[ (x::Int | _)* ]</code> will
collect in <code>x</code> all the elements of type <code>Int</code> from
a sequence. It is not legal to have repeated simple capture variables.
</p>

<p>
The regexp operators <code>+,*,?</code> are greedy by default (they match as long
as possible). They admit non-greedy variants <code>+?,*?,??</code>.
</p>
</box>

<box title="Namespace bindings" link="ns">

<p>
The binding of namespace prefixes to URIs
can be done either by toplevel phrases (structure items) or
by local declarations:
</p>

<sample>{{ON}}
# {{ namespace ns = "http://..." }};;
# let x = {{ `ns: x }};;
val x : {{`ns:x}} = {{`ns:x}}
# let x = {{ let namespace ns = "http://..." in `ns:x }};;
val x : {{`ns:x}} = {{`ns:x}}
</sample>

<p>The toplevel definitions can also appear in module interfaces
(signatures). A toplevel prefix binding is not exported by a module: its scope
is limited to the current structure or signature. It is possible
to specify a default namespace, and to reset it:
</p>

<sample>{{ON}}
# {{ namespace "http://..." }};;
# {{ `x }};;
- : {{`ns1:x}} = {{`ns1:x}}
# {{ namespace "" }};;
# {{ `x }};;
- : {{`x}} = {{`x}}
</sample>

<p>
Note that the value pretty-printer invented a prefix (<code>ns1</code>)
for the namespace URI. The default prefix declaration also has a
local form <code> let namespace "..." in ... </code>.
</p>

</box>

<box title="More on type-checking" link="typecheck">

<section title="Type inference">

<p>
As we said above, the programmer is sometimes required to provide type
annotations. To know where to put these annotations, it is necessary to
get a basic understanding of how type-checking works.
</p>

<p>
The OCaml type checker is run first to detect which sub-expressions
are of the x-kind. A second ML type-checking pass is then done to
introduce subsumption (implicit subtyping) steps where allowed. After
these two passes, the OCamlDuce type checker obtains a data-flow summary of
x-values in the whole compilation unit. This is a directed graph,
whose edges represent either simple data flow or complex operations
on x-values. The nodes of the graph can be thought of as x-type
variables. A data-flow edge corresponds to a subtyping constraint,
and an operation edge corresponds to a symbolic constraint which
mimics the corresponding operation on values.
</p>

<p>
Some of the nodes are given an explicit type by the programmer,
through type annotations (on expressions or function arguments)
or the other usual mechanisms in ML (data type declarations,
signatures, ...).
</p>

<p>
Also, if there is a loop with only subtyping edges in the graph,
all the nodes on the loop are merged together.
</p>

<p>
After this operation, the graph is required to be acyclic (assuming
that the nodes with an explicit type are removed from the graph). It
is the responsibility of the programmer to provide enough type
annotations to achieve this property. Otherwise, a type error
is issued.
</p>

<sample><![CDATA[{{ON}}
# let rec f x = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
Cycle detected: cannot type-check
# let rec f x : {{ String }} = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
val f : int -> {{String}} = <fun>]]>
</sample>

<p>
In the example above, there is a cycle between the result type for
<code>f</code> and the type for the sub-expression <code>{{ON}}f
{{n-1}}</code>. It is here broken with a type annotation on the result; it could
have been broken by a type annotation on the expression <code>{{ON}}f
{{n-1}}</code>, or on the function <code>f</code> itself, or by a
module signature.
</p>
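<p>
The acyclicity condition can be checked with a simple reachability test. The
plain OCaml sketch below is a simplified model, not OCamlDuce's
implementation. Edges are labelled <code>Sub</code> (data flow) or
<code>Op</code> (operation), and annotated nodes are removed. Because loops
made only of <code>Sub</code> edges are merged, the model rejects a graph
exactly when a remaining cycle contains an <code>Op</code> edge. The example
above is encoded as two nodes: 0 for the result of <code>f</code> and 1 for
<code>{{ON}}f {{n-1}}</code>. This two-node encoding is an illustrative
simplification.
</p>
<sample><![CDATA[
```ocaml
(* Plain OCaml model, not OCamlDuce's implementation: the data-flow
   graph of x-type variables. *)
type kind = Sub | Op

type graph = {
  edges : (int * kind * int) list;  (* (source, kind, target) *)
  annotated : int list;             (* nodes with an explicit type *)
}

(* Is [dst] reachable from [src] through nodes without annotations? *)
let reachable (g : graph) (src : int) (dst : int) : bool =
  let live n = not (List.mem n g.annotated) in
  let rec dfs visited = function
    | [] -> false
    | n :: rest ->
        if n = dst then true
        else if List.mem n visited then dfs visited rest
        else
          let succs =
            List.filter_map
              (fun (a, _, b) -> if a = n && live b then Some b else None)
              g.edges
          in
          dfs (n :: visited) (succs @ rest)
  in
  live src && live dst && dfs [] [ src ]

(* Loops made only of Sub edges are merged, so only a cycle through an
   Op edge (a to b, with a reachable from b) is an error. *)
let typecheckable (g : graph) : bool =
  not (List.exists (fun (a, k, b) -> k = Op && reachable g b a) g.edges)

(* 0: result of f; 1: the call f {{n-1}}. The result flows to the call,
   and the concatenation with ['.'] produces the result. *)
let f_graph = { edges = [ (0, Sub, 1); (1, Op, 0) ]; annotated = [] }
```
]]></sample>
<p>
Annotating either node, as in <code>{ f_graph with annotated = [ 0 ] }</code>,
breaks the cycle in this model, which matches the alternatives listed above.
</p>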

<p>
Let us study another simple example:
</p>

<sample>{{ON}}
# let f x = {{ x + 1 }} in f {{ 2 }}, f {{ 3 }};;
- : {{3--4}} * {{3--4}} = ({{3}}, {{4}})
</sample>

<p>
The type checker detects that the two x-values <code>2</code> and
<code>3</code> can flow to the argument of <code>f</code>. Its body
is thus type-checked with the assumption that <code>x</code> has type
<code>2--3</code>. The computed result type is then <code>3--4</code>.
</p>
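<p>
The computation in this example amounts to interval arithmetic. The plain
OCaml sketch below (not OCamlDuce's algorithm) joins the singleton types of
the values flowing into <code>x</code>, then computes the type of
<code>x + 1</code>. Joining two intervals into their hull over-approximates a
union such as <code>1 | 5</code>, which x-types can represent exactly. For
<code>2</code> and <code>3</code> the hull is exact.
</p>
<sample><![CDATA[
```ocaml
(* Plain OCaml model, not OCamlDuce's algorithm: integer x-types as
   closed intervals. *)
type interval = { lo : int; hi : int }

let singleton n = { lo = n; hi = n }

(* Smallest interval containing both arguments (their hull). *)
let join a b = { lo = min a.lo b.lo; hi = max a.hi b.hi }

(* Type-level counterpart of integer addition. *)
let add a b = { lo = a.lo + b.lo; hi = a.hi + b.hi }

(* Values 2 and 3 flow to x, so x : 2--3 and x + 1 : 3--4. *)
let arg_type = join (singleton 2) (singleton 3)
let result_type = add arg_type (singleton 1)

let show i = Printf.sprintf "%d--%d" i.lo i.hi

let () = print_endline (show result_type)  (* 3--4 *)
```
]]></sample>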
800
801
802 <p>
803 The type-inference process described above is global by nature. The
804 acyclicity condition is only imposed after a whole compilation unit
805 has been type-checked by OCaml (and the information from the module
806 interface as been integrated). When a type variable is inferred to
807 be of the x-kind, it is never generalized. As a consequence, there
808 is no parametric polymorphism on x-types.
809 </p>
810
811 <p>
812 In the toplevel, type-checking is done after each phrase. Consider
813 the following session:
814 </p>
815
816 <sample><![CDATA[{{ON}}
817 # let f x = {{ x + 1 }};;
818 val f : {{Empty}} -> {{Empty}} = <fun>
819 # let a = f {{ 2 }};;
820 Subtyping failed 2 <= Empty
821 Sample:
822 2
823 ]]></sample>
824
825 <p>
826 The function <code>f</code> is inferred to have type
827 <code>{{ON}}{{Empty}} -> {{Empty}}</code> because when the first
828 phrase is type-checked, the data-flow graph says that no value
829 can flow to <code>x</code>, and thus the input type is empty
830 (and similarly for the result type). If the two phrases
831 were type-checked together (which would be the case it they had
832 been compiled by the compiler, not in the toplevel), the type checker
833 would have correctly inferred that the input type for <code>f</code>
834 must contain <code>2</code>.
835 </p>
836
837 </section>

<section title="Implicit subtyping">

<p>
Coercion from an x-type to a supertype is automatic in OCamlDuce.
However, this automatic subsumption does not carry over to OCaml
type constructors, even covariant ones. Consider:
</p>

<sample><![CDATA[{{ON}}
# let f (x : {{ Int }} * {{ Int }}) = 1;;
val f : {{Int}} * {{Int}} -> int = <fun>
# let g (x : {{ 0 }} * {{ 0 }}) = f x;;
This expression has type {{0}} * {{0}} but is here used with type
  {{Int}} * {{Int}}
# let g (x : {{ 0 }} * {{ 0 }}) = let a,b = x in f (a,b);;
val g : {{0}} * {{0}} -> int = <fun>
# let g (x : {{ 0 }} * {{ 0 }}) = f (x :> {{ Int }} * {{ Int }});;
val g : {{0}} * {{0}} -> int = <fun>
]]></sample>

<p>
The first attempt to define <code>g</code> fails because the type of
<code>x</code> is an OCaml pair type, not an x-type, so subsumption
does not apply. In the second attempt, we extract the two components
of the pair; since they are inferred to be x-values, subtyping applies
to each of them. Thus, when the pair <code>(a,b)</code> is rebuilt,
it is legal to unify its type with the input type of <code>f</code>.
The third definition of <code>g</code> gives an alternative solution:
an explicit OCaml type coercion.
</p>
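
<p>
Plain OCaml behaves like the third definition: subsumption is never
implicit, but an explicit coercion <code>:></code> does go through
covariant type constructors such as pairs. The sketch below shows this
with closed polymorphic variants standing in for x-types; the type names
<code>small</code> and <code>big</code> are made up for the illustration.
Unlike with x-types, destructuring the pair as in the second definition
would not help in plain OCaml.
</p>

<sample><![CDATA[
(* Hypothetical types standing in for the x-types 0 and Int:
   [ `Zero ] is a subtype of [ `Zero | `One ]. *)
type small = [ `Zero ]
type big = [ `Zero | `One ]

let f (x : big * big) =
  match x with
  | (`Zero, `Zero) -> 0
  | _ -> 1

(* let g (x : small * small) = f x
   is rejected: OCaml never inserts a coercion implicitly. *)

(* An explicit coercion is accepted, because pairs are covariant. *)
let g (x : small * small) = f (x :> big * big)
]]></sample>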

</section>

</box>

<box title="Exchanging values" link="transl">

<p>
OCamlDuce strongly separates regular OCaml values from the new
x-values. They have different syntax (expressions, types, patterns)
and even different type-checking algorithms. This strong segregation
is the key point that made a simple integration of two very different
type systems possible.
</p>

<p>
At some point, though, it is necessary to cross the frontier and
translate OCaml values to x-values or vice versa.
</p>

<p>
Fortunately, OCamlDuce provides automatic translations in both
directions. Instead of double curly braces, you can
enclose x-expressions in curly brace/colon delimiters <code>{: ... :}</code>
(here, the <code>...</code> is an x-expression).
The effect is to translate the result of the x-expression
(which must be an x-value) to an OCaml value. Similarly,
in an x-expression, you can obtain the x-translation of
an OCaml value with the same syntax <code>{: ... :}</code>
(here, the <code>...</code> is an OCaml expression).
</p>

<p>
Here is how the translation works. To each OCaml type <code>t</code>,
we associate an x-type <code>T(t)</code> and a pair of translation
functions between <code>t</code> and <code>T(t)</code>.
Not all OCaml types are supported, however: free type variables,
abstract types, object types and non-regular recursive types cannot
be translated. In particular, since type variables are not allowed,
the OCaml type must be fully known.
</p>

<p>
The translation for an OCaml type <code>t</code> is defined by structural
induction on <code>t</code>. Sum types are
translated to union types: a constant constructor <code>A</code> is
translated to the qualified name <code>`A</code>; a non-constant
constructor <code>A of t1 * ... * tn</code> is translated to
<code>&lt;A>[ T(t1) ... T(tn) ]</code>. Closed polymorphic variants
have the same translation. Record types are translated to closed
record x-types. Some other translations:
</p>

<table border="1">
<tr><th>Caml type t</th> <th>X-type T(t)</th></tr>
<tr><td><code>int</code></td> <td><code>Int</code></td></tr>
<tr><td><code>int32</code></td> <td><code>Int32</code></td></tr>
<tr><td><code>int64</code></td> <td><code>Int64</code></td></tr>
<tr><td><code>string</code></td> <td><code>Latin1</code></td></tr>
<tr><td><code>t list</code></td> <td><code>[T(t)*]</code></td></tr>
<tr><td><code>t array</code></td> <td><code>[T(t)*]</code></td></tr>
<tr><td><code>unit</code></td> <td><code>[]</code></td></tr>
<tr><td><code>char</code></td> <td><code>Latin1Char</code></td></tr>
<tr><td><code>{{t}}</code></td> <td><code>t</code></td></tr>
</table>

<p>
Here is an example:
</p>

<sample>{{ON}}
# let f (x : {{ Int }}) = {{ x + 1 }} in List.map f {: [ 1 2 3 ] :};;
- : {{Int}} list = [{{2}}; {{3}}; {{4}}]
</sample>

<p>
In this example, the OCaml type of the translated list is inferred
to be <code>{{ON}}{{ Int }} list</code> (because the type of
<code>f</code> is given). The corresponding x-type
is <code>{{ON}}{{ [Int*] }}</code>.
</p>
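
<p>
To make the rules for sum types and lists more concrete, here is a toy
model in plain OCaml. The <code>xvalue</code> and <code>shape</code>
types are hypothetical and only mimic the shape of x-values; OCamlDuce's
real representation and translation functions are internal and are not
shown here. Following the rules above, <code>Point</code> becomes the
atom <code>`Point</code> and <code>Rect (w, h)</code> becomes an element
of x-type <code>&lt;Rect>[ Int Int ]</code>.
</p>

<sample><![CDATA[
(* Hypothetical model of a few kinds of x-values. *)
type xvalue =
  | XInt of int                    (* an integer, of x-type Int *)
  | XAtom of string                (* an atom such as `A *)
  | XElem of string * xvalue list  (* an element <A>[ ... ] *)
  | XSeq of xvalue list            (* a sequence [ ... ] *)

(* An example sum type: one constant, one non-constant constructor. *)
type shape =
  | Point
  | Rect of int * int

(* T(shape) would be  `Point | <Rect>[ Int Int ] *)
let shape_to_x = function
  | Point -> XAtom "Point"
  | Rect (w, h) -> XElem ("Rect", [ XInt w; XInt h ])

(* A list becomes a sequence of translated elements. *)
let list_to_x f l = XSeq (List.map f l)

(* The reverse direction, for the same types. *)
let shape_of_x = function
  | XAtom "Point" -> Point
  | XElem ("Rect", [ XInt w; XInt h ]) -> Rect (w, h)
  | _ -> invalid_arg "shape_of_x"
]]></sample>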

</box>

<box title="The standard library" link="stdlib">

<p>
In OCamlDuce, the Num library from OCaml is included in the standard
library. In addition, the standard library contains two new modules,
<code>Ocamlduce</code> and <code>Cduce_types</code>.
</p>

<p>
The module <code>Cduce_types</code> gives access to the internal
representation of x-values. It is currently undocumented.
</p>

<p>
The module <code>Ocamlduce</code> provides several useful
functions on x-values. See the <a href="http://yquem.inria.fr/~frisch/ocamlcduce/doc/ocamlduce/Ocamlduce.html">ocamldoc</a>-generated
documentation for a description of its interface.
</p>

</box>

<box title="Marshaling" link="marshal">

<p>
OCamlDuce uses some tricks in its internal representation of x-values
to reduce memory usage and improve performance. You need to pay
special attention if you want to use OCaml serialization functions
(module <code>Marshal</code>, functions
<code>input_value/output_value</code>) on x-values. In addition to
your values, you also need to save and restore some internal data
using the functions <code>Cduce_types.Value.extract_all</code> and
<code>Cduce_types.Value.intract_all</code>. This also applies
when the value to be serialized contains deeply nested x-values.
</p>

<p>
Here are generic serialization/deserialization functions that
illustrate how to do it:
</p>

<sample>
(* Save the internal data together with the value. *)
let my_output_value oc v =
  let p = Cduce_types.Value.extract_all () in
  output_value oc (p,v)

(* Restore the internal data before returning the value. *)
let my_input_value ic =
  let (p,v) = input_value ic in
  Cduce_types.Value.intract_all p;
  v
</sample>
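
<p>
The reason for this pattern can be reproduced in plain OCaml. In the
sketch below, a hypothetical global table interns names, and values only
store indices into it, as a compact internal representation might.
The <code>extract_all</code> and <code>intract_all</code> functions here
are stand-ins with the same roles as the <code>Cduce_types.Value</code>
functions, not their actual implementation: marshaling the indices alone
would be meaningless in a fresh process whose table is empty.
</p>

<sample><![CDATA[
(* Hypothetical global table: index i stands for the name (!names).(i). *)
let names : string array ref = ref [||]

(* Return the index of s, adding s to the table if needed. *)
let intern s =
  let rec find i =
    if i >= Array.length !names then None
    else if (!names).(i) = s then Some i
    else find (i + 1)
  in
  match find 0 with
  | Some i -> i
  | None ->
      names := Array.append !names [| s |];
      Array.length !names - 1

let name_of i = (!names).(i)

(* Stand-ins for the functions that save and restore internal state. *)
let extract_all () = Array.copy !names
let intract_all p = names := p

(* Same shape as my_output_value and my_input_value, on strings. *)
let my_to_string v = Marshal.to_string (extract_all (), v) []

let my_of_string s =
  let (p, v) = (Marshal.from_string s 0 : string array * 'a) in
  intract_all p;
  v
]]></sample>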

</box>

<box title="Performance" link="perf">

<section title="Strings">

<p>
OCaml users might be surprised that x-strings are simply
represented as sequences in OCamlDuce. Does this mean that they are
actually stored in memory as linked lists? Certainly not! The internal
representation of sequence values uses several tricks to improve
performance and memory usage. In particular, a special form in the
representation can store strings as byte buffers, as in OCaml.
If an XML document is loaded, or if an OCaml string is converted
to an x-value, this compact representation is used.
</p>
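
<p>
A minimal sketch of the idea, assuming a hypothetical sequence type in
plain OCaml (this is not OCamlDuce's actual representation): besides
ordinary cons cells, a special constructor stores a run of characters as
a slice of an OCaml string, which is unfolded one character at a time
only when the sequence is inspected.
</p>

<sample><![CDATA[
(* Hypothetical character sequences with a compact string form. *)
type seq =
  | Nil
  | Cons of char * seq
  | Str of string * int * seq  (* characters from index i of s, then the tail *)

(* Build the compact form from an OCaml string: no per-character allocation. *)
let of_string s = Str (s, 0, Nil)

(* Look at the first element, unfolding at most one character. *)
let rec uncons = function
  | Nil -> None
  | Cons (c, t) -> Some (c, t)
  | Str (s, i, t) ->
      if i >= String.length s then uncons t
      else Some (s.[i], Str (s, i + 1, t))

(* Convert back to a list of characters, for inspection. *)
let to_list q =
  let rec go acc q =
    match uncons q with
    | None -> List.rev acc
    | Some (c, t) -> go (c :: acc) t
  in
  go [] q
]]></sample>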

</section>

<section title="Concatenation">

<p>
Similarly, OCaml users might be reluctant to use the concatenation
operator <code>@</code> on sequences. In OCaml, the cost of this
operator is linear in the length of its first argument (which
needs to be copied). OCamlDuce uses a special form in its internal
representation to store concatenations lazily: a concatenation is
actually computed only when the resulting value is accessed. This means
that it is perfectly fine to build a long sequence by adding
new elements at the end one by one, as long as you do not
inspect the sequence at the same time.
</p>
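
<p>
The following plain OCaml sketch illustrates the idea with a hypothetical
<code>Concat</code> node (OCamlDuce's internal representation may
differ): appending only allocates one node, and all pending
concatenations are flattened in a single tail-recursive pass the first
time the elements are needed.
</p>

<sample><![CDATA[
(* Hypothetical sequences with a lazy concatenation node. *)
type 'a seq =
  | Nil
  | Cons of 'a * 'a seq
  | Concat of 'a seq * 'a seq

(* Constant time: nothing is copied, whatever the sizes of a and b. *)
let append a b = Concat (a, b)

(* Force all pending concatenations, producing a list. The explicit
   stack of right-hand sides keeps every call a tail call, even for
   deeply nested Concat nodes. *)
let to_list s =
  let rec go acc stack = function
    | Nil -> (
        match stack with
        | [] -> List.rev acc
        | next :: rest -> go acc rest next)
    | Cons (x, t) -> go (x :: acc) stack t
    | Concat (a, b) -> go acc (b :: stack) a
  in
  go [] [] s
]]></sample>

<p>
With this representation, building a sequence of n elements by appending
at the end costs time linear in n overall, whereas repeated list
<code>@</code> would be quadratic.
</p>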

</section>

<section title="Pattern matching">

<p>
Another point worth knowing when programming in OCamlDuce
is that patterns can be written in a declarative style without
affecting performance. The compiler uses static type information
about matched values to produce efficient code for pattern matching.
To illustrate this, consider the following sample:
</p>

<sample><![CDATA[{{ON}}
x.ml:

type a = {{ <a>[ a* ] }}
type b = {{ <b>[ b* ] }}

let f : {{ a|b }} -> int = function {{ a }} -> 0 | {{ _ }} -> 1
]]></sample>

<sample><![CDATA[{{ON}}
y.ml:

type a = {{ <a>[ a* ] }}
type b = {{ <b>[ b* ] }}

let f : {{ a|b }} -> int = function {{ <a>_ }} -> 0 | {{ _ }} -> 1
]]></sample>

<p>
The two functions have exactly the same semantics, but the first
implementation is more declarative: it uses type checks to distinguish
between <code>a</code> and <code>b</code> instead of saying
<em>how</em> to distinguish between these two types. Imagine
that the definitions of these types change to:
</p>

<sample><![CDATA[{{ON}}
type a = {{ <x kind="a">[ a* ] }}
type b = {{ <x kind="b">[ b* ] }}
]]></sample>

<p>
Then the first implementation still works as expected, but the
second one needs to be rewritten.</p>

<p>Now one might believe that the second implementation is more
efficient because it tells the compiler to check only the root tag,
whereas the first implementation would force
the compiler to produce code that checks that all tags in the tree
are <code>a</code>s. But this is not what happens! In fact,
you can check that the compiler produces exactly the same code
for both implementations. It takes into account the static type
information about the argument of the pattern matching (here, the
input type of the function), and computes an efficient way to evaluate
patterns on values of this type. Here, a value of type
<code>a|b</code> whose root tag is <code>a</code> is necessarily
of type <code>a</code>, so checking the root tag is enough.
</p>
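
<p>
Why the shortcut is correct can be checked on a toy model in plain
OCaml, using a hypothetical <code>tree</code> type (this is not how the
compiler represents values or generates code). Deciding whether an
arbitrary tree has type <code>a</code> requires a full traversal, but the
two tests below can only disagree on trees that mix <code>a</code> and
<code>b</code> tags, which the static input type <code>a|b</code> rules out.
</p>

<sample><![CDATA[
(* Hypothetical model of the trees of the example: a tag and children. *)
type tree = Node of string * tree list

(* Declarative test: is every tag in the tree equal to t?
   Checking that an arbitrary tree has type a needs this full traversal. *)
let rec all_tags t (Node (tag, kids)) =
  tag = t && List.for_all (all_tags t) kids

let is_a_full v = all_tags "a" v

(* Operational test: look at the root tag only. *)
let is_a_root (Node (tag, _)) = tag = "a"
]]></sample>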

</section>

<section title="The map iterator">

<p>
The <code>map ... with ...</code> iterator is implemented
tail-recursively, so you can safely use it on very long sequences.
</p>
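
<p>
For comparison, the usual way to write a tail-recursive map over OCaml
lists is to accumulate the results in reverse order and reverse them at
the end, so that the call stack does not grow with the length of the
list. This is a plain OCaml sketch of that technique, not the actual
implementation of <code>map ... with ...</code>.
</p>

<sample><![CDATA[
(* Tail-recursive map: constant stack depth, two passes over the list.
   f is applied to the elements from left to right. *)
let map_tr f l =
  let rec go acc = function
    | [] -> List.rev acc
    | x :: rest -> go (f x :: acc) rest
  in
  go [] l
]]></sample>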

</section>

</box>

<box title="Code samples" link="code">

<section title="Parsing XML files">

<p>
OCamlDuce does not come with any built-in XML parser. However,
the <a href="http://yquem.inria.fr/~frisch/ocamlcduce/doc/ocamlduce/Ocamlduce.Load.html"><code>Ocamlduce.Load</code></a> module in the standard library
makes it easy to plug in existing XML parsers. Here is some
code which demonstrates how to do that with three of
the most popular OCaml XML parser libraries:
</p>

<ul>
<li><a
href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/pxp/">PXP</a></li>
<li><a
href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/expat/">Expat</a></li>
<li><a href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/xmllight/">Xml-light</a></li>
</ul>

</section>

<section title="Converting DTD to OCamlDuce types">

<p>
This <a href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/dtd2types/">tool</a> produces a set of OCamlDuce type declarations
from a DTD. It requires PXP.
</p>

<note>This application does not use any of the new features, but it
can be useful in the development of OCamlDuce applications.
</note>

</section>

<section title="Parsing XML Schema, producing valid XHTML output">

<p>
This <a
href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/schema/">application</a>
parses XML Schema Definitions (.xsd files) and produces summaries
(toplevel declaration names) in XHTML. The OCamlDuce type system ensures
that the parser is coherent with the input XML type (any valid XML
Schema is accepted) and that the printer is coherent with the output
XML type (its output is necessarily a valid XHTML document).
</p>

<p>
Of course, for such a simple transformation, parsing the XML document
into an internal representation is not necessary. A direct XML-to-XML
transformation would be easy to write. The point of this example is to
illustrate a complex XML parsing task.
</p>

<p>
It is interesting to introduce errors in the parser
<code>schema_loader.ml</code> or the printer
<code>dump_schema.ml</code> and see how the type system catches them.
</p>

<note>
The application uses XML Light to parse XML documents.
</note>

<note>
Some features of XML Schema are not parsed, such as
<code>redefine</code> elements or substitution groups.
</note>

</section>

</box>

</page>
