Contents of /web/ocaml.xml

Revision 1804 - (show annotations)
Tue Jul 10 19:23:57 2007 UTC (5 years, 10 months ago) by abate
File MIME type: text/xml
File size: 41975 byte(s)
[r2005-08-30 13:41:13 by afrisch] Empty log message

Original author: afrisch
Date: 2005-08-30 13:41:13+00:00
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<page name="ocaml">

<title>OCamlDuce</title>

<left>
<local-links href="index,documentation"/>
<p>On this page:</p>
<boxes-toc/>
</left>

<box>

<p>
OCamlDuce is a merger between <a
href="http://caml.inria.fr/">OCaml</a> and
<local href="index">CDuce</local>. It comes as a modified
version of OCaml which integrates CDuce features: expressions, types,
patterns.
</p>

<p>
OCamlDuce is distributed under the same licenses as Objective Caml:
the Q Public License version 1.0 for the Compiler, and the LGPL
version 2 for the Library. The extension has been written by Alain
Frisch. Parts of the CDuce implementation, by the same author, have
been reused.
</p>

</box>

<box title="Download and installation" link="install">

<p>
Currently, OCamlDuce
is based on OCaml 3.08.4 and on a CVS snapshot
of CDuce (between 0.3.92 and the head).
</p>

<ul>
<li><a
href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/ocamlduce-3.08.4pl2.tar.gz">Compiler,
version 3.08.4, patch level 2</a></li>
</ul>

<p>
There are two different installation modes:
</p>

<ul>
<li><b>Stand-alone mode</b>. OCamlDuce is used as a drop-in
replacement for OCaml. The build procedure is unchanged:
<tt>./configure &amp;&amp; make world &amp;&amp; make install</tt>.
The tools are named <tt>ocaml, ocamlc, ocamlopt</tt>, ...
The standard library is extended with the <tt>num</tt> library
and the <tt>Ocamlduce</tt> module.
</li>

<li><b>Package mode</b>. OCamlDuce is installed on top of an existing
OCaml installation (whose version number must match), without touching
it. The build
procedure is: <tt>./configure &amp;&amp; make all &amp;&amp; make opt
&amp;&amp; make install</tt>. The <tt>configure</tt> script should be called with
the same arguments as the ones used when you built OCaml. For instance,
the <tt>LIBDIR</tt> argument is used to find the OCaml standard library.
The tool names are changed to <tt>ocamlduce, ocamlducec,
ocamlduceopt</tt>, ... They use the existing standard library.
In addition, a library <tt>ocamlduce.cma</tt> is built.
It depends on the <tt>nums.cma</tt> library. The <tt>install</tt>
target implements a <tt>Findlib</tt>-based installation. It registers
a package named <tt>ocamlduce</tt> and it puts the tools
in the package sub-directory (the <tt>BINDIR</tt> and <tt>LIBDIR</tt>
arguments to <tt>configure</tt> are not used). The toplevel
can be started with <tt>ocamlfind ocamlduce/ocamlduce -I `ocamlfind query ocamlduce`</tt>.
</li>
</ul>

<p>
GODI users can choose either of these two modes.
In order to upgrade an existing installation so as to use
OCamlDuce in place of OCaml, they must add this
line to their <tt>etc/godi.conf</tt> file:
</p>
<sample>
GODI_BUILD_SITES += http://pauillac.inria.fr/~frisch/ocamlcduce/godi
</sample>
<p>
and force a recompilation of the <tt>godi-ocaml-src</tt>
and <tt>godi-ocaml</tt> packages. The alternative is to install OCamlDuce
as a GODI package over an existing installation. In that case, you do not need
to touch the <tt>etc/godi.conf</tt> file. The package
name is <tt>godi-ocamlduce</tt>. In order to use the new compilers
and tools, you can make the environment variable
<tt>OCAMLFIND_CONF</tt> point to the
<tt>$GODI/etc/findlib-ocamlduce.conf</tt> file and then
use, e.g., <tt>ocamlfind ocamlc -package ocamlduce</tt>.
</p>

</box>

<box title="Overview" link="overview">

<p>
The goal of the OCamlDuce project is to extend the OCaml language with features
to make it easier to write safe and efficient complex applications
that need to deal with XML documents. In particular, it relies
on a notion of types and patterns to guarantee statically
that all the possible input documents are correctly processed, and
that only valid output documents are produced.
</p>

<p>
In a nutshell, OCamlDuce extends OCaml with a new kind of values
(<em>x-values</em>) to represent XML documents, fragments, tags, Unicode
strings. In order to describe these values, it also extends the type algebra
with so-called <em>x-types</em>. The philosophy behind these types is that they
represent <em>sets of x-values</em>. They can be very precise: indeed,
each value can be seen as a singleton type (a set with a single
value), and it is possible to form Boolean combinations of x-types
(intersection, union, difference).
</p>

<p>
OCamlDuce's type system can be understood as a refinement of OCaml's.
For each sub-expression which is inferred to be of the x-kind (using
OCaml's unification-based type system), OCamlDuce will try to infer the
best possible sound x-type. Here, best means smallest for the natural
subtyping relation (set inclusion). The inference algorithm is
actually a data-flow analysis: the x-type will collect all the values
that can be produced by the expression, considering all the possible
data flows in the program. It is sometimes necessary to provide
explicit type annotations to help the type checker infer this type, in
particular when you define recursive functions or when you use
iterators.
</p>

<p>
Subtyping is implicit for x-types: if an expression is inferred to be
of x-type <code>t</code>, which is a subtype of <code>s</code>, then
it is possible to use this expression in any context which expects a
value of type <code>s</code>.
</p>

</box>

<box title="Getting started" link="start">

<p>
Most of the new language features are enclosed within double curly braces
<code>{{ON}}{{...}}</code>. For instance, the following code sample
defines a value <code>x</code> as an XML element (with tag
<code>a</code>, an attribute <code>href</code>, and a simple
string as content):
</p>

<sample><![CDATA[{{ON}}
# let x = {{ <a href="http://www.cduce.org">['CDuce'] }};;
val x : {{<a href=[ 'http://www.cduce.org' ]>[ 'CDuce' ]}} =
{{<a href="http://www.cduce.org">[ 'CDuce' ]}}
]]></sample>

<p>
What appears between the curly braces is called an x-expression.
Similarly, there are x-types (as seen above), and also x-patterns.
The delimiters <code>{{ON}}{{...}}</code> are only used
for syntactical reasons, to avoid clashes between OCaml and CDuce
syntaxes and lexical conventions. As a matter of fact,
an OCaml expression need not be a syntactical x-expression
(delimited by double curly braces) to evaluate to an x-value.
For instance, once <code>x</code> has been declared as above,
the expression <code>x</code> evaluates to an x-value.
</p>


<p>
It is possible to use an arbitrary
OCaml expression as part of an x-expression: it must simply be
protected by a new pair of double curly braces. For instance, there is
no <code>if-then-else</code> construction for x-expressions, but you
can write:
</p>

<sample><![CDATA[{{ON}}
# {{ <a href={{if true then {{"a"}} else {{"z"}}}}>[] }};;
- : {{<a href=[ 'a' | 'z' ]>[ ]}} = {{<a href="a">[ ]}}
]]></sample>

<p>
Only the highlighted parts are parsed as x-expressions. The
<code>if-then-else</code> sub-expression is parsed as an OCaml
expression, but its type is an x-type (namely <code>{{ON}}{{[ 'a' |
'z' ]}}</code>).
</p>

</box>

<box title="X-values" link="values">

<p>
X-values are intended to represent XML documents and fragments
thereof: elements, tags, text, sequences. In this section, we
present the x-value algebra, the syntax of the corresponding
x-expression constructors and the associated x-types.
</p>

<p>
There are three kinds of atomic x-values:
</p>
<ul>
<li>Unicode characters;</li>
<li>qualified names;</li>
<li>arbitrarily large integers.</li>
</ul>

<section title="Characters">

<p>
X-characters are different from OCaml characters. They can represent
the range of Unicode codepoints defined in the XML specification.
Character literals are delimited by single quotes. The escape
sequences \n, \r, \t, \b, \', \&quot;, \\ are recognized as usual.
Numerical escape sequences are written <code>\n;</code> where n is an integer
literal (note the extra semi-colon). The source code is interpreted as
being encoded in ISO-8859-1. As a consequence, Unicode characters which are not
part of the Latin1 character set must be introduced with this
numerical escape mechanism. The x-types for x-characters are:
</p>
<ul>
<li>singletons;</li>
<li>intervals, written <code>c -- d</code>, where <code>c</code> and
<code>d</code> are literals (example: <code>{{ON}}type t = {{ 'a'--'z'
}}</code>);</li>
<li>the type of all x-characters, written <code>Char</code>;</li>
<li>the type of all Latin1 characters, written <code>Latin1Char</code>
(defined as <code>\0; -- \255;</code>).</li>
</ul>

</section>

<section title="Integers">

<p>
X-integers are arbitrarily large. Literals must be written in decimal.
Negative literals must be in parentheses, e.g.: <code>(-3)</code>.
The x-types for x-integers are:
</p>
<ul>
<li>singletons;</li>
<li>intervals, written <code>i -- j</code>, where <code>i</code> and
<code>j</code> are literals (example: <code>{{ON}}type t = {{ 10--20
}}</code>); it is possible to replace <code>i</code> or <code>j</code>
with <code>**</code> to define open-ended intervals, e.g.
<code>{{ON}}type pos = {{ 1 -- ** }}</code>;
</li>
<li>the type of all x-integers, written <code>Int</code>;</li>
<li>the type of all the integers which can be represented by a
signed 32 (resp. 64) bit machine word, written <code>Int32</code> (resp.
<code>Int64</code>).</li>
</ul>

</section>

<section title="Qualified names">

<p>
Qualified names are intended to represent XML tag names. Conceptually,
they are made of a namespace URI and a local name. Since URIs tend
to be long, literals are of the form <code>`prefix:local</code>
where <code>local</code> is the local name and <code>prefix</code>
is a <em>namespace prefix</em> bound to some URI (in the scope of the
literal). The local name follows the definitions from
the XML Namespaces specification; a dot character must be protected
by a backslash and non-Latin1 characters are written as character
literals <code>\n;</code>. <a href="#ns">See below</a> for an
explanation of how to bind prefixes to URIs. To refer
to the default namespace (or the absence of namespace if no default
has been defined), the syntax is simply <code>`local</code>.
The x-types for qualified names are:
</p>
<ul>
<li>singletons;</li>
<li>the type of all qualified names, written <code>Atom</code>;</li>
<li>the type of all qualified names from a specified namespace,
written <code>`ns:*</code>.</li>
</ul>
</section>

<section title="Records">

<p>
X-records are mainly used to represent the set of attributes of an XML
element. An x-record is a binding from a finite set of <em>labels</em>
to x-values. Labels follow the same syntax as qualified names,
without the leading backquote. However, if the namespace prefix is not
given, the default namespace does not apply (the namespace URI is
empty). The syntax for record x-expressions is <code> { l1=e1
... ln=en }</code> where the <code>li</code> are labels and the
<code>ei</code> are x-expressions. Fields can also be separated with a
semi-colon. It is legal to omit the expression for a field; the label is then
taken as the content of the field (a value with this name must be
defined in the current scope), e.g.: <code>{{ON}}let x = ... and y = ...
in {{ {x y z=3} }}</code> is equivalent to <code>{{ON}}let x = ... and
y = ... in {{ {x=x y=y z=3} }}</code>. The types for x-records specify
which labels are authorized/mandatory, and what the types of the
corresponding fields are. There are two kinds of record x-types:
</p>

<ul>
<li>
Closed record types, which only allow the listed fields:
<code>{ l1=t1 ... ln=tn }</code>;
</li>
<li>
Open record types, which allow additional fields (with arbitrary
type):
<code>{ l1=t1 ... ln=tn .. }</code> (the final two dots are
part of the syntax).
</li>
</ul>

<p>
In both cases, it is possible to make one of
the fields optional by changing = to =?.
</p>

<p>
The x-type of all x-records is thus <code>{ .. }</code>,
and the x-type of x-records with maybe a field <code>l</code>
of type <code>Int</code> and maybe arbitrary other fields is
<code>{ l=?Int .. }</code>.
</p>
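
<p>
As an illustration of closed and open record x-types, here is a
minimal, untested sketch. The names <code>point</code>,
<code>labelled</code>, <code>p</code> and <code>q</code> are
hypothetical and chosen only for this example:
</p>

<sample><![CDATA[{{ON}}
(* Closed record type: exactly the fields x and y. *)
type point = {{ { x=Int y=Int } }}
(* Open record type: x and y are mandatory, label is optional,
   and any other field is allowed. *)
type labelled = {{ { x=Int y=Int label=?String .. } }}
let p : point = {{ { x=1 y=2 } }}
let q : labelled = {{ { x=1 y=2 color="red" } }}
]]></sample>

<p>
Following the rules above, the value bound to <code>q</code> should not
be accepted at type <code>point</code>, since the closed type does not
allow the extra field <code>color</code>.
</p>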

</section>

<section title="Sequences">

<p>
X-sequences are finite and ordered collections of x-values.
The syntax for a sequence x-expression is
<code>[ e1 ... en ]</code> (note that elements are <em>not</em> separated
by semi-colons as in OCaml lists). Each item <code>ei</code>
can be one of:
</p>
<ul>
<li>an x-expression;</li>
<li><code>!e</code> where <code>e</code> is an x-expression which
evaluates to a sequence (whose content is inserted in the sequence
which is currently defined); e.g.
<code>let x = [ 2 3 ] in [ 1 !x 4 ]</code> is equivalent to
<code>[ 1 2 3 4 ]</code>;</li>
<li>a string literal delimited by single quotes; e.g.
<code>[ 'abc' ]</code> is equivalent to <code>[ 'a' 'b' 'c' ]</code>.</li>
</ul>

<p>
X-types for sequences are of the form <code>[R]</code>
where <code>R</code> is a regular expression over x-types which
describes the possible contents of the sequences. The possible
forms of regular expressions are:
</p>

<ul>
<li><code>t</code> (one single element of x-type <code>t</code>)</li>
<li><code>R*</code> (zero or more repetitions)</li>
<li><code>R+</code> (one or more repetitions)</li>
<li><code>R?</code> (zero or one repetition)</li>
<li><code>R1 R2</code> (sequence)</li>
<li><code>R1|R2</code> (alternation)</li>
<li><code>(R)</code></li>
<li><code>/t</code> (guard: the tail of the sequence must comply with
<code>t</code>).</li>
<li><code>PCDATA</code> (equivalent to Char*).</li>
</ul>
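
<p>
As a small illustration of these regular expressions, the following
untested sketch declares two sequence x-types and one value of each.
The names <code>ints</code>, <code>header</code>, <code>v</code> and
<code>h</code> are hypothetical:
</p>

<sample><![CDATA[{{ON}}
(* Any number of integers, possibly none. *)
type ints = {{ [ Int* ] }}
(* Two integers followed by any number of characters. *)
type header = {{ [ Int Int Char* ] }}
let v : ints = {{ [ 1 2 3 ] }}
let h : header = {{ [ 10 20 'abc' ] }}
]]></sample>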

<note>Sequences are actually encoded with embedded pairs and a
terminator, and sequence types are encoded with product types and
recursive types. The encoding is available to the programmer
but not described in this manual.
</note>

</section>

<section title="Strings">

<p>
Strings are nothing but sequences of characters. There are two
predefined types <code>String</code> and <code>Latin1</code>
(defined as <code>[ Char* ]</code> and <code>[ Latin1Char* ]</code>).
</p>

<p>
A string literal <code>[ '...' ]</code> can also be written
<code>"..." </code> (without the square brackets). Note that single
(resp. double) quotes need to be escaped only when the string is
delimited with single (resp. double) quotes.
</p>

</section>

<section title="XML elements">

<p>
An XML element is a triple of x-values. The syntax for
the corresponding x-expression constructor is
<code><![CDATA[<(e1) (e2)>e3]]></code>. When <code>e1</code> is a
qualified name literal, it is possible to omit the leading
backquote and the surrounding parentheses. Similarly,
when <code>e2</code> is an x-record literal, it is possible
to omit the curly braces and the parentheses. For instance,
one can simply write <code><![CDATA[<a href="abc">['def']]]></code>
instead of <code><![CDATA[<(`a) ({href="abc"})>['def']]]></code>.
</p>

<p>
XML element x-types are written <code><![CDATA[<(t1) (t2)>t3]]></code>,
and the same simplifications apply. For instance, if
the namespace prefix <code>ns</code> has been defined,
the following is a legal x-type: <code><![CDATA[<ns:* ..>[]]]></code>;
it describes XML elements whose tag is in the namespace bound to
<code>ns</code>, with an empty content, and with an arbitrary set of
attributes. An underscore in place of <code>(t1)</code> is
equivalent to <code>(Atom)</code> (any tag).
</p>
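
<p>
To make the correspondence between element x-types and element
x-expressions concrete, here is a minimal, untested sketch. The names
<code>link</code> and <code>home</code> are hypothetical:
</p>

<sample><![CDATA[{{ON}}
(* Elements with tag a, exactly one attribute href, and text content. *)
type link = {{ <a href=String>[ PCDATA ] }}
let home : link = {{ <a href="http://www.cduce.org">[ 'CDuce' ] }}
]]></sample>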

</section>

</box>

<box title="X-expressions" link="expr">

<p>
In the previous section, we have seen the syntax for x-value
constructors (constant literals, sequence, record, element constructors).
In this section, we describe the other kinds of x-expressions.
</p>

<section title="Binary infix operators">

<p>
The arithmetic operators on integers follow the usual precedence.
They are written <code>+,*,-,div,mod</code> (they are all infix).
</p>

<p>
Record concatenation: <code>e1 ++ e2</code>. The x-expressions
<code>e1</code> and <code>e2</code> must evaluate to x-records.
The result is obtained by concatenating them. If a field with the same
label is present in both records, the right-most one is selected.
</p>

<p>
Sequence concatenation: <code>e1 @ e2</code>, equivalent
to <code>[!e1 !e2]</code>.
</p>

</section>

<section title="Projections, filtering">

<p>
If the x-expression <code>e</code> evaluates to a record or an XML
element, the construction <code>e.l</code> will extract the value of
field or attribute <code>l</code>. Similarly, the construction
<code>e.?l</code> will extract the value of field or attribute
<code>l</code> if present, and return the empty sequence
<code>[]</code> otherwise.
</p>

<p>
If the x-expression <code>e</code> evaluates to a record,
the construction <code>e -. l</code> will produce a new record
where the field <code>l</code> has been removed (if present).
</p>
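
<p>
The three record operations described so far can be sketched as
follows. This is an untested illustration; the names <code>r</code>,
<code>a</code>, <code>b</code> and <code>s</code> are hypothetical:
</p>

<sample><![CDATA[{{ON}}
let r = {{ { x=1 y=2 } }}
(* Value of the field x. *)
let a = {{ r.x }}
(* The field z is absent, so the empty sequence is expected here. *)
let b = {{ r.?z }}
(* The same record without the field y. *)
let s = {{ r -. y }}
]]></sample>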

<p>
If the x-expression <code>e</code> evaluates to an x-sequence,
the construction <code>e/</code> will result in a new x-sequence
obtained by taking in order all the children of the XML elements
from the sequence <code>e</code>. For instance, the x-expression
<code><![CDATA[[<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ]/]]></code>
evaluates to the x-value <code>[ 1 2 3 6 7 8 ]</code>.
</p>

<p>
If the x-expression <code>e</code> evaluates to an x-sequence,
the construction <code>e.(t)</code> (where <code>t</code> is an
x-type) will result in a new x-sequence
obtained by filtering <code>e</code> to keep only the elements
of type <code>t</code>. For instance, the x-expression
<code><![CDATA[[<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ].(Int)]]></code>
evaluates to the x-value <code>[ 4 5 ]</code>.
</p>
</section>

<section title="Dynamic type checking">

<p>
If <code>e</code> is an x-expression and <code>t</code> is an x-type,
the construction <code>(e :? t)</code> returns the same
result as <code>e</code> if it has type <code>t</code>, and otherwise
raises a <code>Failure</code> exception whose argument explains
why this is not the case.
</p>

<sample><![CDATA[{{ON}}
# let f (x : {{ Any }}) = {{ (x :? <a>[ Int* ] ) }} in
f {{ <a>[ 1 2 '3' ] }};;
Exception:
Failure
"Value <a>[ 1 2 '3' ] does not match type <a>[ Int* ]\nValue '3' does not match type Int\n".
]]></sample>
</section>

<section title="Pattern matching">

<p>
OCamlDuce comes with a powerful pattern matching operation.
X-patterns are described <a href="#patterns">below</a>.
The syntax for the pattern matching operation is:
<code>match e with p1 -> e1 | ... | pn -> en</code>.
The type system ensures exhaustivity for the pattern matching
and infers precise types for the capture variables.
It is also possible to use x-pattern matching as a regular
OCaml expression; x-patterns must then be surrounded by {{..}}, e.g.:
<code>match e with {{p1}} -> e1 | ... | {{pn}} -> en</code> or
<code>function {{p1}} -> e1 | ... | {{pn}} -> en</code>.
</p>

<p>
Pattern matching follows a first-match policy. The first pattern
that succeeds triggers the corresponding branch.
</p>
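
<p>
The first-match policy can be illustrated with the following untested
sketch, which uses x-patterns inside a regular OCaml
<code>match</code>. The function name <code>describe</code> and its
argument type are hypothetical:
</p>

<sample><![CDATA[{{ON}}
(* The singleton pattern 0 is tried before the more general Int,
   so the second branch only sees non-zero integers. *)
let describe (v : {{ Int | String }}) : string =
  match v with
    {{ 0 }} -> "zero"
  | {{ Int }} -> "some other integer"
  | {{ String }} -> "a string"
]]></sample>

<p>
Swapping the first two branches would make the branch for
<code>0</code> unreachable, since <code>Int</code> already matches
every integer.
</p>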

<note>
Currently, it is impossible to mix normal OCaml patterns and x-patterns
in a single pattern matching.
</note>

</section>

<section title="Local binding">

<p>
The x-expression <code>let p=e1 in e2</code> is equivalent to
<code>match e1 with p -> e2</code>. There is also a local binding
with an x-pattern in OCaml expressions: <code>let {{p}}=e1 in
e2</code>.
</p>

</section>


<section title="Iterators">

<p>
OCamlDuce comes with a sequence iterator
<code>map e with p1 -> e1 | ... | pn -> en</code> and
a tree iterator
<code>map* e with p1 -> e1 | ... | pn -> en</code>.
</p>

<p>
For both constructions, the argument must evaluate to a sequence.
The <code>map</code> iterator applies the patterns to each element
of this sequence in turn and produces a new sequence by concatenating
all the results (all the right-hand sides must thus produce a
sequence). The set of patterns must be exhaustive for all the possible
elements of the input sequence.
</p>

<p>
The tree iterator is similar except that the patterns need not be
exhaustive. If some element of the input sequence is not matched,
it is simply copied into the result unless it is an XML element. In
this case, the transformation is applied recursively to its content.
</p>
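
<p>
As an untested sketch of the sequence iterator, the following function
wraps every integer of a sequence in an element and drops the other
items. The function name <code>wrap_ints</code>, the tag
<code>item</code> and the type annotations are hypothetical; the
annotations are given explicitly because iterators may require them:
</p>

<sample><![CDATA[{{ON}}
(* Each right-hand side produces a sequence; the results are
   concatenated. The wildcard branch makes the patterns exhaustive. *)
let wrap_ints (l : {{ [ (Int | Char)* ] }}) : {{ [ <item>[ Int ]* ] }} =
  {{ map l with (x & Int) -> [ <item>[ x ] ] | _ -> [] }}
]]></sample>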

</section>

<section title="OCaml constructions">

<p>
As a convenience, some of the OCaml expression constructors
are allowed as x-expressions (without a need to go back to OCaml
with double curly braces): (unqualified) value identifiers and
function calls.
</p>

</section>

</box>

<box title="More on x-types" link="types">

<p>
We have seen how to write simple x-types. We can then combine
them with Boolean connectives:
</p>

<ul>
<li><code>t1 &amp; t2</code>: intersection;</li>
<li><code>t1 | t2</code>: union;</li>
<li><code>t1 - t2</code>: difference.</li>
</ul>

<p>
The empty x-type is written <code>Empty</code> (it contains no value),
and the universal x-type is written <code>Any</code> (it contains
all the x-values) or <code>_</code>.
</p>

<p>
When an x-type has been bound to some OCaml identifier
(<code>{{ON}}type t = {{...}}</code>), it is possible to use
this identifier in another x-type. Recursive definitions
are allowed:
</p>

<sample><![CDATA[{{ON}}
type t1 = {{ <a>[ t2* ] }}
and t2 = {{ <b>[ t1* ] }}
]]></sample>

<p>
Note that x-values are always finite and acyclic. The type checker
detects type definitions which would yield empty types:
</p>

<sample><![CDATA[{{ON}}
# type t = {{ <a>[ t+ ] }};;
This definition yields an empty type
]]></sample>

<p>
If <code>t1</code> and <code>t2</code> are record x-types,
we can combine them with the infix <code>++</code> operator, which
mimics the corresponding operator on expressions (record
concatenation). Similarly, we can use the infix <code>@</code>
concatenation operator on sequence x-types.
</p>

</box>

<box title="X-patterns" link="patterns">

<p>
X-patterns follow the same syntax as x-types. In particular,
any x-type is a valid x-pattern. In addition to x-type constructors,
x-patterns can have:
</p>

<ul>
<li>capture variables (lowercase OCaml identifiers);</li>
<li>constant bindings <code>(x := c)</code> where x is a capture
variable and c is
a literal x-constant (this pattern always succeeds and returns the
binding x->c).</li>
</ul>

<p>
Here is a brief description of the semantics of patterns. Given
an input value, a pattern can either succeed or fail. If it succeeds,
it also produces a binding from the capture variables in the pattern
to x-values.
</p>

<ul>

<li>A pattern which is just a type (no capture variable) succeeds if
and only if the value has the type.</li>

<li>A pattern <code>p1 | p2</code> succeeds if either <code>p1</code>
or <code>p2</code> succeeds, and returns the corresponding binding; if
both patterns succeed, <code>p1</code> wins. It is required that
<code>p1</code> and <code>p2</code> have the same sets of capture
variables. </li>

<li>A pattern <code>p1 &amp; p2</code> succeeds if both <code>p1</code>
and <code>p2</code> succeed, and returns the concatenation of the two
bindings. It is required that <code>p1</code> and <code>p2</code> have
<em>disjoint</em> sets of capture variables. </li>

</ul>

<p>
In record x-patterns, it is possible to omit the <code>=p</code> part
of a field. The content is then replaced with the label name
considered as a capture variable. E.g. <code>{ x y=p }</code> is
equivalent to <code>{ x=x y=p }</code>.</p>

<p>It is also possible to add an "else" clause:
<code>{ x = (a,_)|(a:=3) }</code>
will accept any record with at most the field <code>x</code>. If the content
is a pair, the capture variable a will be bound to its first component;
otherwise, it is set to <code>3</code>.</p>

<p>
In regular expressions, it is possible to extract whole subsequences
with the notation <code>x::R</code>, e.g.: <code>[ _* x::Int+ _* ]</code>
</p>

<p>
If the same sequence capture variable appears several times (or below a
repetition) in a regexp, it is bound to the concatenation of all
matched subsequences. E.g.: <code>[ (x::Int | _)* ]</code> will
collect in <code>x</code> all the elements of type <code>Int</code> from
a sequence. It is not legal to have repeated simple capture variables.
</p>
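
<p>
The sequence capture from the last example can be used from OCaml code
as in the following untested sketch. The function name
<code>ints_of</code> and its type annotations are hypothetical:
</p>

<sample><![CDATA[{{ON}}
(* x appears below a repetition, so it collects every integer
   of the input sequence, in order. *)
let ints_of (l : {{ [ (Int | Char)* ] }}) : {{ [ Int* ] }} =
  match l with {{ [ (x::Int | _)* ] }} -> x
]]></sample>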

<p>
The regexp operators <code>+,*,?</code> are greedy by default (they match as long
as possible). They admit non-greedy variants <code>+?,*?,??</code>.
</p>
</box>

<box title="Namespace bindings" link="ns">

<p>
The binding of namespace prefixes to URIs
can be done either by toplevel phrases (structure items) or
by local declarations:
</p>

<sample>{{ON}}
# {{ namespace ns = "http://..." }};;
# let x = {{ `ns: x }};;
val x : {{`ns:x}} = {{`ns:x}}
# let x = {{ let namespace ns = "http://..." in `ns:x }};;
val x : {{`ns:x}} = {{`ns:x}}
</sample>

<p>The toplevel definitions can also appear in module interfaces
(signatures). A toplevel prefix binding is not exported by a module: its scope
is limited to the current structure or signature. It is possible
to specify a default namespace, and to reset it:
</p>

<sample>{{ON}}
# {{ namespace "http://..." }};;
# {{ `x }};;
- : {{`ns1:x}} = {{`ns1:x}}
# {{ namespace "" }};;
# {{ `x }};;
- : {{`x}} = {{`x}}
</sample>

<p>
Note that the value pretty-printer invented a prefix
(<code>ns1</code>) for the namespace URI. The default namespace
declaration also has a
local form <code> let namespace "..." in ... </code>.
</p>

</box>

<box title="More on type-checking" link="typecheck">

<section title="Type inference">

<p>
As we said above, the programmer is sometimes required to provide type
annotations. To know where to put these annotations, it is necessary to
get a basic understanding of how type-checking works.
</p>

<p>
The OCaml type-checker is run first to detect which sub-expressions
are of the x-kind. A second ML type-checking pass is then done to
introduce subsumption (implicit subtyping) steps where allowed. After
these two passes, the OCamlDuce type checker obtains a data-flow summary of
x-values in the whole compilation unit. This is a directed graph,
whose edges represent either simple data-flow or complex operations
on x-values. The nodes of the graph can be thought of as x-type
variables. A data-flow edge corresponds to a subtyping constraint,
and an operation edge corresponds to a symbolic constraint which
mimics the corresponding operation on values.
</p>

<p>
Some of the nodes are given an explicit type by the programmer,
through type annotations (on expressions or function arguments)
or the other usual mechanisms in ML (data type declarations,
signatures, ...).
</p>

<p>
Also, if there is a loop with only subtyping edges in the graph,
all the nodes on the loop are merged together.
</p>

<p>
After this operation, the graph is required to be acyclic (assuming
that the nodes with an explicit type are removed from the graph). It
is the responsibility of the programmer to provide enough type
annotations to achieve this property. Otherwise, a type error
is issued.
</p>

<sample><![CDATA[{{ON}}
# let rec f x = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
Cycle detected: cannot type-check
# let rec f x : {{ String }} = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
val f : int -> {{String}} = <fun>
]]></sample>

<p>
In the example above, there is a cycle between the result type for
<code>f</code> and the type for the sub-expression <code>{{ON}}f
{{n-1}}</code>. It is here broken with a type annotation on the result; it could
have been broken by a type annotation on the expression <code>{{ON}}f
{{n-1}}</code>, or on the function <code>f</code> itself, or by a
module signature.
</p>

<p>
Let us study another simple example:
</p>

<sample>{{ON}}
# let f x = {{ x + 1 }} in f {{ 2 }}, f {{ 3 }};;
- : {{3--4}} * {{3--4}} = ({{3}}, {{4}})
</sample>

<p>
The type checker detects that the two x-values <code>2</code> and
<code>3</code> can flow to the argument of <code>f</code>. Its body
is thus type-checked with the assumption that <code>x</code> has type
<code>2--3</code>. The computed result type is then <code>3--4</code>.
</p>


<p>
The type-inference process described above is global by nature. The
acyclicity condition is only imposed after a whole compilation unit
has been type-checked by OCaml (and the information from the module
interface has been integrated). When a type variable is inferred to
be of the x-kind, it is never generalized. As a consequence, there
is no parametric polymorphism on x-types.
</p>
839
840 <p>
841 In the toplevel, type-checking is done after each phrase. Consider
842 the following session:
843 </p>
844
845 <sample><![CDATA[{{ON}}
846 # let f x = {{ x + 1 }};;
847 val f : {{Empty}} -> {{Empty}} = <fun>
848 # let a = f {{ 2 }};;
849 Subtyping failed 2 <= Empty
850 Sample:
851 2
852 ]]></sample>
853
854 <p>
855 The function <code>f</code> is inferred to have type
856 <code>{{ON}}{{Empty}} -> {{Empty}}</code> because when the first
857 phrase is type-checked, the data-flow graph says that no value
858 can flow to <code>x</code>, and thus the input type is empty
859 (and similarly for the result type). If the two phrases
860 were type-checked together (which would be the case if they had
861 been compiled by the compiler, not in the toplevel), the type checker
862 would have correctly inferred that the input type for <code>f</code>
863 must contain <code>2</code>.
864 </p>
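
<p>
One way to sidestep this in the toplevel, sketched here under the
assumption that an explicit annotation on the argument fixes the input
type of <code>f</code> (much as an annotation breaks the cycle in the
recursive example above), is to annotate the first phrase. The toplevel
responses are omitted because they have not been checked:
</p>

<sample><![CDATA[{{ON}}
# let f (x : {{ Int }}) = {{ x + 1 }};;
# let a = f {{ 2 }};;
]]></sample>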
865
866 </section>
867
868 <section title="Implicit subtyping">
869
870 <p>
871 Coercion from an x-type to a super type is automatic in OCamlDuce.
872 However, this automatic subsumption does not carry over to OCaml
873 type constructors, even if they are covariant. Consider:
874 </p>
875
876 <sample><![CDATA[{{ON}}
877 # let f (x : {{ Int }} * {{ Int }}) = 1;;
878 val f : {{Int}} * {{Int}} -> int = <fun>
879 # let g (x : {{ 0 }} * {{ 0 }}) = f x;;
880 This expression has type {{0}} * {{0}} but is here used with type
881 {{Int}} * {{Int}}
882 # let g (x : {{ 0 }} * {{ 0 }}) = let a,b = x in f (a,b);;
883 val g : {{0}} * {{0}} -> int = <fun>
884 # let g (x : {{ 0 }} * {{ 0 }}) = f (x :> {{ Int }} * {{ Int }});;
885 val g : {{0}} * {{0}} -> int = <fun>
886 ]]></sample>
887
888 <p>
889 The first attempt to define <code>g</code> fails because the type for
890 <code>x</code> is not an x-type and thus subsumption does not
891 apply. In the second attempt, we extract the two components of the
892 pair; since they are inferred to be x-values, subtyping applies to
893 both of them. Thus, when the pair <code>(a,b)</code> is reconstructed,
894 it is legal to unify its type with the input type of <code>f</code>.
895 The third definition for <code>g</code> gives an alternative solution:
896 using explicit OCaml type coercions.
897 </p>
898
899 </section>
900
901 </box>
902
903 <box title="Exchanging values" link="transl">
904
905 <p>
906 OCamlDuce strongly separates regular OCaml values from the new
907 x-values. They have different syntax, expressions, types, patterns,
908 and even type-checking algorithms. This strong segregation is the key point
909 which allowed a simple integration between very different type
910 systems.
911 </p>
912
913 <p>
914 At some point, it is still necessary to cross the frontier and
915 translate OCaml values to x-values or the opposite.
916 </p>
917
918 <p>
919 Fortunately, OCamlDuce provides automatic translations in both
920 directions. Instead of double curly braces, you can
921 enclose x-expressions in curly brace+colon <code>{: ... :}</code>
922 (here, the <code>...</code> is an x-expression).
923 The effect is to translate the result of the x-expression
924 (which must be an x-value) to an OCaml value. Similarly,
925 in an x-expression, you can obtain the x-translation of
926 an OCaml value with the same syntax <code>{: ... :}</code>
927 (here, the <code>...</code> is an OCaml expression).
928 </p>
929
930 <p>
931 Here is how the translation works. To each OCaml type <code>t</code>,
932 we associate an x-type <code>T(t)</code> and a pair of translation
933 functions between <code>t</code> and <code>T(t)</code>.
934 Not all OCaml types are supported, however. For instance,
935 free type variables, abstract types, object types, and non-regular
936 recursive types cannot be translated. In particular, since
937 type variables are not allowed, the OCaml type must be fully known.
938 </p>
939
940 <p>
941 The translation for an OCaml type <code>t</code> is defined by structural
942 induction on <code>t</code>. Sum types are
943 translated to union types: a constant constructor <code>A</code> is
944 translated to the qualified name <code>`A</code>; a non-constant
945 constructor <code>A of t1 * ... * tn</code> is translated to
946 <code>&lt;A>[ T(t1) ... T(tn) ]</code>. Closed polymorphic variants
947 have the same translation. Record types are translated to closed
948 record x-types. Some other translations:
949 </p>
950
951 <table border="1">
952 <tr><th>Caml type t</th> <th>X-type T(t)</th></tr>
953 <tr><td><code>int</code></td> <td><code>Int</code></td></tr>
954 <tr><td><code>int32</code></td> <td><code>Int32</code></td></tr>
955 <tr><td><code>int64</code></td> <td><code>Int64</code></td></tr>
956 <tr><td><code>string</code></td> <td><code>Latin1</code></td></tr>
957 <tr><td><code>t list</code></td> <td><code>[T(t)*]</code></td></tr>
958 <tr><td><code>t array</code></td> <td><code>[T(t)*]</code></td></tr>
959 <tr><td><code>unit</code></td> <td><code>[]</code></td></tr>
960 <tr><td><code>char</code></td> <td><code>Latin1Char</code></td></tr>
961 <tr><td><code>{{t}}</code></td> <td><code>t</code></td></tr>
962 </table>
963
964 <p>
965 Here is an example:
966 </p>
967
968 <sample>{{ON}}
969 # let f (x : {{ Int }}) = {{ x + 1 }} in List.map f {: [ 1 2 3 ] :};;
970 - : {{Int}} list = [{{2}}; {{3}}; {{4}}]
971 </sample>
972
973 <p>
974 In this example, the result type of the translation is inferred
975 to be <code>{{ON}}{{ Int }} list</code> (because the type for
976 <code>f</code> is given). The corresponding x-type
977 is <code>{{ON}}{{ [Int*] }}</code>.
978 </p>
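
<p>
As a further illustration of the rules for sum types, here is how
they would apply to a small variant type. The type <code>shape</code>
is a made-up example, and the x-type given in the comment is derived by
hand from the rules and the table above, not produced by the compiler:
</p>

<sample><![CDATA[
type shape =
  | Point                (* constant constructor *)
  | Circle of int        (* non-constant, one argument *)
  | Rect of int * int    (* non-constant, two arguments *)

(* T(shape) = `Point | <Circle>[ Int ] | <Rect>[ Int Int ] *)
]]></sample>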
979
980 </box>
981
982 <box title="The standard library" link="stdlib">
983
984 <p>
985 In OCamlDuce, the Num library from OCaml is included in the standard
986 library. In addition, there are two new modules called
987 <code>Ocamlduce</code> and <code>Cduce_types</code> in the standard library.
988 </p>
989
990 <p>
991 The module <code>Cduce_types</code> gives access to the internal
992 representation of x-values. It is currently undocumented.
993 </p>
994
995 <p>
996 The module <code>Ocamlduce</code> provides several useful
997 functions on x-values. See the <a href="http://yquem.inria.fr/~frisch/ocamlcduce/doc/ocamlduce/Ocamlduce.html">ocamldoc</a> generated
998 documentation for a description of its interface.
999 </p>
1000
1001 </box>
1002
1003 <box title="Marshaling" link="marshal">
1004
1005 <p>
1006 OCamlDuce uses some tricks on its internal representation of x-values
1007 to reduce memory usage and improve performance. You need to pay
1008 special attention if you want to use OCaml serialization functions
1009 (module <code>Marshal</code>, functions
1010 <code>input_value/output_value</code>) on x-values. In addition to
1011 your values, you also need to save and restore some internal data
1012 using the functions <code>Cduce_types.Value.extract_all</code> and
1013 <code>Cduce_types.Value.intract_all</code>. Of course, this also
1014 applies if the value to be serialized contains deeply nested x-values.
1015 </p>
1016
1017 <p>
1018 Here are generic
1019 serialization/deserialization functions that illustrate how to do it:
1020 </p>
1021
1022 <sample>
1023 let my_output_value oc v =
1024   let p = Cduce_types.Value.extract_all () in
1025   output_value oc (p,v)
1026
1027 let my_input_value ic =
1028   let (p,v) = input_value ic in
1029   Cduce_types.Value.intract_all p;
1030   v
1031 </sample>
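
<p>
A possible way to use these two functions with files is sketched below.
The helper names <code>save_to_file</code> and
<code>load_from_file</code> are hypothetical; only
<code>my_output_value</code> and <code>my_input_value</code> come from
the sample above. Channels are opened in binary mode, as is usual for
marshaled data. As with <code>input_value</code>, the result is not
checked against any type, so the caller should annotate the expected
type.
</p>

<sample><![CDATA[
(* Write v, together with the internal data, to a file. *)
let save_to_file filename v =
  let oc = open_out_bin filename in
  my_output_value oc v;
  close_out oc

(* Read back a value written by save_to_file. *)
let load_from_file filename =
  let ic = open_in_bin filename in
  let v = my_input_value ic in
  close_in ic;
  v
]]></sample>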
1032
1033 </box>
1034
1035 <box title="Performance" link="perf">
1036
1037 <section title="Strings">
1038
1039 <p>
1040 OCaml users might be surprised by the fact that x-strings are simply
1041 represented as sequences in OCamlDuce. Does this mean that they are
1042 actually stored in memory as linked lists? Certainly not! The internal
1043 representation of sequence values uses several tricks to improve
1044 performance and memory usage. In particular, a special form in the
1045 representation can store strings as byte buffers, as in OCaml.
1046 If an XML document is loaded, or if a Caml string is converted
1047 to an x-value, this compact representation will be used.
1048 </p>
1049
1050 </section>
1051
1052 <section title="Concatenation">
1053
1054 <p>
1055 Similarly, OCaml users might be reluctant to use the
1056 concatenation operator <code>@</code> on sequences. In OCaml, the complexity
1057 of this operator is linear in the size of its first argument (which
1058 needs to be copied). OCamlDuce uses a special form in its internal
1059 representation to store concatenations in a lazy way. The concatenation
1060 is actually computed only when the value is accessed. This means
1061 that it's perfectly ok to build a long sequence by adding
1062 new elements at the end one by one, as long as you don't
1063 simultaneously inspect the sequence.
1064 </p>
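
<p>
The idea can be illustrated in plain OCaml. The following is only a
sketch of the general technique, not OCamlDuce's actual internal
representation; the type <code>seq</code> and the functions
<code>append</code>, <code>to_list</code> and <code>build</code> are
names invented for this illustration. Concatenation is recorded as a
node in constant time, and the elements are only laid out when the
sequence is inspected:
</p>

<sample><![CDATA[
(* Sketch only: not OCamlDuce's actual internal representation. *)
type 'a seq =
  | Nil
  | Cons of 'a * 'a seq
  | Concat of 'a seq * 'a seq   (* suspended concatenation *)

(* Constant time: nothing is copied. *)
let append a b = Concat (a, b)

(* Flatten the sequence when it is inspected. *)
let to_list s =
  let rec go s acc =
    match s with
    | Nil -> acc
    | Cons (x, tl) -> x :: go tl acc
    | Concat (a, b) -> go a (go b acc)
  in
  go s []

(* Build the sequence 1, ..., n by appending at the end. *)
let build n =
  let rec loop i acc =
    if i > n then acc else loop (i + 1) (append acc (Cons (i, Nil)))
  in
  loop 1 Nil
]]></sample>

<p>
With this representation, <code>build n</code> performs
<code>n</code> constant-time appends, and the linear cost is paid once,
in <code>to_list</code>. The traversal in this sketch is not
tail-recursive; a real implementation would take more care.
</p>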
1065
1066 </section>
1067
1068 <section title="Pattern matching">
1069
1070 <p>
1071 Another point which is worth knowing when programming in OCamlDuce
1072 is that patterns can be written in a declarative style without
1073 affecting performance. The compiler uses static type information
1074 about matched values to produce efficient code for pattern matching.
1075 To illustrate this, consider the following sample:
1076 </p>
1077
1078 <sample><![CDATA[{{ON}}
1079 x.ml:
1080
1081 type a = {{ <a>[ a* ] }}
1082 type b = {{ <b>[ b* ] }}
1083
1084 let f : {{ a|b }} -> int = function {{ a }} -> 0 | {{ _ }} -> 1
1085 ]]></sample>
1086
1087 <sample><![CDATA[{{ON}}
1088 y.ml:
1089
1090 type a = {{ <a>[ a* ] }}
1091 type b = {{ <b>[ b* ] }}
1092
1093 let f : {{ a|b }} -> int = function {{ <a>_ }} -> 0 | {{ _ }} -> 1
1094 ]]></sample>
1095
1096 <p>
1097 The two functions have exactly the same semantics, but the first
1098 implementation is more declarative: it uses type checks to distinguish
1099 between <code>a</code> and <code>b</code> instead of saying
1100 <em>how</em> to distinguish between these two types. Imagine
1101 that the definition of these types changes to:
1102 </p>
1103
1104 <sample><![CDATA[{{ON}}
1105 type a = {{ <x kind="a">[ a* ] }}
1106 type b = {{ <x kind="b">[ b* ] }}
1107 ]]></sample>
1108
1109 <p>
1110 Then the first implementation still works as expected, but the
1111 second one needs to be rewritten.</p>
1112
1113 <p>Now one might believe that the second implementation is more
1114 efficient because it tells the compiler to check only the root tag,
1115 whereas the first implementation would force
1116 the compiler to produce code to check that all tags in the tree
1117 are <code>a</code>s. But this is not what happens! Actually,
1118 you can check that the compiler will produce exactly the same code
1119 for both implementations. It considers the static type information
1120 about the argument of the pattern matching (here, the input type
1121 of the function), and computes an efficient way to evaluate
1122 patterns for the values of this type.
1123 </p>
1124
1125 </section>
1126
1127 <section title="The map iterator">
1128
1129 <p>
1130 The <code>map ... with ...</code> iterator is implemented in a
1131 tail-recursive way. You can safely use it on very long sequences.
1132 </p>
1133
1134 </section>
1135
1136 </box>
1137
1138 <box title="OCaml and OCamlDuce" link="ocaml">
1139
1140 <p>
1141 Since the 3.08.4 release, OCamlDuce is binary compatible with the corresponding
1142 OCaml release. This means that OCamlDuce can use OCaml-generated
1143 <tt>.cmi</tt> files and that it produces an OCaml-compatible
1144 <tt>.cmi</tt> file if the interface does not use any x-type
1145 (this file is equal to what would have been obtained by using OCaml).
1146 </p>
1147
1148 <p>
1149 It is thus possible to use existing libraries which were compiled for
1150 OCaml 3.08.4. It is also possible to use OCamlDuce to compile
1151 some modules and use them in an OCaml project provided their interface
1152 is pure OCaml.
1153 </p>
1154
1155
1156 </box>
1157
1158 <box title="Code samples" link="code">
1159
1160 <section title="Parsing XML files">
1161
1162 <p>
1163 OCamlDuce does not come with any built-in XML parser. However,
1164 the <a href="http://yquem.inria.fr/~frisch/ocamlcduce/doc/ocamlduce/Ocamlduce.Load.html"><code>Ocamlduce.Load</code></a> module in the standard library
1165 makes it easy to plug existing XML parsers. Here is some
1166 code which demonstrates how to do that with three of
1167 the most popular OCaml XML parser libraries:
1168 </p>
1169
1170 <ul>
1171 <li><a
1172 href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/pxp/">PXP</a></li>
1173 <li><a
1174 href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/expat/">Expat</a></li>
1175 <li><a href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/xmllight/">Xml-light</a></li>
1176 </ul>
1177
1178 </section>
1179
1180 <section title="Converting DTD to OCamlDuce types">
1181
1182 <p>
1183 This <a href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/dtd2types/">tool</a> produces a set of OCamlDuce type declarations
1184 from a DTD. It requires PXP.
1185 </p>
1186
1187 <note>This application does not use any of the new features, but it
1188 can be useful in the development of OCamlDuce applications.
1189 </note>
1190
1191 </section>
1192
1193 <section title="Parsing XML Schema, producing valid XHTML output">
1194
1195 <p>
1196 This <a
1197 href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/schema/">application</a>
1198 parses XML Schema Definitions (.xsd files), and produces summaries
1199 (toplevel declaration names) in XHTML. The OCamlDuce type system ensures
1200 that the parser is coherent with the input XML type (any valid XML
1201 Schema is accepted) and that the printer is coherent with the output
1202 XML type (it is necessarily a valid XHTML document).
1203 </p>
1204
1205 <p>
1206 Of course, for such a simple transformation, parsing the XML document
1207 into an internal representation is not necessary. A direct XML-to-XML
1208 transformation would be easy to write. We wanted to illustrate
1209 a complex parsing of XML.
1210 </p>
1211
1212 <p>
1213 It is interesting to introduce errors in the parser
1214 <code>schema_loader.ml</code> or the printer
1215 <code>dump_schema.ml</code> and see how the type system catches them.
1216 </p>
1217
1218 <note>
1219 The application uses XML Light to parse XML documents.
1220 </note>
1221
1222 <note>
1223 Some features of XML Schema are not parsed, such as
1224 <code>redefine</code> elements or substitution groups.
1225 </note>
1226
1227 </section>
1228
1229 <section title="String regular expressions">
1230
1231 <p>
1232 OCamlDuce supports regular expression types and patterns, not only
1233 for sequences of XML elements, but also for strings. The following
1234 example shows how to use regular expressions to split a string
1235 of the form <code>name1=val1,...,namen=valn</code> with
1236 <code>n>0</code> into
1237 a list of pairs <code>[ (name1,val1); ...; (namen,valn) ]</code>.
1238 The <code>*?</code> operator in regular expressions means ``ungreedy
1239 match'' (match the shortest possible subsequence). The last
1240 pattern describes precisely strings which are not matched by
1241 the other cases. It would be possible to replace it with
1242 the wildcard <code>_</code>.
1243 </p>
1244
1245 <sample><![CDATA[{{ON}}
1246 let rec split (s : {{ String }}) =
1247 match s with
1248 | {{ [ n::_*? '=' v::_*? ',' rest::_* ] }} -> (n,v)::(split rest)
1249 | {{ [ n::_*? '=' v::_*? ] }} -> [ (n,v) ]
1250 | {{ Any - [ _* '=' _* ] }} -> failwith "split"
1251 ]]></sample>
1252
1253 </section>
1254
1255 </box>
1256
1257 </page>
