
Contents of /web/ocaml.xml


Revision 1808
Tue Jul 10 19:24:09 2007 UTC by abate
File MIME type: text/xml
File size: 42584 byte(s)
[r2005-09-24 14:57:03 by afrisch] Empty log message

Original author: afrisch
Date: 2005-09-24 14:57:03+00:00
1 <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
2 <page name="ocaml">
3
4 <title>OCamlDuce</title>
5
6 <left>
7 <local-links href="index,documentation"/>
8 <p>On this page:</p>
9 <boxes-toc/>
10 </left>
11
12 <box>
13
14 <p>
15 OCamlDuce is a merger between <a
16 href="http://caml.inria.fr/">OCaml</a> and
17 <local href="index">CDuce</local>. It comes as a modified
18 version of OCaml which integrates CDuce features: expressions, types,
19 patterns.
20 </p>
21
22 <p>
23 OCamlDuce is distributed under the same licenses as Objective Caml:
24 the Q Public License version 1.0 for the Compiler, and the LGPL
25 version 2 for the Library. The extension has been written by Alain
26 Frisch. Parts of the CDuce implementation, by the same author, have
27 been reused.
28 </p>
29
30 </box>
31
32 <box title="Download and installation" link="install">
33
34 <p>
35 Currently, OCamlDuce
36 is based on OCaml 3.08.4 and on a CVS snapshot
37 of CDuce (between 0.3.92 and the head).
38 </p>
39
40 <ul>
41 <li><a
42 href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/ocamlduce-3.08.4pl2.tar.gz">Compiler,
43 version 3.08.4, patch level 2</a></li>
44 </ul>
45
46 <p>
47 There are two different installation modes:
48 </p>
49
50 <ul>
51 <li><b>Stand-alone mode</b>. OCamlDuce is used as a drop-in
52 replacement for OCaml. The build procedure is unchanged:
53 <tt>./configure &amp;&amp; make world &amp;&amp; make install</tt>.
54 The tools are named <tt>ocaml, ocamlc, ocamlopt</tt>, ...
55 The standard library is extended with the <tt>num</tt> library
56 and the <tt>Ocamlduce</tt> module.
57 </li>
58
59 <li><b>Package mode</b>. OCamlDuce is installed on top of an existing
60 OCaml installation (whose version number must match), without touching
61 it. The build
62 procedure is: <tt>./configure &amp;&amp; make all &amp;&amp; make opt
63 &amp;&amp; make install</tt>. The <tt>configure</tt> script should be called with
64 the same arguments as the ones used when you built OCaml. For instance,
65 the <tt>LIBDIR</tt> argument is used to find the OCaml standard library.
66 The tool names are changed to <tt>ocamlduce, ocamlducec,
67 ocamlduceopt</tt>, ... They use the existing standard library.
68 In addition, a library <tt>ocamlduce.cma</tt> is built.
69 It depends on the <tt>nums.cma</tt> library. The <tt>install</tt>
70 target implements a <tt>Findlib</tt>-based installation. It registers
71 a package named <tt>ocamlduce</tt> and it puts the tools
72 in the package sub-directory (the <tt>BINDIR</tt> and <tt>LIBDIR</tt>
73 arguments to <tt>configure</tt> are not used). The toplevel
74 can be called by <tt>ocamlfind ocamlduce/ocamlduce -I `ocamlfind query ocamlduce`</tt>.
75 </li>
76 </ul>
77
78 </box>
79
80 <box title="Ports and packages" link="ports">
81
82 <section title="GODI">
83 <p>
84 GODI users can choose either of the two installation modes.
85 In order to upgrade an existing installation so as to use
86 OCamlDuce in place of OCaml, they must add this
87 line to their <tt>etc/godi.conf</tt> file:
88 </p>
89 <sample>
90 GODI_BUILD_SITES += http://pauillac.inria.fr/~frisch/ocamlcduce/godi
91 </sample>
92 <p>
93 and force a recompilation of the <tt>godi-ocaml-src</tt>
94 and <tt>godi-ocaml</tt> packages. The alternative is to install
95 OCamlDuce
96 as a GODI package over an existing installation. You don't need
97 to touch the <tt>etc/godi.conf</tt> file. The package
98 name is <tt>godi-ocamlduce</tt>. In order to use the new compilers
99 and tools, you can make the environment variable
100 <tt>OCAMLFIND_CONF</tt> point to the
101 <tt>$GODI/etc/findlib-ocamlduce.conf</tt> file and then
102 use e.g. <tt>ocamlfind ocamlc -package ocamlduce</tt>.
103 </p>
104 </section>
105
106 <section title="DarwinPorts and OpenBSD">
107
108 <p>
109 Anil Madhavapeddy contributed two ports of OCamlDuce for DarwinPorts
110 (in dports/lang/ocamlduce) and for OpenBSD (in ports/lang/ocamlduce).
111 </p>
112
113 </section>
114
115 </box>
116
117 <box title="Overview" link="overview">
118
119 <p>
120 The goal of the OCamlDuce project is to extend the OCaml language with features
121 to make it easier to write safe and efficient complex applications
122 that need to deal with XML documents. In particular, it relies
123 on a notion of types and patterns to guarantee statically
124 that all the possible input documents are correctly processed, and
125 that only valid output documents are produced.
126 </p>
127
128 <p>
129 In a nutshell, OCamlDuce extends OCaml with a new kind of values
130 (<em>x-values</em>) to represent XML documents, fragments, tags, Unicode
131 strings. In order to describe these values, it also extends the type algebra
132 with so-called <em>x-types</em>. The philosophy behind these types is that they
133 represent <em>sets of x-values</em>. They can be very precise: indeed,
134 each value can be seen as a singleton type (a set with a single
135 value), and it is possible to form Boolean combinations of x-types
136 (intersection, union, difference).
137 </p>
138
139 <p>
140 OCamlDuce's type system can be understood as a refinement of OCaml.
141 For each sub-expression which is inferred to be of the x-kind (using
142 OCaml's unification-based type system), OCamlDuce will try to infer the
143 best possible sound x-type. Here, best means smallest for the natural
144 subtyping relation (set inclusion). The inference algorithm is
145 actually a data-flow analysis: the x-type will collect all the values
146 that can be produced by the expression, considering all the possible
147 data flows in the program. It is sometimes necessary to provide
148 explicit type annotations to help the type checker infer this type, in
149 particular when you define recursive functions or when you use
150 iterators.
151 </p>
152
153 <p>
154 Subtyping is implicit for x-types: if an expression is inferred to be
155 of x-type <code>t</code>, which is a subtype of <code>s</code>, then
156 it is possible to use this expression in any context which expects a
157 value of type <code>s</code>.
158 </p>
159
160 </box>
161
162 <box title="Getting started" link="start">
163
164 <p>
165 Most of the new language features are enclosed within double curly braces
166 <code>{{ON}}{{...}}</code>. For instance, the following code sample
167 defines a value <code>x</code> as an XML element (with tag
168 <code>a</code>, an attribute <code>href</code>, and a simple
169 string as content):
170 </p>
171
172 <sample><![CDATA[{{ON}}
173 # let x = {{ <a href="http://www.cduce.org">['CDuce'] }};;
174 val x : {{<a href=[ 'http://www.cduce.org' ]>[ 'CDuce' ]}} =
175 {{<a href="http://www.cduce.org">[ 'CDuce' ]}}
176 ]]></sample>
177
178 <p>
179 What appears between the curly braces is called an x-expression.
180 Similarly, there are x-types (as seen above), and also x-patterns.
181 The delimiters <code>{{ON}}{{...}}</code> are only used
182 for syntactical reasons, to avoid clashes between OCaml and CDuce
183 syntaxes and lexical conventions. As a matter of fact,
184 an OCaml expression need not be a syntactical x-expression
185 (delimited by double curly braces) to evaluate to an x-value.
186 For instance, once <code>x</code> has been declared as above,
187 the expression <code>x</code> evaluates to an x-value.
188 </p>
189
190
191 <p>
192 It is possible to use an arbitrary
193 OCaml expression as part of an x-expression: it must simply be
194 protected by a new pair of double curly braces. For instance, there is
195 no <code>if-then-else</code> construction for x-expressions, but you
196 can write:
197 </p>
198
199 <sample><![CDATA[{{ON}}
200 # {{ <a href={{if true then {{"a"}} else {{"z"}}}}>[] }};;
201 - : {{<a href=[ 'a' | 'z' ]>[ ]}} = {{<a href="a">[ ]}}
202 ]]></sample>
203
204 <p>
205 Only the highlighted parts are parsed as x-expressions. The
206 <code>if-then-else</code> sub-expression is parsed as an OCaml
207 expression, but its type is an x-type (namely <code>{{ON}}{{[ 'a' |
208 'z' ]}}</code>).
209 </p>
210
211 </box>
212
213 <box title="X-values" link="values">
214
215 <p>
216 X-values are intended to represent XML documents and fragments
217 thereof: elements, tags, text, sequences. In this section, we
218 present the x-value algebra, the syntax of the corresponding
219 x-expression constructors and the associated x-types.
220 </p>
221
222 <p>
223 There are three kinds of atomic x-values:
224 </p>
225 <ul>
226 <li>Unicode characters;</li>
227 <li>qualified names;</li>
228 <li>arbitrarily large integers.</li>
229 </ul>
230
231 <section title="Characters">
232
233 <p>
234 X-characters are different from OCaml characters. They can represent
235 the range of Unicode codepoints defined in the XML specification.
236 Character literals are delimited by single quotes. The escape
237 sequences \n, \r, \t, \b, \', \&quot;, \\ are recognized as usual. The
238 numerical escape sequences are written <code>\n;</code> where n is an integer
239 literal (note the extra semi-colon). The source code is interpreted as
240 being encoded in iso-8859-1. As a consequence, Unicode characters which are not
241 part of the Latin1 character set must be introduced with this
242 numerical escape mechanism. The x-types for x-characters are:
243 </p>
244 <ul>
245 <li>singletons;</li>
246 <li>intervals, written <code>c -- d</code>, where <code>c</code> and
247 <code>d</code> are literals (example: <code>{{ON}}type t = {{ 'a'--'z'
248 }}</code>);</li>
249 <li>the type of all x-characters, written <code>Char</code>;</li>
250 <li>the type of all Latin1 characters, written <code>Latin1Char</code>
251 (defined as <code>\0; -- \255;</code>).</li>
252 </ul>
253
254 </section>
255
256 <section title="Integers">
257
258 <p>
259 X-integers are arbitrarily large. Literals must be written in decimal.
260 Negative literals must be in parentheses. E.g.: <code>(-3)</code>.
261 The x-types for x-integers are:
262 </p>
263 <ul>
264 <li>singletons;</li>
265 <li>intervals, written <code>i -- j</code>, where <code>i</code> and
266 <code>j</code> are literals (example: <code>{{ON}}type t = {{ 10--20
267 }}</code>); it is possible to replace <code>i</code> or <code>j</code>
268 with <code>**</code> to define open-ended intervals, e.g.
269 <code>{{ON}}type pos = {{ 1 -- ** }}</code>;
270 </li>
271 <li>the type of all x-integers, written <code>Int</code>;</li>
272 <li>the type of all the integers which can be represented by a
273 signed 32 (resp. 64) bit machine word, written <code>Int32</code> (resp.
274 <code>Int64</code>).</li>
275 </ul>
276
277 </section>
278
279 <section title="Qualified names">
280
281 <p>
282 Qualified names are intended to represent XML tag names. Conceptually,
283 they are made of a namespace URI and a local name. Since URIs tend
284 to be long, literals are of the form <code>`prefix:local</code>
285 where <code>local</code> is the local name and <code>prefix</code>
286 is a <em>namespace prefix</em> bound to some URI (in the scope of the
287 literal). The local name follows the definitions from
288 the XML Namespaces specification; a dot character must be protected
289 by a backslash and non-Latin1 characters are written as character
290 literals <code>\n;</code>. <a href="#ns">See below</a> for an
291 explanation of how to bind prefixes to URIs. To refer
292 to the default namespace (or the absence of namespace if no default
293 has been defined), the syntax is simply <code>`local</code>.
294 The x-types for qualified names are:
295 </p>
296 <ul>
297 <li>singletons;</li>
298 <li>the type of all qualified names, written <code>Atom</code>;</li>
299 <li>the type of all qualified names from a specified namespace,
300 written <code>`ns:*</code>.</li>
301 </ul>
302 </section>
303
304 <section title="Records">
305
306 <p>
307 X-records are mainly used to represent the set of attributes of an XML
308 element. An x-record is a binding from a finite set of <em>labels</em>
309 to x-values. Labels follow the same syntax as for qualified names
310 without the leading backquote. However, if the namespace prefix is not
311 given, the default namespace does not apply (the namespace URI is
312 empty). The syntax for record x-expressions is <code> { l1=e1
313 ... ln=en }</code> where the <code>li</code> are labels and the
314 <code>ei</code> are x-expressions. Fields can also be separated with a
315 semi-colon. It is legal to omit the expression for a field; the label is then
316 taken as the content of the field (a value with this name must be
317 defined in the current scope), e.g.: <code>{{ON}}let x = ... and y = ...
318 in {{ {x y z=3} }}</code> is equivalent to <code>{{ON}}let x = ... and
319 y = ... in {{ {x=x y=y z=3} }}</code>. The types for x-records specify
320 which labels are authorized/mandatory, and what the types of the
321 corresponding fields are. There are two kinds of record x-types:
322 </p>
323
324 <ul>
325 <li>
326 Closed record types, which only allow a finite number of fields:
327 <code>{ l1=t1 ... ln=tn }</code>;
328 </li>
329 <li>
330 Open record types, which allow additional fields (with arbitrary
331 type):
332 <code>{ l1=t1 ... ln=tn .. }</code> (the final two dots are
333 part of the syntax).
334 </li>
335 </ul>
336
337 <p>
338 In both cases, it is possible to make one of
339 the fields optional by changing = to =?.
340 </p>
341
342 <p>
343 The x-type of all x-records is thus <code>{ .. }</code>,
344 and the x-type of x-records with maybe a field <code>l</code>
345 of type <code>Int</code> and maybe arbitrary other fields is
346 <code>{ l=?Int .. }</code>.
347 </p>
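
<p>
The difference between closed, open and optional fields can be
illustrated with a small executable model. The sketch below is written
in Python, not in OCamlDuce: records are modelled as dictionaries, and
the names <code>matches_record_type</code>, <code>fields</code> and
<code>is_open</code> are hypothetical helpers introduced only for this
illustration. It checks at run time what OCamlDuce checks statically.
</p>

```python
def is_int(v):
    """Stand-in for the x-type Int (assumption: Python ints model x-integers)."""
    return isinstance(v, int)


def matches_record_type(record, fields, is_open):
    """Check a dict against a modelled record x-type.

    fields maps each label to (check, optional), where check is a predicate
    on the field value; is_open models the trailing `..` of an open type.
    """
    for label, (check, optional) in fields.items():
        if label in record:
            if not check(record[label]):
                return False
        elif not optional:
            return False  # a mandatory field is missing
    if not is_open:
        # A closed type rejects any label it does not mention.
        return all(label in fields for label in record)
    return True


# Model of { l=?Int .. }: optional Int field l, other fields allowed.
open_type = {"l": (is_int, True)}
# Model of { l=Int }: exactly one mandatory Int field l.
closed_type = {"l": (is_int, False)}

print(matches_record_type({"l": 3, "m": "x"}, open_type, True))
print(matches_record_type({"l": 1, "m": 2}, closed_type, False))
```

<p>
In this model the first call accepts the record (the extra field
<code>m</code> is allowed by the open type), while the second one
rejects it because the closed type mentions only <code>l</code>.
</p>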
348
349 </section>
350
351 <section title="Sequences">
352
353 <p>
354 X-sequences are finite and ordered collections of x-values.
355 The syntax for a sequence x-expression is
356 <code>[ e1 ... en ]</code> (note that elements are <em>not</em> separated
357 by semi-colons as in OCaml lists). Each item <code>ei</code>
358 can either be:
359 </p>
360 <ul>
361 <li>an x-expression;</li>
362 <li><code>!e</code> where <code>e</code> is an x-expression which
363 evaluates to a sequence (whose content is inserted in the sequence
364 which is currently defined); e.g.
365 <code>let x = [ 2 3 ] in [ 1 !x 4 ]</code> is equivalent to
366 <code>[ 1 2 3 4 ]</code>;</li>
367 <li>a string literal delimited by single quotes; e.g.
368 <code>[ 'abc' ]</code> is equivalent to <code>[ 'a' 'b' 'c' ]</code>.</li>
369 </ul>
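
<p>
The three kinds of items listed above can be modelled as follows. This
is a Python sketch of the described behaviour, not OCamlDuce code; the
function <code>build_sequence</code> and the item tags
<code>"elem"</code>, <code>"splice"</code> and <code>"text"</code> are
names invented for the illustration.
</p>

```python
def build_sequence(items):
    """Build a flat list from modelled sequence items.

    Each item is a (kind, payload) pair:
      ("elem", v)    a single element
      ("splice", s)  models !e: the elements of s are inserted in place
      ("text", s)    models a quoted string: one element per character
    """
    result = []
    for kind, payload in items:
        if kind == "elem":
            result.append(payload)
        elif kind == "splice":
            result.extend(payload)
        elif kind == "text":
            result.extend(payload)  # iterating a str yields its characters
        else:
            raise ValueError(f"unknown item kind: {kind}")
    return result


x = [2, 3]
# Models [ 1 !x 4 ]
print(build_sequence([("elem", 1), ("splice", x), ("elem", 4)]))
# Models [ 'abc' ]
print(build_sequence([("text", "abc")]))
```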
370
371 <p>
372 X-types for sequences are of the form <code>[R]</code>
373 where <code>R</code> is a regular expression over x-types which
374 describes the possible contents of the sequences. The possible
375 forms of regular expressions are:
376 </p>
377
378 <ul>
379 <li><code>t</code> (one single element of x-type <code>t</code>)</li>
380 <li><code>R*</code> (zero or more repetitions)</li>
381 <li><code>R+</code> (one or more repetitions)</li>
382 <li><code>R?</code> (zero or one repetition)</li>
383 <li><code>R1 R2</code> (sequence)</li>
384 <li><code>R1|R2</code> (alternation)</li>
385 <li><code>(R)</code></li>
386 <li><code>/t</code> (guard: the tail of the sequence must comply with
387 <code>t</code>).</li>
388 <li><code>PCDATA</code> (equivalent to Char*).</li>
389 </ul>
390
391 <note>Sequences are actually encoded with embedded pairs and a
392 terminator, and sequence types are encoded with product types and
393 recursive types. The encoding is available to the programmer
394 but not described in this manual.
395 </note>
396
397 </section>
398
399 <section title="Strings">
400
401 <p>
402 Strings are nothing but sequences of characters. There are two
403 predefined types <code>String</code> and <code>Latin1</code>
404 (defined as <code>[ Char* ]</code> and <code>[ Latin1Char* ]</code>).
405 </p>
406
407 <p>
408 A string literal <code>[ '...' ]</code> can also be written
409 <code>"..." </code> (without the square brackets). Note that single
410 (resp. double) quotes need to be escaped only when the string is
411 delimited with single (resp. double) quotes.
412 </p>
413
414 </section>
415
416 <section title="XML elements">
417
418 <p>
419 An XML element is a triple of x-values. The syntax for
420 the corresponding x-expression constructor is
421 <code><![CDATA[<(e1) (e2)>e3]]></code>. When <code>e1</code> is a
422 qualified name literal, it is possible to omit the leading
423 backquote and the surrounding parentheses. Similarly,
424 when <code>e2</code> is an x-record literal, it is possible
425 to omit the curly braces and the parentheses. For instance,
426 one can simply write <code><![CDATA[<a href="abc">['def']]]></code>
427 instead of <code><![CDATA[<(`a) ({href="abc"})>['def']]]></code>.
428 </p>
429
430 <p>
431 XML element x-types are written <code><![CDATA[<(t1) (t2)>t3]]></code>,
432 and the same simplifications apply. For instance, if
433 the namespace prefix <code>ns</code> has been defined,
434 the following is a legal x-type <code><![CDATA[<ns:* ..>[]]]></code>;
435 it describes XML elements whose tag is in the namespace bound to
436 <code>ns</code>, with an empty content, and with an arbitrary set of
437 attributes. An underscore in place of <code>(t1)</code> is
438 equivalent to <code>(Atom)</code> (any tag).
439 </p>
440
441 </section>
442
443 </box>
444
445 <box title="X-expressions" link="expr">
446
447 <p>
448 In the previous section, we have seen the syntax for x-values
449 constructors (constant literals, sequence, record, element constructors).
450 In this section, we describe the other kinds of x-expressions.
451 </p>
452
453 <section title="Binary infix operators">
454
455 <p>
456 The arithmetic operators on integers follow the usual precedence.
457 They are written <code>+,*,-,div,mod</code> (they are all infix).
458 </p>
459
460 <p>
461 Record concatenation: <code>e1 ++ e2</code>. The x-expressions
462 <code>e1</code> and <code>e2</code> must evaluate to x-records.
463 The result is obtained by concatenating them. If a field with the same
464 label is present in both records, the right-most one is selected.
465 </p>
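
<p>
The right-most-wins rule for record concatenation can be pictured with
a Python dictionary merge. This is only a model of the rule stated
above; <code>concat_records</code> is a hypothetical name.
</p>

```python
def concat_records(r1, r2):
    """Model of r1 ++ r2: fields of r2 override same-label fields of r1."""
    merged = dict(r1)   # copy, so that r1 itself is left untouched
    merged.update(r2)   # on a label clash, the right-hand value is kept
    return merged


left = {"x": 1, "y": 2}
right = {"y": 3, "z": 4}
print(concat_records(left, right))
```

<p>
Here the field <code>y</code> takes its value from the right-hand
record, and neither argument is modified.
</p>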
466
467 <p>
468 Sequence concatenation: <code>e1 @ e2</code>, equivalent
469 to <code>[!e1 !e2]</code>.
470 </p>
471
472 </section>
473
474 <section title="Projections, filtering">
475
476 <p>
477 If the x-expression <code>e</code> evaluates to a record or an XML
478 element, the construction <code>e.l</code> will extract the value of
479 field or attribute <code>l</code>. Similarly, the construction
480 <code>e.?l</code> will extract the value of field or attribute
481 <code>l</code> if present, and return the empty sequence
482 <code>[]</code> otherwise.
483 </p>
484
485 <p>
486 If the x-expression <code>e</code> evaluates to a record,
487 the construction <code>e -. l</code> will produce a new record
488 where the field <code>l</code> has been removed (if present).
489 </p>
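
<p>
The two projections and the field removal described above can be
sketched on Python dictionaries. The function names are invented for
this illustration. One difference should be kept in mind: in this model
a missing field in <code>project</code> is a run-time error, whereas
OCamlDuce is expected to reject such a projection at type-checking time.
</p>

```python
def project(record, label):
    """Model of e.l: the field must be present."""
    return record[label]


def project_opt(record, label):
    """Model of e.?l: the empty sequence when the field is absent."""
    return record.get(label, [])


def remove_field(record, label):
    """Model of e -. l: a new record without the field; no error if absent."""
    return {k: v for k, v in record.items() if k != label}


r = {"href": "abc", "id": 7}
print(project(r, "href"))
print(project_opt(r, "title"))
print(remove_field(r, "id"))
```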
490
491 <p>
492 If the x-expression <code>e</code> evaluates to an x-sequence,
493 the construction <code>e/</code> will result in a new x-sequence
494 obtained by taking in order all the children of the XML elements
495 from the sequence <code>e</code>. For instance, the x-expression
496 <code><![CDATA[[<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ]/]]></code>
497 evaluates to the x-value <code>[ 1 2 3 6 7 8 ]</code>.
498 </p>
499
500 <p>
501 If the x-expression <code>e</code> evaluates to an x-sequence,
502 the construction <code>e.(t)</code> (where <code>t</code> is an
503 x-type) will result in a new x-sequence
504 obtained by filtering <code>e</code> to keep only the elements
505 of type <code>t</code>. For instance, the x-expression
506 <code><![CDATA[[<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ].(Int)]]></code>
507 evaluates to the x-value <code>[ 4 5 ]</code>.
508 </p>
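
<p>
Both sequence operations can be reproduced on a small Python model of
x-values, where an XML element is a hypothetical <code>Element</code>
object with a tag and a list of children, and where an x-type is
approximated by a predicate. The sample sequence is the one used in the
two examples above.
</p>

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Minimal stand-in for an XML element (attributes are left out)."""
    tag: str
    children: list = field(default_factory=list)


def children_of(seq):
    """Model of e/ : concatenate, in order, the children of every element."""
    out = []
    for item in seq:
        if isinstance(item, Element):
            out.extend(item.children)
    return out


def filter_by(seq, has_type):
    """Model of e.(t): keep the items for which the predicate holds."""
    return [item for item in seq if has_type(item)]


seq = [Element("a", [1, 2, 3]), 4, 5, Element("b", [6, 7, 8])]
print(children_of(seq))
print(filter_by(seq, lambda v: isinstance(v, int)))
```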
509 </section>
510
511 <section title="Dynamic type checking">
512
513 <p>
514 If <code>e</code> is an x-expression and <code>t</code> is an x-type,
515 the construction <code>(e :? t)</code> returns the same
516 result as <code>e</code> if it has type <code>t</code>, and otherwise
517 raises a <code>Failure</code> exception whose argument explains
518 why this is not the case.
519 </p>
520
521 <sample><![CDATA[{{ON}}
522 # let f (x : {{ Any }}) = {{ (x :? <a>[ Int* ] ) }} in
523 f {{ <a>[ 1 2 '3' ] }};;
524 Exception:
525 Failure
526 "Value <a>[ 1 2 '3' ] does not match type <a>[ Int* ]\nValue '3' does not match type Int\n".
527 ]]></sample>
528 </section>
529
530 <section title="Pattern matching">
531
532 <p>
533 OCamlDuce comes with a powerful pattern matching operation.
534 X-patterns are described <a href="#patterns">below</a>.
535 The syntax for the pattern matching operation is:
536 <code>match e with p1 -> e1 | ... | pn -> en</code>.
537 The type system ensures exhaustivity for the pattern matching
538 and infers precise types for the capture variables.
539 It is also possible to use x-pattern matching as a regular
540 OCaml expression; x-patterns must be surrounded by {{..}}, e.g.:
541 match e with {{p1}} -> e1 | ... | {{pn}} -> en
542 function {{p1}} -> e1 | ... | {{pn}} -> en
543 </p>
544
545 <p>
546 Pattern matching follows a first-match policy. The first pattern
547 that succeeds triggers the corresponding branch.
548 </p>
549
550 <note>
551 currently it is impossible to mix normal OCaml patterns and x-patterns
552 in a single pattern matching.
553 </note>
554
555 </section>
556
557 <section title="Local binding">
558
559 <p>
560 The x-expression <code>let p=e1 in e2</code> is equivalent to
561 <code>match e1 with p -> e2</code>. There is also a local binding
562 with an x-pattern in OCaml expressions: <code>let {{p}}=e1 in
563 e2</code>.
564 </p>
565
566 </section>
567
568
569 <section title="Iterators">
570
571 <p>
572 OCamlDuce comes with a sequence iterator
573 <code>map e with p1 -> e1 | ... | pn -> en</code> and
574 a tree iterator
575 <code>map* e with p1 -> e1 | ... | pn -> en</code>.
576 </p>
577
578 <p>
579 For both constructions, the argument must evaluate to a sequence.
580 The <code>map</code> iterator applies the patterns to each element
581 of this sequence in turn and produces a new sequence by concatenating
582 all the results (all the right-hand sides must thus produce a
583 sequence). The set of patterns must be exhaustive for all the possible
584 elements of the input sequence.
585 </p>
586
587 <p>
588 The tree iterator is similar except that the patterns need not be
589 exhaustive. If some element of the input sequence is not matched,
590 it is simply copied into the result unless it is an XML element. In
591 this case, the transformation is applied recursively to its content.
592 </p>
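
<p>
The difference between the two iterators can be sketched in Python.
This is a model under stated assumptions, not the actual
implementation: the set of branches is represented by one function
<code>branch</code> that returns a list when some pattern matches and
<code>None</code> otherwise, and <code>Element</code>,
<code>seq_map</code>, <code>tree_map</code> and
<code>double_ints</code> are hypothetical names.
</p>

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Minimal stand-in for an XML element (attributes are left out)."""
    tag: str
    children: list = field(default_factory=list)


def seq_map(seq, branch):
    """Model of map: branch(item) always returns a list; results are concatenated."""
    out = []
    for item in seq:
        out.extend(branch(item))
    return out


def tree_map(seq, branch):
    """Model of map*: branch(item) returns a list, or None when nothing matches."""
    out = []
    for item in seq:
        result = branch(item)
        if result is not None:
            out.extend(result)
        elif isinstance(item, Element):
            # Unmatched element: keep it and transform its content recursively.
            out.append(Element(item.tag, tree_map(item.children, branch)))
        else:
            out.append(item)  # unmatched non-element: copied unchanged
    return out


def double_ints(item):
    """Hypothetical branch that matches integers only."""
    return [item * 2] if isinstance(item, int) else None


doc = [1, Element("a", [2, "x"]), "y"]
print(tree_map(doc, double_ints))
print(seq_map([1, 2], lambda i: [i, i]))
```

<p>
In the first call the integer nested inside the element is reached by
the recursive step, while the strings are copied as they are.
</p>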
593
594 </section>
595
596 <section title="OCaml constructions">
597
598 <p>
599 As a convenience, some of the OCaml expression constructors
600 are allowed as x-expressions (without a need to go back to OCaml
601 with double curly braces): (unqualified) value identifiers and
602 function calls.
603 </p>
604
605 </section>
606
607 </box>
608
609 <box title="More on x-types" link="types">
610
611 <p>
612 We have seen how to write simple x-types. We can then combine
613 them with Boolean connectives:
614 </p>
615
616 <ul>
617 <li><code>t1 &amp; t2</code>: intersection;</li>
618 <li><code>t1 | t2</code>: union;</li>
619 <li><code>t1 - t2</code>: difference.</li>
620 </ul>
621
622 <p>
623 The empty x-type is written <code>Empty</code> (it contains no value),
624 and the universal x-type is written <code>Any</code> (it contains
625 all the x-values) or <code>_</code>.
626 </p>
627
628 <p>
629 When an x-type has been bound to some OCaml identifier
630 (<code>{{ON}}type t = {{...}}</code>), it is possible to use
631 this identifier in another x-type. Recursive definitions
632 are allowed:
633 </p>
634
635 <sample><![CDATA[{{ON}}
636 type t1 = {{ <a>[ t2* ] }}
637 and t2 = {{ <b>[ t1* ] }}
638 ]]></sample>
639
640 <p>
641 Note that x-values are always finite and acyclic. The type checker
642 detects type definitions which would yield empty types:
643 </p>
644
645 <sample><![CDATA[{{ON}}
646 # type t = {{ <a>[ t+ ] }};;
647 This definition yields an empty type
648 ]]></sample>
649
650 <p>
651 If <code>t1</code> and <code>t2</code> are record x-types,
652 we can combine them with the infix <code>++</code> operator, which
653 mimics the corresponding operator on expressions (record
654 concatenation). Similarly, we can use the infix <code>@</code>
655 concatenation operator on sequence x-types.
656 </p>
657
658 </box>
659
660 <box title="X-patterns" link="patterns">
661
662 <p>
663 X-patterns follow the same syntax as X-types. In particular,
664 any X-type is a valid X-pattern. In addition to X-types constructors,
665 X-patterns can have:
666 </p>
667
668 <ul>
669 <li>capture variables (lowercase OCaml identifiers);</li>
670 <li>constant bindings <code>(x := c)</code> where x is a capture
671 variable and c is
672 a literal x-constant (this pattern always succeeds and returns the
673 binding x->c).</li>
674 </ul>
675
676 <p>
677 Here is a brief description of the semantics of patterns. Given
678 an input value, a pattern can either succeed or fail. If it succeeds,
679 it also produces a binding from the capture variables in the pattern
680 to x-values.
681 </p>
682
683 <ul>
684
685 <li>A pattern which is just a type (no capture variable) succeeds if
686 and only if the value has the type.</li>
687
688 <li>A pattern <code>p1 | p2</code> succeeds if either <code>p1</code>
689 or <code>p2</code> succeeds, and returns the corresponding binding; if
690 both patterns succeed, <code>p1</code> wins. It is required that
691 <code>p1</code> and <code>p2</code> have the same sets of capture
692 variables. </li>
693
694 <li>A pattern <code>p1 &amp; p2</code> succeeds if both <code>p1</code>
695 and <code>p2</code> succeed, and returns the concatenation of the two
696 bindings. It is required that <code>p1</code> and <code>p2</code> have
697 <em>disjoint</em> sets of capture variables. </li>
698
699 </ul>
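
<p>
The success/failure semantics listed above can be modelled with small
Python combinators. In this sketch (all names are hypothetical), a
pattern is a function that takes a value and returns a dictionary of
bindings on success or <code>None</code> on failure. The constraints on
capture variables (same sets for alternation, disjoint sets for
conjunction) are assumed rather than checked.
</p>

```python
def is_int(v):
    """Stand-in for the x-type Int."""
    return isinstance(v, int)


def is_type(check):
    """A pattern that is just a type: succeeds with no bindings."""
    return lambda v: {} if check(v) else None


def capture(name):
    """A capture variable: always succeeds and binds the value."""
    return lambda v: {name: v}


def const(name, c):
    """Model of (x := c): always succeeds and binds name to the constant."""
    return lambda v: {name: c}


def alt(p1, p2):
    """Model of alternation: first match, p1 wins when both succeed."""
    def match(v):
        b = p1(v)
        return b if b is not None else p2(v)
    return match


def conj(p1, p2):
    """Model of conjunction: both must succeed; the bindings are merged."""
    def match(v):
        b1 = p1(v)
        if b1 is None:
            return None
        b2 = p2(v)
        if b2 is None:
            return None
        return {**b1, **b2}  # capture sets are disjoint by assumption
    return match


# Bind x to the value if it is an integer, and to 0 otherwise.
p = alt(conj(is_type(is_int), capture("x")), const("x", 0))
print(p(5))
print(p("text"))
```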
700
701 <p>
702 In record x-patterns, it is possible to omit the <code>=p</code> part
703 of a field. The content is then replaced with the label name
704 considered as a capture variable (or as a previously defined type).
705 E.g. <code>{ x y=p }</code> is
706 equivalent to <code>{ x=x y=p }</code>.</p>
707
708 <p>It is also possible to add an "else" clause:
709 <code>{ x = (a,_)|(a:=3) }</code>
710 will accept any record with at most the field <code>x</code>. If the content
711 is a pair, the capture variable a will be bound to its first component;
712 otherwise, it is set to <code>3</code>.</p>
713
714 <p>
715 In regular expressions, it is possible to extract whole subsequences
716 with the notation <code>x::R</code>, e.g.: <code>[ _* x::Int+ _* ]</code>
717 </p>
718
719 <p>
720 If the same sequence capture variable appears several times (or below a
721 repetition) in a regexp, it is bound to the concatenation of all
722 matched subsequences. E.g.: <code>[ (x::Int | _)* ]</code> will
723 collect in <code>x</code> all the elements of type <code>Int</code> from
724 a sequence. It is not legal to have repeated simple capture variables.
725 </p>
726
727 <p>
728 The regexp operators <code>+,*,?</code> are greedy by default (they match as long
729 as possible). They admit non-greedy variants <code>+?,*?,??</code>.
730 </p>
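
<p>
As an analogy only: Python's <code>re</code> module, which works on
character strings rather than on x-values, uses the same operator
spelling for greedy and non-greedy repetition. The sketch below shows
how the choice changes the way an input is split between two adjacent
parts of a regular expression; it is not OCamlDuce code.
</p>

```python
import re

text = "123abc"

# Greedy: the digit group takes as many characters as it can.
greedy = re.match(r"(\d+)(.*)", text).groups()

# Non-greedy: the digit group takes as few characters as it can.
lazy = re.match(r"(\d+?)(.*)", text).groups()

print(greedy)  # ('123', 'abc')
print(lazy)    # ('1', '23abc')
```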
731 </box>
732
733 <box title="Namespace bindings" link="ns">
734
735 <p>
736 The binding of namespace prefixes to URIs
737 can be done either by toplevel phrases (structure items) or
738 by local declarations:
739 </p>
740
741 <sample>{{ON}}
742 # {{ namespace ns = "http://..." }};;
743 # let x = {{ `ns: x }};;
744 val x : {{`ns:x}} = {{`ns:x}}
745 # let x = {{ let namespace ns = "http://..." in `ns:x }};;
746 val x : {{`ns:x}} = {{`ns:x}}
747 </sample>
748
749 <p>The toplevel definitions can also appear in module interfaces
750 (signatures). A toplevel prefix binding is not exported by a module: its scope
751 is limited to the current structure or signature. It is possible
752 to specify a default namespace, and to reset it:
753 </p>
754
755 <sample>{{ON}}
756 # {{ namespace "http://..." }};;
757 # {{ `x }};;
758 - : {{`ns1:x}} = {{`ns1:x}}
759 # {{ namespace "" }};;
760 # {{ `x }};;
761 - : {{`x}} = {{`x}}
762 </sample>
763
764 <p>
765 Note that the value pretty-printer invented some prefix
766 for the namespace URI. The default namespace declaration also has a
767 local form <code> let namespace "..." in ... </code>.
768 </p>
769
770 </box>
771
772 <box title="More on type-checking" link="typecheck">
773
774 <section title="Type inference">
775
776 <p>
777 As we said above, the programmer is sometimes required to provide type
778 annotations. To know where to put these annotations, it is necessary to
779 get a basic understanding of how type-checking works.
780 </p>
781
782 <p>
783 The OCaml type-checker is run first to detect which sub-expressions
784 are of the x-kind. A second ML type-checking pass is then done to
785 introduce subsumption (implicit subtyping) steps where allowed. After
786 these two passes, the OCamlDuce type checker obtains a data-flow summary of
787 x-values in the whole compilation unit. This is a directed graph,
788 whose edges represent either simple data flow or complex operations
789 on x-values. The nodes of the graph can be thought of as x-type
790 variables. A data-flow edge corresponds to a subtyping constraint,
791 and an operation edge corresponds to a symbolic constraint which
792 mimics the corresponding operation on values.
793 </p>
794
795 <p>
796 Some of the nodes are given an explicit type by the programmer,
797 through type annotations (on expressions or function arguments)
798 or the other usual mechanisms in ML (data type declarations,
799 signatures, ...).
800 </p>
801
802 <p>
803 Also, if there is a loop with only subtyping edges in the graph,
804 all the nodes on the loop are merged together.
805 </p>
806
807 <p>
808 After this operation, the graph is required to be acyclic (assuming
809 that the nodes with an explicit type are removed from the graph). It
810 is the responsibility of the programmer to provide enough type
811 annotations to achieve this property. Otherwise, a type error
812 is issued.
813 </p>
814
815 <sample><![CDATA[{{ON}}
816 # let rec f x = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
817 Cycle detected: cannot type-check
818 # let rec f x : {{ String }} = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
819 val f : int -> {{String}} = <fun>]]>
820 </sample>
821
822 <p>
823 In the example above, there is a cycle between the result type for
824 <code>f</code> and the type for the sub-expression <code>{{ON}}f
825 {{n-1}}</code>. It is here broken with a type annotation on the result; it could
826 have been broken by a type annotation on the expression <code>{{ON}}f
827 {{n-1}}</code>, or on the function <code>f</code> itself, or by a
828 module signature.
829 </p>
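
<p>
The acyclicity requirement can be illustrated with a short Python
sketch. It is a simplified model, not the type checker's algorithm: the
graph is assumed to be what remains after loops made only of subtyping
edges have been merged, edge kinds are ignored, and the node names
(<code>result_f</code>, <code>concat</code>, <code>call_f</code>) are
an invented summary of the recursive function <code>f</code> above.
</p>

```python
def has_cycle(edges, annotated):
    """Return True if a cycle remains once annotated nodes are removed.

    edges maps a node to the list of its successors; annotated is the set
    of nodes that carry an explicit type.
    """
    graph = {
        n: [m for m in succs if m not in annotated]
        for n, succs in edges.items()
        if n not in annotated
    }
    white, grey, black = 0, 1, 2
    colour = {}

    def visit(n):
        colour[n] = grey
        for m in graph.get(n, []):
            state = colour.get(m, white)
            if state == grey:
                return True  # back edge: a cycle has been found
            if state == white and visit(m):
                return True
        colour[n] = black
        return False

    return any(colour.get(n, white) == white and visit(n) for n in graph)


# Hypothetical dependencies: the result of f depends on the concatenation,
# which depends on the recursive call, which depends on the result of f.
edges = {"result_f": ["concat"], "concat": ["call_f"], "call_f": ["result_f"]}

print(has_cycle(edges, set()))         # no annotation
print(has_cycle(edges, {"result_f"}))  # annotation on the result
print(has_cycle(edges, {"call_f"}))    # annotation on the recursive call
```

<p>
Without annotation the model reports a cycle; annotating either the
result or the recursive call removes a node from the loop and breaks it.
</p>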
830
831 <p>
832 Let us study another simple example:
833 </p>
834
835 <sample>{{ON}}
836 # let f x = {{ x + 1 }} in f {{ 2 }}, f {{ 3 }};;
837 - : {{3--4}} * {{3--4}} = ({{3}}, {{4}})
838 </sample>
839
840 <p>
841 The type checker detects that the two x-values <code>2</code> and
842 <code>3</code> can flow to the argument of <code>f</code>. Its body
843 is thus type-checked with the assumption that <code>x</code> has type
844 <code>2--3</code>. The computed result type is then <code>3--4</code>.
845 </p>
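
<p>
The arithmetic behind this example can be replayed with a small Python
model in which an integer x-type is approximated by a single interval
<code>(lo, hi)</code>. The helper names are hypothetical, and the
approximation is an assumption of the sketch: a real union of x-types
need not be an interval, although it is one here.
</p>

```python
def join(intervals):
    """Smallest interval containing every input interval (lo, hi)."""
    return (min(lo for lo, _ in intervals), max(hi for _, hi in intervals))


def add_const(interval, k):
    """Result interval of adding the constant k to any value of the interval."""
    lo, hi = interval
    return (lo + k, hi + k)


# The singletons 2 and 3 both flow to the argument x of f.
arg_type = join([(2, 2), (3, 3)])
# The body of f computes x + 1.
result_type = add_const(arg_type, 1)

print(arg_type)     # models 2--3
print(result_type)  # models 3--4
```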
846
847
848 <p>
849 The type-inference process described above is global by nature. The
850 acyclicity condition is only imposed after a whole compilation unit
851 has been type-checked by OCaml (and the information from the module
852 interface has been integrated). When a type variable is inferred to
853 be of the x-kind, it is never generalized. As a consequence, there
854 is no parametric polymorphism on x-types.
855 </p>
856
857 <p>
858 In the toplevel, type-checking is done after each phrase. Consider
859 the following session:
860 </p>
861
862 <sample><![CDATA[{{ON}}
863 # let f x = {{ x + 1 }};;
864 val f : {{Empty}} -> {{Empty}} = <fun>
865 # let a = f {{ 2 }};;
866 Subtyping failed 2 <= Empty
867 Sample:
868 2
869 ]]></sample>
870
871 <p>
872 The function <code>f</code> is inferred to have type
873 <code>{{ON}}{{Empty}} -> {{Empty}}</code> because when the first
874 phrase is type-checked, the data-flow graph says that no value
875 can flow to <code>x</code>, and thus the input type is empty
876 (and similarly for the result type). If the two phrases
877 were type-checked together (which would be the case if they had
878 been compiled by the compiler, not in the toplevel), the type checker
879 would have correctly inferred that the input type for <code>f</code>
880 must contain <code>2</code>.
881 </p>
882
883 </section>
884
885 <section title="Implicit subtyping">
886
887 <p>
888 Coercion from an x-type to a supertype is automatic in OCamlDuce.
889 However, this automatic subsumption does not carry over to OCaml
890 type constructors, even if they are covariant. Consider:
891 </p>
892
893 <sample><![CDATA[{{ON}}
894 # let f (x : {{ Int }} * {{ Int }}) = 1;;
895 val f : {{Int}} * {{Int}} -> int = <fun>
896 # let g (x : {{ 0 }} * {{ 0 }}) = f x;;
897 This expression has type {{0}} * {{0}} but is here used with type
898 {{Int}} * {{Int}}
899 # let g (x : {{ 0 }} * {{ 0 }}) = let a,b = x in f (a,b);;
900 val g : {{0}} * {{0}} -> int = <fun>
901 # let g (x : {{ 0 }} * {{ 0 }}) = f (x :> {{ Int }} * {{ Int }});;
902 val g : {{0}} * {{0}} -> int = <fun>
903 ]]></sample>
904
905 <p>
906 The first attempt to define <code>g</code> fails because the type for
907 <code>x</code> is not an x-type and thus subsumption does not
908 apply. In the second attempt, we extract the two components of the
909 pair; since they are inferred to be x-values, subtyping applies to
910 both of them. Thus, when the pair <code>(a,b)</code> is reconstructed,
911 it is legal to unify its type with the input type of <code>f</code>.
912 The third definition for <code>g</code> gives an alternative solution:
913 using explicit OCaml type coercions.
914 </p>
915
916 </section>
917
918 </box>
919
920 <box title="Exchanging values" link="transl">
921
922 <p>
923 OCamlDuce strongly separates regular OCaml values from the new
924 x-values. They have different syntax, expressions, types, patterns,
925 and even type-checking algorithms. This strong segregation is the key point
926 that allowed a simple integration between very different type
927 systems.
928 </p>
929
930 <p>
931 At some point, it is still necessary to cross the frontier and
932 translate OCaml values to x-values or the opposite.
933 </p>
934
935 <p>
936 Fortunately, OCamlDuce provides automatic translations in both
937 directions. Instead of double curly braces, you can
938 enclose x-expressions in curly brace+colon <code>{: ... :}</code>
939 (here, the <code>...</code> is an x-expression).
940 The effect is to translate the result of the x-expression
941 (which must be an x-value) to an OCaml value. Similarly,
942 in an x-expression, you can obtain the x-translation of
943 an OCaml value with the same syntax <code>{: ... :}</code>
944 (here, the <code>...</code> is an OCaml expression).
945 </p>
946
947 <p>
948 Here is how the translation works. To each OCaml type <code>t</code>,
949 we associate an x-type <code>T(t)</code> and a pair of translation
950 functions between <code>t</code> and <code>T(t)</code>.
951 Not all OCaml types are supported, however. For instance,
952 free type variables, abstract types, object types, and non-regular
953 recursive types cannot be translated. In particular, since
954 type variables are not allowed, the OCaml type must be fully known.
955 </p>
956
957 <p>
958 The translation for an OCaml type <code>t</code> is defined by structural
959 induction on <code>t</code>. Sum types are
960 translated to union types: a constant constructor <code>A</code> is
961 translated to the qualified name <code>`A</code>; a non-constant
962 constructor <code>A of t1 * ... * tn</code> is translated to
963 <code>&lt;A>[ T(t1) ... T(tn) ]</code>. Closed polymorphic variants
964 have the same translation. Record types are translated to closed
965 record x-types. Some other translations:
966 </p>
967
968 <table border="1">
969 <tr><th>Caml type t</th> <th>X-type T(t)</th></tr>
970 <tr><td><code>int</code></td> <td><code>Int</code></td></tr>
971 <tr><td><code>int32</code></td> <td><code>Int32</code></td></tr>
972 <tr><td><code>int64</code></td> <td><code>Int64</code></td></tr>
973 <tr><td><code>string</code></td> <td><code>Latin1</code></td></tr>
974 <tr><td><code>t list</code></td> <td><code>[T(t)*]</code></td></tr>
975 <tr><td><code>t array</code></td> <td><code>[T(t)*]</code></td></tr>
976 <tr><td><code>unit</code></td> <td><code>[]</code></td></tr>
977 <tr><td><code>char</code></td> <td><code>Latin1Char</code></td></tr>
978 <tr><td><code>{{t}}</code></td> <td><code>t</code></td></tr>
979 </table>
980
981 <p>
982 Here is an example:
983 </p>
984
985 <sample>{{ON}}
986 # let f (x : {{ Int }}) = {{ x + 1 }} in List.map f {: [ 1 2 3 ] :};;
987 - : {{Int}} list = [{{2}}; {{3}}; {{4}}]
988 </sample>
989
990 <p>
991 In this example, the result type of the translation is inferred
992 to be <code>{{ON}}{{ Int }} list</code> (because the type for
993 <code>f</code> is given). The corresponding x-type
994 is <code>{{ON}}{{ [Int*] }}</code>.
995 </p>
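
<p>
The structural induction described above can be sketched as a small
recursive function. This is only an illustration in plain OCaml, not the
actual OCamlDuce implementation: the <code>ty</code> datatype and the
functions <code>xtype</code>, <code>atom</code> and <code>xcase</code>
are hypothetical names, and the function merely prints the x-type
<code>T(t)</code> for a small subset of OCaml types (no records, no
polymorphic variants).
</p>

<sample><![CDATA[
(* Hypothetical model of a few OCaml types. *)
type ty =
  | TInt | TInt32 | TInt64 | TString | TChar | TUnit
  | TList of ty
  | TArray of ty
  | TSum of (string * ty list) list  (* constructor name, argument types *)

(* Print the x-type T(t), following the rules and the table above. *)
let rec xtype = function
  | TInt -> "Int"
  | TInt32 -> "Int32"
  | TInt64 -> "Int64"
  | TString -> "Latin1"
  | TChar -> "Latin1Char"
  | TUnit -> "[]"
  | TList t | TArray t -> "[" ^ atom t ^ "*]"
  | TSum cases -> String.concat " | " (List.map xcase cases)

(* Parenthesize unions when they appear inside a sequence type. *)
and atom t =
  match t with
  | TSum (_ :: _ :: _) -> "(" ^ xtype t ^ ")"
  | _ -> xtype t

(* One constructor of a sum type: constant or with arguments. *)
and xcase (name, args) =
  match args with
  | [] -> "`" ^ name
  | _ -> "<" ^ name ^ ">[ " ^ String.concat " " (List.map atom args) ^ " ]"

let () =
  (* type t = A | B of int * char *)
  print_endline (xtype (TSum [ ("A", []); ("B", [ TInt; TChar ]) ]));
  (* string list *)
  print_endline (xtype (TList TString))
]]></sample>

<p>
Under these assumptions, the declaration <code>type t = A | B of int *
char</code> is printed as <code>`A | &lt;B>[ Int Latin1Char ]</code>,
and <code>string list</code> as <code>[Latin1*]</code>.
</p>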
996
997 </box>
998
999 <box title="The standard library" link="stdlib">
1000
1001 <p>
1002 In OCamlDuce, the Num library from OCaml is included in the standard
1003 library. In addition, there are two new modules called
1004 <code>Ocamlduce</code> and <code>Cduce_types</code> in the standard library.
1005 </p>
1006
1007 <p>
1008 The module <code>Cduce_types</code> gives access to the internal
1009 representation of x-values. It is currently undocumented.
1010 </p>
1011
1012 <p>
1013 The module <code>Ocamlduce</code> provides useful
1014 functionality for x-values. See the <a href="http://yquem.inria.fr/~frisch/ocamlcduce/doc/ocamlduce/Ocamlduce.html">ocamldoc</a>-generated
1015 documentation for a description of its interface.
1016 </p>
1017
1018 </box>
1019
1020 <box title="Marshaling" link="marshal">
1021
1022 <p>
1023 OCamlDuce uses some tricks in its internal representation of x-values
1024 to reduce memory usage and improve performance. You need to pay
1025 special attention if you want to use OCaml serialization functions
1026 (module <code>Marshal</code>, functions
1027 <code>input_value/output_value</code>) on x-values. In addition to
1028 your values, you also need to save and restore some internal data
1029 using the functions <code>Cduce_types.Value.extract_all</code> and
1030 <code>Cduce_types.Value.intract_all</code>. Of course, this also
1031 applies if the value to be serialized contains deeply nested x-values.
1032 </p>
1033
1034 <p>
1035 Here are generic
1036 serialization/deserialization functions that illustrate how to do it:
1037 </p>
1038
1039 <sample>
1040 let my_output_value oc v =
1041   let p = Cduce_types.Value.extract_all () in  (* internal data to save with v *)
1042   output_value oc (p,v)
1043
1044 let my_input_value ic =
1045   let (p,v) = input_value ic in
1046   Cduce_types.Value.intract_all p;  (* restore the internal data first *)
1047   v
1048 </sample>
1049
1050 </box>
1051
1052 <box title="Performance" link="perf">
1053
1054 <section title="Strings">
1055
1056 <p>
1057 OCaml users might be surprised by the fact that x-strings are simply
1058 represented as sequences in OCamlDuce. Does this mean that they are
1059 actually stored in memory as linked lists? Certainly not! The internal
1060 representation of sequence values uses several tricks to improve
1061 performance and memory usage. In particular, a special form in the
1062 representation can store strings as byte buffers, as in OCaml.
1063 If an XML document is loaded, or if a Caml string is converted
1064 to an x-value, this compact representation will be used.
1065 </p>
1066
1067 </section>
1068
1069 <section title="Concatenation">
1070
1071 <p>
1072 Similarly, OCaml users might be reluctant to use the sequence
1073 concatenation <code>@</code> on sequences. In OCaml, the complexity
1074 of this operator is linear in the size of its first argument (which
1075 needs to be copied). OCamlDuce uses a special form in its internal
1076 representation to store concatenation in a lazy way. The concatenation
1077 is actually computed only when the value is accessed. This means
1078 that it's perfectly ok to build a long sequence by adding
1079 new elements at the end one by one, as long as you don't
1080 simultaneously inspect the sequence.
1081 </p>
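
<p>
The idea of lazy concatenation can be sketched in plain OCaml as
follows. This is a simplified model under our own assumptions, not the
actual internal representation used by OCamlDuce; the names
<code>seq</code>, <code>append</code>, <code>to_list</code> and
<code>build</code> are hypothetical. Concatenation only records its two
operands, and the real work is done once, when the sequence is
inspected.
</p>

<sample><![CDATA[
(* A sequence is empty, a cons cell, or a suspended concatenation. *)
type 'a seq =
  | Nil
  | Cons of 'a * 'a seq
  | Concat of 'a seq * 'a seq

(* Constant-time concatenation: just remember both operands. *)
let append a b = Concat (a, b)

(* Inspect the sequence: flatten it in one linear, tail-recursive pass.
   [pending] holds the right-hand operands that remain to be visited. *)
let to_list s =
  let rec go acc s pending =
    match s, pending with
    | Nil, [] -> List.rev acc
    | Nil, next :: rest -> go acc next rest
    | Cons (x, tl), _ -> go (x :: acc) tl pending
    | Concat (a, b), _ -> go acc a (b :: pending)
  in
  go [] s []

(* Build 1, 2, ..., n by adding each element at the end. *)
let build n =
  let rec loop i s =
    if i > n then s else loop (i + 1) (append s (Cons (i, Nil)))
  in
  loop 1 Nil

let () = assert (to_list (build 5) = [1; 2; 3; 4; 5])
]]></sample>

<p>
In this model, building the sequence costs one constant-time step per
element and the final traversal is linear, whereas repeatedly using
<code>@</code> on OCaml lists in the same way would copy the growing
first argument at every step.
</p>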
1082
1083 </section>
1084
1085 <section title="Pattern matching">
1086
1087 <p>
1088 Another point which is worth knowing when programming in OCamlDuce
1089 is that patterns can be written in a declarative style without
1090 affecting performance. The compiler uses static type information
1091 about matched values to produce efficient code for pattern matching.
1092 To illustrate this, consider the following sample:
1093 </p>
1094
1095 <sample><![CDATA[{{ON}}
1096 x.ml:
1097
1098 type a = {{ <a>[ a* ] }}
1099 type b = {{ <b>[ b* ] }}
1100
1101 let f : {{ a|b }} -> int = function {{ a }} -> 0 | {{ _ }} -> 1
1102 ]]></sample>
1103
1104 <sample><![CDATA[{{ON}}
1105 y.ml:
1106
1107 type a = {{ <a>[ a* ] }}
1108 type b = {{ <b>[ b* ] }}
1109
1110 let f : {{ a|b }} -> int = function {{ <a>_ }} -> 0 | {{ _ }} -> 1
1111 ]]></sample>
1112
1113 <p>
1114 The two functions have exactly the same semantics, but the first
1115 implementation is more declarative: it uses type checks to distinguish
1116 between <code>a</code> and <code>b</code> instead of saying
1117 <em>how</em> to distinguish between these two types. Imagine
1118 that the definitions of these types change to:
1119 </p>
1120
1121 <sample><![CDATA[{{ON}}
1122 type a = {{ <x kind="a">[ a* ] }}
1123 type b = {{ <x kind="b">[ b* ] }}
1124 ]]></sample>
1125
1126 <p>
1127 Then the first implementation still works as expected, but the
1128 second one needs to be rewritten.</p>
1129
1130 <p>Now one might believe that the second implementation is more
1131 efficient because it tells the compiler to check only the root tag,
1132 whereas the first implementation would force
1133 the compiler to produce code to check that all tags in the tree
1134 are <code>a</code>s. But this is not what happens! Actually,
1135 you can check that the compiler will produce exactly the same code
1136 for both implementations. It considers the static type information
1137 about the argument of the pattern matching (here, the input type
1138 of the function), and computes an efficient way to evaluate
1139 patterns for the values of this type.
1140 </p>
1141
1142 </section>
1143
1144 <section title="The map iterator">
1145
1146 <p>
1147 The <code>map ... with ...</code> iterator is implemented in a
1148 tail-recursive way. You can safely use it on very long sequences.
1149 </p>
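
<p>
For comparison, the <code>List.map</code> function of the OCaml release
on which OCamlDuce is based is not tail-recursive, so OCaml programmers
often write a variant like the one below for long lists. This is a
plain OCaml sketch given only to illustrate what tail recursion buys;
<code>map_tr</code> and <code>range</code> are hypothetical names and
the code says nothing about how the <code>map ... with ...</code>
iterator is actually implemented.
</p>

<sample><![CDATA[
(* Tail-recursive map: List.rev_map runs in constant stack space, and
   a final List.rev restores the original order. *)
let map_tr f l = List.rev (List.rev_map f l)

let () =
  (* Build the list 1, 2, ..., 1_000_000 with an accumulator. *)
  let rec range acc i = if i = 0 then acc else range (i :: acc) (i - 1) in
  let long = range [] 1_000_000 in
  assert (List.length (map_tr succ long) = 1_000_000)
]]></sample>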
1150
1151 </section>
1152
1153 </box>
1154
1155 <box title="OCaml and OCamlDuce" link="ocaml">
1156
1157 <p>
1158 Since the 3.08.4 release, OCamlDuce is binary compatible with the corresponding
1159 OCaml release. This means that OCamlDuce can use OCaml-generated
1160 <tt>.cmi</tt> files and that it produces an OCaml-compatible
1161 <tt>.cmi</tt> file if the interface does not use any x-type
1162 (this file is equal to what would have been obtained by using OCaml).
1163 </p>
1164
1165 <p>
1166 It is thus possible to use existing libraries which were compiled for
1167 OCaml 3.08.4. It is also possible to use OCamlDuce to compile
1168 some modules and use them in an OCaml project provided their interface
1169 is pure OCaml.
1170 </p>
1171
1172
1173 </box>
1174
1175 <box title="Code samples" link="code">
1176
1177 <section title="Parsing XML files">
1178
1179 <p>
1180 OCamlDuce does not come with any built-in XML parser. However,
1181 the <a href="http://yquem.inria.fr/~frisch/ocamlcduce/doc/ocamlduce/Ocamlduce.Load.html"><code>Ocamlduce.Load</code></a> module in the standard library
1182 makes it easy to plug in existing XML parsers. Here is some
1183 code that demonstrates how to do that with three of
1184 the most popular OCaml XML parser libraries:
1185 </p>
1186
1187 <ul>
1188 <li><a
1189 href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/pxp/">PXP</a></li>
1190 <li><a
1191 href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/expat/">Expat</a></li>
1192 <li><a href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/xmllight/">Xml-light</a></li>
1193 </ul>
1194
1195 </section>
1196
1197 <section title="Converting DTD to OCamlDuce types">
1198
1199 <p>
1200 This <a href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/dtd2types/">tool</a> produces a set of OCamlDuce type declarations
1201 from a DTD. It requires PXP.
1202 </p>
1203
1204 <note>This application does not use any of the new features, but it
1205 can be useful in the development of OCamlDuce applications.
1206 </note>
1207
1208 </section>
1209
1210 <section title="Parsing XML Schema, producing valid XHTML output">
1211
1212 <p>
1213 This <a
1214 href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/schema/">application</a>
1215 parses XML Schema Definitions (.xsd files), and produces summaries
1216 (toplevel declaration names) in XHTML. The OCamlDuce type system ensures
1217 that the parser is coherent with the input XML type (any valid XML
1218 Schema is accepted) and that the printer is coherent with the output
1219 XML type (it is necessarily a valid XHTML document).
1220 </p>
1221
1222 <p>
1223 Of course, for such a simple transformation, parsing the XML document
1224 into an internal representation is not necessary. A direct XML-to-XML
1225 transformation would be easy to write. We wanted to illustrate
1226 a complex parsing of XML.
1227 </p>
1228
1229 <p>
1230 It is interesting to introduce errors in the parser
1231 <code>schema_loader.ml</code> or the printer
1232 <code>dump_schema.ml</code> and see how the type system catches them.
1233 </p>
1234
1235 <note>
1236 The application uses XML Light to parse XML documents.
1237 </note>
1238
1239 <note>
1240 Some features of XML Schema are not parsed, such as
1241 <code>redefine</code> elements or substitution groups.
1242 </note>
1243
1244 </section>
1245
1246 <section title="String regular expressions">
1247
1248 <p>
1249 OCamlDuce supports regular expression types and patterns, not only
1250 for sequences of XML elements, but also for strings. The following
1251 example shows how to use regular expressions to split a string
1252 of the form <code>name1=val1,...,namen=valn</code> with
1253 <code>n>0</code> into
1254 a list of pairs <code>[ (name1,val1); ...; (namen,valn) ]</code>.
1255 The <code>*?</code> operator in regular expressions means "non-greedy
1256 match" (match the shortest possible subsequence). The last
1257 pattern describes precisely strings which are not matched by
1258 the other cases. It would be possible to replace it with
1259 the wildcard <code>_</code>.
1260 </p>
1261
1262 <sample><![CDATA[{{ON}}
1263 let rec split (s : {{ String }}) =
1264   match s with
1265   | {{ [ n::_*? '=' v::_*? ',' rest::_* ] }} -> (n,v)::(split rest)
1266   | {{ [ n::_*? '=' v::_*? ] }} -> [ (n,v) ]
1267   | {{ Any - [ _* '=' _* ] }} -> failwith "split"
1268 ]]></sample>
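
<p>
To appreciate what the patterns express, here is a sketch of the same
splitting logic written by hand in plain OCaml, on ordinary
<code>string</code> values rather than x-strings. The function name
<code>split_plain</code> is hypothetical; the code mirrors the three
cases above with explicit index arithmetic: find the first
<code>=</code>, then the first <code>,</code> after it.
</p>

<sample><![CDATA[
let rec split_plain s =
  let len = String.length s in
  (* No '=' at all: corresponds to the last pattern above. *)
  let eq = try String.index s '=' with Not_found -> failwith "split" in
  let name = String.sub s 0 eq in
  match (try Some (String.index_from s (eq + 1) ',') with Not_found -> None) with
  | None ->
      (* Last pair: the value extends to the end of the string. *)
      [ (name, String.sub s (eq + 1) (len - eq - 1)) ]
  | Some c ->
      (* The value stops at the first ','; recurse on the rest. *)
      (name, String.sub s (eq + 1) (c - eq - 1))
      :: split_plain (String.sub s (c + 1) (len - c - 1))

let () = assert (split_plain "a=1,b=2" = [ ("a", "1"); ("b", "2") ])
]]></sample>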
1269
1270 </section>
1271
1272 </box>
1273
1274 <box title="Applications in OCamlDuce" link="appli">
1275
1276 <ul>
1277 <li><a
1278 href="http://anil.recoil.org/projects/review2atom.html">Review2Atom</a>
1279 by Anil Madhavapeddy: translates paper review files in XML format into
1280 an Atom feed suitable for aggregation.
1281 </li>
1282 </ul>
1283
1284 </box>
1285
1286
1287 </page>
