
Contents of /web/ocaml.xml



Revision 1877
Tue Jul 10 19:28:43 2007 UTC (5 years, 10 months ago) by abate
File MIME type: text/xml
File size: 43377 byte(s)
[r2006-05-10 22:04:12 by afrisch] Empty log message

Original author: afrisch
Date: 2006-05-10 22:04:12+00:00
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<page name="ocaml">

<title>OCamlDuce</title>

<left>
<local-links href="index,documentation"/>
<p>On this page:</p>
<boxes-toc/>
</left>

<box>

<p>
OCamlDuce is a merger between <a
href="http://caml.inria.fr/">OCaml</a> and
<local href="index">CDuce</local>. It comes as a modified
version of OCaml that integrates CDuce features: XML expressions,
regular expression types and patterns, and iterators.
</p>

<p>
OCamlDuce is distributed under the Q Public License version 1.0.
</p>

<ul>
<li>A <a
href="http://cristal.inria.fr/~frisch/ocamlcduce/ocamlduce.pdf">technical
report</a> describes the theory behind OCamlDuce's type system (to be
presented at PLAN-X 2006).</li>
<li><local href="ocaml_install">How to get OCamlDuce:</local> download,
installation instructions, packages.</li>
<li><local href="ocaml_manual">User's manual</local>.</li>
<li><local href="ocaml_code">Code samples and
applications</local>.</li>
<li><local href="mailing">Mailing lists</local>.</li>
</ul>

</box>

<page name="ocaml_install">
<title>Getting OCamlDuce</title>

<box title="Download and installation" link="install">

<p>
Currently, OCamlDuce
is based on OCaml 3.09.2 and CDuce 0.4.0.
</p>

<ul>
<li><a
href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/ocamlduce-3.08.4pl5.tar.gz">Compiler,
version 3.08.4, patch level 5</a> (to be used with OCaml 3.08.4)</li>
<li><a
href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/ocamlduce-3.09.1pl1.tar.gz">Compiler,
version 3.09.1</a> (to be used with OCaml 3.09.1)</li>
<li><a
href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/ocamlduce-3.09.2.tar.gz">Compiler,
version 3.09.2</a> (to be used with OCaml 3.09.2)</li>
</ul>

<p>
The following describes the installation procedure for the
3.09.2 release.
OCamlDuce is installed on top of an existing OCaml
installation (whose version number must match), and it requires
a recent version of findlib. The build procedure
is: <tt>make all &amp;&amp; make opt &amp;&amp; make
install</tt>. The configuration is taken from OCaml's
<tt>Makefile.config</tt>.
</p>

<p>
The tools are named <tt>ocamlduce, ocamlducec, ocamlduceopt,
ocamlducedep, ocamlducemktop, ocamlducefind</tt>.
They are installed in the same directory as the OCaml compiler itself.
</p>

<p>
In addition, a library called <tt>ocamlduce.cma/.cmxa</tt> is built.
It depends on the <tt>nums</tt> library. A findlib package named
<tt>ocamlduce</tt> is created by the <tt>make install</tt> target.
Normally, you do not need to care about this package, unless you
insist on linking your modules with the regular OCaml compilers
(rather than the OCamlDuce ones), and there is no good reason to do so.
</p>

<p>
To generate the ocamldoc documentation for the <tt>Ocamlduce</tt>
module, run <tt>make htdoc</tt>.
</p>

<section title="Compiling, linking, calling the toplevel">

<p>Starting from OCamlDuce 3.09.2, you no longer need to struggle with
extra command-line options. Simply use the OCamlDuce tools:</p>

<sample>
{{Call the toplevel:}} ocamlduce
{{Compile:}} ocamlducec -c x.ml
{{Link:}} ocamlducec -o x x.cmo
{{Use ocamlfind:}} ocamlducefind ocamlc -o x -linkpkg -package pcre x.ml
</sample>

</section>


<section title="Building from CVS">

<p>
The following commands check out the current development version of
OCamlDuce from the OCaml and CDuce CVS repositories:
</p>

<sample>
cvs -f -d ":pserver:anoncvs@camlcvs.inria.fr:/caml" co -r cducetrunk ocaml
cvs -f -d ":pserver:anonymous@cvs.cduce.org:/cvsroot" co cduce
(cd ocaml/cduce; make link)
</sample>

</section>

</box>

<box title="Ports and packages" link="ports">

<section title="GODI">

<p>
There is a <tt>godi-ocamlduce</tt> package available in GODI
(sections 3.08 and 3.09).
</p>

</section>

<section title="DarwinPorts and OpenBSD">

<p>
Anil Madhavapeddy contributed ports of OCamlDuce for DarwinPorts
(in <tt>dports/lang/ocamlduce</tt>) and for OpenBSD (in <tt>ports/lang/ocamlduce</tt>).
</p>

</section>

</box>

</page>

<page name="ocaml_manual">
<title>OCamlDuce: manual</title>

<box title="Overview" link="overview">

<p>
The goal of the OCamlDuce project is to extend the OCaml language with features
that make it easier to write safe and efficient complex applications
that need to deal with XML documents. In particular, it relies
on types and patterns to guarantee statically
that all possible input documents are correctly processed, and
that only valid output documents are produced.
</p>

<p>
In a nutshell, OCamlDuce extends OCaml with a new kind of values
(<em>x-values</em>) to represent XML documents, fragments, tags, and Unicode
strings. In order to describe these values, it also extends the type algebra
with so-called <em>x-types</em>. The philosophy behind these types is that they
represent <em>sets of x-values</em>. They can be very precise: indeed,
each value can be seen as a singleton type (a set with a single
value), and it is possible to form Boolean combinations of x-types
(intersection, union, difference).
</p>

<p>
OCamlDuce's type system can be understood as a refinement of OCaml's.
For each sub-expression that is inferred to be of the x-kind (using
OCaml's unification-based type system), OCamlDuce tries to infer the
best possible sound x-type. Here, best means smallest for the natural
subtyping relation (set inclusion). The inference algorithm is
actually a data-flow analysis: the x-type collects all the values
that can be produced by the expression, considering all the possible
data flows in the program. It is sometimes necessary to provide
explicit type annotations to help the type checker infer this type, in
particular when you define recursive functions or when you use
iterators.
</p>

<p>
Subtyping is implicit for x-types: if an expression is inferred to be
of x-type <code>t</code>, which is a subtype of <code>s</code>, then
it is possible to use this expression in any context that expects a
value of type <code>s</code>.
</p>

</box>

<box title="Getting started" link="start">

<p>
Most of the new language features are enclosed within double curly braces
<code>{{ON}}{{...}}</code>. For instance, the following code sample
defines a value <code>x</code> as an XML element (with tag
<code>a</code>, an attribute <code>href</code>, and a simple
string as content):
</p>

<sample><![CDATA[{{ON}}
# let x = {{ <a href="http://www.cduce.org">['CDuce'] }};;
val x : {{<a href=[ 'http://www.cduce.org' ]>[ 'CDuce' ]}} =
{{<a href="http://www.cduce.org">[ 'CDuce' ]}}
]]></sample>

<p>
What appears between the curly braces is called an x-expression.
Similarly, there are x-types (as seen above), and also x-patterns.
The delimiters <code>{{ON}}{{...}}</code> are only used
for syntactical reasons, to avoid clashes between OCaml and CDuce
syntaxes and lexical conventions. As a matter of fact,
an OCaml expression need not be a syntactical x-expression
(delimited by double curly braces) to evaluate to an x-value.
For instance, once <code>x</code> has been declared as above,
the expression <code>x</code> evaluates to an x-value.
</p>


<p>
It is possible to use an arbitrary
OCaml expression as part of an x-expression: it must simply be
protected by a new pair of double curly braces. For instance, there is
no <code>if-then-else</code> construction for x-expressions, but you
can write:
</p>

<sample><![CDATA[{{ON}}
# {{ <a href={{if true then {{"a"}} else {{"z"}}}}>[] }};;
- : {{<a href=[ 'a' | 'z' ]>[ ]}} = {{<a href="a">[ ]}}
]]></sample>

<p>
Only the highlighted parts are parsed as x-expressions. The
<code>if-then-else</code> sub-expression is parsed as an OCaml
expression, but its type is an x-type (namely <code>{{ON}}{{[ 'a' |
'z' ]}}</code>).
</p>

</box>

<box title="X-values" link="values">

<p>
X-values are intended to represent XML documents and fragments
thereof: elements, tags, text, sequences. In this section, we
present the x-value algebra, the syntax of the corresponding
x-expression constructors and the associated x-types.
</p>

<p>
There are three kinds of atomic x-values:
</p>
<ul>
<li>Unicode characters;</li>
<li>qualified names;</li>
<li>arbitrarily large integers.</li>
</ul>

<section title="Characters">

<p>
X-characters are different from OCaml characters. They cover
the range of Unicode code points defined in the XML specification.
Character literals are delimited by single quotes. The escape
sequences \n, \r, \t, \b, \', \&quot;, \\ are recognized as usual.
Numerical escape sequences are written <code>\n;</code>, where n is an integer
literal (note the extra semicolon). The source code is interpreted as
being encoded in ISO-8859-1 (Latin1). As a consequence, Unicode characters that are not
part of the Latin1 character set must be introduced with this
numerical escape mechanism. The x-types for x-characters are:
</p>
<ul>
<li>singletons;</li>
<li>intervals, written <code>c -- d</code>, where <code>c</code> and
<code>d</code> are literals (example: <code>{{ON}}type t = {{ 'a'--'z'
}}</code>);</li>
<li>the type of all x-characters, written <code>Char</code>;</li>
<li>the type of all Latin1 characters, written <code>Latin1Char</code>
(defined as <code>\0; -- \255;</code>).</li>
</ul>

</section>

<section title="Integers">

<p>
X-integers are arbitrarily large. Literals must be written in decimal.
Negative literals must be enclosed in parentheses, e.g. <code>(-3)</code>.
The x-types for x-integers are:
</p>
<ul>
<li>singletons;</li>
<li>intervals, written <code>i -- j</code>, where <code>i</code> and
<code>j</code> are literals (example: <code>{{ON}}type t = {{ 10--20
}}</code>); it is possible to replace <code>i</code> or <code>j</code>
with <code>**</code> to define open-ended intervals, e.g.
<code>{{ON}}type pos = {{ 1 -- ** }}</code>;
</li>
<li>the type of all x-integers, written <code>Int</code>;</li>
<li>the type of all the integers that can be represented by a
signed 32-bit (resp. 64-bit) machine word, written <code>Int32</code> (resp.
<code>Int64</code>).</li>
</ul>
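<p>
Purely as an illustration, the membership test for integer interval x-types can be mimicked in plain OCaml. The sketch below is hypothetical and is not part of OCamlDuce: the names <code>bound</code>, <code>interval</code> and <code>mem</code> are made up for this example, the constructor <code>Open</code> stands for <code>**</code>, and native OCaml integers stand in for x-integers, which are really arbitrarily large.
</p>

<sample><![CDATA[
(* Hypothetical plain-OCaml model of integer interval x-types (not the
   OCamlDuce implementation). Native ints stand in for x-integers,
   which are really arbitrarily large. *)

(* A bound is either an integer literal or the open-ended marker
   (written ** in OCamlDuce). *)
type bound = Lit of int | Open

(* The x-type  i -- j , where either side may be open-ended. *)
type interval = { lo : bound; hi : bound }

(* [mem n t] is true when the integer [n] belongs to the interval [t]. *)
let mem (n : int) (t : interval) : bool =
  let above = match t.lo with Open -> true | Lit i -> n >= i in
  let below = match t.hi with Open -> true | Lit j -> n <= j in
  above && below

(* Models of the x-types  10--20  and  1 -- **  *)
let t = { lo = Lit 10; hi = Lit 20 }
let pos = { lo = Lit 1; hi = Open }

let () =
  Printf.printf "%b %b %b\n" (mem 15 t) (mem 21 t) (mem 1_000_000 pos)
]]></sample>

<p>
With these definitions, <code>mem 15 t</code> and <code>mem 1_000_000 pos</code> are true while <code>mem 21 t</code> is false, so the program prints <code>true false true</code>.
</p>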

</section>

<section title="Qualified names">

<p>
Qualified names are intended to represent XML tag names. Conceptually,
they are made of a namespace URI and a local name. Since URIs tend
to be long, literals are of the form <code>`prefix:local</code>,
where <code>local</code> is the local name and <code>prefix</code>
is a <em>namespace prefix</em> bound to some URI (in the scope of the
literal). The local name follows the definitions from
the XML Namespaces specification; a dot character must be protected
by a backslash, and non-Latin1 characters are written with the numerical
escape sequence <code>\n;</code> used in character literals. <a href="#ns">See below</a> for an
explanation of how to bind prefixes to URIs. To refer
to the default namespace (or to the absence of namespace if no default
has been defined), the syntax is simply <code>`local</code>.
The x-types for qualified names are:
</p>
<ul>
<li>singletons;</li>
<li>the type of all qualified names, written <code>Atom</code>;</li>
<li>the type of all qualified names from a specified namespace,
written <code>`ns:*</code>.</li>
</ul>
</section>

<section title="Records">

<p>
X-records are mainly used to represent the set of attributes of an XML
element. An x-record is a binding from a finite set of <em>labels</em>
to x-values. Labels follow the same syntax as qualified names,
without the leading backquote. However, if the namespace prefix is not
given, the default namespace does not apply (the namespace URI is
empty). The syntax for record x-expressions is <code> { l1=e1
... ln=en }</code>, where the <code>li</code> are labels and the
<code>ei</code> are x-expressions. Fields can also be separated with a
semicolon. It is legal to omit the expression for a field; the field
content is then the value of the variable named after the label (such a value must be
defined in the current scope), e.g.: <code>{{ON}}let x = ... and y = ...
in {{ {x y z=3} }}</code> is equivalent to <code>{{ON}}let x = ... and
y = ... in {{ {x=x y=y z=3} }}</code>. The types for x-records specify
which labels are authorized/mandatory, and what the types of the
corresponding fields are. There are two kinds of record x-types:
</p>

<ul>
<li>
Closed record types, which allow only the listed fields:
<code>{ l1=t1 ... ln=tn }</code>;
</li>
<li>
Open record types, which allow additional fields (with arbitrary
types):
<code>{ l1=t1 ... ln=tn .. }</code> (the final two dots are
part of the syntax).
</li>
</ul>

<p>
In both cases, a field can be made optional by
writing =? instead of =.
</p>

<p>
The x-type of all x-records is thus <code>{ .. }</code>,
and the x-type of x-records with an optional field <code>l</code>
of type <code>Int</code> and arbitrary other fields is
<code>{ l=?Int .. }</code>.
</p>

</section>

<section title="Sequences">

<p>
X-sequences are finite and ordered collections of x-values.
The syntax for a sequence x-expression is
<code>[ e1 ... en ]</code> (note that elements are <em>not</em> separated
by semicolons as in OCaml lists). Each item <code>ei</code>
can be:
</p>
<ul>
<li>an x-expression;</li>
<li><code>!e</code>, where <code>e</code> is an x-expression that
evaluates to a sequence (whose content is inserted into the sequence
being defined); e.g.
<code>let x = [ 2 3 ] in [ 1 !x 4 ]</code> is equivalent to
<code>[ 1 2 3 4 ]</code>;</li>
<li>a string literal delimited by single quotes; e.g.
<code>[ 'abc' ]</code> is equivalent to <code>[ 'a' 'b' 'c' ]</code>.</li>
</ul>

<p>
X-types for sequences are of the form <code>[R]</code>,
where <code>R</code> is a regular expression over x-types that
describes the possible contents of the sequences. The possible
forms of regular expressions are:
</p>

<ul>
<li><code>t</code> (one single element of x-type <code>t</code>)</li>
<li><code>R*</code> (zero or more repetitions)</li>
<li><code>R+</code> (one or more repetitions)</li>
<li><code>R?</code> (zero or one repetition)</li>
<li><code>R1 R2</code> (sequence)</li>
<li><code>R1|R2</code> (alternation)</li>
<li><code>(R)</code> (grouping)</li>
<li><code>/t</code> (guard: the tail of the sequence must comply with
<code>t</code>)</li>
<li><code>PCDATA</code> (equivalent to <code>Char*</code>).</li>
</ul>

<note>sequences are actually encoded with embedded pairs and a
terminator, and sequence types are encoded with product types and
recursive types. The encoding is available to the programmer
but is not described in this manual.
</note>

</section>

<section title="Strings">

<p>
Strings are nothing but sequences of characters. There are two
predefined types, <code>String</code> and <code>Latin1</code>
(defined as <code>[ Char* ]</code> and <code>[ Latin1Char* ]</code>).
</p>

<p>
A string literal <code>[ '...' ]</code> can also be written
<code>"..."</code> (without the square brackets). Note that single
(resp. double) quotes need to be escaped only when the string is
delimited with single (resp. double) quotes.
</p>

</section>

<section title="XML elements">

<p>
An XML element is a triple of x-values (tag, attributes, content). The syntax for
the corresponding x-expression constructor is
<code><![CDATA[<(e1) (e2)>e3]]></code>. When <code>e1</code> is a
qualified name literal, it is possible to omit the leading
backquote and the surrounding parentheses. Similarly,
when <code>e2</code> is an x-record literal, it is possible
to omit the curly braces and the parentheses. For instance,
one can simply write <code><![CDATA[<a href="abc">['def']]]></code>
instead of <code><![CDATA[<(`a) ({href="abc"})>['def']]]></code>.
</p>

<p>
XML element x-types are written <code><![CDATA[<(t1) (t2)>t3]]></code>,
and the same simplifications apply. For instance, if
the namespace prefix <code>ns</code> has been defined,
the following is a legal x-type: <code><![CDATA[<ns:* ..>[]]]></code>;
it describes XML elements whose tag is in the namespace bound to
<code>ns</code>, with an empty content, and with an arbitrary set of
attributes. An underscore in place of <code>(t1)</code> is
equivalent to <code>(Atom)</code> (any tag).
</p>

</section>

</box>

<box title="X-expressions" link="expr">

<p>
In the previous section, we have seen the syntax for x-value
constructors (constant literals, sequence, record and element constructors).
In this section, we describe the other kinds of x-expressions.
</p>

<section title="Binary infix operators">

<p>
The arithmetic operators on integers follow the usual precedence rules.
They are written <code>+</code>, <code>*</code>, <code>-</code>,
<code>div</code> and <code>mod</code> (they are all infix).
</p>

<p>
Record concatenation: <code>e1 ++ e2</code>. The x-expressions
<code>e1</code> and <code>e2</code> must evaluate to x-records.
The result is obtained by concatenating them. If a field with the same
label is present in both records, the right-most one is selected.
</p>

<p>
Sequence concatenation: <code>e1 @ e2</code>, equivalent
to <code>[!e1 !e2]</code>.
</p>
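
<p>
To make the rule that the right-most field wins concrete, here is a plain-OCaml sketch, assuming a hypothetical model in which an x-record is an association list from labels to values and an x-sequence is an OCaml list. The functions <code>concat_records</code> and <code>concat_seqs</code> are illustrative names, not part of OCamlDuce. Since x-records are unordered, the order of the fields in the result does not matter.
</p>

<sample><![CDATA[
(* Hypothetical model: an x-record as an association list from labels
   to values, an x-sequence as an OCaml list. *)
type 'a record = (string * 'a) list

(* e1 ++ e2 : all the fields of r2, plus the fields of r1 whose label
   does not occur in r2; on a clash the right-most field wins. *)
let concat_records (r1 : 'a record) (r2 : 'a record) : 'a record =
  List.filter (fun (l, _) -> not (List.mem_assoc l r2)) r1 @ r2

(* e1 @ e2 on sequences, i.e. [ !e1 !e2 ]. *)
let concat_seqs (s1 : 'a list) (s2 : 'a list) : 'a list = s1 @ s2

(* Analogue of {x=1 y=2} ++ {y=3 z=4} *)
let example = concat_records [ ("x", 1); ("y", 2) ] [ ("y", 3); ("z", 4) ]

let () =
  List.iter (fun (l, v) -> Printf.printf "%s=%d " l v) example;
  print_newline ()
]]></sample>

<p>
The example prints the fields x=1, y=3 and z=4: the field <code>y</code> of the right operand replaces the one of the left operand.
</p>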

</section>

<section title="Projections, filtering">

<p>
If the x-expression <code>e</code> evaluates to a record or an XML
element, the construction <code>e.l</code> extracts the value of
field or attribute <code>l</code>. Similarly, the construction
<code>e.?l</code> extracts the value of field or attribute
<code>l</code> if present, and returns the empty sequence
<code>[]</code> otherwise.
</p>

<p>
If the x-expression <code>e</code> evaluates to a record,
the construction <code>e -. l</code> produces a new record
in which the field <code>l</code> has been removed (if present).
</p>
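
<p>
The three record operations can be summarized by the following plain-OCaml sketch, which models an x-record as an association list over a hypothetical toy value type. The names <code>get</code>, <code>get_opt</code> and <code>remove</code> are illustrative only. In this model <code>get</code> simply raises <code>Not_found</code> for a missing field, which says nothing about how OCamlDuce itself handles that case, and attributes of XML elements are not covered.
</p>

<sample><![CDATA[
(* Hypothetical toy x-values: integers and sequences. *)
type xvalue =
  | Int of int
  | Seq of xvalue list (* Seq [] is the empty sequence *)

type record = (string * xvalue) list

(* e.l : the value of field l (raises Not_found in this model if the
   field is absent). *)
let get (r : record) (l : string) : xvalue = List.assoc l r

(* e.?l : the value of field l if present, the empty sequence otherwise. *)
let get_opt (r : record) (l : string) : xvalue =
  if List.mem_assoc l r then List.assoc l r else Seq []

(* e -. l : a new record without field l (unchanged if l is absent). *)
let remove (r : record) (l : string) : record =
  List.filter (fun (l', _) -> l' <> l) r

let r = [ ("href", Int 1); ("id", Int 2) ]

let () =
  Printf.printf "%b %b %d\n"
    (get r "href" = Int 1)
    (get_opt r "title" = Seq [])
    (List.length (remove r "href"))
]]></sample>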

<p>
If the x-expression <code>e</code> evaluates to an x-sequence,
the construction <code>e/</code> results in a new x-sequence
obtained by taking, in order, all the children of the XML elements
from the sequence <code>e</code>. For instance, the x-expression
<code><![CDATA[[<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ]/]]></code>
evaluates to the x-value <code>[ 1 2 3 6 7 8 ]</code>.
</p>

<p>
If the x-expression <code>e</code> evaluates to an x-sequence,
the construction <code>e.(t)</code> (where <code>t</code> is an
x-type) results in a new x-sequence
obtained by filtering <code>e</code> to keep only the elements
of type <code>t</code>. For instance, the x-expression
<code><![CDATA[[<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ].(Int)]]></code>
evaluates to the x-value <code>[ 4 5 ]</code>.
</p>
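
<p>
The two constructions can be modelled in plain OCaml as follows. This is a hypothetical sketch over a toy value type (integers and XML elements without attributes), with an x-type approximated by an OCaml predicate. The names <code>children</code>, <code>filter</code> and <code>is_int</code> are illustrative only.
</p>

<sample><![CDATA[
(* Hypothetical toy x-values: integers and XML elements (tag and
   content; attributes are omitted). *)
type xvalue =
  | Int of int
  | Elem of string * xvalue list

(* e/ : the contents of the XML elements of s, concatenated in order;
   items that are not elements contribute nothing. *)
let children (s : xvalue list) : xvalue list =
  List.concat (List.map (function Elem (_, c) -> c | Int _ -> []) s)

(* e.(t) : keep the items of s that have type t; an x-type is
   approximated here by an OCaml predicate. *)
let filter (t : xvalue -> bool) (s : xvalue list) : xvalue list =
  List.filter t s

let is_int = function Int _ -> true | Elem _ -> false

(* The sequence [<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ] from the examples. *)
let e =
  [ Elem ("a", [ Int 1; Int 2; Int 3 ]); Int 4; Int 5;
    Elem ("b", [ Int 6; Int 7; Int 8 ]) ]

let () =
  let show s =
    String.concat " "
      (List.map
         (function Int n -> string_of_int n | Elem (tag, _) -> "<" ^ tag ^ ">")
         s)
  in
  Printf.printf "[ %s ]\n[ %s ]\n" (show (children e)) (show (filter is_int e))
]]></sample>

<p>
Run on the sequence used in the examples above, it prints <code>[ 1 2 3 6 7 8 ]</code> and <code>[ 4 5 ]</code>.
</p>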
</section>

<section title="Dynamic type checking">

<p>
If <code>e</code> is an x-expression and <code>t</code> is an x-type,
the construction <code>(e :? t)</code> returns the same
result as <code>e</code> if it has type <code>t</code>; otherwise it
raises a <code>Failure</code> exception whose argument explains
why this is not the case.
</p>

<sample><![CDATA[{{ON}}
# let f (x : {{ Any }}) = {{ (x :? <a>[ Int* ] ) }} in
f {{ <a>[ 1 2 '3' ] }};;
Exception:
Failure
"Value <a>[ 1 2 '3' ] does not match type <a>[ Int* ]\nValue '3' does not match type Int\n".
]]></sample>
</section>

<section title="Pattern matching">

<p>
OCamlDuce comes with a powerful pattern matching operation.
X-patterns are described <a href="#patterns">below</a>.
The syntax for the pattern matching operation is
<code>match e with p1 -> e1 | ... | pn -> en</code>.
The type system ensures that the pattern matching is exhaustive
and infers precise types for the capture variables.
It is also possible to use x-pattern matching as a regular
OCaml expression; the x-patterns must then be surrounded by double
curly braces, e.g.:
<code>{{ON}}match e with {{p1}} -> e1 | ... | {{pn}} -> en</code> or
<code>{{ON}}function {{p1}} -> e1 | ... | {{pn}} -> en</code>.
</p>

<p>
Pattern matching follows a first-match policy: the first pattern
that succeeds triggers the corresponding branch.
</p>

<note>
currently, it is impossible to mix normal OCaml patterns and x-patterns
in a single pattern matching.
</note>

</section>

<section title="Local binding">

<p>
The x-expression <code>let p=e1 in e2</code> is equivalent to
<code>match e1 with p -> e2</code>. There is also a local binding
with an x-pattern in OCaml expressions: <code>let {{p}}=e1 in
e2</code>.
</p>

</section>


<section title="Iterators">

<p>
OCamlDuce comes with a sequence iterator
<code>map e with p1 -> e1 | ... | pn -> en</code> and
a tree iterator
<code>map* e with p1 -> e1 | ... | pn -> en</code>.
</p>

<p>
For both constructions, the argument must evaluate to a sequence.
The <code>map</code> iterator applies the patterns to each element
of this sequence in turn and produces a new sequence by concatenating
all the results (all the right-hand sides must thus produce a
sequence). The set of patterns must be exhaustive for all the possible
elements of the input sequence.
</p>

<p>
The tree iterator is similar, except that the patterns need not be
exhaustive. If some element of the input sequence is not matched,
it is simply copied into the result, unless it is an XML element; in
this case, the transformation is applied recursively to its content.
</p>
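
<p>
The difference between the two iterators can be seen in the following plain-OCaml model. It is a hypothetical sketch, not OCamlDuce's implementation. A set of branches is represented by a function that returns <code>Some</code> sequence when a pattern matches and <code>None</code> otherwise, and the names <code>map</code>, <code>map_star</code>, <code>double</code> and <code>input</code> are made up for this example.
</p>

<sample><![CDATA[
(* Hypothetical toy x-values: integers and XML elements (tag and
   content; attributes are omitted). *)
type xvalue =
  | Int of int
  | Elem of string * xvalue list

(* A set of branches  p1 -> e1 | ... | pn -> en  is modelled as a
   function returning Some result when a pattern matches, None
   otherwise. Each right-hand side yields a sequence. *)
type branches = xvalue -> xvalue list option

(* map s with ... : every item must be matched (OCamlDuce checks this
   statically; this model fails at run time instead) and the resulting
   sequences are concatenated. *)
let map (b : branches) (s : xvalue list) : xvalue list =
  List.concat
    (List.map
       (fun v ->
         match b v with
         | Some r -> r
         | None -> failwith "map: non-exhaustive patterns")
       s)

(* map* s with ... : a matched item is replaced by the branch result;
   an unmatched XML element is kept and the iterator is applied to its
   content; any other unmatched item is copied unchanged. *)
let rec map_star (b : branches) (s : xvalue list) : xvalue list =
  List.concat
    (List.map
       (fun v ->
         match b v, v with
         | Some r, _ -> r
         | None, Elem (tag, content) -> [ Elem (tag, map_star b content) ]
         | None, other -> [ other ])
       s)

(* Hypothetical branch set: double integers, leave elements unmatched. *)
let double : branches = function
  | Int n -> Some [ Int (2 * n) ]
  | Elem _ -> None

let input = [ Int 1; Elem ("a", [ Int 2; Elem ("b", [ Int 3 ]) ]) ]
]]></sample>

<p>
Here <code>map_star double input</code> doubles every integer, including those nested inside the elements <code>a</code> and <code>b</code>, whereas <code>map double input</code> fails because the element <code>a</code> is not matched by any branch.
</p>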

</section>

<section title="OCaml constructions">

<p>
As a convenience, some OCaml expression constructors
are allowed as x-expressions (without the need to go back to OCaml
with double curly braces): (unqualified) value identifiers and
function calls.
</p>

</section>

</box>

<box title="More on x-types" link="types">

<p>
We have seen how to write simple x-types. We can then combine
them with Boolean connectives:
</p>

<ul>
<li><code>t1 &amp; t2</code>: intersection;</li>
<li><code>t1 | t2</code>: union;</li>
<li><code>t1 - t2</code>: difference.</li>
</ul>

<p>
The empty x-type is written <code>Empty</code> (it contains no value),
and the universal x-type is written <code>Any</code> or <code>_</code> (it contains
all the x-values).
</p>

<p>
When an x-type has been bound to some OCaml identifier
(<code>{{ON}}type t = {{...}}</code>), it is possible to use
this identifier in another x-type. Recursive definitions
are allowed:
</p>

<sample><![CDATA[{{ON}}
type t1 = {{ <a>[ t2* ] }}
and t2 = {{ <b>[ t1* ] }}
]]></sample>

<p>
Note that x-values are always finite and acyclic. The type checker
detects type definitions that would yield empty types:
</p>

<sample><![CDATA[{{ON}}
# type t = {{ <a>[ t+ ] }};;
This definition yields an empty type
]]></sample>

<p>
If <code>t1</code> and <code>t2</code> are record x-types,
we can combine them with the infix <code>++</code> operator, which
mimics the corresponding operator on expressions (record
concatenation). Similarly, we can use the infix <code>@</code>
concatenation operator on sequence x-types.
</p>

</box>

<box title="X-patterns" link="patterns">

<p>
X-patterns follow the same syntax as x-types. In particular,
any x-type is a valid x-pattern. In addition to x-type constructors,
x-patterns can contain:
</p>

<ul>
<li>capture variables (lowercase OCaml identifiers);</li>
<li>constant bindings <code>(x := c)</code>, where <code>x</code> is a capture
variable and <code>c</code> is
a literal x-constant (this pattern always succeeds and returns the
binding <code>x -> c</code>).</li>
</ul>

<p>
Here is a brief description of the semantics of patterns. Given
an input value, a pattern can either succeed or fail. If it succeeds,
it also produces a binding from the capture variables in the pattern
to x-values.
</p>

<ul>

<li>A pattern that is just a type (no capture variable) succeeds if
and only if the value has the type.</li>

<li>A pattern <code>p1 | p2</code> succeeds if either <code>p1</code>
or <code>p2</code> succeeds, and returns the corresponding binding; if
both patterns succeed, <code>p1</code> wins. It is required that
<code>p1</code> and <code>p2</code> have the same sets of capture
variables. </li>

<li>A pattern <code>p1 &amp; p2</code> succeeds if both <code>p1</code>
and <code>p2</code> succeed, and returns the concatenation of the two
bindings. It is required that <code>p1</code> and <code>p2</code> have
<em>disjoint</em> sets of capture variables. </li>

</ul>
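
<p>
The rules above can be mirrored by a small plain-OCaml interpreter for a fragment of the pattern language (types, capture variables, constant bindings, <code>|</code> and <code>&amp;</code>). This is a hypothetical sketch over a toy value type, not OCamlDuce's pattern-matching compiler. X-types are approximated by OCaml predicates, bindings by association lists, and the names <code>pattern</code>, <code>matches</code>, <code>p</code> and <code>q</code> are made up for this example.
</p>

<sample><![CDATA[
(* Hypothetical toy x-values. *)
type xvalue = Int of int | Str of string

(* A small fragment of the x-pattern language. *)
type pattern =
  | Type of (xvalue -> bool)   (* an x-type: no capture variable *)
  | Var of string              (* capture variable: always succeeds *)
  | Const of string * xvalue   (* (x := c): always succeeds *)
  | Or of pattern * pattern    (* p1 | p2 *)
  | And of pattern * pattern   (* p1 & p2 *)

(* [matches p v] is None on failure and Some binding on success. The
   side conditions on capture variables (same sets for |, disjoint
   sets for &) are not checked by this sketch. *)
let rec matches (p : pattern) (v : xvalue) : (string * xvalue) list option =
  match p with
  | Type t -> if t v then Some [] else None
  | Var x -> Some [ (x, v) ]
  | Const (x, c) -> Some [ (x, c) ]
  | Or (p1, p2) -> (
      match matches p1 v with
      | Some b -> Some b (* p1 wins when both succeed *)
      | None -> matches p2 v)
  | And (p1, p2) -> (
      match matches p1 v, matches p2 v with
      | Some b1, Some b2 -> Some (b1 @ b2)
      | _ -> None)

let is_int = function Int _ -> true | Str _ -> false

(* Analogue of  x & Int  *)
let p = And (Var "x", Type is_int)

(* Analogue of  (Int & (k := 1)) | (k := 0)  *)
let q = Or (And (Type is_int, Const ("k", Int 1)), Const ("k", Int 0))
]]></sample>

<p>
With these definitions, <code>p</code> binds <code>x</code> only on integers, and <code>q</code> binds <code>k</code> to 1 on integers and to 0 on any other value, because the left alternative is tried first.
</p>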

<p>
In record x-patterns, it is possible to omit the <code>=p</code> part
of a field. The content is then replaced with the label name,
considered as a capture variable (or as a previously defined type).
E.g. <code>{ x y=p }</code> is
equivalent to <code>{ x=x y=p }</code>.</p>

<p>It is also possible to add an "else" clause:
<code>{ x = (a,_)|(a:=3) }</code>
accepts any record with at most the field <code>x</code>. If the content
is a pair, the capture variable <code>a</code> is bound to its first component;
otherwise, it is set to <code>3</code>.</p>

<p>
In regular expressions, it is possible to extract whole subsequences
with the notation <code>x::R</code>, e.g.: <code>[ _* x::Int+ _* ]</code>.
</p>

<p>
If the same sequence capture variable appears several times (or below a
repetition) in a regexp, it is bound to the concatenation of all
matched subsequences. E.g.: <code>[ (x::Int | _)* ]</code>
collects in <code>x</code> all the elements of type <code>Int</code> from
a sequence. It is not legal to have repeated simple capture variables.
</p>
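
<p>
The accumulation rule can be illustrated with a plain-OCaml fold that mimics matching <code>[ (x::Int | _)* ]</code> item by item. It is a hypothetical sketch of the semantics only (toy value type, no general regular-expression matching), not how OCamlDuce compiles such patterns; the name <code>capture_ints</code> is made up for this example.
</p>

<sample><![CDATA[
(* Hypothetical toy x-values. *)
type xvalue = Int of int | Char of char

(* Model of matching  [ (x::Int | _)* ]  against a sequence: each item
   is tried against  x::Int  first and  _  second (first match wins),
   and every subsequence captured by x is appended to the binding. *)
let capture_ints (s : xvalue list) : xvalue list =
  List.fold_left
    (fun x item ->
      match item with
      | Int _ -> x @ [ item ] (* x::Int succeeded: extend x *)
      | Char _ -> x)          (* _ succeeded: x unchanged *)
    [] s

let () =
  let x = capture_ints [ Char 'a'; Int 1; Char 'b'; Int 2; Int 3 ] in
  Printf.printf "%d\n" (List.length x)
]]></sample>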
762 abate 1787
763     <p>
764 abate 1788 The regexp operators <code>+,*,?</code> are greedy by default (they match as long
765     as possible). They admit non-greedy variants <code>+?,*?,??</code>.
766 abate 1787 </p>
</box>

<box title="Namespace bindings" link="ns">

<p>
Namespace prefixes can be bound to URIs either by toplevel phrases
(structure items) or by local declarations:
</p>

<sample>{{ON}}
# {{ namespace ns = "http://..." }};;
# let x = {{ `ns: x }};;
val x : {{`ns:x}} = {{`ns:x}}
# let x = {{ let namespace ns = "http://..." in `ns:x }};;
val x : {{`ns:x}} = {{`ns:x}}
</sample>

<p>The toplevel definitions can also appear in module interfaces
(signatures). A toplevel prefix binding is not exported by a module: its scope
is limited to the current structure or signature. It is also possible
to specify a default namespace, and to reset it:
</p>

<sample>{{ON}}
# {{ namespace "http://..." }};;
# {{ `x }};;
- : {{`ns1:x}} = {{`ns1:x}}
# {{ namespace "" }};;
# {{ `x }};;
- : {{`x}} = {{`x}}
</sample>

<p>
Note that the value pretty-printer invented a prefix (<code>ns1</code>)
for the namespace URI. The default namespace declaration also has a
local form, <code>let namespace "..." in ...</code>.
</p>

</box>

<box title="More on type-checking" link="typecheck">

<section title="Type inference">

<p>
As we said above, the programmer is sometimes required to provide type
annotations. To know where to put these annotations, it is necessary to
have a basic understanding of how type-checking works.
</p>

<p>
The OCaml type-checker is run first to detect which sub-expressions
are of the x-kind. A second ML type-checking pass then
introduces subsumption (implicit subtyping) steps where allowed. After
these two passes, the OCamlDuce type checker obtains a data-flow summary of
x-values in the whole compilation unit. This summary is a directed graph
whose edges represent either simple data-flow or complex operations
on x-values. The nodes of the graph can be thought of as x-type
variables. A data-flow edge corresponds to a subtyping constraint,
and an operation edge corresponds to a symbolic constraint which
mimics the corresponding operation on values.
</p>

<p>
Some of the nodes are given an explicit type by the programmer,
through type annotations (on expressions or function arguments)
or the other usual mechanisms of ML (data type declarations,
signatures, ...).
</p>

<p>
Next, if the graph contains a cycle made only of subtyping edges,
all the nodes on that cycle are merged into a single node.
</p>

<p>
After this operation, the graph is required to be acyclic, ignoring
the nodes that have an explicit type. It
is the responsibility of the programmer to provide enough type
annotations to achieve this property; otherwise, a type error
is issued.
</p>
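<p>
The acyclicity condition can be pictured with a small plain OCaml sketch. It assumes a
hypothetical representation of the data-flow graph as a list of edges between node names,
in which loops of subtyping edges have already been merged; nodes that carry an explicit
type are dropped, and a depth-first search then looks for a remaining cycle. This models
the condition only, not the OCamlDuce implementation.
</p>

```ocaml
(* Hypothetical model of the data-flow graph: nodes are names, edges are
   (source, target) pairs, and [annotated] lists the explicitly typed nodes. *)
type graph = { edges : (string * string) list; annotated : string list }

(* True if a cycle remains once the annotated nodes are removed. *)
let has_cycle (g : graph) : bool =
  let keep n = not (List.mem n g.annotated) in
  let edges =
    List.filter (fun (a, b) -> if keep a then keep b else false) g.edges in
  let succs n =
    List.filter_map (fun (a, b) -> if a = n then Some b else None) edges in
  (* [visiting]: nodes on the current DFS path; [finished]: fully explored. *)
  let visiting = Hashtbl.create 16 and finished = Hashtbl.create 16 in
  let rec visit n =
    if Hashtbl.mem finished n then false
    else if Hashtbl.mem visiting n then true (* back edge: cycle found *)
    else begin
      Hashtbl.replace visiting n ();
      let cyclic = List.exists visit (succs n) in
      Hashtbl.remove visiting n;
      Hashtbl.replace finished n ();
      cyclic
    end
  in
  List.exists (fun (a, _) -> visit a) edges
```

<p>
For instance, with hypothetical node names for the recursive function <code>f</code>
discussed in this section, <code>has_cycle { edges = [ ("result_f", "call_f"); ("call_f", "concat"); ("concat", "result_f") ]; annotated = [] }</code>
is <code>true</code>, and it becomes <code>false</code> once <code>"result_f"</code> is listed in
<code>annotated</code>, which corresponds to annotating the result type.
</p>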

<sample><![CDATA[{{ON}}
# let rec f x = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
Cycle detected: cannot type-check
# let rec f x : {{ String }} = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
val f : int -> {{String}} = <fun>]]>
</sample>

<p>
In the example above, there is a cycle between the result type of
<code>f</code> and the type of the sub-expression <code>{{ON}}f
{{n-1}}</code>. Here it is broken by a type annotation on the result; it could
also have been broken by a type annotation on the expression <code>{{ON}}f
{{n-1}}</code>, on the function <code>f</code> itself, or by a
module signature.
</p>

<p>
Let us study another simple example:
</p>

<sample>{{ON}}
# let f x = {{ x + 1 }} in f {{ 2 }}, f {{ 3 }};;
- : {{3--4}} * {{3--4}} = ({{3}}, {{4}})
</sample>

<p>
The type-checker detects that the two x-values <code>2</code> and
<code>3</code> can flow to the argument of <code>f</code>. Its body
is thus type-checked under the assumption that <code>x</code> has type
<code>2--3</code>. The computed result type is then <code>3--4</code>.
</p>
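<p>
The reasoning in this example can be replayed with plain OCaml interval arithmetic. The
sketch assumes a hypothetical representation of integer x-types as single closed intervals:
the input type of <code>f</code> is the join of the singleton types that flow to it, and the
result type is that interval shifted by 1. Unlike real x-types, which can express unions
of intervals, this model over-approximates non-adjacent values.
</p>

```ocaml
(* Hypothetical model: the integer x-type lo--hi as a closed interval. *)
type interval = { lo : int; hi : int }

let singleton n = { lo = n; hi = n }

(* Smallest interval containing both arguments. *)
let join a b = { lo = min a.lo b.lo; hi = max a.hi b.hi }

(* Type of [x + k] when [x] ranges over [i]. *)
let add_const i k = { lo = i.lo + k; hi = i.hi + k }

(* The x-values 2 and 3 both flow to the argument of f. *)
let arg_type = join (singleton 2) (singleton 3)   (* 2--3 *)
let result_type = add_const arg_type 1            (* 3--4 *)
```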

<p>
The type-inference process described above is global by nature. The
acyclicity condition is only imposed after a whole compilation unit
has been type-checked by OCaml (and the information from the module
interface has been integrated). When a type variable is inferred to
be of the x-kind, it is never generalized. As a consequence, there
is no parametric polymorphism on x-types.
</p>

<p>
In the toplevel, type-checking is done after each phrase. Consider
the following session:
</p>

<sample><![CDATA[{{ON}}
# let f x = {{ x + 1 }};;
val f : {{Empty}} -> {{Empty}} = <fun>
# let a = f {{ 2 }};;
Subtyping failed 2 <= Empty
Sample:
2
]]></sample>

<p>
The function <code>f</code> is inferred to have type
<code>{{ON}}{{Empty}} -> {{Empty}}</code> because, when the first
phrase is type-checked, the data-flow graph says that no value
can flow to <code>x</code>, and thus the input type is empty
(and similarly for the result type). If the two phrases
were type-checked together (which would be the case if they had
been compiled by the compiler rather than entered in the toplevel), the type-checker
would have correctly inferred that the input type of <code>f</code>
must contain <code>2</code>.
</p>

</section>

<section title="Implicit subtyping">

<p>
Coercion from an x-type to a supertype is automatic in OCamlDuce.
However, this automatic subsumption does not carry over to OCaml
type constructors, even covariant ones. Consider:
</p>

<sample><![CDATA[{{ON}}
# let f (x : {{ Int }} * {{ Int }}) = 1;;
val f : {{Int}} * {{Int}} -> int = <fun>
# let g (x : {{ 0 }} * {{ 0 }}) = f x;;
This expression has type {{0}} * {{0}} but is here used with type
{{Int}} * {{Int}}
# let g (x : {{ 0 }} * {{ 0 }}) = let a,b = x in f (a,b);;
val g : {{0}} * {{0}} -> int = <fun>
# let g (x : {{ 0 }} * {{ 0 }}) = f (x :> {{ Int }} * {{ Int }});;
val g : {{0}} * {{0}} -> int = <fun>
]]></sample>

<p>
The first attempt to define <code>g</code> fails because the type of
<code>x</code> is not an x-type (it is an OCaml product of x-types), and thus
subsumption does not apply. In the second attempt, we extract the two components
of the pair; since they are inferred to be x-values, subtyping applies to
both of them. Thus, when the pair <code>(a,b)</code> is reconstructed,
it is legal to unify its type with the input type of <code>f</code>.
The third definition of <code>g</code> gives an alternative solution:
an explicit OCaml type coercion.
</p>
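<p>
Standard OCaml offers a close analogy with polymorphic variants, whose subtyping is never
implicit. In the plain OCaml sketch below (hypothetical functions, not OCamlDuce code),
<code>[ `A ]</code> is a subtype of <code>[ `A | `B ]</code>, and since OCaml tuples are
covariant, a single explicit coercion on the whole pair is accepted, as in the third
definition of <code>g</code> above. Passing <code>x</code> without the coercion would be rejected.
</p>

```ocaml
(* Standard OCaml analogy with polymorphic variants, not OCamlDuce. *)
let f (_ : [ `A | `B ] * [ `A | `B ]) = 1

(* [ `A ] is a subtype of [ `A | `B ] and tuples are covariant, so the
   whole pair can be coerced at once with [:>]. *)
let g (x : [ `A ] * [ `A ]) = f (x :> [ `A | `B ] * [ `A | `B ])
```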

</section>

</box>

<box title="Exchanging values" link="transl">

<p>
OCamlDuce strongly separates regular OCaml values from the new
x-values. They have different syntax, expressions, types, patterns,
and even type-checking algorithms. This strong segregation is the key point
which allows a simple integration of two very different type
systems.
</p>

<p>
At some point, however, it is necessary to cross the frontier and
translate OCaml values to x-values or the opposite.
</p>

<p>
Fortunately, OCamlDuce provides automatic translations in both
directions. Instead of double curly braces, you can
enclose x-expressions in curly-brace/colon delimiters <code>{: ... :}</code>
(here, the <code>...</code> is an x-expression).
The effect is to translate the result of the x-expression
(which must be an x-value) to an OCaml value. Similarly,
in an x-expression, you can obtain the x-translation of
an OCaml value with the same syntax <code>{: ... :}</code>
(here, the <code>...</code> is an OCaml expression).
</p>

<p>
Here is how the translation works. To each OCaml type <code>t</code>,
we associate an x-type <code>T(t)</code> and a pair of translation
functions between <code>t</code> and <code>T(t)</code>.
Not all OCaml types are supported, though. For instance,
free type variables, abstract types, object types and non-regular
recursive types cannot be translated. In particular, since
type variables are not allowed, the OCaml type must be fully known.
</p>

<p>
The translation for an OCaml type <code>t</code> is defined by structural
induction on <code>t</code>. Sum types are
translated to union types: a constant constructor <code>A</code> is
translated to the qualified name <code>`A</code>; a non-constant
constructor <code>A of t1 * ... * tn</code> is translated to
<code>&lt;A>[ T(t1) ... T(tn) ]</code>. Closed polymorphic variants
have the same translation. Record types are translated to closed
record x-types. Some other translations:
</p>

<table border="1">
<tr><th>OCaml type t</th> <th>X-type T(t)</th></tr>
<tr><td><code>int</code></td> <td><code>Int</code></td></tr>
<tr><td><code>int32</code></td> <td><code>Int32</code></td></tr>
<tr><td><code>int64</code></td> <td><code>Int64</code></td></tr>
<tr><td><code>string</code></td> <td><code>Latin1</code></td></tr>
<tr><td><code>t list</code></td> <td><code>[T(t)*]</code></td></tr>
<tr><td><code>t array</code></td> <td><code>[T(t)*]</code></td></tr>
<tr><td><code>unit</code></td> <td><code>[]</code></td></tr>
<tr><td><code>char</code></td> <td><code>Latin1Char</code></td></tr>
<tr><td><code>{{t}}</code></td> <td><code>t</code></td></tr>
</table>
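<p>
To make the structural induction concrete, here is a plain OCaml model of the translation
and of its inverse for a small sum type. The x-value representation <code>xv</code> and the
type <code>shape</code> are hypothetical, and the functions are written by hand; in OCamlDuce
the translation functions are derived automatically and x-values use their own internal
representation.
</p>

```ocaml
(* Hypothetical model of x-values: integers, qualified-name atoms, and
   elements with a tag and a content sequence. *)
type xv = XInt of int | Atom of string | Elem of string * xv list

(* A small OCaml sum type to translate. *)
type shape = Circle | Rect of int * int

(* T(shape) = `Circle | (element tagged Rect with content [ Int Int ]):
   a constant constructor becomes a qualified name, a non-constant
   constructor an element with one child per argument. *)
let to_x : shape -> xv = function
  | Circle -> Atom "Circle"
  | Rect (w, h) -> Elem ("Rect", [ XInt w; XInt h ])

(* Inverse translation; only values of T(shape) are accepted. *)
let of_x : xv -> shape = function
  | Atom "Circle" -> Circle
  | Elem ("Rect", [ XInt w; XInt h ]) -> Rect (w, h)
  | _ -> invalid_arg "of_x: not a value of T(shape)"

(* T(t list) = [T(t)*]: a list becomes a sequence of translated items. *)
let list_to_x (f : 'a -> xv) (l : 'a list) : xv list = List.map f l
```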

<p>
Here is an example:
</p>

<sample>{{ON}}
# let f (x : {{ Int }}) = {{ x + 1 }} in List.map f {: [ 1 2 3 ] :};;
- : {{Int}} list = [{{2}}; {{3}}; {{4}}]
</sample>

<p>
In this example, the result type of the translation is inferred
to be <code>{{ON}}{{ Int }} list</code> (because the type of
<code>f</code> is given). The corresponding x-type
is <code>{{ON}}{{ [Int*] }}</code>.
</p>

</box>

<box title="The standard library" link="stdlib">

<p>
In OCamlDuce, the Num library from OCaml is included in the standard
library. In addition, the standard library contains two new modules,
<code>Ocamlduce</code> and <code>Cduce_types</code>.
</p>

<p>
The module <code>Cduce_types</code> gives access to the internal
representation of x-values. It is currently undocumented.
</p>

<p>
The module <code>Ocamlduce</code> provides several useful
functions on x-values. See the <a href="http://yquem.inria.fr/~frisch/ocamlcduce/doc/ocamlduce/Ocamlduce.html">ocamldoc</a>-generated
documentation for a description of its interface.
</p>

</box>

<box title="Marshaling" link="marshal">

<p>
OCamlDuce uses some tricks in its internal representation of x-values
to reduce memory usage and improve performance. You need to pay
special attention if you want to use OCaml serialization functions
(module <code>Marshal</code>, functions
<code>input_value</code>/<code>output_value</code>) on x-values. In addition to
your values, you also need to save and restore some internal data
using the functions <code>Cduce_types.Value.extract_all</code> and
<code>Cduce_types.Value.intract_all</code>. Of course, this also
applies if the value to be serialized contains deeply nested x-values.
</p>

<p>
Here are generic
serialization/deserialization functions that illustrate how to do it:
</p>

<sample>
(* Save the internal data together with the value. *)
let my_output_value oc v =
  let p = Cduce_types.Value.extract_all () in
  output_value oc (p,v)

(* Restore the internal data before returning the value. *)
let my_input_value ic =
  let (p,v) = input_value ic in
  Cduce_types.Value.intract_all p;
  v
</sample>
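<p>
Why extra data must travel with the value can be illustrated with a plain OCaml model,
assuming (for illustration only) that values refer to entries of a global side table by
integer identifiers: a marshaled value is then meaningless in another process unless the
table is saved and reinstalled with it. The table and the functions <code>register</code>,
<code>lookup</code>, <code>save</code> and <code>restore</code> are hypothetical; they only mirror the
role played by <code>Cduce_types.Value.extract_all</code> and
<code>Cduce_types.Value.intract_all</code>, not their actual contents.
</p>

```ocaml
(* Hypothetical global side table: values store integer identifiers
   that index into it, instead of the strings themselves. *)
let table : (int, string) Hashtbl.t = Hashtbl.create 16
let next_id = ref 0

let register (s : string) : int =
  let id = !next_id in
  incr next_id;
  Hashtbl.replace table id s;
  id

let lookup (id : int) : string = Hashtbl.find table id

(* Serialize the table together with the value, playing the role of
   extract_all in the sample above. *)
let save v = Marshal.to_string (Hashtbl.copy table, !next_id, v) []

(* Reinstall the table before handing back the value. Like input_value,
   this is not type-safe: the caller must annotate the expected type. *)
let restore s =
  let (t, n, v) = Marshal.from_string s 0 in
  Hashtbl.reset table;
  Hashtbl.iter (Hashtbl.replace table) t;
  next_id := n;
  v
```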

</box>

<box title="Performance" link="perf">

<section title="Strings">

<p>
OCaml users might be surprised by the fact that x-strings are simply
represented as sequences in OCamlDuce. Does this mean that they are
actually stored in memory as linked lists? Certainly not! The internal
representation of sequence values uses several tricks to improve
performance and memory usage. In particular, a special form in the
representation can store strings as byte buffers, as in OCaml.
If an XML document is loaded, or if an OCaml string is converted
to an x-value, this compact representation is used.
</p>

</section>

<section title="Concatenation">

<p>
Similarly, OCaml users might be reluctant to use the concatenation
operator <code>@</code> on sequences. In OCaml, the complexity
of this operator on lists is linear in the size of its first argument (which
needs to be copied). OCamlDuce uses a special form in its internal
representation to store concatenations lazily. The concatenation
is really computed only when the value is accessed. This means
that it is perfectly fine to build a long sequence by adding
new elements at the end one by one, as long as you do not
inspect the sequence at the same time.
</p>
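<p>
A plain OCaml model of such a lazy representation is sketched below. It is not
OCamlDuce's actual internal form: the type <code>seq</code> and its functions are hypothetical.
Appending only allocates a new node, and the sequence is flattened into a list, in time
linear in the size of the structure, when it is finally inspected.
</p>

```ocaml
(* Hypothetical model of lazily concatenated sequences. *)
type 'a seq =
  | Leaf of 'a list
  | Concat of 'a seq * 'a seq

(* Constant time: nothing is copied. *)
let append (a : 'a seq) (b : 'a seq) : 'a seq = Concat (a, b)
let snoc (s : 'a seq) (x : 'a) : 'a seq = append s (Leaf [ x ])

(* Flatten only when the contents are needed. Elements are accumulated
   from the right, so each leaf is copied once; the recursive call on the
   left operand is a tail call, so long chains built by [snoc] do not
   exhaust the stack. *)
let to_list (s : 'a seq) : 'a list =
  let rec go s acc =
    match s with
    | Leaf l -> l @ acc
    | Concat (a, b) -> go a (go b acc)
  in
  go s []
```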

</section>

<section title="Pattern matching">

<p>
Another point worth knowing when programming in OCamlDuce
is that patterns can be written in a declarative style without
affecting performance. The compiler uses static type information
about matched values to produce efficient code for pattern matching.
To illustrate this, consider the following samples:
</p>

<sample><![CDATA[{{ON}}
x.ml:

type a = {{ <a>[ a* ] }}
type b = {{ <b>[ b* ] }}

let f : {{ a|b }} -> int = function {{ a }} -> 0 | {{ _ }} -> 1
]]></sample>

<sample><![CDATA[{{ON}}
y.ml:

type a = {{ <a>[ a* ] }}
type b = {{ <b>[ b* ] }}

let f : {{ a|b }} -> int = function {{ <a>_ }} -> 0 | {{ _ }} -> 1
]]></sample>

<p>
The two functions have exactly the same semantics, but the first
implementation is more declarative: it uses type checks to distinguish
between <code>a</code> and <code>b</code> instead of saying
<em>how</em> to distinguish between these two types. Imagine
that the definitions of these types change to:
</p>

<sample><![CDATA[{{ON}}
type a = {{ <x kind="a">[ a* ] }}
type b = {{ <x kind="b">[ b* ] }}
]]></sample>

<p>
Then the first implementation still works as expected, but the
second one needs to be rewritten.</p>

<p>Now one might believe that the second implementation is more
efficient because it tells the compiler to check only the root tag,
whereas the first implementation would force
the compiler to produce code to check that all tags in the tree
are <code>a</code>s. But this is not what happens! Actually,
you can check that the compiler produces exactly the same code
for both implementations. It considers the static type information
about the argument of the pattern matching (here, the input type
of the function), and computes an efficient way to evaluate
patterns for the values of this type.
</p>

</section>

<section title="The map iterator">

<p>
The <code>map ... with ...</code> iterator is implemented in a
tail-recursive way. You can safely use it on very long sequences.
</p>
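<p>
For comparison, the usual way to get a tail-recursive map over plain OCaml lists is sketched
below (a hypothetical helper, not the implementation of <code>map ... with ...</code>): build the
mapped list in reverse with the tail-recursive <code>List.rev_map</code>, then reverse it, so
stack usage stays constant whatever the length of the input.
</p>

```ocaml
(* Tail-recursive map: List.rev_map and List.rev both run in constant
   stack space, at the cost of one intermediate list. *)
let map_tr (f : 'a -> 'b) (l : 'a list) : 'b list = List.rev (List.rev_map f l)
```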

</section>

</box>

<box title="OCaml and OCamlDuce" link="ocaml">

<p>
Since the 3.08.4 release, OCamlDuce is binary compatible with the corresponding
OCaml release. This means that OCamlDuce can use OCaml-generated
<tt>.cmi</tt> files and that it produces an OCaml-compatible
<tt>.cmi</tt> file if the interface does not use any x-type
(this file is identical to the one OCaml would have produced).
</p>

<p>
It is thus possible to use existing libraries which were compiled for
OCaml. It is also possible to use OCamlDuce to compile
some modules and use them in an OCaml project, provided their interface
is pure OCaml.
</p>

</box>

</page>

<page name="ocaml_code">
<title>OCamlDuce: code samples and applications</title>

<box title="Code samples" link="code">

<section title="Parsing XML files">

<p>
OCamlDuce does not come with any built-in XML parser. However,
the <a href="http://yquem.inria.fr/~frisch/ocamlcduce/doc/ocamlduce/Ocamlduce.Load.html"><code>Ocamlduce.Load</code></a> module in the standard library
makes it easy to plug in existing XML parsers. Here is some
code which demonstrates how to do that with three of
the most popular OCaml XML parsing libraries:
</p>

<ul>
<li><a
href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/pxp/">PXP</a></li>
<li><a
href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/expat/">Expat</a></li>
<li><a href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/xmllight/">Xml-light</a></li>
</ul>

</section>

<section title="Converting DTD to OCamlDuce types">

<p>
This <a href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/dtd2types/">tool</a> produces a set of OCamlDuce type declarations
from a DTD. It requires PXP.
</p>

<note>This application does not use any of the new features, but it
can be useful in the development of OCamlDuce applications.
</note>

</section>

<section title="Parsing XML Schema, producing valid XHTML output">

<p>
This <a
href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/schema/">application</a>
parses XML Schema Definitions (.xsd files), and produces summaries
(toplevel declaration names) in XHTML. The OCamlDuce type system ensures
that the parser is coherent with the input XML type (any valid XML
Schema is accepted) and that the printer is coherent with the output
XML type (its output is necessarily a valid XHTML document).
</p>

<p>
Of course, for such a simple transformation, parsing the XML document
into an internal representation is not necessary. A direct XML-to-XML
transformation would be easy to write. Our goal was to illustrate
a complex parsing of XML.
</p>

<p>
It is interesting to introduce errors in the parser
<code>schema_loader.ml</code> or the printer
<code>dump_schema.ml</code> and see how the type system catches them.
</p>

<note>
The application uses XML Light to parse XML documents.
</note>

<note>
Some features of XML Schema are not parsed, such as
<code>redefine</code> elements or substitution groups.
</note>

<note>
To compile the application with the provided Makefile,
you must set the environment variable <code>OCAMLFIND_CONF</code>
to point to the <code>$GODI/etc/findlib-ocamlduce.conf</code> file.
</note>

</section>

<section title="String regular expressions">

<p>
OCamlDuce supports regular expression types and patterns, not only
for sequences of XML elements, but also for strings. The following
example shows how to use regular expressions to split a string
of the form <code>name1=val1,...,namen=valn</code> with
<code>n>0</code> into
a list of pairs <code>[ (name1,val1); ...; (namen,valn) ]</code>.
The <code>*?</code> operator in regular expressions means "ungreedy
match" (match the shortest possible subsequence). The last
pattern describes precisely the strings which are not matched by
the other cases. It would be possible to replace it with
the wildcard <code>_</code>.
</p>

<sample><![CDATA[{{ON}}
let rec split (s : {{ String }}) =
  match s with
  | {{ [ n::_*? '=' v::_*? ',' rest::_* ] }} -> (n,v)::(split rest)
  | {{ [ n::_*? '=' v::_*? ] }} -> [ (n,v) ]
  | {{ Any - [ _* '=' _* ] }} -> failwith "split"
]]></sample>
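<p>
For comparison, here is the same splitting written in plain OCaml with the
<code>String</code> module instead of regular expression patterns. It is a sketch that works
on OCaml strings rather than x-strings and returns pairs of OCaml strings. Searching for the
first <code>'='</code> and then for the first <code>','</code> after it reproduces the choices made
by the non-greedy captures <code>n::_*?</code> and <code>v::_*?</code>.
</p>

```ocaml
(* Plain OCaml counterpart of [split], working on OCaml strings. *)
let rec split (s : string) : (string * string) list =
  match String.index_opt s '=' with
  | None -> failwith "split" (* no '=' at all: the last case *)
  | Some i ->
    let n = String.sub s 0 i in
    let after = String.sub s (i + 1) (String.length s - i - 1) in
    (match String.index_opt after ',' with
     | Some j ->
       (* first case: value up to the first ',', then recurse on the rest *)
       let v = String.sub after 0 j in
       let rest = String.sub after (j + 1) (String.length after - j - 1) in
       (n, v) :: split rest
     | None -> [ (n, after) ] (* second case: the value is the remainder *))
```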

</section>

</box>

<box title="Applications in OCamlDuce" link="appli">

<ul>
<li><a
href="http://anil.recoil.org/projects/review2atom.html">Review2Atom</a>
by Anil Madhavapeddy: translates paper review files in XML format into
an Atom feed suitable for aggregation.
</li>
</ul>

</box>

</page>

</page>
