Contents of /web/ocaml.xml

Revision 1876 - (hide annotations)
Tue Jul 10 19:28:40 2007 UTC (5 years, 11 months ago) by abate
File MIME type: text/xml
File size: 43437 byte(s)
[r2006-05-10 21:19:45 by afrisch] Empty log message

Original author: afrisch
Date: 2006-05-10 21:19:46+00:00
1 abate 1634 <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
2     <page name="ocaml">
3    
4 abate 1787 <title>OCamlDuce</title>
5 abate 1634
6     <left>
7     <local-links href="index,documentation"/>
8 abate 1787 <p>On this page:</p>
9     <boxes-toc/>
10 abate 1634 </left>
11    
12     <box>
13    
14     <p>
15 abate 1787 OCamlDuce is a merger between <a
16     href="http://caml.inria.fr/">OCaml</a> and
17     <local href="index">CDuce</local>. It comes as a modified
18 abate 1832 version of OCaml which integrates CDuce features: XML expressions,
19     regular expression types and patterns, iterators.
20 abate 1634 </p>
21    
22 abate 1790 <p>
23 abate 1832 OCamlDuce is distributed under the Q Public License version 1.0.
24 abate 1790 </p>
25    
26 abate 1832 <ul>
27     <li>A <a
28 abate 1823 href="http://cristal.inria.fr/~frisch/ocamlcduce/ocamlduce.pdf">technical
29 abate 1832 report</a> describes the theory behind OCamlDuce's type system (to be
30     presented in PLAN-X 2006).</li>
31     <li><local href="ocaml_install">How to get OCamlDuce:</local> download,
32     installation instructions, packages.</li>
33     <li><local href="ocaml_manual">User's manual</local>.</li>
34     <li><local href="ocaml_code">Code samples and
35     applications</local>.</li>
36     <li><local href="mailing">Mailing lists</local>.</li>
37     </ul>
38 abate 1815
39 abate 1787 </box>
40    
41 abate 1832 <page name="ocaml_install">
42     <title>Getting OCamlDuce</title>
43    
44 abate 1787 <box title="Download and installation" link="install">
45    
46 abate 1634 <p>
47 abate 1800 Currently, OCamlDuce
48 abate 1831 is based on OCaml 3.09.1 and CDuce 0.4.0.
49 abate 1634 </p>
50    
51     <ul>
52     <li><a
53 abate 1811 href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/ocamlduce-3.08.4pl5.tar.gz">Compiler,
54 abate 1821 version 3.08.4, patch level 5</a> (to be used with OCaml 3.08.4)</li>
55     <li><a
56 abate 1840 href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/ocamlduce-3.09.1pl1.tar.gz">Compiler,
57 abate 1831 version 3.09.1</a> (to be used with OCaml 3.09.1)</li>
58 abate 1876 <li><a
59     href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/ocamlduce-3.09.2.tar.gz">Compiler,
60     version 3.09.2</a> (to be used with OCaml 3.09.2)</li>
61 abate 1634 </ul>
62    
63     <p>
64 abate 1876 The following describes the installation procedure for the
65     3.09.2 release.
66     OCamlDuce is installed on top of an existing OCaml
67     installation (whose version number must match) and it requires
68     a recent version of findlib. The build procedure
69 abate 1821 is: <tt>make all &amp;&amp; make opt &amp;&amp; make
70     install</tt>. The configuration is taken from OCaml's
71     <tt>Makefile.config</tt>.
72 abate 1800 </p>
73    
74 abate 1821 <p>
75     The tools are named <tt>ocamlduce, ocamlducec, ocamlduceopt,
ocamlducedep, ocamlducemktop, ocamlducefind</tt>.
They are installed in the same directory as the OCaml compiler itself.
78 abate 1821 </p>
79 abate 1800
80 abate 1821 <p>
81 abate 1876 In addition, a library called <tt>ocamlduce.cma/.cmxa</tt> is built.
82     It depends on the <tt>nums</tt> library. A findlib package named
83     <tt>ocamlduce</tt> is created by the <tt>make install</tt> target.
Normally, you don't need to care about the package unless you
insist on linking your modules with the regular OCaml compilers (not
OCamlDuce), but there is no good reason to do so.
87 abate 1821 </p>
88 abate 1800
89 abate 1840 <p>
90     To generate the ocamldoc documentation for the <tt>Ocamlduce</tt>
91     module: <tt>make htdoc</tt>.
92     </p>
93    
94 abate 1876 <section title="Compiling, linking, calling the toplevel">
95    
96     <p>Starting from OCamlDuce 3.09.2, you don't need to struggle with
97     extra command-line options. You must simply use the OCamlDuce tools:</p>
98    
99     <sample>
100     {{Call the toplevel:}} ocamlduce
101     {{Compile:}} ocamlducec -c x.ml
102     {{Link:}} ocamlducec -o x x.cmo
{{Use ocamlfind:}} ocamlducefind ocamlc -o x -linkpkg -package pcre x.ml
104     </sample>
105    
106     </section>
107    
108    
109 abate 1823 <section title="Building from the CVS">
110    
111     <p>
112     The following commands will extract the current development version of
113     OCamlDuce (from OCaml and CDuce CVS repositories):
114     </p>
115    
116     <sample>
117     cvs -f -d ":pserver:anoncvs@camlcvs.inria.fr:/caml" co -r cducetrunk ocaml
118     cvs -f -d ":pserver:anonymous@cvs.cduce.org:/cvsroot" co cduce
119     (cd ocaml/cduce; make link)
120     </sample>
121    
122     </section>
123    
124 abate 1808 </box>
125    
126     <box title="Ports and packages" link="ports">
127    
128     <section title="GODI">
129 abate 1821
130 abate 1800 <p>
131 abate 1821 There is a <tt>godi-ocamlduce</tt> package available in GODI
132 abate 1876 (sections 3.08 and 3.09). Currently, there is no GODI package for
133     OCamlDuce 3.09.2.
134 abate 1634 </p>
135 abate 1821
136 abate 1808 </section>
137 abate 1634
138 abate 1808 <section title="DarwinPorts and OpenBSD">
139    
140     <p>
141     Anil Madhavapeddy contributed two ports of OCamlDuce for DarwinPorts
142     (in dports/lang/ocamlduce) and for OpenBSD (in ports/lang/ocamlduce).
143     </p>
144    
145     </section>
146    
147 abate 1634 </box>
148    
149 abate 1832 </page>
150    
151     <page name="ocaml_manual">
152     <title>OCamlDuce: manual</title>
153    
154 abate 1787 <box title="Overview" link="overview">
155    
156     <p>
157 abate 1791 The goal of the OCamlDuce project is to extend the OCaml language with features
158     to make it easier to write safe and efficient complex applications
159     that need to deal with XML documents. In particular, it relies
160     on a notion of types and patterns to guarantee statically
161     that all the possible input documents are correctly processed, and
162     that only valid output documents are produced.
163     </p>
164    
165     <p>
166 abate 1788 In a nutshell, OCamlDuce extends OCaml with a new kind of values
167     (<em>x-values</em>) to represent XML documents, fragments, tags, Unicode
168     strings. In order to describe these values, it also extends the type algebra
169 abate 1787 with so-called <em>x-types</em>. The philosophy behind these types is that they
represent <em>sets of x-values</em>. They can be very precise: indeed,
171     each value can be seen as a singleton type (a set with a single
172     value), and it is possible to form Boolean combinations of x-types
173     (intersection, union, difference).
174     </p>
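
<p>
The set-theoretic reading of x-types can be sketched in plain OCaml (not
OCamlDuce) by modelling a type as a membership predicate. The names
<code>xtype</code>, <code>singleton</code>, <code>union</code>,
<code>inter</code> and <code>diff</code> are hypothetical and only
illustrate the idea; nothing here reflects how the type checker actually
represents x-types.
</p>

<sample><![CDATA[
(* Illustrative model only: an x-type is seen as the set of values it accepts. *)
type 'a xtype = 'a -> bool

let singleton (v : 'a) : 'a xtype = fun x -> x = v
let union (s : 'a xtype) (t : 'a xtype) : 'a xtype = fun x -> s x || t x
let inter (s : 'a xtype) (t : 'a xtype) : 'a xtype = fun x -> s x && t x
let diff (s : 'a xtype) (t : 'a xtype) : 'a xtype = fun x -> s x && not (t x)

(* The set containing 1 and 2, minus the singleton 2, contains 1 but not 2. *)
let one_or_two = union (singleton 1) (singleton 2)
let only_one = diff one_or_two (singleton 2)

let () =
  assert (only_one 1);
  assert (not (only_one 2))
]]></sample>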
175    
176     <p>
OCamlDuce's type system can be understood as a refinement of OCaml's.
For each sub-expression which is inferred to be of the x-kind (using
OCaml's unification-based type system), OCamlDuce will try to infer the
best possible sound x-type. Here, best means smallest for the natural
subtyping relation (set inclusion). The inference algorithm is
actually a data-flow analysis: the x-type will collect all the values
that can be produced by the expression, considering all the possible
data flows in the program. It is sometimes necessary to provide
explicit type annotations to help the type checker infer this type, in
particular when you define recursive functions or when you use
iterators.
188     </p>
189    
190     <p>
191     Subtyping is implicit for x-types: if an expression is inferred to be
192     of x-type <code>t</code>, which is a subtype of <code>s</code>, then
193     it is possible to use this expression in any context which expects a
194     value of type <code>s</code>.
195     </p>
196    
197     </box>
198    
199     <box title="Getting started" link="start">
200    
201     <p>
202     Most of the new language features are enclosed within double curly braces
203     <code>{{ON}}{{...}}</code>. For instance, the following code sample
204     defines a value <code>x</code> as an XML element (with tag
205     <code>a</code>, an attribute <code>href</code>, and a simple
206     string as content):
207     </p>
208    
209     <sample><![CDATA[{{ON}}
210     # let x = {{ <a href="http://www.cduce.org">['CDuce'] }};;
211     val x : {{<a href=[ 'http://www.cduce.org' ]>[ 'CDuce' ]}} =
212     {{<a href="http://www.cduce.org">[ 'CDuce' ]}}
213     ]]></sample>
214    
215     <p>
216     What appears between the curly braces is called an x-expression.
217     Similarly, there are x-types (as seen above), and also x-patterns.
218     The delimiters <code>{{ON}}{{...}}</code> are only used
for syntactical reasons, to avoid clashes between OCaml and CDuce
220     syntaxes and lexical conventions. As a matter of fact,
221     an OCaml expression need not be a syntactical x-expression
222     (delimited by double curly braces) to evaluate to an x-value.
223     For instance, once <code>x</code> has been declared as above,
224     the expression <code>x</code> evaluates to an x-value.
225     </p>
226    
227    
228     <p>
229     It is possible to use an arbitrary
230     OCaml expression as part of an x-expression: it must simply be
231     protected by a new pair of double curly braces. For instance, there is
232     no <code>if-then-else</code> construction for x-expressions, but you
233     can write:
234     </p>
235    
236     <sample><![CDATA[{{ON}}
237     # {{ <a href={{if true then {{"a"}} else {{"z"}}}}>[] }};;
238     - : {{<a href=[ 'a' | 'z' ]>[ ]}} = {{<a href="a">[ ]}}
239     ]]></sample>
240    
241     <p>
242     Only the highlighted parts are parsed as x-expressions. The
243     <code>if-then-else</code> sub-expression is parsed as an OCaml
244     expression, but its type is an x-type (namely <code>{{ON}}{{[ 'a' |
245     'z' ]}}</code>).
246     </p>
247    
248     </box>
249    
250     <box title="X-values" link="values">
251    
252     <p>
253     X-values are intended to represent XML documents and fragments
254     thereof: elements, tags, text, sequences. In this section, we
255     present the x-value algebra, the syntax of the corresponding
256     x-expression constructors and the associated x-types.
257     </p>
258    
259     <p>
There are three kinds of atomic x-values:
261     </p>
262     <ul>
263     <li>Unicode characters;</li>
264     <li>qualified names;</li>
265     <li>arbitrarily large integers.</li>
266     </ul>
267    
268     <section title="Characters">
269    
270     <p>
271     X-characters are different from OCaml characters. They can represent
272     the range of Unicode codepoints defined in the XML specification.
273     Character literals are delimited by single quotes. The escape
274     sequences \n, \r, \t, \b, \', \&quot;, \\ are recognized as usual. The
numerical escape sequences are written <code>\n;</code> where n is an integer
276     literal (note the extra semi-colon). The source code is interpreted as
277     being encoded in iso-8859-1. As a consequence, Unicode characters which are not
278     part of the Latin1 character set must be introduced with this
279     numerical escape mechanism. The x-types for x-characters are:
280     </p>
281     <ul>
282     <li>singletons;</li>
283     <li>intervals, written <code>c -- d</code>, where <code>c</code> and
284     <code>d</code> are literals (example: <code>{{ON}}type t = {{ 'a'--'z'
285     }}</code>);</li>
286     <li>the type of all x-characters, written <code>Char</code>;</li>
287     <li>the type of all Latin1 characters, written <code>Latin1Char</code>
288     (defined as <code>\0; -- \255;</code>).</li>
289     </ul>
290    
291     </section>
292    
293     <section title="Integers">
294    
295     <p>
296     X-integers are arbitrarily large. Literals must be written in decimal.
Negative literals must be in parentheses. E.g.: <code>(-3)</code>.
298     The x-types for x-integers are:
299     </p>
300     <ul>
301     <li>singletons;</li>
302     <li>intervals, written <code>i -- j</code>, where <code>i</code> and
303     <code>j</code> are literals (example: <code>{{ON}}type t = {{ 10--20
304     }}</code>); it is possible to replace <code>i</code> or <code>j</code>
305     with <code>**</code> to define open-ended intervals, e.g.
306     <code>{{ON}}type pos = {{ 1 -- ** }}</code>;
307     </li>
308     <li>the type of all x-integers, written <code>Int</code>;</li>
309     <li>the type of all the integers which can be represented by a
310     signed 32 (resp. 64) bit machine word, written <code>Int32</code> (resp.
311     <code>Int64</code>).</li>
312     </ul>
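
<p>
As a rough illustration in plain OCaml (not OCamlDuce), an interval x-type
can be modelled as a pair of optional bounds, where a missing bound plays
the role of <code>**</code>. The names <code>interval</code> and
<code>mem</code> are hypothetical, and native integers are used only for
brevity: real x-integers are arbitrarily large.
</p>

<sample><![CDATA[
(* Illustrative model of interval x-types; None stands for an open end. *)
type interval = { low : int option; high : int option }

let mem (n : int) (t : interval) : bool =
  (match t.low with None -> true | Some l -> l <= n)
  && (match t.high with None -> true | Some h -> n <= h)

(* The two intervals used as examples above: 10--20 and the positive integers. *)
let ten_to_twenty = { low = Some 10; high = Some 20 }
let pos = { low = Some 1; high = None }

let () =
  assert (mem 15 ten_to_twenty);
  assert (not (mem 21 ten_to_twenty));
  assert (mem 1_000_000 pos);
  assert (not (mem 0 pos))
]]></sample>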
313    
314     </section>
315    
316     <section title="Qualified names">
317    
318     <p>
319     Qualified names are intended to represent XML tag names. Conceptually,
they are made of a namespace URI and a local name. Since URIs tend
to be long, literals are of the form <code>`prefix:local</code>
where <code>local</code> is the local name and <code>prefix</code>
is a <em>namespace prefix</em> bound to some URI (in the scope of the
literal). The local name follows the definitions from
the XML Namespaces specification; a dot character must be protected
by a backslash and non-Latin1 characters are written as character
literals <code>\n;</code>. <a href="#ns">See below</a> for an
explanation of how to bind prefixes to URIs. To refer
to the default namespace (or the absence of namespace if no default
has been defined), the syntax is simply <code>`local</code>.
331     The x-types for qualified names are:
332     </p>
333     <ul>
334     <li>singletons;</li>
335     <li>the type of all qualified names, written <code>Atom</code>;</li>
336     <li>the type of all qualified names from a specified namespace,
337     written <code>`ns:*</code>.</li>
338     </ul>
339     </section>
340    
341     <section title="Records">
342    
343     <p>
344     X-records are mainly used to represent the set of attributes of an XML
345     element. An x-record is a binding from a finite set of <em>labels</em>
to x-values. Labels follow the same syntax as for qualified names
347     without the leading backquote. However, if the namespace prefix is not
348     given, the default namespace does not apply (the namespace URI is
349     empty). The syntax for record x-expressions is <code> { l1=e1
350     ... ln=en }</code> where the <code>li</code> are labels and the
351     <code>ei</code> are x-expressions. Fields can also be separated with a
352     semi-colon. It is legal to omit the expression for a field; the label is then
353     taken as the content of the field (a value with this name must be
354     defined in the current scope), e.g.: <code>{{ON}}let x = ... and y = ...
355     in {{ {x y z=3} }}</code> is equivalent to <code>{{ON}}let x = ... and
356     y = ... in {{ {x=x y=y z=3} }}</code>. The types for x-records specify
357     which labels are authorized/mandatory, and what the types of the
corresponding fields are. There are two kinds of record x-types:
359     </p>
360    
361     <ul>
362     <li>
363     Closed record types, which only allow a finite number of fields:
364     <code>{ l1=t1 ... ln=tn }</code>;
365     </li>
366     <li>
367     Open record types, which allow additional fields (with arbitrary
368     type):
<code>{ l1=t1 ... ln=tn .. }</code> (the final two dots are
part of the syntax).
371     </li>
372     </ul>
373    
374     <p>
375     In both cases, it is possible to make one of
376     the fields optional by changing = to =?.
377     </p>
378    
379     <p>
The x-type of all x-records is thus <code>{ .. }</code>,
381     and the x-type of x-records with maybe a field <code>l</code>
382     of type <code>Int</code> and maybe arbitrary other fields is
383     <code>{ l=?Int .. }</code>.
384     </p>
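
<p>
The difference between closed and open record types, and the effect of
optional fields, can be sketched with a small model in plain OCaml (not
OCamlDuce). Records are association lists with integer contents, and
<code>field</code>, <code>record_type</code> and <code>has_type</code> are
hypothetical names chosen for this illustration.
</p>

<sample><![CDATA[
(* Illustrative model: a record is an association list from labels to values
   (plain ints here), and a record type lists the fields it declares. *)
type field = { label : string; optional : bool; accepts : int -> bool }
type record_type = { fields : field list; is_open : bool }

let has_type (r : (string * int) list) (t : record_type) : bool =
  (* each declared field is present with a suitable value, or absent if optional *)
  List.for_all
    (fun f ->
      match List.assoc_opt f.label r with
      | Some v -> f.accepts v
      | None -> f.optional)
    t.fields
  (* undeclared fields are only allowed by open record types *)
  && (t.is_open
      || List.for_all
           (fun (l, _) -> List.exists (fun f -> f.label = l) t.fields)
           r)

let any_int (_ : int) = true
let l_field = { label = "l"; optional = true; accepts = any_int }
let closed_t = { fields = [ l_field ]; is_open = false }  (* an optional l, nothing else *)
let open_t = { fields = [ l_field ]; is_open = true }     (* an optional l, plus anything *)

let () =
  assert (has_type [] open_t);
  assert (has_type [ ("l", 3); ("m", 4) ] open_t);
  assert (has_type [ ("l", 3) ] closed_t);
  assert (not (has_type [ ("l", 3); ("m", 4) ] closed_t))
]]></sample>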
385    
386     </section>
387    
388     <section title="Sequences">
389    
390     <p>
391     X-sequences are finite and ordered collections of x-values.
The syntax for a sequence x-expression is
<code>[ e1 ... en ]</code> (note that elements are <em>not</em> separated
by semicolons as in OCaml lists). Each item <code>ei</code>
395     can either be:
396     </p>
397     <ul>
398     <li>an x-expression;</li>
399     <li><code>!e</code> where <code>e</code> is an x-expression which
400     evaluates to a sequence (whose content is inserted in the sequence
401     which is currently defined); e.g.
402     <code>let x = [ 2 3 ] in [ 1 !x 4 ]</code> is equivalent to
403     <code>[ 1 2 3 4 ]</code>;</li>
<li>a string literal delimited by single quotes; e.g.
405     <code>[ 'abc' ]</code> is equivalent to <code>[ 'a' 'b' 'c' ]</code>.</li>
406     </ul>
407    
408     <p>
409     X-types for sequences are of the form <code>[R]</code>
where <code>R</code> is a regular expression over x-types which
describes the possible contents of the sequences. The possible
412     forms of regular expressions are:
413     </p>
414    
415     <ul>
416     <li><code>t</code> (one single element of x-type <code>t</code>)</li>
417     <li><code>R*</code> (zero or more repetitions)</li>
418     <li><code>R+</code> (one or more repetitions)</li>
419     <li><code>R?</code> (zero or one repetition)</li>
420     <li><code>R1 R2</code> (sequence)</li>
421     <li><code>R1|R2</code> (alternation)</li>
422     <li><code>(R)</code></li>
423     <li><code>/t</code> (guard: the tail of the sequence must comply with
424     <code>t</code>).</li>
425     <li><code>PCDATA</code> (equivalent to Char*).</li>
426     </ul>
427    
<note>Sequences are actually encoded with embedded pairs and a
terminator, and sequence types are encoded with product types and
430     recursive types. The encoding is available to the programmer
431     but not described in this manual.
432     </note>
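
<p>
To make the meaning of these regular expressions concrete, here is a small
backtracking matcher written in plain OCaml (not OCamlDuce). It only covers
single elements, sequence, alternation and repetition, and it is a model
for illustration: the names <code>regex</code>, <code>accepts</code> and
<code>matches</code> are hypothetical, and OCamlDuce does not check
sequence types this way.
</p>

<sample><![CDATA[
(* Illustrative model of regular expressions over sequence items. *)
type 'a regex =
  | Elem of ('a -> bool)          (* one element accepted by the predicate *)
  | Seq of 'a regex * 'a regex    (* R1 R2 *)
  | Alt of 'a regex * 'a regex    (* R1|R2 *)
  | Star of 'a regex              (* zero or more repetitions *)

(* [accepts r l k] matches a prefix of [l] against [r], then calls the
   continuation [k] on the remaining items. *)
let rec accepts r l k =
  match r with
  | Elem p -> (match l with x :: rest when p x -> k rest | _ -> false)
  | Seq (r1, r2) -> accepts r1 l (fun rest -> accepts r2 rest k)
  | Alt (r1, r2) -> accepts r1 l k || accepts r2 l k
  | Star r1 ->
      (* stop here, or consume at least one item and iterate *)
      k l || accepts r1 l (fun rest -> rest != l && accepts r rest k)

let plus r = Seq (r, Star r)
let matches r l = accepts r l (fun rest -> rest = [])

(* Items are integers or characters; the type below models [ Int* Char+ ]. *)
type item = I of int | C of char
let is_int = function I _ -> true | C _ -> false
let is_char = function C _ -> true | I _ -> false
let ints_then_chars = Seq (Star (Elem is_int), plus (Elem is_char))

let () =
  assert (matches ints_then_chars [ I 1; I 2; C 'a' ]);
  assert (matches ints_then_chars [ C 'a'; C 'b' ]);
  assert (not (matches ints_then_chars [ I 1 ]));
  assert (not (matches ints_then_chars [ C 'a'; I 1 ]))
]]></sample>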
433    
434     </section>
435    
436     <section title="Strings">
437    
438     <p>
439     Strings are nothing but sequences of characters. There are two
440     predefined types <code>String</code> and <code>Latin1</code>
441     (defined as <code>[ Char* ]</code> and <code>[ Latin1Char* ]</code>).
442     </p>
443    
444     <p>
445     A string literal <code>[ '...' ]</code> can also be written
<code>"..." </code> (without the square brackets). Note that single
(resp. double) quotes need to be escaped only when the string is
delimited with single (resp. double) quotes.
449     </p>
450    
451     </section>
452    
453     <section title="XML elements">
454    
455     <p>
456     An XML element is a triple of x-values. The syntax for
457     the corresponding x-expression constructor is
458     <code><![CDATA[<(e1) (e2)>e3]]></code>. When <code>e1</code> is a
459     qualified name literal, it is possible to omit the leading
460     backquote and the surrounding parentheses. Similarly,
461     when <code>e2</code> is an x-record literal, it is possible
462     to omit the curly braces and the parentheses. For instance,
463     one can simply write <code><![CDATA[<a href="abc">['def']]]></code>
464     instead of <code><![CDATA[<(`a) ({href="abc"})>['def']]]></code>.
465     </p>
466    
467     <p>
XML element x-types are written <code><![CDATA[<(t1) (t2)>t3]]></code>,
and the same simplifications apply. For instance, if
470     the namespace prefix <code>ns</code> has been defined,
471     the following is a legal x-type <code><![CDATA[<ns:* ..>[]]]></code>;
472     it describes XML elements whose tag is in the namespace bound to
473     <code>ns</code>, with an empty content, and with an arbitrary set of
474     attributes. An underscore in place of <code>(t1)</code> is
475     equivalent to <code>(Atom)</code> (any tag).
476     </p>
477    
478     </section>
479    
480     </box>
481    
482     <box title="X-expressions" link="expr">
483    
484     <p>
In the previous section, we have seen the syntax for x-value
constructors (constant literals, sequence, record, element constructors).
487     In this section, we describe the other kinds of x-expressions.
488     </p>
489    
490     <section title="Binary infix operators">
491    
492     <p>
493     The arithmetic operators on integers follow the usual precedence.
494     They are written <code>+,*,-,div,mod</code> (they are all infix).
495     </p>
496    
497     <p>
498     Record concatenation: <code>e1 ++ e2</code>. The x-expressions
499     <code>e1</code> and <code>e2</code> must evaluate to x-records.
The result is obtained by concatenating them. If a field with the same
501     label is present in both records, the right-most one is selected.
502     </p>
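
<p>
The right-most rule can be illustrated with a small model in plain OCaml
(not OCamlDuce), where records are association lists. The function name
<code>concat_records</code> is hypothetical, and the order of fields in the
result is an artifact of the model.
</p>

<sample><![CDATA[
(* Illustrative model of record concatenation on association lists:
   fields of the right-hand record override those of the left-hand one. *)
let concat_records (r1 : (string * 'a) list) (r2 : (string * 'a) list) =
  List.filter (fun (l, _) -> not (List.mem_assoc l r2)) r1 @ r2

let () =
  assert (concat_records [ ("x", 1); ("y", 2) ] [ ("y", 3); ("z", 4) ]
          = [ ("x", 1); ("y", 3); ("z", 4) ])
]]></sample>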
503    
504     <p>
505     Sequence concatenation: <code>e1 @ e2</code>, equivalent
506     to <code>[!e1 !e2]</code>.
507     </p>
508    
509     </section>
510    
511     <section title="Projections, filtering">
512    
513     <p>
514     If the x-expression <code>e</code> evaluates to a record or an XML
515     element, the construction <code>e.l</code> will extract the value of
516     field or attribute <code>l</code>. Similarly, the construction
517     <code>e.?l</code> will extract the value of field or attribute
518     <code>l</code> if present, and return the empty sequence
519     <code>[]</code> otherwise.
520     </p>
521    
522     <p>
523     If the x-expression <code>e</code> evaluates to a record,
524     the construction <code>e -. l</code> will produce a new record
525     where the field <code>l</code> has been removed (if present).
526     </p>
527    
528     <p>
529     If the x-expression <code>e</code> evaluates to an x-sequence,
530     the construction <code>e/</code> will result in a new x-sequence
531     obtained by taking in order all the children of the XML elements
532     from the sequence <code>e</code>. For instance, the x-expression
533     <code><![CDATA[[<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ]/]]></code>
534     evaluates to the x-value <code>[ 1 2 3 6 7 8 ]</code>.
535     </p>
536    
537     <p>
538     If the x-expression <code>e</code> evaluates to an x-sequence,
539     the construction <code>e.(t)</code> (where <code>t</code> is an
540     x-type) will result in a new x-sequence
541     obtained by filtering <code>e</code> to keep only the elements
542     of type <code>t</code>. For instance, the x-expression
543     <code><![CDATA[[<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ].(Int)]]></code>
544     evaluates to the x-value <code>[ 4 5 ]</code>.
545     </p>
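
<p>
The two examples above can be replayed on a simplified model of x-values
in plain OCaml (not OCamlDuce). The type <code>xvalue</code> and the
functions <code>children</code> and <code>only_ints</code> are
hypothetical; attributes and characters are left out for brevity.
</p>

<sample><![CDATA[
(* Illustrative model of x-values restricted to integers and elements. *)
type xvalue =
  | Int of int
  | Elem of string * xvalue list   (* tag and content; attributes omitted *)

(* Models e/ : the children of the XML elements of the sequence, in order. *)
let children (seq : xvalue list) : xvalue list =
  List.concat
    (List.map (function Elem (_, content) -> content | Int _ -> []) seq)

(* Models e.(Int) : keep only the items that are integers. *)
let only_ints (seq : xvalue list) : xvalue list =
  List.filter (function Int _ -> true | Elem _ -> false) seq

let example =
  [ Elem ("a", [ Int 1; Int 2; Int 3 ]); Int 4; Int 5;
    Elem ("b", [ Int 6; Int 7; Int 8 ]) ]

let () =
  assert (children example = [ Int 1; Int 2; Int 3; Int 6; Int 7; Int 8 ]);
  assert (only_ints example = [ Int 4; Int 5 ])
]]></sample>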
546     </section>
547    
548     <section title="Dynamic type checking">
549    
550     <p>
551     If <code>e</code> is an x-expression and <code>t</code> is an x-type,
552     the construction <code>(e :? t)</code> returns the same
553     result as <code>e</code> if it has type <code>t</code>, and otherwise
554     raises a <code>Failure</code> exception whose argument explains
555     why this is not the case.
556     </p>
557    
558     <sample><![CDATA[{{ON}}
559     # let f (x : {{ Any }}) = {{ (x :? <a>[ Int* ] ) }} in
560     f {{ <a>[ 1 2 '3' ] }};;
561     Exception:
562     Failure
563     "Value <a>[ 1 2 '3' ] does not match type <a>[ Int* ]\nValue '3' does not match type Int\n".
564     ]]></sample>
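
<p>
The behaviour of a dynamic check (return the value unchanged, or raise
<code>Failure</code> with an explanation) can be mimicked in plain OCaml
(not OCamlDuce) as follows. The function <code>check</code> and its
labelled arguments are hypothetical, and the message only imitates the
first line of the one shown above.
</p>

<sample><![CDATA[
(* Illustrative model of a dynamic type check: return the value unchanged
   when it belongs to the type, raise Failure with an explanation otherwise. *)
let check ~(type_name : string) ~(accepts : 'a -> bool) ~(show : 'a -> string)
    (v : 'a) : 'a =
  if accepts v then v
  else failwith ("Value " ^ show v ^ " does not match type " ^ type_name)

let is_digit n = 0 <= n && n <= 9

let () =
  assert (check ~type_name:"0--9" ~accepts:is_digit ~show:string_of_int 7 = 7);
  assert (
    try
      ignore (check ~type_name:"0--9" ~accepts:is_digit ~show:string_of_int 42);
      false
    with Failure msg -> msg = "Value 42 does not match type 0--9")
]]></sample>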
565     </section>
566    
567     <section title="Pattern matching">
568    
569     <p>
570     OCamlDuce comes with a powerful pattern matching operation.
571     X-patterns are described <a href="#patterns">below</a>.
572     The syntax for the pattern matching operation is:
573     <code>match e with p1 -> e1 | ... | pn -> en</code>.
The type system ensures exhaustiveness of the pattern matching
and infers precise types for the capture variables.
It is also possible to use x-pattern matching as a regular
OCaml expression; x-patterns must be surrounded by {{..}}, e.g.:
<code>match e with {{p1}} -> e1 | ... | {{pn}} -> en</code> or
<code>function {{p1}} -> e1 | ... | {{pn}} -> en</code>.
580     </p>
581    
582 abate 1792 <p>
Pattern matching follows a first-match policy. The first pattern
584     that succeeds triggers the corresponding branch.
585     </p>
586    
587 abate 1787 <note>
Currently, it is impossible to mix normal OCaml patterns and x-patterns
589     in a single pattern matching.
590     </note>
591    
592     </section>
593    
594     <section title="Local binding">
595    
596     <p>
597     The x-expression <code>let p=e1 in e2</code> is equivalent to
<code>match e1 with p -> e2</code>. There is also a local binding
599     with an x-pattern in OCaml expressions: <code>let {{p}}=e1 in
600     e2</code>.
601     </p>
602    
603     </section>
604    
605    
606     <section title="Iterators">
607    
608     <p>
609     OCamlDuce comes with a sequence iterator
610     <code>map e with p1 -> e1 | ... | pn -> en</code> and
611     a tree iterator
612     <code>map* e with p1 -> e1 | ... | pn -> en</code>.
613     </p>
614    
615     <p>
616     For both constructions, the argument must evaluate to a sequence.
617     The <code>map</code> iterator applies the patterns to each element
of this sequence in turn and produces a new sequence by concatenating
619     all the results (all the right-hand sides must thus produce a
620     sequence). The set of patterns must be exhaustive for all the possible
621     elements of the input sequence.
622     </p>
623    
624     <p>
625     The tree iterator is similar except that the patterns need not be
626     exhaustive. If some element of the input sequence is not matched,
627     it is simply copied into the result unless it is an XML element. In
628     this case, the transformation is applied recursively to its content.
629     </p>
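
<p>
The difference between the two iterators can be sketched in plain OCaml
(not OCamlDuce) on a simplified model of x-values. Here the branches are
represented by a function returning <code>Some</code> result when a pattern
matches and <code>None</code> otherwise; <code>xvalue</code>,
<code>map_seq</code> and <code>map_tree</code> are hypothetical names. In
OCamlDuce, exhaustiveness for <code>map</code> is checked statically,
whereas this model can only fail at run time.
</p>

<sample><![CDATA[
(* Illustrative model of x-values restricted to integers and elements. *)
type xvalue =
  | Int of int
  | Elem of string * xvalue list

(* Models map: every item must be matched; the results are concatenated. *)
let map_seq (f : xvalue -> xvalue list option) (seq : xvalue list) =
  List.concat
    (List.map
       (fun v ->
         match f v with
         | Some result -> result
         | None -> invalid_arg "map_seq: patterns are not exhaustive")
       seq)

(* Models the tree iterator: an unmatched item is copied, after recursing
   into its content when it is an XML element. *)
let rec map_tree (f : xvalue -> xvalue list option) (seq : xvalue list) =
  List.concat
    (List.map
       (fun v ->
         match f v, v with
         | Some result, _ -> result
         | None, Elem (tag, content) -> [ Elem (tag, map_tree f content) ]
         | None, Int _ -> [ v ])
       seq)

(* Double every integer; elements are not matched by this branch. *)
let double = function Int n -> Some [ Int (2 * n) ] | Elem _ -> None

let () =
  assert (map_seq double [ Int 1; Int 2 ] = [ Int 2; Int 4 ]);
  assert (map_tree double [ Int 1; Elem ("a", [ Int 2 ]) ]
          = [ Int 2; Elem ("a", [ Int 4 ]) ])
]]></sample>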
630    
631     </section>
632    
633     <section title="OCaml constructions">
634    
635     <p>
636     As a convenience, some of the OCaml expression constructors
637     are allowed as x-expressions (without a need to go back to OCaml
638     with double curly braces): (unqualified) value identifiers and
639     function calls.
640     </p>
641    
642     </section>
643    
644     </box>
645    
646     <box title="More on x-types" link="types">
647    
648     <p>
649     We have seen how to write simple x-types. We can then combine
650     them with Boolean connectives:
651     </p>
652    
653     <ul>
654     <li><code>t1 &amp; t2</code>: intersection;</li>
655     <li><code>t1 | t2</code>: union;</li>
656     <li><code>t1 - t2</code>: difference.</li>
657     </ul>
658    
659     <p>
660     The empty x-type is written <code>Empty</code> (it contains no value),
661     and the universal x-type is written <code>Any</code> (it contains
662     all the x-values) or <code>_</code>.
663     </p>
664    
665     <p>
666     When an x-type has been bound to some OCaml identifier
667     (<code>{{ON}}type t = {{...}}</code>), it is possible to use
668     this identifier in another x-type. Recursive definitions
669     are allowed:
670     </p>
671    
672     <sample><![CDATA[{{ON}}
673     type t1 = {{ <a>[ t2* ] }}
674     and t2 = {{ <b>[ t1* ] }}
675     ]]></sample>
676    
677     <p>
678     Note that x-values are always finite and acyclic. The type checker
detects type definitions which would yield empty types:
680     </p>
681    
682     <sample><![CDATA[{{ON}}
683     # type t = {{ <a>[ t+ ] }};;
684     This definition yields an empty type
685     ]]></sample>
686    
687     <p>
688     If <code>t1</code> and <code>t2</code> are record x-types,
689     we can combine them with the infix <code>++</code> operator, which
690     mimics the corresponding operator on expressions (record
691     concatenation). Similarly, we can use the infix <code>@</code>
692     concatenation operator on sequence x-types.
693     </p>
694    
695     </box>
696    
697     <box title="X-patterns" link="patterns">
698    
699     <p>
700     X-patterns follow the same syntax as X-types. In particular,
any X-type is a valid X-pattern. In addition to X-type constructors,
702     X-patterns can have:
703     </p>
704    
705     <ul>
706     <li>capture variables (lowercase OCaml identifiers);</li>
707     <li>constant bindings <code>(x := c)</code> where x is a capture
708     variable and c is
709     a literal x-constant (this pattern always succeeds and returns the
710     binding x->c).</li>
711     </ul>
712    
713     <p>
714 abate 1792 Here is a brief description of the semantics of patterns. Given
715     an input value, a pattern can either succeed or fail. If it succeeds,
it also produces a binding from the capture variables in the pattern
717     to x-values.
718     </p>
719 abate 1787
720 abate 1792 <ul>
721    
722     <li>A pattern which is just a type (no capture variable) succeeds if
723     and only if the value has the type.</li>
724    
<li>A pattern <code>p1 | p2</code> succeeds if either <code>p1</code>
or <code>p2</code> succeeds, and returns the corresponding binding; if
both patterns succeed, <code>p1</code> wins. It is required that
728     <code>p1</code> and <code>p2</code> have the same sets of capture
729     variables. </li>
730    
731     <li>A pattern <code>p1 &amp; p2</code> succeeds if both <code>p1</code>
732     and <code>p2</code> succeed, and returns the concatenation of the two
733     bindings. It is required that <code>p1</code> and <code>p2</code> have
734     <em>disjoint</em> sets of capture variables. </li>
735    
736     </ul>
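
<p>
These rules can be summarised by a small model in plain OCaml (not
OCamlDuce), where a pattern is a function from a value to an optional list
of bindings. Values are plain integers, and the names
<code>of_type</code>, <code>capture</code>, <code>const</code>,
<code>alt</code> and <code>conj</code> are hypothetical. The model does not
enforce the conditions on the sets of capture variables.
</p>

<sample><![CDATA[
(* Illustrative model: a pattern maps a value to Some bindings on success
   and to None on failure. *)
type pattern = int -> (string * int) list option

(* A type used as a pattern: succeeds without binding anything. *)
let of_type (accepts : int -> bool) : pattern =
 fun v -> if accepts v then Some [] else None

(* A capture variable and a constant binding always succeed. *)
let capture (x : string) : pattern = fun v -> Some [ (x, v) ]
let const (x : string) (c : int) : pattern = fun _ -> Some [ (x, c) ]

(* Alternation: the first pattern wins when both succeed. *)
let alt (p1 : pattern) (p2 : pattern) : pattern =
 fun v -> match p1 v with Some b -> Some b | None -> p2 v

(* Conjunction: both must succeed; the bindings are concatenated. *)
let conj (p1 : pattern) (p2 : pattern) : pattern =
 fun v ->
  match p1 v, p2 v with
  | Some b1, Some b2 -> Some (b1 @ b2)
  | _ -> None

(* Bind x to the value if it is a digit; otherwise set x to -1. *)
let digit_or_default =
  alt (conj (capture "x") (of_type (fun n -> 0 <= n && n <= 9))) (const "x" (-1))

let () =
  assert (digit_or_default 7 = Some [ ("x", 7) ]);
  assert (digit_or_default 42 = Some [ ("x", -1) ])
]]></sample>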
737    
738     <p>
739     In record x-patterns, it is possible to omit the <code>=p</code> part
740     of a field. The content is then replaced with the label name
741 abate 1806 considered as a capture variable (or as a previously defined type).
742     E.g. <code>{ x y=p }</code> is
743 abate 1792 equivalent to <code>{ x=x y=p }</code>.</p>
744    
745 abate 1787 <p>It is also possible to add an "else" clause:
746     <code>{ x = (a,_)|(a:=3) }</code>
will accept any record with at most the field <code>x</code>. If the content
is a pair, the capture variable <code>a</code> will be bound to its first component;
otherwise, it is set to <code>3</code>.</p>
750    
751     <p>
752     In regular expressions, it is possible to extract whole subsequences
753     with the notation <code>x::R</code>, e.g.: <code>[ _* x::Int+ _* ]</code>
754     </p>
755    
756     <p>
757     If the same sequence capture variable appears several times (or below a
758     repetition) in a regexp, it is bound to the concatenation of all
759     matched subsequences. E.g.: <code>[ (x::Int | _)* ]</code> will
760     collect in <code>x</code> all the elements of type <code>Int</code> from
761 abate 1792 a sequence. It is not legal to have repeated simple capture variables.
762     </p>
763 abate 1787
764     <p>
The regexp operators <code>+,*,?</code> are greedy by default (they match as much
as possible). They admit non-greedy variants <code>+?,*?,??</code>.
767 abate 1787 </p>
768     </box>
769    
770     <box title="Namespace bindings" link="ns">
771    
772     <p>
773     The binding of namespace prefixes to URIs
774     can be done either by toplevel phrases (structure items) or
775     by local declarations:
776     </p>
777    
778     <sample>{{ON}}
779     # {{ namespace ns = "http://..." }};;
780     # let x = {{ `ns: x }};;
781     val x : {{`ns:x}} = {{`ns:x}}
782     # let x = {{ let namespace ns = "http://..." in `ns:x }};;
783     val x : {{`ns:x}} = {{`ns:x}}
784     </sample>
785    
786     <p>The toplevel definitions can also appear in module interfaces
787     (signatures). A toplevel prefix binding is not exported by a module: its scope
788     is limited to the current structure or signature. It is possible
789     to specify a default namespace, and to reset it:
790     </p>
791    
792     <sample>{{ON}}
793     # {{ namespace "http://..." }};;
794     # {{ `x }};;
795     - : {{`ns1:x}} = {{`ns1:x}}
796     # {{ namespace "" }};;
797     # {{ `x }};;
798     - : {{`x}} = {{`x}}
799     </sample>
800    
801     <p>
802     Note that the value pretty-printer invented some prefix
803     for the namespace URI. The default namespace declaration also has a
804     local form <code> let namespace "..." in ... </code>.
805     </p>
806    
807     </box>
808    
809 abate 1788 <box title="More on type-checking" link="typecheck">
810 abate 1787
811 abate 1788 <section title="Type inference">
812    
813 abate 1787 <p>
814 abate 1788 As we said above, the programmer is sometimes required to provide type
815     annotations. To know where to put these annotations, it is necessary to
816     have a basic understanding of how type-checking works.
817 abate 1787 </p>
818    
819 abate 1788 <p>
820     The OCaml type-checker is run first to detect which sub-expressions
821     are of the x-kind. A second ML type-checking pass is then done to
822     introduce subsumption (implicit subtyping) steps where allowed. After
823     these two passes, the OCamlDuce type checker obtains a data-flow summary of
824     x-values in the whole compilation unit. This is a directed graph,
825     whose edges represent either simple data-flow or complex operations
826     on x-values. The nodes of the graph can be thought of as x-type
827     variables. A data-flow edge corresponds to a subtyping constraint,
828     and an operation edge corresponds to a symbolic constraint which
829     mimics the corresponding operation on values.
830     </p>
831    
832     <p>
833     Some of the nodes are given an explicit type by the programmer,
834     through type annotations (on expressions or function arguments)
835     or the other usual mechanisms in ML (data type declarations,
836     signatures, ...).
837     </p>
838    
839     <p>
840     Also, if there is a loop with only subtyping edges in the graph,
841     all the nodes on the loop are merged together.
842     </p>
843    
844     <p>
845     After this operation, the graph is required to be acyclic (assuming
846     that the nodes with an explicit type are removed from the graph). It
847     is the responsibility of the programmer to provide enough type
848     annotations to achieve this property. Otherwise, a type error
849     is issued.
850     </p>
851    
852     <sample><![CDATA[{{ON}}
853     # let rec f x = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
854     Cycle detected: cannot type-check
855     # let rec f x : {{ String }} = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
856     val f : int -> {{String}} = <fun>]]>
857     </sample>
858    
859     <p>
860     In the example above, there is a cycle between the result type for
861     <code>f</code> and the type for the sub-expression <code>{{ON}}f
862     {{n-1}}</code>. It is here broken with a type annotation on the result; it could
863     have been broken by a type annotation on the expression <code>{{ON}}f
864     {{n-1}}</code>, or on the function <code>f</code> itself, or by a
865     module signature.
866     </p>
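
<p>
The acyclicity requirement can be illustrated with a small model written
in plain OCaml. This is only a sketch under stated assumptions, not the
actual implementation of the OCamlDuce type checker: nodes are
hypothetical integer identifiers, an edge <code>(a, b)</code> means that
the type of <code>a</code> depends on the type of <code>b</code>, loops
made only of subtyping edges are assumed to have been merged already, and
the name <code>has_cycle</code> is introduced here for illustration.
</p>

<sample><![CDATA[
(* Hypothetical model of the data-flow graph: nodes are integers and an
   edge (a, b) means "the type of a depends on the type of b". *)
let has_cycle ~annotated edges nodes =
  let successors n =
    List.filter_map (fun (a, b) -> if a = n then Some b else None) edges in
  let visiting = Hashtbl.create 16 and finished = Hashtbl.create 16 in
  let rec visit n =
    (* Nodes with an explicit type are considered removed from the graph. *)
    if List.mem n annotated || Hashtbl.mem finished n then false
    else if Hashtbl.mem visiting n then true   (* back edge: a cycle *)
    else begin
      Hashtbl.add visiting n ();
      let cyclic = List.exists visit (successors n) in
      Hashtbl.remove visiting n;
      Hashtbl.add finished n ();
      cyclic
    end
  in
  List.exists visit nodes

(* Node 0: result of f; node 1: the recursive call to f;
   node 2: the concatenation. *)
let edges = [ (0, 2); (2, 1); (1, 0) ]

let () =
  assert (has_cycle ~annotated:[] edges [0; 1; 2]);
  (* Annotating the result of f removes node 0 and breaks the cycle. *)
  assert (not (has_cycle ~annotated:[0] edges [0; 1; 2]))
]]></sample>

<p>
In this model, adding a node to <code>annotated</code> plays the role of
a type annotation: the depth-first search no longer goes through that
node, so any cycle that passed through it disappears.
</p>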
867    
868     <p>
869     Let us study another simple example:
870     </p>
871    
872     <sample>{{ON}}
873     # let f x = {{ x + 1 }} in f {{ 2 }}, f {{ 3 }};;
874     - : {{3--4}} * {{3--4}} = ({{3}}, {{4}})
875     </sample>
876    
877     <p>
878     The type-checker detects that the two x-values <code>2</code> and
879     <code>3</code> can flow to the argument of <code>f</code>. Its body
880     is thus type-checked with the assumption that <code>x</code> has type
881     <code>2--3</code>. The computed result type is then <code>3--4</code>.
882     </p>
883    
884    
885     <p>
886     The type-inference process described above is global by nature. The
887     acyclicity condition is only imposed after a whole compilation unit
888     has been type-checked by OCaml (and the information from the module
889     interface has been integrated). When a type variable is inferred to
890     be of the x-kind, it is never generalized. As a consequence, there
891     is no parametric polymorphism on x-types.
892     </p>
893    
894     <p>
895     In the toplevel, type-checking is done after each phrase. Consider
896     the following session:
897     </p>
898    
899     <sample><![CDATA[{{ON}}
900     # let f x = {{ x + 1 }};;
901     val f : {{Empty}} -> {{Empty}} = <fun>
902     # let a = f {{ 2 }};;
903     Subtyping failed 2 <= Empty
904     Sample:
905     2
906     ]]></sample>
907    
908     <p>
909     The function <code>f</code> is inferred to have type
910     <code>{{ON}}{{Empty}} -> {{Empty}}</code> because when the first
911     phrase is type-checked, the data-flow graph says that no value
912     can flow to <code>x</code>, and thus the input type is empty
913     (and similarly for the result type). If the two phrases
914     were type-checked together (which would be the case if they had
915     been compiled by the compiler, not in the toplevel), the type checker
916     would have correctly inferred that the input type for <code>f</code>
917     must contain <code>2</code>.
918     </p>
919    
920     </section>
921    
922     <section title="Implicit subtyping">
923    
924     <p>
925     Coercion from an x-type to a super type is automatic in OCamlDuce.
926     However, this automatic subsumption does not carry over to OCaml
927     type constructors, even if they are covariant. Consider:
928     </p>
929    
930     <sample><![CDATA[{{ON}}
931     # let f (x : {{ Int }} * {{ Int }}) = 1;;
932     val f : {{Int}} * {{Int}} -> int = <fun>
933     # let g (x : {{ 0 }} * {{ 0 }}) = f x;;
934     This expression has type {{0}} * {{0}} but is here used with type
935     {{Int}} * {{Int}}
936     # let g (x : {{ 0 }} * {{ 0 }}) = let a,b = x in f (a,b);;
937     val g : {{0}} * {{0}} -> int = <fun>
938     # let g (x : {{ 0 }} * {{ 0 }}) = f (x :> {{ Int }} * {{ Int }});;
939     val g : {{0}} * {{0}} -> int = <fun>
940     ]]></sample>
941    
942     <p>
943     The first attempt to define <code>g</code> fails because the type for
944     <code>x</code> is not an x-type and thus subsumption does not
945     apply. In the second attempt, we extract the two components of the
946     pair; since they are inferred to be x-values, subtyping applies to
947     both of them. Thus, when the pair <code>(a,b)</code> is reconstructed,
948     it is legal to unify its type with the input type of <code>f</code>.
949     The third definition for <code>g</code> gives an alternative solution:
950     using explicit OCaml type coercions.
951     </p>
952    
953     </section>
954    
955 abate 1787 </box>
956    
957 abate 1788 <box title="Exchanging values" link="transl">
958    
959     <p>
960     OCamlDuce strongly separates regular OCaml values from the new
961     x-values. They have different syntax, expressions, types, patterns,
962     and even type-checking algorithms. This strong segregation is the key point
963     which allowed a simple integration between very different type
964     systems.
965     </p>
966    
967     <p>
968     At some point, it is still necessary to cross the frontier and
969     translate OCaml values to x-values or the opposite.
970     </p>
971    
972     <p>
973     Fortunately, OCamlDuce provides automatic translations in both
974     directions. Instead of double curly braces, you can
975     enclose x-expressions in curly brace+colon <code>{: ... :}</code>
976     (here, the <code>...</code> is an x-expression).
977     The effect is to translate the result of the x-expression
978     (which must be an x-value) to an OCaml value. Similarly,
979     in an x-expression, you can obtain the x-translation of
980     an OCaml value with the same syntax <code>{: ... :}</code>
981     (here, the <code>...</code> is an OCaml expression).
982     </p>
983    
984     <p>
985     Here is how the translation works. To each OCaml type <code>t</code>,
986     we associate an x-type <code>T(t)</code> and a pair of translation
987     functions between <code>t</code> and <code>T(t)</code>.
988     However, not all OCaml types are supported. For instance,
989     free type variables, abstract types, object types, non-regular
990     recursive types cannot be translated. In particular, since
991     type variables are not allowed, the OCaml type must be fully known.
992     </p>
993    
994     <p>
995 abate 1789 The translation for an OCaml type <code>t</code> is defined by structural
996     induction on <code>t</code>. Sum types are
997 abate 1788 translated to union types: a constant constructor <code>A</code> is
998     translated to the qualified name <code>`A</code>; a non-constant
999     constructor <code>A of t1 * ... * tn</code> is translated to
1000     <code>&lt;A>[ T(t1) ... T(tn) ]</code>. Closed polymorphic variants
1001     have the same translation. Record types are translated to closed
1002     record x-types. Some other translations:
1003     </p>
1004    
1005     <table border="1">
1006     <tr><th>Caml type t</th> <th>X-type T(t)</th></tr>
1007     <tr><td><code>int</code></td> <td><code>Int</code></td></tr>
1008     <tr><td><code>int32</code></td> <td><code>Int32</code></td></tr>
1009     <tr><td><code>int64</code></td> <td><code>Int64</code></td></tr>
1010     <tr><td><code>string</code></td> <td><code>Latin1</code></td></tr>
1011     <tr><td><code>t list</code></td> <td><code>[T(t)*]</code></td></tr>
1012     <tr><td><code>t array</code></td> <td><code>[T(t)*]</code></td></tr>
1013     <tr><td><code>unit</code></td> <td><code>[]</code></td></tr>
1014     <tr><td><code>char</code></td> <td><code>Latin1Char</code></td></tr>
1015     <tr><td><code>{{t}}</code></td> <td><code>t</code></td></tr>
1016     </table>
1017    
1018     <p>
1019     Here is an example:
1020     </p>
1021    
1022     <sample>{{ON}}
1023     # let f (x : {{ Int }}) = {{ x + 1 }} in List.map f {: [ 1 2 3 ] :};;
1024     - : {{Int}} list = [{{2}}; {{3}}; {{4}}]
1025     </sample>
1026    
1027     <p>
1028     In this example, the result type of the translation is inferred
1029     to be <code>{{ON}}{{ Int }} list</code> (because the type for
1030     <code>f</code> is given). The corresponding x-type
1031     is <code>{{ON}}{{ [Int*] }}</code>.
1032     </p>
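
<p>
The translation rules of this section can be summarised by a small model
in plain OCaml. This is a hedged sketch, not the code used by OCamlDuce:
the type <code>ml_type</code> is a hypothetical syntax tree covering only
some of the cases above, and <code>translate</code> merely prints the
x-type <code>T(t)</code> as a string. Parentheses are added around
nested unions so that the printed type is unambiguous.
</p>

<sample><![CDATA[
(* Hypothetical syntax tree for a fragment of OCaml types. *)
type ml_type =
  | TInt | TInt32 | TInt64 | TString | TChar | TUnit
  | TList of ml_type
  | TArray of ml_type
  | TSum of (string * ml_type list) list   (* constructors and arguments *)

(* Print the x-type T(t), following the rules of this section. *)
let rec translate = function
  | TInt -> "Int" | TInt32 -> "Int32" | TInt64 -> "Int64"
  | TString -> "Latin1" | TChar -> "Latin1Char" | TUnit -> "[]"
  | TList t | TArray t -> "[" ^ atom t ^ "*]"
  | TSum constructors ->
      String.concat " | " (List.map translate_constructor constructors)

(* Wrap unions of several constructors in parentheses when nested. *)
and atom t =
  match t with
  | TSum (_ :: _ :: _) -> "(" ^ translate t ^ ")"
  | _ -> translate t

and translate_constructor = function
  | (name, []) -> "`" ^ name                      (* constant constructor *)
  | (name, args) ->
      "<" ^ name ^ ">[ " ^ String.concat " " (List.map atom args) ^ " ]"

let () =
  assert (translate (TList TInt) = "[Int*]");
  assert (translate (TSum [ ("A", []); ("B", [ TInt; TString ]) ])
          = "`A | <B>[ Int Latin1 ]")
]]></sample>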
1033    
1034     </box>
1035    
1036 abate 1789 <box title="The standard library" link="stdlib">
1037    
1038     <p>
1039     In OCamlDuce, the Num library from OCaml is included in the standard
1040     library. In addition, there are two new modules called
1041     <code>Ocamlduce</code> and <code>Cduce_types</code> in the standard library.
1042     </p>
1043    
1044     <p>
1045     The module <code>Cduce_types</code> gives access to the internal
1046     representation of x-values. It is currently undocumented.
1047     </p>
1048    
1049     <p>
1050     The module <code>Ocamlduce</code> provides several useful
1051     functions for working with x-values. See the <a href="http://yquem.inria.fr/~frisch/ocamlcduce/doc/ocamlduce/Ocamlduce.html">ocamldoc</a> generated
1052     documentation for a description of its interface.
1053     </p>
1054    
1055     </box>
1056    
1057 abate 1792 <box title="Marshaling" link="marshal">
1058    
1059     <p>
1060     OCamlDuce uses some tricks on its internal representation of x-values
1061     to reduce memory usage and improve performance. You need to pay
1062 abate 1793 special attention if you want to use OCaml serialization functions
1063 abate 1792 (module <code>Marshal</code>, functions
1064     <code>input_value/output_value</code>) on x-values. In addition to
1065     your values, you also need to save and restore some piece of internal data
1066     using the functions <code>Cduce_types.Value.extract_all</code> and
1067     <code>Cduce_types.Value.intract_all</code>. Of course, this also
1068     applies if the value to be serialized contains deeply nested x-values.
1069     </p>
1070    
1071     <p>
1072     Here are generic
1073     serialization/deserialization functions that illustrate how to do it:
1074     </p>
1075    
1076     <sample>
1077     let my_output_value oc v =
1078       let p = Cduce_types.Value.extract_all () in
1079       output_value oc (p,v)
1080    
1081     let my_input_value ic =
1082       let (p,v) = input_value ic in
1083       Cduce_types.Value.intract_all p;
1084       v
1085     </sample>
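
<p>
As a usage sketch, the two functions above can be wrapped to save a
value to a file and load it back. The names <code>save_to_file</code>
and <code>load_from_file</code> are hypothetical helpers introduced here;
they rely only on the functions defined above and on the standard
channel functions of OCaml. As with <code>input_value</code>, the result
of loading is not type-checked, so the caller should give it an explicit
type annotation.
</p>

<sample>
(* Hypothetical helper: write v, together with the internal data, to a file. *)
let save_to_file file v =
  let oc = open_out_bin file in
  my_output_value oc v;
  close_out oc

(* Hypothetical helper: read back a value written by save_to_file. *)
let load_from_file file =
  let ic = open_in_bin file in
  let v = my_input_value ic in
  close_in ic;
  v
</sample>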
1086    
1087     </box>
1088    
1089     <box title="Performance" link="perf">
1090    
1091     <section title="Strings">
1092    
1093     <p>
1094     OCaml users might be surprised by the fact that x-strings are simply
1095     represented as sequences in OCamlDuce. Does this mean that they are
1096     actually stored in memory as linked lists? Certainly not! The internal
1097     representation of sequence values uses several tricks to improve
1098     performance and memory usage. In particular, a special form in the
1099     representation can store strings as byte buffers, as in OCaml.
1100     If an XML document is loaded, or if a Caml string is converted
1101     to an x-value, this compact representation will be used.
1102     </p>
1103    
1104     </section>
1105    
1106     <section title="Concatenation">
1107    
1108     <p>
1109     Similarly, OCaml users might be reluctant to use the sequence
1110     concatenation <code>@</code> on sequences. In OCaml, the complexity
1111     of this operator is linear in the size of its first argument (which
1112     needs to be copied). OCamlDuce uses a special form in its internal
1113     representation to store concatenation in a lazy way. The concatenation
1114     is only computed when the value is accessed. This means
1115     that it's perfectly ok to build a long sequence by adding
1116     new elements at the end one by one, as long as you don't
1117     simultaneously inspect the sequence.
1118     </p>
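
<p>
The idea of lazy concatenation can be illustrated in plain OCaml. The
following is a minimal sketch under stated assumptions, not the actual
internal representation used by OCamlDuce: the type <code>seq</code> and
the functions <code>append</code>, <code>snoc</code> and
<code>to_list</code> are hypothetical names. Concatenation only records
a node in constant time; the elements are laid out when the sequence is
inspected.
</p>

<sample><![CDATA[
(* Hypothetical model of a sequence with suspended concatenations. *)
type 'a seq =
  | Nil
  | Cons of 'a * 'a seq
  | Concat of 'a seq * 'a seq   (* concatenation not yet performed *)

(* Constant time: the concatenation is only recorded. *)
let append a b = Concat (a, b)

(* Add one element at the end, also in constant time. *)
let snoc s x = append s (Cons (x, Nil))

(* Inspect the sequence: flatten it into an ordinary list, using an
   explicit stack of pending right-hand sides to stay tail-recursive. *)
let to_list s =
  let rec go acc stack = function
    | Nil ->
        (match stack with
         | [] -> List.rev acc
         | next :: rest -> go acc rest next)
    | Cons (x, tl) -> go (x :: acc) stack tl
    | Concat (a, b) -> go acc (b :: stack) a
  in
  go [] [] s

let () =
  (* Build a sequence by adding elements at the end one by one. *)
  let s = List.fold_left snoc Nil [1; 2; 3; 4] in
  assert (to_list s = [1; 2; 3; 4])
]]></sample>

<p>
In this model, building a sequence of <code>n</code> elements with
<code>snoc</code> costs <code>n</code> constant-time steps, and a single
traversal at the end is linear in <code>n</code>.
</p>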
1119    
1120     </section>
1121    
1122     <section title="Pattern matching">
1123    
1124     <p>
1125     Another point which is worth knowing when programming in OCamlDuce
1126     is that patterns can be written in a declarative style without
1127     affecting performance. The compiler uses static type information
1128     about matched values to produce efficient code for pattern matching.
1129     To illustrate this, consider the following sample:
1130     </p>
1131    
1132     <sample><![CDATA[{{ON}}
1133     x.ml:
1134    
1135     type a = {{ <a>[ a* ] }}
1136     type b = {{ <b>[ b* ] }}
1137    
1138     let f : {{ a|b }} -> int = function {{ a }} -> 0 | {{ _ }} -> 1
1139     ]]></sample>
1140    
1141     <sample><![CDATA[{{ON}}
1142     y.ml:
1143    
1144     type a = {{ <a>[ a* ] }}
1145     type b = {{ <b>[ b* ] }}
1146    
1147     let f : {{ a|b }} -> int = function {{ <a>_ }} -> 0 | {{ _ }} -> 1
1148     ]]></sample>
1149    
1150     <p>
1151     The two functions have exactly the same semantics, but the first
1152     implementation is more declarative: it uses type checks to distinguish
1153     between <code>a</code> and <code>b</code> instead of saying
1154     <em>how</em> to distinguish between these two types. Imagine
1155     that the definitions of these types change to:
1156     </p>
1157    
1158     <sample><![CDATA[{{ON}}
1159     type a = {{ <x kind="a">[ a* ] }}
1160     type b = {{ <x kind="b">[ b* ] }}
1161     ]]></sample>
1162    
1163     <p>
1164     Then the first implementation still works as expected, but the
1165     second one needs to be rewritten.</p>
1166    
1167     <p>Now one might believe that the second implementation is more
1168     efficient because it tells the compiler to check only the root tag,
1169     whereas the first implementation would force
1170     the compiler to produce code to check that all tags in the tree
1171     are <code>a</code>s. But this is not what happens! Actually,
1172     you can check that the compiler will produce exactly the same code
1173     for both implementations. It considers the static type information
1174     about the argument of the pattern matching (here, the input type
1175     of the function), and computes an efficient way to evaluate
1176     patterns for the values of this type.
1177     </p>
1178    
1179     </section>
1180    
1181     <section title="The map iterator">
1182    
1183     <p>
1184     The <code>map ... with ...</code> iterator is implemented in a
1185     tail-recursive way. You can safely use it on very long sequences.
1186     </p>
1187    
1188     </section>
1189    
1190     </box>
1191    
1192 abate 1799 <box title="OCaml and OCamlDuce" link="ocaml">
1193    
1194     <p>
1195     Since the 3.08.4 release, OCamlDuce is binary compatible with the corresponding
1196     OCaml release. This means that OCamlDuce can use OCaml-generated
1197     <tt>.cmi</tt> files and that it produces an OCaml-compatible
1198     <tt>.cmi</tt> file if the interface does not use any x-type
1199     (this file is equal to what would have been obtained by using OCaml).
1200     </p>
1201    
1202     <p>
1203     It is thus possible to use existing libraries which were compiled for
1204 abate 1821 OCaml. It is also possible to use OCamlDuce to compile
1205 abate 1799 some modules and use them in an OCaml project provided their interface
1206     is pure OCaml.
1207     </p>
1208    
1209     </box>
1210    
1211 abate 1832 </page>
1212    
1213     <page name="ocaml_code">
1214     <title>OCamlDuce: code samples and applications</title>
1215    
1216 abate 1789 <box title="Code samples" link="code">
1217    
1218     <section title="Parsing XML files">
1219    
1220     <p>
1221     OCamlDuce does not come with any built-in XML parser. However,
1222     the <a href="http://yquem.inria.fr/~frisch/ocamlcduce/doc/ocamlduce/Ocamlduce.Load.html"><code>Ocamlduce.Load</code></a> module in the standard library
1223     makes it easy to plug existing XML parsers. Here is some
1224     code which demonstrates how to do that with three of
1225     the most popular OCaml XML parser libraries:
1226     </p>
1227    
1228     <ul>
1229     <li><a
1230     href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/pxp/">PXP</a></li>
1231     <li><a
1232     href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/expat/">Expat</a></li>
1233     <li><a href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/xmllight/">Xml-light</a></li>
1234     </ul>
1235    
1236     </section>
1237    
1238     <section title="Converting DTD to OCamlDuce types">
1239    
1240     <p>
1241     This <a href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/dtd2types/">tool</a> produces a set of OCamlDuce type declarations
1242     from a DTD. It requires PXP.
1243     </p>
1244    
1245     <note>This application does not use any of the new features, but it
1246     can be useful in the development of OCamlDuce applications.
1247     </note>
1248    
1249     </section>
1250    
1251     <section title="Parsing XML Schema, producing valid XHTML output">
1252    
1253     <p>
1254     This <a
1255     href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/schema/">application</a>
1256     parses XML Schema Definitions (.xsd files), and produces summaries
1257     (toplevel declaration names) in XHTML. OCamlDuce's type system ensures
1258     that the parser is coherent with the input XML type (any valid XML
1259     Schema is accepted) and that the printer is coherent with the output
1260     XML type (it is necessarily a valid XHTML document).
1261     </p>
1262    
1263     <p>
1264     Of course, for such a simple transformation, parsing the XML document
1265     into an internal representation is not necessary. A direct XML-to-XML
1266     transformation would be easy to write. We wanted to illustrate
1267     a complex parsing of XML.
1268     </p>
1269    
1270     <p>
1271     It is interesting to introduce errors in the parser
1272     <code>schema_loader.ml</code> or the printer
1273 abate 1792 <code>dump_schema.ml</code> and see how the type system catches them.
1274 abate 1789 </p>
1275    
1276     <note>
1277     The application uses XML Light to parse XML documents.
1278     </note>
1279    
1280     <note>
1281     Some features of XML Schema are not parsed, such as
1282     <code>redefine</code> elements or substitution groups.
1283     </note>
1284    
1285 abate 1811 <note>
1286     To compile the application with the provided Makefile,
1287     you must make the environment variable <code>OCAMLFIND_CONF</code>
1288     point to the <code>$GODI/etc/findlib-ocamlduce.conf</code> file.
1289     </note>
1290    
1291 abate 1789 </section>
1292    
1293 abate 1802 <section title="String regular expressions">
1294    
1295     <p>
1296     OCamlDuce supports regular expression types and patterns, not only
1297     for sequences of XML elements, but also for strings. The following
1298     example shows how to use regular expressions to split a string
1299     of the form <code>name1=val1,...,namen=valn</code> with
1300     <code>n>0</code> into
1301     a list of pairs <code>[ (name1,val1); ...; (namen,valn) ]</code>.
1302     The <code>*?</code> operator in regular expressions means "non-greedy
1303     match" (match the shortest possible subsequence). The last
1304     pattern describes precisely strings which are not matched by
1305     the other cases. It would be possible to replace it with
1306     the wildcard <code>_</code>.
1307     </p>
1308    
1309     <sample><![CDATA[{{ON}}
1310     let rec split (s : {{ String }}) =
1311       match s with
1312       | {{ [ n::_*? '=' v::_*? ',' rest::_* ] }} -> (n,v)::(split rest)
1313       | {{ [ n::_*? '=' v::_*? ] }} -> [ (n,v) ]
1314       | {{ Any - [ _* '=' _* ] }} -> failwith "split"
1315     ]]></sample>
1316    
1317     </section>
1318    
1319 abate 1789 </box>
1320    
1321 abate 1808 <box title="Applications in OCamlDuce" link="appli">
1322    
1323     <ul>
1324     <li><a
1325     href="http://anil.recoil.org/projects/review2atom.html">Review2Atom</a>
1326     by Anil Madhavapeddy: translates paper review files in XML format into
1327     an Atom feed suitable for aggregation.
1328     </li>
1329     </ul>
1330    
1331     </box>
1332    
1333 abate 1832 </page>
1334 abate 1808
1335 abate 1634 </page>
