
Contents of /web/ocaml.xml



Revision 1937
Tue Jul 10 19:32:54 2007 UTC (5 years, 10 months ago) by abate
File MIME type: text/xml
File size: 43986 byte(s)
[r2007-01-23 08:17:03 by afrisch] Empty log message

Original author: afrisch
Date: 2007-01-23 08:17:03+00:00
1 abate 1634 <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
2     <page name="ocaml">
3    
4 abate 1787 <title>OCamlDuce</title>
5 abate 1634
6     <left>
7     <local-links href="index,documentation"/>
8 abate 1787 <p>On this page:</p>
9     <boxes-toc/>
10 abate 1634 </left>
11    
12     <box>
13    
14     <p>
15 abate 1787 OCamlDuce is a merger between <a
16     href="http://caml.inria.fr/">OCaml</a> and
17     <local href="index">CDuce</local>. It comes as a modified
18 abate 1832 version of OCaml which integrates CDuce features: XML expressions,
19     regular expression types and patterns, iterators.
20 abate 1634 </p>
21    
22 abate 1790 <p>
23 abate 1832 OCamlDuce is distributed under the Q Public License version 1.0.
24 abate 1790 </p>
25    
26 abate 1832 <ul>
27     <li>A <a
28 abate 1895 href="papers/ocamlduce_icfp.pdf">technical
29 abate 1832 report</a> describes the theory behind OCamlDuce's type system (to be
30 abate 1895 presented at ICFP 2006).</li>
31 abate 1832 <li><local href="ocaml_install">How to get OCamlDuce:</local> download,
32     installation instructions, packages.</li>
33     <li><local href="ocaml_manual">User's manual</local>.</li>
34     <li><local href="ocaml_code">Code samples and
35     applications</local>.</li>
36 abate 1925 <li><local href="contacts">Mailing lists</local>.</li>
37 abate 1832 </ul>
38 abate 1815
39 abate 1787 </box>
40    
41 abate 1832 <page name="ocaml_install">
42     <title>Getting OCamlDuce</title>
43    
44 abate 1787 <box title="Download and installation" link="install">
45    
46 abate 1634 <p>
47 abate 1800 Currently, OCamlDuce
48 abate 1937 is based on OCaml 3.09.3 and CDuce 0.4.0.
49 abate 1634 </p>
50    
51     <ul>
52     <li><a
53 abate 1918 href="http://gallium.inria.fr/~frisch/ocamlcduce/download/ocamlduce-3.08.4pl5.tar.gz">Compiler,
54 abate 1821 version 3.08.4, patch level 5</a> (to be used with OCaml 3.08.4)</li>
55     <li><a
56 abate 1918 href="http://gallium.inria.fr/~frisch/ocamlcduce/download/ocamlduce-3.09.1pl1.tar.gz">Compiler,
57     version 3.09.1, patch level 1</a> (to be used with OCaml 3.09.1)</li>
58 abate 1876 <li><a
59 abate 1918 href="http://gallium.inria.fr/~frisch/ocamlcduce/download/ocamlduce-3.09.2pl2.tar.gz">Compiler,
60     version 3.09.2, patch level 2</a> (to be used with OCaml 3.09.2)</li>
61 abate 1937 <li><a
62     href="http://gallium.inria.fr/~frisch/ocamlcduce/download/ocamlduce-3.09.3.tar.gz">Compiler
63     version 3.09.3</a> (to be used with OCaml 3.09.3)</li>
64 abate 1634 </ul>
65    
66     <p>
67 abate 1876 The following describes the installation procedure for the
68     3.09.2 release.
69     OCamlDuce is installed on top of an existing OCaml
70     installation (whose version number must match) and it requires
71     a recent version of findlib. The build procedure
72 abate 1821 is: <tt>make all &amp;&amp; make opt &amp;&amp; make
73     install</tt>. The configuration is taken from OCaml's
74     <tt>Makefile.config</tt>.
75 abate 1800 </p>
76    
77 abate 1821 <p>
78     The tools are named <tt>ocamlduce, ocamlducec, ocamlduceopt,
79 abate 1876 ocamlducedep, ocamlducemktop, ocamlducefind</tt>.
80     They are installed in the same directory as the OCaml compiler itself.
81 abate 1821 </p>
82 abate 1800
83 abate 1821 <p>
84 abate 1876 In addition, a library called <tt>ocamlduce.cma/.cmxa</tt> is built.
85     It depends on the <tt>nums</tt> library. A findlib package named
86     <tt>ocamlduce</tt> is created by the <tt>make install</tt> target.
87     Normally, you don't need to care about the package unless you
88     insist on linking your modules with the regular OCaml compilers (not
89     OCamlDuce), but there is no good reason to do so.
90 abate 1821 </p>
91 abate 1800
92 abate 1840 <p>
93     To generate the ocamldoc documentation for the <tt>Ocamlduce</tt>
94     module: <tt>make htdoc</tt>.
95     </p>
96    
97 abate 1876 <section title="Compiling, linking, calling the toplevel">
98    
99     <p>Starting from OCamlDuce 3.09.2, you don't need to struggle with
100     extra command-line options. You must simply use the OCamlDuce tools:</p>
101    
102     <sample>
103     {{Call the toplevel:}} ocamlduce
104     {{Compile:}} ocamlducec -c x.ml
105     {{Link:}} ocamlducec -o x x.cmo
106     {{Use ocamlfind:}} ocamlducefind ocamlc -o x -linkpkg -package pcre x.ml
107     </sample>
108    
109     </section>
110    
111    
112 abate 1823 <section title="Building from the CVS">
113    
114     <p>
115     The following commands will extract the current development version of
116     OCamlDuce (from OCaml and CDuce CVS repositories):
117     </p>
118    
119     <sample>
120     cvs -f -d ":pserver:anoncvs@camlcvs.inria.fr:/caml" co -r cducetrunk ocaml
121     cvs -f -d ":pserver:anonymous@cvs.cduce.org:/cvsroot" co cduce
122     (cd ocaml/cduce; make link)
123     </sample>
124    
125     </section>
126    
127 abate 1808 </box>
128    
129     <box title="Ports and packages" link="ports">
130    
131     <section title="GODI">
132 abate 1821
133 abate 1800 <p>
134 abate 1821 There is a <tt>godi-ocamlduce</tt> package available in GODI
135 abate 1877 (sections 3.08 and 3.09).
136 abate 1634 </p>
137 abate 1821
138 abate 1808 </section>
139 abate 1634
140 abate 1808 <section title="DarwinPorts and OpenBSD">
141    
142     <p>
143     Anil Madhavapeddy contributed two ports of OCamlDuce for DarwinPorts
144     (in dports/lang/ocamlduce) and for OpenBSD (in ports/lang/ocamlduce).
145     </p>
146    
147     </section>
148    
149 abate 1634 </box>
150    
151 abate 1832 </page>
152    
153     <page name="ocaml_manual">
154     <title>OCamlDuce: manual</title>
155    
156 abate 1787 <box title="Overview" link="overview">
157    
158     <p>
159 abate 1791 The goal of the OCamlDuce project is to extend the OCaml language with features
160     to make it easier to write safe and efficient complex applications
161     that need to deal with XML documents. In particular, it relies
162     on a notion of types and patterns to guarantee statically
163     that all the possible input documents are correctly processed, and
164     that only valid output documents are produced.
165     </p>
166    
167     <p>
168 abate 1788 In a nutshell, OCamlDuce extends OCaml with a new kind of values
169     (<em>x-values</em>) to represent XML documents, fragments, tags, Unicode
170     strings. In order to describe these values, it also extends the type algebra
171 abate 1787 with so-called <em>x-types</em>. The philosophy behind these types is that they
172     represent <em>sets of x-values</em>. They can be very precise: indeed,
173     each value can be seen as a singleton type (a set with a single
174     value), and it is possible to form Boolean combinations of x-types
175     (intersection, union, difference).
176     </p>
177    
178     <p>
179     OCamlDuce's type system can be understood as a refinement of OCaml.
180     For each sub-expression which is inferred to be of the x-kind (using
181     OCaml's unification-based type system), OCamlDuce will try to infer the
182     best possible sound x-type. Here, best means smallest for the natural
183     subtyping relation (set inclusion). The inference algorithm is
184     actually a data-flow analysis: the x-type will collect all the values
185     that can be produced by the expression, considering all the possible
186     data-flow in the program. It is sometimes necessary to provide
187     explicit type annotations to help the type checker infer this type, in
188     particular when you define recursive functions or when you use
189     iterators.
190     </p>
191    
192     <p>
193     Subtyping is implicit for x-types: if an expression is inferred to be
194     of x-type <code>t</code>, which is a subtype of <code>s</code>, then
195     it is possible to use this expression in any context which expects a
196     value of type <code>s</code>.
197     </p>
198    
199     </box>
200    
201     <box title="Getting started" link="start">
202    
203     <p>
204     Most of the new language features are enclosed within double curly braces
205     <code>{{ON}}{{...}}</code>. For instance, the following code sample
206     defines a value <code>x</code> as an XML element (with tag
207     <code>a</code>, an attribute <code>href</code>, and a simple
208     string as content):
209     </p>
210    
211     <sample><![CDATA[{{ON}}
212     # let x = {{ <a href="http://www.cduce.org">['CDuce'] }};;
213     val x : {{<a href=[ 'http://www.cduce.org' ]>[ 'CDuce' ]}} =
214     {{<a href="http://www.cduce.org">[ 'CDuce' ]}}
215     ]]></sample>
216    
217     <p>
218     What appears between the curly braces is called an x-expression.
219     Similarly, there are x-types (as seen above), and also x-patterns.
220     The delimiters <code>{{ON}}{{...}}</code> are only used
221     for syntactical reasons, to avoid clashes between OCaml and CDuce
222     syntaxes and lexical conventions. As a matter of fact,
223     an OCaml expression need not be a syntactical x-expression
224     (delimited by double curly braces) to evaluate to an x-value.
225     For instance, once <code>x</code> has been declared as above,
226     the expression <code>x</code> evaluates to an x-value.
227     </p>
228    
229    
230     <p>
231     It is possible to use an arbitrary
232     OCaml expression as part of an x-expression: it must simply be
233     protected by a new pair of double curly braces. For instance, there is
234     no <code>if-then-else</code> construction for x-expressions, but you
235     can write:
236     </p>
237    
238     <sample><![CDATA[{{ON}}
239     # {{ <a href={{if true then {{"a"}} else {{"z"}}}}>[] }};;
240     - : {{<a href=[ 'a' | 'z' ]>[ ]}} = {{<a href="a">[ ]}}
241     ]]></sample>
242    
243     <p>
244     Only the highlighted parts are parsed as x-expressions. The
245     <code>if-then-else</code> sub-expression is parsed as an OCaml
246     expression, but its type is an x-type (namely <code>{{ON}}{{[ 'a' |
247     'z' ]}}</code>).
248     </p>
249    
250     </box>
251    
252     <box title="X-values" link="values">
253    
254     <p>
255     X-values are intended to represent XML documents and fragments
256     thereof: elements, tags, text, sequences. In this section, we
257     present the x-value algebra, the syntax of the corresponding
258     x-expression constructors and the associated x-types.
259     </p>
260    
261     <p>
262     There are three kinds of atomic x-values:
263     </p>
264     <ul>
265     <li>Unicode characters;</li>
266     <li>qualified names;</li>
267     <li>arbitrarily large integers.</li>
268     </ul>
269    
270     <section title="Characters">
271    
272     <p>
273     X-characters are different from OCaml characters. They can represent
274     the range of Unicode codepoints defined in the XML specification.
275     Character literals are delimited by single quotes. The escape
276     sequences \n, \r, \t, \b, \', \&quot;, \\ are recognized as usual. The
277     numerical escape sequences are written <code>\n;</code> where n is an integer
278     literal (note the extra semi-colon). The source code is interpreted as
279     being encoded in iso-8859-1. As a consequence, Unicode characters which are not
280     part of the Latin1 character set must be introduced with this
281     numerical escape mechanism. The x-types for x-characters are:
282     </p>
283     <ul>
284     <li>singletons;</li>
285     <li>intervals, written <code>c -- d</code>, where <code>c</code> and
286     <code>d</code> are literals (example: <code>{{ON}}type t = {{ 'a'--'z'
287     }}</code>);</li>
288     <li>the type of all x-characters, written <code>Char</code>;</li>
289     <li>the type of all Latin1 characters, written <code>Latin1Char</code>
290     (defined as <code>\0; -- \255;</code>).</li>
291     </ul>
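
<p>
As an illustration of the character x-types listed above, here is a
minimal sketch. The names <code>lower</code>, <code>c</code> and
<code>lambda</code> are hypothetical, the escape is assumed to take a
decimal code point, and the phrases have not been checked against a
compiler.
</p>

<sample><![CDATA[{{ON}}
(* Interval type: the lowercase ASCII letters. *)
type lower = {{ 'a'--'z' }}

(* A character literal; its most precise x-type is a singleton. *)
let c = {{ 'x' }}

(* A character outside Latin1 must use a numerical escape.
   Assumption: 955 is read as the decimal code point of the
   Greek small letter lambda. Note the trailing semi-colon. *)
let lambda = {{ '\955;' }}
]]></sample>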
292    
293     </section>
294    
295     <section title="Integers">
296    
297     <p>
298     X-integers are arbitrarily large. Literals must be written in decimal.
299     Negative literals must be in parentheses. E.g.: <code>(-3)</code>.
300     The x-types for x-integers are:
301     </p>
302     <ul>
303     <li>singletons;</li>
304     <li>intervals, written <code>i -- j</code>, where <code>i</code> and
305     <code>j</code> are literals (example: <code>{{ON}}type t = {{ 10--20
306     }}</code>); it is possible to replace <code>i</code> or <code>j</code>
307     with <code>**</code> to define open-ended intervals, e.g.
308     <code>{{ON}}type pos = {{ 1 -- ** }}</code>;
309     </li>
310     <li>the type of all x-integers, written <code>Int</code>;</li>
311     <li>the type of all the integers which can be represented by a
312     signed 32 (resp. 64) bit machine word, written <code>Int32</code> (resp.
313     <code>Int64</code>).</li>
314     </ul>
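
<p>
The integer x-types above can be sketched as follows. The names are
hypothetical and the phrases have not been checked against a compiler.
</p>

<sample><![CDATA[{{ON}}
(* Closed interval. *)
type digit = {{ 0--9 }}

(* Open-ended interval: all strictly positive integers. *)
type pos = {{ 1 -- ** }}

(* X-integers are arbitrarily large, so this literal is not
   limited to a machine word. *)
let big = {{ 123456789012345678901234567890 }}

(* Negative literals are written in parentheses. *)
let minus_three = {{ (-3) }}
]]></sample>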
315    
316     </section>
317    
318     <section title="Qualified names">
319    
320     <p>
321     Qualified names are intended to represent XML tag names. Conceptually,
322     they are made of a namespace URI and a local name. Since URIs tend
323     to be long, literals are of the form <code>`prefix:local</code>
324     where <code>local</code> is the local name and <code>prefix</code>
325     is a <em>namespace prefix</em> bound to some URI (in the scope of the
326     literal). The local name follows the definitions from
327     the XML Namespaces specification; a dot character must be protected
328     by a backslash and non-Latin1 characters are written as character
329     literals <code>\n;</code>. <a href="#ns">See below</a> for an
330     explanation of how to bind prefixes to URIs. To refer
331     to the default namespace (or the absence of namespace if no default
332     has been defined), the syntax is simply <code>`local</code>.
333     The x-types for qualified names are:
334     </p>
335     <ul>
336     <li>singletons;</li>
337     <li>the type of all qualified names, written <code>Atom</code>;</li>
338     <li>the type of all qualified names from a specified namespace,
339     written <code>`ns:*</code>.</li>
340     </ul>
341     </section>
342    
343     <section title="Records">
344    
345     <p>
346     X-records are mainly used to represent the set of attributes of an XML
347     element. An x-record is a binding from a finite set of <em>labels</em>
348     to x-values. Labels follow the same syntax as for qualified names
349     without the leading backquote. However, if the namespace prefix is not
350     given, the default namespace does not apply (the namespace URI is
351     empty). The syntax for record x-expressions is <code> { l1=e1
352     ... ln=en }</code> where the <code>li</code> are labels and the
353     <code>ei</code> are x-expressions. Fields can also be separated with a
354     semi-colon. It is legal to omit the expression for a field; the label is then
355     taken as the content of the field (a value with this name must be
356     defined in the current scope), e.g.: <code>{{ON}}let x = ... and y = ...
357     in {{ {x y z=3} }}</code> is equivalent to <code>{{ON}}let x = ... and
358     y = ... in {{ {x=x y=y z=3} }}</code>. The types for x-records specify
359     which labels are authorized/mandatory, and what the types of the
360     corresponding fields are. There are two kinds of record x-types:
361     </p>
362    
363     <ul>
364     <li>
365     Closed record types, which only allow a finite number of fields:
366     <code>{ l1=t1 ... ln=tn }</code>;
367     </li>
368     <li>
369     Open record types, which allow additional fields (with arbitrary
370     type):
371     <code>{ l1=t1 ... ln=tn .. }</code> (the final two dots are
372     part of the syntax).
373     </li>
374     </ul>
375    
376     <p>
377     In both cases, it is possible to make one of
378     the fields optional by changing = to =?.
379     </p>
380    
381     <p>
382     The x-type of all x-records is thus <code>{ .. }</code>,
383     and the x-type of x-records with maybe a field <code>l</code>
384     of type <code>Int</code> and maybe arbitrary other fields is
385     <code>{ l=?Int .. }</code>.
386     </p>
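
<p>
A minimal sketch of record x-expressions and record x-types, using only
the syntax described above. The type names, value names and labels are
hypothetical, and the phrases have not been checked against a compiler.
</p>

<sample><![CDATA[{{ON}}
(* Record x-expression; fields are separated by spaces. *)
let r = {{ { x=1 y="abc" } }}

(* Closed record type: exactly the fields x and y. *)
type point = {{ { x=Int y=Int } }}

(* Open record type with a mandatory field x and an optional
   field name; any other field is allowed. *)
type named = {{ { x=Int name=?String .. } }}

(* Shorthand: a field without content takes the value of the
   identifier with the same name. *)
let x = {{ 1 }} and y = {{ 2 }}
let p : point = {{ {x y} }}
]]></sample>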
387    
388     </section>
389    
390     <section title="Sequences">
391    
392     <p>
393     X-sequences are finite and ordered collections of x-values.
394     The syntax for a sequence x-expression is
395     <code>[ e1 ... en ]</code> (note that elements are <em>not</em> separated
396     by semi-colons as in OCaml lists). Each item <code>ei</code>
397     can either be:
398     </p>
399     <ul>
400     <li>an x-expression;</li>
401     <li><code>!e</code> where <code>e</code> is an x-expression which
402     evaluates to a sequence (whose content is inserted in the sequence
403     which is currently defined); e.g.
404     <code>let x = [ 2 3 ] in [ 1 !x 4 ]</code> is equivalent to
405     <code>[ 1 2 3 4 ]</code>;</li>
406     <li>a string literal delimited by single quotes; e.g.
407     <code>[ 'abc' ]</code> is equivalent to <code>[ 'a' 'b' 'c' ]</code>.</li>
408     </ul>
409    
410     <p>
411     X-types for sequences are of the form <code>[R]</code>
412     where <code>R</code> is a regular expression over x-types which
413     describes the possible contents of the sequences. The possible
414     forms of regular expressions are:
415     </p>
416    
417     <ul>
418     <li><code>t</code> (one single element of x-type <code>t</code>)</li>
419     <li><code>R*</code> (zero or more repetitions)</li>
420     <li><code>R+</code> (one or more repetitions)</li>
421     <li><code>R?</code> (zero or one repetition)</li>
422     <li><code>R1 R2</code> (sequence)</li>
423     <li><code>R1|R2</code> (alternation)</li>
424     <li><code>(R)</code></li>
425     <li><code>/t</code> (guard: the tail of the sequence must comply with
426     <code>t</code>).</li>
427     <li><code>PCDATA</code> (equivalent to <code>Char*</code>).</li>
428     </ul>
429    
430     <note>Sequences are actually encoded with embedded pairs and a
431     terminator, and sequence types are encoded with product types and
432     recursive types. The encoding is available to the programmer
433     but not described in this manual.
434     </note>
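
<p>
The sequence constructors and regular expression types above can be
sketched as follows. All names are hypothetical and the phrases have
not been checked against a compiler.
</p>

<sample><![CDATA[{{ON}}
(* Plain sequence: items are separated by spaces. *)
let s = {{ [ 2 3 ] }}

(* Splicing with !: the content of s is inserted in place,
   giving the same value as [ 1 2 3 4 ]. *)
let t = {{ [ 1 !s 4 ] }}

(* A string literal inside a sequence stands for its characters. *)
let u = {{ [ 'abc' 1 ] }}

(* Sequence types are regular expressions over x-types. *)
type ints = {{ [ Int* ] }}
type row = {{ [ Int Int? PCDATA ] }}
]]></sample>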
435    
436     </section>
437    
438     <section title="Strings">
439    
440     <p>
441     Strings are nothing but sequences of characters. There are two
442     predefined types <code>String</code> and <code>Latin1</code>
443     (defined as <code>[ Char* ]</code> and <code>[ Latin1Char* ]</code>).
444     </p>
445    
446     <p>
447     A string literal <code>[ '...' ]</code> can also be written
448     <code>"..." </code> (without the square brackets). Note that single
449     (resp. double) quotes need to be escaped only when the string is
450     delimited with single (resp. double) quotes.
451     </p>
452    
453     </section>
454    
455     <section title="XML elements">
456    
457     <p>
458     An XML element is a triple of x-values. The syntax for
459     the corresponding x-expression constructor is
460     <code><![CDATA[<(e1) (e2)>e3]]></code>. When <code>e1</code> is a
461     qualified name literal, it is possible to omit the leading
462     backquote and the surrounding parentheses. Similarly,
463     when <code>e2</code> is an x-record literal, it is possible
464     to omit the curly braces and the parentheses. For instance,
465     one can simply write <code><![CDATA[<a href="abc">['def']]]></code>
466     instead of <code><![CDATA[<(`a) ({href="abc"})>['def']]]></code>.
467     </p>
468    
469     <p>
470     XML element x-types are written <code><![CDATA[<(t1) (t2)>t3]]></code>,
471     and the same simplifications apply. For instance, if
472     the namespace prefix <code>ns</code> has been defined,
473     the following is a legal x-type <code><![CDATA[<ns:* ..>[]]]></code>;
474     it describes XML elements whose tag is in the namespace bound to
475     <code>ns</code>, with an empty content, and with an arbitrary set of
476     attributes. An underscore in place of <code>(t1)</code> is
477     equivalent to <code>(Atom)</code> (any tag).
478     </p>
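
<p>
As a sketch of how element x-types and element x-expressions fit
together, the following declares two types and builds values of these
types. The names <code>link</code>, <code>para</code>, <code>l</code>
and <code>doc</code> are hypothetical, and the phrases have not been
checked against a compiler.
</p>

<sample><![CDATA[{{ON}}
(* An a element with a single href attribute and text content. *)
type link = {{ <a href=String>[ PCDATA ] }}

(* A p element, without attributes, mixing text and links. *)
type para = {{ <p>[ (Char | link)* ] }}

let l : link = {{ <a href="http://www.cduce.org">['CDuce'] }}

(* The OCaml identifier l is used directly inside the sequence. *)
let doc : para = {{ <p>[ 'See ' l '.' ] }}
]]></sample>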
479    
480     </section>
481    
482     </box>
483    
484     <box title="X-expressions" link="expr">
485    
486     <p>
487     In the previous section, we have seen the syntax for x-values
488     constructors (constant literals, sequence, record, element constructors).
489     In this section, we describe the other kinds of x-expressions.
490     </p>
491    
492     <section title="Binary infix operators">
493    
494     <p>
495     The arithmetic operators on integers follow the usual precedence.
496     They are written <code>+,*,-,div,mod</code> (they are all infix).
497     </p>
498    
499     <p>
500     Record concatenation: <code>e1 ++ e2</code>. The x-expressions
501     <code>e1</code> and <code>e2</code> must evaluate to x-records.
502     The result is obtained by concatenating them. If a field with the same
503     label is present in both records, the right-most one is selected.
504     </p>
505    
506     <p>
507     Sequence concatenation: <code>e1 @ e2</code>, equivalent
508     to <code>[!e1 !e2]</code>.
509     </p>
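
<p>
The three families of infix operators can be sketched as follows. The
names are hypothetical and the phrases have not been checked against a
compiler.
</p>

<sample><![CDATA[{{ON}}
(* Usual precedence: parsed as 1 + (2 * 3). *)
let n = {{ 1 + 2 * 3 }}
let q = {{ 17 div 5 }} and m = {{ 17 mod 5 }}

(* Record concatenation: the field y is taken from the
   right-hand record. *)
let r = {{ {x=1 y=2} ++ {y=3 z=4} }}

(* Sequence concatenation. *)
let s = {{ [ 1 2 ] @ [ 3 ] }}
]]></sample>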
510    
511     </section>
512    
513     <section title="Projections, filtering">
514    
515     <p>
516     If the x-expression <code>e</code> evaluates to a record or an XML
517     element, the construction <code>e.l</code> will extract the value of
518     field or attribute <code>l</code>. Similarly, the construction
519     <code>e.?l</code> will extract the value of field or attribute
520     <code>l</code> if present, and return the empty sequence
521     <code>[]</code> otherwise.
522     </p>
523    
524     <p>
525     If the x-expression <code>e</code> evaluates to a record,
526     the construction <code>e -. l</code> will produce a new record
527     where the field <code>l</code> has been removed (if present).
528     </p>
529    
530     <p>
531     If the x-expression <code>e</code> evaluates to an x-sequence,
532     the construction <code>e/</code> will result in a new x-sequence
533     obtained by taking in order all the children of the XML elements
534     from the sequence <code>e</code>. For instance, the x-expression
535     <code><![CDATA[[<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ]/]]></code>
536     evaluates to the x-value <code>[ 1 2 3 6 7 8 ]</code>.
537     </p>
538    
539     <p>
540     If the x-expression <code>e</code> evaluates to an x-sequence,
541     the construction <code>e.(t)</code> (where <code>t</code> is an
542     x-type) will result in a new x-sequence
543     obtained by filtering <code>e</code> to keep only the elements
544     of type <code>t</code>. For instance, the x-expression
545     <code><![CDATA[[<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ].(Int)]]></code>
546     evaluates to the x-value <code>[ 4 5 ]</code>.
547     </p>
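
<p>
Field projection and field removal can be sketched in the same way. The
names are hypothetical and the phrases have not been checked against a
compiler.
</p>

<sample><![CDATA[{{ON}}
let x = {{ <a href="http://www.cduce.org">['CDuce'] }}

(* Attribute extraction on an XML element. *)
let h = {{ x.href }}

let r = {{ { a=1 b=2 } }}

(* Optional projection: the field c is absent, so the result
   is the empty sequence. *)
let missing = {{ r.?c }}

(* A new record without the field a. *)
let r2 = {{ r -. a }}
]]></sample>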
548     </section>
549    
550     <section title="Dynamic type checking">
551    
552     <p>
553     If <code>e</code> is an x-expression and <code>t</code> is an x-type,
554     the construction <code>(e :? t)</code> returns the same
555     result as <code>e</code> if it has type <code>t</code>, and otherwise
556     raises a <code>Failure</code> exception whose argument explains
557     why this is not the case.
558     </p>
559    
560     <sample><![CDATA[{{ON}}
561     # let f (x : {{ Any }}) = {{ (x :? <a>[ Int* ] ) }} in
562     f {{ <a>[ 1 2 '3' ] }};;
563     Exception:
564     Failure
565     "Value <a>[ 1 2 '3' ] does not match type <a>[ Int* ]\nValue '3' does not match type Int\n".
566     ]]></sample>
567     </section>
568    
569     <section title="Pattern matching">
570    
571     <p>
572     OCamlDuce comes with a powerful pattern matching operation.
573     X-patterns are described <a href="#patterns">below</a>.
574     The syntax for the pattern matching operation is:
575     <code>match e with p1 -> e1 | ... | pn -> en</code>.
576     The type system ensures exhaustivity for the pattern matching
577     and infers precise types for the capture variables.
578     It is also possible to use x-pattern matching as a regular
579     OCaml expression; x-patterns must be surrounded by <code>{{..}}</code>, e.g.:
580     <code>match e with {{p1}} -> e1 | ... | {{pn}} -> en</code> or
581     <code>function {{p1}} -> e1 | ... | {{pn}} -> en</code>.
582     </p>
583    
584 abate 1792 <p>
585     Pattern matching follows a first-match policy. The first pattern
586     that succeeds triggers the corresponding branch.
587     </p>
588    
589 abate 1787 <note>
590     currently it is impossible to mix normal OCaml patterns and x-patterns
591     in a single pattern matching.
592     </note>
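
<p>
Here is a minimal sketch of x-pattern matching used as a regular OCaml
expression. The function name <code>describe</code> is hypothetical and
the phrase has not been checked against a compiler. All the patterns
are x-patterns, and the first-match policy means that the value
<code>0</code> is handled by the first branch even though it also has
type <code>Int</code>.
</p>

<sample><![CDATA[{{ON}}
let describe (v : {{ Int | String }}) : string =
  match v with
    {{ 0 }} -> "zero"
  | {{ Int }} -> "another integer"
  | {{ String }} -> "a string"
]]></sample>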
593    
594     </section>
595    
596     <section title="Local binding">
597    
598     <p>
599     The x-expression <code>let p=e1 in e2</code> is equivalent to
600     <code>match e1 with p -> e2</code>. There is also a local binding
601     with an x-pattern in OCaml expressions: <code>let {{p}}=e1 in
602     e2</code>.
603     </p>
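
<p>
For instance, a local binding with an x-pattern can be used to
deconstruct an element. This is a sketch only: the names
<code>x</code>, <code>h</code> and <code>target</code> are
hypothetical, and the phrases have not been checked against a compiler.
</p>

<sample><![CDATA[{{ON}}
let x = {{ <a href="http://www.cduce.org">['CDuce'] }}

(* h is a capture variable bound to the content of the href
   attribute; the content of the element is ignored. *)
let target = let {{ <a href=h>_ }} = x in h
]]></sample>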
604    
605     </section>
606    
607    
608     <section title="Iterators">
609    
610     <p>
611     OCamlDuce comes with a sequence iterator
612     <code>map e with p1 -> e1 | ... | pn -> en</code> and
613     a tree iterator
614     <code>map* e with p1 -> e1 | ... | pn -> en</code>.
615     </p>
616    
617     <p>
618     For both constructions, the argument must evaluate to a sequence.
619     The <code>map</code> iterator applies the patterns to each element
620     of this sequence in turn and produces a new sequence by concatenating
621     all the results (all the right-hand sides must thus produce a
622     sequence). The set of patterns must be exhaustive for all the possible
623     elements of the input sequence.
624     </p>
625    
626     <p>
627     The tree iterator is similar except that the patterns need not be
628     exhaustive. If some element of the input sequence is not matched,
629     it is simply copied into the result unless it is an XML element. In
630     this case, the transformation is applied recursively to its content.
631     </p>
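
<p>
The two iterators can be sketched as follows. The function names are
hypothetical, the explicit annotations are an assumption about what the
type checker needs for iterators, and the phrases have not been checked
against a compiler.
</p>

<sample><![CDATA[{{ON}}
(* Sequence iterator: each right-hand side returns a sequence,
   here of length one, and the results are concatenated. *)
let double (l : {{ [ Int* ] }}) : {{ [ Int* ] }} =
  {{ map l with n -> [ (n * 2) ] }}

(* Tree iterator: b elements, with any attributes, are dropped.
   Unmatched values are copied, and the transformation is applied
   recursively to the content of unmatched elements. *)
let drop_b (d : {{ [ Any* ] }}) : {{ [ Any* ] }} =
  {{ map* d with <b ..>_ -> [] }}
]]></sample>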
632    
633     </section>
634    
635     <section title="OCaml constructions">
636    
637     <p>
638     As a convenience, some of the OCaml expression constructors
639     are allowed as x-expressions (without a need to go back to OCaml
640 abate 1894 with double curly braces): (unqualified) value identifiers <b>without
641     apostrophes</b> and
642 abate 1787 function calls.
643     </p>
644    
645     </section>
646    
647     </box>
648    
649     <box title="More on x-types" link="types">
650    
651     <p>
652     We have seen how to write simple x-types. We can then combine
653     them with Boolean connectives:
654     </p>
655    
656     <ul>
657     <li><code>t1 &amp; t2</code>: intersection;</li>
658     <li><code>t1 | t2</code>: union;</li>
659     <li><code>t1 - t2</code>: difference.</li>
660     </ul>
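
<p>
As a sketch, the connectives can be applied to named x-types. The type
names are hypothetical and the phrases have not been checked against a
compiler.
</p>

<sample><![CDATA[{{ON}}
type pos = {{ 1 -- ** }}
type upto9 = {{ ** -- 9 }}

(* Intersection: the integers from 1 to 9. *)
type small_pos = {{ pos & upto9 }}

(* Union. *)
type int_or_string = {{ Int | String }}

(* Difference: all the integers except 0. *)
type nonzero = {{ Int - 0 }}
]]></sample>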
661    
662     <p>
663     The empty x-type is written <code>Empty</code> (it contains no value),
664     and the universal x-type is written <code>Any</code> (it contains
665     all the x-values) or <code>_</code>.
666     </p>
667    
668     <p>
669     When an x-type has been bound to some OCaml identifier
670     (<code>{{ON}}type t = {{...}}</code>), it is possible to use
671     this identifier in another x-type. Recursive definitions
672     are allowed:
673     </p>
674    
675     <sample><![CDATA[{{ON}}
676     type t1 = {{ <a>[ t2* ] }}
677     and t2 = {{ <b>[ t1* ] }}
678     ]]></sample>
679    
680     <p>
681     Note that x-values are always finite and acyclic. The type checker
682     detects type definitions which would yield empty types:
683     </p>
684    
685     <sample><![CDATA[{{ON}}
686     # type t = {{ <a>[ t+ ] }};;
687     This definition yields an empty type
688     ]]></sample>
689    
690     <p>
691     If <code>t1</code> and <code>t2</code> are record x-types,
692     we can combine them with the infix <code>++</code> operator, which
693     mimics the corresponding operator on expressions (record
694     concatenation). Similarly, we can use the infix <code>@</code>
695     concatenation operator on sequence x-types.
696     </p>
697    
698     </box>
699    
700     <box title="X-patterns" link="patterns">
701    
702     <p>
703     X-patterns follow the same syntax as X-types. In particular,
704     any X-type is a valid X-pattern. In addition to X-type constructors,
705     X-patterns can have:
706     </p>
707    
708     <ul>
709 abate 1894 <li>capture variables (lowercase OCaml identifiers <b>without apostrophes</b>);</li>
710 abate 1787 <li>constant bindings <code>(x := c)</code> where x is a capture
711     variable and c is
712     a literal x-constant (this pattern always succeeds and returns the
713     binding x->c).</li>
714     </ul>
715    
716     <p>
717 abate 1894 An identifier in an X-pattern can be either a reference
718     to a named X-type (if such a type declaration is in scope)
719     or a capture variable (otherwise).
720     </p>
721    
722     <p>
723 abate 1792 Here is a brief description of the semantics of patterns. Given
724     an input value, a pattern can either succeed or fail. If it succeeds,
725     it also produces a binding from the capture variables in the pattern
726     to x-values.
727     </p>
728 abate 1787
729 abate 1792 <ul>
730    
731     <li>A pattern which is just a type (no capture variable) succeeds if
732     and only if the value has the type.</li>
733    
734     <li>A pattern <code>p1 | p2</code> succeeds if either <code>p1</code>
735     or <code>p2</code> succeeds, and returns the corresponding binding; if
736     both patterns succeed, <code>p1</code> wins. It is required that
737     <code>p1</code> and <code>p2</code> have the same sets of capture
738     variables. </li>
739    
740     <li>A pattern <code>p1 &amp; p2</code> succeeds if both <code>p1</code>
741     and <code>p2</code> succeed, and returns the concatenation of the two
742     bindings. It is required that <code>p1</code> and <code>p2</code> have
743     <em>disjoint</em> sets of capture variables. </li>
744    
745     </ul>
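
<p>
The rules above can be combined to give a default value to a capture
variable. In this sketch, both sides of the alternation bind the same
variable <code>n</code>, as required. The function name
<code>to_int</code> is hypothetical and the phrase has not been checked
against a compiler.
</p>

<sample><![CDATA[{{ON}}
(* If v is an integer, the left-hand pattern succeeds and binds
   n to v; otherwise the constant binding sets n to 0. *)
let to_int (v : {{ Int | String }}) : {{ Int }} =
  match v with {{ (n & Int) | (n := 0) }} -> n
]]></sample>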
746    
747     <p>
748     In record x-patterns, it is possible to omit the <code>=p</code> part
749     of a field. The content is then replaced with the label name
750 abate 1806 considered as a capture variable (or as a previously defined type).
751     E.g. <code>{ x y=p }</code> is
752 abate 1792 equivalent to <code>{ x=x y=p }</code>.</p>
753    
754 abate 1787 <p>It is also possible to add an "else" clause:
755     <code>{ x = (a,_)|(a:=3) }</code>
756     will accept any record with at most the field <code>x</code>. If the content
757     is a pair, the capture variable <code>a</code> will be bound to its first component;
758     otherwise, it is set to <code>3</code>.</p>
759    
760     <p>
761     In regular expressions, it is possible to extract whole subsequences
762     with the notation <code>x::R</code>, e.g.: <code>[ _* x::Int+ _* ]</code>
763     </p>
764    
765     <p>
766     If the same sequence capture variable appears several times (or below a
767     repetition) in a regexp, it is bound to the concatenation of all
768     matched subsequences. E.g.: <code>[ (x::Int | _)* ]</code> will
769     collect in <code>x</code> all the elements of type <code>Int</code> from
770 abate 1792 a sequence. It is not legal to have repeated simple capture variables.
771     </p>
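
<p>
For instance, the collecting pattern above can be wrapped in a
function. The name <code>ints</code> and the annotation on the argument
are hypothetical, and the phrase has not been checked against a
compiler.
</p>

<sample><![CDATA[{{ON}}
(* x appears below a repetition, so it is bound to the
   concatenation of all the matched integers, in order. *)
let ints (s : {{ [ Any* ] }}) =
  match s with {{ [ (x::Int | _)* ] }} -> x
]]></sample>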
772 abate 1787
773     <p>
774 abate 1788 The regexp operators <code>+,*,?</code> are greedy by default (they match as long
775     as possible). They admit non-greedy variants <code>+?,*?,??</code>.
776 abate 1787 </p>
777     </box>
778    
779     <box title="Namespace bindings" link="ns">
780    
781     <p>
782     The binding of namespace prefixes to URIs
783     can be done either by toplevel phrases (structure items) or
784     by local declarations:
785     </p>
786    
787     <sample>{{ON}}
788     # {{ namespace ns = "http://..." }};;
789     # let x = {{ `ns: x }};;
790     val x : {{`ns:x}} = {{`ns:x}}
791     # let x = {{ let namespace ns = "http://..." in `ns:x }};;
792     val x : {{`ns:x}} = {{`ns:x}}
793     </sample>
794    
795     <p>The toplevel definitions can also appear in module interfaces
796     (signatures). A toplevel prefix binding is not exported by a module: its scope
797     is limited to the current structure or signature. It is possible
798     to specify a default namespace, and to reset it:
799     </p>
800    
801     <sample>{{ON}}
802     # {{ namespace "http://..." }};;
803     # {{ `x }};;
804     - : {{`ns1:x}} = {{`ns1:x}}
805     # {{ namespace "" }};;
806     # {{ `x }};;
807     - : {{`x}} = {{`x}}
808     </sample>
809    
810     <p>
811     Note that the value pretty-printer invented some prefix
812     for the namespace URI. The default prefix declaration also has a
813     local form <code> let namespace "..." in ... </code>.
814     </p>
815    
816     </box>
817    
818 abate 1788 <box title="More on type-checking" link="typecheck">
819 abate 1787
820 abate 1788 <section title="Type inference">
821    
822 abate 1787 <p>
823 abate 1788 As we said above, the programmer is sometimes required to provide type
824     annotations. To know where to put these annotations, it is necessary to
825     have a basic understanding of how type-checking works.
826 abate 1787 </p>
827    
828 abate 1788 <p>
829     The OCaml type-checker is run first to detect which sub-expressions
830     are of the x-kind. A second ML type-checking pass is then done to
831     introduce subsumption (implicit subtyping) steps where allowed. After
832     these two passes, the OCamlDuce type checker obtains a data-flow summary of
833     x-values in the whole compilation unit. This is a directed graph,
834     whose edges represent either simple data-flow or complex operations
835     on x-values. The nodes of the graph can be thought of as x-type
836     variables. A data-flow edge corresponds to a subtyping constraint,
837     and an operation edge corresponds to a symbolic constraint which
838     mimics the corresponding operation on values.
839     </p>
840    
841     <p>
842     Some of the nodes are given an explicit type by the programmer,
843     through type annotations (on expressions or function arguments)
844     or the other usual mechanisms in ML (data type declarations,
845     signatures, ...).
846     </p>
847    
848     <p>
849     Also, if there is a loop with only subtyping edges in the graph,
850     all the nodes on the loop are merged together.
851     </p>
852    
853     <p>
854     After this operation, the graph is required to be acyclic (assuming
855     that the nodes with an explicit type are removed from the graph). It
856     is the responsibility of the programmer to provide enough type
857     annotations to achieve this property. Otherwise, a type error
858     is issued.
859     </p>
860    
861     <sample><![CDATA[{{ON}}
862     # let rec f x = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
863     Cycle detected: cannot type-check
864     # let rec f x : {{ String }} = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
865     val f : int -> {{String}} = <fun>]]>
866     </sample>
867    
868     <p>
869     In the example above, there is a cycle between the result type for
870     <code>f</code> and the type for the sub-expression <code>{{ON}}f
871     {{n-1}}</code>. It is here broken with a type annotation on the result; it could
872     have been broken by a type annotation on the expression <code>{{ON}}f
873     {{n-1}}</code>, or on the function <code>f</code> itself, or by a
874     module signature.
875     </p>
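<p>
The acyclicity condition can be pictured with a toy model written in plain
OCaml. This is only a sketch under stated assumptions, not the actual
OCamlDuce implementation: nodes are named by strings, the names
<code>has_cycle</code> and <code>f_edges</code> are hypothetical, the three
edges are a hand-made reading of the recursive function above, and the
merging of subtyping-only loops is left out.
</p>

<sample><![CDATA[
```ocaml
module S = Set.Make (String)

(* Toy model: nodes are named by strings, [edges] are (source, target)
   pairs, and [annotated] lists the nodes given an explicit type. *)
let has_cycle ~edges ~annotated =
  (* Nodes with an explicit type are removed before looking for cycles. *)
  let keep n = not (List.mem n annotated) in
  let edges = List.filter (fun (a, b) -> keep a && keep b) edges in
  let succs n = List.map snd (List.filter (fun (a, _) -> a = n) edges) in
  (* Depth-first search: [path] holds the nodes on the current path,
     [finished] the nodes already explored without finding a cycle. *)
  let rec visit path finished n =
    if S.mem n path then (true, finished)
    else if S.mem n finished then (false, finished)
    else
      let path = S.add n path in
      let found, finished =
        List.fold_left
          (fun (found, fin) m ->
            if found then (true, fin) else visit path fin m)
          (false, finished) (succs n)
      in
      (found, S.add n finished)
  in
  let nodes = List.concat (List.map (fun (a, b) -> [ a; b ]) edges) in
  fst
    (List.fold_left
       (fun (found, fin) n ->
         if found then (true, fin) else visit S.empty fin n)
       (false, S.empty) nodes)

(* Hypothetical three-node graph for the recursive function f. *)
let f_edges =
  [ ("result of f", "f (n-1)");
    ("f (n-1)", "f (n-1) @ ['.']");
    ("f (n-1) @ ['.']", "result of f") ]
```
]]></sample>

<p>
In this model, <code>has_cycle ~edges:f_edges ~annotated:[]</code> reports a
cycle, while annotating either the result of <code>f</code> or the
sub-expression removes one node from the loop and makes the graph acyclic.
</p>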
876    
877     <p>
878     Let us study another simple example:
879     </p>
880    
881     <sample>{{ON}}
882     # let f x = {{ x + 1 }} in f {{ 2 }}, f {{ 3 }};;
883     - : {{3--4}} * {{3--4}} = ({{3}}, {{4}})
884     </sample>
885    
886     <p>
887     The type-checker detects that the two x-values <code>2</code> and
888     <code>3</code> can flow to the argument of <code>f</code>. Its body
889     is thus type-checked with the assumption that <code>x</code> has type
890     <code>2--3</code>. The computed result type is then <code>3--4</code>.
891     </p>
892    
893    
894     <p>
895     The type-inference process described above is global by nature. The
896     acyclicity condition is only imposed after a whole compilation unit
897     has been type-checked by OCaml (and the information from the module
898     interface has been integrated). When a type variable is inferred to
899     be of the x-kind, it is never generalized. As a consequence, there
900     is no parametric polymorphism on x-types.
901     </p>
902    
903     <p>
904     In the toplevel, type-checking is done after each phrase. Consider
905     the following session:
906     </p>
907    
908     <sample><![CDATA[{{ON}}
909     # let f x = {{ x + 1 }};;
910     val f : {{Empty}} -> {{Empty}} = <fun>
911     # let a = f {{ 2 }};;
912     Subtyping failed 2 <= Empty
913     Sample:
914     2
915     ]]></sample>
916    
917     <p>
918     The function <code>f</code> is inferred to have type
919     <code>{{ON}}{{Empty}} -> {{Empty}}</code> because when the first
920     phrase is type-checked, the data-flow graph says that no value
921     can flow to <code>x</code>, and thus the input type is empty
922     (and similarly for the result type). If the two phrases
923     were type-checked together (which would be the case if they had
924     been compiled by the compiler, not in the toplevel), the type checker
925     would have correctly inferred that the input type for <code>f</code>
926     must contain <code>2</code>.
927     </p>
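<p>
The two sessions above can be mimicked with a small model in plain OCaml.
It is a sketch only: the names <code>itype</code>, <code>join</code>,
<code>succ_type</code> and <code>input_type</code> are hypothetical, and
integer x-types are approximated by a single interval (a real union such as
<code>2|5</code> would be over-approximated as <code>2--5</code>).
</p>

<sample><![CDATA[
```ocaml
(* None models the empty type; Some (lo, hi) models the type lo--hi. *)
type itype = (int * int) option

(* Union of two types, approximated by the smallest enclosing interval. *)
let join (a : itype) (b : itype) : itype =
  match a, b with
  | None, t | t, None -> t
  | Some (l1, h1), Some (l2, h2) -> Some (min l1 l2, max h1 h2)

(* Type of a literal x-integer. *)
let const (n : int) : itype = Some (n, n)

(* Result type of [x + 1] when x has type [t]. *)
let succ_type (t : itype) : itype =
  match t with
  | None -> None
  | Some (l, h) -> Some (l + 1, h + 1)

(* Input type of a function: union of all values flowing to its argument. *)
let input_type (call_sites : int list) : itype =
  List.fold_left (fun acc n -> join acc (const n)) None call_sites
```
]]></sample>

<p>
With the call sites <code>[2; 3]</code> the model gives the input type
<code>Some (2, 3)</code> and the result type <code>Some (3, 4)</code>; with
no call site at all, as in the first toplevel phrase, both are
<code>None</code>, the model's counterpart of <code>Empty</code>.
</p>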
928    
929     </section>
930    
931     <section title="Implicit subtyping">
932    
933     <p>
934     Coercion from an x-type to a super type is automatic in OCamlDuce.
935     However, this automatic subsumption does not carry over to OCaml
936     type constructors, even if they are covariant. Consider:
937     </p>
938    
939     <sample><![CDATA[{{ON}}
940     # let f (x : {{ Int }} * {{ Int }}) = 1;;
941     val f : {{Int}} * {{Int}} -> int = <fun>
942     # let g (x : {{ 0 }} * {{ 0 }}) = f x;;
943     This expression has type {{0}} * {{0}} but is here used with type
944     {{Int}} * {{Int}}
945     # let g (x : {{ 0 }} * {{ 0 }}) = let a,b = x in f (a,b);;
946     val g : {{0}} * {{0}} -> int = <fun>
947     # let g (x : {{ 0 }} * {{ 0 }}) = f (x :> {{ Int }} * {{ Int }});;
948     val g : {{0}} * {{0}} -> int = <fun>
949     ]]></sample>
950    
951     <p>
952     The first attempt to define <code>g</code> fails because the type for
953     <code>x</code> is not an x-type and thus subsumption does not
954     apply. In the second attempt, we extract the two components of the
955     pair; since they are inferred to be x-values, subtyping applies to
956     both of them. Thus, when the pair <code>(a,b)</code> is reconstructed,
957     it is legal to unify its type with the input type of <code>f</code>.
958     The third definition for <code>g</code> gives an alternative solution:
959     using explicit OCaml type coercions.
960     </p>
961    
962     </section>
963    
964 abate 1787 </box>
965    
966 abate 1788 <box title="Exchanging values" link="transl">
967    
968     <p>
969     OCamlDuce strongly separates regular OCaml values from the new
970     x-values. They have different syntax, expressions, types, patterns,
971     and even type-checking algorithms. This strong segregation is the key point
972     that allows a simple integration between very different type
973     systems.
974     </p>
975    
976     <p>
977     At some point, it is still necessary to cross the frontier and
978     translate OCaml values to x-values or the opposite.
979     </p>
980    
981     <p>
982     Fortunately, OCamlDuce provides automatic translations in both
983     directions. Instead of double curly braces, you can
984     enclose x-expressions in curly brace+colon <code>{: ... :}</code>
985     (here, the <code>...</code> is an x-expression).
986     The effect is to translate the result of the x-expression
987     (which must be an x-value) to an OCaml value. Similarly,
988     in an x-expression, you can obtain the x-translation of
989     an OCaml value with the same syntax <code>{: ... :}</code>
990     (here, the <code>...</code> is an OCaml expression).
991     </p>
992    
993     <p>
994     Here is how the translation works. To each OCaml type <code>t</code>,
995     we associate an x-type <code>T(t)</code> and a pair of translation
996     functions between <code>t</code> and <code>T(t)</code>.
997     Actually, not all the features are supported. For instance,
998     free type variables, abstract types, object types, non-regular
999     recursive types cannot be translated. In particular, since
1000     type variables are not allowed, the OCaml type must be fully known.
1001     </p>
1002    
1003     <p>
1004 abate 1789 The translation for an OCaml type <code>t</code> is defined by structural
1005     induction on <code>t</code>. Sum types are
1006 abate 1788 translated to union types: a constant constructor <code>A</code> is
1007     translated to the qualified name <code>`A</code>; a non-constant
1008     constructor <code>A of t1 * ... * tn</code> is translated to
1009     <code>&lt;A>[ T(t1) ... T(tn) ]</code>. Closed polymorphic variants
1010     have the same translation. Record types are translated to closed
1011     record x-types. Some other translations:
1012     </p>
1013    
1014     <table border="1">
1015     <tr><th>Caml type t</th> <th>X-type T(t)</th></tr>
1016     <tr><td><code>int</code></td> <td><code>Int</code></td></tr>
1017     <tr><td><code>int32</code></td> <td><code>Int32</code></td></tr>
1018     <tr><td><code>int64</code></td> <td><code>Int64</code></td></tr>
1019     <tr><td><code>string</code></td> <td><code>Latin1</code></td></tr>
1020     <tr><td><code>t list</code></td> <td><code>[T(t)*]</code></td></tr>
1021     <tr><td><code>t array</code></td> <td><code>[T(t)*]</code></td></tr>
1022     <tr><td><code>unit</code></td> <td><code>[]</code></td></tr>
1023     <tr><td><code>char</code></td> <td><code>Latin1Char</code></td></tr>
1024     <tr><td><code>{{t}}</code></td> <td><code>t</code></td></tr>
1025     </table>
1026    
1027     <p>
1028     Here is an example:
1029     </p>
1030    
1031     <sample>{{ON}}
1032     # let f (x : {{ Int }}) = {{ x + 1 }} in List.map f {: [ 1 2 3 ] :};;
1033     - : {{Int}} list = [{{2}}; {{3}}; {{4}}]
1034     </sample>
1035    
1036     <p>
1037     In this example, the result type of the translation is inferred
1038     to be <code>{{ON}}{{ Int }} list</code> (because the type for
1039     <code>f</code> is given). The corresponding x-type
1040     is <code>{{ON}}{{ [Int*] }}</code>.
1041     </p>
1042    
1043     </box>
1044    
1045 abate 1789 <box title="The standard library" link="stdlib">
1046    
1047     <p>
1048     In OCamlDuce, the Num library from OCaml is included in the standard
1049     library. In addition, there are two new modules called
1050     <code>Ocamlduce</code> and <code>Cduce_types</code> in the standard library.
1051     </p>
1052    
1053     <p>
1054     The module <code>Cduce_types</code> gives access to the internal
1055     representation of x-values. It is currently undocumented.
1056     </p>
1057    
1058     <p>
1059     The module <code>Ocamlduce</code> provides several useful
1060     functions on x-values. See the <a href="http://yquem.inria.fr/~frisch/ocamlcduce/doc/ocamlduce/Ocamlduce.html">ocamldoc</a>-generated
1061     documentation for a description of its interface.
1062     </p>
1063    
1064     </box>
1065    
1066 abate 1792 <box title="Marshaling" link="marshal">
1067    
1068     <p>
1069     OCamlDuce uses some tricks on its internal representation of x-values
1070     to reduce memory usage and improve performance. You need to pay
1071 abate 1793 special attention if you want to use OCaml serialization functions
1072 abate 1792 (module <code>Marshal</code>, functions
1073     <code>input_value/output_value</code>) on x-values. In addition to
1074     your values, you also need to save and restore some internal data
1075     using the functions <code>Cduce_types.Value.extract_all</code> and
1076     <code>Cduce_types.Value.intract_all</code>. Of course, this also
1077     applies if the value to be serialized contains deeply nested x-values.
1078     </p>
1079    
1080     <p>
1081     Here are generic
1082     serialization/deserialization functions that illustrate how to do it:
1083     </p>
1084    
1085     <sample>
1086     let my_output_value oc v =
1087       let p = Cduce_types.Value.extract_all () in
1088       output_value oc (p,v)
1089    
1090     let my_input_value ic =
1091       let (p,v) = input_value ic in
1092       Cduce_types.Value.intract_all p;
1093       v
1094     </sample>
1095    
1096     </box>
1097    
1098     <box title="Performance" link="perf">
1099    
1100     <section title="Strings">
1101    
1102     <p>
1103     OCaml users might be surprised by the fact that x-strings are simply
1104     represented as sequences in OCamlDuce. Does this mean that they are
1105     actually stored in memory as linked lists? Certainly not! The internal
1106     representation of sequence values uses several tricks to improve
1107     performance and memory usage. In particular, a special form in the
1108     representation can store strings as byte buffers, as in OCaml.
1109     If an XML document is loaded, or if a Caml string is converted
1110     to an x-value, this compact representation will be used.
1111     </p>
1112    
1113     </section>
1114    
1115     <section title="Concatenation">
1116    
1117     <p>
1118     Similarly, OCaml users might be reluctant to use the sequence
1119     concatenation <code>@</code> on sequences. In OCaml, the complexity
1120     of this operator is linear in the size of its first argument (which
1121     needs to be copied). OCamlDuce uses a special form in its internal
1122     representation to store concatenation in a lazy way. The concatenation
1123     will really be computed only when the value is accessed. This means
1124     that it's perfectly ok to build a long sequence by adding
1125     new elements at the end one by one, as long as you don't
1126     simultaneously inspect the sequence.
1127     </p>
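<p>
The idea of a lazily stored concatenation can be sketched in plain OCaml.
This is a toy model, not the representation actually used by OCamlDuce:
the type <code>seq</code> and the functions <code>append</code>,
<code>snoc</code>, <code>to_list</code> and <code>build</code> are
hypothetical names introduced for illustration.
</p>

<sample><![CDATA[
```ocaml
(* A sequence is either a plain list cell or a suspended concatenation. *)
type 'a seq =
  | Nil
  | Cons of 'a * 'a seq
  | Concat of 'a seq * 'a seq (* built in constant time, flattened later *)

(* Concatenation only records its two arguments; nothing is copied. *)
let append a b = Concat (a, b)

(* Add one element at the end, again in constant time. *)
let snoc s x = append s (Cons (x, Nil))

(* Flatten the sequence when it is finally inspected. The pending right
   operands are kept on an explicit stack, so the traversal is
   tail-recursive even for deeply left-nested concatenations. *)
let to_list s =
  let rec go acc stack s =
    match s with
    | Nil ->
      (match stack with
       | [] -> List.rev acc
       | next :: rest -> go acc rest next)
    | Cons (x, tl) -> go (x :: acc) stack tl
    | Concat (a, b) -> go acc (b :: stack) a
  in
  go [] [] s

(* Build the sequence 1, 2, ..., n by adding elements at the end. *)
let build n =
  let rec loop i s = if i > n then s else loop (i + 1) (snoc s i) in
  loop 1 Nil
```
]]></sample>

<p>
In this model, building a sequence of <code>n</code> elements with
<code>snoc</code> costs <code>n</code> constant-time steps, and the linear
work of flattening is paid once, when <code>to_list</code> is called.
</p>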
1128    
1129     </section>
1130    
1131     <section title="Pattern matching">
1132    
1133     <p>
1134     Another point which is worth knowing when programming in OCamlDuce
1135     is that patterns can be written in a declarative style without
1136     affecting performance. The compiler uses static type information
1137     about matched values to produce efficient code for pattern matching.
1138     To illustrate this, consider the following sample:
1139     </p>
1140    
1141     <sample><![CDATA[{{ON}}
1142     x.ml:
1143    
1144     type a = {{ <a>[ a* ] }}
1145     type b = {{ <b>[ b* ] }}
1146    
1147     let f : {{ a|b }} -> int = function {{ a }} -> 0 | {{ _ }} -> 1
1148     ]]></sample>
1149    
1150     <sample><![CDATA[{{ON}}
1151     y.ml:
1152    
1153     type a = {{ <a>[ a* ] }}
1154     type b = {{ <b>[ b* ] }}
1155    
1156     let f : {{ a|b }} -> int = function {{ <a>_ }} -> 0 | {{ _ }} -> 1
1157     ]]></sample>
1158    
1159     <p>
1160     The two functions have exactly the same semantics, but the first
1161     implementation is more declarative: it uses type checks to distinguish
1162     between <code>a</code> and <code>b</code> instead of saying
1163     <em>how</em> to distinguish between these two types. Imagine
1164     that the definition of these types changes to:
1165     </p>
1166    
1167     <sample><![CDATA[{{ON}}
1168     type a = {{ <x kind="a">[ a* ] }}
1169     type b = {{ <x kind="b">[ b* ] }}
1170     ]]></sample>
1171    
1172     <p>
1173     Then the first implementation still works as expected, but the
1174     second one needs to be rewritten.</p>
1175    
1176     <p>Now one might believe that the second implementation is more
1177     efficient because it tells the compiler to check only the root tag,
1178     whereas the first implementation would force
1179     the compiler to produce code to check that all tags in the tree
1180     are <code>a</code>s. But this is not what happens! Actually,
1181     you can check that the compiler will produce exactly the same code
1182     for both implementations. It considers the static type information
1183     about the argument of the pattern matching (here, the input type
1184     of the function), and computes an efficient way to evaluate
1185     patterns for the values of this type.
1186     </p>
1187    
1188     </section>
1189    
1190     <section title="The map iterator">
1191    
1192     <p>
1193     The <code>map ... with ...</code> iterator is implemented in a
1194     tail-recursive way. You can safely use it on very long sequences.
1195     </p>
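<p>
For readers unfamiliar with the term, here is what a tail-recursive map
looks like on ordinary OCaml lists. It is a generic sketch with a
hypothetical name, <code>map_tailrec</code>, and says nothing about how the
<code>map ... with ...</code> iterator is actually compiled.
</p>

<sample><![CDATA[
```ocaml
(* Tail-recursive map: accumulate the results in reverse order, then
   reverse once at the end. The stack depth does not grow with the
   length of the input list. *)
let map_tailrec f l =
  let rec go acc = function
    | [] -> List.rev acc
    | x :: tl -> go (f x :: acc) tl
  in
  go [] l
```
]]></sample>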
1196    
1197     </section>
1198    
1199     </box>
1200    
1201 abate 1799 <box title="OCaml and OCamlDuce" link="ocaml">
1202    
1203     <p>
1204     Since the 3.08.4 release, OCamlDuce is binary compatible with the corresponding
1205     OCaml release. This means that OCamlDuce can use OCaml-generated
1206     <tt>.cmi</tt> files and that it produces an OCaml-compatible
1207     <tt>.cmi</tt> file if the interface does not use any x-type
1208     (this file is equal to what would have been obtained by using OCaml).
1209     </p>
1210    
1211     <p>
1212     It is thus possible to use existing libraries which were compiled for
1213 abate 1821 OCaml. It is also possible to use OCamlDuce to compile
1214 abate 1799 some modules and use them in an OCaml project provided their interface
1215     is pure OCaml.
1216     </p>
1217    
1218     </box>
1219    
1220 abate 1832 </page>
1221    
1222     <page name="ocaml_code">
1223     <title>OCamlDuce: code samples and applications</title>
1224    
1225 abate 1789 <box title="Code samples" link="code">
1226    
1227     <section title="Parsing XML files">
1228    
1229     <p>
1230     OCamlDuce does not come with any built-in XML parser. However,
1231     the <a href="http://yquem.inria.fr/~frisch/ocamlcduce/doc/ocamlduce/Ocamlduce.Load.html"><code>Ocamlduce.Load</code></a> module in the standard library
1232     makes it easy to plug in existing XML parsers. Here is some
1233     code which demonstrates how to do that with three of
1234     the most popular OCaml XML parser libraries:
1235     </p>
1236    
1237     <ul>
1238     <li><a
1239     href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/pxp/">PXP</a></li>
1240     <li><a
1241     href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/expat/">Expat</a></li>
1242     <li><a href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/xmllight/">Xml-light</a></li>
1243     </ul>
1244    
1245     </section>
1246    
1247     <section title="Converting DTD to OCamlDuce types">
1248    
1249     <p>
1250     This <a href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/dtd2types/">tool</a> produces a set of OCamlDuce type declarations
1251     from a DTD. It requires PXP.
1252     </p>
1253    
1254     <note>This application does not use any of the new features, but it
1255     can be useful in the development of OCamlDuce applications.
1256     </note>
1257    
1258     </section>
1259    
1260     <section title="Parsing XML Schema, producing valid XHTML output">
1261    
1262     <p>
1263     This <a
1264     href="http://yquem.inria.fr/~frisch/ocamlcduce/samples/schema/">application</a>
1265     parses XML Schema Definitions (.xsd files), and produces summaries
1266     (toplevel declaration names) in XHTML. OCamlDuce's type system ensures
1267     that the parser is coherent with the input XML type (any valid XML
1268     Schema is accepted) and that the printer is coherent with the output
1269     XML type (it is necessarily a valid XHTML document).
1270     </p>
1271    
1272     <p>
1273     Of course, for such a simple transformation, parsing the XML document
1274     into an internal representation is not necessary. A direct XML-to-XML
1275     transformation would be easy to write. We wanted to illustrate
1276     a complex parsing of XML.
1277     </p>
1278    
1279     <p>
1280     It is interesting to introduce errors in the parser
1281     <code>schema_loader.ml</code> or the printer
1282 abate 1792 <code>dump_schema.ml</code> and see how the type system catches them.
1283 abate 1789 </p>
1284    
1285     <note>
1286     The application uses XML Light to parse XML documents.
1287     </note>
1288    
1289     <note>
1290     Some features of XML Schema are not parsed, such as
1291     <code>redefine</code> elements or substitution groups.
1292     </note>
1293    
1294 abate 1811 <note>
1295     To compile the application with the provided Makefile,
1296     you must make the environment variable <code>OCAMLFIND_CONF</code>
1297     point to the <code>$GODI/etc/findlib-ocamlduce.conf</code> file.
1298     </note>
1299    
1300 abate 1789 </section>
1301    
1302 abate 1802 <section title="String regular expressions">
1303    
1304     <p>
1305     OCamlDuce supports regular expression types and patterns, not only
1306     for sequences of XML elements, but also for strings. The following
1307     example shows how to use regular expressions to split a string
1308     of the form <code>name1=val1,...,namen=valn</code> with
1309     <code>n>0</code> into
1310     a list of pairs <code>[ (name1,val1); ...; (namen,valn) ]</code>.
1311     The <code>*?</code> operator in regular expressions means "non-greedy
1312     match" (match the shortest possible subsequence). The last
1313     pattern describes precisely strings which are not matched by
1314     the other cases. It would be possible to replace it with
1315     the wildcard <code>_</code>.
1316     </p>
1317    
1318     <sample><![CDATA[{{ON}}
1319     let rec split (s : {{ String }}) =
1320       match s with
1321       | {{ [ n::_*? '=' v::_*? ',' rest::_* ] }} -> (n,v)::(split rest)
1322       | {{ [ n::_*? '=' v::_*? ] }} -> [ (n,v) ]
1323       | {{ Any - [ _* '=' _* ] }} -> failwith "split"
1324     ]]></sample>
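<p>
For comparison, here is a hand-written counterpart on ordinary OCaml
strings, using only the standard <code>String</code> module. It is a
sketch that is meant to behave like the pattern-based version on inputs
such as <code>a=1,b=2</code>; the correspondence between each branch and the
three patterns above is our reading, not something checked by a compiler.
</p>

<sample><![CDATA[
```ocaml
(* Plain-OCaml counterpart of the pattern-based split, for comparison. *)
let rec split (s : string) : (string * string) list =
  match String.index_opt s '=' with
  | None ->
    (* No '=' at all: corresponds to the last pattern. *)
    failwith "split"
  | Some i ->
    (* Shortest possible name: everything before the first '='. *)
    let name = String.sub s 0 i in
    (match String.index_from_opt s (i + 1) ',' with
     | Some j ->
       (* Shortest possible value: up to the first ',' after the '='. *)
       let value = String.sub s (i + 1) (j - i - 1) in
       let rest = String.sub s (j + 1) (String.length s - j - 1) in
       (name, value) :: split rest
     | None ->
       (* No ',' left: the value is the remainder of the string. *)
       [ (name, String.sub s (i + 1) (String.length s - i - 1)) ])
```
]]></sample>

<p>
The index arithmetic that has to be written by hand here is exactly what
the regular expression patterns express declaratively.
</p>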
1325    
1326     </section>
1327    
1328 abate 1789 </box>
1329    
1330 abate 1808 <box title="Applications in OCamlDuce" link="appli">
1331    
1332     <ul>
1333     <li><a
1334     href="http://anil.recoil.org/projects/review2atom.html">Review2Atom</a>
1335     by Anil Madhavapeddy: translates paper review files in XML format into
1336     an Atom feed suitable for aggregation.
1337     </li>
1338 abate 1919
1339     <li>
1340     <a href="http://www.caterpillarjones.org/soss/">SOSS</a> by Stefan
1341     Lampe: an implementation of a SOAP server for OCaml, designed to allow
1342     a service, developed in OCaml, to be made available as a SOAP service
1343     with minimal effort. </li>
1344 abate 1808 </ul>
1345    
1346     </box>
1347    
1348 abate 1832 </page>
1349 abate 1808
1350 abate 1937 </page>
