<box title="Download and installation" link="install">

<p>
Currently, OCamlDuce
is based on OCaml 3.08.4 and on a CVS snapshot
of CDuce (between 0.3.92 and the head).
</p>

<ul>
<li><a
href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/ocamlduce-3.08.4pl1.tar.gz">Compiler,
version 3.08.4, patch level 1</a></li>
</ul>

<p>
There are two different installation modes:
</p>

<ul>
<li><b>Stand-alone mode</b>. OCamlDuce is used as a drop-in
replacement for OCaml. The build procedure is unchanged:
<tt>./configure && make world && make install</tt>.
The tools are named <tt>ocaml, ocamlc, ocamlopt</tt>, ...
The standard library is extended with the <tt>num</tt> library
and the <tt>Ocamlduce</tt> module.
</li>

<li><b>Package mode</b>. OCamlDuce is installed on top of an existing
OCaml installation (whose version number must match), without touching
it. The build
procedure is: <tt>./configure && make all && make opt
&& make install</tt>. The <tt>configure</tt> script should be called with
the same arguments as the ones used when you built OCaml. For instance,
the <tt>LIBDIR</tt> argument is used to find the OCaml standard library.
The tool names are changed to <tt>ocamlduce, ocamlducec,
ocamlduceopt</tt>, ... They use the existing standard library.
In addition, a library <tt>ocamlduce.cma</tt> is built.
It depends on the <tt>nums.cma</tt> library. The <tt>install</tt>
target implements a <tt>Findlib</tt>-based installation. It registers
a package named <tt>ocamlduce</tt> and puts the tools
in the package sub-directory (the <tt>BINDIR</tt> and <tt>LIBDIR</tt>
arguments to <tt>configure</tt> are not used). The toplevel
can be started with <tt>ocamlfind ocamlduce/ocamlduce -I `ocamlfind query ocamlduce`</tt>.
</li>
</ul>

<p>
GODI users can choose either of these two modes.
In order to upgrade an existing installation so as to use
OCamlDuce in place of OCaml, they must add this
line to their <tt>etc/godi.conf</tt> file:
</p>
<sample>
GODI_BUILD_SITES += http://pauillac.inria.fr/~frisch/ocamlcduce/godi
</sample>
<p>
and force a recompilation of the <tt>godi-ocaml-src</tt>
and <tt>godi-ocaml</tt> packages. The alternative is to install OCamlDuce
as a GODI package over an existing installation. In that case, you don't need
to touch the <tt>etc/godi.conf</tt> file. The package
name is <tt>godi-ocamlduce</tt>. In order to use the new compilers
and tools, you can make the environment variable
<tt>OCAMLFIND_CONF</tt> point to the
<tt>$GODI/etc/findlib-ocamlduce.conf</tt> file and then
use, e.g., <tt>ocamlfind ocamlc -package ocamlduce</tt>.
</p>

</box>

<box title="Overview" link="overview">

<p>
The goal of the OCamlDuce project is to extend the OCaml language with features
that make it easier to write safe and efficient complex applications
that need to deal with XML documents. In particular, it relies
on a notion of types and patterns to guarantee statically
that all the possible input documents are correctly processed, and
that only valid output documents are produced.
</p>

<p>
In a nutshell, OCamlDuce extends OCaml with a new kind of values
(<em>x-values</em>) to represent XML documents, fragments, tags, and Unicode
strings. In order to describe these values, it also extends the type algebra
function {{p1}} -> e1 | ... | {{pn}} -> en
</p>

<p>
Pattern matching follows a first-match policy: the first pattern
that succeeds triggers the corresponding branch.
</p> |
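<p>Ordinary OCaml pattern matching follows the same first-match rule, which may help readers coming from OCaml. The plain-OCaml function below (not x-pattern syntax, and not taken from OCamlDuce) has overlapping cases: a one-element list matches both <code>[ _ ]</code> and <code>_ :: _</code>, and the earlier branch is the one chosen.</p>

```ocaml
(* Plain OCaml patterns, used only to illustrate the first-match policy. *)
let describe (l : int list) : string =
  match l with
  | [] -> "empty"
  | [ _ ] -> "singleton"
  | _ :: _ -> "longer list" (* would also match a singleton, but comes later *)
```

<p>Here <code>describe [ 1 ]</code> evaluates to <code>"singleton"</code>, because the second branch is tried before the third.</p>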

<note>
Currently, it is impossible to mix normal OCaml patterns and x-patterns
in a single pattern matching.
</ul>

<p>
Here is a brief description of the semantics of patterns. Given
an input value, a pattern either succeeds or fails. If it succeeds,
it also produces a binding from the capture variables in the pattern
to x-values.
</p>

<ul>

<li>A pattern which is just a type (no capture variable) succeeds if
and only if the value has the type.</li>

<li>A pattern <code>p1 | p2</code> succeeds if either <code>p1</code>
or <code>p2</code> succeeds, and returns the corresponding binding; if
both patterns succeed, <code>p1</code> wins. It is required that
<code>p1</code> and <code>p2</code> have the same set of capture
variables. </li>

<li>A pattern <code>p1 & p2</code> succeeds if both <code>p1</code>
and <code>p2</code> succeed, and returns the concatenation of the two
bindings. It is required that <code>p1</code> and <code>p2</code> have
<em>disjoint</em> sets of capture variables. </li>

</ul> |
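<p>To make these rules concrete, here is a small model in plain OCaml. It is only a sketch under simplifying assumptions: values are integers or strings, a "type" is represented by a predicate, and the static requirements on capture variables (same sets for alternatives, disjoint sets for conjunctions) are not checked. It is not how OCamlDuce implements patterns.</p>

```ocaml
(* A toy model of the pattern semantics, in plain OCaml.
   Assumptions: values are ints or strings, a "type" is a predicate,
   and the static checks on capture variables are omitted. *)

type value = Int of int | Str of string

type pattern =
  | Type of (value -> bool)   (* a type: no capture variable *)
  | Capture of string         (* binds the whole value *)
  | Or of pattern * pattern   (* p1 | p2 *)
  | And of pattern * pattern  (* conjunction of p1 and p2 *)

(* A binding maps capture variables to values. *)
type binding = (string * value) list

(* [Some b] if the pattern succeeds with binding [b], [None] if it fails. *)
let rec matches (p : pattern) (v : value) : binding option =
  match p with
  | Type has_type -> if has_type v then Some [] else None
  | Capture x -> Some [ (x, v) ]
  | Or (p1, p2) -> (
      match matches p1 v with
      | Some b -> Some b (* p1 wins when both would succeed *)
      | None -> matches p2 v)
  | And (p1, p2) -> (
      match (matches p1 v, matches p2 v) with
      | Some b1, Some b2 -> Some (b1 @ b2) (* concatenation of bindings *)
      | _ -> None)

let is_int (v : value) : bool = match v with Int _ -> true | Str _ -> false
```

<p>For instance, <code>matches (And (Type is_int, Capture "x")) (Int 3)</code> returns <code>Some [ ("x", Int 3) ]</code>, while the same pattern fails on <code>Str "a"</code>.</p>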

<p>
In record x-patterns, it is possible to omit the <code>=p</code> part
of a field. The content of the field is then bound to a capture variable
named after the label. E.g. <code>{ x y=p }</code> is
equivalent to <code>{ x=x y=p }</code>.</p> |
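<p>OCaml users may recognize this shorthand: plain OCaml record patterns support field punning, where <code>{ x; y = 0 }</code> stands for <code>{ x = x; y = 0 }</code>. The analogy is sketched with an ordinary OCaml record; note that OCaml separates fields with <code>;</code>, unlike x-patterns.</p>

```ocaml
(* An ordinary OCaml record, used only as an analogy. *)
type point = { x : int; y : int }

(* { x; y = 0 } is sugar for { x = x; y = 0 }:
   field x is bound to a variable with the same name. *)
let x_on_axis (p : point) : int option =
  match p with
  | { x; y = 0 } -> Some x
  | _ -> None
```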

<p>It is also possible to add an "else" clause:
<code>{ x = (a,_)|(a:=3) }</code>
repetition) in a regexp, it is bound to the concatenation of all
matched subsequences. E.g.: <code>[ (x::Int | _)* ]</code> will
collect in <code>x</code> all the elements of type <code>Int</code> from
a sequence. It is not legal to have repeated simple capture variables.
</p> |
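<p>As a point of comparison, the effect of <code>[ (x::Int | _)* ]</code> on a sequence resembles the plain-OCaml filter below. The <code>item</code> type is a hypothetical stand-in for a sequence of mixed x-values; in OCamlDuce, <code>x</code> would be bound to an x-sequence rather than to an OCaml list.</p>

```ocaml
(* Hypothetical stand-in for a sequence mixing integers and other items. *)
type item = I of int | S of string

(* Keep, in order, every element that is an integer;
   skip everything else (the role of the "_" alternative). *)
let collect_ints (seq : item list) : int list =
  List.filter_map (function I n -> Some n | S _ -> None) seq
```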

<p>
The regexp operators <code>+,*,?</code> are greedy by default (they match as long

</box>

<box title="Marshaling" link="marshal">

<p>
OCamlDuce uses some tricks in its internal representation of x-values
to reduce memory usage and improve performance. You need to pay
special attention if you want to use OCaml serialization functions
(module <code>Marshal</code>, functions
<code>input_value/output_value</code>) on x-values. In addition to
your values, you also need to save and restore some internal data
using the functions <code>Cduce_types.Value.extract_all</code> and
<code>Cduce_types.Value.intract_all</code>. Of course, this also
applies if the value to be serialized contains deeply nested x-values.
</p>

<p>
Here are generic
serialization/deserialization functions that illustrate how to do it:
</p>


<sample>
(* Write the internal data together with the value. *)
let my_output_value oc v =
  let p = Cduce_types.Value.extract_all () in
  output_value oc (p,v)

(* Read both back, and restore the internal data before returning v. *)
let my_input_value ic =
  let (p,v) = input_value ic in
  Cduce_types.Value.intract_all p;
  v
</sample> |
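<p>Why the extra data is needed can be illustrated with a toy model in plain OCaml. The sketch assumes, purely for illustration, that some values are indices into a global table (here, interned names). <code>Marshal</code> copies the indices but not the table, so the table is saved and restored alongside the value, following the same shape as the functions above. All names in the sketch (<code>intern</code>, <code>snapshot_table</code>, <code>restore_table</code>, ...) are hypothetical and say nothing about what <code>Cduce_types.Value.extract_all</code> actually stores.</p>

```ocaml
(* Toy global intern table: name -> index and index -> name.
   Hypothetical; it only mimics the idea of internal shared state. *)
let by_name : (string, int) Hashtbl.t = Hashtbl.create 16
let by_index : (int, string) Hashtbl.t = Hashtbl.create 16

type atom = int (* an index into the global table *)

let intern (s : string) : atom =
  match Hashtbl.find_opt by_name s with
  | Some i -> i
  | None ->
      let i = Hashtbl.length by_name in
      Hashtbl.add by_name s i;
      Hashtbl.add by_index i s;
      i

let name_of (a : atom) : string = Hashtbl.find by_index a

(* Snapshot of the table, playing the role of the saved internal data. *)
let snapshot_table () : (string * int) list =
  Hashtbl.fold (fun s i acc -> (s, i) :: acc) by_name []

(* Reinstall a snapshot, replacing the current table. *)
let restore_table (snapshot : (string * int) list) : unit =
  Hashtbl.reset by_name;
  Hashtbl.reset by_index;
  List.iter
    (fun (s, i) ->
      Hashtbl.add by_name s i;
      Hashtbl.add by_index i s)
    snapshot

(* Serialize the table together with the value. *)
let save_with_table (v : 'a) : string = Marshal.to_string (snapshot_table (), v) []

(* Deserialize both, and restore the table before returning the value. *)
let load_with_table (data : string) : 'a =
  let (snapshot, v) = (Marshal.from_string data 0 : (string * int) list * 'a) in
  restore_table snapshot;
  v
```

<p>In this toy, <code>restore_table</code> replaces the whole table, so names interned by the reading process before loading are forgotten; a more careful implementation would need to merge the tables.</p>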

</box>

<box title="Performance" link="perf">

<section title="Strings">

<p>
OCaml users might be surprised by the fact that x-strings are simply
represented as sequences in OCamlDuce. Does this mean that they are
actually stored in memory as linked lists? Certainly not! The internal
representation of sequence values uses several tricks to improve
performance and memory usage. In particular, a special form in the
representation can store strings as byte buffers, as in OCaml.
If an XML document is loaded, or if a Caml string is converted
to an x-value, this compact representation is used.
</p> |
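<p>The idea of a compact node inside a sequence can be sketched as follows. This is an assumption-level illustration in plain OCaml, not OCamlDuce's actual representation: a <code>Chunk</code> node holds a byte buffer and an offset, so converting a string allocates a single node, and characters are only exposed one at a time when the sequence is inspected.</p>

```ocaml
(* Toy sequence of characters with a compact "byte buffer" node.
   Illustrative only; not OCamlDuce's internal representation. *)
type xstring =
  | Nil
  | Cons of char * xstring
  | Chunk of string * int * xstring (* characters s.[i..], then the tail *)

(* Converting an OCaml string allocates one node, not one per character. *)
let of_string (s : string) : xstring = Chunk (s, 0, Nil)

(* Inspecting the head exposes a single character at a time. *)
let rec uncons (q : xstring) : (char * xstring) option =
  match q with
  | Nil -> None
  | Cons (c, tl) -> Some (c, tl)
  | Chunk (s, i, tl) ->
      if i >= String.length s then uncons tl
      else Some (s.[i], Chunk (s, i + 1, tl))

(* Length in constant time per chunk, tail-recursively. *)
let length (q : xstring) : int =
  let rec go acc q =
    match q with
    | Nil -> acc
    | Cons (_, tl) -> go (acc + 1) tl
    | Chunk (s, i, tl) -> go (acc + String.length s - i) tl
  in
  go 0 q
```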

</section>

<section title="Concatenation">

<p>
Similarly, OCaml users might be reluctant to use the concatenation
operator <code>@</code> on sequences. In OCaml, the complexity
of this operator is linear in the size of its first argument (which
needs to be copied). OCamlDuce uses a special form in its internal
representation to store concatenations lazily. The concatenation
is actually computed only when the value is accessed. This means
that it is perfectly fine to build a long sequence by adding
new elements at the end one by one, as long as you don't
simultaneously inspect the sequence.
</p> |
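<p>A minimal sketch of lazy concatenation in plain OCaml, assuming a tree of <code>Concat</code> nodes that is only flattened on demand. Appending is constant-time, and the flattening uses an explicit stack so that it stays tail-recursive even for long chains of appends. The type and function names are illustrative only; this toy also recomputes the flattening on each call, whereas a real implementation would presumably keep the result.</p>

```ocaml
(* Toy lazily-concatenated sequence. *)
type 'a lazy_seq =
  | Leaf of 'a list
  | Concat of 'a lazy_seq * 'a lazy_seq (* not yet computed *)

(* O(1): just allocate a node. *)
let append (a : 'a lazy_seq) (b : 'a lazy_seq) : 'a lazy_seq = Concat (a, b)

(* Flatten on access, left to right, with an explicit stack. *)
let to_list (s : 'a lazy_seq) : 'a list =
  let rec go acc stack =
    match stack with
    | [] -> List.rev acc
    | Leaf l :: rest -> go (List.rev_append l acc) rest
    | Concat (a, b) :: rest -> go acc (a :: b :: rest)
  in
  go [] [ s ]

(* Build 1..n by appending one element at a time at the end. *)
let build (n : int) : int lazy_seq =
  let rec loop i s = if i > n then s else loop (i + 1) (append s (Leaf [ i ])) in
  loop 1 (Leaf [])
```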

</section>

<section title="Pattern matching">

<p>
Another point worth knowing when programming in OCamlDuce
is that patterns can be written in a declarative style without
affecting performance. The compiler uses static type information
about matched values to produce efficient code for pattern matching.
To illustrate this, consider the following samples:
</p>

<sample><![CDATA[{{ON}}
x.ml:

type a = {{ <a>[ a* ] }}
type b = {{ <b>[ b* ] }}

let f : {{ a|b }} -> int = function {{ a }} -> 0 | {{ _ }} -> 1
]]></sample>

<sample><![CDATA[{{ON}}
y.ml:

type a = {{ <a>[ a* ] }}
type b = {{ <b>[ b* ] }}

let f : {{ a|b }} -> int = function {{ <a>_ }} -> 0 | {{ _ }} -> 1
]]></sample>

<p>
The two functions have exactly the same semantics, but the first
implementation is more declarative: it uses type checks to distinguish
between <code>a</code> and <code>b</code> instead of saying
<em>how</em> to distinguish between these two types. Imagine
that the definitions of these types change to:
</p>

<sample><![CDATA[{{ON}}
type a = {{ <x kind="a">[ a* ] }}
type b = {{ <x kind="b">[ b* ] }}
]]></sample>

<p>
Then the first implementation still works as expected, but the
second one needs to be rewritten.</p>

<p>Now one might believe that the second implementation is more
efficient because it tells the compiler to check only the root tag,
whereas the first implementation would force
the compiler to produce code to check that all tags in the tree
are <code>a</code>s. But this is not what happens! Actually,
you can check that the compiler will produce exactly the same code
for both implementations. It considers the static type information
about the argument of the pattern matching (here, the input type
of the function), and computes an efficient way to evaluate
patterns for the values of this type.
</p> |
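<p>The reasoning can be illustrated with a plain-OCaml model of the trees; this is not the code OCamlDuce generates. A full check that a tree has type <code>a</code> must visit every node, but on trees already known to be of type <code>a|b</code>, checking the root tag gives the same answer. The two checks only disagree on trees that the static type rules out.</p>

```ocaml
(* Toy model of XML-like trees: a tag and a list of children. *)
type tree = Node of string * tree list

(* Full check for type a = <a>[ a* ]: visits every node. *)
let rec is_a (Node (tag, children) : tree) : bool =
  tag = "a" && List.for_all is_a children

(* Full check for type b = <b>[ b* ]. *)
let rec is_b (Node (tag, children) : tree) : bool =
  tag = "b" && List.for_all is_b children

(* Cheap check that only looks at the root tag. *)
let root_is_a (Node (tag, _) : tree) : bool = tag = "a"
```

<p>For every tree <code>t</code> with <code>is_a t || is_b t</code>, <code>is_a t</code> and <code>root_is_a t</code> coincide, so knowing the input type lets the cheap test replace the full one.</p>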

</section>

<section title="The map iterator">

<p>
The <code>map ... with ...</code> iterator is implemented in a
tail-recursive way. You can safely use it on very long sequences.
</p> |
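<p>For readers used to OCaml lists, the following sketch shows the usual accumulator-based technique that makes a map tail-recursive. It illustrates the property only; it is not the implementation of the <code>map ... with ...</code> iterator.</p>

```ocaml
(* Tail-recursive map: build the result in reverse in an accumulator,
   then reverse it once at the end. Applies f from left to right. *)
let map_tr (f : 'a -> 'b) (l : 'a list) : 'b list =
  let rec go acc = function
    | [] -> List.rev acc
    | x :: rest -> go (f x :: acc) rest
  in
  go [] l
```

<p>The cost of this technique is a final <code>List.rev</code>, but the stack usage stays constant, so mapping over a list with a million elements does not overflow the stack.</p>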

</section>

</box>

<box title="OCaml and OCamlDuce" link="ocaml">

<p>
Since the 3.08.4 release, OCamlDuce is binary compatible with the corresponding
OCaml release. This means that OCamlDuce can use OCaml-generated
<tt>.cmi</tt> files, and that it produces an OCaml-compatible
<tt>.cmi</tt> file if the interface does not use any x-type
(this file is identical to the one OCaml would have produced).
</p>

<p>
It is thus possible to use existing libraries that were compiled for
OCaml 3.08.4. It is also possible to use OCamlDuce to compile
some modules and use them in an OCaml project, provided their interface
is pure OCaml.
</p>

</box>

<box title="Code samples" link="code">

<section title="Parsing XML files">

<p>
It is interesting to introduce errors in the parser
<code>schema_loader.ml</code> or the printer
<code>dump_schema.ml</code> and see how the type system catches them.
</p>

<note>