been reused.
</p>

<p>
The theory behind OCamlDuce's type system is described in a <a
href="http://cristal.inria.fr/~frisch/ocamlcduce/">technical
report</a>.
</p>

</box>

<box title="Download and installation" link="install">

<p>
Currently, OCamlDuce
is based on OCaml 3.08.4 and on CVS snapshots
of CDuce (between 0.3.92 and the head).
</p>

<ul>
<li><a
href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/ocamlduce-3.08.4pl5.tar.gz">Compiler,
version 3.08.4, patch level 5</a></li>
</ul>

<p>
There are two different installation modes:
</p>

<ul>

<li><b>Stand-alone mode</b>. OCamlDuce is used as a drop-in
replacement for OCaml. The build procedure is unchanged:
<tt>./configure &amp;&amp; make world &amp;&amp; make install</tt>.
The tools are named <tt>ocaml, ocamlc, ocamlopt</tt>, ...
The standard library is extended with the <tt>num</tt> library
and the <tt>Ocamlduce</tt> module.
</li>

<li><b>Package mode</b>. OCamlDuce is installed on top of an existing
OCaml installation (whose version number must match) without modifying
it. The build procedure is:
<tt>./configure &amp;&amp; make all &amp;&amp; make opt &amp;&amp; make install</tt>.
The <tt>configure</tt> script should be called with
the same arguments as the ones used when you built OCaml; for instance,
the <tt>LIBDIR</tt> argument is used to find the OCaml standard library.
The tools are renamed to <tt>ocamlduce, ocamlducec,
ocamlduceopt</tt>, ... and use the existing standard library.
In addition, a library <tt>ocamlduce.cma</tt> is built;
it depends on the <tt>nums.cma</tt> library. The <tt>install</tt>
target implements a <tt>Findlib</tt>-based installation: it registers
a package named <tt>ocamlduce</tt> and puts the tools
in the package sub-directory (the <tt>BINDIR</tt> and <tt>LIBDIR</tt>
arguments to <tt>configure</tt> are not used). The toplevel
can be invoked with <tt>ocamlfind ocamlduce/ocamlduce -I `ocamlfind query ocamlduce`</tt>.
</li>

</ul>

</box>

<box title="Ports and packages" link="ports">

<section title="GODI">
<p>
GODI users can choose either of the two installation modes.
To upgrade an existing installation so as to use
OCamlDuce in place of OCaml, they must add this
line to their <tt>etc/godi.conf</tt> file:
</p>
<sample>
GODI_BUILD_SITES += http://pauillac.inria.fr/~frisch/ocamlcduce/godi
</sample>
<p>
and force a recompilation of the <tt>godi-ocaml-src</tt>
and <tt>godi-ocaml</tt> packages.
</p>
<p>
Alternatively, OCamlDuce can be installed
as a GODI package over an existing installation, without touching
the <tt>etc/godi.conf</tt> file. The package
name is <tt>godi-ocamlduce</tt>. To use the new compilers
and tools, make the environment variable
<tt>OCAMLFIND_CONF</tt> point to the
<tt>$GODI/etc/findlib-ocamlduce.conf</tt> file and then
use, e.g., <tt>ocamlfind ocamlc -package ocamlduce</tt>.
</p>
</section>

<section title="DarwinPorts and OpenBSD">
<p>
Anil Madhavapeddy contributed two ports of OCamlDuce: one for DarwinPorts
(in <tt>dports/lang/ocamlduce</tt>) and one for OpenBSD (in <tt>ports/lang/ocamlduce</tt>).
</p>
</section>

</box>

<box title="Overview" link="overview">

<p>
The goal of the OCamlDuce project is to extend the OCaml language with features
that make it easier to write safe and efficient complex applications
that need to deal with XML documents. In particular, it relies
on a notion of types and patterns to guarantee statically
that all possible input documents are correctly processed, and
that only valid output documents are produced.
</p>

<p>
In a nutshell, OCamlDuce extends OCaml with a new kind of values
(<em>x-values</em>) to represent XML documents, fragments, tags, Unicode
strings. In order to describe these values, it also extends the type algebra
function {{p1}} -> e1 | ... | {{pn}} -> en
</p>

<p>
Pattern matching follows a first-match policy: the first pattern
that succeeds triggers the corresponding branch.
</p>

<note>
Currently, it is impossible to mix normal OCaml patterns and x-patterns
in a single pattern matching.
</ul>

<p>
Here is a brief description of the semantics of patterns. Given
an input value, a pattern can either succeed or fail. If it succeeds,
it also produces a binding from the capture variables in the pattern
to x-values.
</p>

<ul>

<li>A pattern which is just a type (no capture variable) succeeds if
and only if the value has the type.</li>

<li>A pattern <code>p1 | p2</code> succeeds if either <code>p1</code>
or <code>p2</code> succeeds, and returns the corresponding binding; if
both patterns succeed, <code>p1</code> wins. It is required that
<code>p1</code> and <code>p2</code> have the same sets of capture
variables.</li>

<li>A pattern <code>p1 &amp; p2</code> succeeds if both <code>p1</code>
and <code>p2</code> succeed, and returns the concatenation of the two
bindings. It is required that <code>p1</code> and <code>p2</code> have
<em>disjoint</em> sets of capture variables.</li>

</ul>
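<p>The rules above can be modeled in a few lines of plain OCaml. This is only a toy sketch: the <code>value</code> and <code>pattern</code> types, the <code>Capture</code> constructor and the <code>matches</code> function are hypothetical names, not part of OCamlDuce, and the side conditions on capture variables are not checked.</p>

```ocaml
(* Toy stand-in for x-values (hypothetical, not OCamlDuce's type). *)
type value = Int of int | Text of string

(* Toy pattern language following the rules listed above. *)
type pattern =
  | Type of (value -> bool)   (* succeeds iff the value has the type *)
  | Capture of string         (* always succeeds and binds the value *)
  | Or of pattern * pattern   (* alternation: first succeeding branch wins *)
  | And of pattern * pattern  (* conjunction: both sides must succeed *)

(* Returns Some bindings on success, None on failure.
   Not checked here: Or needs equal capture sets, And needs disjoint ones. *)
let rec matches (p : pattern) (v : value) : (string * value) list option =
  match p with
  | Type has_type -> if has_type v then Some [] else None
  | Capture x -> Some [ (x, v) ]
  | Or (p1, p2) ->
      (match matches p1 v with
       | Some b -> Some b          (* p1 wins when both succeed *)
       | None -> matches p2 v)
  | And (p1, p2) ->
      (match matches p1 v, matches p2 v with
       | Some b1, Some b2 -> Some (b1 @ b2)  (* concatenate the bindings *)
       | _ -> None)

(* A "type" in this model is just a predicate on values. *)
let is_int = function Int _ -> true | Text _ -> false
```

<p>For instance, <code>matches (And (Type is_int, Capture "x")) (Int 3)</code> yields <code>Some [("x", Int 3)]</code>, while the same pattern applied to <code>Text "a"</code> fails with <code>None</code>.</p>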

<p>
In record x-patterns, it is possible to omit the <code>=p</code> part
of a field. The missing pattern then defaults to the label name,
considered as a capture variable (or as a previously defined type).
For example, <code>{ x y=p }</code> is
equivalent to <code>{ x=x y=p }</code>.</p>

<p>It is also possible to add an "else" clause:
<code>{ x = (a,_)|(a:=3) }</code>
repetition) in a regexp, it is bound to the concatenation of all
matched subsequences. For example, <code>[ (x::Int | _)* ]</code>
collects in <code>x</code> all the elements of type <code>Int</code> from
a sequence. It is not legal to have repeated simple capture variables.
</p>
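<p>For readers coming from plain OCaml, the effect of <code>[ (x::Int | _)* ]</code> is comparable to an order-preserving filter over the sequence. The sketch below is only an analogy: the <code>item</code> type and <code>collect_ints</code> are hypothetical, and real x-values are sequences, not OCaml lists.</p>

```ocaml
(* Hypothetical item type standing in for the elements of a sequence. *)
type item = Int of int | Other of string

(* Plain OCaml analogue of [ (x::Int | _)* ]: every Int element is kept,
   in order, as if all matched subsequences were concatenated. *)
let collect_ints (items : item list) : item list =
  List.filter (function Int _ -> true | Other _ -> false) items
```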

<p>
The regexp operators <code>+,*,?</code> are greedy by default (they match as long

</box>

<box title="Marshaling" link="marshal">

<p>
OCamlDuce uses some tricks in its internal representation of x-values
to reduce memory usage and improve performance. You need to pay
special attention if you want to use OCaml serialization functions
(module <code>Marshal</code>, functions
<code>input_value/output_value</code>) on x-values. In addition to
your values, you also need to save and restore some internal data
using the functions <code>Cduce_types.Value.extract_all</code> and
<code>Cduce_types.Value.intract_all</code>. This also
applies if the value to be serialized contains deeply nested x-values.
</p>

<p>
Here are generic
serialization/deserialization functions that illustrate how to do it:
</p>

<sample>
let my_output_value oc v =
  let p = Cduce_types.Value.extract_all () in
  output_value oc (p,v)

let my_input_value ic =
  let (p,v) = input_value ic in
  Cduce_types.Value.intract_all p;
  v
</sample>
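<p>To see why such extra state can matter, here is a toy model in plain OCaml. It is purely hypothetical and does not describe OCamlDuce's real internals: values refer to entries of a process-global table by index, so a marshaled value is only meaningful if the table is saved and restored along with it. The names <code>intern</code>, <code>save_state</code>, <code>restore_state</code>, <code>serialize</code> and <code>deserialize</code> are made up for this illustration; in the model they play the roles that <code>extract_all</code> and <code>intract_all</code> play in the functions above.</p>

```ocaml
(* Toy model (hypothetical): names are interned in a global table and
   values only store integer indices into it. *)
let by_name : (string, int) Hashtbl.t = Hashtbl.create 16
let by_index : (int, string) Hashtbl.t = Hashtbl.create 16

(* Return the index of [s], allocating a new one on first use. *)
let intern (s : string) : int =
  match Hashtbl.find_opt by_name s with
  | Some i -> i
  | None ->
      let i = Hashtbl.length by_name in
      Hashtbl.add by_name s i;
      Hashtbl.add by_index i s;
      i

let name_of (i : int) : string = Hashtbl.find by_index i

(* Snapshot of the global table (role of extract_all in this model). *)
let save_state () : (string * int) list =
  Hashtbl.fold (fun s i acc -> (s, i) :: acc) by_name []

(* Reinstall a snapshot (role of intract_all in this model).
   This toy version simply replaces the current table. *)
let restore_state (st : (string * int) list) : unit =
  Hashtbl.reset by_name;
  Hashtbl.reset by_index;
  List.iter (fun (s, i) -> Hashtbl.add by_name s i; Hashtbl.add by_index i s) st

(* Same shape as my_output_value / my_input_value, but using strings. *)
let serialize (v : int list) : string =
  Marshal.to_string (save_state (), v) []

let deserialize (data : string) : int list =
  let ((st, v) : (string * int) list * int list) = Marshal.from_string data 0 in
  restore_state st;
  v
```

<p>In this model, skipping <code>restore_state</code> would make the indices of a deserialized value be looked up in whatever table the reading process happens to hold.</p>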

</box>

<box title="Performance" link="perf">

<section title="Strings">

<p>
OCaml users might be surprised that x-strings are simply
represented as sequences in OCamlDuce. Does this mean that they are
actually stored in memory as linked lists? Certainly not! The internal
representation of sequence values uses several tricks to improve
performance and memory usage. In particular, a special form in the
representation can store strings as byte buffers, as in OCaml.
If an XML document is loaded, or if a Caml string is converted
to an x-value, this compact representation is used.
</p>
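<p>The idea of a compact form can be sketched in plain OCaml. The <code>seq</code> type below is a hypothetical model, not OCamlDuce's actual representation: a <code>Buf</code> node shares an OCaml string buffer instead of allocating one cell per character, while <code>uncons</code> gives every form the same sequence interface.</p>

```ocaml
(* Hypothetical sequence representation with a compact string form. *)
type seq =
  | Nil
  | Cons of char * seq
  | Buf of string * int * seq   (* chars of the string from index i, then the tail *)

(* Uniform access: callers see a sequence of chars whatever the form. *)
let rec uncons (s : seq) : (char * seq) option =
  match s with
  | Nil -> None
  | Cons (c, tl) -> Some (c, tl)
  | Buf (str, i, tl) ->
      if i >= String.length str then uncons tl
      else Some (str.[i], Buf (str, i + 1, tl))

(* Converting an OCaml string is constant time: the buffer is shared. *)
let of_string (s : string) : seq = Buf (s, 0, Nil)

let rec to_list (s : seq) : char list =
  match uncons s with
  | None -> []
  | Some (c, tl) -> c :: to_list tl
```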

</section>

<section title="Concatenation">

<p>
Similarly, OCaml users might be reluctant to use the concatenation
operator <code>@</code> on sequences. In OCaml, the complexity
of this operator is linear in the size of its first argument (which
needs to be copied). OCamlDuce uses a special form in its internal
representation to store concatenations lazily. The concatenation
is really computed only when the value is accessed. This means
that it is perfectly fine to build a long sequence by appending
new elements at the end one by one, as long as you do not
simultaneously inspect the sequence.
</p>
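<p>The following plain OCaml sketch illustrates the principle of lazy concatenation; it is a hypothetical model, not OCamlDuce's code. <code>concat</code> only records a node in constant time, and the elements are laid out by <code>to_list</code> when the sequence is inspected.</p>

```ocaml
(* Hypothetical model: a concatenation is stored as a node and only
   flattened when the sequence is inspected. *)
type 'a seq =
  | Leaf of 'a list
  | Concat of 'a seq * 'a seq   (* s1 @ s2, not computed yet *)

(* Constant time: nothing is copied. *)
let concat (s1 : 'a seq) (s2 : 'a seq) : 'a seq = Concat (s1, s2)

(* Flatten in linear time. Pending nodes are kept on an explicit stack,
   rightmost first, so deep nesting does not overflow the call stack. *)
let to_list (s : 'a seq) : 'a list =
  let rec go (acc : 'a list) (todo : 'a seq list) : 'a list =
    match todo with
    | [] -> acc
    | Leaf l :: rest -> go (List.rev_append (List.rev l) acc) rest
    | Concat (a, b) :: rest -> go acc (b :: a :: rest)
  in
  go [] [ s ]

(* Appending elements one by one at the end, as described above. *)
let build (n : int) : int seq =
  let rec loop i s = if i > n then s else loop (i + 1) (concat s (Leaf [ i ])) in
  loop 1 (Leaf [])
```

<p>Each of the <code>n</code> appends in <code>build</code> costs constant time; the total linear cost is paid once, by <code>to_list</code>.</p>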

</section>

<section title="Pattern matching">

<p>
Another point worth knowing when programming in OCamlDuce
is that patterns can be written in a declarative style without
affecting performance. The compiler uses static type information
about matched values to produce efficient code for pattern matching.
To illustrate this, consider the following sample:
</p>

<sample><![CDATA[{{ON}}
x.ml:

type a = {{ <a>[ a* ] }}
type b = {{ <b>[ b* ] }}

let f : {{ a|b }} -> int = function {{ a }} -> 0 | {{ _ }} -> 1
]]></sample>

<sample><![CDATA[{{ON}}
y.ml:

type a = {{ <a>[ a* ] }}
type b = {{ <b>[ b* ] }}

let f : {{ a|b }} -> int = function {{ <a>_ }} -> 0 | {{ _ }} -> 1
]]></sample>

<p>
The two functions have exactly the same semantics, but the first
implementation is more declarative: it uses type checks to distinguish
between <code>a</code> and <code>b</code> instead of saying
<em>how</em> to distinguish between these two types. Imagine
that the definitions of these types change to:
</p>

<sample><![CDATA[{{ON}}
type a = {{ <x kind="a">[ a* ] }}
type b = {{ <x kind="b">[ b* ] }}
]]></sample>

<p>
Then the first implementation still works as expected, but the
second one needs to be rewritten.</p>

<p>Now one might believe that the second implementation is more
efficient because it tells the compiler to check only the root tag,
whereas the first implementation would force
the compiler to produce code that checks that all tags in the tree
are <code>a</code>s. But this is not what happens! Actually,
you can check that the compiler produces exactly the same code
for both implementations. It considers the static type information
about the argument of the pattern matching (here, the input type
of the function) and computes an efficient way to evaluate
patterns for the values of this type.
</p>

</section>
| 1149 |
|
|
| 1150 |
|
<section title="The map iterator"> |
| 1151 |
|
|
| 1152 |
|
<p> |
| 1153 |
|
The <code>map ... with ...</code> iterator is implemented in a |
| 1154 |
|
tail-recursive way. You can safely use it on very long sequences. |
| 1155 |
|
</p> |
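<p>For comparison, the usual way to write a tail-recursive map over an OCaml list is to accumulate results in reverse and reverse once at the end, so the stack depth stays constant whatever the length. The sketch below shows that standard technique; it is not a description of how OCamlDuce implements <code>map ... with ...</code>, and <code>map_tr</code> is a hypothetical name.</p>

```ocaml
(* Tail-recursive map: the recursive call is in tail position, and the
   accumulator is reversed once at the end to restore the order. *)
let map_tr (f : 'a -> 'b) (l : 'a list) : 'b list =
  let rec loop (acc : 'b list) (rest : 'a list) : 'b list =
    match rest with
    | [] -> List.rev acc
    | x :: tl -> loop (f x :: acc) tl
  in
  loop [] l
```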

</section>

</box>
| 1160 |
|
|
| 1161 |
|
<box title="OCaml and OCamlDuce" link="ocaml"> |
| 1162 |
|
|
| 1163 |
|
<p> |
| 1164 |
|
Since the 3.08.4 release, OCamlDuce is binary compatible with the corresponding |
| 1165 |
|
OCaml release. This means that OCamlDuce can use OCaml-generated |
| 1166 |
|
<tt>.cmi</tt> files and that it produces an OCaml-compatible |
| 1167 |
|
<tt>.cmi</tt> file if the interface does not use any x-type |
| 1168 |
|
(this file is equal to what would have been obtained by using OCaml). |
| 1169 |
|
</p> |
| 1170 |
|
|
| 1171 |
|
<p> |
| 1172 |
|
It is thus possible to use existing libraries which were compiled for |
| 1173 |
|
OCaml 3.08.4. It is also possible to use OCamlDuce to compile |
| 1174 |
|
some modules and use them in an OCaml project provided their interface |
| 1175 |
|
is pure OCaml. |
| 1176 |
|
</p> |
| 1177 |
|
|
| 1178 |
|
|
| 1179 |
|
</box> |
| 1180 |
|
|
| 1181 |
|
<box title="Code samples" link="code"> |
| 1182 |
|
|
| 1183 |
<section title="Parsing XML files"> |
<section title="Parsing XML files"> |
| 1184 |
|
|
<p>
It is interesting to introduce errors in the parser
<code>schema_loader.ml</code> or the printer
<code>dump_schema.ml</code> and see how the type system catches them.
</p>

<note>
<code>redefine</code> elements or substitution groups.
</note>

<note>
To compile the application with the provided Makefile,
you must make the environment variable <code>OCAMLFIND_CONF</code>
point to the <code>$GODI/etc/findlib-ocamlduce.conf</code> file.
</note>

</section>

<section title="String regular expressions">

<p>
OCamlDuce supports regular expression types and patterns not only
for sequences of XML elements, but also for strings. The following
example shows how to use regular expressions to split a string
of the form <code>name1=val1,...,namen=valn</code> with
<code>n>0</code> into
a list of pairs <code>[ (name1,val1); ...; (namen,valn) ]</code>.
The <code>*?</code> operator in regular expressions means "ungreedy
match" (match the shortest possible subsequence). The last
pattern describes precisely the strings that are not matched by
the other cases, namely those containing no <code>=</code>. It could be
replaced with the wildcard <code>_</code>.
</p>

<sample><![CDATA[{{ON}}
let rec split (s : {{ String }}) =
  match s with
  | {{ [ n::_*? '=' v::_*? ',' rest::_* ] }} -> (n,v)::(split rest)
  | {{ [ n::_*? '=' v::_*? ] }} -> [ (n,v) ]
  | {{ Any - [ _* '=' _* ] }} -> failwith "split"
]]></sample>
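<p>For comparison, here is a sketch of the same function on ordinary OCaml strings, written with the standard <code>String</code> functions instead of regular expression patterns. It follows the same rules (the name stops at the first <code>=</code>, the value at the first comma after it, and a piece without <code>=</code> is an error), but the splitting positions must be computed by hand. It assumes OCaml 4.05 or later for <code>String.index_opt</code> and <code>String.index_from_opt</code>, which is much newer than the 3.08.4 release OCamlDuce is based on.</p>

```ocaml
(* Plain OCaml counterpart of split, on ordinary strings. *)
let rec split (s : string) : (string * string) list =
  match String.index_opt s '=' with
  | None -> failwith "split"                 (* no '=' at all *)
  | Some i ->
      let n = String.sub s 0 i in            (* name: up to the first '=' *)
      (match String.index_from_opt s (i + 1) ',' with
       | Some j ->
           (* value: up to the first ',' after the '=' *)
           let v = String.sub s (i + 1) (j - i - 1) in
           let rest = String.sub s (j + 1) (String.length s - j - 1) in
           (n, v) :: split rest
       | None ->
           (* last pair: the value extends to the end of the string *)
           [ (n, String.sub s (i + 1) (String.length s - i - 1)) ])
```

<p>For instance, <code>split "a=1,b=2"</code> returns <code>[("a", "1"); ("b", "2")]</code>, and <code>split "abc"</code> raises <code>Failure "split"</code>, as in the OCamlDuce version.</p>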

</section>

</box>

<box title="Applications in OCamlDuce" link="appli">

<ul>
<li><a
href="http://anil.recoil.org/projects/review2atom.html">Review2Atom</a>
by Anil Madhavapeddy: translates paper review files in XML format into
an Atom feed suitable for aggregation.
</li>
</ul>

</box>

</page>