Diff of /web/ocaml.xml

revision 1790 by abate, Tue Jul 10 19:23:06 2007 UTC revision 1815 by abate, Tue Jul 10 19:24:58 2007 UTC
# Line 27  Line 27 
27  been reused.  been reused.
28  </p>  </p>
29    
30    <p>
31    The theory behind OCamlDuce's type system is described in a <a
32    href="http://cristal.inria.fr/~frisch/ocamlcduce/">technical
33    report</a>.
34    </p>
35    
36  </box>  </box>
37    
38  <box title="Download and installation" link="install">  <box title="Download and installation" link="install">
39    
40  <p>  <p>
41  The build procedure for OCamlDuce is exactly the same as for OCaml:  Currently, OCamlDuce
42  <tt>configure, make world, make install</tt>. The names of the tools  is based on OCaml 3.08.4 and on a CVS snapshot
43  are unchanged: <tt>ocaml,ocamlc,ocamlopt</tt>. Currently, OCamlDuce  of CDuce (between 0.3.92 and the head).
 is based on CVS snapshots of OCaml (between 3.08.3 and the current  
 <tt>release308</tt> branch) and CDuce (between 0.3.91 and the head).  
44  </p>  </p>
45    
46  <ul>  <ul>
47  <li><a  <li><a
48  href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/cduce-ocaml-0.0.5.tar.gz">Compiler,  href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/ocamlduce-3.08.4pl5.tar.gz">Compiler,
49  version 0.0.5</a></li>  version 3.08.4, patch level 5</a></li>
 <!--<li><a  
 href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/xml-support-0.0.4.tar.gz">Support  
 library, version 0.0.4</a></li>-->  
50  </ul>  </ul>
51    
52  <p>  <p>
53  GODI users can upgrade an existing installation by adding this  There are two different installation modes:
54    </p>
55    
56    <ul>
57    <li><b>Stand-alone mode</b>. OCamlDuce is used as a drop-in
58    replacement for OCaml. The build procedure is unchanged:
59    <tt>./configure &amp;&amp; make world &amp;&amp; make install</tt>.
60    The tools are named <tt>ocaml, ocamlc, ocamlopt</tt>, ...
61    The standard library is extended with the <tt>num</tt> library
62    and the <tt>Ocamlduce</tt> module.
63    </li>
64    
65    <li><b>Package mode</b>. OCamlDuce is installed on top of an existing
66    OCaml installation (whose version number must match), without touching
67    it. The build
68    procedure is: <tt>./configure &amp;&amp; make all &amp;&amp; make opt
69    &amp;&amp; make install</tt>.  The <tt>configure</tt> script should be called with
70    the same arguments as the ones used when you built OCaml. For instance,
71    the <tt>LIBDIR</tt> argument is used to find the OCaml standard library.
72    The tool names are changed to <tt>ocamlduce, ocamlducec,
73    ocamlduceopt</tt>, ...  They use the existing standard library.
74    In addition, a library <tt>ocamlduce.cma</tt> is built.
75    It depends on the <tt>nums.cma</tt> library. The <tt>install</tt>
76    target implements a <tt>Findlib</tt>-based installation. It registers
77    a package named <tt>ocamlduce</tt> and it puts the tools
78    in the package sub-directory (the <tt>BINDIR</tt> and <tt>LIBDIR</tt>
79    arguments to <tt>configure</tt> are not used). The toplevel
80    can be called by <tt>ocamlfind ocamlduce/ocamlduce -I `ocamlfind query ocamlduce`</tt>.
81    </li>
82    </ul>
83    
84    </box>
85    
86    <box title="Ports and packages" link="ports">
87    
88    <section title="GODI">
89    <p>
90    GODI users can choose either of the two installation modes.
91    In order to upgrade an existing installation so as to use
92    OCamlDuce in place of OCaml, they must add this
93  line to their <tt>etc/godi.conf</tt> file:  line to their <tt>etc/godi.conf</tt> file:
94  </p>  </p>
95  <sample>  <sample>
96  GODI_BUILD_SITES += http://pauillac.inria.fr/~frisch/ocamlcduce/godi  GODI_BUILD_SITES += http://pauillac.inria.fr/~frisch/ocamlcduce/godi
97  </sample>  </sample>
98  <p>  <p>
99  and by forcing a recompilation of the <tt>godi-ocaml-src</tt>  and force a recompilation of the <tt>godi-ocaml-src</tt>
100  and <tt>godi-ocaml</tt> packages. <!--They should also build  and <tt>godi-ocaml</tt> packages. The alternative is to install
101  the <tt>godi-xml-support</tt> library.-->  OCamlDuce
102    as a GODI package over an existing installation. You don't need
103    to touch the <tt>etc/godi.conf</tt> file. The package
104    name is <tt>godi-ocamlduce</tt>. In order to use the new compilers
105    and tools, you can make the environment variable
106    <tt>OCAMLFIND_CONF</tt> point to the
107    <tt>$GODI/etc/findlib-ocamlduce.conf</tt> file and then
108    use, e.g., <tt>ocamlfind ocamlc -package ocamlduce</tt>.
109  </p>  </p>
110    </section>
111    
112    <section title="DarwinPorts and OpenBSD">
113    
 <!--  
114  <p>  <p>
115  Some simple examples can be found <a -->  Anil Madhavapeddy contributed two ports of OCamlDuce for DarwinPorts
116  <!--href="http://pauillac.inria.fr/~frisch/ocamlcduce/tests/">here</a>.</p>  (in dports/lang/ocamlduce) and for OpenBSD (in ports/lang/ocamlduce).
117  -->  </p>
118    
119    </section>
120    
121  </box>  </box>
122    
123  <box title="Overview" link="overview">  <box title="Overview" link="overview">
124    
125  <p>  <p>
126    The goal of the OCamlDuce project is to extend the OCaml language with features
127    to make it easier to write safe and efficient complex applications
128    that need to deal with XML documents. In particular, it relies
129    on a notion of types and patterns to guarantee statically
130    that all the possible input documents are correctly processed, and
131    that only valid output documents are produced.
132    </p>
133    
134    <p>
135  In a nutshell, OCamlDuce extends OCaml with a new kind of values  In a nutshell, OCamlDuce extends OCaml with a new kind of values
136  (<em>x-values</em>) to represent XML documents, fragments, tags, Unicode  (<em>x-values</em>) to represent XML documents, fragments, tags, Unicode
137  strings. In order to describe these values, it also extends the type algebra  strings. In order to describe these values, it also extends the type algebra
# Line 488  Line 548 
548     function {{p1}} -> e1 | ... | {{pn}} -> en     function {{p1}} -> e1 | ... | {{pn}} -> en
549  </p>  </p>
550    
551    <p>
552    Pattern matching follows a first-match policy. The first pattern
553    that succeeds triggers the corresponding branch.
554    </p>
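Editorial illustration (not part of the revision): plain OCaml `match` follows the same first-match policy, so a minimal sketch in standard OCaml can show the behaviour. The function name `classify` is hypothetical and the example uses ordinary OCaml patterns, not x-patterns.

```ocaml
(* Plain OCaml analog of the first-match policy: branches are tried from
   top to bottom and the first one that succeeds is taken. *)
let classify (n : int) : string =
  match n with
  | 0 -> "zero"
  | n when n mod 2 = 0 -> "even"  (* would also accept 0, but the branch above wins *)
  | _ -> "odd"
```

Reordering the first two branches would change the result for `0`, which is why the order of branches matters under this policy.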
555    
556  <note>  <note>
557  currently it is impossible to mix normal OCaml patterns and x-patterns  currently it is impossible to mix normal OCaml patterns and x-patterns
558  in a single pattern matching.  in a single pattern matching.
# Line 615  Line 680 
680  </ul>  </ul>
681    
682  <p>  <p>
683  In record x-patterns, it is possible to omit the <code>=p</code> part of a field.  Here is a brief description of the semantics of patterns. Given
684  The content is then replaced with the label name considered as  an input value, a pattern can either succeed or fail. If it succeeds,
685  a capture variable. E.g.  <code>{ x y=p }</code> is equivalent to  it also produces a binding from the capture variables in the pattern
686  <code>{ x=x y=p }</code>.</p>  to x-values.
687    </p>
688    
689    <ul>
690    
691    <li>A pattern which is just a type (no capture variable) succeeds if
692    and only if the value has the type.</li>
693    
694    <li>A pattern <code>p1 | p2</code> succeeds if either <code>p1</code>
695    or <code>p2</code> succeeds, and returns the corresponding binding; if
696    both patterns succeed, <code>p1</code> wins. It is required that
697    <code>p1</code> and <code>p2</code> have the same sets of capture
698    variables. </li>
699    
700    <li>A pattern <code>p1 &amp; p2</code> succeeds if both <code>p1</code>
701    and <code>p2</code> succeed, and returns the concatenation of the two
702    bindings. It is required that <code>p1</code> and <code>p2</code> have
703    <em>disjoint</em> sets of capture variables. </li>
704    
705    </ul>
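Editorial illustration (not part of the revision): the three rules above can be modelled by a small interpreter in plain OCaml. This is a toy sketch, not OCamlDuce's implementation. The types `value` and `pattern`, the function `matches`, and the `Capture` case (a bare capture variable that binds the whole value) are assumptions introduced for the example, and a "type" is modelled as a predicate on values.

```ocaml
(* Toy model of the pattern semantics; all names here are hypothetical. *)
type value = Int of int | Str of string

type pattern =
  | Type of (value -> bool)   (* a pattern that is just a type *)
  | Capture of string         (* assumed: binds the whole value to a variable *)
  | Or of pattern * pattern   (* p1 | p2 *)
  | And of pattern * pattern  (* p1 & p2 *)

(* [matches p v] is [Some bindings] if the pattern succeeds, [None] if it fails. *)
let rec matches (p : pattern) (v : value) : (string * value) list option =
  match p with
  | Type has_type -> if has_type v then Some [] else None
  | Capture x -> Some [ (x, v) ]
  | Or (p1, p2) ->
    (match matches p1 v with
     | Some b -> Some b       (* p1 wins whenever it succeeds *)
     | None -> matches p2 v)
  | And (p1, p2) ->
    (match matches p1 v, matches p2 v with
     | Some b1, Some b2 -> Some (b1 @ b2)  (* concatenation of the two bindings *)
     | _ -> None)

(* A sample "type": the set of integer values. *)
let is_int : value -> bool = function Int _ -> true | Str _ -> false
```

The model does not check the side conditions on capture variables (same sets for `|`, disjoint sets for `&`); in OCamlDuce those are enforced statically.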
706    
707    <p>
708    In record x-patterns, it is possible to omit the <code>=p</code> part
709    of a field.  The content is then replaced with the label name
710    considered as a capture variable (or as a previously defined type).
711     E.g.  <code>{ x y=p }</code> is
712    equivalent to <code>{ x=x y=p }</code>.</p>
713    
714  <p>It is also possible to add an "else" clause:  <p>It is also possible to add an "else" clause:
715  <code>{ x = (a,_)|(a:=3) }</code>  <code>{ x = (a,_)|(a:=3) }</code>
# Line 636  Line 727 
727  repetition) in a regexp, it is bound to the concatenation of all  repetition) in a regexp, it is bound to the concatenation of all
728  matched subsequences. E.g.: <code>[ (x::Int | _)* ]</code> will  matched subsequences. E.g.: <code>[ (x::Int | _)* ]</code> will
729  collect in <code>x</code> all the elements of type <code>Int</code> from  collect in <code>x</code> all the elements of type <code>Int</code> from
730  a sequence.</p>  a sequence. It is not legal to have repeated simple capture variables.
731    </p>
732    
733  <p>  <p>
734  The regexp operators <code>+,*,?</code> are greedy by default (they match as long  The regexp operators <code>+,*,?</code> are greedy by default (they match as long
# Line 931  Line 1023 
1023    
1024  </box>  </box>
1025    
1026  <box title="Code samples" link="code">  <box title="Marshaling" link="marshal">
1027    
1028    <p>
1029    OCamlDuce uses some tricks on its internal representation of x-values
1030    to reduce memory usage and improve performance. You need to pay
1031    special attention if you want to use OCaml serialization functions
1032    (module <code>Marshal</code>, functions
1033    <code>input_value/output_value</code>) on x-values. In addition to
1034    your values, you also need to save and restore some internal data
1035    using the functions <code>Cduce_types.Value.extract_all</code> and
1036    <code>Cduce_types.Value.intract_all</code>. Of course, this also
1037    applies if the value to be serialized contains deeply nested x-values.
1038    </p>
1039    
1040    <p>
1041    Here are generic
1042    serialization/deserialization functions that illustrate how to do it:
1043    </p>
1044    
1045    <sample>
1046    let my_output_value oc v =
1047      let p = Cduce_types.Value.extract_all () in
1048      output_value oc (p,v)
1049    
1050    let my_input_value ic =
1051      let (p,v) = input_value ic in
1052      Cduce_types.Value.intract_all p;
1053      v
1054    </sample>
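Editorial illustration (not part of the revision): the text does not say what the internal data contains, so the following plain OCaml sketch uses a hypothetical interning table purely as a stand-in. It shows the general shape of the problem: values that refer to global state cannot be restored from their own bytes alone, so the state is marshaled alongside them. All names (`table`, `intern`, `lookup`, `toy_extract_all`, `toy_intract_all`, `save`, `load`) are assumptions and are unrelated to the real `Cduce_types.Value` functions.

```ocaml
(* Hypothetical stand-in for "internal data": a global interning table.
   Values only carry integer ids that are meaningless without the table. *)
let table : (int, string) Hashtbl.t = Hashtbl.create 16
let next_id = ref 0

let intern (s : string) : int =
  let id = !next_id in
  incr next_id;
  Hashtbl.replace table id s;
  id

let lookup (id : int) : string = Hashtbl.find table id

(* Toy analogs of an extract/restore pair for this table. *)
let toy_extract_all () : (int * string) list =
  Hashtbl.fold (fun id s acc -> (id, s) :: acc) table []

let toy_intract_all (saved : (int * string) list) : unit =
  List.iter (fun (id, s) -> Hashtbl.replace table id s) saved

(* Marshal the saved state together with the value, as a pair. *)
let save (v : 'a) : string = Marshal.to_string (toy_extract_all (), v) []

(* Restore the state first, then hand back the value. *)
let load (data : string) : 'a =
  let (saved, v) = Marshal.from_string data 0 in
  toy_intract_all saved;
  v
```

As with any use of `Marshal`, the caller of `load` must annotate the expected type, since the unmarshaled value is not type-checked.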
1055    
1056    </box>
1057    
1058    <box title="Performance" link="perf">
1059    
1060    <section title="Strings">
1061    
1062    <p>
1063    OCaml users might be surprised by the fact that x-strings are simply
1064    represented as sequences in OCamlDuce. Does this mean that they are
1065    actually stored in memory as a linked list? Certainly not!  The internal
1066    representation of sequence values uses several tricks to improve
1067    performance and memory usage. In particular, a special form in the
1068    representation can store strings as byte buffers, as in OCaml.
1069    If an XML document is loaded, or if a Caml string is converted
1070    to an x-value, this compact representation will be used.
1071    </p>
1072    
1073    </section>
1074    
1075    <section title="Concatenation">
1076    
1077    <p>
1078    Similarly, OCaml users might be reluctant to use the sequence
1079    concatenation <code>@</code> on sequences. In OCaml, the complexity
1080    of this operator is linear in the size of its first argument (which
1081    needs to be copied). OCamlDuce uses a special form in its internal
1082    representation to store concatenation in a lazy way. The concatenation
1083    will really be computed only when the value is accessed. This means
1084    that it's perfectly ok to build a long sequence by adding
1085    new elements at the end one by one, as long as you don't
1086    simultaneously inspect the sequence.
1087    </p>
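Editorial illustration (not part of the revision): one common way to obtain constant-time concatenation is to record it as a tree node and flatten only on access. The plain OCaml sketch below shows that idea; it is an assumption about the general technique, not a description of OCamlDuce's actual representation, and the names `seq`, `append` and `to_list` are hypothetical.

```ocaml
(* A sequence whose concatenations are recorded, not performed. *)
type 'a seq =
  | Empty
  | Single of 'a
  | Concat of 'a seq * 'a seq   (* unevaluated concatenation *)

(* Constant time: nothing is copied. *)
let append (a : 'a seq) (b : 'a seq) : 'a seq = Concat (a, b)

(* Flatten on access, in one pass. An explicit work stack keeps the
   traversal tail-recursive; nodes are visited right to left so that
   consing onto [acc] yields the elements in left-to-right order. *)
let to_list (s : 'a seq) : 'a list =
  let rec go acc stack =
    match stack with
    | [] -> acc
    | Empty :: rest -> go acc rest
    | Single x :: rest -> go (x :: acc) rest
    | Concat (l, r) :: rest -> go acc (r :: l :: rest)
  in
  go [] [ s ]
```

Building a sequence by repeated `append s (Single x)` costs constant time per element, and the linear flattening cost is paid once when `to_list` is finally called.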
1088    
1089    </section>
1090    
1091    <section title="Pattern matching">
1092    
1093    <p>
1094    Another point which is worth knowing when programming in OCamlDuce
1095    is that patterns can be written in a declarative style without
1096    affecting performance. The compiler uses static type information
1097    about matched values to produce efficient code for pattern matching.
1098    To illustrate this, consider the following sample:
1099    </p>
1100    
1101    <sample><![CDATA[{{ON}}
1102    x.ml:
1103    
1104    type a = {{ <a>[ a* ] }}
1105    type b = {{ <b>[ b* ] }}
1106    
1107    let f : {{ a|b }} -> int = function {{ a }} -> 0 | {{ _ }} -> 1
1108    ]]></sample>
1109    
1110    <sample><![CDATA[{{ON}}
1111    y.ml:
1112    
1113    type a = {{ <a>[ a* ] }}
1114    type b = {{ <b>[ b* ] }}
1115    
1116    let f : {{ a|b }} -> int = function {{ <a>_ }} -> 0 | {{ _ }} -> 1
1117    ]]></sample>
1118    
1119    <p>
1120    The two functions have exactly the same semantics, but the first
1121    implementation is more declarative: it uses type checks to distinguish
1122    between <code>a</code> and <code>b</code> instead of saying
1123    <em>how</em> to distinguish between these two types. Imagine
1124    that the definitions of these types change to:
1125    </p>
1126    
1127    <sample><![CDATA[{{ON}}
1128    type a = {{ <x kind="a">[ a* ] }}
1129    type b = {{ <x kind="b">[ b* ] }}
1130    ]]></sample>
1131    
1132    <p>
1133    Then the first implementation still works as expected, but the
1134    second one needs to be rewritten.</p>
1135    
1136    <p>Now one might believe that the second implementation is more
1137    efficient because it tells the compiler to check only the root tag,
1138    whereas the first implementation would force
1139    the compiler to produce code to check that all tags in the tree
1140    are <code>a</code>s. But this is not what happens! Actually,
1141    you can check that the compiler will produce exactly the same code
1142    for both implementations. It considers the static type information
1143    about the argument of the pattern matching (here, the input type
1144    of the function), and computes an efficient way to evaluate
1145    patterns for the values of this type.
1146    </p>
1147    
1148    </section>
1149    
1150    <section title="The map iterator">
1151    
1152    <p>
1153    The <code>map ... with ...</code> iterator is implemented in a
1154    tail-recursive way. You can safely use it on very long sequences.
1155    </p>
1156    
1157    </section>
1158    
1159    </box>
1160    
1161    <box title="OCaml and OCamlDuce" link="ocaml">
1162    
1163    <p>
1164    Since the 3.08.4 release, OCamlDuce is binary compatible with the corresponding
1165    OCaml release. This means that OCamlDuce can use OCaml-generated
1166    <tt>.cmi</tt> files and that it produces an OCaml-compatible
1167    <tt>.cmi</tt> file if the interface does not use any x-type
1168    (this file is equal to what would have been obtained by using OCaml).
1169    </p>
1170    
1171    <p>
1172    It is thus possible to use existing libraries which were compiled for
1173    OCaml 3.08.4. It is also possible to use OCamlDuce to compile
1174    some modules and use them in an OCaml project provided their interface
1175    is pure OCaml.
1176    </p>
1177    
1178    
1179    </box>
1180    
1181    <box title="Code samples" link="code">
1182    
1183  <section title="Parsing XML files">  <section title="Parsing XML files">
1184    
# Line 989  Line 1235 
1235  <p>  <p>
1236  It is interesting to introduce errors in the parser  It is interesting to introduce errors in the parser
1237  <code>schema_loader.ml</code> or the printer  <code>schema_loader.ml</code> or the printer
1238  <code>dump_schema.ml</code> and see how the type system catch them.  <code>dump_schema.ml</code> and see how the type system catches them.
1239  </p>  </p>
1240    
1241  <note>  <note>
# Line 1001  Line 1247 
1247  <code>redefine</code> elements or substitution groups.  <code>redefine</code> elements or substitution groups.
1248  </note>  </note>
1249    
1250    <note>
1251    To compile the application with the provided Makefile,
1252    you must make the environment variable <code>OCAMLFIND_CONF</code>
1253    point to the <code>$GODI/etc/findlib-ocamlduce.conf</code> file.
1254    </note>
1255    
1256    </section>
1257    
1258    <section title="String regular expressions">
1259    
1260    <p>
1261    OCamlDuce supports regular expression types and patterns, not only
1262    for sequences of XML elements, but also for strings. The following
1263    example shows how to use regular expressions to split a string
1264    of the form <code>name1=val1,...,namen=valn</code> with
1265    <code>n>0</code> into
1266    a list of pairs <code>[ (name1,val1); ...; (namen,valn) ]</code>.
1267    The <code>*?</code> operator in regular expressions means ``ungreedy
1268    match'' (match the shortest possible subsequence). The last
1269    pattern describes precisely strings which are not matched by
1270    the other cases. It would be possible to replace it with
1271    the wildcard <code>_</code>.
1272    </p>
1273    
1274    <sample><![CDATA[{{ON}}
1275    let rec split (s : {{ String }}) =
1276      match s with
1277        | {{ [ n::_*? '=' v::_*? ',' rest::_* ] }} -> (n,v)::(split rest)
1278        | {{ [ n::_*? '=' v::_*? ] }} -> [ (n,v) ]
1279        | {{ Any - [ _* '=' _* ] }} -> failwith "split"
1280    ]]></sample>
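Editorial illustration (not part of the revision): for comparison, here is a sketch of the same splitting in plain OCaml with the standard library only. It assumes that the ungreedy operators select the first `=` and then the first `,` after it, which is how the sample above reads; the name `split_plain` is hypothetical.

```ocaml
(* Plain OCaml counterpart of the regular-expression version.
   Raises [Failure "split"] when the input contains no '='. *)
let rec split_plain (s : string) : (string * string) list =
  match String.index_opt s '=' with
  | None -> failwith "split"
  | Some i ->
    let name = String.sub s 0 i in
    let after = String.sub s (i + 1) (String.length s - i - 1) in
    (match String.index_opt after ',' with
     | None -> [ (name, after) ]   (* last pair: no ',' follows *)
     | Some j ->
       let value = String.sub after 0 j in
       let rest = String.sub after (j + 1) (String.length after - j - 1) in
       (name, value) :: split_plain rest)
```

The index arithmetic that the x-pattern version leaves to the compiler has to be written by hand here, and nothing checks statically that every shape of input is handled.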
1281    
1282  </section>  </section>
1283    
1284  </box>  </box>
1285    
1286    <box title="Applications in OCamlDuce" link="appli">
1287    
1288    <ul>
1289    <li><a
1290    href="http://anil.recoil.org/projects/review2atom.html">Review2Atom</a>
1291    by Anil Madhavapeddy: translates paper review files in XML format into
1292    an Atom feed suitable for aggregation.
1293    </li>
1294    </ul>
1295    
1296    </box>
1297    
1298    
1299  </page>  </page>

