Diff of /web/ocaml.xml

revision 1790 by abate, Tue Jul 10 19:23:06 2007 UTC revision 1801 by abate, Tue Jul 10 19:23:46 2007 UTC
# Line 32  Line 32 
32  <box title="Download and installation" link="install">  <box title="Download and installation" link="install">
33    
34  <p>  <p>
35  The build procedure for OCamlDuce is exactly the same as for OCaml:  Currently, OCamlDuce
36  <tt>configure, make world, make install</tt>. The names of the tools  is based on OCaml 3.08.4 and on a CVS snapshot
37  are unchanged: <tt>ocaml,ocamlc,ocamlopt</tt>. Currently, OCamlDuce  of CDuce (between 0.3.92 and the head).
 is based on CVS snapshots of OCaml (between 3.08.3 and the current  
 <tt>release308</tt> branch) and CDuce (between 0.3.91 and the head).  
38  </p>  </p>
39    
40  <ul>  <ul>
41  <li><a  <li><a
42  href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/cduce-ocaml-0.0.5.tar.gz">Compiler,  href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/ocamlduce-3.08.4pl1.tar.gz">Compiler,
43  version 0.0.5</a></li>  version 3.08.4, patch level 1</a></li>
 <!--<li><a  
 href="http://pauillac.inria.fr/~frisch/ocamlcduce/download/xml-support-0.0.4.tar.gz">Support  
 library, version 0.0.4</a></li>-->  
44  </ul>  </ul>
45    
46  <p>  <p>
47  GODI users can upgrade an existing installation by adding this  There are two different installation modes:
48    </p>
49    
50    <ul>
51    <li><b>Stand-alone mode</b>. OCamlDuce is used as a drop-in
52    replacement for OCaml. The build procedure is unchanged:
53    <tt>./configure &amp;&amp; make world &amp;&amp; make install</tt>.
54    The tools are named <tt>ocaml, ocamlc, ocamlopt</tt>, ...
55    The standard library is extended with the <tt>num</tt> library
56    and the <tt>Ocamlduce</tt> module.
57    </li>
58    
59    <li><b>Package mode</b>. OCamlDuce is installed on top of an existing
60    OCaml installation (whose version number must match), without touching
61    it. The build
62    procedure is: <tt>./configure &amp;&amp; make all &amp;&amp; make opt
63    &amp;&amp; make install</tt>.  The <tt>configure</tt> script should be called with
64    the same arguments as the ones used when you built OCaml. For instance,
65    the <tt>LIBDIR</tt> argument is used to find the OCaml standard library.
66    The tool names are changed to <tt>ocamlduce, ocamlducec,
67    ocamlduceopt</tt>, ...  They use the existing standard library.
68    In addition, a library <tt>ocamlduce.cma</tt> is built.
69    It depends on the <tt>nums.cma</tt> library. The <tt>install</tt>
70    target implements a <tt>Findlib</tt>-based installation. It registers
71    a package named <tt>ocamlduce</tt> and it puts the tools
72    in the package sub-directory (the <tt>BINDIR</tt> and <tt>LIBDIR</tt>
73    arguments to <tt>configure</tt> are not used). The toplevel
74    can be called by <tt>ocamlfind ocamlduce/ocamlduce -I `ocamlfind query ocamlduce`</tt>.
75    </li>
76    </ul>
77    
78    <p>
79    GODI users can choose either of these two modes.
80    In order to upgrade an existing installation so as to use
81    OCamlDuce in place of OCaml, they must add this
82  line to their <tt>etc/godi.conf</tt> file:  line to their <tt>etc/godi.conf</tt> file:
83  </p>  </p>
84  <sample>  <sample>
85  GODI_BUILD_SITES += http://pauillac.inria.fr/~frisch/ocamlcduce/godi  GODI_BUILD_SITES += http://pauillac.inria.fr/~frisch/ocamlcduce/godi
86  </sample>  </sample>
87  <p>  <p>
88  and by forcing a recompilation of the <tt>godi-ocaml-src</tt>  and force a recompilation of the <tt>godi-ocaml-src</tt>
89  and <tt>godi-ocaml</tt> packages. <!--They should also build  and <tt>godi-ocaml</tt> packages. The alternative is to install OCamlDuce
90  the <tt>godi-xml-support</tt> library.-->  as a GODI package over an existing installation. You don't need
91    to touch the <tt>etc/godi.conf</tt> file. The package
92    name is <tt>godi-ocamlduce</tt>. In order to use the new compilers
93    and tools, you can make the environment variable
94    <tt>OCAMLFIND_CONF</tt> point to the
95    <tt>$GODI/etc/findlib-ocamlduce.conf</tt> file and then
96    use e.g. <tt>ocamlfind ocamlc -package ocamlduce</tt>.
97  </p>  </p>
98    
 <!--  
 <p>  
 Some simple examples can be found <a -->  
 <!--href="http://pauillac.inria.fr/~frisch/ocamlcduce/tests/">here</a>.</p>  
 -->  
   
99  </box>  </box>
100    
101  <box title="Overview" link="overview">  <box title="Overview" link="overview">
102    
103  <p>  <p>
104    The goal of the OCamlDuce project is to extend the OCaml language with features
105    to make it easier to write safe and efficient complex applications
106    that need to deal with XML documents. In particular, it relies
107    on a notion of types and patterns to guarantee statically
108    that all the possible input documents are correctly processed, and
109    that only valid output documents are produced.
110    </p>
111    
112    <p>
113  In a nutshell, OCamlDuce extends OCaml with a new kind of values  In a nutshell, OCamlDuce extends OCaml with a new kind of values
114  (<em>x-values</em>) to represent XML documents, fragments, tags, Unicode  (<em>x-values</em>) to represent XML documents, fragments, tags, Unicode
115  strings. In order to describe these values, it also extends the type algebra  strings. In order to describe these values, it also extends the type algebra
# Line 488  Line 526 
526     function {{p1}} -> e1 | ... | {{pn}} -> en     function {{p1}} -> e1 | ... | {{pn}} -> en
527  </p>  </p>
528    
529    <p>
530    Pattern matching follows a first-match policy. The first pattern
531    that succeeds triggers the corresponding branch.
532    </p>
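Ordinary OCaml matching follows the same policy. Here is a small sketch in plain OCaml (not x-patterns; the name `which_branch` is made up for illustration) of what first-match means when two patterns overlap:

```ocaml
(* Hypothetical example in plain OCaml, not OCamlDuce x-patterns. *)
let which_branch pair =
  match pair with
  | (0, _) -> "first"    (* tried first *)
  | (_, 0) -> "second"   (* only reached when the first pattern fails *)
  | _ -> "none"

let () =
  (* (0, 0) is accepted by both of the first two patterns; the first wins. *)
  assert (which_branch (0, 0) = "first");
  assert (which_branch (1, 0) = "second")
```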
533    
534  <note>  <note>
535  currently it is impossible to mix normal OCaml patterns and x-patterns  currently it is impossible to mix normal OCaml patterns and x-patterns
536  in a single pattern matching.  in a single pattern matching.
# Line 615  Line 658 
658  </ul>  </ul>
659    
660  <p>  <p>
661  In record x-patterns, it is possible to omit the <code>=p</code> part of a field.  Here is a brief description of the semantics of patterns. Given
662  The content is then replaced with the label name considered as  an input value, a pattern can either succeed or fail. If it succeeds,
663  a capture variable. E.g.  <code>{ x y=p }</code> is equivalent to  it also produces a binding from the capture variables in the pattern
664  <code>{ x=x y=p }</code>.</p>  to x-values.
665    </p>
666    
667    <ul>
668    
669    <li>A pattern which is just a type (no capture variable) succeeds if
670    and only if the value has the type.</li>
671    
672    <li>A pattern <code>p1 | p2</code> succeeds if either <code>p1</code>
673    or <code>p2</code> succeeds, and returns the corresponding binding; if
674    both patterns succeed, <code>p1</code> wins. It is required that
675    <code>p1</code> and <code>p2</code> have the same sets of capture
676    variables. </li>
677    
678    <li>A pattern <code>p1 &amp; p2</code> succeeds if both <code>p1</code>
679    and <code>p2</code> succeed, and returns the concatenation of the two
680    bindings. It is required that <code>p1</code> and <code>p2</code> have
681    <em>disjoint</em> sets of capture variables. </li>
682    
683    </ul>
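The three rules above can be sketched as a toy interpreter in plain OCaml. This is only a model under stated assumptions, not how OCamlDuce compiles patterns: values are reduced to integers and pairs, a type is modelled as a predicate, a capture variable is assumed to accept any value, and the side conditions on capture-variable sets are not checked. All names (`value`, `pattern`, `matches`, `is_int`) are hypothetical.

```ocaml
(* Toy model of the pattern semantics; every name here is illustrative. *)
type value = Int of int | Pair of value * value

type pattern =
  | Type of (value -> bool)     (* a type test, no capture variable *)
  | Capture of string           (* assumed to bind any value to a variable *)
  | Or of pattern * pattern     (* alternative, left operand has priority *)
  | And of pattern * pattern    (* conjunction, both operands must succeed *)

(* [matches p v] is [Some bindings] on success and [None] on failure. *)
let rec matches p v =
  match p with
  | Type has_type -> if has_type v then Some [] else None
  | Capture x -> Some [ (x, v) ]
  | Or (p1, p2) ->
    (match matches p1 v with
     | Some b -> Some b                    (* p1 wins when both succeed *)
     | None -> matches p2 v)
  | And (p1, p2) ->
    (match matches p1 v, matches p2 v with
     | Some b1, Some b2 -> Some (b1 @ b2)  (* concatenate the two bindings *)
     | _ -> None)

let is_int = function Int _ -> true | Pair _ -> false

let () =
  (* Both branches of the alternative bind the same variable set {n}. *)
  let p = Or (And (Type is_int, Capture "n"), Capture "n") in
  assert (matches p (Int 3) = Some [ ("n", Int 3) ]);
  assert (matches (Type is_int) (Pair (Int 1, Int 2)) = None)
```

Returning an option of an association list keeps the "succeed or fail, and produce a binding on success" reading of the text visible in the type of `matches`.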
684    
685    <p>
686    In record x-patterns, it is possible to omit the <code>=p</code> part
687    of a field.  The content is then replaced with the label name
688    considered as a capture variable. E.g.  <code>{ x y=p }</code> is
689    equivalent to <code>{ x=x y=p }</code>.</p>
690    
691  <p>It is also possible to add an "else" clause:  <p>It is also possible to add an "else" clause:
692  <code>{ x = (a,_)|(a:=3) }</code>  <code>{ x = (a,_)|(a:=3) }</code>
# Line 636  Line 704 
704  repetition) in a regexp, it is bound to the concatenation of all  repetition) in a regexp, it is bound to the concatenation of all
705  matched subsequences. E.g.: <code>[ (x::Int | _)* ]</code> will  matched subsequences. E.g.: <code>[ (x::Int | _)* ]</code> will
706  collect in <code>x</code> all the elements of type <code>Int</code> from  collect in <code>x</code> all the elements of type <code>Int</code> from
707  a sequence.</p>  a sequence. It is not legal to have repeated simple capture variables.
708    </p>
709    
710  <p>  <p>
711  The regexp operators <code>+,*,?</code> are greedy by default (they match as long  The regexp operators <code>+,*,?</code> are greedy by default (they match as long
# Line 931  Line 1000 
1000    
1001  </box>  </box>
1002    
1003  <box title="Code samples" link="code">  <box title="Marshaling" link="marshal">
1004    
1005    <p>
1006    OCamlDuce uses some tricks in its internal representation of x-values
1007    to reduce memory usage and improve performance. You need to pay
1008    special attention if you want to use OCaml serialization functions
1009    (module <code>Marshal</code>, functions
1010    <code>input_value/output_value</code>) on x-values. In addition to
1011    your values, you also need to save and restore some piece of internal data
1012    using the functions <code>Cduce_types.Value.extract_all</code> and
1013    <code>Cduce_types.Value.intract_all</code>. Of course, this also
1014    applies if the value to be serialized contains deeply nested x-values.
1015    </p>
1016    
1017    <p>
1018    Here are generic
1019    serialization/deserialization functions that illustrate how to do it:
1020    </p>
1021    
1022    <sample>
1023    let my_output_value oc v =
1024      let p = Cduce_types.Value.extract_all () in
1025      output_value oc (p,v)
1026    
1027    let my_input_value ic =
1028      let (p,v) = input_value ic in
1029      Cduce_types.Value.intract_all p;
1030      v
1031    </sample>
1032    
1033    </box>
1034    
1035    <box title="Performance" link="perf">
1036    
1037    <section title="Strings">
1038    
1039    <p>
1040    OCaml users might be surprised by the fact that x-strings are simply
1041    represented as sequences in OCamlDuce. Does this mean that they are
1042    actually stored in memory as linked lists? Certainly not!  The internal
1043    representation of sequence values uses several tricks to improve
1044    performance and memory usage. In particular, a special form in the
1045    representation can store strings as byte buffers, as in OCaml.
1046    If an XML document is loaded, or if a Caml string is converted
1047    to an x-value, this compact representation will be used.
1048    </p>
1049    
1050    </section>
1051    
1052    <section title="Concatenation">
1053    
1054    <p>
1055    Similarly, OCaml users might be reluctant to use the sequence
1056    concatenation <code>@</code> on sequences. In OCaml, the complexity
1057    of this operator is linear in the size of its first argument (which
1058    needs to be copied). OCamlDuce uses a special form in its internal
1059    representation to store concatenation in a lazy way. The concatenation
1060    will really be computed only when the value is accessed. This means
1061    that it's perfectly ok to build a long sequence by adding
1062    new elements at the end one by one, as long as you don't
1063    simultaneously inspect the sequence.
1064    </p>
1065    
1066    </section>
1067    
1068    <section title="Pattern matching">
1069    
1070    <p>
1071    Another point which is worth knowing when programming in OCamlDuce
1072    is that patterns can be written in a declarative style without
1073    affecting performance. The compiler uses static type information
1074    about matched values to produce efficient code for pattern matching.
1075    To illustrate this, consider the following sample:
1076    </p>
1077    
1078    <sample><![CDATA[{{ON}}
1079    x.ml:
1080    
1081    type a = {{ <a>[ a* ] }}
1082    type b = {{ <b>[ b* ] }}
1083    
1084    let f : {{ a|b }} -> int = function {{ a }} -> 0 | {{ _ }} -> 1
1085    ]]></sample>
1086    
1087    <sample><![CDATA[{{ON}}
1088    y.ml:
1089    
1090    type a = {{ <a>[ a* ] }}
1091    type b = {{ <b>[ b* ] }}
1092    
1093    let f : {{ a|b }} -> int = function {{ <a>_ }} -> 0 | {{ _ }} -> 1
1094    ]]></sample>
1095    
1096    <p>
1097    The two functions have exactly the same semantics, but the first
1098    implementation is more declarative: it uses type checks to distinguish
1099    between <code>a</code> and <code>b</code> instead of saying
1100    <em>how</em> to distinguish between these two types. Imagine
1101    that the definition of these types changes to:
1102    </p>
1103    
1104    <sample><![CDATA[{{ON}}
1105    type a = {{ <x kind="a">[ a* ] }}
1106    type b = {{ <x kind="b">[ b* ] }}
1107    ]]></sample>
1108    
1109    <p>
1110    Then the first implementation still works as expected, but the
1111    second one needs to be rewritten.</p>
1112    
1113    <p>Now one might believe that the second implementation is more
1114    efficient because it tells the compiler to check only the root tag,
1115    whereas the first implementation would force
1116    the compiler to produce code to check that all tags in the tree
1117    are <code>a</code>s. But this is not what happens! Actually,
1118    you can check that the compiler will produce exactly the same code
1119    for both implementations. It considers the static type information
1120    about the argument of the pattern matching (here, the input type
1121    of the function), and computes an efficient way to evaluate
1122    patterns for the values of this type.
1123    </p>
1124    
1125    </section>
1126    
1127    <section title="The map iterator">
1128    
1129    <p>
1130    The <code>map ... with ...</code> iterator is implemented in a
1131    tail-recursive way. You can safely use it on very long sequences.
1132    </p>
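For readers unfamiliar with the term, the following sketch shows what a tail-recursive map looks like on ordinary OCaml lists. It illustrates the general technique only and is not the implementation of <code>map ... with ...</code>; the name `map_tail` is hypothetical.

```ocaml
(* Accumulator-based map on plain OCaml lists; illustrative only. *)
let map_tail f xs =
  let rec go acc = function
    | [] -> List.rev acc                  (* restore the original order *)
    | x :: rest -> go (f x :: acc) rest   (* tail call: constant stack use *)
  in
  go [] xs

let () =
  (* A long input does not grow the call stack. *)
  let long = List.init 1_000_000 (fun i -> i) in
  assert (List.length (map_tail succ long) = 1_000_000)
```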
1133    
1134    </section>
1135    
1136    </box>
1137    
1138    <box title="OCaml and OCamlDuce" link="ocaml">
1139    
1140    <p>
1141    Since the 3.08.4 release, OCamlDuce is binary compatible with the corresponding
1142    OCaml release. This means that OCamlDuce can use OCaml-generated
1143    <tt>.cmi</tt> files and that it produces an OCaml-compatible
1144    <tt>.cmi</tt> file if the interface does not use any x-type
1145    (this file is equal to what would have been obtained by using OCaml).
1146    </p>
1147    
1148    <p>
1149    It is thus possible to use existing libraries which were compiled for
1150    OCaml 3.08.4. It is also possible to use OCamlDuce to compile
1151    some modules and use them in an OCaml project provided their interface
1152    is pure OCaml.
1153    </p>
1154    
1155    
1156    </box>
1157    
1158    <box title="Code samples" link="code">
1159    
1160  <section title="Parsing XML files">  <section title="Parsing XML files">
1161    
# Line 989  Line 1212 
1212  <p>  <p>
1213  It is interesting to introduce errors in the parser  It is interesting to introduce errors in the parser
1214  <code>schema_loader.ml</code> or the printer  <code>schema_loader.ml</code> or the printer
1215  <code>dump_schema.ml</code> and see how the type system catch them.  <code>dump_schema.ml</code> and see how the type system catches them.
1216  </p>  </p>
1217    
1218  <note>  <note>
