| 497 |
function {{p1}} -> e1 | ... | {{pn}} -> en |
function {{p1}} -> e1 | ... | {{pn}} -> en |
| 498 |
</p> |
</p> |
| 499 |
|
|
| 500 |
|
<p> |
| 501 |
|
Pattern matching follows a first-match policy. The first pattern |
| 502 |
|
that succeeds triggers the corresponding branch. |
| 503 |
|
</p> |
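The first-match policy behaves like ordinary OCaml pattern matching. The following is a minimal sketch in plain OCaml (not OCamlDuce syntax); the function name `describe` is hypothetical and only illustrates that an earlier branch shadows a later one that would also match.

```ocaml
(* Plain OCaml illustration of a first-match policy.
   The value 0 satisfies both the first and the second branch,
   but only the first branch that succeeds is taken. *)
let describe = function
  | 0 -> "zero"
  | n when n mod 2 = 0 -> "even"
  | _ -> "other"
```

Reordering the first two branches would change the result for `0`, which is why branch order matters under this policy.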
| 504 |
|
|
| 505 |
<note> |
<note> |
| 506 |
Currently it is impossible to mix normal OCaml patterns and x-patterns |
Currently it is impossible to mix normal OCaml patterns and x-patterns |
| 507 |
in a single pattern matching. |
in a single pattern matching. |
| 629 |
</ul> |
</ul> |
| 630 |
|
|
| 631 |
<p> |
<p> |
| 632 |
In record x-patterns, it is possible to omit the <code>=p</code> part of a field. |
Here is a brief description of the semantics of patterns. Given |
| 633 |
The content is then replaced with the label name considered as |
an input value, a pattern can either succeed or fail. If it succeeds, |
| 634 |
a capture variable. E.g. <code>{ x y=p }</code> is equivalent to |
it also produces bindings from the capture variables in the pattern |
| 635 |
<code>{ x=x y=p }</code>.</p> |
to x-values. |
| 636 |
|
</p> |
| 637 |
|
|
| 638 |
|
<ul> |
| 639 |
|
|
| 640 |
|
<li>A pattern which is just a type (no capture variable) succeeds if |
| 641 |
|
and only if the value has the type.</li> |
| 642 |
|
|
| 643 |
|
<li>A pattern <code>p1 | p2</code> succeeds if either <code>p1</code> |
| 644 |
|
or <code>p2</code> succeeds, and returns the corresponding binding; if |
| 645 |
|
both patterns succeed, <code>p1</code> wins. It is required that |
| 646 |
|
<code>p1</code> and <code>p2</code> have the same sets of capture |
| 647 |
|
variables. </li> |
| 648 |
|
|
| 649 |
|
<li>A pattern <code>p1 & p2</code> succeeds if both <code>p1</code> |
| 650 |
|
and <code>p2</code> succeed, and returns the concatenation of the two |
| 651 |
|
bindings. It is required that <code>p1</code> and <code>p2</code> have |
| 652 |
|
<em>disjoint</em> sets of capture variables. </li> |
| 653 |
|
|
| 654 |
|
</ul> |
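The three rules above can be modelled by a small interpreter. This is a toy sketch in plain OCaml, not OCamlDuce's real implementation: the types `value`, `ty` and `pat` and the function `matches` are hypothetical, and the sketch assumes that a lone capture variable always succeeds and binds the whole value. It does not check the requirements on capture variable sets.

```ocaml
(* Toy model of the pattern semantics, in plain OCaml.
   All names here are hypothetical simplifications. *)
type value = Int of int | Str of string

type ty = TInt | TStr | TAny

type pat =
  | Type of ty         (* a bare type, no capture variable *)
  | Capture of string  (* assumed: always succeeds, binds the value *)
  | Or of pat * pat    (* p1 | p2 *)
  | And of pat * pat   (* p1 & p2 *)

let has_type v t =
  match (v, t) with
  | (_, TAny) | (Int _, TInt) | (Str _, TStr) -> true
  | _ -> false

(* [matches p v] is [Some bindings] on success and [None] on failure. *)
let rec matches p v =
  match p with
  | Type t -> if has_type v t then Some [] else None
  | Capture x -> Some [ (x, v) ]
  | Or (p1, p2) -> (
      match matches p1 v with
      | Some b -> Some b (* p1 wins when both succeed *)
      | None -> matches p2 v)
  | And (p1, p2) -> (
      match (matches p1 v, matches p2 v) with
      | Some b1, Some b2 -> Some (b1 @ b2) (* concatenate the bindings *)
      | _ -> None)
```

For instance, `matches (And (Type TInt, Capture "x")) (Int 3)` succeeds and binds `x`, while the same pattern applied to `Str "a"` fails because the type part fails.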
| 655 |
|
|
| 656 |
|
<p> |
| 657 |
|
In record x-patterns, it is possible to omit the <code>=p</code> part |
| 658 |
|
of a field. The content is then replaced with the label name |
| 659 |
|
considered as a capture variable. E.g. <code>{ x y=p }</code> is |
| 660 |
|
equivalent to <code>{ x=x y=p }</code>.</p> |
| 661 |
|
|
| 662 |
<p>It is also possible to add an "else" clause: |
<p>It is also possible to add an "else" clause: |
| 663 |
<code>{ x = (a,_)|(a:=3) }</code> |
<code>{ x = (a,_)|(a:=3) }</code> |
| 675 |
repetition) in a regexp, it is bound to the concatenation of all |
repetition) in a regexp, it is bound to the concatenation of all |
| 676 |
matched subsequences. E.g.: <code>[ (x::Int | _)* ]</code> will |
matched subsequences. E.g.: <code>[ (x::Int | _)* ]</code> will |
| 677 |
collect in <code>x</code> all the elements of type <code>Int</code> from |
collect in <code>x</code> all the elements of type <code>Int</code> from |
| 678 |
a sequence.</p> |
a sequence. It is not legal to have repeated simple capture variables. |
| 679 |
|
</p> |
| 680 |
|
|
| 681 |
<p> |
<p> |
| 682 |
The regexp operators <code>+,*,?</code> are greedy by default (they match as long |
The regexp operators <code>+,*,?</code> are greedy by default (they match as long |
| 971 |
|
|
| 972 |
</box> |
</box> |
| 973 |
|
|
| 974 |
<box title="Code samples" link="code"> |
<box title="Marshaling" link="marshal"> |
| 975 |
|
|
| 976 |
|
<p> |
| 977 |
|
OCamlDuce uses some tricks in its internal representation of x-values |
| 978 |
|
to reduce memory usage and improve performance. You need to pay |
| 979 |
|
special attention if you want to use OCaml serialization functions |
| 980 |
|
(module <code>Marshal</code>, functions |
| 981 |
|
<code>input_value/output_value</code>) on x-values. In addition to |
| 982 |
|
your values, you also need to save and restore some internal data |
| 983 |
|
using the functions <code>Cduce_types.Value.extract_all</code> and |
| 984 |
|
<code>Cduce_types.Value.intract_all</code>. Of course, this also |
| 985 |
|
applies if the value to be serialized contains deeply nested x-values. |
| 986 |
|
</p> |
| 987 |
|
|
| 988 |
|
<p> |
| 989 |
|
Here are generic |
| 990 |
|
serialization/deserialization functions that illustrate how to do it: |
| 991 |
|
</p> |
| 992 |
|
|
| 993 |
|
<sample> |
| 994 |
|
let my_output_value oc v = |
| 995 |
|
let p = Cduce_types.Value.extract_all () in |
| 996 |
|
output_value oc (p,v) |
| 997 |
|
|
| 998 |
|
let my_input_value ic = |
| 999 |
|
let (p,v) = input_value ic in |
| 1000 |
|
Cduce_types.Value.intract_all p; |
| 1001 |
|
v |
| 1002 |
|
</sample> |
| 1003 |
|
|
| 1004 |
|
</box> |
| 1005 |
|
|
| 1006 |
|
<box title="Performance" link="perf"> |
| 1007 |
|
|
| 1008 |
|
<section title="Strings"> |
| 1009 |
|
|
| 1010 |
|
<p> |
| 1011 |
|
OCaml users might be surprised by the fact that x-strings are simply |
| 1012 |
|
represented as sequences in OCamlDuce. Does this mean that they are |
| 1013 |
|
actually stored in memory as linked lists? Certainly not! The internal |
| 1014 |
|
representation of sequence values uses several tricks to improve |
| 1015 |
|
performance and memory usage. In particular, a special form in the |
| 1016 |
|
representation can store strings as byte buffers, as in OCaml. |
| 1017 |
|
If an XML document is loaded, or if a Caml string is converted |
| 1018 |
|
to an x-value, this compact representation will be used. |
| 1019 |
|
</p> |
| 1020 |
|
|
| 1021 |
|
</section> |
| 1022 |
|
|
| 1023 |
|
<section title="Concatenation"> |
| 1024 |
|
|
| 1025 |
|
<p> |
| 1026 |
|
Similarly, OCaml users might be reluctant to use the sequence |
| 1027 |
|
concatenation <code>@</code> on sequences. In OCaml, the complexity |
| 1028 |
|
of this operator is linear in the size of its first argument (which |
| 1029 |
|
needs to be copied). OCamlDuce uses a special form in its internal |
| 1030 |
|
representation to store concatenation in a lazy way. The concatenation |
| 1031 |
|
will actually be computed only when the value is accessed. This means |
| 1032 |
|
that it's perfectly ok to build a long sequence by adding |
| 1033 |
|
new elements at the end one by one, as long as you don't |
| 1034 |
|
simultaneously inspect the sequence. |
| 1035 |
|
</p> |
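The idea of lazy concatenation can be sketched in plain OCaml. This is only an illustration under stated assumptions, not OCamlDuce's actual representation: the type `seq` and the functions `append`, `to_list` and `build` are hypothetical, and unlike a real implementation this sketch does not cache the flattened result.

```ocaml
(* A sequence is either a plain list or a suspended concatenation. *)
type 'a seq =
  | Flat of 'a list
  | Concat of 'a seq * 'a seq

(* Constant time: only records that a concatenation was requested. *)
let append s1 s2 = Concat (s1, s2)

(* Flatten on access, in one pass. An explicit work list keeps the
   traversal tail-recursive even for deeply nested concatenations. *)
let to_list s =
  let rec go acc = function
    | [] -> acc
    | Flat l :: rest -> go (List.rev_append l acc) rest
    | Concat (a, b) :: rest -> go acc (a :: b :: rest)
  in
  List.rev (go [] [ s ])

(* Build 1..n by adding elements at the end one by one. *)
let build n =
  let rec loop s i =
    if i > n then s else loop (append s (Flat [ i ])) (i + 1)
  in
  loop (Flat []) 1
```

Each `append` costs constant time, so building the sequence is linear overall; the cost of flattening is paid once, when `to_list` inspects the value.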
| 1036 |
|
|
| 1037 |
|
</section> |
| 1038 |
|
|
| 1039 |
|
<section title="Pattern matching"> |
| 1040 |
|
|
| 1041 |
|
<p> |
| 1042 |
|
Another point which is worth knowing when programming in OCamlDuce |
| 1043 |
|
is that patterns can be written in a declarative style without |
| 1044 |
|
affecting performance. The compiler uses static type information |
| 1045 |
|
about matched values to produce efficient code for pattern matching. |
| 1046 |
|
To illustrate this, consider the following sample: |
| 1047 |
|
</p> |
| 1048 |
|
|
| 1049 |
|
<sample><![CDATA[{{ON}} |
| 1050 |
|
x.ml: |
| 1051 |
|
|
| 1052 |
|
type a = {{ <a>[ a* ] }} |
| 1053 |
|
type b = {{ <b>[ b* ] }} |
| 1054 |
|
|
| 1055 |
|
let f : {{ a|b }} -> int = function {{ a }} -> 0 | {{ _ }} -> 1 |
| 1056 |
|
]]></sample> |
| 1057 |
|
|
| 1058 |
|
<sample><![CDATA[{{ON}} |
| 1059 |
|
y.ml: |
| 1060 |
|
|
| 1061 |
|
type a = {{ <a>[ a* ] }} |
| 1062 |
|
type b = {{ <b>[ b* ] }} |
| 1063 |
|
|
| 1064 |
|
let f : {{ a|b }} -> int = function {{ <a>_ }} -> 0 | {{ _ }} -> 1 |
| 1065 |
|
]]></sample> |
| 1066 |
|
|
| 1067 |
|
<p> |
| 1068 |
|
The two functions have exactly the same semantics, but the first |
| 1069 |
|
implementation is more declarative: it uses type checks to distinguish |
| 1070 |
|
between <code>a</code> and <code>b</code> instead of saying |
| 1071 |
|
<em>how</em> to distinguish between these two types. Imagine |
| 1072 |
|
that the definitions of these types change to: |
| 1073 |
|
</p> |
| 1074 |
|
|
| 1075 |
|
<sample><![CDATA[{{ON}} |
| 1076 |
|
type a = {{ <x kind="a">[ a* ] }} |
| 1077 |
|
type b = {{ <x kind="b">[ b* ] }} |
| 1078 |
|
]]></sample> |
| 1079 |
|
|
| 1080 |
|
<p> |
| 1081 |
|
Then the first implementation still works as expected, but the |
| 1082 |
|
second one needs to be rewritten.</p> |
| 1083 |
|
|
| 1084 |
|
<p>Now one might believe that the second implementation is more |
| 1085 |
|
efficient because it tells the compiler to check only the root tag, |
| 1086 |
|
whereas the first implementation would force |
| 1087 |
|
the compiler to produce code to check that all tags in the tree |
| 1088 |
|
are <code>a</code>s. But this is not what happens! Actually, |
| 1089 |
|
you can check that the compiler will produce exactly the same code |
| 1090 |
|
for both implementations. It considers the static type information |
| 1091 |
|
about the argument of the pattern matching (here, the input type |
| 1092 |
|
of the function), and computes an efficient way to evaluate |
| 1093 |
|
patterns for the values of this type. |
| 1094 |
|
</p> |
| 1095 |
|
|
| 1096 |
|
</section> |
| 1097 |
|
|
| 1098 |
|
<section title="The map iterator"> |
| 1099 |
|
|
| 1100 |
|
<p> |
| 1101 |
|
The <code>map ... with ...</code> iterator is implemented in a |
| 1102 |
|
tail-recursive way. You can safely use it on very long sequences. |
| 1103 |
|
</p> |
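For comparison, here is what a tail-recursive map looks like on ordinary OCaml lists. This is a generic sketch of the technique, not the code OCamlDuce uses for `map ... with ...`; the name `map_tr` is hypothetical.

```ocaml
(* Tail-recursive map on plain OCaml lists:
   accumulate results in reverse order, then reverse once at the end.
   The recursive call is in tail position, so stack usage is constant. *)
let map_tr f l =
  let rec go acc = function
    | [] -> List.rev acc
    | x :: rest -> go (f x :: acc) rest
  in
  go [] l
```

Because the stack does not grow with the input, a function written this way can process lists with millions of elements.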
| 1104 |
|
|
| 1105 |
|
</section> |
| 1106 |
|
|
| 1107 |
|
</box> |
| 1108 |
|
|
| 1109 |
|
<box title="Code samples" link="code"> |
| 1110 |
|
|
| 1111 |
<section title="Parsing XML files"> |
<section title="Parsing XML files"> |
| 1112 |
|
|
| 1163 |
<p> |
<p> |
| 1164 |
It is interesting to introduce errors in the parser |
It is interesting to introduce errors in the parser |
| 1165 |
<code>schema_loader.ml</code> or the printer |
<code>schema_loader.ml</code> or the printer |
| 1166 |
<code>dump_schema.ml</code> and see how the type system catch them. |
<code>dump_schema.ml</code> and see how the type system catches them. |
| 1167 |
</p> |
</p> |
| 1168 |
|
|
| 1169 |
<note> |
<note> |