<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE page [
<!ENTITY larr "&#8592;"> <!-- leftwards arrow, U+2190 ISOnum -->
<!ENTITY uarr "&#8593;"> <!-- upwards arrow, U+2191 ISOnum -->
<!ENTITY rarr "&#8594;"> <!-- rightwards arrow, U+2192 ISOnum -->
<!ENTITY darr "&#8595;"> <!-- downwards arrow, U+2193 ISOnum -->
]>
<page name="manual_schema">

<title>XML Schema</title>

<box title="Overview" link="overview">
<p>
CDuce partially supports the <a href="http://www.w3.org/XML/Schema">XML
Schema</a> Recommendations (<a
href="http://www.w3.org/TR/xmlschema-0/">Primer</a>, <a
href="http://www.w3.org/TR/xmlschema-1/">Structures</a>, <a
href="http://www.w3.org/TR/xmlschema-2/">Datatypes</a>). Using this
feature, it is possible to manipulate XML documents whose leaves are typed
values such as integers, dates, binary data, and so on.
</p>
<p>
CDuce supports XML Schema by implementing the following features:
</p>
<ul>
<li>
<a href="#import">XML Schema components import</a>
</li>
<li>
<a href="#validation">XML Schema validation</a>
</li>
<li>
<a href="#print_xml">XML Schema instances output</a>
</li>
</ul>
<p>
This manual page describes how to use these features in CDuce. All the
documents used in the examples are available in the manual section <local
href="manual_schema_samples">XML Schema sample documents</local>.
</p>

<note>
XML Schema support does not currently interact well with
separate compilation. When a CDuce unit <code>{{script}}.cd</code>
which uses an XML Schema
is compiled, the resulting <code>{{script}}.cdo</code> object
refers to the XML Schema by name. Consequently, when such a unit
is run, the XML Schema must still be available from the current
directory and must not have been changed since compilation.
</note>

</box>

<box title="XML Schema components (micro) introduction" link="primer">
<p>
An XML Schema document can define five different kinds of components, each
of which can be imported into CDuce and used as a CDuce type:
</p>
<ul>
<li>
<b>Type definitions</b><br />
A type definition defines either a simple type or a complex type. The
former can be used to type the string content of an element more
precisely: you can think of it as a refinement of #PCDATA. XML Schema
provides a set of <a
href="http://www.w3.org/TR/xmlschema-2/#built-in-datatypes">predefined
simple types</a> and a way to define new simple types. The latter can
be used to constrain the content model and the attributes of an XML
element. An XML Schema complex type is strictly more expressive than a DTD
element declaration.
</li>
<li>
<b>Attribute declarations</b><br />
An attribute declaration links an attribute name to a simple type.
Optionally, it can constrain the set of possible values of the attribute
by mandating a fixed value, or it can provide a default value.
</li>
<li>
<b>Element declarations</b><br />
An element declaration links an element name to a type, either simple or
complex. Optionally, if the type is a simple type, it can constrain the
set of possible values of the element by mandating a fixed value, or it
can provide a default value.
</li>
<li>
<b>Attribute group definitions</b><br />
An attribute group definition links a set of attribute declarations to a
name which can be referenced from other XML Schema components.
</li>
<li>
<b>Model group definitions</b><br />
A model group definition links a name to a constraint over the complex
content of an XML element. The linked name can be referenced from other
XML Schema components.
</li>
</ul>
</box>

<box title="XML Schema components import" link="import">
<p>
In order to import XML Schema components in CDuce, you first need to tell
CDuce to import an XML Schema document. You can do this using the
<code>schema</code> keyword to bind an uppercase identifier to a local
schema document:
</p>
<sample>
# {{schema Mails = "tests/schema/mails.xsd"}};;
Registering schema type: Mails # attachmentType
Registering schema type: Mails # mimeTopLevelType
Registering schema type: Mails # mailType
Registering schema type: Mails # envelopeType
Registering schema type: Mails # mailsType
Registering schema type: Mails # bodyType
Registering schema attribute: Mails # name
Registering schema element: Mails # Date
Registering schema element: Mails # mails
Registering schema element: Mails # header
Registering schema attribute group: Mails # mimeTypeAttributes
Registering schema model group: Mails # attachmentContent
</sample>
<p>
The above declaration will (try to) import all schema components included in
the schema document <local href="manual_schema_samples">mails.xsd</local>
as CDuce types. You can reference them using the
<code>#</code> (sharp) operator.
</p>
<p>
XML Schema allows different kinds of components to share the same name: for
instance, a single schema document can contain both an element declaration
and an attribute declaration with the same name. When there is no
ambiguity, you can reference the CDuce type corresponding to a schema
component just by its name, using the following syntax:<br />
<tt>schema_ref ::= </tt>
<code>&lt;schema_name&gt; # &lt;component_name&gt;</code><br />
Otherwise, you can specify the kind of schema component as follows:<br />
<tt>|</tt> <code>&lt;schema_name&gt; # &lt;component_name&gt; as
&lt;component_kind&gt;</code><br /> where <code>component_kind</code> is one of:<br />
<tt>component_kind ::= </tt>
<code>element | type | attribute | attribute_group | model_group</code>
<br />
</p>
<p>
The result of a schema component reference is an ordinary CDuce type which
you can use as usual in function definitions, pattern matching and so on.
</p>
<sample>
let is_valid_mail (Any -> Bool)
| {{Mails # mailType}} -> `true
| _ -> `false
</sample>
<p>
<em>
Please note the spaces surrounding the sharp character: they are
required, otherwise the lexer would interpret <code>#mailType</code> as a
(nonexistent) toplevel directive.
</em>
</p>
</box>
<box>
<p>
<em>
<b>Correctness remark:</b> while parsing XML Schema documents, CDuce
assumes that they are correct with respect to the XML Schema
Recommendations. At the very least, they must be valid with respect to the <a
href="http://www.w3.org/TR/xmlschema-1/#normative-schemaSchema">XML
Schema for Schemas</a>. You should check the validity of your schemas
before importing them into CDuce; otherwise, unexpected behaviour is
likely.
</em>
</p>
</box>

<box title="Toplevel directives" link="directives">
<p>
The toplevel directive <code>#env</code> supports schemas: it lists the
currently defined schemas.
</p>
<sample>
# #env;;
Types: Empty Any Int Char Byte Atom Pair Arrow Record String Latin1 Bool
Namespace prefixes:
=>""
xml=>"http://www.w3.org/XML/1998/namespace"
Namespace prefixes used for pretty-printing:
{{Schemas: Mails}}
Values:
val argv : [ String* ] = ""
</sample>
<p>
The toplevel directive <code>#print_type</code> supports schemas too: it can
be used to print the types corresponding to schema components, using the
usual sharp syntax.
</p>
<sample>
# #print_type {{Mails # bodyType}};;
[ Char ]
</sample>
<p>
The toplevel directive <code>#print_schema</code> is not very
user-friendly (because it exposes some internal representation details), but
it can be used to show the schema components contained in a given schema.
</p>
<sample><![CDATA[
# #print_schema Mails;;
Types: C:10:attachmentType S:mimeTopLevelType' C:12:mailType C:4:envelopeType
C:14:mailsType S:bodyType'
Attributes: @name:xsd:string
Elements: E:18:<Date> E:15:<mails> E:17:<header>
Attribute groups: {agroup:mimeTypeAttributes}
Model groups: {mgroup:attachmentContent}
]]></sample>
<p>
For more information, have a look at the manual section about <local
href="manual_interpreter">toplevel directives</local>.
</p>
</box>

<box title="XML Schema &rarr; CDuce mapping" link="mapping">
<ul>
<li>
<p>
XML Schema <b>predefined simple types</b> are mapped to CDuce types
directly in the CDuce implementation, preserving XML Schema constraints as
much as possible. The table below lists the most significant mappings.
</p>
<table border="1">
<tr>
<td><b>XML Schema predefined simple type</b></td>
<td><b>CDuce type</b></td>
</tr>
<tr>
<td>
<code>duration</code>, <code>dateTime</code>, <code>time</code>,
<code>date</code>, <code>gYear</code>, <code>gMonth</code>, ...
</td>
<td>
closed record types with some of the following fields (depending on
the Schema type): <code>positive</code>, <code>year</code>, <code>month</code>,
<code>day</code>, <code>hour</code>, <code>minute</code>,
<code>second</code>, <code>timezone</code>
</td>
</tr>
<tr><td><code>boolean</code></td><td><code>Bool</code></td></tr>
<tr>
<td>
<code>anySimpleType</code>, <code>string</code>,
<code>base64Binary</code>, <code>hexBinary</code>,
<code>anyURI</code>
</td>
<td><code>String</code></td>
</tr>
<tr><td><code>integer</code></td><td><code>Int</code></td></tr>
<tr>
<td>
<code>nonPositiveInteger</code>, <code>negativeInteger</code>,
<code>nonNegativeInteger</code>, <code>positiveInteger</code>,
<code>long</code>, <code>int</code>, <code>short</code>,
<code>byte</code>
</td>
<td>integer intervals with the appropriate limits</td>
</tr>
<tr>
<td> <code>string</code>, <code>normalizedString</code>, and the other
types derived (directly or indirectly) by restriction from <code>string</code>
</td>
<td><code>String</code></td>
</tr>
<tr>
<td>
<code>NMTOKENS</code>, <code>IDREFS</code>, <code>ENTITIES</code>
</td>
<td>
<code>String</code> lists (i.e. the Kleene star of a <code>String</code>
type)
</td>
</tr>
<tr>
<td>
(<b>Not properly supported</b>)<br /> <code>decimal</code>,
<code>float</code>, <code>double</code>, <code>NOTATION</code>,
<code>QName</code>
</td>
<td>
<code>String</code>
</td>
</tr>
</table>
<p>
<b>Simple type definitions</b> are built from the above types following
the XML Schema derivation rules.
</p>
</li>
<li>
<p>
XML Schema <b>complex type definitions</b> are mapped to CDuce types
representing XML elements which can have any tag, but whose attributes
and content are constrained to be valid with respect to the original
complex type.
</p>
<p>
As an example, the following XML Schema complex type (a simplified
version of the <code>envelopeType</code> defined in <local
href="manual_schema_samples">mails.xsd</local>):
</p>
<sample><![CDATA[
<xsd:complexType name="envelopeType">
  <xsd:sequence>
    <xsd:element name="From" type="xsd:string"/>
    <xsd:element name="To" type="xsd:string"/>
    <xsd:element name="Date" type="xsd:dateTime"/>
    <xsd:element name="Subject" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
]]></sample>
<p>
will be mapped to a CDuce XML type with any tag, no attributes and exactly
four children: <tt>From</tt>, <tt>To</tt>, <tt>Date</tt> and
<tt>Subject</tt>. Among them, the <tt>Date</tt> child must be an XML
element containing a record which represents the <tt>dateTime</tt> Schema
type.
</p>
<sample><![CDATA[
# #print_type Mails # envelopeType;;
<(Any) {| |}>[
<From {| |}>String
<To {| |}>String
<Date {| |}>{
positive = Bool;
year = Int; month = Int; day = Int;
hour = Int; minute = Int; second = Int;
timezone =? { positive = Bool; hour = Int; minute = Int }
}
<Subject {| |}>String
]
]]></sample>
</li>
<li>
<p>
XML Schema <b>attribute declarations</b> are converted to closed record
types with exactly one required field corresponding to the declared
attribute.
</p>
<sample>
# #print_type Mails # name;;
{| {{name = String}} |}
</sample>
</li>
<li>
<p>
XML Schema <b>element declarations</b> can bind an XML element either
to a complex type or to a simple type. In the former case, the conversion
is almost identical to the one described above for complex types. The
only difference is that the element's tag must now match the name of
the XML element in the schema element declaration, whereas for a complex
type it was <code>Any</code>.
</p>
<p>
In the latter case (element with simple type content), the corresponding
CDuce type is an element type. Its tag must match the name of
the XML element in the schema element declaration; its content type is
the CDuce translation of the simple type provided in the element
declaration.
</p>
<p>
For example, the following XML Schema element declaration (corresponding
to the <code>header</code> element defined in <local
href="manual_schema_samples">mails.xsd</local>):
</p>
<sample><![CDATA[
<xsd:element name="header">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute ref="name" use="required" />
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
]]></sample>
<p>
will be translated to the following CDuce type:
</p>
<sample><![CDATA[
# #print_type Mails # header;;
<header {| name = String |}>String
]]></sample>
<p>
Note that the type of the element content <em>is not a sequence</em>
unless the translation of the XML Schema type is itself a sequence (as
in the example above, where <code>String</code> is a sequence of
characters). Compare it with the following example, where the element
content is not a sequence, but a single record:
</p>
<sample><![CDATA[
# #print_type Mails # Date;;
<Date {| |}>{
positive = Bool;
year = Int; month = Int; day = Int; hour = Int;
minute = Int; second = Int;
timezone =? { positive = Bool; hour = Int; minute = Int }
}
]]></sample>
</li>
<li>
<p>
XML Schema <b>attribute group definitions</b> are mapped to record types
containing one field for each attribute declaration contained in the
group. <tt>use</tt> constraints are respected: optional attributes are
mapped to optional fields, required attributes to required fields.
</p>
<p>
The following XML Schema attribute group declaration:
</p>
<sample><![CDATA[
<xsd:attributeGroup name="mimeTypeAttributes">
  <xsd:attribute name="type" type="mimeTopLevelType" use="required" />
  <xsd:attribute name="subtype" type="xsd:string" use="required" />
</xsd:attributeGroup>
]]></sample>
<p>
will thus be mapped to the following CDuce type:
</p>
<sample>
# #print_type Mails # mimeTypeAttributes;;
{| type = [
'image' | 'text' | 'application' | 'audio' | 'message' | 'multipart' | 'video'
];
subtype = String |}
</sample>
</li>
<li>
<p>
XML Schema <b>model group definitions</b> are mapped to CDuce sequence
types. <tt>minOccurs</tt> and <tt>maxOccurs</tt> constraints are
respected, using CDuce recursive types to represent <tt>unbounded</tt>
repetition (i.e. Kleene star).
</p>
<p>
<tt>all</tt> constraints, also known as <em>interleaving
constraints</em>, cannot be expressed in the CDuce type system without
an explosion of type sizes. Therefore, such content models are normalized
and treated as sequence types by the type system.
</p>
<p>
For a similar reason, CDuce does not support <tt>mixed</tt> content
models either.
</p>
<p>
As an example, the following XML Schema model group definition:
</p>
<sample><![CDATA[
<xsd:group name="attachmentContent">
  <xsd:sequence>
    <xsd:element name="mimetype">
      <xsd:complexType>
        <xsd:attributeGroup ref="mimeTypeAttributes" />
      </xsd:complexType>
    </xsd:element>
    <xsd:element name="content" type="xsd:string" minOccurs="0" />
  </xsd:sequence>
</xsd:group>
]]></sample>
<p>
will be mapped to the following CDuce type:
</p>
<sample><![CDATA[
# #print_type Mails # attachmentContent;;
[ X1 <content {| |}>String | X1 ] where
X1 = <mimetype {| type = [ ... ]; subtype = String |}>[ ]
]]></sample>
</li>
</ul>
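<p>
The record types used for dates can be illustrated with a small Python
sketch (not CDuce code, and not how CDuce implements it) that parses the
lexical form of <code>xsd:dateTime</code> into a dictionary with the same
fields as the CDuce record printed for <code>Date</code>. The function names
<code>parse_datetime</code> and <code>parse_timezone</code> are hypothetical.
The sketch assumes that <code>positive</code> is the sign of the year, does
not check field ranges, and does not accept fractional seconds.
</p>
<sample><![CDATA[
# Hypothetical sketch: map the lexical form of xsd:dateTime to a record
# shaped like the CDuce Date record (positive, year, ..., timezone).
# Assumptions: "positive" is the sign of the year; fractional seconds and
# range checks (month 1-12, etc.) are left out.
import re

DATETIME_RE = re.compile(
    r"(-)?([0-9]{4,})-([0-9]{2})-([0-9]{2})"  # [-]YYYY-MM-DD
    r"T([0-9]{2}):([0-9]{2}):([0-9]{2})"      # Thh:mm:ss
    r"(Z|[+-][0-9]{2}:[0-9]{2})?"             # optional timezone
)


def parse_timezone(tz):
    """Map 'Z', '+hh:mm' or '-hh:mm' to a timezone record."""
    if tz == "Z":
        return {"positive": True, "hour": 0, "minute": 0}
    return {"positive": tz[0] == "+", "hour": int(tz[1:3]), "minute": int(tz[4:6])}


def parse_datetime(pcdata):
    """Parse a dateTime PCDATA string into a dict of record fields."""
    # whiteSpace is "collapse": surrounding whitespace is dropped, and any
    # inner whitespace would make the value invalid anyway.
    text = pcdata.strip(" \t\n\r")
    match = DATETIME_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"{pcdata!r} is not a valid dateTime")
    sign, year, month, day, hour, minute, second, tz = match.groups()
    record = {
        "positive": sign is None,
        "year": int(year), "month": int(month), "day": int(day),
        "hour": int(hour), "minute": int(minute), "second": int(second),
    }
    # The timezone field is optional, like "timezone =?" in the CDuce type.
    if tz is not None:
        record["timezone"] = parse_timezone(tz)
    return record


print(parse_datetime("2003-10-15T15:44:01Z"))
]]></sample>
<p>
For the input <code>2003-10-15T15:44:01Z</code>, the sketch yields the same
field values as the <code>Date</code> record that CDuce returns in the
validation examples of this page.
</p>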
</box>

<box title="XML Schema validation" link="validation">
<p>
The processes of XML Schema validation and assessment check that an XML
Schema instance document is valid with respect to an XML Schema document and
add missing information such as default values. CDuce's notion of Schema
validation is slightly different.
</p>
<p>
CDuce allows XML values to contain values of arbitrary types: for example,
you can have XML elements with integer attributes. Still, this feature is
rarely used, because the function that loads XML documents
(<code>load_xml</code>) returns XML values whose leaves are values of
type PCDATA.
</p>
<p>
Once you have imported an XML Schema in CDuce, you can use it to validate an
XML value returned by <code>load_xml</code> against an XML Schema component
defined in it. Validation essentially builds a CDuce value whose type is
the CDuce translation of the XML Schema type of the component used for
validation; the translation is the one described in the previous section.
Note that the input XML value does not need to come from
<code>load_xml</code>: it is enough that its leaves are PCDATA values.
</p>
<p>
During validation, PCDATA strings are parsed to build CDuce values
corresponding to XML Schema simple types, and whitespace is handled as
specified by the XML Schema <code>whiteSpace</code> facet. For example,
validating the <em>PCDATA string</em> <code>1234567890 </code> (note the
trailing space) against the <code>xsd:integer</code> simple type returns
the CDuce value <code>1234567890</code> of type <code>Int</code>.<br />
Default values for missing attributes or elements are also added where
specified.
</p>
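<p>
The whitespace handling and the integer intervals mentioned above can be
sketched in Python. This is a hypothetical model of the XML Schema rules,
not CDuce's implementation: the names <code>apply_whitespace</code>,
<code>validate_integer</code> and <code>INTEGER_BOUNDS</code> are made up for
illustration, while the three <code>whiteSpace</code> values
(<code>preserve</code>, <code>replace</code>, <code>collapse</code>) and the
integer limits come from the XML Schema Datatypes Recommendation.
</p>
<sample><![CDATA[
# Hypothetical sketch (not CDuce's implementation) of how a PCDATA string
# can be validated against the built-in integer types of XML Schema.
import re

# Limits of the built-in integer types; None stands for "unbounded".
INTEGER_BOUNDS = {
    "integer": (None, None),
    "nonPositiveInteger": (None, 0),
    "negativeInteger": (None, -1),
    "nonNegativeInteger": (0, None),
    "positiveInteger": (1, None),
    "long": (-2**63, 2**63 - 1),
    "int": (-2**31, 2**31 - 1),
    "short": (-2**15, 2**15 - 1),
    "byte": (-2**7, 2**7 - 1),
}


def apply_whitespace(text, facet):
    """Normalize text as prescribed by the whiteSpace facet value."""
    if facet == "preserve":
        return text
    # "replace": every tab, line feed and carriage return becomes a space.
    replaced = re.sub(r"[\t\n\r]", " ", text)
    if facet == "replace":
        return replaced
    if facet == "collapse":
        # Additionally squeeze runs of spaces and strip both ends.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace value: {facet!r}")


def validate_integer(pcdata, type_name="integer"):
    """Turn a PCDATA string into an int checked against type_name."""
    # All integer types have whiteSpace fixed to "collapse".
    text = apply_whitespace(pcdata, "collapse")
    if re.fullmatch(r"[+-]?[0-9]+", text) is None:
        raise ValueError(f"{pcdata!r} is not a valid {type_name}")
    value = int(text)
    low, high = INTEGER_BOUNDS[type_name]
    if low is not None and value < low:
        raise ValueError(f"{value} is below the minimum of {type_name}")
    if high is not None and value > high:
        raise ValueError(f"{value} is above the maximum of {type_name}")
    return value


print(validate_integer("1234567890 "))       # trailing space is collapsed
print(validate_integer("\t-128\n", "byte"))  # lower limit of byte
]]></sample>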
<p>
You can use the <code>validate</code> keyword to perform validation in CDuce
programs. The syntax is as follows:<br /> <code>validate &lt;expr&gt; with
&lt;schema_ref&gt;</code><br /> where <code>schema_ref</code> is defined as described
in <a href="#import">XML Schema components import</a>. The same ambiguity
rules apply here.
</p>
<p>
In more detail, validation can be applied to different kinds of CDuce values,
depending on the kind of Schema component used for validation.
</p>
<ul>
<li>
<p>
The typical use of validation is to validate against an <b>element
declaration</b>. In this case, <code>validate</code> should be applied to
an XML CDuce value, as in the following example:
</p>
<sample><![CDATA[
# let xml = <Date>"2003-10-15T15:44:01Z" in
validate xml with Mails # Date;;
- : <Date {| |}>{
positive = Bool;
year = Int; month = Int; day = Int;
hour = Int; minute = Int; second = Int;
timezone =? { positive = Bool; hour = Int; minute = Int }
}
=
<Date> {
positive=`true;
year=2003; month=10; day=15;
hour=15; minute=44; second=1;
timezone={ positive=`true; hour=0; minute=0 }
}
]]></sample>
<p>
The tag of the given element is checked for consistency with the
element declaration; attributes and content are checked against the
Schema type declared for the element.
</p>
</li>
<li>
<p>
Sometimes you may want to validate an element against an XML Schema
<b>complex type</b> without having to use element declarations. This
case is very similar to the previous one, except that the Schema
component used is a complex type definition, so such a validation can be
applied to any XML value. The other important difference is that the tag
of the given value is completely ignored.
</p>
<p>
As an example:
</p>
<sample><![CDATA[
# let xml = load_xml "envelope.xml" ;;
val xml : Any = <ignored_tag From="fake@microsoft.com">[
<From>[ 'user@unknown.domain.org' ]
<To>[ 'user@cduce.org' ]
<Date>[ '2003-10-15T15:44:01Z' ]
<Subject>[ 'I desperately need XML Schema support in CDuce' ]
<header name="Reply-To">[ 'bill@microsoft.com' ]
]
# validate xml with Mails # envelopeType;;
- : <(Any) {| From = String |}>[
<From {| |}>String <To {| |}>String
<Date {| |}>{
positive = Bool;
year = Int; month = Int; day = Int;
hour = Int; minute = Int; second = Int;
timezone =? { positive = Bool; hour = Int; minute = Int }
}
<Subject {| |}>String
<header {| name = String |}>[ String ]* ]
=
<ignored_tag From="fake@microsoft.com">[
<From>[ 'user@unknown.domain.org' ]
<To>[ 'user@cduce.org' ]
<Date> {
positive=`true;
year=2003; month=10; day=15;
hour=15; minute=44; second=1;
timezone={ positive=`true; hour=0; minute=0 }
}
<Subject>[ 'I desperately need XML Schema support in CDuce' ]
<header name="Reply-To">[ "bill@microsoft.com" ]
]
]]></sample>
</li>
<li>
<p>
Similarly, you may want to validate against a <b>model group</b>. In this
case, you validate CDuce sequences against the model group: the given
sequence is considered as the content of an XML element.
</p>
<p>
As an example:
</p>
<sample><![CDATA[
# let xml = load_xml "attachment.xml";;
val xml : Any =
<ignored_tag ignored_attribute="foo">[
<mimetype type="application"; subtype="msword">[ ]
<content>[ '\n ### removed by spamoracle ###\n ' ]
]
# let content = match xml with <_>cont -> cont | _ -> raise "failure";;
val content : Any = [
<mimetype type="application"; subtype="msword">[ ]
<content>[ '\n ### removed by spamoracle ###\n ' ]
]
# validate content with Mails # attachmentContent;;
- : [ X1 <content {| |}>String | X1 ] where
X1 = <mimetype {|
type = [
'image' | 'text' | 'application' | 'audio' | 'message' | 'multipart' | 'video'
];
subtype = String |}>[ ]
=
[ <mimetype type="application"; subtype="msword">[ ]
<content>[ '\n ### removed by spamoracle ###\n ' ]
]
]]></sample>
</li>
<!-- TODO see schema/schema_validator.mli
<li>
<p>
Is also possible to validate CDuce records against <b>attribute
declarations</b>. If the defined attribute is required, the record is
scanned for a field having the same name as the attribute. Its content
is then validated against the simple type associated to the attribute in
the schema declaration and a new record value is returned. This value is
identical to the given one except for the content of the validated
field. Validation fails if no field in the record matches the attribute
name.
</p>
<p>
If the defined attribute is not required no error is raised if the field
is missing. If a default value is specified in the attribute declaration
the returned record will have a corresponding additional field,
otherwise a record identical to the given one is returned.
</p>
<p>
As an example:
</p>
<sample><![CDATA[
# let record = { name = "User-Agent"; added_by = "mutt" };;
val record : {| name = [ 'User-Agent' ]; added_by = [ 'mutt' ] |}
=
{ name="User-Agent"; added_by="mutt" }
# validate record with Mails # name ;;
- : { name = String } = { name="User-Agent"; added_by="mutt" }
]]></sample>
</li>
-->
<li>
<p>
Finally, it is possible to validate records against <b>attribute groups</b>.
Every required attribute declared in the attribute group must have a
corresponding field in the given record. The content of each such field is
validated against the simple type defined for the corresponding attribute
in the attribute group. Missing non-required fields are added using the
corresponding default value, if any.
</p>
<p>
As an example:
</p>
<sample><![CDATA[
# let record = { type = "image"; subtype = "png" };;
val record :
{| type = [ 'image' ]; subtype = [ 'png' ] |} =
{ type="image"; subtype="png" }
# validate record with Mails # mimeTypeAttributes ;;
- : {| type = [ 'image' | 'text' | ... ]; subtype = String |} =
{ type="image"; subtype="png" }
]]></sample>
</li>
</ul>
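<p>
The attribute group rules above can be modelled in Python, with records
represented as dictionaries. This is a hypothetical sketch, not CDuce's
validator: the encoding of an attribute group as a dictionary of
<code>(check, required, default)</code> triples, the helper names
<code>enumeration</code>, <code>string</code> and
<code>validate_attribute_group</code>, and the choice to keep record fields
not declared in the group are all assumptions made for illustration. The
enumeration values of <code>mimeTopLevelType</code> are the ones shown in the
CDuce type printed above.
</p>
<sample><![CDATA[
# Hypothetical sketch of validating a record against an attribute group.
# An attribute group is encoded as {name: (check, required, default)},
# where check validates a string against the attribute's simple type.


def enumeration(allowed):
    """Build a check for a simple type restricted by an enumeration."""
    def check(value):
        if value not in allowed:
            raise ValueError(f"{value!r} is not one of {sorted(allowed)}")
        return value
    return check


def string(value):
    """Check for xsd:string: every string is accepted unchanged."""
    return value


# Encoding of mimeTypeAttributes from mails.xsd (both attributes required).
MIME_TYPE_ATTRIBUTES = {
    "type": (enumeration({"image", "text", "application", "audio",
                          "message", "multipart", "video"}), True, None),
    "subtype": (string, True, None),
}


def validate_attribute_group(record, group):
    """Return a validated copy of record, adding defaults where needed."""
    result = dict(record)  # fields not declared in the group are kept as-is
    for name, (check, required, default) in group.items():
        if name in record:
            result[name] = check(record[name])
        elif required:
            raise ValueError(f"missing required attribute {name!r}")
        elif default is not None:
            result[name] = check(default)
    return result


print(validate_attribute_group({"type": "image", "subtype": "png"},
                               MIME_TYPE_ATTRIBUTES))
]]></sample>
<p>
An optional attribute with a default value, for instance a hypothetical
<code>charset</code> attribute encoded as
<code>(string, False, "us-ascii")</code>, is added to the result with its
default value when the corresponding field is missing.
</p>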
</box>

<box title="XML Schema instances output" link="print_xml">
<p>
<b>TODO</b>
</p>
</box>

</page>