Overview

CDuce partially supports the XML Schema Recommendations (Primer, Structures, Datatypes). With this support, it is possible to manipulate XML documents whose leaves are typed values such as integers, dates, binary data, and so on.

CDuce supports XML Schema by implementing the following features:

  • XML Schema components import
  • XML Schema validation
  • XML Schema instances output

This manual page describes how to use these features in CDuce. All the documents used in the examples are available in the manual section "XML Schema sample documents".

Note: The support for XML Schema does not currently interact well with separate compilation. When a CDuce unit script.cd which uses an XML Schema is compiled, the resulting script.cdo object refers to the XML Schema by name. Consequently, when such a unit is run, the XML Schema must still be available from the current directory and must not have changed since compilation.

XML Schema components (micro) introduction

An XML Schema document can define four different kinds of components that can be imported into CDuce and used as CDuce types:

  • Type definitions
    A type definition defines either a simple type or a complex type. A simple type can be used to type the string content of an element more precisely; you can think of it as a refinement of #PCDATA. XML Schema provides a set of predefined simple types and a way to define new ones. A complex type can be used to constrain the content model and the attributes of an XML element. An XML Schema complex type is strictly more expressive than a DTD element declaration.
  • Element declarations
    An element declaration links an element name to a type (simple or complex). Optionally, if the type is a simple type, it can constrain the set of possible values for the element by mandating a fixed value or providing a default value.
  • Attribute group definitions
    An attribute group definition links a set of attribute declarations to a name which can be referenced from other XML Schema components.
  • Model group definitions
    A model group definition links a name to a constraint over the complex content of an XML element. The linked name can be referenced from other XML Schema components.

Attribute declarations currently don't produce any CDuce type and can't be used for validation by themselves.

XML Schema components import

To import XML Schema components into CDuce, you first need to tell CDuce to import an XML Schema document. You can do this using the schema keyword to bind an uppercase identifier to a local schema document:

# schema Mails = "tests/schema/mails.xsd";;
  

The above declaration will (try to) import all schema components included in the schema document mails.xsd as CDuce types. You can reference them using the dot operator on the bound identifier, e.g. Mails.envelopeType.

XML Schema permits ambiguity in component names: components of different kinds may share the same name. CDuce resolves references to Schema components in this order: elements, types, model groups, attribute groups.

The result of a schema component reference is an ordinary CDuce type which you can use as usual in function definitions, pattern matching and so on.

let is_valid_mail (Any -> Bool)
  | Mails.mailType -> `true
  | _ -> `false
  

Correctness remark: while parsing XML Schema documents, CDuce assumes that they are correct with respect to the XML Schema recommendations. At a minimum, they are required to be valid with respect to the XML Schema for Schemas. You should check your schemas for validity before importing them into CDuce; otherwise, strange behaviour is to be expected.

Toplevel directives

The toplevel directive #env supports schemas: it lists the currently defined schemas.

The toplevel directive #print_type supports schemas too: it can be used to print the types corresponding to schema components.

# #print_type Mails.bodyType;;
[ Char* ]
  

For more information have a look at the manual section about toplevel directives.

XML Schema → CDuce mapping

  • XML Schema predefined simple types are mapped to CDuce types directly in the CDuce implementation, preserving XML Schema constraints as far as possible. The table below lists the most significant mappings.

    XML Schema predefined simple type | CDuce type
    duration, dateTime, time, date, gYear, gMonth, ... | closed record types with some of the following fields (depending on the Schema type): year, month, day, hour, minute, second, timezone
    boolean | Bool
    anySimpleType, string, base64Binary, hexBinary, anyURI | String
    integer | Int
    nonPositiveInteger, negativeInteger, nonNegativeInteger, positiveInteger, long, int, short, byte | integer intervals with the appropriate limits
    string, normalizedString, and the other types derived (directly or indirectly) by restriction from string | String
    NMTOKENS, IDREFS, ENTITIES | [String*]
    decimal, float, double | Float
    (Not properly supported) decimal, float, double, NOTATION, QName | String

    Simple type definitions are built from the above types following the XML Schema derivation rules.

  • XML Schema complex type definitions are mapped to CDuce types representing XML elements which can have any tag, but whose attributes and content are constrained to be valid with respect to the original complex type.

    As an example, the following XML Schema complex type (a simplified version of the envelopeType of the same name defined in mails.xsd):

     <xsd:complexType name="envelopeType">
      <xsd:sequence>
       <xsd:element name="From" type="xsd:string"/>
       <xsd:element name="To" type="xsd:string"/>
       <xsd:element name="Date" type="xsd:dateTime"/>
       <xsd:element name="Subject" type="xsd:string"/>
      </xsd:sequence>
     </xsd:complexType>
    

    will be mapped to a CDuce XML type with any tag and four children (the full definition in mails.xsd also requires a From attribute of type String, which this simplified version omits). Among the children, the Date child must be an XML element whose content is a record representing a dateTime Schema type.

    # #print_type Mails.envelopeType;;
    <(Any)>[
      <From>String
      <To>String
      <Date>{ hour=Int positive=Bool year=Int minute=Int second=Int day=Int
        timezone=?{ hour=Int positive=Bool minute=Int } month=Int
        time_kind=`date | `gDay | `time | `gMonth | `dateTime | `gYear |
        `duration | `gMonthDay | `gYearMonth 
      }
      <Subject>String
    ]
    
  • XML Schema element declarations can bind an XML element either to a complex type or to a simple type. In the former case the conversion is almost identical to the one described for complex types. The only difference is that this time the element's tag must correspond to the name of the XML element in the schema element declaration, whereas previously any tag was allowed.

    In the latter case (element with simple type content), the corresponding CDuce type is an element type. Its tag must correspond to the name of the XML element in the schema element declaration; its content type is the CDuce translation of the simple type provided in the element declaration.

    For example, the following XML Schema element (corresponding to the element of the same name defined in mails.xsd):

    <xsd:element name="header">
     <xsd:complexType>
      <xsd:simpleContent>
       <xsd:extension base="xsd:string">
        <xsd:attribute ref="name" use="required" />
       </xsd:extension>
      </xsd:simpleContent>
     </xsd:complexType>
    </xsd:element>
    

    will be translated to the following CDuce type:

    # #print_type Mails.header;;
    <header name=String>String
    

    Note that the type of the element content is not a sequence unless the translation of the XML Schema type is itself a sequence (as in the example above, where String is a sequence of characters). Compare it with the following, where the element content is not a sequence but a single record:

    # #print_type Mails.Date;;
    <Date>{ hour=Int positive=Bool year=Int minute=Int second=Int day=Int
          timezone=?{ hour=Int positive=Bool minute=Int } month=Int
          time_kind=`date | `gDay | `time | `gMonth | `dateTime | `gYear |
          `duration | `gMonthDay | `gYearMonth }
    

    XML Schema wildcards (xsd:any) and nullable elements (xsi:nil) are supported.

  • XML Schema attribute group definitions are mapped to record types containing one field for each attribute declaration contained in the group. The use constraints are respected: optional attributes are mapped to optional fields, required attributes to required fields. XML Schema attribute wildcards are partly supported: they simply produce open record types instead of closed ones, but the actual constraints of the wildcards are discarded.

    The following XML Schema attribute group declaration:

    <xsd:attributeGroup name="mimeTypeAttributes">
     <xsd:attribute name="type" type="mimeTopLevelType" use="required" />
     <xsd:attribute name="subtype" type="xsd:string" use="required" />
    </xsd:attributeGroup>
    

    will thus be mapped to the following CDuce type:

    # #print_type Mails.mimeTypeAttributes;;
    { type=String subtype=String }
          
  • XML Schema model group definitions are mapped to CDuce sequence types. minOccurs and maxOccurs constraints are respected, using CDuce recursive types to represent unbounded repetition (i.e. Kleene star).

    xsd:all constraints, also known as interleaving constraints, can't be expressed in the CDuce type system without an explosion in type size. Content models of this kind are therefore normalized and treated, in the type system, as sequence types (the validator reorders the actual XML documents accordingly).

    Mixed content models are supported.

    As an example, the following XML Schema model group definition:

    <xsd:group name="attachmentContent">
     <xsd:sequence>
      <xsd:element name="mimetype">
       <xsd:complexType>
        <xsd:attributeGroup ref="mimeTypeAttributes" />
       </xsd:complexType>
      </xsd:element>
      <xsd:element name="content" type="xsd:string" minOccurs="0" />
     </xsd:sequence>
    </xsd:group>
    

    will be mapped to the following CDuce type:

    # #print_type Mails.attachmentContent;;
    [ <mimetype (Mails.mimeTypeAttributes)>[  ] <content>String? ]
    
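Because the translated components are ordinary CDuce types, functions can be written directly against them. The following is a minimal sketch, assuming the Mails binding introduced in the import section and the types printed above. The function names subject_of and year_of are illustrative only; they are not part of mails.xsd or of CDuce.

```cduce
(* Illustrative helper: return the content of the Subject child of an
   envelope. The tag is left unconstrained and attributes are ignored,
   matching the "any tag" translation of complex types. *)
let subject_of (Mails.envelopeType -> String)
  | <_ ..>[ _* <Subject>s _* ] -> s;;

(* Illustrative helper: the content of a Date element is a single
   record, so its fields can be read with the usual field selection. *)
let year_of (Mails.Date -> Int)
  | <Date>d -> d.year;;
```

The first function relies on the content being a sequence, the second on the content being a record, which mirrors the distinction made for element declarations above.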

XML Schema validation

The processes of XML Schema validation and assessment check that an XML Schema instance document is valid with respect to an XML Schema document and add missing information such as default values. CDuce's notion of Schema validation is a bit different.

CDuce allows XML values to be built from values of arbitrary types; for example, an XML element can have integer attributes. Still, this feature is rarely used, because the function used to load XML documents (load_xml) returns XML values whose leaves are values of type PCDATA.

Once you have imported an XML Schema in CDuce, you can use it to validate an XML value returned by load_xml against an XML Schema component defined in it. Validation builds a CDuce value whose type is the CDuce translation of the XML Schema component used for validation. The translation is the one described in the previous section. Note that it is not strictly necessary that the input XML value comes from load_xml; it is enough that it has PCDATA values as leaves.

During validation, PCDATA strings are parsed to build CDuce values corresponding to XML Schema simple types, and whitespace is handled as specified by the XML Schema whiteSpace facet. For example, validating the PCDATA string 1234567890 against the xsd:integer simple type returns the CDuce value 1234567890 of type Int.
Default values for missing attributes or elements are also added where specified.
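
The integer case can be sketched as follows. This is only an illustration under stated assumptions: counter.xsd is a hypothetical schema file, not one of the sample documents, assumed to have no target namespace and to declare a single top-level element named count of type xsd:integer. The identifiers Counter, raw and typed are likewise made up for the example.

```cduce
(* Hypothetical schema: counter.xsd is assumed to declare one
   top-level element named count whose type is xsd:integer. *)
schema Counter = "counter.xsd";;

(* Before validation the leaf is a plain PCDATA string. *)
let raw = <count>"1234567890";;

(* After validation the content is expected to be an Int, following
   the mapping of xsd:integer described in the previous section. *)
let typed = validate raw with Counter.count;;
```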

You can use the validate keyword to perform validation in a CDuce program. The syntax is as follows:
validate <expr> with <schema_ref>
where schema_ref is a schema component reference as described in XML Schema components import. The same ambiguity rules apply here.

In more detail, validation can be applied to different kinds of CDuce values, depending on the kind of Schema component used for validation.

  • The typical use of validation is to validate against an element declaration. In this case, validate should be invoked on a CDuce XML value, as in the following example.

    # let xml = <Date>"2003-10-15T15:44:01Z" in
      validate xml with Mails.Date;;
      - : Mails.Date = <Date>
                       { hour=15 positive=`true year=2003 minute=44 second=1
                       day=15 timezone={ hour=0 positive=`true minute=0 }
                       month=10 time_kind=`dateTime }
    

    The tag of the given element is checked for consistency with the element declaration; attributes and content are checked against the Schema type declared for the element.

  • Sometimes you may want to validate an element against an XML Schema complex type without having to use element declarations. This case is very similar to the previous one, except that the Schema component to use is a complex type definition. Such a validation can be applied to any XML value.

    As an example:

    # let xml = load_xml "envelope.xml" ;;  
    val xml : AnyXml = <envelope
                         From="fake@microsoft.com">[
                         <From>[ 'user@unknown.domain.org' ]
                         <To>[ 'user@cduce.org' ]
                         <Date>[ '2003-10-15T15:44:01Z' ]
                         <Subject>[
                           'I desperately need XML Schema support in CDuce'
                           ]
                         <header name="Reply-To">[ 'bill@microsoft.com' ]
                         ]
    # validate xml with Mails.envelopeType;;
    - : Mails.envelopeType = <envelope
                               From="fake@microsoft.com">[
                               <From>[ 'user@unknown.domain.org' ]
                               <To>[ 'user@cduce.org' ]
                               <Date>
                                 { hour=15 positive=`true year=2003 minute=44
                                 second=1 day=15
                                 timezone={ hour=0 positive=`true minute=0 }
                                 month=10 time_kind=`dateTime }
                               <Subject>[
                                 'I desperately need XML Schema support in CDuce'
                                 ]
                               <header name="Reply-To">[ 'bill@microsoft.com' ]
                               ]
    
  • Similarly, you may want to validate against a model group. In this case, you validate CDuce sequences against the model group; the given sequences are treated as the content of XML elements.

    As an example:

    # let xml = load_xml "attachment.xml";;
    val xml : AnyXml = <attachment
                         name="signature.doc">[
                         <mimetype type="application" subtype="msword">[ ]
                         <content>[ '\n    ### removed by spamoracle ###\n  ' ]
                         ]
    # let content = match xml with <_ ..>cont -> cont;;
    val content : [ (AnyXml | Char)* ] = [ <mimetype
                                             type="application"
                                             subtype="msword">[
                                             ]
                                           <content>[
                                             '\n    ### removed by spamoracle ###\n  '
                                             ]
                                           ]
    # validate content with Mails.attachmentContent;;
    - : Mails.attachmentContent = [ <mimetype
                                      type="application"
                                      subtype="msword">[
                                      ]
                                    <content>[
                                      '\n    ### removed by spamoracle ###\n  '
                                      ]
                                    ]
    
  • Finally, it is possible to validate records against attribute groups. All required attributes declared in the attribute group must have corresponding fields in the given record. The content of each field is validated against the simple type defined for the corresponding attribute in the attribute group. Non-required fields that are missing are added using the corresponding default value (if any).

    As an example:

    # let record = { type = "image"; subtype = "png" };;
    val record : { type=[ 'image' ] subtype=[ 'png' ] } = { type="image"
                                                          subtype="png" }
    # validate record with Mails.mimeTypeAttributes ;;
    - : Mails.mimeTypeAttributes = { type="image" subtype="png" }
    
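Loading and validating are often combined in one step. The following is a minimal sketch, assuming the Mails binding and the envelope.xml sample used above; load_envelope is an illustrative name, not a CDuce built-in.

```cduce
(* Illustrative helper: load a document and validate it against the
   envelopeType complex type. load_xml yields a value with PCDATA
   leaves; validate builds the corresponding typed value. *)
let load_envelope (Latin1 -> Mails.envelopeType)
  | file -> validate (load_xml file) with Mails.envelopeType;;

let envelope = load_envelope "envelope.xml";;
```

Giving the function the result type Mails.envelopeType lets the rest of the program pattern match on typed leaves (for instance the Date record) without further checks.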

XML Schema instances output

It is possible to use the normal print_xml and print_xml_utf8 built-in functions to print values resulting from XML Schema validation.
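
As a small sketch of this, assuming the Mails binding used throughout this page, a validated Date element can be serialized again and written to standard output. The names xml and date are illustrative, and the exact text produced is not shown here.

```cduce
(* Validate a Date element, then serialize the typed result.
   print_xml produces the textual XML form; print writes it out. *)
let xml = <Date>"2003-10-15T15:44:01Z" in
let date = validate xml with Mails.Date in
print (print_xml date);;
```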

Unsupported XML Schema features

The support for XML Schema embedded in CDuce does not attempt to cover the full XML Schema specification. In particular, imported schemas are not checked for validity. You can use, for instance, an online validator to check the validity of a schema.

Also, some features of the XML Schema specification are unsupported or only partially supported. Here is a non-exhaustive list of limitations:

  • Substitution groups.
  • Some facets (pattern, totalDigits, fractionDigits).
  • <redefine> (inclusion of an XML Schema with modifications).
  • xsi:type.