CDuce: XML Namespaces

ℂDuce: Documentation: User's manual: XML Namespaces

Overview

CDuce fully implements the W3C XML Namespaces Recommendation. Atom names (hence XML element tags) and record labels (hence XML attribute names) are logically composed of a namespace URI and a local part. Syntactically, they are written as qualified names, conforming to the QName production of the Recommendation:

QName     ::= (Prefix ':')? LocalPart
Prefix    ::= NCName
LocalPart ::= NCName

The prefix in a QName must be bound to a namespace URI. In XML, the bindings from prefixes to namespace URIs are introduced through special xmlns:prefix attributes. In CDuce, instead, there are explicit namespace binders. For instance, the following XML document

<p:a q:c="3" xmlns:p="http://a.com" xmlns:q="http://b.com"/>

can be written in CDuce:

namespace p = "http://a.com" in
namespace q = "http://b.com" in
<p:a q:c="3">[]

This element can be bound to a variable x by a let binding as follows:

let x = 
  namespace p = "http://a.com" in
  namespace q = "http://b.com" in
  <p:a q:c="3">[]

In this case, the namespace declarations are local to the scope of the let. Alternatively, it is possible to use global prefix bindings:

namespace p = "http://a.com"
namespace q = "http://b.com"
let x = <p:a q:c="3">[]

Similarly, CDuce supports namespace defaulting. This is introduced by a local or global namespace "..." construction. As in XML, default namespaces apply only to tags (atoms), not to attributes (record labels). For instance, in the expression namespace "A" in <x y="3">[], the namespace for the element tag is "A", and the attribute has no namespace.

The toplevel directive #env causes CDuce to print, among other things, the current set of global bindings.
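
As a small illustration of the defaulting rule, the following sketch combines a default namespace with an explicit prefix. The URIs and the attribute name q:z are made up for the example; only the behaviour stated above is assumed.

```cduce
(* Illustrative only: the URIs and the attribute q:z are hypothetical. *)
namespace "http://a.com" in
namespace q = "http://b.com" in
<x y="3" q:z="4">[]
(* tag x:         namespace "http://a.com" (the default namespace applies)
   attribute y:   no namespace (defaulting does not apply to attributes)
   attribute q:z: namespace "http://b.com" (explicit prefix) *)
```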

Reusing namespace declarations

A global namespace declaration actually defines an identifier which is exported by the current compilation unit. It is possible to use this identifier in another unit to bind another prefix to the same namespace URI. E.g., if the unit a contains:

namespace ns = "http://a.com"

then, in another unit, it is possible to declare:

namespace ans = a.ns

The open statement operates on namespace declarations; all the declarations from the opened unit are re-exported by the current unit.

XML Schema and namespaces

If an XML Schema has been bound to some identifier (in the current compilation unit or another one), it is possible to use this identifier in the right-hand side of a namespace declaration. The namespace URI is the targetNamespace of the XML Schema. E.g.:

schema s = "..."
namespace ns = s

Types for atoms

The type Atom represents all the atoms, in all the namespaces. An underscore in tag position (as in <_>[]) stands for this type.

Each atom constitutes a subtype of Atom. In addition to these singleton types, there are the "any in namespace" subtypes, written p:* where p is a namespace prefix; such a type contains all the atoms in the namespace denoted by p. The token .:* represents all the atoms in the current default namespace.

When used as atoms and not tags, the singleton types and "any in namespace" types must be prefixed by a backquote, as for atom values: `p:x, `p:*, `.:*.
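
The difference between atom position and tag position can be sketched as follows. This is only an illustration: the type names InP and JustX, the function in_p, and the URI are hypothetical, and the open attribute record (..) in the pattern is assumed so that attributes are ignored.

```cduce
namespace p = "http://a.com"

(* Atom position: the types are written with a backquote. *)
type InP = `p:*      (* any atom in the namespace bound to p *)
type JustX = `p:x    (* the singleton type of the atom p:x *)

(* Tag position: no backquote.
   in_p is a hypothetical helper, not a CDuce built-in. *)
let in_p (e : AnyXml) : Bool =
  match e with
  | <p:* ..>_ -> `true
  | _ -> `false
```

Under these assumptions, in_p <p:a>[] would be expected to return `true, and in_p <a>[] to return `false when no default namespace is in scope.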

Printing XML documents

The print_xml and print_xml_utf8 operators produce a string representation of an XML document. They have to assign prefixes to namespaces. In the current implementation, CDuce produces XML documents with no default namespace and only toplevel prefix bindings (that is, xmlns:p="..." attributes are only produced for the root element). Prefix names are chosen using several heuristics. First, CDuce tries using the prefixes bound in the scope of the print_xml operator. When this is not possible, it uses global "hints": each time a prefix binding is encountered (in the CDuce program or in loaded XML documents), it creates a global hint for the namespace. Finally, it generates fresh prefixes of the form nsn, where n is an integer (ns1, ns2, ...). For instance, consider the expression:

print_xml (namespace "A" in <a>[])

As there is no available prefix for the namespace URI "A", CDuce generates a fresh prefix and produces the following XML document:

<ns1:a xmlns:ns1="A"/>

Now consider this expression:

print_xml (namespace p = "A" in <p:a>[])

CDuce produces:

<p:a xmlns:p="A"/>

In this case, the prefix binding for the namespace "A" is not in the scope of print_xml, but the name p is available as a global hint. Finally, consider:

namespace q = "A" in print_xml (namespace p = "A" in <p:a>[])

Here, the prefix q is available in the scope of print_xml, so it takes priority:

<q:a xmlns:q="A"/>

As a final example, consider the following expression:

print_xml (namespace p = "A" in <p:a>[ (namespace p = "B" in <p:a>[]) ])

A single name p is available for both namespaces "A" and "B". CDuce chooses to assign it to "A", and it generates a fresh name for "B", so as to produce:

<p:a xmlns:ns1="B" xmlns:p="A"><ns1:a/></p:a>

Note that the fresh names are "local" to an application of print_xml. Several applications of print_xml will re-use the same names ns1, ns2, ...

Pretty-printing of XML values and types

The CDuce interpreter and toplevel use an algorithm similar to the one described in the previous section to pretty-print CDuce values and types that involve namespaces.

The main difference is that, by default, it does not use the current set of prefix bindings. The rationale is that this set can change, which would make the output of CDuce difficult to understand. So only global hints are used to produce prefixes. Once a prefix has been allocated, it is not re-used for another namespace. The toplevel directive #env causes CDuce to print, among other things, the table of prefixes used for pretty-printing. It is possible to reinitialize this table with the directive #reinit_ns. This directive also sets the current set of prefix bindings as a primary source of hints for assigning prefixes for pretty-printing in the future.

Accessing namespace bindings

CDuce encourages a processing model where namespace prefixes are just considered as macros (for namespaces) which are resolved by the (CDuce or XML) parser. However, some XML specifications require the application to keep for each XML element the set of locally visible bindings from prefixes to namespaces. CDuce provides some support for that.

Even if this is not reflected in the type system, CDuce can optionally attach to any XML element a table of namespace bindings. The following built-in functions allow the programmer to explicitly access this information:

type Namespaces = [ (String,String)* ]
namespaces: AnyXml -> Namespaces
set_namespaces: Namespaces -> AnyXml -> AnyXml

The namespaces function raises an exception when its argument has no namespace information attached.

When XML elements are generated, either as literals in the CDuce code or by load_xml, it is possible to tell CDuce to remember in-scope namespace bindings. This can be done with the following construction:

namespace on in e

The XML elements built within e (including by calling load_xml) will be annotated. There is a similar namespace off construction to turn off this mechanism in a sub-expression, and both constructions can be used at top-level.

# namespace cduce = "CDUCE";;
# namespaces <cduce:a>[];;
Uncaught CDuce exception: [ `Invalid_argument 'namespaces' ]

# namespace on;;
# namespaces <cduce:a>[];;
- : Namespaces = [ [ "xsd" 'http://www.w3.org/2001/XMLSchema' ]
                   [ "xsi" 'http://www.w3.org/2001/XMLSchema-instance' ]
                   [ "cduce" 'CDUCE' ]
                   ]
# namespaces (load_xml "string:<a xmlns='xxx'/>");;
- : Namespaces = [ [ "" 'xxx' ] ]

The default binding for the prefix xml never appears in the result of namespaces.

The xtransform iterator does not change the attached namespace information for XML elements which are just traversed. The generic comparison operator cannot distinguish two XML elements which only differ by the attached namespace information.
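
The two built-in functions declared above can also be combined to attach a table by hand and read it back. The following is a minimal sketch under stated assumptions: the prefix, the URI and the variable names are hypothetical, and the table is written as a sequence of pairs following the Namespaces type given earlier.

```cduce
(* Hypothetical binding table: the prefix "p" and the URI are made up. *)
let tbl = [ ("p", "http://a.com") ]

(* Attach the table to an element, then read it back. *)
let x = <a>[]
let y = set_namespaces tbl x
let t = namespaces y

(* x and y differ only by the attached table, so, following the remark
   on the generic comparison operator, this is expected to be true. *)
let same = (x = y)
```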

Miscellaneous

Contrary to the W3C Namespaces in XML 1.1 Candidate Recommendation, a CDuce declaration namespace p = "" does not undeclare the prefix p. Instead, it binds it to the null namespace (that is, a QName using this prefix is interpreted as having no namespace).
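
A minimal sketch of this behaviour, assuming no default namespace is in scope (the names e1 and e2 are only for illustration):

```cduce
(* p is bound to the null namespace, so p:a is the atom a with no namespace. *)
let e1 = namespace p = "" in <p:a>[]

(* The same element written without any prefix. *)
let e2 = <a>[]
```

Under the rule stated above, e1 and e2 would be expected to denote the same value.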
