Design and features

Our point of view, and our guiding principle in designing CDuce, is that a programming language for XML should take XML types (DTD, XML Schema, RELAX NG, ...) seriously. The benefits are the following:

  • static verification (e.g., ensuring that a transformation produces a valid document [1]);
  • in particular, smooth and safe composition of XML transformations, and incremental programming;
  • static optimizations and an efficient execution model (knowing the type of a document is crucial for extracting information efficiently).

Some features specific to CDuce:

  • XML objects are first-class values: elements, sequences, tags, characters and strings, and attribute sets; sequences of XML elements can be specified by regular expressions, which also apply to character strings;
  • functions are first-class values too: they can be manipulated, stored in data structures, returned by other functions, and so on;
  • a powerful pattern-matching operation can perform complex extractions from sequences of XML elements;
  • a rich type algebra, with recursive types and arbitrary Boolean combinations (union, intersection, complement), allows precise definitions of data structures and XML types; general-purpose types and type constructors are taken seriously (products, extensible records, arbitrary-precision integers with interval constraints, Unicode characters);
  • polymorphism through a natural notion of subtyping, and overloaded functions with dynamic dispatch;
  • a highly effective type-driven compilation scheme.
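
To make the idea of regular-expression types over element sequences concrete, here is a minimal Python sketch. It is not CDuce syntax and not how CDuce implements these types. It assumes each element is reduced to its tag name and uses a hypothetical five-constructor regex AST (Tag, Seq, Alt, Star, Eps); the hypothetical function matches checks whether a sequence conforms to a DTD-style content model such as "title, author+, year?". The set-of-end-positions matcher was chosen for brevity, not efficiency.

```python
from dataclasses import dataclass
from typing import List, Set, Union

# Hypothetical regex AST over element tag names (illustration only).
@dataclass(frozen=True)
class Tag:
    name: str

@dataclass(frozen=True)
class Seq:
    left: "Regex"
    right: "Regex"

@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"

@dataclass(frozen=True)
class Star:
    body: "Regex"

@dataclass(frozen=True)
class Eps:
    pass

Regex = Union[Tag, Seq, Alt, Star, Eps]

def plus(r: Regex) -> Regex:
    return Seq(r, Star(r))  # r+ = r r*

def opt(r: Regex) -> Regex:
    return Alt(r, Eps())    # r? = r | empty

def ends(r: Regex, tags: List[str], i: int) -> Set[int]:
    """Return every position j such that r matches tags[i:j]."""
    if isinstance(r, Eps):
        return {i}
    if isinstance(r, Tag):
        return {i + 1} if i < len(tags) and tags[i] == r.name else set()
    if isinstance(r, Seq):
        return {k for j in ends(r.left, tags, i) for k in ends(r.right, tags, j)}
    if isinstance(r, Alt):
        return ends(r.left, tags, i) | ends(r.right, tags, i)
    if isinstance(r, Star):
        # Fixpoint over reachable positions; terminates since positions are bounded.
        result, frontier = {i}, {i}
        while frontier:
            new = {k for j in frontier for k in ends(r.body, tags, j)} - result
            result |= new
            frontier = new
        return result
    raise TypeError(f"not a regex: {r!r}")

def matches(r: Regex, tags: List[str]) -> bool:
    """True if the whole tag sequence belongs to the type r."""
    return len(tags) in ends(r, tags, 0)

# Content model assumed for illustration: title, author+, year?
BOOK = Seq(Tag("title"), Seq(plus(Tag("author")), opt(Tag("year"))))

print(matches(BOOK, ["title", "author", "author", "year"]))  # True
print(matches(BOOK, ["title", "year"]))                      # False
```

The sketch only illustrates what it means for a sequence to belong to such a type; in CDuce the same kind of regular expressions drive both static typing and pattern matching.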

CDuce is fast, functional, and type-safe, and it conforms to basic standards: Unicode, XML, DTD, and Namespaces are fully supported; XML Schema is partially supported.

Preliminary benchmarks suggest that, despite the overhead of static type verification, a CDuce program can run 30% to 60% faster than an equivalent XSLT stylesheet (the benchmarks were performed with the xsltproc tool from the GNOME libxslt library).

The name CDuce was coined by Francesco Zappa Nardelli.

[1] Valid with respect to the validity constraints that the type system can express (which typically excludes constraints such as ID and IDREF).

XDuce and CDuce

The starting point of our work on CDuce was the XDuce language, developed by the UPenn database group. Many of CDuce's features originate in XDuce. Some of our contributions:

  • integration of first-class and overloaded functions, arbitrary Boolean connectives, and records (extensible or not) into a semantic definition of subtyping;
  • a subtyping algorithm without backtracking;
  • extension of pattern matching to capture non-consecutive subsequences, and removal of the tail condition for exact matching (the XDuce team independently arrived at a different solution);
  • an efficient evaluation model that takes advantage of static type information.
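
The semantic definition of subtyping mentioned above can be illustrated with a toy model. In the semantic approach, a type denotes a set of values, and t is a subtype of s exactly when the values of t are all values of s; once complement is available, this amounts to checking that t ∧ ¬s denotes no value. The Python sketch below assumes a small, finite universe of values, which is a drastic simplification: real CDuce types denote infinite sets and include recursive and function types, and this brute-force check is not the backtracking-free algorithm listed above. The names UNIVERSE, Type, subtype, INT, NAT, STR, and ANY are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Assumption: a tiny finite universe of values, for illustration only.
UNIVERSE: FrozenSet[object] = frozenset({-1, 0, 1, 2, "a", "b", ("x", 1)})

@dataclass(frozen=True)
class Type:
    """A type is interpreted as the set of values it denotes."""
    values: FrozenSet[object]

    def __or__(self, other: "Type") -> "Type":   # union
        return Type(self.values | other.values)

    def __and__(self, other: "Type") -> "Type":  # intersection
        return Type(self.values & other.values)

    def __invert__(self) -> "Type":               # complement w.r.t. UNIVERSE
        return Type(UNIVERSE - self.values)

    def is_empty(self) -> bool:
        return not self.values

def subtype(t: Type, s: Type) -> bool:
    """Semantic subtyping: t <= s iff t & ~s denotes no value."""
    return (t & ~s).is_empty()

ANY = Type(UNIVERSE)
INT = Type(frozenset(v for v in UNIVERSE if isinstance(v, int)))
NAT = Type(frozenset(v for v in INT.values if v >= 0))  # interval-like type
STR = Type(frozenset(v for v in UNIVERSE if isinstance(v, str)))

print(subtype(NAT, INT))        # True
print(subtype(INT, NAT))        # False (-1 is a counterexample)
print(subtype(ANY, INT | STR))  # False (the pair ("x", 1) is neither)
```

Reducing inclusion to emptiness is what lets union, intersection, and complement be handled uniformly: any Boolean combination of types is just another set to test for emptiness.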

Of course, work on XDuce continued in parallel with ours, and its developers came up with nice ideas: mixed attribute-element types (with the same expressive power as our records, but able to avoid exponential blowup in some cases where ours cannot), and a powerful filter operation.