Select from where

WE SHOULD REWRITE THIS SECTION TO GET RID OF THE MANY TYPE DEFINITIONS. JUST USE THE ONE OF USE CASES

CDuce is endowed with a select_from_where syntax to perform some SQL-like queries. The general form of select expressions is

select e from
   p1 in e1,
   p2 in e2,
       :
   pn in en
where b

where e is an expression b a boolean expression, the pi's are patterns, and the ei's are sequence expressions.

The select_from_where construction is translated into:

transform e1 with p1 -> 
   transform e2 with p2 -> 
         ...
       transform en with pn -> 
          if b then  [e] else []

XPath-like expressions

XPath-like expressions are of two kind : e/t , e/@a , (and e//t ) where e is an expression, t a type, and a an attribute.

They are syntactic sugar for :

 flatten(select x from <_ ..>[(x::t | _ )*] in e) 

and

select x from <_ a=x ..>_ in  e

Examples

Types and data for the examples

Let us consider the following DTD and the CDuce types representing a Bibliography

<!ELEMENT bib  (book* )>
<!ELEMENT book  (title,  (author+ | editor+ ), publisher, price )>
<!ATTLIST book  year CDATA  #REQUIRED >
<!ELEMENT author  (last, first )>
<!ELEMENT editor  (last, first, affiliation )>
<!ELEMENT title  (#PCDATA )>
<!ELEMENT last  (#PCDATA )>
<!ELEMENT first  (#PCDATA )>
<!ELEMENT affiliation  (#PCDATA )>
<!ELEMENT publisher  (#PCDATA )>
<!ELEMENT price  (#PCDATA )>

type bib = <bib>[book*]
type book = <book year=String>[title  (author+ | editor+ ) publisher price ]
type author = <author>[last  first ]
type editor = <editor>[last  first  affiliation ]
type title  = <title>[PCDATA ]
type last  = <last>[PCDATA]
type first  = <first>[PCDATA]
type affiliation = <affiliation>[PCDATA]
type publisher  = <publisher>[PCDATA]
type price  = <price>[PCDATA]

and some values

let biblio : bib = 
           <bib>[
              <book year="1994">[
                 <title>['TCP/IP Illustrated']
                 <author>[
	            <last>['Stevens']
		    <first>['W.']]
                 <publisher>['Addison-Wesley']
	         <price>['65.95'] ]
	     <book year="1992">[
                <title>['Advanced Programming in the Unix environment']
		<author>[
		   <last>['Stevens'] 
		   <first>['W.']]
        	<publisher>['Addison-Wesley']
        	<price>['65.95'] ]
	     <book year="2000">[ 
	        <title>['Data on the Web']
        	<author>[
		  <last>['Abiteboul'] 
		  <first>['Serge']]
	        <author>[
		  <last>['Buneman'] 
		  <first>['Peter']]
                <author>[
		  <last>['Suciu'] 
		  <first>['Dan']]
	        <publisher>['Morgan Kaufmann Publishers']
                <price>['39.95'] ]
	     <book year="1999">[
	        <title>['The Economics of Technology and Content for Digital TV']
                <editor>[
                   <last>['Gerbarg'] 
		   <first>['Darcy']
                   <affiliation>['CITI'] ]
		<publisher>['Kluwer Academic Publishers']
                <price>['129.95']]
             ]

Projections

  • All titles in the bibliography biblio
    let titles = [biblio]/book/title
    
    

    Which yields to:

    val titles : [ title* ] = [ <title>[ 'TCP/IP Illustrated' ]
                                <title>[ 'Advanced Programming in the Unix environment' ]
                                <title>[ 'Data on the Web' ]
                                <title>[ 'The Economics of Technology and Content for Digital TV' ]
                              ]
    
    
  • All authors in the bibliography biblio
    let authors = [biblio]/book/<author>_
    
    

    Yielding the result:

    val authors : [ <author>[ last first ]* ] = 
                                [ <author>[ <last>[ 'Stevens' ] <first>[ 'W.' ] ]
                                  <author>[ <last>[ 'Stevens' ] <first>[ 'W.' ] ]
                                  <author>[
                                    <last>[ 'Abiteboul' ]
                                    <first>[ 'Serge' ] ]
                                  <author>[
                                    <last>[ 'Buneman' ]
                                    <first>[ 'Peter' ] ]
                                  <author>[ <last>[ 'Suciu' ] <first>[ 'Dan' ] ]
                                ]
    
    

    Note the difference between this two projections.
    In the fist one, we use the preset type title (type title = <title>[PCDATA ] ).
    In the second one, the type <author>_ means all the xml fragments beginning by the tag author ( _ means Any), and this tag is without attribute. In contrary, we write note <author ..>_.

  • All books having an editor in the bibliography biblio
    let edibooks = [biblio]/<book year=_>[_* editor _*]
    
    

    Yielding:

    val edibooks : [ <book year=String>[ title editor+ publisher price]* ] = 
               [ <book year="1999">[
                     <title>[ 'The Economics of Technology and Content for Digital TV']
                     <editor>[
                        <last>[ 'Gerbarg' ]
                        <first>[ 'Darcy' ]
                     <affiliation>[ 'CITI' ] ]
                     <publisher>[ 'Kluwer Academic Publishers' ]
                     <price>[ '129.95' ] ]
              ]
    
    
  • All books without authors.
    let edibooks2 = [biblio]/<book ..>[(Any\author)*]
    
    

    Yielding:

    val edibooks2 : [ <book year=String>[ title editor+ publisher price]* ] = 
               [ <book year="1999">[
                     <title>[ 'The Economics of Technology and Content for Digital TV']
                     <editor>[
                        <last>[ 'Gerbarg' ]
                        <first>[ 'Darcy' ]
                     <affiliation>[ 'CITI' ] ]
                     <publisher>[ 'Kluwer Academic Publishers' ]
                     <price>[ '129.95' ] ]
              ]
    
    
  • The year of books having a price of 65.95 in the bibliography biblio
    let books = [biblio]/<book ..>[_* <price>['65.95']]/@year
    
    

    Yielding:

    val books : [ String* ] = [ "1994" "1992" ]
    
    
  • All the authors and editors in biblio
    let aebooks = [biblio]/book/(author|editor)
    
    

    Yielding:

    val aebooks : [ (editor | author)* ] = [ <author>[
                                             <last>[ 'Stevens' ]
                                             <first>[ 'W.' ]
                                             ]
                                           <author>[
                                             <last>[ 'Stevens' ]
                                             <first>[ 'W.' ]
                                             ]
                                           <author>[
                                             <last>[ 'Abiteboul' ]
                                             <first>[ 'Serge' ]
                                             ]
                                           <author>[
                                             <last>[ 'Buneman' ]
                                             <first>[ 'Peter' ]
                                             ]
                                           <author>[
                                             <last>[ 'Suciu' ]
                                             <first>[ 'Dan' ]
                                             ]
                                           <editor>[
                                             <last>[ 'Gerbarg' ]
                                             <first>[ 'Darcy' ]
                                             <affiliation>[ 'CITI' ]
                                             ]
                                           ]
    
    

An interesting point in Cduce is the static typing of an expression. By example if we consider the third projection, "All books having an editor in the bibliography", CDuce knows the type of the result of the value edibooks:

val edibooks : [ <book year=String>[ title editor+ publisher price]* ] = 
...

This type represents a book without author (see the book type in the type declaration in the top of this section). Now if we want to know all authors of this list of books edibooks:

let authorsofedibooks = edibooks/author

Yelding:

Warning at chars 24-39:
This projection always returns the empty sequence
val authorsofedibooks : [  ] = ""

In fact the value edibooks must be a subtype of [<_ ..>[Any* author Any*] *], and here this is not the case. If you want to be sure, test this:

match edibooks with [<_ ..>[_* author _*] *] -> "this is a subtype" | _ -> "this is not a subtype" ;;

An other projection should be convince you, is the query:

let freebooks = [biblio]/book/<price>['0']
]

Yelding:

val freebooks : [ <price>[ '0' ]* ] = ""

There is no free books in this bibliography, That is not indicated by the type of biblio. Then, this projection returns the empty sequence ("")

Select_from_where

The same queries we wrote above can of course be programmed with the select_from_where construction

All the titles

let tquery = select y 
             from x in [biblio]/book ,
                  y in [x]/title

This query is programmed in a XQuery-like style largely relying on the projections. Note that x and y are CDuce's patterns. The result is:

 
val tquery : [ title* ] = [ <title>[ 'TCP/IP Illustrated' ]
                            <title>[ 'Advanced Programming in the Unix environment' ]
                            <title>[ 'Data on the Web' ]
                            <title>[ 'The Economics of Technology and Content for Digital TV' ]
                          ]

Now let's program the same query with the translation given previously thus eliminating the y variable

let withouty = flatten(select [x] from x in [biblio]/book/title)

Yielding:

val tquery : [ title* ] = [ <title>[ 'TCP/IP Illustrated' ]
                            <title>[ 'Advanced Programming in the Unix environment' ]
                            <title>[ 'Data on the Web' ]
                            <title>[ 'The Economics of Technology and Content for Digital TV' ]
                          ]

But the select_from_where expressions are likely to be used for more complex queries such as the one that selects all titles whose at least one author is "Peter Buneman" or "Dan Suciu"

let sel = select y 
          from x in [biblio]/book ,
               y in [x]/title,
               z in [x]/author
where ( (z = <author>[<last>['Buneman']<first>['Peter']])
 || (z = <author>[<last>['Suciu'] <first>['Dan']]) )

Which yields:

val sel : [ title* ] = [ <title>[ 'Data on the Web' ]
                         <title>[ 'Data on the Web' ]
                         ]

Note that the corresponding semantics, as in SQL, is a multiset one. Thus duplicates are not eliminated. To discard them, one has to use the distinct_values operator.

A pure pattern example

This example computes the same result as the previous query except that duplicates are eliminated. It is written in a pure pattern form (i.e., without any XPath-like projections)

let sel = select t
from <_ ..>[(x::book| _ )*] in [biblio],
     <_ ..>[ t&title _* (<author>[<last>['Buneman']<first>['Peter']]
                       | <author>[<last>['Suciu'] <first>['Dan']]) ; _]  in x  


Note the pattern on the second line in the from clause. As the type of an element in x is type book = <book year=String>[title (author+ | editor+ ) publisher price ], we skip the tag : <_ ..>, then we then capture the corresponding title (t &title) then we skip authors _* until we find either Peter Buneman or Dan Suciu (<author>[<last>['Buneman']<first>['Peter']]| <author>[<last>['Suciu'] <first>['Dan']]), then we skip the remaining authors and the tail of the sequence by writing ; _

Result:

val sel : [ title* ] = [ <title>[ 'Data on the Web' ] ]

This pure pattern form of the query yields (in general) better performance than the same one written in an XQuery-like programming style. However, the query optimiser automatically translates the latter into a pure pattern one

Joins

This example is the exact transcription of query Q5 of XQuery use cases. On top of this section we give the corresponding CDuce types. We give here the type of the document to be joined, and the sample value.

type Reviews =<reviews>[Entry*]
type Entry = <entry> [ Title Price Review]
type Title = <title>[PCDATA]
type Price= <price>[PCDATA]
type Review =<review>[PCDATA]

let bstore2 : Reviews =
<reviews>[
    <entry>[
        <title>['Data on the Web']
        <price>['34.95']
        <review>
               ['A very good discussion of semi-structured database
               systems and XML.']
     ]
    <entry>[
        <title>['Advanced Programming in the Unix environment']
        <price>['65.95']
        <review>
               ['A clear and detailed discussion of UNIX programming.']
     ]
    <entry>[
        <title>['TCP/IP Illustrated']
        <price>['65.95']
        <review>
               ['One of the best books on TCP/IP.']
    ]
]

The queries are expressed first in an XQuery-like style, then in a pure pattern style: the first pattern-based query is the one produced by the automatic translation from the first one. The last query correponds to a pattern aware programmer's version.

XQuery style

<books-with-prices>
select <book-with-price>[t1 <price-bstore1>([p1]/Char) <price-bstore2>([p2]/Char)]
from b in [biblio]/book ,
     t1 in [b]/title,
     e in [bstore2]/Entry,
     t2 in [e]/Title,
     p2 in [e]/Price,
     p1 in [b]/price
 where t1=t2

Automatic translation of the previous query into a pure pattern (thus more efficient) one

<books-with-prices>
select <book-with-price>[t1 <price-bstore1>x10 <price-bstore2>x11 ]
from <_ ..>[(x3::book|_)*] in [biblio],
     <_ ..>[(x9::price|x5::title|_)*] in x3,
     t1 in x5,
     <_ ..>[(x6::Entry|_)*] in [bstore2],
     <_ ..>[(x7::Title|x8::Price|_)*] in x6,
     t2 in x7,
     <_ ..>[(x10::Char)*] in x9,
     <_ ..>[(x11::Char)*] in x8
 where t1=t2

Pattern aware programmer's version of the same query (hence hand optimised). This version of the query is very efficient. Be aware of patterns.

<books-with-prices>
select <book-with-price>[t2 <price-bstore1>p1 <price-bstore2>p2]
from <bib>[b::book*] in [biblio],
     <book ..>[t1&title _* <price>p1] in b,
     <reviews>[e::Entry*] in [bstore2],
     <entry>[t2&Title <price>p2 ;_] in e
where t1=t2

More complex Queries: on the power of patterns

let biblio = [biblio]/book;;

<bib>
    select <book (a)> x
      from <book (a)>[ (x::(Any\editor)|_ )* ] in biblio 

This expression returns all book in bib but removoing the editor element. If one wants to write more explicitly:

    select <book (a)> x
      from <book (a)>[ (x::(Any\<editor ..>_)|_ )* ] in biblio

Or even:

    select <book (a)> x
      from <book (a)>[ (x::(<(_\`editor) ..>_)|_ )* ] in biblio

Back to the first one:

<bib>
    select <book (a)> x
      from <(book) (a)>[ (x::(Any\editor)|_ )* ] in biblio

This query takes any element in bib, tranforms it in a book element and removes sub-elements editor, but you will get a warning as capture variable book in the from is never used: we should have written <(_) (a)> instead of <(book) (a)> the from

    select <(book) (a)> x
      from <(book) (a)>[ (x::(Any\editor)|_ )* ] in biblio

Same thing but without tranforming tag to "book".
More interestingly:

    select <(b) (a\id)> x
      from <(b) (a)>[ (x::(Any\editor)|_ )* ] in biblio

removes all "id" attribute (if any) from the attributes of the element in bib.


    select <(b) (a\id+{bing=a.id})> x
      from <(b) (a)>[ (x::(Any\editor)|_ )* ] in biblio

Changes attribute id=x into bing=x However, one must be sure that each element in bib has an id attribute if such is not the case the expression is ill-typed. If one wants to perform this only for those elements which certainly have an id attribute then:

    select <(b) (a\id+{bing=a.id})> x
      from <(b) (a&{id=_})>[ (x::(Any\editor)|_ )* ] in biblio

An unorthodox query: Formatted table generation

The following program generates a 10x10 multiplication table:

let bg ((Int , Int) -> String) 
  (y, x) -> if (x mod 2 + y mod 2 <= 0) then "lightgreen" 
            else if (y mod 2 <= 0) then "yellow"
	    else if (x mod 2 <= 0) then "lightblue"
	    else "white";;

<table border="1">
  select <tr> select <td align="right" style=("background:"@bg(x,y)) >[ (x*y) ]
              from y in [1 2 3 4 5 6 7 8 9 10] : [1--10*] 
  from x in [1 2 3 4 5 6 7 8 9 10] : [1--10*];;
 

The result is the xhtml code that generates the following table:

12345678910
2468101214161820
36912151821242730
481216202428323640
5101520253035404550
6121824303642485460
7142128354249566370
8162432404856647280
9182736455463728190
102030405060708090100

You can cut and paste the code on this page and test it on the online interpreter.