</box>
</box>

<box title="Tralala" link="Tralala">
<box title="Tralala" link="Tralala">
<p>Tralala is ...</p>
<p>
This ACI is motivated by the increasing number of applications that
produce, consume or handle large sets of data, or
``\emph{datamasses}''. In many cases, these are either raw data or a
collection of data from various sources, both of which lack uniform
descriptive criteria. Such cases require more flexibility than the
classical relational model can provide, and have given rise to the
so-called semi-structured data model~\cite{serge99}, of
which XML is one of the most prominent examples.

Our project intends to study the processing, querying and handling of large
datamasses whenever data is available in XML format. We pay particular
attention to problems related to programming languages and query languages.
We aim to cover, in a uniform way, a wide spectrum of areas, namely:
{\bf programming languages} (expressiveness, typing, new programming
primitives, logics underlying queries, logical optimization), {\bf data
access\/} (streamed data, compression, access to secondary storage,
persistence engines), and {\bf implementation} (compilation of pattern
matching, physical optimization, subtype checking, execution models for
streamed data).

We will tackle these challenges along three research directions:

\begin{description}
\item[query languages:] one of the characteristics of the relational model
is that its query languages are based on the relational algebra or the
relational calculus. These paradigms are characterized by {\it high
declarativity\/} (in the sense that they describe the result rather than
the way to obtain it) and by limited expressiveness (notably, they are
not Turing complete). The ``simplicity'' of these languages underlies
their good performance, which can be further improved by exploiting the
algebraic properties of the operators (logical optimization) or by
secondary-memory management techniques (physical optimization). Our goal
is to develop a similar, or at least close, framework for the XML model,
and we will pursue it as follows: theoretical study of the expressiveness
and complexity of query languages; definition of query languages for XML
and their implementation; definition and validation of optimization
techniques.

\item[streaming:] the ability to process data streams without storing
entire documents (or by storing them only partially) is crucial in the
context of datamasses. We will also consider the aspects related to
streaming when the data is compressed. However, streamed query evaluation
is not always possible~\cite{segoufin1}, so one of the main difficulties
to overcome here is to identify a suitable class of ``streamable''
queries, with or without compression, and in the former case to
determine the optimal compression granularity.
\item[document typing:] type systems are used in the first place for
document validation and for checking integrity constraints, but, as
with standard programming languages, types are also the basis of many
useful optimizations. This makes the study of type systems one of
our primary objectives.

Another motivation for this line of work is our interest in integrity
constraints whose satisfaction does not depend on the order of
the fields in a document, unlike the constraints expressible in
``classical'' type systems for XML such as DTDs. This is a natural
choice when processing data originating from the fusion of several
relational databases (a frequent source of large documents), since
the order of the fields is then irrelevant.
\end{description} |
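The streaming direction can be illustrated with a minimal sketch, assuming
the simplest candidate class of ``streamable'' queries: child-axis paths
without predicates, such as \texttt{/catalog/item/name}. The sketch relies on
Python's standard \texttt{xml.sax} event-based parser, so no document tree is
ever built; memory is bounded by the nesting depth plus the text of the
current match. The names \texttt{PathMatcher} and \texttt{stream\_query} and
the sample document are hypothetical and serve only as illustration.

```python
import io
import xml.sax


class PathMatcher(xml.sax.ContentHandler):
    """Hypothetical illustration: streams an XML document and collects the
    text of every element whose root-to-node tag path equals `path`
    (a child-axis query without predicates, e.g. /catalog/item/name).
    Only the stack of open tags and the current match's text are kept."""

    def __init__(self, path):
        super().__init__()
        self.path = path.strip("/").split("/")
        self.stack = []      # tags of the currently open elements
        self.buffer = None   # text chunks of the element being matched, if any
        self.results = []

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self.stack == self.path:
            self.buffer = []  # start collecting text for this match

    def characters(self, content):
        # SAX may split text into several chunks; collect all of them.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        # The answer for this element is known as soon as it closes.
        if self.stack == self.path:
            self.results.append("".join(self.buffer))
            self.buffer = None
        self.stack.pop()


def stream_query(xml_bytes, path):
    """Evaluate a simple child-axis path over a byte stream in one pass."""
    handler = PathMatcher(path)
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.results


doc = (b"<catalog><item><name>a</name><price>1</price></item>"
       b"<item><name>b</name></item></catalog>")
print(stream_query(doc, "/catalog/item/name"))  # ['a', 'b']
```

Such a path query can be answered as soon as each matching element closes. By
contrast, a query that selects an element according to a sibling appearing
later in the document cannot be answered this way without buffering, which is
why a class of streamable queries has to be identified.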
The groups involved in our project have each already been working
separately on XML document handling, although this is only one of the
incentives for us to work together. Indeed, we share the same
fundamental theoretical approach, namely automata theory and the
associated logics, and the same interest in query languages and
document validation (typing and integrity constraints).

Beyond our shared foundational tools and common goals, cooperation
inside the project is further strengthened by the choice of a single
software target, the CDuce language~\cite{BCF02,CDuce}, a joint
development of LIENS and LRI, two of the sites involved in this
project.
</p>

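<p>As an illustration of the logical optimization mentioned under the
query-languages direction, recall one standard algebraic law of the
relational algebra. Writing $\mathrm{attr}(\cdot)$ (our notation) for the
set of attributes mentioned by a predicate or carried by a relation, a
selection whose predicate $p$ refers only to attributes of $R$ can be pushed
below a join:
\[
  \sigma_{p}(R \bowtie S) = \sigma_{p}(R) \bowtie S
  \qquad \text{if } \mathrm{attr}(p) \subseteq \mathrm{attr}(R).
\]
Rewriting the left-hand side into the right-hand side filters $R$ before the
join is computed, which usually reduces the size of the intermediate result;
rules of this kind are what a ``similar, or at least close, framework'' for
the XML model would have to provide.</p>
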
<p>More information about the project can be found in the following page on
<p>More information about the project can be found in the following page on