Thèses

  • Langage de combinateurs pour XML: Conception, Typage, Implantation. Kim Nguyen . Thèse en informatique de l'Université Paris-Sud 11
    Abstract:

    We study in this thesis the design of a programming language for XML. Many exist- ing works tackle this problem: main-stream languages coupled with XML libraries (C, Java, . . . ), "standardized" languages for XML (XSLT, XQuery, XPath), specialized languages (XDuce, CDuce, XTatic, . . . ) and lastly, restricted formalisms (k-peeble tree transducers, macro-tree transducers, regular expression filters). There are many flaws in these current approaches : main-stream and standardized languages do not take XML types into account (they are very lightly typed at best), specialized lan- guages suffers from a complex type-system (requiring the programmer to heavily annotate his/her code), do not feature type inference and provide poor code modu- larity. Restricted formalisms, which seem to best fulfill our expectations are neither practical (macro-tree transducers, k-peeble tree transducers) nor sufficiently expres- sive (regular expression filters cannot encode XPath expression for instance). Our solution consists of a restricted language of combinators, dubbed filters, which features a precise typing policy, a type inference algorithm and are powerful enough to express complex XML transformations (XPath encoding, XSLT transfor- mations, . . . ). The first part of this thesis is devoted to the theoretical definition of the core language, its semantics and its type system as well as a type inference algorithm. In a second part, we study practical aspects of this language such as its embedding in a full-fleged language (CDuce) and give an encoding of a non-trivial fragment of XPath into filters. We also show how to use the typing informations to optimize the loading of an XML document.

  • Logics for unranked and unordered trees and their use for querying semistructured data . Iovka Boneva . PhD thesis. Université des Sciences et Technologies de Lille - Lille 1
    Abstract:

    Trees are said to be unordered if the order between successor nodes of a given node is not important. Unranked trees are trees which do not have a priori bound on the number of successors of each node of the tree. Unranked and unordered trees are a possible model for semistructured data. The advantage of this model is that it is less dependant on the representation of semistructured data as XML documents, and the drowback is that formalisms (tree automata, logics) for manipulating this kind of trees are generally related with worse complexity results compared with similar formalisms for unranked and ordered trees. Our aim is a theoretical study of some logics for unranked and unordered trees, espetially from the point of view of using it as query languages for semistructured data. Thus, we are interested in the model checking problem and its complexity, in the satisfiability problem and in caracterising the expressive power of these logics. The model checking problem is, given a tree and a logic formula, to decide whether the trees satisfies the formula; model checking is the mecanism used for query evaluation. The satisfiability problem is, given a logic formula, to decide whether this formula admits some models; satisfiability can be used for query optimisation. Finally, caracterisation of the expressive power of a logic is a caracterisation of the kind of properties it can express.

  • Requêtes Graphes . Denis Debarbieux . PhD thesis. Université des Sciences et Technologies de Lille - Lille 1
    Abstract:

  • Automates d'arbres à jetons. Mathias Samuelides . Thèse en informatique de l'Université Paris 7
    Abstract:

  • Model-checking pour les ambients : des algèbres de processus aux données semi-structurées . Jean-Marc Talbot . Habilitation à diriger les recherches. Université des Sciences et Technologies de Lille - Lille 1
    Abstract:

    Le problème de model-checking pour une logique L et une classe de structures C est de savoir si une formule (close) donnée de L est vraie pour une certaine structure de C. Ce problème revêt différentes formes selon la thématique dans laquelle il est étudié : dans le cadre de la vérification, il s'agit de tester si un système (ou une abstraction de celui-ci) vérifie une spécification logique. Dans le cadre des bases de données relationnelles ou semi-structurées, ce problème, appelé également requêtes booléennes, consiste à vérifier que la base de données satisfait un certain nombre de contraintes, comme, par exemple, la conformité vis-à-vis d'un schéma XML. Dans tous les cas, lorsqu'on considère un problème de model-checking, la décidabilité de celui-ci et le cas échéant, sa complexité sont des questions cruciales. Dans ce mémoire, nous résumons nos activités de recherches sur le problème de model-checking, problème que nous avons étudié dans le cadre restreint du calcul des ambients. Cette étude a été effectuée avec une double approche, d'un cêté, dans l'optique "programmation - vérification pour les algèbres de processus" et de l'autre, dans celle de "base de données - interrogation de données semi-structurées". Le calcul des ambients est une algèbre de processus visant à modéliser l'informatique mobile; ce calcul est basé sur une notion de domaines hiérarchiquement organisés, hiérarchie qui évolue au cours de l'exécution du système. Nous nous sommes intéressés au problème de vérification du calcul des ambients pour des spécifications écrites dans une logique dédiée appelée logique des ambients. Pour divers fragments du calcul et de la logique, nous avons établi des bornes fines concernant la décidabilité et la complexité du problème de model-checking. Nous montrons en particulier qu'en présence de réplication, le problème de model-checking est indécidable, y compris pour des petits fragments de la logique et du calcul. Débarrassés de leurs mécanismes d'exécution, les processus des ambients se résument à leurs structures hiérarchiques correspondant à des données arborescentes. Cette vision du calcul et de la logique associée a donné naissance au langage TQL (Tree Query Language) d'interrogation de données semi-structurées. Nous avons mené une étude formelle de cette logique TQL, en particulier concernant les problèmes de model-checking et de satisfiabilité (l'existence d'une structure de la classe C qui est modèle d'une formule donnée de la logique L). Nous avons également mesuré l'expressivité de cette logique en la comparant notamment à la logique monadique du second-ordre et à certaines de ses extensions. Ces travaux ont été menés au sein de l'équipe STC du LIFL. De plus, les recherches sur la validation et l'interrogation de données semi-structurées s'inscrivent dans la thématique du projet MOSTRARE de l'INRIA Futurs.

  • Langages de requêtes pour XML à base de patterns : conception, optimisation et implantation. Cédric Miachon . Thèse en informatique de l'Université Paris-Sud 11
    Abstract:

    Dans les dernières années XML est devenu un véritable modèle de bases de données permettant de représenter, stocker et échanger des données semi-structurées. Il est devenu alors nécessaire de développer des langages de requêtes efficaces pour ce modèle. Différents langages de requêtes existent qui utilisent une primitive de déconstruction dans le but de capturer des parties de documents XML, qui peuvent êtes vus comme des arbres. Il existe deux déconstructeurs : (1). la navigation par chemins qui permet de naviguer en profondeur (par des projections) à l'intérieur d'un arbre afin de capturer un sous-arbre (XPath), (2). le filtrage par motifs qui permet de capturer en largeur différents sous-arbres (XDuce, CDuce). L'objectif de cette thèse est d'offrir au langage CDuce un langage de requêtes déclaratif, qui puisse tirer parti du typage fort et statique de CDuce. Ce langage de requêtes (appelé CQL) est formellement défini et permet d'utiliser et de combiner en une requête les deux déconstructeurs, dans le but d'écrire des requêtes concises et expressives. Partant du postulat que le filtrage par motifs est plus performant que la navigation descendante, nous donnons une traduction optimisante qui réécrit toutes les projections d'une requête en motifs. Cette traduction et d'autres optimisations ont été validées par des jeux de tests et des "microbenchmarks", ainsi que comparées avec d'autres moteurs de requêtes. L'écriture de requêtes avec motifs pouvant être laborieux pour un utilisateur non averti, une interface graphique (appelée PBE) est proposée qui permet de faciliter cette écriture en étant guidée par les types de la DTD.

  • XML Access Modules: Towards Physical Data Independence in XML Databases . Andrei Arion . Thèse en informatique de l'Université Paris-Sud 11
    Abstract:

    The purpose of this thesis is to design a framework for achieving the goal of physical data independence in XML databases. We first propose the XML Access Modules - a rich tree pattern language featuring multiple returned nodes, nesting, structural identifiers and optional nodes, and we show how it can be used to uniformly describe a large set of XML storage schemes, indices and materialized views.
    A second part of this thesis focuses on the problem of XQuery rewriting using XML Access Modules. As a first step of our rewriting approach we present an al- gorithm to extract XML Access Modules patterns from XQuery and we show that the patterns we identify are strictly larger than in previous works, and in particular may span over nested XQuery blocks. We characterize the complexity of tree pattern containment (which is a key subproblem of rewriting) and rewriting itself, under the constraints expressed by a structural summary, whose enhanced form also entails integrity constraints. We also show how to exploit the structural identifiers from the view
    definitions in order to enhance the rewriting opportunities. We have implemented our framework in the ULoad prototype and we evalu- ate the performance of our approach for query containment and rewriting.

Publications

  • Semantic Subtyping: dealing set-theoretically with function, union, intersection, and negation types. A. Frisch, G. Castagna, and V. Benzaken. . Journal of the ACM 55(4):1-64, 2008.
    Abstract:

    Subtyping relations are usually defined either syntactically by a formal system or semantically by an interpretation of types into an untyped denotational model. This work shows how to define a subtyping relation semantically in the presence of Boolean connectives, functional types and dynamic dispatch on types, without the complexity of denotational models, and how to derive a complete subtyping algorithm.

  • Path Summaries and Path Partitioning in Modern XML Databases. Andrei Arion, Angela Bonifati, Ioana Manolescu, and Andrea Pugliese. Dans World Wide Web Journal, volume 11, no 1, february 2008
    Abstract:

    XML path summaries are compact structures representing all the simple parent-child paths of an XML document. Such paths have also been used in many works as a basis for partitioning the document's content in a persistent store, under the form of path indices or path tables. We revisit the notions of path summaries and path-driven storage model in the context of current-day XML databases. This context is characterized by complex queries, typically expressed in an XQuery subset, and by the presence of efficient encoding techniques such as structural node identifiers. We review a path summary's many uses for query optimization, and given them a common basis, namely relevant paths. We discuss summary-based tree pattern minimization and present some efficient summary-based minimization heuristics. We consider relevant path computation and provide a time- and memory-effi cient computation algorithm. We combine the principle of path partitioning with the presence of structural identifiers in a simple path-partitioned storage model, which allows for selective data access and efficient query plans. This model improves the effi ciency of twig query processing up to two orders of magnitude over the similar tag-partitioned indexing model. We have implemented the path-partitioned storage model and path summaries in the XQueC compressed database prototype [8]. We present an experimental evaluation of a path summary's practical feasibility and of tree pattern matching in a path-partitioned store.

  • XPath, transitive closure logic, and nested tree walking automata. Balder ten Cate , and Luc Segoufin . In the ACM Principle of Databases Systems(PODS) 2008
    Abstract:

    We consider the navigational core of XPath, extended with two operators: the Kleene star for taking the transitive closure of path expressions, and a subtree relativisation operator, allowing one to restrict attention to a specific subtree while evaluating a subexpression. We show that the expressive power of this XPath dialect equals that of \fotc, first order logic extended with unary transitive closure. We also give a characterization in terms of nested tree-walking automata. Using the latter we then proceed to show that the language is strictly less expressive than MSO. This solves an open question about the relative expressive power of FO+TC1 and MSO on trees. We also investigate the complexity for our XPath dialect. We show that query evaluation be done in polynomial time (combined complexity), but that satisfiability and query containment (as well as emptiness for our automaton model) are 2ExpTime-complete (it is ExpTime-complete for Core XPath).

  • Complexity of pebble tree-walking automata . Mathias Samuelides , and Luc Segoufin . In the Fundamentals of Computation Theory (FCT) 2007
    Abstract:

    We consider tree-walking automata using k pebbles. The pebbles are either strong (can be lifted from anywhere) or weak (can be lifted only when the automaton is on it). For each k, we give the precise complexities of the problems of emptiness and inclusion of tree-walking automata using k pebbles.

  • Constant-memory validation of streaming XML documents against DTDs . Cristina Sirangelo , and Luc Segoufin . In the Int'l Conference on Database Theory (ICDT) 2007
    Abstract:

    In this paper we investigate the problem of validating, with constant memory, streaming XML documents with respect to a DTD. Such constant memory validations can only be performed for some but not all DTDs. This paper gives a non trivial interesting step towards characterizing those DTDs for which a constant-memory on-line algorithm exists.

  • Expressive power of pebble automata . Mikolaj Bojanczyk , Mathias Samuelides , Thomas Schwentick , and Luc Segoufin . In the Int'l Colloquium on Automata, Languages and Programming (ICALP) 2006
    Abstract:

    Two variants of pebble tree-walking automata on trees are considered that were introduced in the literature. It is shown that for each number of pebbles, the two models have the same expressive power both in the deterministic case and in the nondeterministic case. Furthermore, nondeterministic (resp. deterministic) tree-walking automata with n+1 pebbles can recognize more languages than those with n pebbles. Moreover, there is a regular tree language that is not recognized by any tree-walking automaton with pebbles. As a consequence, FO+posTC is strictly included in MSO over trees.

  • Structured Materialized Views for XML Queries. Andrei Arion, Veronique Benzaken, Ioana Manolescu, and Yannis Papakonstantinou. Proceedings of the Very Large Databases (VLDB) Conference, 2007
    Abstract:

    The performance of XML database queries can be greatly enhanced by rewriting them using materialized views. We study the problem of rewriting a query using materialized views, where both the query and the views are described by a tree pattern language, appropriately extended to capture a large XQuery subset. The pattern language features optional nodes and nesting, allowing to capture the data needs of nested XQueries. The language also allows describing storage features such as structural identifiers, which enlarge the space of rewritings. We study pattern containment and equivalent rewriting under the constraints expressed in a structural summary, whose enhanced form also entails integrity constraints. Our approach is implemented in the ULoad [7] prototype and we present a performance analysis.

  • XQueC: A Query-Conscious XML Database. Andrei Arion, Angela Bonifati, Ioana Manolescu, and Andrea Pugliese. In ACM Transactions on Internet Technologies, vol.7, no.4, november 2007
    Abstract:

    XML compression has gained prominence recently because it counters the disadvantage of the "verbose" representation XML gives to data. In many applications, such as data exchange and data archiving, entirely compressing and decompressing a document is acceptable. In other applications, where queries must be run over compressed documents, compression may not be benefficial since the performance penalty in running the query processor over compressed data outweighs the data compression benefits. While balancing the interests of compression and query processing has received significant attention in the domain of relational databases, these results do not immediately translate to XML data. In this paper, we address the problem of embedding compression into XML databases without degrading query performance. Since the setting is rather different from relational databases, the choice of compression granularity and compression algorithms must be revisited. Query execution in the compressed domain must also be rethought in the framework of XML query processing, due to the richer structure of XML data. Indeed, a proper storage design for the compressed data plays a crucial role here. The XQueC system (standing for XQuery Processor and Compressor) covers a wide set of XQuery queries in the compressed domain, and relies on a workload-based cost model to perform the choices of the compression granules and of their corresponding compression algorithms. As a consequence, XQueC provides efficient query processing on compressed XML data. An extensive experimental assessment is presented, showing the effectiveness of the cost model, the compression ratios and the query execution times.

  • Algebra-Based Tree Pattern Extraction in XQuery. Andrei Arion, Veronique Benzaken, Ioana Manolescu, Yannis Papakonstantinou, and Ravi Vijay. Proceedings of the Flexible Query Answering Systems (FQAS), 2006
    Abstract:

    Query processing performance in XML databases can be greatly enhanced by the usage of materialized views whose content has been stored in the database. This requires a method for identifying query subexpressions matching the views, a process known as view-based query rewriting. This process is quite complex for relational databases, and all the more daunting on XML databases. Current XML materialized view proposals are based on tree patterns, since query navigation is conceptually close to such patterns. However, the existing algorithms for extracting tree patterns from XQuery do not detect patterns across nested query blocks. Thus, complex, useful tree pattern views may be missed by the rewriting algorithm. We present a novel tree pattern extraction algorithm from XQuery queries, able to identify larger patterns than previous methods. Our algorithm has been implemented in an XML database prototype.

  • Semantic sub typing for the π-calculus. G. Castagna, R. De Nicola, and D. Varacca. Theoretical Computer Science , 2007. Special issue in honor of Mario Coppo, Mariangiola Dezani-Ciancagli ni and Simona Ronchi della Rocca. To appear.
    Abstract:

    Subtyping relati ons for the π-calculus are usually defined in a syntactic way, by means of structural rules. We propose a semantic characterisation of channel types and use it to derive a subtyping relation. The type system we consider includes read-only and write-only channel types, as well as boolean combinations of types. A set-theoretic interpretation of types is provided, in which boolean combinations of types are interpreted as the corresponding set-theoretic operations. Subtyping is defined as inclusion of the interpretations. We prove the decidability of the subtyping relation and sketch the subtyping algorithm. In order to fully exploit the type system, we define a variant of the π-calculus where communication is subjected to pattern matching that performs dynamic typecase.

  • Typed Iterators for XML. G. Castagna, and K. Nguyen. Dans ICFP '08, ACM-SIGPLAN Conference on Functional Programming XML, septembre, 2008.
    Abstract:

    XML transformations are very sensitive to types: XML types describe the tags and attributes of XML elements as well as the number, kind, and order of their sub-elements. Therefore, even very basic operations such as changing a tag, renaming an attribute, or adding an element generally imply conspicuous changes from the type of the input to the type of the output. Such operations are applied on XML documents by iterators that, to be useful, need to be typed by some kind of polymorphism that goes beyond what currently exists. For this reason these iterators are not programmed but, rather, hard-coded in the language. However, this approach soon reaches its limits, as the hard-coded iterators cannot cover fairly standard usage scenarios. As a solution to this problem we propose a generic language to define iterators for XML data to be grafted on some host programming language. We show that our language mostly offers the required degree of polymorphism, study its formal properties, and show its expressiveness and practical impact by providing several usage examples and encodings.

  • A Theory of Contracts for Web Services. G. Castagna, N. Gesbert, and L. Padovani. Dans POPL '08, 35th ACM Symposium on Principles of Programming Languages, janvier, 2008.
    Abstract:

    Contracts are behavioural descriptions of Web services. We devise a theory of contracts that formalises the compatibility of a client to a service, and the safe replacement of a service with another service. The use of contracts statically ensures the successful completion of every possible interaction between compatible clients and services. The technical device that underlies the theory is the definition of filters, which are explicit coercions that prevent some possible behaviours of services and, in doing so, they make services compatible with different usage scenarios. We show that filters can be seen as proofs of a sound and complete subcontracting deduction system which simultaneously refines and extends Hennessy's classical axiomatisation of the must testing preorder. The relation is decidable and the decision algorithm is obtained via a cut-elimination process that proves the coherence of subcontracting as a logical system. Despite the richness of the technical development, the resulting approach is based on simple ideas and basic intuitions. Remarkably, its application is mostly independent of the language used to program the services or the clients. We also outline the possible practical impact of such a work and the perspectives of future research it opens.

  • Pattern by Example: type-driven visual programming of XML queries. V. Benzaken, G. Castagna, D. Colazzo, and C. Miachon. Dans PPDP '08, ACM Conference on Principle and Practice of Declarative Programming, 2008.
    Abstract:

    We present Pattern-by-Example (PBE), a graphical language that allows users with little or no knowledge of pattern-matching and functional programming to define complex and optimized queries on XML documents. We demonstrate the key features of PBE by commenting an interactive session and then we present its semantics, by formally defining a translation from PBE graphical queries into CQL ones. The advantages of the approach are twofold. First, it generates queries that are provably correct with respect to types: the type of the result is displayed to the user and this constitutes a first and immediate visual check of the semantic correctness of the resulting query. The second advantage is that a semantics formally―thus, unambiguously―defined is an important advancement over some current approaches in which standard usage and learning methods are based on ``trial and error'' techniques

  • Testing XML constraint satisfiability. N. Bidoit, and D. Colazzo. Electronic Notes in Theoretical Computer Science, Volume 174(6):45-61, 2007
    a full version of : International Workshop on Hybrid Logic 2006 (HyLo)colocated with LICS 2006.
    Abstract:

    In a previous paper, we have showed that Hybrid Modal Logic can be successfully used to model \semi\/ data and provides a simple and well suited formalism for capturing ``well typed'' references and of course a powerful language for expressing constraint. This paper builds on the previous one and provides a tableau proof technique for constraint satisfiability testing in the presence of schemas.

  • Capturing well typed references in DTDs. . N. Bidoit, and D. Colazzo. Proceedings of the Journées Bases de données, 2006.
    A full version entitled Hybrid Logic for expressing XML schemas with typed references is under submission.
    Abstract:

    Surprisingly enough, there has been few investigations for typing references of semidata and XML documents. This paper build on a previous proposal [BiCeTh04] introducing simple schemas with well-typed references and showing that such schemas, called normalized ref-schemas, are expressible as formulas of Hybrid Modal Logic. The aim of the present paper is to extend normalized ref-schemas in order to allow one for general regular expressions and provide a fully general notion of schema capturing well-typed references, called ref-schemas. The main contribution of the paper is to show that ref-schemas are still expressible in Hybrid Modal Logic which entails that tools like for instance the tableau system developed in [hylo] can be used in order to check for constraint satisfiability in presence of ref-schemas.

  • Type-Based XML Projection. V. Benzaken, G. Castagna, D. Colazzo, and K. Nguyen. Dans VLDB 2006, 32nd International Conference on Very Large Data Bases, pag. 271-282, 2006.
    Abstract:

    XML data projection (or pruning) is one of the main optimization techniques recently adopted in the context of main-memory XML query-engines. The underlying idea is quite simple: given a query Q over a document D, the subtrees of D not necessary to evaluate Q are pruned, thus obtaining a smaller document D'. Then Q is executed over D', hence avoiding to allocate and process nodes that will never be reached by navigational specifications in Q. In this article, we propose a new approach, based on types, that greatly improves current solutions. Besides providing comparable or greater precision and far lesser pruning overhead our solution, unlike current approaches, takes into account backward axes, predicates, and can be applied to multiple queries rather than just to single ones. A side contribution is a new type system for XPath able to handle backward axes, which we devise in order to apply our solution. The soundness of our approach is formally proved. Furthermore, we prove that the approach is also complete (i.e., yields the best possible type-driven pruning) for a relevant class of queries and DTDs, which include nearly all the queries used in the XMark and XPathMark benchmarks. These benchmarks are also used to test our implementation and show and gauge the practical benefits of our solution.

  • Encoding ℂDuce into the ℂPi-calculus. G. Castagna, M. Dezani-Ciancaglini, and D. Varacca. Dans CONCUR 2006, 17th. International Conference on Concurrency Theory, n° 4137, LNCS, pag. 310-326, Springer, 2006.
    Abstract:

    CDuce is a functional programming language featuring overloaded functions and a rich type system with recursive types, subtyping, union, negation and intersection types. The boolean constructors have a set-theoretic behaviour defined via a semantic interpretation of the types. The Cpi-calculus is an extension of the pi-calculus that enriches Pierce and Sangiorgi pi-calculus subtyping with union, intersection, and negation types. It is based on the same set-theoretic interpretation as CDuce. In this work we present a type faithful encoding of the CDuce into the Cpi-calculus. This encoding is a modification of the Milner-Turner encoding of the lambda-calculus with subtyping into the pi-calculus with subtyping. The main difficulty to overcome was to find an encoding of the types that respects the subtyping relation. Besides the technical challenge, this effort is interesting because it sheds new light on the Milner-Turner encoding and on the relations between sequential and remote execution of functions/services, in particular in the presence of type-driven semantics. It also confirms the validity of the equational laws for union and intersection types in pi-calculus.

  • A formal account of contracts for Web Services. S. Carpineti, G. Castagna, C. Laneve, and L. Padovani. Dans WS-FM, 3rd Int. Workshop on Web Services and Formal Methods, n° 4184, LNCS, pag. 148-162, Springer, 2006.
    Abstract:

    The Web Service Description Language (WSDL) and the Web Service Conversation Language (WSCL) provide standard technologies for describing the static and dynamic interfaces of Web services. In particular, they permit the specification of the valid sequences of messages―the contract―required to interact successfully with a Web service. However, neither of them provides a formal definition of contract. We define a formal contract language along with conformance and compliance relations so that clients conforming to a contract are able to complete interactions with services compliant with the same contract. Our contract language is suitable for publication in a WSDL interface or as a WSCL resource. We take CCS as the formal model which our investigation is based upon.

  • Patterns and types for querying XML. G. Castagna. Dans Proceedings of DBPL 2005, 10th International Symposium on Database Programming Languages Lecture Notes in Computer Science, n.3774, pages 1-26 Springer (full version) and XSym 2005, 3rd International XML Database Symposium Lecture Notes in Computer Science n.3671, pages 1-3, Springer (summary), 2005. Joint DBPL-XSym invited talk.
    Abstract:

    Among various proposals for primitives for deconstructing XML data two approaches seem to clearly stem from practice: path expressions, widely adopted by the database community, and regular expression patterns, mainly developed and studied in the programming language community. We think that the two approaches are complementary and should be both integrated in languages for XML, and we see in that an opportunity of collaboration between the two communities. With this aim, we give a presentation of regular expression patterns and the type systems they are tightly coupled with. Although this article advocates a construction promoted by the programming language community, we will try to stress some characteristics that the database community, we hope, may find interesting.

  • A gentle introduction to semantic subtyping. G. Castagna, and A. Frisch. Dans Proceedings of PPDP '05, the 7th ACM SIGPLAN International Symposium on Principles and Practice of Declarative Programming, pages 198-208, ACM Press (full version) and ICALP '05, 32nd International Colloquium on Automata, Languages and Programming, Lecture Notes in Computer Science n. 3580, pages 30-34, Springer (summary), juillet, 2005. Joint ICALP-PPDP keynote talk.
    Abstract:

    Subtyping relations are usually defined either syntactically by a formal system or semantically by an interpretation of types into an untyped denotational model. In this work we show step by step how to define a subtyping relation semantically in the presence of functional types and dynamic dispatch on types, without the complexity of denotational models, and how to derive a complete subtyping algorithm. It also provides a recipe to add set-theoretic union, intersection, and negation types to your favourite language. The presentation is voluntarily kept informal and discursive and the technical details are reduced to a minimum since we rather insist on the motivations, the intuition, and the guidelines to apply the approach.

  • Semantic subtyping: challenges, perspectives, and open problems. G. Castagna. Dans ICTCS 2005, Italian Conference on Theoretical Computer Science, n° 3701, Lecture Notes in Computer Science, pag. 1―20, Springer, 2005.
    Abstract:

    Semantic subtyping is a relatively new approach to define subtyping relations where types are interpreted as sets and union, intersection and negation types have the corresponding set-theoretic interpretation. In this lecture we outline the approach, give an aperçu of its expressiveness and generality by applying it to the λ-calculus with recursive and product types and to the π-calculus. We then discuss in detail the new challenges and research perspectives that the approach brings forth.

  • Error Mining for Regular Expression Patterns. G. Castagna, D. Colazzo, and A. Frisch. Dans ICTCS 2005, Italian Conference on Theoretical Computer Science, n° 3701, Lecture Notes in Computer Science, pag. 160-172, Springer, 2005.
    Abstract:

    In the design of type systems for XML programming languages based on regular expression types and patterns the focus has been over result analysis, with the main aim of statically checking that a transformation always yields data of an expected output type. While being crucial for correct program composition, result analysis is not sufficient to guarantee that patterns used in the transformation are correct. In this paper we motivate the need of static detection of incorrect patterns, and provide a formal characterization based on pattern matching operational semantics, together with locally exact type analysis techniques to statically detect them.

  • Semantic subtyping for the pi-calculus. G. Castagna, R. De Nicola, and D. Varacca. LICS 2005, IEEE Symposium on Logic in Computer Science.
    Abstract:

    Subtyping relations for the pi-calculus are usually defined in a syntactic way, by means of structural rules. We propose a semantic characterisation of channel types and use it to derive a subtyping relation. The type system we consider includes read-only and write-only channel types, product types, recursive types, as well as unions, intersections, and negations of types which are interpreted as the corresponding set-theoretic operations. We prove the decidability of the subtyping relation, formally describe the subtyping algorithm, and use the techniques developped for the decidability of subtyping to prove the decidability of the typing relation.

    In order to fully exploit the expressiveness of the new type system (which subsumes several existing ones), we endow the pi-calculus with structured channels where communication is subjected to pattern matching that performs dynamic typecase. This paves the way toward a novel integration of functional and concurrent features, obtained by combining the pi-calculus with CDuce, a functional programming language for XML manipulation that is based on semantic subtyping.

  • Parametric polymorphism for XML. H. Hosoya, A. Frisch, and G. Castagna. In The 32nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2005.
    Abstract:

    Although several type systems have been investigated for XML, parametric polymorphism is rarely treated. This well-established typing discipline can also be useful in XML processing in particular for programs involving "parametric schemas," i.e., schemas parameterized over other schemas (e.g., SOAP). The difficulty in treating polymorphism for XML lies in how to extend the "semantic" approach used in the mainstream (monomorphic) XML type systems. A naive extension would be "semantic" quantification over all substitutions for type variables. However, this approach reduces to an NEXPTIME-complete problem for which no practical algorithm is known, and, moreover, the type system obtained in this way is characterized by a hardly useful expressiveness. In this paper, we propose a lighter-weight way of extending the semantic approach, where we interpret type variables as markings in data values for indicating the parameterized subparts. As a result, we can construct a sensible polymorphic type system both with a semantic flavor and a set of practical algorithms needed for typechecking. Most of these algorithms can be obtained by local modifications to existing ones for a monomorphic system.

  • A Full Pattern-based Paradigm for XML Query Processing. V. Benzaken, G. Castagna, and C. Miachon. Proceedings of the 7th International Symposium on Practical Aspects of Declarative Languages, LNCS 3350, 2005.
    Abstract:

    In this article we investigate a novel execution paradigm---ML-like pattern-matching---for XML query processing and use it to extend the CDuce language with query capabilities. We show that such a paradigm is well adapted for a common and frequent set of queries and advocate that it constitutes a candidate for efficient execution of XML queries far better than the current XPath-based query mechanisms. We support our claim by comparing performances of XPath-based queries with pattern based ones, and by comparing the latter with the two efficiency-best XQuery processor we are aware of.

  • Schema-Guided Induction of Monadic Queries. Jérôme Champavère, Rémi Gilleron, Aurélien Lemay, and Joachim Niehren. 9th International Colloquium on Grammatical Inference.
    Abstract:
  • Classes of Tree Homomorphisms with Decidable Preservation of Regularity. Guillem Godoy, Sebastian Maneth, and Sophie Tison. Eleventh International Conference on Foundations of Software Science and Computation Structures.
    Abstract:
  • On Complexity of Model-Checking for the TQL Logic. Iovka Boneva, and Jean-Marc Talbot. 3rd IFIP International Conference on Theoretical Computer Science.
    Abstract:

    In this paper we study the complexity of the model-checking problem for the tree logic introduced as the basis for the query language TQL [Cardelli, Ghelli 01: Query Language Based on the Ambient Logic]. We define two distinct fragments of this logic: TL containing only spatial connectives and TLe containing spatial connectives and quantification. We show that the combined complexity of TL is PSPACE-hard. We also study data complexity of model-checking and show that it is linear for TL, hard for all levels of the polynomial hierarchy for TLe and PSPACE-hard for the full logic. Finally we devise a polynomial space model-checking algorithm showing this way that the model-checking problem for the TQL logic is PSPACE-complete.

  • Expressiveness of a spatial logic for trees. Iovka Boneva, Jean-Marc Talbot, and Sophie Tison. 20th Annual IEEE Symposium on Logic in Computer Science.
    Abstract:

    In this paper we investigate the quantifier-free fragment of the TQL logic proposed by Cardelli and Ghelli. The TQL logic, inspired from the ambient logic, is the core of a query language for semistructured data represented as unranked and unordered trees. The fragment we consider here, named STL, contains as main features spatial composition and location as well as a fixed point construct. We prove that satisfiability for STL is undecidable.We show also that STL is strictly more expressive that the Presburger monadic second-order logic (PMSO) of Seidl, Schwentick and Muscholl when interpreted over unranked and unordered edge-labeled trees. We define a class of tree automata whose transitions are conditioned by arithmetical constraints; we show then how to compute from a closed STL formula a tree automaton accepting precisely the models of the formula. Finally, still using our tree automata framework, we exhibit some syntactic restrictions over STL formulae that allow us to capture precisely the logics MSO and PMSO.

  • Habilitation. Model-checking pour les ambients : des algèbres de processus aux données semi-structurées. Jean-Marc Talbot. Habilitation à diriger les recherches. Université des Sciences et Technologies de Lille - Lille 1.
    Abstract:

    Le problème de model-checking pour une logique L et une classe de structures C est de savoir si une formule (close) donnée de L est vraie pour une certaine structure de C. Ce problème revêt différentes formes selon la thématique dans laquelle il est étudié : dans le cadre de la vérification, il s'agit de tester si un système (ou une abstraction de celui-ci) vérifie une spécification logique. Dans le cadre des bases de données relationnelles ou semi-structurées, ce problème, appelé également requêtes booléennes, consiste à vérifier que la base de données satisfait un certain nombre de contraintes, comme, par exemple, la conformité vis-à-vis d'un schéma XML. Dans tous les cas, lorsqu'on considère un problème de model-checking, la décidabilité de celui-ci et le cas échéant, sa complexité sont des questions cruciales. Dans ce mémoire, nous résumons nos activités de recherches sur le problème de model-checking, problème que nous avons étudié dans le cadre restreint du calcul des ambients. Cette étude a été effectuée avec une double approche, d'un cêté, dans l'optique programmation - vérification pour les algèbres de processus et de l'autre, dans celle de base de données - interrogation de données semi-structurées. Le calcul des ambients est une algèbre de processus visant à modéliser l'informatique mobile; ce calcul est basé sur une notion de domaines hiérarchiquement organisés, hiérarchie qui évolue au cours de l'exécution du système. Nous nous sommes intéressés au problème de vérification du calcul des ambients pour des spécifications écrites dans une logique dédiée appelée logique des ambients. Pour divers fragments du calcul et de la logique, nous avons établi des bornes fines concernant la décidabilité et la complexité du problème de model-checking. Nous montrons en particulier qu'en présence de réplication, le problème de model-checking est indécidable, y compris pour des petits fragments de la logique et du calcul. Débarrassés de leurs mécanismes d'exécution, les processus des ambients se résument à leurs structures hiérarchiques correspondant à des données arborescentes. Cette vision du calcul et de la logique associée a donné naissance au langage TQL (Tree Query Language) d'interrogation de données semi-structurées. Nous avons mené une étude formelle de cette logique TQL, en particulier concernant les problèmes de model-checking et de satisfiabilité (l'existence d'une structure de la classe C qui est modèle d'une formule donnée de la logique L). Nous avons également mesuré l'expressivité de cette logique en la comparant notamment à la logique monadique du second-ordre et à certaines de ses extensions. Ces travaux ont été menés au sein de l'équipe STC du LIFL. De plus, les recherches sur la validation et l'interrogation de données semi-structurées s'inscrivent dans la thématique du projet MOSTRARE de l'INRIA Futurs.

  • Automata and Logics for Unranked and Unordered Trees. Iovka Boneva, and Jean-Marc Talbot. 20th International Conference on Rewriting Techniques and Applications.
    Abstract:

    In this paper, we consider the monadic second order logic (MSO) and two of its extensions, namely Counting MSO (CMSO) and Presburger MSO (PMSO), interpreted over unranked and unordered trees. We survey classes of tree automata introduced for the logics PMSO and CMSO as well as other related formalisms; we gather results from the literature and sometimes clarify or fill the remaining gaps between those various formalisms. Finally, we complete our study by adapting these classes of automata for capturing precisely the expressiveness of the logic MSO.

  • Requêtes Graphes. Denis Debarbieux. Journées sur les documents multistructurés.
    Abstract:
  • Interactive Learning of Node Selecting Tree Transducers. Julien Carme , Rémi Gilleron, Aurélien Lemay , and Joachim Niehren. ML.
    Abstract:

    We develop new algorithms for learning monadic node selection queries in unranked trees from annotated examples, and apply them to visually interactive Web information extraction. We propose to represent monadic queries by bottom-up deterministic Node Selecting Tree Transducers (NSTTs), a particular class of tree automata that we introduce. We prove that deterministic NSTTs capture the class of queries definable in monadic second order logic (MSO) in trees, which Gottlob and Koch (2002) argue to have the right expressiveness for Web information extraction, and prove that monadic queries defined by NSTTs can be answered efficiently. We present a new polynomial time algorithm in RPNI-style that learns monadic queries defined by deterministic NSTTs from completely annotated examples, where all selected nodes are distinguished. In practice, users prefer to provide partial annotations. We resolve this by intelligent tree pruning heuristics. We introduce pruning NSTTs - a formalism that shares many advantages of NSTTs. This leads us to an interactive learning algorithm for monadic queries defined by pruning NSTTs, which satisfies a new formal active learning model in the style of Angluin (1987). We have implemented our interactive learning algorithm and integrated it into a visually interactive Web information extraction system -- called Squirrel -- by plugging it into the Mozilla Web browser. Experiments on realistic Web documents confirm excellent quality with very few user interactions during wrapper induction.

  • Path Constraints in Semi-Structured Data. Yves André, Anne-Cécile Caron, Denis Debarbieux, Yves Roos, and Sophie Tison. TCS.
    Abstract:

    We consider semistructured data as multi rooted edge-labeled directed graphs, and path inclusion constraints on these graphs. A path inclusion constraint $p \preceq q$ is satisfied by a semistructured data if any node reached by the regular query $p$ is also reached by the regular query $q$. In this paper, two problems are mainly studied: the implication problem and the problem of the existence of a finite exact model. \beginitemize \item We give a new decision algorithm for the implication problem of a constraint $p \preceq q$ by a set of bounded path constraints $p_i \preceq u_i$ where $p$, $q$, and the $p_i$'s are regular path expressions and the $u_i$'s are words, improving in this particular case, the more general algorithms of S.~Abiteboul and V.~Vianu, and N.~Alechina et al. In the case of a set of word equalities $u_i \equiv v_i$, we provide a more efficient decision algorithm for the implication of a word equality $u \equiv v$, improving the more general algorithm of P.~Buneman et al.. We prove that, in this case, implication for non-deterministic models is equivalent to implication for (complete) deterministic ones. \item We introduce the notion of exact model: an exact model of a set of path constraints $\cal C$ satisfies the constraint $p \preceq q$ if and only if this constraint is implied by $\cal C$. We prove that any set of constraints has an exact model and we give a decidable characterization of data which are exact models of bounded path inclusion constraints sets. \enditemize

  • Composing Monadic Queries in Trees. Emmanuel Filiot, Joachim Niehren, Jean-Marc Talbot, and Sophie Tison. PLAN-X International Workshop.
    Abstract:

    Node selection in trees is a fundamental operation to XML databases, programming languages, and information extraction. We propose a new class of querying languages to define n-ary queries. The choice of the underlying monadic querying language is parametric. We show that compositions of monadic MSO-definable queries capture n-ary MSO-definable queries, and distinguish an MSO-complete n-ary query language that enjoys an efficient query answering algorithm.

  • Minimizing Tree Automata for Unranked Trees. Wim Martens, and Joachim Niehren. 10th International Symposium on Database Programming Languages.
    Abstract:

    Automata for unranked trees form a foundation for XML schemas, querying and pattern languages. We study the problem of efficiently minimizing such automata. We start with the unranked tree automata (UTAs) that are standard in database theory, assuming bottom-up determinism and that horizontal recursion is represented by deterministic finite automata. We show that minimal UTAs in that class are not unique and that minimization is NP-hard. We then study more recent automata classes that do allow for polynomial time minimization. Among those, we show that bottom-up deterministic stepwise tree automata yield the most succinct representations.

  • N-ary Queries by Tree Automata . Joachim Niehren, Laurent Planque, Jean-Marc Talbot, and Sophie Tison. 10th International Symposium on Database Programming Languages.
    Abstract:

    We investigate n-ary node selection queries in trees by successful runs of tree automata. We show that run-based n-ary queries capture MSO, contribute algorithms for enumerating answers of n-ary queries, and study the complexity of the problem. We investigate the subclass of run-based n-ary queries by unambiguous tree automata. Keywords: XML databases, logic, automata.

  • Indexes and path constraints in semistructured data. Yves Andre, Anne-Cecile Caron, Denis Debarbieux, and Yves Roos. DEXA Workshop on Logical Aspects and Applications of Integrity Constraints.
    Abstract:

    In this paper, we study semistructured data and indexes preserving inclusion constraints. A semistructured datum is modelled by multi-rooted edge-labeled directed graphs. We consider regular path queries and inclusion constraints other these data. These constraints are binary relations over regular path expressions q and r, and are interpreted on a datum as ``for this datum, the answer to query q is included in the answer to query r''. We study how to represent inclusion constraints that are common to several data. Our work is based on two existing indexes: dataguide and 1-index. Given a set of data S, we extract from the dataguide of S a finite set C(S) of (finite) inclusion constraints such that an inclusion constraint is satisfied by a datum of S if and only if it is implied by C(S). We use 1-index which are covering indexes preserving inclusion constraints. Experiments compare the different ways of using the 1-index to index a set of data.

  • Extraction and Implication of Path Constraints. Yves André, Anne-Cécile Caron, Denis Debarbieux, Yves Roos, and Sophie Tison. 29th Symposium on Mathematical Foundations of Computer Science.
    Abstract:

    We consider semistructured data as rooted edge-labeled directed graphs, and path inclusion constraints on these graphs. In this paper, we give a new decision algorithm for the implication problem of a constraint $p \preceq q$ by a set of constraints $p_i \preceq u_i$ where $p$, $q$, and the $p_i$'s are regular path expressions and the $u_i$'s are non-empty paths, improving in this particular case, the more general algorithms of S. Abiteboul and V. Vianu, and Alechina et al. Moreover, in the case of a set of word equalities $u_i \equiv v_i$, we give a more efficient decision algorithm for the implication of a word equality $u \equiv v$, improving the more general algorithm of P. Buneman et al., and we prove that, in this case, the implication problem for non-deterministic models or for (complete) deterministic models are equivalent.

  • Observational Semantics for a Concurrent Lambda Calculus with Reference Cells and Futures. Joachim Niehren , David Sabel , Manfred Schmidt-Schauß , and Jan Schwinghammer. 23rd Conference on Mathematical Foundations of Programming Semantics.
    Abstract:

    We present an observational semantics for lambda(fut), a concurrent lambda calculus with reference cells and futures. The calculus lambda(fut) models the operational semantics of the concurrent higher-order programming language Alice ML. Our result is a powerful notion of equivalence that is the coarsest nontrivial congruence distinguishing observably different processes. It justifies a maximal set of correct program transformations, and it includes all of lambda(fut)'s deterministic reduction rules, in particular, call-by-value beta reduction.

  • Satisfiability of a Spatial Logic with Tree Variables. Emmanuel Filiot , Jean-Marc Talbot , and Sophie Tison. 16th EACSL Annual Conference on Computer Science and Logic.
    Abstract:

    We investigate in this paper the spatial logic TQL for querying semi-structured data, represented as unranked ordered trees over an infinite alphabet. This logic consists of usual Boolean connectives, spatial connectives (derived from the constructors of a tree algebra), tree variables and a fixpoint operator for recursion. Motivated by XML-oriented tasks, we investigate the guarded TQL fragment. We prove that for closed formulas this fragment is MSO-complete. In presence of tree variables, this fragment is strictly more expressive than MSO as it allows for tree (dis)equality tests, i.e. testing whether two subtrees are isomorphic or not. We devise a new class of tree automata, called TAGED, which extends tree automata with global equality and disequality constraints. We show that the satisfiability problem for guarded TQL formulas reduces to emptiness of TAGED. Then, we focus on bounded TQL formulas: intuitively, a formula is bounded if for any tree, the number of its positions where a subtree is captured by a variable is bounded. We prove this fragment to correspond with a subclass of TAGED, called bounded TAGED, for which we prove emptiness to be decidable. This implies the decidability of the bounded guarded TQL fragment. Finally, we compare bounded TAGED to a fragment of MSO extended with subtree isomorphism tests. We investigate in this paper the spatial logic TQL for querying semi-structured data, represented as unranked ordered trees over an infinite alphabet. This logic consists of usual Boolean connectives, spatial connectives (derived from the constructors of a tree algebra), tree variables and a fixpoint operator for recursion.

  • Polynomial Time Fragments of XPath with Variables. Emmanuel Filiot , Joachim Niehren , Jean-Marc Talbot , and Sophie Tison. 26th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.
    Abstract:

    Variables in XPath 2.0 are fundamental for selecting n-tuples of nodes in trees. The navigational core of XPath 2.0 is known to capture first-order (FO) logic while being PSPACE complete with respect to model checking. In this paper, we distinguish a fragment of Core XPath 2.0 that we call the polynomial-time path language (PPL). We show that PPL remains FO-complete even though enjoying polynomial time query answering (and thus model checking).

  • BiRFSA languages and minimal NFAs. Michel Latteux, Yves Roos, and Alain Terlutte. to appear in RAIRO ITA.
    Abstract:

    In this paper, we define the notion of biRFSA which is a residual finate state automaton (RFSA) whose the reverse is also an RFSA. The languages recognized by such automata are called biRFSA languages. We prove that the canonical RFSA of a biRFSA language is a minimal NFA for this language and that each minimal NFA for this language is a sub-automaton of the canonical RFSA. This leads to a characterization of the family of biRFSA languages. In the second part of this paper, we define the family of biseparable automata. We prove that every biseparable NFA is uniquely minimal among all NFAs recognizing a same language, improving the result of H. Tamm and E. Ukkonen for bideterministic automata.

  • When Ambients Cannot be Opened. Iovka Boneva, and Jean-Marc Talbot. TCS.
    Abstract:

    We investigate expressiveness of a fragment of the ambient calculus, a formalism for describing distributed and mobile computations. More precisely, we study expressiveness of the pure and public ambient calculus from which the capability open has been removed, in terms of the reachability problem of the reduction relation. Surprisingly, we show that even for this very restricted fragment, the reachability problem is not decidable. At a second step, for a slightly weaker reduction relation, we prove that reachability can be decided by reducing this problem to markings reachability for Petri nets. Finally, we show that the name-convergence problem as well as the model-checking problem turn out to be undecidable for both the original and the weaker reduction relation.

  • A Concurrent Lambda Calculus with Futures. Joachim Niehren, Jan Schwinghammer, and Gert Smolka. TCS.
    Abstract:

    We introduce a new concurrent lambda calculus with futures, lambda(fut), to model the operational semantics of Alice, a concurrent extension of ML. lambda(fut) is a minimalist extension of the call-by-value lambda-calculus that yields the full expressiveness to define, combine, and implement a variety of standard concurrency constructs such as channels, semaphores, and ports. We present a linear type system for lambda(fut) by which the safety of such definitions and their combinations can be proved: Well-typed implementations cannot be corrupted in any well-typed context.

  • On the Minimization of XML Schemas and Tree Automata for Unranked Trees. Wim Martens, and Joachim Niehren. JCSS.
    Abstract:

    Automata for unranked trees form a foundation for XML schemas, querying and pattern languages. We study the problem of efficiently minimizing such automata. First, we study unranked tree automata that are standard in database theory, assuming bottom-up determinism and that horizontal recursion is represented by deterministic finite automata. We show that minimal automata in that class are not unique and that minimization is NP complete. Second, we study more recent automata classes that do allow for polynomial time minimization. Among those, we show that bottom-up deterministic stepwise tree automata yield the most succinct representations. Third, we investigate abstractions of ML schema languages. In particular, we show that the class of one-pass preorder typable schemas allows for polynomial time minimization and unique minimal models.

  • Logics for unranked and unordered trees and their use for querying semistructured data. Iovka Boneva. PhD thesis. Université des Sciences et Technologies de Lille - Lille 1.
    Abstract:

    Trees are said to be unordered if the order between successor nodes of a given node is not important. Unranked trees are trees which do not have a priori bound on the number of successors of each node of the tree. Unranked and unordered trees are a possible model for semistructured data. The advantage of this model is that it is less dependant on the representation of semistructured data as XML documents, and the drowback is that formalisms (tree automata, logics) for manipulating this kind of trees are generally related with worse complexity results compared with similar formalisms for unranked and ordered trees. Our aim is a theoretical study of some logics for unranked and unordered trees, espetially from the point of view of using it as query languages for semistructured data. Thus, we are interested in the model checking problem and its complexity, in the satisfiability problem and in caracterising the expressive power of these logics. The model checking problem is, given a tree and a logic formula, to decide whether the trees satisfies the formula; model checking is the mecanism used for query evaluation. The satisfiability problem is, given a logic formula, to decide whether this formula admits some models; satisfiability can be used for query optimisation. Finally, caracterisation of the expressive power of a logic is a caracterisation of the kind of properties it can express. We start from a spatial logic which has been introduced as the basis of a query language for semistructured data. This logic is quite expressive and has several kinds of operators. We consider different fragments of this logic. We identify several interesting fragments of the logic for which the satisfiability problem is decidable, when this problem is known to be undecidable for other fragments. We compare fragments of the logic with monadic second order logic on trees and extensions of it. Finally, studying the model checking problem and its complexity allowed us to propose a model checking algorithm and to establish several results on theoretic complexity of model checking for several fragments of the spatial logic.

  • Complexity of Earliest Query Answering with Streaming Tree Automata. Olivier Gauwin, Anne-Cécile Caron, Joachim Niehren, and Sophie Tison. ACM SIGPLAN Workshop on Programming Language Techniques for XML (PLAN-X). PLAN-X Workshop of ACM POPL.
    Abstract:

    We investigate the complexity of earliest query answering for n-ary node selection queries defined by streaming tree automata (STAs). We elaborate an algorithm that selects query answers upon reception of the shortest relevant prefix of the input tree on the stream. For queries defined by deterministic STAs, our algorithm runs in polynomial time combined complexity. In the general case, we show that earliest query answering is DEXPTIME-hard (even for n=0) and thus DEXPTIME-complete.

  • Monotone AC-Tree Automata. Hitoshi Ohsaki, Jean-Marc Talbot, Sophie Tison, and Yves Roos. 12th International Conference on Logic for Programming Artificial Intelligence and Reasoning.
    Abstract:

    This paper considers several questions about monotone AC-tree automata, a class of equational tree automata whose transition rules correspond to rules in Kuroda normal form of context-sensitive grammars. Whereas it has been proved that this class has a decision procedure to determine if, given a monotone AC-tree automaton, it accepts no terms, other important decidability or complexity results have not been well-investigated yet. In the paper, it is proved that the membership problem for monotone AC-tree automata is PSPACE-complete. The expressiveness of monotone AC-tree automata is also studied: precisely, it is proved that the family of AC-regular tree languages is strictly subsumed in that of AC-monotone tree languages. The proof technique used in obtaining the above result yields the answers to two different questions, specifically that the family of monotone AC-tree languages is not closed under complementation, and that the inclusion problem for monotone AC-tree automata is undecidable.

  • Efficient Inclusion Checking for Deterministic Tree Automata and DTDs. Jérôme Champavère, Rémi Gilleron, Aurélien Lemay, and Joachim Niehren. 2nd International Conference on Language and Automata Theory and Applications. To appear.
    Abstract:

    We present a new algorithm for testing language inclusion L(A) subset L(B) between tree automata in time O(|A|*|B|) where B is deterministic. We extend this algorithm for testing inclusion between automata for unranked trees A and deterministic DTDs D in time O(|A|*|Sigma|*|D|). No previous algorithms with these complexities exist.

  • Semantic Subtyping: Dealing set-theoretically with function, union, intersection, and negation types. Alain Frisch, Giuseppe Castagna, and Véronique Benzaken. JACM.
    Abstract:
  • XML Access Modules: Towards Physical Data Independence in XML Databases. A. Arion, V. Benzaken, and I. Manolescu. 2nd International Workshop on XQuery Implementation, Experience, and Perspectives (XIME-P), 2005
    Abstract:

    We introduce XML Access Modules (XAMs), a step towards physical data independence in XDBMSs. A XAM describes, in an algebraic-style formalism, the information contained in a persistent XML storage structure, which may be a storage module, an index, or a materialized view. The set of XAMs describing the storage is used by the optimizer to build data access plans. Using XAMs, a change to the storage (adding or removing a storage structure) is communicated to the optimizer simply by updating the XAM set. One of the most useful XAM features is the ability to model indexes whose ``keys'' and ``values'' may be complex combinations of XML structure and values. We present the algebraic XAM foundations, with a particular focus on its support for indexes.

  • ULoad: Choosing the Right Storage for your XML Application. A. Arion, V. Benzaken, and I. Manolescu. Proceedings of the 31st International Conference on Very Large Data Bases (VLDB), 2005.
    Abstract:

    We demonstrate ULoad, an XML storage tuning tool, which is a step towards achieving physical data independence for XML. ULoad is meant to help the database administrator (DBA) in choosing the persistent XML storage and indexing modules best suited for a given dataset, and workload, thus achieving our flexibility and extensibility requirements. Given a document to store, and a query workload, ULoad: allows the DBA to choose, customize, and apply some storage and indexing models, picked among a large set of existing ones; lets the DBA define her own specialized persistent structures; presents to the DBA the query execution plans (QEPs) for the workload queries, on the storage model chosen, and their costs; and may propose a storage adapted to the data and workload, based on a cost-driven search. At the core of ULoad lies a novel algebraic formalism (with a simple graphical representation), called XML Access Modules (XAMs), describing the information contained in a persistent XML storage structure.

  • XML Query Processing: Storage and Query Model Interplay. I. Manolescu. Tutorial at the EDBT Summer School, 2004. Also presented at the SBBD Conference, 2004.
    Abstract:
  • Path Sequences-Based XML Query Processing. I. Manolescu, A. Arion, A. Bonifati, and A. Pugliese. Presented at Journées de Bases de Données Avancées (BDA) 2004 (informal proceedings)
    Abstract: We present the path sequence storage model, a new logical model for storing XML documents. This model partitions XML data and content according to the document paths; and uses ordered sequences as logical and physical structures. Besides being very compact, this model has a strong impact on the efficiency of XQuery evaluation. Its main advantages are: fast and selective data access plans; intelligent combination of such plans, based on a summary of the document paths; and intelligent usage of a concise path summary. The model is implemented in the GeX system. We describe our physical storage scheme, query processing and optimization techniques. We demonstrate the performance advantages or our approach through a series of experiments.

Slides

Documents

  • Projet ACI- Masse de données. (Descriptif complet du projet).
    Abstract:

    Ce document donne une description complète et détaillé des aspects scientifiques et administratifs du projet

  • Projet ACI- Masse de données. (Rapport final du projet).
    Abstract:

    Ce document donne une description complète et détaillé des resultats obtenus par le projet

  • Projet ACI- Masse de données. (Évaluation du projet).
    Abstract:

    Ce document contient l'évaluation finale du projet

All pages of this site were automatically generated from an XML description of the content by the following CDuce program.

Powered by CDuce