Thesis subject

Supervisors: Bernd AMANN and Dan VODISLAV

This thesis subject is proposed within the Wisdom Programme Pluriformation (PPF) and will be supervised by Bernd AMANN (Professor, LIP6) and Dan VODISLAV (maître de conférences, CNAM). The requested funding would cover 3 years of thesis work, for a total budget of XXX kE.

Subject: Models and applications for data syndication on the web

PhD subject

Supervisors: Bernd AMANN and Dan VODISLAV

This PhD thesis is proposed in the context of the Wisdom PPF and will be supervised by Bernd AMANN (Professor at LIP6) and Dan VODISLAV (Assistant Professor at CNAM). The requested funding would cover three years of PhD work, for a total budget of 90 kE [check].

The subject mainly addresses the topics of work packages 2 and 5: modeling RSS feeds as an extension of XML data with temporal, dynamic features, and creating tools and applications based on this model.

Dan: participation in other WPs

Subject: Models and applications for data syndication on the web

Context

To reduce the time it takes for information published on a web site to reach interested users, more and more web sites use web syndication techniques to publish their content. These techniques consist in publishing new information in the form of web feeds or blogs to which interested users actively subscribe. They reduce the publication lag of web information and allow users to build a personal information space that tracks the evolution of well-defined information sources.

Although web content syndication can be considered a new and efficient way of sharing information on the web, it also suffers from well-known problems related to the large scale of the web. The number of web feeds and blogs is constantly growing, which creates new issues in feed management and feed aggregation. Specialized web syndication portals such as Blastfeed.com, Plazoo.com and Technorati.com try to solve some of these problems by collecting and aggregating web feed data. One goal of these portals is to index feed data (as search engines do for standard web resources) using efficient refresh algorithms that reduce the publication lag mentioned above.

We propose to handle the new issues in RSS syndication by considering web content syndication as a large-scale distributed XML data management problem:

Goals and roadmap

The first goal is to define a formal XML-RSS data model and algebra combining the semantics of RSS, XML and RDF. In particular, the model should be able to represent

The starting point will be existing work on XML algebras [ZPR02,JLS+01,FFM+00], RDF algebras [FHVB02] and RDF query languages [KAC+02], from which we will define a new algebra that also takes into account the temporal properties and relationships of RSS metadata streams. This algebra will be the basis for the definition of an XML-RSS query language, as an extension of XQuery.
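To give an idea of what temporal operators over RSS item streams could look like, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the algebra this thesis would define: the names Item, parse_feed, select_window and temporal_union are hypothetical, the sample feeds are invented, and every item is assumed to carry a pubDate element (which RSS 2.0 leaves optional).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from heapq import merge
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Item:
    """One RSS item reduced to the fields this sketch needs (hypothetical type)."""
    feed: str
    title: str
    published: datetime


def parse_feed(feed_name: str, rss_xml: str) -> List[Item]:
    """Extract the items of an RSS 2.0 document, ordered by publication date.

    Assumption: every <item> has a <pubDate> in RFC 822 format.
    """
    root = ET.fromstring(rss_xml)
    items = [
        Item(feed_name,
             node.findtext("title", default=""),
             parsedate_to_datetime(node.findtext("pubDate")))
        for node in root.iter("item")
    ]
    return sorted(items, key=lambda it: it.published)


def select_window(items: Iterable[Item], start: datetime,
                  length: timedelta) -> Iterator[Item]:
    """Temporal selection: keep items with start <= published < start + length."""
    end = start + length
    return (it for it in items if start <= it.published < end)


def temporal_union(*streams: Iterable[Item]) -> Iterator[Item]:
    """Merge date-ordered streams into a single date-ordered stream."""
    return merge(*streams, key=lambda it: it.published)


# Invented sample feeds, used only to exercise the operators.
FEED_A = """<rss version="2.0"><channel><title>A</title>
<item><title>a1</title><pubDate>Mon, 01 Jan 2007 10:00:00 +0000</pubDate></item>
<item><title>a2</title><pubDate>Wed, 03 Jan 2007 09:00:00 +0000</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>B</title>
<item><title>b1</title><pubDate>Tue, 02 Jan 2007 08:00:00 +0000</pubDate></item>
<item><title>b2</title><pubDate>Tue, 02 Jan 2007 20:00:00 +0000</pubDate></item>
</channel></rss>"""

if __name__ == "__main__":
    merged = temporal_union(parse_feed("A", FEED_A), parse_feed("B", FEED_B))
    day = select_window(merged,
                        datetime(2007, 1, 2, tzinfo=timezone.utc),
                        timedelta(days=1))
    for item in day:
        print(item.feed, item.title, item.published.isoformat())
```

Both operators take and return iterables of items, so they compose freely (a window over a union, a union of windows), which is the kind of closure property an algebra would be expected to provide.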

The second goal is the definition of a framework for creating applications based on the XML-RSS data model. For instance, we consider applications that produce new RSS feeds and XML data by transforming input RSS feeds and XML data. In this context we consider two main issues:

References

[AAC+99] S. Abiteboul, B. Amann, S. Cluet, A. Eyal, L. Mignet, and T. Milo. Active views for electronic commerce. VLDB 1999.

[FFM+00] P. Fankhauser, M. Fernández, A. Malhotra, M. Rys, J. Siméon, P. Wadler. The XML Query Algebra. W3C Working Draft, 04 December 2000.

[FHVB02] F. Frasincar, G. Houben, R. Vdovjak, P. Barna, RAL: an Algebra for Querying RDF, WISE’02

[JLS+01] H. V. Jagadish, Laks V. S. Lakshmanan, Divesh Srivastava, et al. TAX: A Tree Algebra for XML. DBPL 2001.

[KAC+02] Gregory Karvounarakis, Sofia Alexaki, Vassilis Christophides, Dimitris Plexousakis, Michel Scholl: RQL: a declarative query language for RDF. WWW 2002: 592-603

[MAA+05] T. Milo, S. Abiteboul, B. Amann, O. Benjelloun, and F. Dang Ngoc. Exchanging intensional XML data. In SIGMOD, 2003. An extended version of this article has been published in ACM Transactions on Database Systems 30(1).

[PPV05] M. Petropoulos, Y. Papakonstantinou, V. Vassalos. Graphical query interfaces for semistructured data: the QURSED system. ACM Transactions on Internet Technology (TOIT), v.5 n.2, p.390-438, May 2005.

[VCC+06] D. Vodislav, S. Cluet, G. Corona and I. Sebei. Views for simplifying access to heterogeneous XML data. In CoopIS, pp. 72-90, Springer, 2006.

[Yah07] Yahoo! pipes, http://pipes.yahoo.com

[ZPR02] X. Zhang, B. Pielech, E.A. Rundensteiner. Honey, I shrunk the XQuery!: an XML algebra optimization approach. WIDM 2002.