Databases

LIP6 DB Web Site

roses:resultats [30/03/2015 15:31]
127.0.0.1 external edit
roses:resultats [11/01/2022 11:37] (current version)
amann
====== Results ======
Web 2.0 technologies have transformed the Web from a publishing-only environment into a vibrant information space where yesterday's end users have become today's content generators. Web syndication formats such as RSS and Atom have emerged as a popular means for the timely delivery of frequently updated Web content. Information publishers provide brief summaries of the content they deliver on the Web, called information items, while information consumers subscribe to a number of RSS/Atom feeds (i.e., streams) and are notified of newly published items. Almost every personal weblog, news portal, or discussion forum now uses RSS/Atom feeds to complement traditional pull-oriented searching and browsing of web pages with push-oriented delivery of web content. Social media applications such as Twitter and Facebook also rely on RSS to notify users about new posts from their preferred friends (or followees).

Unfortunately, preliminary studies of the statistical characteristics of RSS/Atom feeds do not provide a precise and up-to-date characterization of feed behavior and content that could be used effectively for tuning the refresh policies of RSS aggregators, benchmarking the scalability and performance of continuous RSS monitoring and filtering mechanisms, or evaluating RSS item mining, recommendation, enrichment and archiving techniques. We have extracted statistics and presented the first large-scale analysis of three complementary RSS/Atom parameters: (a) the feeds' publishing activity; (b) the items' structure and length; (c) the vocabularies employed in their textual content. Our empirical study relies on a testbed that keeps growing on the website; the original data set, acquired over an 8-month period, contains 10,794,285 items belonging to 8,155 productive feeds (out of the 12,611 harvested) and is available [[http://deptmedia.cnam.fr/~traversn/roses/|online]].
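
To make the three parameters concrete, here is a minimal Python sketch that computes a rough analogue of each one for a single RSS 2.0 document. It is only an illustration and not the method used in the study: the function name ''feed_statistics'', the sample feed, the tokenization rule, and the "items per day over the observed span" rate are all assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from email.utils import parsedate_to_datetime

# Hypothetical three-item RSS 2.0 feed, invented for this example.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>First post</title><description>Hello web syndication</description>
<pubDate>Mon, 02 Jan 2012 10:00:00 GMT</pubDate></item>
<item><title>Second post</title><description>Hello again RSS readers</description>
<pubDate>Mon, 02 Jan 2012 16:00:00 GMT</pubDate></item>
<item><title>Third post</title><description>More syndication news</description>
<pubDate>Tue, 03 Jan 2012 10:00:00 GMT</pubDate></item>
</channel></rss>"""


def feed_statistics(rss_text):
    """Return simple activity, length and vocabulary statistics for one feed."""
    root = ET.fromstring(rss_text)
    items = root.findall("./channel/item")

    # (a) Publishing activity: items per day over the observed time span.
    dates = sorted(
        parsedate_to_datetime(item.findtext("pubDate"))
        for item in items
        if item.findtext("pubDate")
    )
    items_per_day = None
    if len(dates) >= 2:
        span_days = (dates[-1] - dates[0]).total_seconds() / 86400
        if span_days > 0:
            items_per_day = len(items) / span_days

    # (b) Item length and (c) vocabulary, from title plus description.
    token_counts = []
    vocabulary = Counter()
    for item in items:
        text = " ".join(
            item.findtext(tag) or "" for tag in ("title", "description")
        )
        # Assumed tokenization: lowercase runs of ASCII letters and digits.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        token_counts.append(len(tokens))
        vocabulary.update(tokens)

    return {
        "item_count": len(items),
        "items_per_day": items_per_day,
        "token_counts": token_counts,
        "vocabulary": vocabulary,
    }


stats = feed_statistics(SAMPLE_FEED)
print(stats["item_count"], stats["items_per_day"])
print(stats["vocabulary"].most_common(3))
```

A real harvester would also have to handle Atom feeds, missing or malformed dates, HTML markup inside descriptions, and multilingual text, none of which this sketch attempts.
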
The main conclusions drawn from our experiments are:
  
roses/resultats.1427722315.txt.gz · Last modified: 30/03/2015 15:31 by 127.0.0.1