Databases

LIP6 DB Web Site

Current version: 26/04/2017 13:06, by amann (previous revision: 11/05/2015 14:01, by amine)
{{indexmenu_n>1}}

===== RDFdist: RDF distribution approaches using Spark =====
  
This wiki page provides information about the experiments on RDF distribution approaches using [[http://spark.apache.org/|Spark]].
[...]
We also created two queries for the [[https://www.wikidata.org/wiki/Wikidata:Main_Page|Wikidata]] dataset, which are referred to as Query 5 and Query 6.
  
===Query 1 (synthetic, LUBM)===
<code>
SELECT ?x ?y ?z
[...]
  
val t1 = java.lang.System.currentTimeMillis();

/**
* Set inputData to the path of the data encoded as quadruples (see Datasets excerpts)
*/

// loading and transforming the dataset
// [...]
val takco : Long = 1115684864
  
  
    // -----------------------------------------------------------
[...]
====Graph partitioning-based approaches====
  
===Huang Approach===
<code>
import org.apache.spark.HashPartitioner
import scala.collection.mutable.ListBuffer

val folder = "lubm"   // or "watdiv"
val dataset = "univ"  // or "watdiv"
val scale = "1k"
val part = 20         // number of partitions: 10 or 20

val folderName = folder + scale
// same file name as the output of the 2-hop based approach below (quadruples plus one-hop replicas)
val fileName = dataset + scale + "_encoded_unique_quads.part." + part + ".2hop"

val t1 = java.lang.System.currentTimeMillis();

// each line is a tuple "(s,p,o,i)"; key every triple (s,p,o) by its partition id i
val quads_I_SPO = sc.textFile(s"/user/olivier/${folderName}/${fileName}").coalesce(part).
  map(x => x.split(",")).
  map(t => (t(3).replace(")", "").toLong, (t(0).replace("(", "").toLong, t(1).toLong, t(2).toLong)))

// all quads sharing a partition id end up in the same Spark partition
val quadsDist = quads_I_SPO.partitionBy(new HashPartitioner(part)).persist

// persist is lazy: run an action so that the timing covers the actual loading
quadsDist.count

val t2 = java.lang.System.currentTimeMillis();

print("Loading time of quads : " + (t2 - t1) / 1000 + " sec")
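    // property codes (see "Encoding of LUBM concepts and properties")
    val advisor : Long = 1233125376   // advisor
    val worksFor : Long = 1136656384  // worksFor
    val suborg : Long = 1224736768    // subOrganizationOf
    val memof : Long = 1132462080     // memberOf
    val undeg : Long = 1101004800     // undergraduateDegreeFrom
    val teaof : Long = 1199570944     // teacherOf
    val takco : Long = 1115684864     // takesCourse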


    // -----------------------------------------------------------
    // Query 1 : (not part of the benchmark)
    // Pattern: (x advisor y) (y worksFor z) (z subOrganizationOf t)
    // -----------------------------------------------------------

var t1 = java.lang.System.currentTimeMillis();

var pataws = quadsDist.filter({case(i,(s,p,o)) => p==advisor}).map({case(i,(s,p,o)) => (o,s)}).
    join(quadsDist.filter({case(i,(s,p,o)) => p==worksFor}).map({case(i,(s,p,o)) => (s,o)}), part).
    map({case (y,(x,z)) => (z,(x,y))}).
    join(quadsDist.filter({case(i,(s,p,o)) => p==suborg}).map({case(i,(s,p,o)) => (s,o)}), part).
    map({case (z,((x,y),t)) => (x,y,z,t)})

pataws.count

// replicated quads can produce the same answer several times
var pataws2 = pataws.distinct

var t2 = java.lang.System.currentTimeMillis();

println("Processing Q1 " + (t2 - t1) + " msec for " + part + " partitions");


    // -----------------------------------------------------------
    // LUBM 2 : MSU
    // Pattern: (x memberOf y) (y subOrganizationOf z) (x undergraduateDegreeFrom z)
    // -----------------------------------------------------------

var t1 = java.lang.System.currentTimeMillis();

//var pmemof = quadsDist.filter({case(i,(s,p,o)) => p==memof}).cache()

// the last join is keyed on the (x, z) pair itself: string keys such as x+""+z can collide (1|23 vs 12|3)
var patmsu = quadsDist.filter({case(i,(s,p,o)) => p==memof}).map({case(i,(s,p,o)) => (o,s)}).
             join(quadsDist.filter({case(i,(s,p,o)) => p==suborg}).map({case(i,(s,p,o)) => (s,o)}), part).
             map({case (y,(x,z)) => ((x,z),(x,y,z))}).
             join(quadsDist.filter({case(i,(s,p,o)) => p==undeg}).map({case(i,(x,p,z)) => ((x,z),())}), part)

patmsu.count

// keep the distinct (x, y, z) answers
var patmsu2 = patmsu.map({case (_, (xyz, _)) => xyz}).distinct

var t2 = java.lang.System.currentTimeMillis();

println("Processing Q2 " + (t2 - t1) + " msec for " + part + " partitions");

    // -----------------------------------------------------------
    // LUBM 9 : ATT
    // Pattern: (x advisor y) (y teacherOf z) (x takesCourse z)
    // -----------------------------------------------------------

var t1 = java.lang.System.currentTimeMillis();

// the last join is keyed on the (x, z) pair itself rather than on a concatenated string
var patatt = quadsDist.filter({case(i,(s,p,o)) => p==advisor}).map({case(i,(s,p,o)) => (o,s)}).
              join(quadsDist.filter({case(i,(s,p,o)) => p==teaof}).map({case(i,(s,p,o)) => (s,o)}), part).
              map({case (y,(x,z)) => ((x,z),(x,y,z))}).
              join(quadsDist.filter({case(i,(s,p,o)) => p==takco}).map({case(i,(s,p,o)) => ((s,o),())}), part)

patatt.distinct.count

var t2 = java.lang.System.currentTimeMillis();

println("Processing Q3 (LUBM #9) " + (t2 - t1) + " msec for " + part + " partitions");


    // -----------------------------------------------------------
    // LUBM 12 : WS
    // Pattern: (y worksFor z) (z subOrganizationOf t)
    // -----------------------------------------------------------

val t1 = java.lang.System.currentTimeMillis();

// partition-local join: both sides are keyed on (partition id, z),
// so only triples stored in the same partition are joined
val patws = quadsDist.filter({case(i,(s,p,o)) => p==worksFor}).map({case(i,(s,p,o)) => ((i,o),s)}).
  join(quadsDist.filter({case(i,(s,p,o)) => p==suborg}).map({case(i,(s,p,o)) => ((i,s),o)}), part).
  map({case ((i,k),(s,o)) => (s,o)})

val ans_patws = patws.distinct.count

val t2 = java.lang.System.currentTimeMillis();

println("Processing LUBM #12 " + (t2 - t1) + " msec for " + part + " partitions");
</code>
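In the LUBM #12 script, both sides of the join are keyed on the partition id as well as on the join term, so two triples only match when they are stored in the same partition; Q1, Q2 and Q3 join on the terms alone. The sketch below is a hypothetical illustration in plain Scala collections (no Spark) that contrasts the two on four made-up quads; only the worksFor and subOrganizationOf codes come from the encoding table. The partition-local join misses an answer whose two triples sit in different partitions and finds it again once the neighbouring triple is replicated, which is presumably what the `.2hop` input provides. `hashJoin`, `globalWS` and `localWS` are illustrative names, with `hashJoin` standing in for `RDD.join`.

```scala
// Toy quads: (subject, predicate, object, partitionId). All values are hypothetical,
// except the two predicate codes taken from the LUBM encoding table.
type Quad = (Long, Long, Long, Long)

val worksFor = 1136656384L
val suborg = 1224736768L

// In-memory stand-in for RDD.join: pairs every left value with every right value sharing a key.
def hashJoin[K, A, B](left: List[(K, A)], right: List[(K, B)]): List[(K, (A, B))] = {
  val index = right.groupBy(_._1)
  for ((k, a) <- left; (_, b) <- index.getOrElse(k, Nil)) yield (k, (a, b))
}

// (y worksFor z) (z subOrganizationOf t), joined on z only -> (y, t)
def globalWS(quads: List[Quad]): Set[(Long, Long)] =
  hashJoin(
    quads.collect { case (s, p, o, _) if p == worksFor => (o, s) },
    quads.collect { case (s, p, o, _) if p == suborg => (s, o) }
  ).map { case (_, (y, t)) => (y, t) }.toSet

// Same pattern, joined on (partitionId, z) as in the LUBM #12 script -> (y, t)
def localWS(quads: List[Quad]): Set[(Long, Long)] =
  hashJoin(
    quads.collect { case (s, p, o, i) if p == worksFor => ((i, o), s) },
    quads.collect { case (s, p, o, i) if p == suborg => ((i, s), o) }
  ).map { case (_, (y, t)) => (y, t) }.toSet

val quads: List[Quad] = List(
  (1L, worksFor, 10L, 0L), // y=1 worksFor z=10, stored in partition 0
  (10L, suborg, 100L, 0L), // z=10 subOrganizationOf t=100, partition 0
  (2L, worksFor, 20L, 1L), // y=2 worksFor z=20, partition 1
  (20L, suborg, 200L, 0L)  // z=20 subOrganizationOf t=200, partition 0 only
)

// Copy of the last quad in partition 1, as a one-hop replication would add it
val replica: Quad = (20L, suborg, 200L, 1L)

println(globalWS(quads))           // finds (1,100) and (2,200)
println(localWS(quads))            // finds only (1,100)
println(localWS(quads :+ replica)) // finds (1,100) and (2,200) again
```

The local variant trades the shuffle of a global join for extra storage: it is only complete when every triple a query needs has been copied next to its neighbours.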
 +
===Warp===
<code>
// Spark implementation of WARP replication
// [...]
val folderName = folder + scale
val fileName = dataset + scale
/**
* Set inputData to the path of the data encoded as quadruples (see Datasets excerpts)
*/
val inputData = s"/user/olivier/${folderName}/${fileName}_encoded_unique_quads.part.${machine}"

// [...]

</code>
===2-hop based approach===
<code>
val folder = "lubm"
val dataset = "univ"
val scale = "10k"

val folderName = folder + scale
val part = Array(5, 10, 20)   // numbers of partitions to process

for (p <- part)
{
val fileName = dataset + scale + "_encoded_unique_quads.part." + p
// WatDiv file name: watdiv2k_encoded_unique_quads.partNew.5
val t1 = java.lang.System.currentTimeMillis();

// each line is a tuple "(s,p,o,i)", where i is the partition of the triple
val quads = sc.textFile(s"/user/olivier/${folderName}/${fileName}").map(x => x.split(",")).
  map(t => (t(0).replace("(", "").toLong, t(1).toLong, t(2).toLong, t(3).replace(")", "").toLong))

// for every quad (s1,p1,o,i1) and every quad (o,p,o2,i2) stored in another partition (i1 != i2),
// add a copy (o,p,o2,i1) of the second quad to partition i1
val addOneHop = quads.map({case (s,p,o,i) => (o,i)}).
  join(quads.map({case (s,p,o,i) => (s,(p,o,i))})).
  filter({case (termS,(i1,(p,o,i2))) => i1 != i2}).distinct.
  map({case (termS,(i1,(p,o,i2))) => (termS,p,o,i1)})

val newQuads = quads.union(addOneHop).distinct
val newQuadsSize = newQuads.count

val t2 = java.lang.System.currentTimeMillis();
val hopSize = addOneHop.count
println(s"Time to compute one more hop on $folderName for $p partitions is ${t2-t1} ms")
println(s" new size = $newQuadsSize , added $hopSize")
newQuads.saveAsTextFile(s"/user/olivier/${folderName}/${fileName}.2hop")
}
</code>
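The 2-hop job above copies, for every quad whose object `o` is the subject of quads stored in another partition, those outgoing quads of `o` into the first quad's partition. A minimal sketch of one such hop on an in-memory list, without Spark; the node and predicate IDs are made up, and `addOneHop`/`withOneHop` are illustrative functions that only mirror the RDD pipeline.

```scala
// Quads are (subject, predicate, object, partitionId); the values below are hypothetical.
type Quad = (Long, Long, Long, Long)

// One hop of replication: for each quad (s1, p1, o1, i1) and each quad (o1, p, o, i2)
// stored in another partition (i1 != i2), emit a replica (o1, p, o, i1).
def addOneHop(quads: List[Quad]): List[Quad] = {
  // grouping by subject plays the role of the shuffle in RDD.join
  val bySubject: Map[Long, List[Quad]] = quads.groupBy(_._1)
  (for {
    (_, _, o1, i1) <- quads
    (s, p, o, i2) <- bySubject.getOrElse(o1, Nil)
    if i1 != i2
  } yield (s, p, o, i1)).distinct
}

// Original quads plus their replicas, without duplicates (like quads.union(addOneHop).distinct).
def withOneHop(quads: List[Quad]): List[Quad] = (quads ++ addOneHop(quads)).distinct

val quads: List[Quad] = List(
  (2L, 7L, 20L, 1L),   // edge 2 -> 20, partition 1
  (20L, 8L, 200L, 0L), // edge 20 -> 200, partition 0
  (3L, 7L, 30L, 0L),   // edge 3 -> 30, partition 0
  (30L, 8L, 300L, 0L)  // edge 30 -> 300, same partition as the edge pointing to 30
)

println(addOneHop(quads))       // List((20,8,200,1)): 20 -> 200 is copied into partition 1
println(withOneHop(quads).size) // 5
```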
====Datasets excerpts====
===Encoding of LUBM concepts and properties===

<code>
Properties:
0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
603979776 <http://www.univ-mlv.fr/~ocure/lubm.owl#officeNumber>
671088640 <http://www.univ-mlv.fr/~ocure/lubm.owl#name>
738197504 <http://www.univ-mlv.fr/~ocure/lubm.owl#title>
805306368 <http://www.univ-mlv.fr/~ocure/lubm.owl#age>
872415232 <http://www.univ-mlv.fr/~ocure/lubm.owl#telephone>
939524096 <http://www.univ-mlv.fr/~ocure/lubm.owl#emailAddress>
1006632960 <http://www.univ-mlv.fr/~ocure/lubm.owl#researchInterest>
1082130432 <http://www.univ-mlv.fr/~ocure/lubm.owl#researchProject>
1090519040 <http://www.univ-mlv.fr/~ocure/lubm.owl#hasAlumnus>
1098907648 <http://www.univ-mlv.fr/~ocure/lubm.owl#degreeFrom>
1101004800 <http://www.univ-mlv.fr/~ocure/lubm.owl#undergraduateDegreeFrom>
1103101952 <http://www.univ-mlv.fr/~ocure/lubm.owl#mastersDegreeFrom>
1105199104 <http://www.univ-mlv.fr/~ocure/lubm.owl#doctoralDegreeFrom>
1107296256 <http://www.univ-mlv.fr/~ocure/lubm.owl#orgPublication>
1115684864 <http://www.univ-mlv.fr/~ocure/lubm.owl#takesCourse>
1124073472 <http://www.univ-mlv.fr/~ocure/lubm.owl#member>
1132462080 <http://www.univ-mlv.fr/~ocure/lubm.owl#memberOf>
1136656384 <http://www.univ-mlv.fr/~ocure/lubm.owl#worksFor>
1138753536 <http://www.univ-mlv.fr/~ocure/lubm.owl#headOf>
1140850688 <http://www.univ-mlv.fr/~ocure/lubm.owl#teachingAssistantOf>
1149239296 <http://www.univ-mlv.fr/~ocure/lubm.owl#listedCourse>
1157627904 <http://www.univ-mlv.fr/~ocure/lubm.owl#softwareDocumentation>
1166016512 <http://www.univ-mlv.fr/~ocure/lubm.owl#publicationAuthor>
1174405120 <http://www.univ-mlv.fr/~ocure/lubm.owl#softwareVersion>
1182793728 <http://www.univ-mlv.fr/~ocure/lubm.owl#affiliateOf>
1191182336 <http://www.univ-mlv.fr/~ocure/lubm.owl#tenured>
1199570944 <http://www.univ-mlv.fr/~ocure/lubm.owl#teacherOf>
1207959552 <http://www.univ-mlv.fr/~ocure/lubm.owl#publicationDate>
1216348160 <http://www.univ-mlv.fr/~ocure/lubm.owl#affiliatedOrganizationOf>
1224736768 <http://www.univ-mlv.fr/~ocure/lubm.owl#subOrganizationOf>
1233125376 <http://www.univ-mlv.fr/~ocure/lubm.owl#advisor>
1241513984 <http://www.univ-mlv.fr/~ocure/lubm.owl#publicationResearch>

Concepts:
0 <http://www.univ-mlv.fr/~ocure/lubm.owl#Schedule>
268435456 <http://www.univ-mlv.fr/~ocure/lubm.owl#Organization>
301989888 <http://www.univ-mlv.fr/~ocure/lubm.owl#College>
335544320 <http://www.univ-mlv.fr/~ocure/lubm.owl#Department>
369098752 <http://www.univ-mlv.fr/~ocure/lubm.owl#Institute>
402653184 <http://www.univ-mlv.fr/~ocure/lubm.owl#ResearchGroup>
436207616 <http://www.univ-mlv.fr/~ocure/lubm.owl#Program>
469762048 <http://www.univ-mlv.fr/~ocure/lubm.owl#University>
536870912 <http://www.univ-mlv.fr/~ocure/lubm.owl#Publication>
570425344 <http://www.univ-mlv.fr/~ocure/lubm.owl#Software>
603979776 <http://www.univ-mlv.fr/~ocure/lubm.owl#Book>
637534208 <http://www.univ-mlv.fr/~ocure/lubm.owl#Specification>
671088640 <http://www.univ-mlv.fr/~ocure/lubm.owl#Manual>
704643072 <http://www.univ-mlv.fr/~ocure/lubm.owl#Article>
713031680 <http://www.univ-mlv.fr/~ocure/lubm.owl#TechnicalReport>
721420288 <http://www.univ-mlv.fr/~ocure/lubm.owl#ConferencePaper>
729808896 <http://www.univ-mlv.fr/~ocure/lubm.owl#JournalArticle>
738197504 <http://www.univ-mlv.fr/~ocure/lubm.owl#UnofficialPublication>
805306368 <http://www.univ-mlv.fr/~ocure/lubm.owl#Person>
872415232 <http://www.univ-mlv.fr/~ocure/lubm.owl#TeachingAssistant>
939524096 <http://www.univ-mlv.fr/~ocure/lubm.owl#Student>
956301312 <http://www.univ-mlv.fr/~ocure/lubm.owl#GraduateStudent>
973078528 <http://www.univ-mlv.fr/~ocure/lubm.owl#UndergraduateStudent>
1006632960 <http://www.univ-mlv.fr/~ocure/lubm.owl#Employee>
1015021568 <http://www.univ-mlv.fr/~ocure/lubm.owl#ResearchAssistant>
1023410176 <http://www.univ-mlv.fr/~ocure/lubm.owl#Director>
1031798784 <http://www.univ-mlv.fr/~ocure/lubm.owl#AdministrativeStaff>
1033895936 <http://www.univ-mlv.fr/~ocure/lubm.owl#SystemsStaff>
1035993088 <http://www.univ-mlv.fr/~ocure/lubm.owl#ClericalStaff>
1040187392 <http://www.univ-mlv.fr/~ocure/lubm.owl#Faculty>
1042284544 <http://www.univ-mlv.fr/~ocure/lubm.owl#PostDoc>
1044381696 <http://www.univ-mlv.fr/~ocure/lubm.owl#Professor>
1044643840 <http://www.univ-mlv.fr/~ocure/lubm.owl#Chair>
1044905984 <http://www.univ-mlv.fr/~ocure/lubm.owl#VisitingProfessor>
1045168128 <http://www.univ-mlv.fr/~ocure/lubm.owl#AssociateProfessor>
1045430272 <http://www.univ-mlv.fr/~ocure/lubm.owl#Dean>
1045692416 <http://www.univ-mlv.fr/~ocure/lubm.owl#FullProfessor>
1045954560 <http://www.univ-mlv.fr/~ocure/lubm.owl#AssistantProfessor>
1046478848 <http://www.univ-mlv.fr/~ocure/lubm.owl#Lecturer>
1073741824 <http://www.univ-mlv.fr/~ocure/lubm.owl#Work>
1140850688 <http://www.univ-mlv.fr/~ocure/lubm.owl#Course>
1174405120 <http://www.univ-mlv.fr/~ocure/lubm.owl#GraduateCourse>
1207959552 <http://www.univ-mlv.fr/~ocure/lubm.owl#Research>
</code>
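The numeric constants hard-coded in the query scripts (advisor, worksFor, suborg, ...) are meant to be codes from the Properties list above. Properties and concepts are numbered independently (603979776 is both `officeNumber` and `Book`), so the two lists need separate lookups. A small plain-Scala sketch, assuming the `<code> <IRI>` line layout shown here, that parses a few property lines and checks a constant against them; `parseTable` and `matchesTable` are hypothetical helpers, not part of the original scripts.

```scala
// A few "<code> <IRI>" lines copied from the Properties list above.
val propertyLines = Seq(
  "1101004800 <http://www.univ-mlv.fr/~ocure/lubm.owl#undergraduateDegreeFrom>",
  "1115684864 <http://www.univ-mlv.fr/~ocure/lubm.owl#takesCourse>",
  "1132462080 <http://www.univ-mlv.fr/~ocure/lubm.owl#memberOf>",
  "1136656384 <http://www.univ-mlv.fr/~ocure/lubm.owl#worksFor>",
  "1199570944 <http://www.univ-mlv.fr/~ocure/lubm.owl#teacherOf>",
  "1224736768 <http://www.univ-mlv.fr/~ocure/lubm.owl#subOrganizationOf>",
  "1233125376 <http://www.univ-mlv.fr/~ocure/lubm.owl#advisor>"
)

// Maps each property's local name (the part after '#') to its numeric code.
def parseTable(lines: Seq[String]): Map[String, Long] =
  lines.map { line =>
    val parts = line.trim.split(" ", 2) // parts(0) = code, parts(1) = <IRI>
    parts(1).stripPrefix("<").stripSuffix(">").split('#').last -> parts(0).toLong
  }.toMap

val propertyCode = parseTable(propertyLines)

// Hypothetical check: does a constant hard-coded in a query script match the table?
def matchesTable(localName: String, constant: Long): Boolean =
  propertyCode.get(localName).contains(constant)

println(matchesTable("advisor", 1233125376L))  // true
println(matchesTable("worksFor", 1136656384L)) // true
```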

===LUBM Univ1===
  * [[http://webia.lip6.fr/~baazizi/research/iswc2015eval/sources/univ1_encoded_unique.id|encoded triples]] (2.1MB)
  * [[http://webia.lip6.fr/~baazizi/research/iswc2015eval/sources/quads_plus_replicas.id|encoded quadruples with replication]] (2.3MB)
  * [[http://webia.lip6.fr/~baazizi/research/iswc2015eval/sources/quads.id|replicated quadruples]] (0.3MB)
  * [[http://webia.lip6.fr/~baazizi/research/iswc2015eval/sources/univ1.nt|univ1.nt]] (16.3MB)
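The Spark loaders on this page read quadruple files whose lines look like `(s,p,o,i)` (the text form `saveAsTextFile` gives a Scala 4-tuple) and strip the parentheses before splitting on commas. The layout of the downloadable files above is not documented here, so the sketch below only assumes that same layout. It is a hypothetical plain-Scala `parseQuad` helper that returns `None` for a malformed line, where the `toLong` calls in the Spark loaders would throw and fail the task.

```scala
import scala.util.Try

// (subject, predicate, object, partitionId)
type Quad = (Long, Long, Long, Long)

// Parses one "(s,p,o,i)" line with the same parenthesis stripping and comma split
// as the loaders above; any malformed line yields None instead of an exception.
def parseQuad(line: String): Option[Quad] = Try {
  val t = line.trim.stripPrefix("(").stripSuffix(")").split(",").map(_.trim.toLong)
  require(t.length == 4, s"expected 4 fields, got ${t.length}")
  (t(0), t(1), t(2), t(3))
}.toOption

val lines = Seq("(12,1233125376,34,5)", "(12,1233125376,34)", "not a quad")
val parsed = lines.map(parseQuad).collect { case Some(q) => q } // malformed lines are dropped
println(parsed)
```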