Purely Relational XQuery Database-Supported XML Processors Torsten Grust Technische Universität München http://www-db.in.tum.de/~grust/ LUG Erding, Oct 2007 A Database Guy’s View of XML, XPath, and XQuery Prof. Dr. Torsten Grust 2 Technische Universität München A Database Guy’s View of XML, XPath, and XQuery • Build XML processors that do not stumble once XML input volumes become sizable. Prof. Dr. Torsten Grust 2 Technische Universität München A Database Guy’s View of XML, XPath, and XQuery • Build XML processors that do not stumble once XML input volumes become sizable. • Wait! Sizable data volume...? Use Relational Databases! Prof. Dr. Torsten Grust 2 Technische Universität München A Database Guy’s View of XML, XPath, and XQuery • Build XML processors that do not stumble once XML input volumes become sizable. • Wait! Sizable data volume...? Use Relational Databases! <?xml version=”1.0”?> <root> <elem att=”val”> <!-- comment --> text </elem> </root> Prof. Dr. Torsten Grust let $doc := doc(“in.xml”) for $t in $doc//text() return count($t/../comment()) 2 Technische Universität München A Database Guy’s View of XML, XPath, and XQuery • Build XML processors that do not stumble once XML input volumes become sizable. • Wait! Sizable data volume...? Use Relational Databases! <?xml version=”1.0”?> <root> <elem att=”val”> <!-- comment --> text </elem> </root> Prof. Dr. Torsten Grust let $doc := doc(“in.xml”) for $t in $doc//text() return count($t/../comment()) 2 Technische Universität München A Database Guy’s View of XML, XPath, and XQuery • Build XML processors that do not stumble once XML input volumes become sizable. • Wait! Sizable data volume...? Use Relational Databases! pre size level 1 1 10 2 1 20 3 1 30 ⋮ Prof. Dr. Torsten Grust ⋮ let $doc := doc(“in.xml”) for $t in $doc//text() return count($t/../comment()) ⋮ 2 Technische Universität München A Database Guy’s View of XML, XPath, and XQuery • Build XML processors that do not stumble once XML input volumes become sizable. • Wait! Sizable data volume...? Use Relational Databases! pre size level 1 1 10 2 1 20 3 1 30 ⋮ Prof. Dr. Torsten Grust ⋮ SELECT pre,size, ROW_NUMBER (…) FROM t000 WHERE value GROUP BY iter,… ⋮ 2 Technische Universität München A Database Guy’s View of XML, XPath, and XQuery • Build XML processors that do not stumble once XML input volumes become sizable. • Wait! Sizable data volume...? Use Relational Databases! F L E H S E OFF-TH pre size level 1 1 10 2 1 20 3 1 30 ⋮ Prof. Dr. Torsten Grust ⋮ SELECT pre,size, ROW_NUMBER (…) FROM t000 WHERE value GROUP BY iter,… ⋮ 2 Technische Universität München RDBMSs as Processors for Non-Relational Languages • Relational databases contain the best understood and most scalable query processors available today. • Can we use relational query engines as-is to efficiently process non-relational languages? Prof. Dr. Torsten Grust 3 Technische Universität München RDBMSs as Processors for Non-Relational Languages • Relational databases contain the best understood and most scalable query processors available today. • Can we use relational query engines as-is to efficiently process non-relational languages? X Prof. Dr. Torsten Grust L 3 X Technische Universität München RDBMSs as Processors for Non-Relational Languages • Relational databases contain the best understood and most scalable query processors available today. • Can we use relational query engines as-is to efficiently process non-relational languages? X L R X R-1 SQL / Relational Algebra Prof. Dr. Torsten Grust 3 Technische Universität München RDBMSs as Processors for Non-Relational Languages • Relational databases contain the best understood and most scalable query processors available today. • Can we use relational query engines as-is to efficiently process non-relational languages? XQuery R R-1 SQL / Relational Algebra Prof. Dr. Torsten Grust 3 Technische Universität München Purely Relational XQuery • Find a relational representation of XQuery (XML, XPath) that leverages the table-based processing model: • Exploit functionality already built into the database system (do not invade the database kernel): SQL idioms, OLAP extensions, partitioned B-Trees. • Run on top off-the-shelf relational database systems: Prof. Dr. Torsten Grust 4 Technische Universität München Purely Relational XQuery • Find a relational representation of XQuery (XML, XPath) that leverages the table-based processing model: • Exploit functionality already built into the database system (do not invade the database kernel): SQL idioms, OLAP extensions, partitioned B-Trees. • Run on top off-the-shelf relational database systems: DB2 (kxsystems) ® IBM DB2 V9 SQL Server 2005 PostgreSQL DB2 Version 9 Prof. Dr. Torsten Grust 4 kdb+ MonetDB Technische Universität München Pathfinder pathfinder-xquery.org Relational Database System (Kernel) Prof. Dr. Torsten Grust 5 Technische Universität München Pathfinder pathfinder-xquery.org Pathfinder XQuery Compiler XPath/XQuery Semantics XML Encoding Relational Database System (Kernel) Prof. Dr. Torsten Grust 5 Technische Universität München Pathfinder pathfinder-xquery.org XQuery Pathfinder XQuery Compiler XPath/XQuery Semantics XML Encoding Relational Database System (Kernel) Prof. Dr. Torsten Grust 5 Technische Universität München Pathfinder pathfinder-xquery.org XQuery Pathfinder XQuery Compiler XPath/XQuery Semantics XML Encoding SQL Relational Database System (Kernel) Prof. Dr. Torsten Grust 5 Technische Universität München Pathfinder pathfinder-xquery.org XQuery Pathfinder XQuery Compiler XPath/XQuery Semantics XML Encoding RA SQL Relational Database System (Kernel) Prof. Dr. Torsten Grust 5 Technische Universität München Pathfinder & DB2 DB2 Against 110+ MB of XML ® DB2 Version 9 for Linux, UNIX, an Prof. Dr. Torsten Grust 6 Technische Universität München Pathfinder & DB2 DB2 Against 110+ MB of XML ® DB2 Version 9 for Linux, UNIX, an Prof. Dr. Torsten Grust 6 Technische Universität München Slow Motion Replay XQuery Pathfinder SQL DB2 Prof. Dr. Torsten Grust ® DB2 Version 9 for Linux, UNIX, and Windows 7 Technische Universität München Slow Motion Replay XQuery XQuery (XMark Q8) let $a := doc("XMark-110mb.xml") return … Pathfinder <item person="{ $p/name/text() }"> { count($ca) } </item> SQL DB2 Prof. Dr. Torsten Grust ® DB2 Version 9 for Linux, UNIX, and Windows 7 Technische Universität München Slow Motion Replay XQuery SQL:1999 (no SQL/XML) Pathfinder SQL DB2 Prof. Dr. Torsten Grust ® DB2 Version 9 WITH t0000 (iter,pre) AS SELECT … FROM … WHERE … t0001 (iter) AS SELECT … FROM … WHERE … SELECT FROM WHERE ORDER BY pre,size,kind,… … … … for Linux, UNIX, and Windows 7 Technische Universität München Slow Motion Replay XQuery Pathfinder SQL DB2 Relational Encoding of Result ® pre size kind 5072509 2 1 5072510 0 2 5072511 0 3 ⋮ Prof. Dr. Torsten Grust DB2 Version 9 for Linux, UNIX, and Windows 7 Technische Universität München Slow Motion Replay XQuery Pathfinder SQL DB2 Serialized XML Text ® <?xml version="1.0" encoding="UTF-8"?> <XQuery> <item person=”Jixiang …”>0</item> <item person=”Harpreet …”>0</item> … </XQuery> Prof. Dr. Torsten Grust DB2 Version 9 for Linux, UNIX, and Windows 7 Technische Universität München Table Access Modes Prof. Dr. Torsten Grust 8 Technische Universität München Table Access Modes • Relational query engines derive much of their efficiency from the simplicity of the table of tuples model. A Prof. Dr. Torsten Grust 8 B Technische Universität München Table Access Modes • Relational query engines derive much of their efficiency from the simplicity of the table of tuples model. 1. Sequential scans: read-ahead on disks, prefetching CPU caches. Prof. Dr. Torsten Grust 8 A B Technische Universität München Table Access Modes • Relational query engines derive much of their efficiency from the simplicity of the table of tuples model. 1. Sequential scans: read-ahead on disks, prefetching CPU caches. Prof. Dr. Torsten Grust 8 A B Technische Universität München Table Access Modes • Relational query engines derive much of their efficiency from the simplicity of the table of tuples model. 1. Sequential scans: read-ahead on disks, prefetching CPU caches. A B 2. Index-based access: B-Trees, range scans. Prof. Dr. Torsten Grust 8 Technische Universität München Table Access Modes • Relational query engines derive much of their efficiency from the simplicity of the table of tuples model. 1. Sequential scans: read-ahead on disks, prefetching CPU caches. A B 2. Index-based access: B-Trees, range scans. Prof. Dr. Torsten Grust 8 Technische Universität München Table Access Modes • Relational query engines derive much of their efficiency from the simplicity of the table of tuples model. 1. Sequential scans: read-ahead on disks, prefetching CPU caches. A B 2. Index-based access: B-Trees, range scans. • But: We will benefit only if we actually operate the relational database in this bulk-oriented mode. Prof. Dr. Torsten Grust 8 Technische Universität München The Core of XQuery Prof. Dr. Torsten Grust 9 Technische Universität München The Core of XQuery (e1,..,en) Prof. Dr. Torsten Grust ordered item sequences 9 Technische Universität München The Core of XQuery (e1,..,en) for $x in e1 return e2 Prof. Dr. Torsten Grust ordered item sequences iteration 9 Technische Universität München The Core of XQuery (e1,..,en) for $x in e1 return e2 let $x := e1 return e2 Prof. Dr. Torsten Grust ordered item sequences iteration variable binding 9 Technische Universität München The Core of XQuery (e1,..,en) for $x in e1 return e2 let $x := e1 return e2 if (e1) then e2 else e3 Prof. Dr. Torsten Grust ordered item sequences iteration variable binding conditionals 9 Technische Universität München The Core of XQuery (e1,..,en) for $x in e1 return e2 let $x := e1 return e2 if (e1) then e2 else e3 <t> { e1 } </t> Prof. Dr. Torsten Grust ordered item sequences iteration variable binding conditionals XML node construction 9 Technische Universität München The Core of XQuery (e1,..,en) for $x in e1 return e2 let $x := e1 return e2 if (e1) then e2 else e3 <t> { e1 } </t> e1/child::t, e1/following::t Prof. Dr. Torsten Grust 9 ordered item sequences iteration variable binding conditionals XML node construction XPath location steps Technische Universität München The Core of XQuery (e1,..,en) for $x in e1 return e2 let $x := e1 return e2 if (e1) then e2 else e3 <t> { e1 } </t> e1/child::t, e1/following::t unordered { e1 } Prof. Dr. Torsten Grust 9 ordered item sequences iteration variable binding conditionals XML node construction XPath location steps local order indifference Technische Universität München The Core of XQuery (e1,..,en) for $x in e1 return e2 let $x := e1 return e2 if (e1) then e2 else e3 <t> { e1 } </t> e1/child::t, e1/following::t unordered { e1 } fn:doc(e1), fn:data(e1) ordered item sequences iteration variable binding conditionals XML node construction XPath location steps local order indifference built-in functions + Document order, node identity, dynamic typing, schema validation, user-defined functions, ... Prof. Dr. Torsten Grust 9 Technische Universität München The Core of XQuery (e1,..,en) for $x in e1 return e2 let $x := e1 return e2 if (e1) then e2 else e3 <t> { e1 } </t> e1/child::t, e1/following::t unordered { e1 } fn:doc(e1), fn:data(e1) ordered item sequences iteration variable binding conditionals XML node construction XPath location steps local order indifference built-in functions + Document order, node identity, dynamic typing, schema validation, user-defined functions, ... • The XQuery–SQL gap appears substantial. Prof. Dr. Torsten Grust 9 Technische Universität München FP-Style Iteration Prof. Dr. Torsten Grust 10 Technische Universität München FP-Style Iteration • The XQuery for..in..return construct performs side-effect free iteration: for $x in (e1,..,en) return b Prof. Dr. Torsten Grust 10 Technische Universität München FP-Style Iteration • The XQuery for..in..return construct performs side-effect free iteration: for $x in (e1,..,en) return b ( b[e1/$x],..,b[en/$x] ) • Individual iterations cannot interfere and may be evaluated in any order (or in parallel). Prof. Dr. Torsten Grust 10 Technische Universität München • Loop Lifting “Relational loop unrolling”: For any XQuery subexpression, generate a relational plan that evaluates the expression in all iterations. Prof. Dr. Torsten Grust 11 Technische Universität München • Loop Lifting “Relational loop unrolling”: For any XQuery subexpression, generate a relational plan that evaluates the expression in all iterations. for $x in (10,20,30) return $x + 42 Prof. Dr. Torsten Grust 11 Technische Universität München • Loop Lifting “Relational loop unrolling”: For any XQuery subexpression, generate a relational plan that evaluates the expression in all iterations. for $x in (10,20,30) return $x + 42 Prof. Dr. Torsten Grust 11 iter pos item 1 1 42 2 1 42 3 1 42 Technische Universität München • Loop Lifting “Relational loop unrolling”: For any XQuery subexpression, generate a relational plan that evaluates the expression in all iterations. for $x in (10,20,30) return $x + 42 iter pos item 1 1 42 2 1 42 3 1 42 for $x in (10,20,30) return $x + 42 Prof. Dr. Torsten Grust 11 Technische Universität München • Loop Lifting “Relational loop unrolling”: For any XQuery subexpression, generate a relational plan that evaluates the expression in all iterations. for $x in (10,20,30) return $x + 42 iter pos item 1 1 42 2 1 42 3 1 42 for $x in (10,20,30) return $x + 42 iter pos item 1 1 10 2 1 20 3 1 30 Prof. Dr. Torsten Grust 11 Technische Universität München • Loop Lifting “Relational loop unrolling”: For any XQuery subexpression, generate a relational plan that evaluates the expression in all iterations. for $x in (10,20,30) return $x + 42 iter pos item 1 1 42 2 1 42 3 1 42 for $x in (10,20,30) return $x + 42 iter pos item 1 1 10 2 1 20 3 1 30 let $x := (10,20,30) return $x Prof. Dr. Torsten Grust 11 Technische Universität München • Loop Lifting “Relational loop unrolling”: For any XQuery subexpression, generate a relational plan that evaluates the expression in all iterations. for $x in (10,20,30) return $x + 42 iter pos item 1 1 42 2 1 42 3 1 42 for $x in (10,20,30) return $x + 42 iter pos item 1 1 10 2 1 20 3 1 30 let $x := (10,20,30) return $x iter pos item 1 1 10 1 2 20 1 3 30 Prof. Dr. Torsten Grust 11 Technische Universität München Loop Lifting Prof. Dr. Torsten Grust 12 Technische Universität München Loop Lifting for $x in (10,20,30) return $x + 42 Prof. Dr. Torsten Grust 12 Technische Universität München Loop Lifting for $x in (10,20,30) return $x + 42 iter 1 2 3 pos item 1 10 1 20 1 30 Prof. Dr. Torsten Grust iter1 pos1 item1 1 1 42 2 1 42 3 1 42 12 Technische Universität München Loop Lifting for $x in (10,20,30) return $x + 42 iter 1 2 3 pos item 1 10 1 20 1 30 iter1 pos1 item1 1 1 42 2 1 42 3 1 42 ⋈ iter = iter1 + item2:(item,item1) π iter,pos,item2 Prof. Dr. Torsten Grust 12 Technische Universität München Loop Lifting for $x in (10,20,30) return $x + 42 iter 1 2 3 pos item 1 10 1 20 1 30 iter1 pos1 item1 1 1 42 2 1 42 3 1 42 ⋈ iter = iter1 iter 1 2 3 + item2:(item,item1) pos item iter1 pos1 item1 1 10 1 1 42 1 20 2 1 42 1 30 3 1 42 π iter,pos,item2 Prof. Dr. Torsten Grust 12 Technische Universität München Loop Lifting for $x in (10,20,30) return $x + 42 iter 1 2 3 pos item 1 10 1 20 1 30 iter1 pos1 item1 1 1 42 2 1 42 3 1 42 ⋈ iter = iter1 iter 1 2 3 + item2:(item,item1) pos item iter1 pos1 item1 1 10 1 1 42 1 20 2 1 42 1 30 3 1 42 item2 52 62 72 π iter,pos,item2 Prof. Dr. Torsten Grust 12 Technische Universität München Loop Lifting for $x in (10,20,30) return $x + 42 iter 1 2 3 pos item 1 10 1 20 1 30 iter1 pos1 item1 1 1 42 2 1 42 3 1 42 ⋈ iter = iter1 + item2:(item,item1) iter 1 2 3 π iter,pos,item2 Prof. Dr. Torsten Grust 12 pos item2 1 52 1 62 1 72 Technische Universität München Loop Lifting Prof. Dr. Torsten Grust 13 Technische Universität München Loop Lifting for $x in (1,2,3,4) return if ($x mod 2 = 0) then “even” else “odd” Prof. Dr. Torsten Grust 13 Technische Universität München Loop Lifting for $x in (1,2,3,4) return if ($x $x mod 2 = 0) 0 then “even” else “odd” iter 1 2 3 4 pos 1 1 1 1 item false true false true Prof. Dr. Torsten Grust 13 Technische Universität München Loop Lifting for $x in (1,2,3,4) return if ($x $x mod 2 = 0) 0 then “even” else “odd” σ item iter 1 2 3 4 pos 1 1 1 1 item false true false true Prof. Dr. Torsten Grust π iter × pos item 1 “even” ⋅ ⋃ pos item 1 “odd” σ ¬ item π iter × 13 Technische Universität München Loop Lifting for $x in (1,2,3,4) return if ($x $x mod 2 = 0) 0 then “even” else “odd” iter 2 4 pos item 1 true 1 true σ item iter 1 2 3 4 pos 1 1 1 1 item false true false true Prof. Dr. Torsten Grust π iter × pos item 1 “even” ⋅ ⋃ pos item 1 “odd” σ ¬ item π iter × 13 Technische Universität München Loop Lifting for $x in (1,2,3,4) return if ($x $x mod 2 = 0) 0 then “even” else “odd” iter 2 4 σ item iter 1 2 3 4 pos 1 1 1 1 item false true false true Prof. Dr. Torsten Grust π iter × pos item 1 “even” ⋅ ⋃ pos item 1 “odd” σ ¬ item π iter × 13 Technische Universität München Loop Lifting for $x in (1,2,3,4) return if ($x $x mod 2 = 0) 0 then “even” else “odd” iter 2 4 σ item iter 1 2 3 4 pos 1 1 1 1 item false true false true Prof. Dr. Torsten Grust π iter × pos item 1 “even” ⋅ ⋃ pos item 1 “odd” σ ¬ item pos item 1 “even” 1 “even” π iter × 13 Technische Universität München Loop Lifting for $x in (1,2,3,4) return if ($x $x mod 2 = 0) 0 then “even” else “odd” iter 1 σ item iter 1 2 3 4 pos 1 1 1 1 item false true false true Prof. Dr. Torsten Grust π iter × pos item 1 “even” π iter 3 1 4 1 “odd” “even” ⋅ ⋃ pos item 1 “odd” σ ¬ item 2 pos item 1 “odd” 1 “even” × 13 Technische Universität München Intermediate Code: Relational Algebra XQuery Pathfinder XQuery Compiler Relational Algebra ⋅ ⋯ σ π ⋃ ⋈ Prof. Dr. Torsten Grust 14 Technische Universität München Intermediate Code: Relational Algebra XQuery Pathfinder XQuery Compiler Relational Algebra ⋅ ⋯ σ π ⋃ ⋈ • Compiles into RA that mimics capabilites of SQL:1999 query engines. • RA dialect is ... 1. primitive 2. explicit 3. designed to be easily analyzable Prof. Dr. Torsten Grust 14 Technische Universität München Intermediate Code: Relational Algebra XQuery Pathfinder XQuery Compiler Relational Algebra ⋅ ⋯ σ π ⋃ ⋈ Code Gen Code Gen Code Gen Prof. Dr. Torsten Grust • Compiles into RA that mimics capabilites of SQL:1999 query engines. • RA dialect is ... 1. primitive 2. explicit 3. designed to be easily analyzable 14 Technische Universität München Intermediate Code: Relational Algebra XQuery Pathfinder XQuery Compiler Relational Algebra ⋅ ⋯ σ π ⋃ ⋈ Code Gen Code Gen Code Gen Compiles into RA that mimics capabilites of SQL:1999 query engines. • RA dialect is ... L MI Q S Q L 1. primitive 2. explicit 3. designed to be easily analyzable (kxsystems) Prof. Dr. Torsten Grust • 14 Technische Universität München Typical Plans ... Prof. Dr. Torsten Grust 15 Technische Universität München Typical Plans ... Prof. Dr. Torsten Grust 15 Technische Universität München Relational XQuery Optimization let $d := fn:doc(⋯) return for $a in $d//a for $b in $d//b where $b/@c = $a/@d return $b Prof. Dr. Torsten Grust 16 Technische Universität München Relational XQuery Optimization • We use concepts in the relational domain to analyze plans and to steer simplification and optimization. • Detect join-like XQuery expressions. Let XQuery’s syntactic diversity not affect the detection: let $d := fn:doc(⋯) return for $a in $d//a for $b in $d//b where $b/@c = $a/@d return $b Prof. Dr. Torsten Grust 16 Technische Universität München Relational XQuery Optimization • We use concepts in the relational domain to analyze plans and to steer simplification and optimization. • Detect join-like XQuery expressions. Let XQuery’s syntactic diversity not affect the detection: let $d := fn:doc(⋯) return $d//b[@c = $d//a/@c] Prof. Dr. Torsten Grust 16 Technische Universität München Relational XQuery Optimization • We use concepts in the relational domain to analyze plans and to steer simplification and optimization. • Detect join-like XQuery expressions. Let XQuery’s syntactic diversity not affect the detection: let $d := fn:doc(⋯) return $d//b[@c = $d//a/@c] • For both queries, the value-based XQuery join surfaces as a multi-valued dependency in the algebraic plans. Prof. Dr. Torsten Grust 16 Technische Universität München ¶ (iter, pos, item:res) Rewriting XMark Q8 NOT (res:<item>) @ (pos), val: #1 U @ (item), val: false DIFF @ (item), val: true DISTINCT ¶ (iter:outer) ¶ (iter) ROW# (pos1:<sort, pos>/outer) |X| (iter = inner) @ (item), val: 1 @ (pos), val: #1 ¶ (iter) SEL (item) ¶ (iter, pos, item:res) = (res:<item, item1>) |X| (iter = iter1) ¶ (iter, pos, item:cast) ¶ (iter1:iter, item1:cast) CAST (cast:<item>), type: str CAST (cast:<item>), type: str ¶ (iter:inner, pos, item) @ (pos), val: #1 |X| (iter = outer) ¶ (iter:inner, item) @ (pos), val: #1 ¶ (outer:iter, sort:pos, inner) NUMBER (inner) ¶ (iter:inner, item) ¶ (iter:outer, pos:pos1, item) ROW# (pos1:<sort, pos>/outer) |X| (iter = inner) ¶ (iter, pos, item:cast) CAST (cast:<item>), type: uA ¶ (outer:iter, sort:pos, inner) ¶ (iter, pos, item:res) access attribute value (res:<item>) @ (pos), val: #1 ¶ (iter:inner, item) NUMBER (inner) ROW# (pos:<item>/iter) DISTINCT ¶ (iter, item) ROW# (pos:<item>/iter) /| attribute::attribute id { atomic* } (iter, item) ¶ (iter:inner, item) |X| (iter = outer) ¶ (iter:inner, pos, item) ¶ (outer:iter, sort:pos, inner) NUMBER (inner) ¶ (iter:outer, pos:pos1, item) ROW# (pos1:<sort, pos>/outer) |X| (iter = inner) ¶ (iter, pos, item:cast) CAST (cast:<item>), type: uA ¶ (outer:iter, sort:pos, inner) ¶ (iter, pos, item:res) access attribute value (res:<item>) @ (pos), val: #1 ¶ (iter:inner, item) NUMBER (inner) ROW# (pos:<item>/iter) DISTINCT ¶ (iter, item) ROW# (pos:<item>/iter) /| attribute::attribute person { atomic* } (iter, item) ¶ (iter, item) ROW# (pos:<item>/iter) DISTINCT ¶ (iter, item) ROW# (pos:<item>/iter) /| child::element buyer { item* } (iter, item) ¶ (iter, item) @ (pos), val: #1 |X| (iter = outer) ¶ (iter:inner, item) ¶ (outer:iter, sort:pos, inner) ROW# (pos1:<sort, pos>/outer) NUMBER (inner) |X| (iter = inner) ROW# (pos:<item>/iter) ¶ (iter, pos, item:res) DISTINCT access textnode content (res:<item>) ¶ (outer:iter, sort:pos, inner) ¶ (iter, item) @ (pos), val: #1 ROW# (pos:<item>/iter) ¶ (iter:inner, item) /| child::element closed_auction { item* } (iter, item) NUMBER (inner) ¶ (iter, item) ROW# (pos:<item>/iter) ROW# (pos:<item>/iter) DISTINCT DISTINCT ¶ (iter, item) ¶ (iter, item) ROW# (pos:<item>/iter) ROW# (pos:<item>/iter) /| child::text (iter, item) /| child::element closed_auctions { item* } (iter, item) ¶ (iter, item) ¶ (iter, item) ROW# (pos:<item>/iter) ROW# (pos:<item>/iter) DISTINCT DISTINCT ¶ (iter, item) ¶ (iter, item) ROW# (pos:<item>/iter) ROW# (pos:<item>/iter) /| child::element name { item* } (iter, item) /| child::element site { item* } (iter, item) ¶ (iter, item) ¶ (iter:inner, item) @ (pos), val: #1 |X| (iter = outer) ¶ (iter:inner, item) ¶ (outer:iter, sort:pos, inner) NUMBER (inner) ROW# (pos:<item>/iter) DISTINCT ¶ (iter, item) ROW# (pos:<item>/iter) /| child::element person { item* } (iter, item) ¶ (iter, item) ROW# (pos:<item>/iter) Prof. Dr. Torsten Grust 17 Technische Universität München DISTINCT ¶ (iter, item) ROW# (pos:<item>/iter) /| child::element people { item* } (iter, item) ¶ (iter, item) ROW# (pos1:<sort>) Rewriting XMark Q8 • |X| (iter = inner) FRAG_UNION FRAGs @ (pos), val: #1 ROOTS ELEM (iter, item:<iter, item><iter, pos, item>) ELEM_TAG ¶ (iter, pos:pos1, item) @ (item), val: item ROW# (pos1:<ord>/iter) U @ (ord), val: #1 FRAG_UNION ¶ (iter, pos, item:res) FRAG_UNION @ (pos), val: #1 ROOTS FRAGs Rewriting phases: @ (ord), val: #2 ¶ (iter, item:res) FRAGs ROOTS ATTR (res:<item, item1>) TEXT (res:<cast>) |X| (iter = iter1) CAST (cast:<item>), type: str @ (item), val: person ¶ (iter1:iter, item1:item) U fn:string_join @ (pos), val: #1 @ (item), val: 0 @ (item), val: " " DIFF ¶ (iter:inner) ¶ (iter) ¶ (iter:outer, pos:pos1, item) COUNT (item:/iter) ¶ (iter:outer) 1. Constant propagation and projection pushdown, |X| (iter = inner) ¶ (iter) |X| (iter = iter1) ¶ (iter1:iter) SEL (item) ¶ (iter, item:res) NOT (res:<item>) U @ (item), val: false @ (item), val: true DIFF DISTINCT ¶ (iter:outer) ¶ (iter:inner) |X| (iter = inner) ¶ (iter) SEL (item) ¶ (iter, item:res) NOT (res:<item>) U @ (item), val: false @ (item), val: true DIFF DISTINCT ¶ (iter:outer) ¶ (iter:inner) |X| (iter = inner) ¶ (iter) SEL (item) ¶ (iter, item:res) = (res:<item, item1>) |X| (iter = iter1) ¶ (iter, item:cast) CAST (cast:<item>), type: str ¶ (iter1:iter, item1:cast) ¶ (iter:inner, item) CAST (cast:<item>), type: str |X| (iter = outer) ¶ (iter:inner, item) ¶ (outer:iter, inner) ¶ (iter:inner, item) NUMBER (inner) ¶ (iter:outer, item) |X| (iter = inner) ¶ (iter, item:cast) CAST (cast:<item>), type: uA ¶ (outer:iter, inner) ¶ (iter, item:res) access attribute value (res:<item>) ¶ (iter:inner, item) NUMBER (inner) /| attribute::attribute id { atomic* } (iter, item) ¶ (iter:inner, item) |X| (iter = outer) ¶ (iter:inner, item) ¶ (outer:iter, inner) NUMBER (inner) ¶ (iter:outer, item) |X| (iter = inner) ¶ (iter, item:cast) CAST (cast:<item>), type: uA ¶ (outer:iter, inner) ¶ (iter, item:res) access attribute value (res:<item>) ¶ (iter:inner, item) ROW# (pos1:<sort>/outer) |X| (iter = inner) NUMBER (inner) /| attribute::attribute person { atomic* } (iter, item) ¶ (iter, pos, item:res) /| child::element buyer { item* } (iter, item) access textnode content (res:<item>) @ (pos), val: #1 ¶ (outer:iter, sort:pos, inner) ¶ (iter:inner, item) NUMBER (inner) |X| (iter = outer) ¶ (iter:inner, item) ¶ (outer:iter, inner) NUMBER (inner) /| child::element closed_auction { item* } (iter, item) /| child::element closed_auctions { item* } (iter, item) ROW# (pos:<item>/iter) /| child::element site { item* } (iter, item) /| child::text (iter, item) ¶ (iter:inner, item) /| child::element name { item* } (iter, item) ¶ (iter:inner, item) ¶ (outer:iter, sort:pos, inner) NUMBER (inner) ROW# (pos:<item>) /| child::element person { item* } (iter, item) /| child::element people { item* } (iter, item) Prof. Dr. Torsten Grust 17 Technische Universität München /| child::element site { item* } (iter, item) FRAG_UNION EMPTY_FRAG ROOTS FRAGs DOC @ (item), val: "auctionG.xml" Rewriting XMark Q8 • Rewriting phases: SERIALIZE 1. Constant propagation and projection pushdown, ¶ (item, pos) ROW# (pos:<pos1>) ¶ (pos1, item) FRAG_UNION |X| (iter = iter1) FRAGs ROOTS ELEM (iter1, item:<iter1, item><iter1, pos, item>) ELEM_TAG ¶ (iter1, item, pos) @ (item), val: item ROW# (pos:<pos1>/iter1) U @ (pos1), val: #1 FRAG_UNION FRAG_UNION ROOTS ¶ (iter1, item) FRAGs ROOTS FRAGs TEXT (item:<item2>) ATTR (item:<item2, item1>) CAST (item2:<item1>), type: str @ (item2), val: person U fn:string_join 2. functional dependency and data flow analysis, @ (pos1), val: #2 ¶ (iter1, item) @ (item1), val: 0 @ (item1), val: " " DIFF ¶ (iter1:iter) ¶ (iter1, item1, pos) ¶ (iter1) COUNT (item1:/iter1) ROW# (pos:<pos1>/iter1) ¶ (iter1) ¶ (iter1, pos1, item1) |X| (iter = iter2) ¶ (iter:iter2) DISTINCT ¶ (item, item1, iter1, iter2:iter) ¶ (iter2) SEL (item) ¶ (iter2, item) = (item:<item3, item4>) ¶ (iter2, item3, item4) CAST (item4:<item>), type: str CAST (item3:<item2>), type: str ¶ (iter2, item2, item) CAST (item:<item4>), type: uA access attribute value (item4:<item3>) ¶ (item3, iter2, item2) |X| (iter1 = iter3) ¶ (iter1:iter3, item3) /| attribute::attribute id { atomic* } (iter3, item3) ¶ (iter2, item2:item1, iter3:iter1) ¶ (item3:item, iter3:iter1) NUMBER (iter1) ¶ (item, iter2, item1) CAST (item1:<item3>), type: uA access attribute value (item3:<item2>) ¶ (item2, item, iter2) |X| (iter = iter3) ¶ (iter:iter3, item2) /| attribute::attribute person { atomic* } (iter3, item2) access textnode content (item1:<item>) ¶ (item:item1, iter2:iter, iter3:iter) /| child::element buyer { item* } (iter3, item2) ROW# (pos1:<item>/iter1) ¶ (item2:item, iter3:iter) /| child::text (iter1, item) NUMBER (iter) /| child::element name { item* } (iter1, item) ¶ (item:item1, iter1:iter) ¶ (item1, iter1:iter) NUMBER (iter) ¶ (item) ROW# (pos1:<item1>) ¶ (item1) /| child::element person { item* } (iter, item1) /| child::element closed_auction { item* } (iter, item) /| child::element people { item* } (iter, item1) /| child::element closed_auctions { item* } (iter, item) /| child::element site { item* } (iter, item1) FRAG_UNION /| child::element site { item* } (iter, item) @ (iter), val: #1 EMPTY_FRAG ¶ (item1:item) FRAGs ROOTS DOC TBL: (iter | item) [#1,"auctionG.xml"] Prof. Dr. Torsten Grust 17 Technische Universität München Rewriting XMark Q8 • Rewriting phases: 1. Constant propagation and projection pushdown, SERIALIZE ¶ (item, pos) ROW# (pos:<pos1>) ¶ (pos1, item) FRAG_UNION 2. functional dependency and data flow analysis, FRAGs |X| (iter = iter1) ROOTS ELEM (iter1, item:<iter1, item><iter1, pos, item>) ELEM_TAG ¶ (iter1, item, pos) @ (item), val: item ROW# (pos:<pos1>/iter1) U @ (pos1), val: #1 FRAG_UNION FRAG_UNION ROOTS ¶ (iter1, item) FRAGs ROOTS FRAGs TEXT (item:<item2>) ATTR (item:<item2, item1>) CAST (item2:<item1>), type: str @ (item2), val: person U fn:string_join @ (item1), val: 0 @ (item1), val: " " DIFF ¶ (iter1:iter) ¶ (iter1, item1, pos) ¶ (iter1) COUNT (item1:/iter1) ROW# (pos:<pos1>/iter1) 3. algebraic XQuery join detection, @ (pos1), val: #2 ¶ (iter1, item) ¶ (iter1) ¶ (iter1, pos1, item1) DISTINCT ¶ (iter, iter1) |X| (item1 = item) access textnode content (item1:<item>) ¶ (iter, item1) ROW# (pos1:<item>/iter1) access attribute value (item1:<item>) /| child::text (iter1, item) ¶ (iter1, item) access attribute value (item:<item1>) ¶ (item, iter:iter2) /| child::element name { item* } (iter1, item) ¶ (item1, iter1:iter2) /| attribute::attribute person { atomic* } (iter2, item) ¶ (item:item1, iter1:iter) /| attribute::attribute id { atomic* } (iter2, item1) /| child::element buyer { item* } (iter2, item) NUMBER (iter) ¶ (item1, iter2:iter) ¶ (item:item1, iter2:iter) ROW# (pos1:<item1>) NUMBER (iter) ¶ (item1) ¶ (item1) /| child::element person { item* } (iter, item1) /| child::element closed_auction { item* } (iter, item1) /| child::element people { item* } (iter, item1) /| child::element closed_auctions { item* } (iter, item1) /| child::element site { item* } (iter, item1) FRAG_UNION EMPTY_FRAG ROOTS FRAGs DOC TBL: (iter | item1) [#1,"auctionG.xml"] Prof. Dr. Torsten Grust 17 Technische Universität München Rewriting XMark Q8 • Rewriting phases: 1. Constant propagation and projection pushdown, 2. functional dependency and data flow analysis, SERIALIZE ROOTS twig (iter, item) ELEM (iter, item) fcns Attach (item), val: item fcns ATTR (iter, item, item1) TEXT (iter, item) Attach (item), val: person Project (iter, item) CAST (item:<item1>), type: str UNION Attach (item1), val: 0 DIFF Project (iter) COUNT (item1:/iter) fn:string_join access textnode content (item1:<item>) Project (iter) ThetaJoin (item eq item1) Attach (item1), val: " " Project (iter, item) 3. algebraic XQuery join detection, Project (item) /|+ child::text (item:item1) level=5 access attribute value (item1:<item>) /|+ child::element name { item* } (item1:iter) level=4 Project (iter, item1) access attribute value (item:<item1>) /|+ attribute::attribute id { atomic* } (item:iter) level=4 Project (item1) /|+ attribute::attribute person { atomic* } (item1:item) level=5 Project (iter) /|+ child::element person { item* } (iter:item) level=3 Project (item) /|+ child::element buyer { item* } (item:item1) level=4 Project (item) Project (item1) /|+ child::element people { item* } (item:item1) level=2 /|+ child::element closed_auction { item* } (item1:item) level=3 Project (item) /|+ child::element closed_auctions { item* } (item:item1) level=2 Project (item1) 4. XPath join graph isolation, Prof. Dr. Torsten Grust 17 /|+ child::element site { item* } (item1:item) level=1 ROOTS DOC (iter, item) TBL: (iter | item) [#1,"auctionG.xml"] Technische Universität München Rewriting XMark Q8 • Rewriting phases: 1. Constant propagation and projection pushdown, 2. functional dependency and data flow analysis, SERIALIZE FRAG UNION FRAGs ROOTS twig (iter, item) ELEM (iter, item) fcns fcns @item:’item’ TEXT (iter, item) ATTR (iter, item, item1) @item:’person’ πiter ,item1 nil πiter ,item CAST (item:¡item1¿), type: str ∪ @item1 :0 \ πiter COUNTitem1 :()/iter πiter 3. algebraic XQuery join detection, ! " item = item1 πitem πiter ,item1 item:item1 πitem1 item1 :item attribute(person) GPS = 960, level = 5 descendant::text() GPS = 802, level = 5 item1 :item item:iter item1 :item attribute(id) GPS = 799, level = 4 item:iter πitem πiter descendant::element(buyer) GPS = 959, level = 4 item:item1 πitem1 :item ∪ EMPTY FRAG 4. XPath join graph isolation, FRAGs item:iter descendant::element(person) GPS = 798, level = 3 πitem ROOTS DOC (iter, item) TBL: (iter — item) 5. data guide exploitation (“Node GPS”). Pathfinder & IBM DB2 V9, 115 MB XML input data < 1 sec Prof. Dr. Torsten Grust 17 Technische Universität München Rewriting XMark Q8 • πiter ! " Rewriting phases: item=item1 1. Constant propagation and projection pushdown, πiter ,item1 πitem 2. functional dependency and data flow analysis, item:item1 πitem1 item1 :item item1 :item 3. algebraic XQuery join detection, 4. XPath join graph isolation, descendant::text() GPS = 802, level = 5 item:iter attribute(person) GPS = 960, level = 5 item1 :item a GPS = 79 item:iter 5. data guide exploitation (“Node GPS”). πitem descendant::element(buyer) Pathfinder & IBM DB2 V9, 115 MB XML input data GPS = 959, level = 4 < 1 sec item:item1 Prof. Dr. Torsten Grust 17 ∪ πitem1Technische :item πitem Universität München upstream Can DB2 Cope? ® SORT(33) 373.32 HSJOIN(35) 373.32 NLJOIN(37) 75.02 HSJOIN(43) 298.3 IXSCAN(41) 50.01 NLJOIN(45) 149.15 IDX_GUIDE_PRE XMARK_1.DOC IXSCAN(53) 50.01 IDX_GUIDE_PRE_VALUE XMARK_1.DOC Prof. Dr. Torsten Grust DB2 Version 9 NLJOIN( for Linux, UNIX, and Windows NLJOIN(47) 75.02 IXSCAN(51) 50.01 NLJOIN( IXSCAN(49) 50.01 IDX_GUIDE_PRE_PRE_PLUS_SIZE IDX_VALUE_KIND_PRE_SIZE XMARK_1.DOC XMARK_1.DOC 18 IXSCAN(63) 50.0 IDX_GUIDE_PRE_VAL XMARK_1.DOC Technische Universität München upstream Can DB2 Cope? ® SORT(33) 373.32 HSJOIN(35) 373.32 NLJOIN(37) 75.02 HSJOIN(43) 298.3 IXSCAN(41) 50.01 NLJOIN(45) 149.15 IDX_GUIDE_PRE XMARK_1.DOC IXSCAN(53) 50.01 IDX_GUIDE_PRE_VALUE XMARK_1.DOC Prof. Dr. Torsten Grust DB2 Version 9 NLJOIN( for Linux, UNIX, and Windows NLJOIN(47) 75.02 IXSCAN(51) 50.01 NLJOIN( IXSCAN(49) 50.01 IDX_GUIDE_PRE_PRE_PLUS_SIZE IDX_VALUE_KIND_PRE_SIZE XMARK_1.DOC XMARK_1.DOC 18 IXSCAN(63) 50.0 IDX_GUIDE_PRE_VAL XMARK_1.DOC Technische Universität München upstream Can DB2 Cope? ® SORT(33) 373.32 HSJOIN(35) 373.32 NLJOIN(37) 75.02 HSJOIN(43) 298.3 IXSCAN(41) 50.01 NLJOIN(45) 149.15 IDX_GUIDE_PRE XMARK_1.DOC IXSCAN(53) 50.01 IDX_GUIDE_PRE_VALUE XMARK_1.DOC Prof. Dr. Torsten Grust DB2 Version 9 NLJOIN( for Linux, UNIX, and Windows NLJOIN(47) 75.02 IXSCAN(51) 50.01 NLJOIN( IXSCAN(49) 50.01 IDX_GUIDE_PRE_PRE_PLUS_SIZE IDX_VALUE_KIND_PRE_SIZE XMARK_1.DOC XMARK_1.DOC 18 IXSCAN(63) 50.0 IDX_GUIDE_PRE_VAL XMARK_1.DOC Technische Universität München upstream Can DB2 Cope? ® SORT(33) 373.32 HSJOIN(35) 373.32 NLJOIN(37) 75.02 HSJOIN(43) 298.3 IXSCAN(41) 50.01 NLJOIN(45) 149.15 IDX_GUIDE_PRE XMARK_1.DOC IXSCAN(53) 50.01 IDX_GUIDE_PRE_VALUE XMARK_1.DOC Prof. Dr. Torsten Grust DB2 Version 9 NLJOIN( for Linux, UNIX, and Windows NLJOIN(47) 75.02 IXSCAN(51) 50.01 NLJOIN( IXSCAN(49) 50.01 IDX_GUIDE_PRE_PRE_PLUS_SIZE IDX_VALUE_KIND_PRE_SIZE XMARK_1.DOC XMARK_1.DOC 18 IXSCAN(63) 50.0 IDX_GUIDE_PRE_VAL XMARK_1.DOC Technische Universität München upstream Can DB2 Cope? ® SORT(33) 373.32 HSJOIN(35) 373.32 NLJOIN(37) 75.02 HSJOIN(43) 298.3 IXSCAN(41) 50.01 NLJOIN(45) 149.15 IDX_GUIDE_PRE XMARK_1.DOC IXSCAN(53) 50.01 IDX_GUIDE_PRE_VALUE XMARK_1.DOC Prof. Dr. Torsten Grust DB2 Version 9 NLJOIN( for Linux, UNIX, and Windows NLJOIN(47) 75.02 IXSCAN(51) 50.01 NLJOIN( IXSCAN(49) 50.01 IDX_GUIDE_PRE_PRE_PLUS_SIZE IDX_VALUE_KIND_PRE_SIZE XMARK_1.DOC XMARK_1.DOC 18 IXSCAN(63) 50.0 IDX_GUIDE_PRE_VAL XMARK_1.DOC Technische Universität München upstream Can DB2 Cope? ® SORT(33) 373.32 HSJOIN(35) 373.32 NLJOIN(37) 75.02 HSJOIN(43) 298.3 IXSCAN(41) 50.01 NLJOIN(45) 149.15 IDX_GUIDE_PRE XMARK_1.DOC IXSCAN(53) 50.01 IDX_GUIDE_PRE_VALUE XMARK_1.DOC Prof. Dr. Torsten Grust DB2 Version 9 NLJOIN( for Linux, UNIX, and Windows NLJOIN(47) 75.02 IXSCAN(51) 50.01 NLJOIN( IXSCAN(49) 50.01 IDX_GUIDE_PRE_PRE_PLUS_SIZE IDX_VALUE_KIND_PRE_SIZE XMARK_1.DOC XMARK_1.DOC 18 IXSCAN(63) 50.0 IDX_GUIDE_PRE_VAL XMARK_1.DOC Technische Universität München upstream Can DB2 Cope? ® SORT(33) 373.32 HSJOIN(35) 373.32 NLJOIN(37) 75.02 HSJOIN(43) 298.3 IXSCAN(41) 50.01 NLJOIN(45) 149.15 IDX_GUIDE_PRE XMARK_1.DOC IXSCAN(53) 50.01 IDX_GUIDE_PRE_VALUE XMARK_1.DOC Prof. Dr. Torsten Grust DB2 Version 9 NLJOIN( for Linux, UNIX, and Windows NLJOIN(47) 75.02 IXSCAN(51) 50.01 NLJOIN( IXSCAN(49) 50.01 IDX_GUIDE_PRE_PRE_PLUS_SIZE IDX_VALUE_KIND_PRE_SIZE XMARK_1.DOC XMARK_1.DOC 18 IXSCAN(63) 50.0 IDX_GUIDE_PRE_VAL XMARK_1.DOC Technische Universität München upstream Can DB2 Cope? ® SORT(33) 373.32 *) Indexes avised by DB2 itself HSJOIN(35) 373.32 NLJOIN(37) 75.02 HSJOIN(43) 298.3 IXSCAN(41) 50.01 NLJOIN(45) 149.15 IDX_GUIDE_PRE XMARK_1.DOC IXSCAN(53) 50.01 *) IDX_GUIDE_PRE_VALUE XMARK_1.DOC DB2 Version 9 for Linux, UNIX, and Windows NLJOIN(47) 75.02 NLJOIN( IXSCAN(51) 50.01 IDX_GUIDE_PRE_PRE_PLUS_SIZE XMARK_1.DOC Prof. Dr. Torsten Grust NLJOIN( IXSCAN(49) 50.01 *) IDX_VALUE_KIND_PRE_SIZE XMARK_1.DOC 18 IXSCAN(63) 50.0 *) IDX_GUIDE_PRE_VAL XMARK_1.DOC Technische Universität München Order Indifference let $warning := <p>Do <em>not</em> press button, computer will <em>explode!</em></p> return fn:distinct-doc-order( for $x in $warning//* return $x/text() ) Prof. Dr. Torsten Grust 19 Technische Universität München Order Indifference • Order in XML and XQuery processing is essential: let $warning := <p>Do <em>not</em> press button, computer will <em>explode!</em></p> return fn:distinct-doc-order( for $x in $warning//* return $x/text() ) Prof. Dr. Torsten Grust 19 Technische Universität München Order Indifference • Order in XML and XQuery processing is essential: let $warning := <p>Do <em>not</em> press button, computer will <em>explode!</em></p> return fn:distinct-doc-order( for $x in $warning//* return $x/text() ) Do not press button, computer will explode! Prof. Dr. Torsten Grust 19 Technische Universität München Order Indifference • Order in XML and XQuery processing is essential: let $warning := <p>Do <em>not</em> press button, computer will <em>explode!</em></p> return for $x in $warning//* return $x/text() Do press button, computer will not explode! Prof. Dr. Torsten Grust 19 Technische Universität München Order Indifference • Order in XML and XQuery processing is essential: let $warning := <p>Do <em>not</em> press button, computer will <em>explode!</em></p> return for $x in $warning//* return $x/text() Do press button, computer will not explode! • Order-awareness thus is often deeply wired into XQuery processors. • Supporting constructs like unordered { e1 } is challenging. Prof. Dr. Torsten Grust 19 Technische Universität München Order Indifference • Order in XML and XQuery processing is essential: let $warning := <p>Do <em>not</em> press button, computer will <em>explode!</em></p> return for $x in $warning//* return $x/text() Do press button, computer will not explode! • Order-awareness thus is often deeply wired into XQuery processors. SERIALIZE ¶ (item, pos) • ROW# (pos:<pos1>) Supporting constructs like unordered { e1 } is challenging. ¶ (pos1, item) FRAG_UNION FRAGs |X| (iter = iter1) ROOTS ELEM (iter1, item:<iter1, item><iter1, pos, item>) ELEM_TAG ¶ (iter1, item, pos) @ (item), val: item ROW# (pos:<pos1>/iter1) U @ (pos1), val: #1 Prof. Dr. Torsten Grust 19 @ (pos1), val: #2 Technische Universität München FRAG_UNION FRAG_UNION FRAGs ¶ (iter1, item) ROOTS ¶ (iter1, item) FRAGs ROOTS TEXT (item:<item2>) Order Indifference • Order in XML and XQuery processing is essential: let $warning := <p>Do <em>not</em> press button, computer will <em>explode!</em></p> return for $x in $warning//* return $x/text() Do press button, computer will not explode! • Order-awareness thus is often deeply wired into XQuery processors. π iter,item SERIALIZE ¶ (item, pos) • ROW# (pos:<pos1>) Supporting constructs like unordered { e1 } is challenging. ¶ (pos1, item) FRAG_UNION FRAGs |X| (iter = iter1) ROOTS ELEM (iter1, item:<iter1, item><iter1, pos, item>) ELEM_TAG ¶ (iter1, item, pos) @ (item), val: item ROW# (pos:<pos1>/iter1) U @ (pos1), val: #1 Prof. Dr. Torsten Grust 19 @ (pos1), val: #2 Technische Universität München FRAG_UNION FRAG_UNION FRAGs ¶ (iter1, item) ROOTS ¶ (iter1, item) FRAGs ROOTS TEXT (item:<item2>) More Purely Relational XQuery Prof. Dr. Torsten Grust 20 Technische Universität München More Purely Relational XQuery • Declarative XQuery debugging with “time travel”. • Users observe expressions and may traverse iterations forward/backward. Prof. Dr. Torsten Grust 20 Technische Universität München More Purely Relational XQuery • Declarative XQuery debugging with “time travel”. • Users observe expressions and may traverse iterations forward/backward. Prof. Dr. Torsten Grust 20 iter pos item 1 2 3 4 … Technische Universität München More Purely Relational XQuery • • Declarative XQuery debugging with “time travel”. • Users observe expressions and may traverse iterations forward/backward. iter pos item 1 2 3 4 … Dependable cardinality estimates at XQuery subexpression level (# items per iteration and overall contribution). for $x in $doc//x return if (e1) then e2 else e3 Prof. Dr. Torsten Grust 20 Technische Universität München More Purely Relational XQuery • • Declarative XQuery debugging with “time travel”. • iter pos item Users observe expressions and may traverse iterations forward/backward. 1 2 3 4 … Dependable cardinality estimates at XQuery subexpression level (# items per iteration and overall contribution). for $x in $doc//x 4.3 return if (e1) then e2 else e3 8 Prof. Dr. Torsten Grust 20 2 Technische Universität München Atomization Indexes <a> <b> <c>Same<d>shirt</d></c> different </b> day </a> Prof. Dr. Torsten Grust 21 Technische Universität München Atomization Indexes <a> <b> <c>Same<d>shirt</d></c> different </b> day </a> Prof. Dr. Torsten Grust 21 Technische Universität München Atomization Indexes a <a> <b> <c>Same<d>shirt</d></c> different </b> day </a> b day c Same d different shirt Prof. Dr. Torsten Grust 21 Technische Universität München Atomization Indexes a <a> <b> <c>Same<d>shirt</d></c> different </b> day </a> node a b c atom1 atom2 atom3 b day c Same atom4 d different shirt d Prof. Dr. Torsten Grust 21 Technische Universität München Atomization Indexes a <a> <b> <c>Same<d>shirt</d></c> different </b> day </a> node a b c atom1 atom2 atom3 b day c Same atom4 d different shirt d Prof. Dr. Torsten Grust 21 Technische Universität München Atomization Indexes a <a> <b> <c>Same<d>shirt</d></c> different </b> day </a> node a b c atom1 atom2 atom3 b day c Same atom4 d different shirt Same d shirt different day Prof. Dr. Torsten Grust 21 Technische Universität München Atomization Indexes a <a> <b> <c>Same<d>shirt</d></c> different </b> day </a> node a b c d atom1 atom2 atom3 b day c Same atom4 d different shirt Same shirt shirt different day Prof. Dr. Torsten Grust 21 Technische Universität München Atomization Indexes a <a> <b> <c>Same<d>shirt</d></c> different </b> day </a> node atom1 atom2 a Same shirt b Same shirt c Same shirt Same d shirt shirt different day Prof. Dr. Torsten Grust b day c Same atom3 atom4 different day different 21 d different shirt Technische Universität München Atomization Indexes a <a> <b> <c>Same<d>shirt</d></c> different </b> day </a> node atom1 atom2 a Same shirt b Same shirt c Same shirt Same d shirt shirt different day Prof. Dr. Torsten Grust b day c Same atom3 atom4 different day different 21 d different shirt dict atom Technische Universität München Atomization Indexes a <a> <b> <c>Same<d>shirt</d></c> different </b> day </a> node atom1 atom2 a Sameδ1 shirt b Sameδ1 shirt c Sameδ1 shirt Sameδ1 d shirt shirt different day Prof. Dr. Torsten Grust b day c Same atom3 atom4 different day different 21 d different shirt dict atom δ1 Same Technische Universität München Atomization Indexes a <a> <b> <c>Same<d>shirt</d></c> different </b> day </a> b c Same d different shirt node atom1 atom2 atom3 atom4 a δ1 δ2 δ3 δ4 b δ1 δ2 δ3 c δ1 δ2 δ1 d δ2 δ2 δ3 δ4 Prof. Dr. Torsten Grust day dict δ1 δ2 δ3 δ4 21 atom Same shirt different day Technische Universität München Atomization Indexes a <a> <b> <c>Same<d>shirt</d></c> different </b> day </a> b c Same d different shirt node atom1 atom2 atom3 atom4 a δ51 δ2 δ3 δ4 b δ51 δ2 δ3 c δ51 δ2 δ1 d δ2 δ2 δ3 δ4 Prof. Dr. Torsten Grust day dict δ1 δ2 δ3 δ4 δ5 21 atom Same shirt different day Same shirt Technische Universität München Atomization Indexes a <a> <b> <c>Same<d>shirt</d></c> different </b> day </a> b c Same d different shirt node atom1 atom2 atom3 atom4 a δ651 δ2 δ3 δ4 b δ651 δ2 δ3 c δ51 δ2 δ1 d δ2 δ2 δ3 δ4 Prof. Dr. Torsten Grust day dict δ1 δ2 δ3 δ4 δ5 δ6 21 atom Same shirt different day δ1 δ2 δ5 δ3 Technische Universität München Atomization Indexes a <a> <b> <c>Same<d>shirt</d></c> different </b> day </a> b c Same d different shirt node atom1 atom2 a δ6 δ4 b δ6 c δ5 δ1 d δ2 δ2 δ3 δ4 Prof. Dr. Torsten Grust day dict δ1 δ2 δ3 δ4 δ5 δ6 21 atom Same shirt different day δ1 δ2 δ5 δ3 Technische Universität München Relational Encodings for XML • XML documents represent ordered, unranked trees (nodes of 7 kinds). • Relational XQuery processing calls for an adequate relational encoding of such trees: 1. Schema-oblivious (XQuery constructs arbitrary XML fragments), 2. accessible for the query processor and index support ( 107 nodes in typical GB-range XML instances). Prof. Dr. Torsten Grust 22 Technische Universität München Relational Encodings for XML • XML documents represent ordered, unranked trees (nodes of 7 kinds). • Relational XQuery processing calls for an adequate relational encoding of such trees: 1. Schema-oblivious (XQuery constructs arbitrary XML fragments), 2. accessible for the query processor and index support ( 107 nodes in typical GB-range XML instances). doc Prof. Dr. Torsten Grust ✘ X 1 “<a>b<c/>d<e/><f>g</f></a>” 2 … 22 Technische Universität München Pre/Postorder Encoding <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> Prof. Dr. Torsten Grust 23 Technische Universität München Pre/Postorder Encoding a <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> Prof. Dr. Torsten Grust b f h c d 23 g e i j Technische Universität München Pre/Postorder Encoding 0a9 a <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> Prof. Dr. Torsten Grust 1b3 b 5f8 f 7h7 h 2c2 c 3d0 d 23 6g4 g 4e1 e 8i5 i j 9j6 Technische Universität München Pre/Postorder Encoding 0a9 a <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> 1b3 b 5f8 f 7h7 h 2c2 c 3d0 d 6g4 g 4e1 e 8i5 i j 9j6 pre post node a 0 9 1 3 b 2 2 c 3 0 d 4 1 e 5 8 f 6 4 g 7 7 h 8 5 i 9 6 j Prof. Dr. Torsten Grust 23 Technische Universität München Pre/Postorder Encoding 0a9 a <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> 1b3 b 7h7 h 2c2 c 23 6g4 g 4e1 e 3d0 d document order Prof. Dr. Torsten Grust 5f8 f 8i5 i j 9j6 pre post node a 0 9 1 3 b 2 2 c 3 0 d 4 1 e 5 8 f 6 4 g 7 7 h 8 5 i 9 6 j Technische Universität München Pre/Postorder Encoding 0a9 a <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> 1b3 b 7h7 h 2c2 c 23 6g4 g 4e1 e 3d0 d document order Prof. Dr. Torsten Grust 5f8 f 8i5 i j 9j6 pre post node kind a 0 9 elem 1 3 b elem 2 2 c elem 3 0 d elem 4 1 e text 5 8 f elem 6 4 g comm 7 7 h elem 8 5 i elem 9 6 j elem Technische Universität München Pre/Postorder Encoding 0a9 a <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> 1b3 b 7h7 h 2c2 c 23 6g4 g 4e1 e 3d0 d document order Prof. Dr. Torsten Grust 5f8 f 8i5 i j 9j6 pre post node kind size level a 0 0 9 elem 9 1 3 b elem 3 1 2 2 c elem 2 2 3 0 d elem 0 3 4 1 e text 0 3 5 8 f elem 4 1 6 4 g comm 0 2 7 7 h elem 2 2 8 5 i elem 0 3 9 6 j elem 0 3 Technische Universität München Pre/Postorder Encoding 0a9 a <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> 1b3 b 7h7 h 2c2 c 6g4 g 4e1 e 3d0 d document order Prof. Dr. Torsten Grust 5f8 f 23 pre 0 1 2 3 4 5 6 7 8 9 8i5 i node a b c d e f g h i j j 9j6 kind size level 0 elem 9 elem 3 1 elem 2 2 elem 0 3 text 0 3 elem 4 1 comm 0 2 elem 2 2 elem 0 3 elem 0 3 Technische Universität München Pre/Postorder Encoding 0a9 a <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> 1b3 b 6g4 g 4e1 e 3d0 d document order 5 Prof. Dr. Torsten Grust 7h7 h 2c2 c post 5 5f8 f pre 23 pre 0 1 2 3 4 5 6 7 8 9 8i5 i node a b c d e f g h i j j 9j6 kind size level 0 elem 9 elem 3 1 elem 2 2 elem 0 3 text 0 3 elem 4 1 comm 0 2 elem 2 2 elem 0 3 elem 0 3 Technische Universität München Pre/Postorder Encoding 0a9 a <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> 1b3 b f 4e1 e document order j b c d Prof. Dr. Torsten Grust g e 5 6g4 g 3d0 d h 5 7h7 h 2c2 c post a 5f8 f i pre 23 pre 0 1 2 3 4 5 6 7 8 9 8i5 i node a b c d e f g h i j j 9j6 kind size level 0 elem 9 elem 3 1 elem 2 2 elem 0 3 text 0 3 elem 4 1 comm 0 2 elem 2 2 elem 0 3 elem 0 3 Technische Universität München B-Trees Accelerate XPath Location Steps • The XPath axes ancestor, descendant, preceding, following partition any XML document: Prof. Dr. Torsten Grust 24 Technische Universität München B-Trees Accelerate XPath Location Steps • The XPath axes ancestor, descendant, preceding, following partition any XML document: a f h j 5 b Prof. Dr. Torsten Grust g c i e d 5 24 Technische Universität München B-Trees Accelerate XPath Location Steps • The XPath axes ancestor, descendant, preceding, following partition any XML document: a f h j 5 b Prof. Dr. Torsten Grust g c i e d 5 24 Technische Universität München B-Trees Accelerate XPath Location Steps • The XPath axes ancestor, descendant, preceding, following partition any XML document: a f h j 5 b Prof. Dr. Torsten Grust g c i e d 5 24 Technische Universität München B-Trees Accelerate XPath Location Steps • The XPath axes ancestor, descendant, preceding, following partition any XML document: a f h j 5 b Prof. Dr. Torsten Grust g c i e d 5 24 Technische Universität München B-Trees Accelerate XPath Location Steps • The XPath axes ancestor, descendant, preceding, following partition any XML document: a f h j 5 b Prof. Dr. Torsten Grust g c i e d 5 24 Technische Universität München B-Trees Accelerate XPath Location Steps • The XPath axes ancestor, descendant, preceding, following partition any XML document: B-Tree a f h j 5 b Prof. Dr. Torsten Grust g c i e d 5 24 pre post node a 0 9 1 3 b 2 2 c 3 0 d 4 1 e 5 8 f 6 4 g 7 7 h 8 5 i 9 6 j Technische Universität München B-Trees Accelerate XPath Location Steps • The XPath axes ancestor, descendant, preceding, following partition any XML document: B-Tree a f h j 5 b Prof. Dr. Torsten Grust g c i e d 5 24 pre post node a 0 9 1 3 b 2 2 c 3 0 d 4 1 e 5 8 f 6 4 g 7 7 h 8 5 i 9 6 j Technische Universität München Partitioned B-Trees Prof. Dr. Torsten Grust 25 Technische Universität München Partitioned B-Trees • Use low selectivity key prefixes to partition the XML nodes in a B-Tree according to various criteria: Prof. Dr. Torsten Grust 25 Technische Universität München Partitioned B-Trees • Use low selectivity key prefixes to partition the XML nodes in a B-Tree according to various criteria: 1. level (support XPath child axis) Prof. Dr. Torsten Grust 25 B-Tree Technische Universität München Partitioned B-Trees • Use low selectivity key prefixes to partition the XML nodes in a B-Tree according to various criteria: 1. level (support XPath child axis) B-Tree level = 1 level = 2 Prof. Dr. Torsten Grust 25 level = 3 Technische Universität München Partitioned B-Trees • Use low selectivity key prefixes to partition the XML nodes in a B-Tree according to various criteria: 1. level (support XPath child axis) B-Tree 2. element tag name 3. path to node (“Node GPS”) level = 1 level = 2 • level = 3 Given a Pathfinder-generated workload, the index advisor of IBM DB2 V9 proposes such partitionings automatically. Prof. Dr. Torsten Grust 25 Technische Universität München Injecting More Tree Awareness • In XQuery, an XPath location step originates in a sequence of context nodes. Duplicate nodes, out of order results (XPath semantics!), wasted work. Prof. Dr. Torsten Grust 26 Technische Universität München Injecting More Tree Awareness • In XQuery, an XPath location step originates in a sequence of context nodes. Duplicate nodes, out of order results (XPath semantics!), wasted work. c3 c4 c1 c2 Prof. Dr. Torsten Grust 26 Technische Universität München Injecting More Tree Awareness • In XQuery, an XPath location step originates in a sequence of context nodes. Duplicate nodes, out of order results (XPath semantics!), wasted work. c3 c4 c1 c2 Prof. Dr. Torsten Grust 26 Technische Universität München Injecting More Tree Awareness • In XQuery, an XPath location step originates in a sequence of context nodes. Duplicate nodes, out of order results (XPath semantics!), wasted work. c3 c4 c1 c2 Prof. Dr. Torsten Grust 26 Technische Universität München Injecting More Tree Awareness • In XQuery, an XPath location step originates in a sequence of context nodes. Duplicate nodes, out of order results (XPath semantics!), wasted work. c3 c4 c1 c2 Prof. Dr. Torsten Grust 26 Technische Universität München Injecting More Tree Awareness • In XQuery, an XPath location step originates in a sequence of context nodes. Duplicate nodes, out of order results (XPath semantics!), wasted work. c3 c4 c1 c2 Prof. Dr. Torsten Grust 26 Technische Universität München Injecting More Tree Awareness • In XQuery, an XPath location step originates in a sequence of context nodes. Duplicate nodes, out of order results (XPath semantics!), wasted work. Pathfinder XQuery Compiler c3 c4 XPath/XQuery Semantics XML Encoding c1 c2 Prof. Dr. Torsten Grust 26 Technische Universität München Injecting More Tree Awareness • In XQuery, an XPath location step originates in a sequence of context nodes. Duplicate nodes, out of order results (XPath semantics!), wasted work. Pathfinder XQuery Compiler c3 c4 staircase join XPath/XQuery Semantics XML Encoding c1 c2 Prof. Dr. Torsten Grust 26 Technische Universität München Staircase Join: Context Pruning • XPath following axis: c3 c4 c1 c2 Prof. Dr. Torsten Grust 27 Technische Universität München Staircase Join: Context Pruning • XPath following axis: c1 c2 Prof. Dr. Torsten Grust 27 Technische Universität München Staircase Join: Context Pruning • XPath following axis: c1 c1 c2 c2 Prof. Dr. Torsten Grust 27 Technische Universität München Staircase Join: Context Pruning • XPath following axis: c1 c2 c2 • Index scan yields duplicate free result in document order. Prof. Dr. Torsten Grust 27 Technische Universität München • Staircase Join: Skipping XPath descendant axis: c3 c4 c1 c2 Prof. Dr. Torsten Grust 28 Technische Universität München • Staircase Join: Skipping XPath descendant axis: c3 c1 Prof. Dr. Torsten Grust 28 Technische Universität München • Staircase Join: Skipping XPath descendant axis: c3 c1 scan Prof. Dr. Torsten Grust 28 Technische Universität München • Staircase Join: Skipping XPath descendant axis: c3 c1 v v c1 scan Prof. Dr. Torsten Grust 28 Technische Universität München • Staircase Join: Skipping XPath descendant axis: c3 c1 v v c1 scan • skip scan Index scan yields duplicate free result in document order. Prof. Dr. Torsten Grust 28 Technische Universität München MonetDB/XQuery Prof. Dr. Torsten Grust 29 Technische Universität München MonetDB/XQuery • MonetDB: Extensible relational database kernel, optimized for query processing close to the CPU. • Full vertical fragmentation (column store). Prof. Dr. Torsten Grust 29 pre post node a 0 9 1 3 b 2 2 c 3 0 d 4 1 e 5 8 f 6 4 g 7 7 h 8 5 i 9 6 j Technische Universität München MonetDB/XQuery • MonetDB: Extensible relational database kernel, optimized for query processing close to the CPU. • Full vertical fragmentation (column store). Prof. Dr. Torsten Grust 29 pre post 0 9 1 3 2 2 3 0 4 1 5 8 6 4 7 7 8 5 9 6 pre node a 0 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j Technische Universität München MonetDB/XQuery • MonetDB: Extensible relational database kernel, optimized for query processing close to the CPU. • Full vertical fragmentation (column store). Prof. Dr. Torsten Grust 29 pre post 0 9 1 3 2 2 3 0 4 1 5 8 6 4 7 7 8 5 9 6 pre node a 0 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j Technische Universität München MonetDB/XQuery • MonetDB: Extensible relational database kernel, optimized for query processing close to the CPU. • • Full vertical fragmentation (column store). Pathfinder + MonetDB = MonetDB/XQuery. • • Implementation of XQuery 1.0. Evaluates queries against GB-range XML instances in interactive time. pre post 0 9 1 3 2 2 3 0 4 1 5 8 6 4 7 7 8 5 9 6 pre node a 0 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j + XQuery Update, XQuery RPC ( execute at uri { e } ), support for multi-dimensional XML, ... Prof. Dr. Torsten Grust 29 Technische Universität München “DB2’s XML support doesn’t beat its relational self”. Prof. Dr. Torsten Grust 30 Technische Universität München “DB2’s XML support doesn’t beat its relational self”. • Pathfinder builds on 30+ years of development of relational database technology. Prof. Dr. Torsten Grust 30 Technische Universität München “DB2’s XML support doesn’t beat its relational self”. • Pathfinder builds on 30+ years of development of relational database technology. • Outperforms the built-in DB2 V9 pureXML® engine, especially for large XML input instances. Prof. Dr. Torsten Grust 30 Technische Universität München “DB2’s XML support doesn’t beat its relational self”. • Pathfinder builds on 30+ years of development of relational database technology. • • Outperforms the built-in DB2 V9 pureXML® engine, especially for large XML input instances. Covers other data-intensive languages: 1. SQL/XML Prof. Dr. Torsten Grust 30 Technische Universität München “DB2’s XML support doesn’t beat its relational self”. • Pathfinder builds on 30+ years of development of relational database technology. • • Outperforms the built-in DB2 V9 pureXML® engine, especially for large XML input instances. Covers other data-intensive languages: 1. SQL/XML 2. LINQ + other “Nested Loop” languages. Prof. Dr. Torsten Grust 30 Technische Universität München [email protected] http://www-db.in.tum.de/~grust/ Prof. Dr. Torsten Grust 31 Technische Universität München Prof. Dr. Torsten Grust 32 Technische Universität München