Relational Algebra: Mother Tongue
XQuery: Fluent
How to Compile XQuery Expressions into a Relational Algebra

Torsten Grust, Jens Teubner
University of Konstanz, Dept. of Computer and Information Science
Konstanz, Germany
June 21, 2004

Relational XPath Back-Ends

Relational databases can efficiently back XPath evaluation.
  Encode XML structure using a numbering scheme.
    “XPath accelerator” with pre/post tuples.
  Re-use existing database technology.
    Storage management, index structures, query optimization.
  Support XPath evaluation through tailor-made operators.
    “Staircase join” exploits properties of the pre/post encoding.
    “Holistic” join algorithms process entire XPath expressions.
➠ Compile XQuery expressions into Relational Algebra.

XQuery is More than XPath

Important XQuery features are not yet covered.
  Iteration in XQuery FLWOR expressions.
    Is this contrary to set-oriented processing in relational databases?
  Construction of transient XML tree nodes.
    The document table must be “extended” during query processing.
This talk addresses the first issue. (The paper additionally covers the second.)
The ideas are orthogonal to efficient XPath evaluation.

XQuery FLWOR Expressions

XQuery is built around a looping primitive, the for construct.

  for $v in (x1, x2, ..., xn) return e  ≡  ( e[x1/$v], e[x2/$v], ..., e[xn/$v] )

$v is successively bound to the values of xi.
The return expression e is evaluated for each binding.
XQuery is a functional-style language.
➠ It is sound to evaluate e for all bindings in parallel.

The XQuery Data Model

XQuery’s basic data type is the sequence.
  Any expression evaluates to an ordered sequence of items.
  Sequences are always flat.
We may represent a sequence using a two-column table.

  pos  item
  1    "a"
  2    "b"
  3    "c"

Encode order in pos, the atomic value in item. (For now, we assume a polymorphic item type.)

Loop Lifting for Constant Subexpressions

We extend our sequence encoding by the column iter that accounts for the independent iterations.
Example:
  for $v0 in (1,2,3) return 10        → 10 appears in 3 iterations.
  for $v0 in (1,2,3) return (10, 20)  → (10, 20) appears in 3 iterations.
Encoding of (10, 20) in the second example:

  iter  pos  item
  1     1    10
  1     2    20
  2     1    10
  2     2    20
  3     1    10
  3     2    20

We refer to this as the loop lifted representation of a sequence.

Deriving a Loop Lifted Value Representation

We derive a compilation procedure that operates solely on loop lifted sequences.
Example: representation of $v in the body of
  for $v in (10, 20, 30) return $v

  iter  pos  item
  1     1    10
  2     1    20
  3     1    30

  Start with the representation of (10, 20, 30).
  Generate a new iteration for each value.
  Each value forms a singleton sequence (pos = 1 for each tuple).
The row number operator ϱ in our algebra generates the unique iter values.
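The loop lifted encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: rows are plain (iter, pos, item) tuples, and the helper names encode_sequence, lift_for_variable, and lift_constant are hypothetical, not taken from the talk or from Pathfinder.

```python
from typing import Any, List, Tuple

Row = Tuple[int, int, Any]  # (iter, pos, item)


def encode_sequence(items: List[Any], iter_id: int = 1) -> List[Row]:
    """Encode a flat sequence as (iter, pos, item) rows of a single iteration."""
    return [(iter_id, pos, item) for pos, item in enumerate(items, start=1)]


def lift_for_variable(seq: List[Row]) -> List[Row]:
    """Representation of $v inside the body of `for $v in seq return ...`.

    Row numbering in (iter, pos) order assigns a fresh iter to every value;
    each value becomes a singleton sequence, so pos is always 1.
    """
    ordered = sorted(seq, key=lambda r: (r[0], r[1]))
    return [(new_iter, 1, item)
            for new_iter, (_, _, item) in enumerate(ordered, start=1)]


def lift_constant(items: List[Any], loop: List[int]) -> List[Row]:
    """Loop-lift a constant sequence: one copy of the sequence per iteration."""
    return [(it, pos, item)
            for it in loop
            for pos, item in enumerate(items, start=1)]


# for $v in (10, 20, 30) return $v
v = lift_for_variable(encode_sequence([10, 20, 30]))
# for $v0 in (1, 2, 3) return (10, 20)
lifted = lift_constant([10, 20], loop=[it for it, _, _ in v])
```

Here v holds one row per iteration with pos = 1, and lifted repeats (10, 20) once for each of the three iter values, matching the two tables above.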
Nested XQuery Expressions

XQuery allows arbitrary expression nesting.
Example:

  for $v0 in (1,2) return
    for $v0·0 in (10,20) return
      ($v0, $v0·0)

  Scopes: s is the top level, s0 the body of the outer for, s0·0 the body of the inner for.

We need to map value representations between scopes.
  Variables defined in surrounding scopes.
  The expression result of the for expression.

Mapping Between Scopes

We capture expression nesting with the help of a relation map.
Example (query from the previous slide), relation map(0,0·0):

  outer  inner
  1      1
  1      2
  2      3
  2      4

The relation lists corresponding iter values of a for body and its surrounding scope.
Value mapping then becomes a join with this relation.
We handle arbitrary nesting this way.

Translatable Subset of XQuery Core

We support an XQuery subset that suffices to handle the XMark benchmark set.

  literals               42, "foo", (), ...
  arithmetic             e1 + e2, e1 - e2, ...
  built-in functions     fn:sum(e), fn:count(e), fn:doc(uri)
  variable bindings      let $v := e1 return e2
  iteration              for $v at $p in e1 return e2
  conditionals           if p then e1 else e2
  sequence construction  e1, e2
  function calls         f(e1, e2, ..., en)
  element construction   element e1 { e2 }
  XPath steps            e/α::ν
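Returning to the map relation used for nested scopes, the following Python sketch shows how it could be built by row numbering and then used in joins. It is a sketch under assumptions: tables are lists of tuples, and the function names build_map, map_into_inner, and map_back are hypothetical. The renumbering of pos per outer iteration in map_back is one plausible way to map a for body's result back to the surrounding scope.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[int, int, Any]   # (iter, pos, item)
MapRow = Tuple[int, int]     # (outer, inner)


def build_map(in_seq: List[Row]) -> List[MapRow]:
    """Number the rows of the `in` sequence in (iter, pos) order.

    outer is the iteration of the surrounding scope,
    inner is the fresh iteration of the for body.
    """
    ordered = sorted(in_seq, key=lambda r: (r[0], r[1]))
    return [(outer, inner)
            for inner, (outer, _, _) in enumerate(ordered, start=1)]


def map_into_inner(var: List[Row], scope_map: List[MapRow]) -> List[Row]:
    """Make an outer-scope variable visible in the body: join on iter = outer."""
    return [(inner, pos, item)
            for (outer, inner) in scope_map
            for (it, pos, item) in var
            if it == outer]


def map_back(result: List[Row], scope_map: List[MapRow]) -> List[Row]:
    """Map the body's result back: join on iter = inner, renumber pos per outer."""
    joined = [(outer, it, pos, item)
              for (it, pos, item) in result
              for (outer, inner) in scope_map
              if it == inner]
    joined.sort(key=lambda r: (r[0], r[1], r[2]))
    counters: Dict[int, int] = {}
    out: List[Row] = []
    for outer, _, _, item in joined:
        counters[outer] = counters.get(outer, 0) + 1
        out.append((outer, counters[outer], item))
    return out


# Scope s0: $v0 bound by `for $v0 in (1,2)`, and (10,20) loop lifted over it.
v0 = [(1, 1, 1), (2, 1, 2)]
in_seq = [(1, 1, 10), (1, 2, 20), (2, 1, 10), (2, 2, 20)]
scope_map = build_map(in_seq)
v0_inner = map_into_inner(v0, scope_map)
```

With these inputs, scope_map reproduces the outer/inner table of map(0,0·0), and v0_inner carries the value of $v0 into each of the four inner iterations.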
A Relational Algebra to Evaluate XQuery

We generate an (almost) standard Relational Algebra.

  π_{a1:b1,...,an:bn}    projection/renaming
  σ_a                    selection
  ∪̇                      disjoint union
  ×                      Cartesian product
  −                      difference
  ⋈_{a=b}                equi-join
  ϱ_{b:⟨a1,...,an⟩/p}    row numbering
  (axis join)_{α,ν}      XPath axis join
  ε                      element construction
  ⊛_{b:⟨a1,...,an⟩}      n-ary arithmetic/comparison operator ∗
  [a b]                  literal table
  count, ...             count and other aggregation functions

XMark on DB2

We implemented our translation for XMark queries in SQL.

[Figure: execution time [s] (log scale, 0.001 to 100) for XMark queries 1, 2, 6, and 7 on documents of 1.1 MB, 110 MB, and 1.1 GB.]

Summary

We propose a fully relational evaluation for XQuery.
  A compilation procedure translates XQuery expressions into Relational Algebra.
  We can handle FLWOR expressions, element construction, and others.
  We can deal with arbitrary expression nesting.
Experiments with an SQL-based DBMS are promising.
This work is part of our ongoing project “Pathfinder.”
Resulting algebra expressions, however, can get large...
[Figure: a compiled relational algebra plan. The operator graph consists of projections, row numbering (ROW#), equi-joins, selections, unions, Cartesian products, BOX/UNBOX operators, literal tables (TBL), document tables with columns pre, size, level, kind, prop, and frag, and XPath child steps over site, open_auctions, open_auction, bidder, increase, and text().]