Indexing XML Data in RDBMS using ORDPATH

Werbung
Indexing XML Data in
RDBMS using ORDPATH
Microsoft® SQL Server 2005™
Concepts developed by:
Patrick O‘Neil, Elizabeth O‘Neil,
(University of Massachusetts Boston)
Shankar Pal, Istvan Cseri, Oliver Seeliger, Gideon Schaller,
Leo Giakoumakis, Vasili Zolotov, Nigel Westbury
(Microsoft Corporation)
XML Data Model
Sample XML Data (serialized form):
<BOOK ISBN=”1-55860-438-3”>
<SECTION>
<TITLE> Bad Bugs</TITLE>
Nobody loves bad bugs.
<FIGURE CAPTION=”Sample bug”/>
</SECTION>
<SECTION>
<TITLE> Tree Frogs </TITLE>
All right-thinking people
<BOLD> love </BOLD> tree frogs.
</SECTION>
</BOOK>
5. Juli 2006
Stephan Müller
2
XML Data Model
XML Document / Fragment - Properties:
1
2
ISBN
Hierarchy
4 Title
3
Book
Section
Section
8
5 Nobody 6 Figure
Caption
Title
9
All right
Bold
Frogs
10
11
12
7
Document Order: 1 < 2 < 3 < 4 < 5 < ….. < 11 < 12
5. Juli 2006
Stephan Müller
3
XML Data Stored in a Relational Database
SQL Command:
CREATE TABLE docs (
id INT PRIMARY KEY,
xdoc XML
);
SQL with embedded XQuery and XPath:
Created docs Table:
ID
XDOC
1
XML Fragment as BLOB
2
XML Document as BLOB
…
…
7
XML Fragment as BLOB
…
…
SELECT id, xdoc.query(‘
for $s in
/BOOK[@ISBN=“1-55860-438-3“]//SECTION
return <topic> { data($s/TITLE) } </topic> ‘)
FROM docs;
5. Juli 2006
Stephan Müller
4
ORDPATH
What we expect from a labeling scheme:
Introduction
►
Support for structural fidelity
(Hierarchy + Document Order)
►
Support for efficient structural modifications to the XML
tree
- insert sub-tree
- delete sub-tree
without relabeling !!!
- move sub-tree
►
Support for high-performance query plans for native XML
queries using relational primitives
►
Independence of XML schemas typing XML instances
5. Juli 2006
Stephan Müller
6
1
Example of an Initial Load
Book
1.1
Section
1.3
1.5
Section
Hierarchy
ISBN
1.3.5
Title
Nobody
1.3.1
1.3.3
Figure
Caption
Title
All right
Bold
Frogs
1.5.1
1.5.3
1.5.5
1.5.7
1.3.5.1
Document Order:
1 < 1.1 < 1.3 < 1.3.1 < … < 1.5.7
5. Juli 2006
Primary Index: infoset
ORDPATH
TAG
NODE_TYPE
VALUE
1
1 (BOOK)
1 (Element)
Null
1.1
2 (ISBN )
2 (Attribute)
'1'1-5586055860-438438-3'
1.3
3 (SECTION)
1 (Element)
Null
1.3.1
4 (TITLE)
1 (Element)
'Bad Bugs'
1.3.3
--
4 (Value
(Value))
'Nobody loves bad bugs'
bugs'
1.3.5
5 (FIGURE)
1 (Element)
Null
1.3.5.1
6 (CAPTION)
2 (Attribute)
'Sample bug'
bug'
1.5
3 (SECTION)
1 (Element)
Null
1.5.1
4 (TITLE)
1 (Element)
'Tree frogs'
frogs'
1.5.3
--
4 (Value
(Value))
'All rightright-thinking people'
people'
1.5.5
7 (BOLD)
1 (Element)
'love'
love'
1.5.7
--
4 (Value
(Value))
'tree frogs'
frogs'
Li/Oi Pair Design
Li /Oi Pair Design
1.5.3.-9.11
ORDPATH Example Value:
Li /Oi Pair Desgin:
ORDPATH bit pattern:
L0
O0
L1
O1
…
LK
OK
0100101101010110001111111000011
We need a prefix-free Li encoding…
5. Juli 2006
Stephan Müller
9
Prefix–Free Encoding of the Li Bitstrings (using the Fano Condition)
5. Juli 2006
Stephan Müller
10
Li /Oi Pair Design
1.5.3.-9.11
ORDPATH Example Value:
Using Li values from Figure 3.2a
L0 = 3 O0 = 1 L1 = 3 O1 = 5 L2 = 3 O2 = 3 L3 = 4 O3 = -9 L4 = 4 O4 = 11
01
001
01
101
01
011
00011
1111
100
0011
ORDPATH bit pattern
0100101101010110001111111000011
(Figure 3.2a)
5. Juli 2006
Stephan Müller
11
Li /Oi Pair Design
Advantages of comparing ORDPATH Values:
►
Determination of ancestor–descendent relationships for
any two ORDPATHs is very easy.
►
Easy determination of the distance between two
ORDPATHs.
►
Simple bitstring (or byte-by-byte) comparison yields
document order.
5. Juli 2006
Stephan Müller
12
Descendants of a given Context Node
Context Node
( cn = 1.3 )
1
Book
1.1
1.3
ISBN
Section
Section
1.5
1.3.5
Title
Nobody
1.3.1
1.3.3
Figure
Caption
Title
All right
Bold
Frogs
1.5.1
1.5.3
1.5.5
1.5.7
1.3.5.1
5. Juli 2006
Stephan Müller
13
Descendants of a given Context Node
SQL Query:
SELECT
FROM
WHERE
AND
Ordpath
infoset
1.3 < Ordpath
1.4 > Ordpath
(cn)
(cn+1)
Infoset Table:
ORDPATH
TAG
NODE_TYPE
VALUE
1
1 (BOOK)
1 (Element)
Null
1.1
2 (ISBN )
2 (Attribute)
'1'1-5586055860-438438-3'
1.3
3 (SECTION)
1 (Element)
Null
1.3.1
4 (TITLE)
1 (Element)
'Bad Bugs'
1.3.3
--
4 (Value
(Value))
'Nobody loves bad bugs'
bugs'
1.3.5
5 (FIGURE)
1 (Element)
Null
1.3.5.1
6 (CAPTION)
2 (Attribute)
'Sample bug'
bug'
1.5
3 (SECTION)
1 (Element)
Null
1.5.1
4 (TITLE)
1 (Element)
‚Tree frogs'
frogs'
1.5.3
--
4 (Value
(Value))
‚All rightright-thinking people'
people'
1.5.5
7 (BOLD)
1 (Element)
‚love'
love'
1.5.7
--
4 (Value
(Value))
‚tree frogs'
frogs'
14
Arbitrary Inserts
Arbitrary Insertions
Rightmost / Leftmost Insertion:
3.5
Parent
5. Juli 2006
Child4
Child1
Child2
3.5.-1
3.5.1
3.5.3
Stephan Müller
Child3
3.5.5
16
Arbitrary Insertions
Careting in nodes between two existing nodes…
3.5
3.5.1
3.5.2
3.5.2.1
3.5.3
3.5.2.2
3.5.2.2.-1
5. Juli 2006
Stephan Müller
3.5.2.3
3.5.2.2.1
17
Arbitrary Insertions
Careting in nodes between two existing nodes…
3.5
Parent
5. Juli 2006
Child1
Child3
Child6
Child5
Child4
3.5.1
3.5.2.1
3.5.2.2.-1
3.5.2.2.1
3.5.2.3
Stephan Müller
Child2
3.5.3
18
Comment
Note:
► Multiple levels of carets are extremely rare in practice.
Advantage:
► Insertions require no relabelings of old nodes…
We avoid updates to primary key values which would
involve the primary index and all secondary indexes.
5. Juli 2006
Stephan Müller
19
Conclusion
ORDPATH …
►
… is a hierarchical prefix-based labeling scheme.
►
… provides efficient access to subtrees.
►
… provides all kinds of modifications.
►
…
5. Juli 2006
Stephan Müller
20
Herunterladen