XML Processing - Institute of Bioinformatics

Lecture (Vorlesung) IFS in Bioinformatics, summer semester (SS) 2011
Module 3:
XML Processing
a.Univ.-Prof. Dr. Werner Retschitzegger
IFS
Johannes Kepler University Linz
www.jku.ac.at
Institute of Bioinformatics
www.bioinf.jku.at
Introduction XPath XQuery XML & DB
Information Systems Group
www.ifs.uni-linz.ac.at
Outline
- Introduction
  - Motivation
  - XML Processing Alternatives – Overview
  - Extensions of Existing Languages
  - Interfaces to Existing Languages
  - Native XML Processing
- XPath
- XQuery
- XML & DB

The following slides are based (among others) on:
- Kay, Michael: XPath 2.0 Programmer's Reference (3rd ed.), Wiley, Aug. 2004.
- Walmsley, Priscilla: XQuery, O'Reilly, March 2007.
- Klettke, Meike; Meyer, Holger: XML & Datenbanken, dpunkt.verlag, Jan. 2003.
© 2011 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS)

Motivation
- Huge amount of XML data, steadily growing
- We need to "process" it, including its "storage"
  - Filter, search, select, join, aggregate
  - Create new pieces of information
  - Clean, normalize the data
  - Update it
  - Verify the correctness
  - Take actions based on the existing data
  - Write complex execution flows
  - Store it efficiently
- No common architecture like for RDBS
  - Applications are too heterogeneous

XML Processing Alternatives – Overview
(1) Existing Language Extensions
  - Procedural: JavaScript (ECMA), AJAX, PHP (course: UE IFS2)
  - Declarative: SQL/XML – part of the SQL:2003 standard
(2) Interfaces to Existing Languages
  - XML APIs – Generic Mapping (course: VO IFS2)
    - DOM, SAX, StAX
  - XML Data Binding – Non-Generic Mapping
    - JAXB 2.0 – Java Architecture for XML Binding
    - SDO – Service Data Objects (J2EE platform)
    - ADO – ActiveX Data Objects (.NET platform)
    - EMF – Eclipse Modeling Framework (course: VO/UE Model Engineering)
(3) Native XML Processing
  - Pure XML Type System (course: VO IFS2)
    - XPath, XSLT and XQuery

(1) Extensions to Existing Languages
- Extension of the type system of existing languages with XML types
- Extension of the API
  - Import of XML data into this type system
  - XML retrieval and manipulation
  - XPath-based or XPath-inspired
- Example: SQL/XML

Relational Data → XML Data
Table EMPLOYEES (EMPLOYEE_ID, FIRST_NAME, LAST_NAME)

SELECT e.employee_id,
       XMLElement("Emp", e.first_name||' '||e.last_name) AS result
FROM   employees e
WHERE  employee_id > 200;

EMPLOYEE_ID RESULT
----------- ---------------------------
        201 <Emp>Michael Hartstein</Emp>
        202 <Emp>Pat Fay</Emp>
        203 <Emp>Susan Mavris</Emp>

XML Data → Relational Data
Table EMP_RESUMES with XML column RESUME:

<RESUME>
  <FULL_NAME>S.King</FULL_NAME>
  <JOB_HISTORY>
    <JOB_ID>AD_PRES</JOB_ID>
  </JOB_HISTORY>
  …
</RESUME>

SELECT e.resume.extract('//JOB_ID/text()') result
FROM   emp_resumes e
WHERE  e.employee_id = 100;

RESULT
------
AD_PRES

(2) Interfaces to Existing Languages: XML APIs
- Mapping of XML data to generic XML programmatic APIs
- Programming languages (e.g. Java, C#) are used to manipulate the data
- Re-serialize it at the end
- More details later on …

Example documents:

<purchaseOrder>
  <lineItem>
    …
  </lineItem>
  <lineItem>
    …
  </lineItem>
</purchaseOrder>

<book>
  <author>…</author>
  <title>…</title>
  …
</book>

Generic Mappings (the same node interface serves every document type):

public interface DomNode {
    String getNodeName();
    String getNodeValue();
    void setNodeValue(String nodeValue);
    short getNodeType();
}
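The generic-mapping idea can be tried out with Python's standard `xml.dom.minidom` module. This is only a sketch: the sample document and its values are hypothetical, modeled on the `<book>` example above, and Python's DOM exposes `nodeName`, `nodeValue`, and `nodeType` as properties rather than as the getter methods shown on the slide.

```python
from xml.dom import minidom

# Hypothetical sample document, modeled on the <book> example on the slide.
XML = "<book><author>Kay</author><title>XPath 2.0</title></book>"

doc = minidom.parseString(XML)
book = doc.documentElement

# Generic access: every node offers the same properties, whatever the vocabulary.
names = [child.nodeName for child in book.childNodes]
title_text = book.getElementsByTagName("title")[0].firstChild

print(names)                                  # element names in document order
print(title_text.nodeValue)                   # value of the text node
print(book.nodeType == book.ELEMENT_NODE)     # node type check

# Manipulate the data through the generic API, then re-serialize it.
title_text.nodeValue = "XQuery"
serialized = book.toxml()
print(serialized)
```

The code never mentions a `Book` type; it only knows about nodes, which is what makes the mapping generic.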

(2) Interfaces to Existing Languages: XML Data Binding
- Mapping of the XML Schema of the XML data to appropriate code in the target language
- Based on this mapping, marshalling / unmarshalling between XML and objects
- Advantages
  - Customization of translation possible
  - Abstraction from low-level APIs & the details of the parsing process
  - Development effort and error-proneness can be reduced
- Disadvantages
  - High memory demands for large XML documents
  - XML Schema evolution leads to a new generation of the corresponding classes

[Figure: Data binding framework. A binding compiler translates the XML Schema into derived classes and interfaces (data abstraction via getter/setter methods). XML documents, which are instances of the schema and validated against it, are deserialized (unmarshalled) into objects, which are instances of the derived classes; objects are serialized (marshalled) back into XML documents.]
Source: http://www.rpbourret.com/xml/XMLDataBinding.htm

Non-Generic Mappings (schema fragment and the class derived from it):

<type name="book-type">
  <sequence>
    <attribute name="year" type="xs:integer"/>
    <element name="title" type="xs:string"/>
    <sequence minoccurs="0">
      <element name="author" type="xs:string"/>
    </sequence>
  </sequence>
</type>
<element name="book" type="book-type"/>

class BookType {
    public int getYear();
    public String getTitle();
    public List<String> getAuthors();
}
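To make marshalling and unmarshalling concrete, here is a hand-written sketch of what a binding compiler would normally generate from the schema. The `Book` class and the `marshal`/`unmarshal` helpers are hypothetical names, and the sample document is invented for illustration; no real binding framework is used.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Book:
    """Non-generic mapping: one class per schema type (here: book-type)."""
    year: int
    title: str
    authors: List[str] = field(default_factory=list)


def unmarshal(xml_text: str) -> Book:
    """Deserialize an XML <book> document into a Book object."""
    elem = ET.fromstring(xml_text)
    return Book(
        year=int(elem.get("year")),
        title=elem.findtext("title"),
        authors=[a.text for a in elem.findall("author")],
    )


def marshal(book: Book) -> str:
    """Serialize a Book object back into XML."""
    elem = ET.Element("book", {"year": str(book.year)})
    ET.SubElement(elem, "title").text = book.title
    for name in book.authors:
        ET.SubElement(elem, "author").text = name
    return ET.tostring(elem, encoding="unicode")


SOURCE = '<book year="2007"><title>XQuery</title><author>Walmsley</author></book>'
book = unmarshal(SOURCE)
print(book.year, book.title, book.authors)
print(marshal(book))
```

Application code works with `book.title` and `book.year` directly. The cost is that a schema change requires changing (or regenerating) the `Book` class, as noted under the disadvantages.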

(3) Native XML Processing
- Most promising alternative for the future!
- The only alternative such that …
  - the data is modeled only once
  - it is well integrated with the XML Schema type system
  - it preserves the logical/physical data independence
  - the code deals with non-generic structures
  - the code can be optimized automatically
- Data is stored
  - in plain file systems or in dedicated data stores
  - e.g. XML extensions of RDBS
- Missing pieces, under development
  - procedural logic
  - update language
  - …

Outline
- Introduction
- XPath
  - Introduction
  - XPath 1.0
  - XPath 2.0
- XQuery
- XML & DB

XPath Introduction: Overview
- Purpose
  - Original goal: selecting document parts for layout purposes (XSL)
  - Now used by various XML standards – XML Schema, XPointer
  - No XML syntax used – proprietary syntax
  - Various selection criteria, e.g., element/attribute names, content, type
- Basic Processing Principle
  - Tree-based navigation, similar to navigation in a file system
  - Starting point is always a certain context – i.e., a tree node specified by an XPath expression
  - Navigation and filters modify the context
  - Result of an XPath expression = context computed in the last step
- Read-only language
  - It cannot create nodes or modify existing nodes, except by calling functions written in another language
  - However, it can create new atomic values and sequences of existing nodes
- W3C standards
  - XPath 1.0, Nov. 1999, ~ 44 pages
  - XPath 2.0, Jan. 2007, ~ 250 pages

XPath 1.0: XPath Data Model – 7 Node Types
[Figure: UML class diagram of the XPath 1.0 data model. Every Node has a StringValue. Nodes with children: Root, Element. Nodes without children: Text, Attribute, Comment, Processing Instruction, Namespace. The Root has the outermost element as a child; an Element has child nodes as well as attribute and namespace nodes.]
Note: Root is NOT equal to the root (i.e. outermost) element but rather represents the whole XML document ("document entity").

XPath 1.0: XPath Data Model – Example HandyCatalog1.xml
[Figure: UML object diagram of HandyCatalog1.xml. Legend: each box shows "Node Name: Node Type" and the node value; edges denote part-of. The root node contains the root (outermost) element HandyCatalog, which contains a Comment and Producer elements (attribute name, e.g. NOKIA). A Producer contains a ProducerNo element (attribute no; value h1234 shown) and Type elements (attribute name, e.g. 8210 and 7110). A Type contains a Weight element (text, e.g. 141g) and Price elements (attribute contract with value yes or no; text, e.g. 999 and 4999).]

XPath 1.0: XPath Navigation – 13 Axis Names
- Parts of an XML document represent nodes of a tree
- Processing direction of the XPath processor is depth-first
- Axes relative to the context node (shown in the figure):
  - ancestor, ancestor-or-self, parent
  - preceding, preceding-sibling
  - self
  - following-sibling, following
  - child, descendant, descendant-or-self
- Further axis names
  - attribute
  - namespace

XPath 1.0: Hierarchical Operators, Elements/Attributes
[Example tree: root > HandyCatalog > Producer (attribute name) > ProducerNo (attribute no) and Type (attribute name) > Weight and Price (attribute contract)]
- Hierarchical Operators / and //
  - /                       root node
  - //Type                  all Type elements at arbitrary depth
  - //Type/Price            all Price child elements of Type elements at arbitrary depth
- Access to Elements *
  - /*                      root element
  - //*                     all elements, including the root element
  - /HandyCatalog/*/Type    all Type elements which are grandchildren of the HandyCatalog element
- Access to Attributes @
  - //@*                    all attributes
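The operators above can be tried out with Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath 1.0. The sample document is hypothetical, loosely modeled on the HandyCatalog example. Two limitations are assumed in this sketch: paths are always relative to an element (so `//Type` is written `.//Type`), and attribute nodes cannot be selected, so `//@*` is emulated in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, loosely modeled on the HandyCatalog example.
CATALOG = """
<HandyCatalog>
  <Producer name="NOKIA">
    <ProducerNo no="h1234"/>
    <Type name="8210"><Weight>141g</Weight><Price contract="no">4999</Price></Type>
    <Type name="7110"><Price contract="yes">999</Price></Type>
  </Producer>
</HandyCatalog>
"""

root = ET.fromstring(CATALOG)  # the outermost element, HandyCatalog

all_types = root.findall(".//Type")          # //Type
all_prices = root.findall(".//Type/Price")   # //Type/Price
grandchildren = root.findall("./*/Type")     # /HandyCatalog/*/Type
all_elements = list(root.iter())             # //* (includes the root element)

# ElementTree cannot select attribute nodes (//@*); collect them per element.
all_attributes = [(e.tag, k, v) for e in root.iter() for k, v in e.attrib.items()]

print([t.get("name") for t in all_types])
print([p.text for p in all_prices])
print(len(all_elements), len(all_attributes))
```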

XPath 1.0: Filter
[Example tree: root > HandyCatalog > Producer (attribute name) > ProducerNo (attribute no) and Type (attribute name) > Weight and Price (attribute contract)]
- //Type[Price]
  - all Type elements containing a Price child element
- //Producer[ProducerNo]/Type[Price]
  - all Type elements containing a Price child element, whereby the Type elements must be child elements of a Producer element which contains a ProducerNo child element
- //Producer[Type/Price]
  - all Producer elements containing a Type child element which in turn contains a Price child element
- //Type[Weight and Price]
  - all Type elements having both Weight and Price child elements
- //Type[Weight = "141g"]
  - all Type elements containing a Weight child element with value 141g
- //Type[@name = "7110"]
  - all Type elements containing an attribute name with value 7110
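Several of these filters can be checked with Python's standard `xml.etree.ElementTree`. The sample document is hypothetical, and ElementTree supports only simple predicates: `[tag]`, `[tag='text']`, and `[@attr='value']`. Predicates such as `[Weight and Price]` are not supported there, so the sketch emulates that one in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, loosely modeled on the HandyCatalog example.
CATALOG = """
<HandyCatalog>
  <Producer name="NOKIA">
    <ProducerNo no="h1234"/>
    <Type name="8210"><Weight>141g</Weight><Price contract="no">4999</Price></Type>
    <Type name="7110"><Price contract="yes">999</Price></Type>
  </Producer>
</HandyCatalog>
"""
root = ET.fromstring(CATALOG)


def names(elements):
    """Return the name attribute of each element, for compact output."""
    return [e.get("name") for e in elements]


with_price = names(root.findall(".//Type[Price]"))                      # //Type[Price]
nested = names(root.findall(".//Producer[ProducerNo]/Type[Price]"))     # chained filters
by_weight = names(root.findall(".//Type[Weight='141g']"))               # //Type[Weight = "141g"]
by_attribute = names(root.findall(".//Type[@name='7110']"))             # //Type[@name = "7110"]

# //Type[Weight and Price]: emulated, since ElementTree has no "and" in predicates.
weight_and_price = names(
    t for t in root.iter("Type")
    if t.find("Weight") is not None and t.find("Price") is not None
)

print(with_price, nested, by_weight, by_attribute, weight_and_price)
```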

XPath 1.0: Union, Index-based Access, Variables
[Example tree: root > HandyCatalog > Producer (attribute name) > ProducerNo (attribute no) and Type (attribute name) > Weight and Price (attribute contract)]
- Union |
  - //Type/Weight | //Type/Price
    all Weight and Price child elements of Type elements
- Index-based access via the node's context position
  - //Type[1]
    every Type element that is the first Type child of its parent; use (//Type)[1] for the first Type element in the whole document
  - Type[last()]
    last Type child element of the context node
- Variable $qname
  - XPath 1.0 can only reference variables, not bind them
  - the variable $qname has to be defined by the application using XPath 1.0 (e.g., XSLT or XQuery)
  - Note: XPath 2.0 can also bind values to variables ("for" clause)
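Positional predicates are evaluated per parent, which is easy to overlook. The following sketch uses Python's standard `xml.etree.ElementTree`, which supports `[1]` and `[last()]`, on a hypothetical document with two producers to make the difference between `//Type[1]` and `(//Type)[1]` visible.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data with two producers.
CATALOG = """
<HandyCatalog>
  <Producer name="A"><Type name="a1"/><Type name="a2"/></Producer>
  <Producer name="B"><Type name="b1"/><Type name="b2"/><Type name="b3"/></Producer>
</HandyCatalog>
"""
root = ET.fromstring(CATALOG)

# //Type[1]: the first Type child of EACH parent, not the first Type overall.
first_per_parent = [t.get("name") for t in root.findall(".//Type[1]")]

# //Type[last()]: the last Type child of each parent.
last_per_parent = [t.get("name") for t in root.findall(".//Type[last()]")]

# (//Type)[1]: emulated by indexing the complete result list in Python.
first_in_document = root.findall(".//Type")[0].get("name")

print(first_per_parent, last_per_parent, first_in_document)
```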

XPath 1.0: Path Expressions 1/2
- Relative Path (chaining of location steps)
  - Location Step[/Location Step]*
  - Processing starts at the current context node (determined e.g. by the preceding Location Step)
- Absolute Path
  - /Path
  - Processing starts at the root node ("/") INDEPENDENT of the current context
- Location Step
  - AxisName::NodeTest('['predicate']')*
- AxisName – Navigation via axis name (ancestor, etc.)
  - Short forms for some axis names:

    Long form                        Short form
    child::element-name              element-name
    attribute::attname               @attname
    /descendant-or-self::node()/     //
    self::node()                     .
    parent::node()                   ..

XPath 1.0: Path Expressions 2/2
- ::NodeTest – Node filtering (1)
  - Name of the node, or
  - Wildcard "*" – arbitrary elements, "@*" – arbitrary attributes, or
  - Type of the node on basis of a function (text(), comment(), processing-instruction(), node())
  - Result = Set of Nodes
- [predicate] – Node filtering (2)
  - Is a filter on all nodes selected by NodeTest – e.g., specification of the context position via the node's number
  - Multiple predicates are processed from left to right
  - Result = Boolean value
  - Predicates may again contain Location Paths
    - E.g., selection of a node, in case that certain elements/attributes exist in the context of this node
    - //address[tel/@type="work"]

XPath 1.0: Operators and Functions
- XPath Operators
  - Node Set Operators
    - |, [expr], /, //
  - Boolean and Comparison Operators
    - or, and, =, !=, <=, <, >=, >
  - Arithmetic Operators
    - +, -, *, div, mod
- XPath Core Function Library → 27 functions available
  - Node Set Functions (7)
    - last(), position(), count(), id(), local-name()
  - String Functions (10)
    - contains(string s1, string s2)
    - concat(string s1, string s2, string sn*)
  - Boolean Functions (5)
    - boolean true(), boolean false()
  - Number Functions (5)
    - number round(number), number sum(node-set)

XPath 2.0: Goals of XPath 2.0
- Simplify manipulation of XML Schema-typed content
  - Introduction of a type system based on XML Schema
- Simplify manipulation of string content
  - Regular expressions, changing strings to upper and lower case, etc.
- Support related XML standards
  - Supports common underlying semantics for XSLT 2.0 and XQuery 1.0
  - Data model based on the InfoSet W3C standard
- Improve ease of use
  - New string / aggregation functions, conditional expression, etc.
- Improve interoperability
  - Different implementations of the specifications should produce the same result
- Improve i18n support
  - Support the needs of different languages and cultures worldwide
- Maintain backward compatibility
  - Large gratuitous incompatibilities were avoided
  - Ability to run in backward compatibility mode
- Enable improved processor efficiency

XPath 2.0: XPath 2.0 vs. XPath 1.0
- 70% more language concepts than XPath 1.0
- Number of operators
  - has doubled
- Number of functions in the standard function library
  - has grown by a factor of four
- Minor changes in core syntax
- Introduction of a new type system based on XML Schema
  - represents a pretty radical overhaul of the language semantics

XPath 2.0: New Features in XPath 2.0 – Overview
- "Everything is a sequence" and Sequence Processing
  - Construction operators
  - Filter
  - New set operators in addition to UNION
  - Functions for list manipulation
  - Aggregation functions
- Support of XML Schema's Type System
  - Type annotations
  - Typed values
  - Type expressions
- Changes to Path Expressions
  - Node tests now also on basis of XML Schema types
  - Location steps can now be defined by function calls
- New Expressions
  - Control primitives: «for» and «if»
  - Quantifiers: «some» and «every»
- New Operators and New Functions

XPath 2.0: "Everything is a sequence" 1/2
[Figure: a Sequence contains any number of Items; Item {abstract} is either a Node or an Atomic Value.]
- XPath 1.0: Sets of nodes only
  - Unordered
  - Can't contain duplicates
- Sequences
  - Are ordered
    (1, 2, 3, 4) is different from (4, 3, 2, 1)
  - Can have duplicates
    (1, 2, 3, 4) is different from (1, 1, 2, 3, 4)
  - Can have heterogeneous items
    (1, 2, 3, "foo")
  - Can't be nested
    (1, 2, (3, 4)) is the same as (1, 2, 3, 4)
    NOTE: Although syntactically correct, nested sequences become unnested
- Identity
  - YES: Nodes
  - NONE: Atomic values and sequences
    1 is the same as (1)
- Remember Lisp?

XPath 2.0: "Everything is a sequence" 2/2
- Consequence of "everything is a sequence"
  - Every operand of an expression is a sequence
  - Every result of an expression is a sequence
- 2 characteristics: closure and composability
  - The language is closed → every possible operation applied to a sequence generates again a sequence
  - Therefore expressions can be nested arbitrarily – composability
- Example
  - sum(//Type/Price)
  - Result = Sequence

XPath 2.0: Sequence Processing 1/2
- Union (alternative: | as in XPath 1.0)
  - (A, B) union (A, B) → (A, B)
  - (A, B) union (B, C) → (A, B, C)
- Intersection
  - (A, B) intersect (A, B) → (A, B)
  - (A, B) intersect (B, C) → (B)
  - XPath 1.0 versus XPath 2.0
    - Determine whether the node $x is included in the /foo/bar node-set
    - XPath 1.0: count(/foo/bar) = count(/foo/bar | $x)
    - XPath 2.0: $x intersect /foo/bar
- Difference
  - (A, B) except (A, B) → ()
  - (A, B) except (B, C) → (A)
  - XPath 1.0 versus XPath 2.0
    - Select all attributes except the one with a given NS-qualified name
    - XPath 1.0: @*[not(namespace-uri()='http://example.com' and local-name()='foo')]
    - XPath 2.0: @* except @exc:foo

XPath 2.0: Sequence Processing 2/2
- List functions
  - insert-before((1, 3, 4), 2, 2) → (1, 2, 3, 4)
  - remove((1, 2, 3), 2) → (1, 3)
  - index-of((10, 20, 30), 20) → 2
  - empty(()) → true
  - exists((1, 2, 3)) → true
- Aggregation functions (the argument is a single sequence)
  - sum((1, 2, 3)) → 6     // already supported in XPath 1.0
  - count((1, 2, 3)) → 3   // already supported in XPath 1.0
  - avg((1, 2, 3)) → 2
  - min((1, 2, 3)) → 1
  - max((1, 2, 3)) → 3

XPath 2.0: Type System
- XPath 1.0 supports
  - Node-sets
  - Booleans
  - Strings
  - A single numeric data type (double precision floating point)
  → Weakly typed language
- XPath 2.0 supports
  - Sequences as a data type
  - All 19 primitive simple types built into XML Schema like integers, decimals, single precision, dates, times, durations, …
  - User-defined data types
  - Strong type checking as well as weak type checking
  → hybrid language
  → satisfies data-oriented and document-oriented world

XPath 2.0: Type System – Changes to XPath 1.0 Data Model
[Figure: UML class diagram of the XPath 2.0 data model. The node kinds are Document (replacing Root), Element, Attribute, Text, Comment, Processing Instruction, and Namespace. A new TypedNode class carries Name: QName?, TypedValue: AtomicValue*, and TypeAnnotation: QName?. Type annotations (0..1) refer to XML Schema types (ComplexTypes, SimpleTypes); an AtomicValue also has a type annotation.]

XPath 2.0: Path Expressions – Node Test by Schema Type
- Node tests in XPath 1.0
  - On basis of the node's name and its 7 predefined types
- Node tests in XPath 2.0
  - Also on basis of the node's type defined by XML Schema
  - For example, select all elements of type Person, regardless of the name
  - Useful especially when using a schema with a rich type hierarchy in which many elements can be derived from the same type definition

XPath 2.0: Path Expressions – Function as Location Step
- Now, a function call can be used as a location step
- This allows following logical relationships in the document's structure, not just physical relationships given by the hierarchy
- Example: «customer[@id="123"]/find-orders(.)/order-value»
- The person writing a path expression doesn't necessarily need to know how the orders for a customer are found
  - supports some kind of information hiding → encapsulation
  - the way that they are found can change without invalidating the expression → locality of change
- XPath itself does not allow writing the find-orders() function
  - you can do this on basis of XQuery or XSLT

XPath 2.0: «for» Expression
[Example tree: PurchaseOrder > Seller and OrderLines > Line > Price, Quantity, Code]
- Enables iteration over sequences, returning a new value for each member in the argument sequence

    for $line in /po:PurchaseOrder/po:OrderLines/po:Line
    return $line/po:Price * $line/po:Quantity

- Similar to xsl:for-each, but different in that it is an actual expression that returns a sequence which can, in turn, be processed as such

    fn:sum(
      for $line in /po:PurchaseOrder/po:OrderLines/po:Line
      return $line/po:Price * $line/po:Quantity
    )
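For readers more familiar with general-purpose languages, the «for» expression behaves much like a list comprehension whose result can be passed straight to an aggregate. The sketch below is a Python analogue, not XPath 2.0 itself: the purchase order document, its values, and the namespace URI bound to `po` are all hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"po": "http://example.com/po"}  # hypothetical namespace URI

# Hypothetical purchase order, following the element names on the slide.
ORDER = """
<po:PurchaseOrder xmlns:po="http://example.com/po">
  <po:Seller>Bookstore</po:Seller>
  <po:OrderLines>
    <po:Line><po:Code>A1</po:Code><po:Price>10.0</po:Price><po:Quantity>2</po:Quantity></po:Line>
    <po:Line><po:Price>5.5</po:Price><po:Quantity>4</po:Quantity></po:Line>
  </po:OrderLines>
</po:PurchaseOrder>
"""
root = ET.fromstring(ORDER)  # the po:PurchaseOrder element

# for $line in /po:PurchaseOrder/po:OrderLines/po:Line
# return $line/po:Price * $line/po:Quantity
line_totals = [
    float(line.findtext("po:Price", namespaces=NS))
    * int(line.findtext("po:Quantity", namespaces=NS))
    for line in root.findall("po:OrderLines/po:Line", NS)
]

# fn:sum(for ... return ...): the resulting sequence is consumed by an aggregate.
order_total = sum(line_totals)

print(line_totals, order_total)
```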

XPath 2.0: «if» Expression
[Example tree: PurchaseOrder > Seller and OrderLines > Line > Price, Quantity, Code]
- Depending on whether the expression in parentheses evaluates to true or false, the expression returns the then or the else section

    if (/po:PurchaseOrder/po:Seller = 'Bookstore')
    then 'ok'
    else 'ko'

- The power of XPath 2.0 comes from the ability to combine expressions to create sophisticated requests

    fn:sum(
      for $line in /po:PurchaseOrder/po:OrderLines/po:Line
      return
        if ($line/po:Code)
        then $line/po:Price * $line/po:Quantity
        else ()
    )

XPath 2.0: Existential «some» and Universal «every» Quantifiers
- The XPath 1.0 equals operator (=) could compare node-sets
  - /students/student/name = "Fred" → returns true if any student name is equal to "Fred" → existential quantification
  - The same applies to !=, <, >, …
    - e.g. /students/student/name != "Fred" → returns true if any student name is not equal to "Fred"
- XPath 2.0 makes it possible to write explicit quantified expressions – existentially and universally quantified
  - some $x in /students/student/name satisfies $x = "Fred"
  - every $x in /students/student/name satisfies $x = "Fred"
- This formulation is more powerful, because the constraining condition can be anything (not just =, !=, < and so on)
  - some $item in //LineItem
    satisfies (($item/Price * $item/Quantity) > 100)
  - some $x in (1, 2, 3), $y in (2, 3, 4)
    satisfies $x + $y = 4
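The two quantifiers correspond closely to `any()` and `all()` in Python. The following analogue uses plain Python values instead of XML nodes; the list of student names is hypothetical. It also shows why `=` and `!=` on node-sets can both be true at the same time.

```python
# Hypothetical values standing in for /students/student/name.
names = ["Fred", "Wilma", "Fred"]

# some $x in names satisfies $x = "Fred"   (same as: names = "Fred" in XPath 1.0)
some_fred = any(x == "Fred" for x in names)

# every $x in names satisfies $x = "Fred"
every_fred = all(x == "Fred" for x in names)

# names != "Fred" is also existential: true if ANY name differs from "Fred".
some_not_fred = any(x != "Fred" for x in names)

# some $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 4
some_pair = any(x + y == 4 for x in (1, 2, 3) for y in (2, 3, 4))

print(some_fred, every_fred, some_not_fred, some_pair)
```

With mixed names, both `= "Fred"` and `!= "Fred"` hold, so `!=` is not the negation of `=` on sequences; an explicit `every` expresses the universal case.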

XPath 2.0: String Support Improved
- Case conversion
  - upper-case('Michael') → 'MICHAEL'
- String concatenation
  - concat('Jane', ' ', 'Brown') → 'Jane Brown'
- Complementing the starts-with() function of XPath 1.0
  - ends-with() function
- Regular expressions supported by 3 functions
  - matches(), replace(), and tokenize()
  - Example: matches(SSNumber, '\d{3}-\d{2}-\d{4}')
- All functions that perform comparison of strings can now use a user-specified collation to do the string comparison
  - This allows more intelligent localization of string matching according to the conventions of different languages
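The behaviour of these string functions can be approximated in Python. This is an analogue rather than XPath 2.0 itself, and the regular expression dialects differ in details. One point worth noting: `matches()` succeeds if any substring matches unless the pattern is anchored with `^` and `$`, which corresponds to `re.search`, not `re.fullmatch`. The helper name `matches` and all sample strings are hypothetical.

```python
import re

SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def matches(value: str) -> bool:
    """Analogue of matches(value, pattern): unanchored, like re.search."""
    return SSN_PATTERN.search(value) is not None


upper = "Michael".upper()                       # upper-case('Michael')
joined = "".join(["Jane", " ", "Brown"])        # concat('Jane', ' ', 'Brown')
ends = "report.xml".endswith(".xml")            # ends-with('report.xml', '.xml')
replaced = re.sub(r"\s+", " ", "Jane   Brown")  # replace(..., '\s+', ' ')
tokens = re.split(r",\s*", "a, b,c")            # tokenize('a, b,c', ',\s*')

print(upper, joined, ends, replaced, tokens)
print(matches("123-45-6789"), matches("id: 123-45-6789"), matches("123456789"))
```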

XPath 2.0: XPath Functions by Category 1/2
- Boolean Functions
  - boolean(), false(), true()
- Numeric Functions
  - abs(), avg(), max(), min()
- String Functions
  - compare(), concat(), contains()
- Date and Time Functions
  - current-date(), current-time()
- Duration Functions
  - days-from-dayTimeDuration(), hours-from-dayTimeDuration()
- Aggregation Functions
  - count(), avg(), max(), min(), sum()
- Functions on URIs
  - base-uri(), collection(), doc()
- Functions on QNames
  - expanded-QName(), local-name-from-QName()

XPath 2.0: XPath Functions by Category 2/2
- Functions on Sequences
  - empty(), exists()
- Functions that Return Properties of Nodes
  - base-uri(), data(), document-uri()
- Functions that Find Nodes
  - collection(), doc(), id(), root()
- Functions that Return Context Information
  - base-uri(), collection(), current-date()
- Diagnostic Functions
  - error(), trace()
- Functions that Assert a Static Type
  - exactly-one(), one-or-more(), zero-or-one()

Outline
- Introduction
- XPath
- XQuery
  - Introduction
  - For and let clauses
  - Adding Elements/Attributes to Results
  - Conditional Expressions
  - Joins
  - Quantifiers
  - Distinctness & Grouping
  - Sorting & Aggregating
  - Structure of an XQuery Program
  - Appendix
- XML & DB

XQuery Introduction: Why XQuery?
[Figure: SQL and XQuery each sit at the intersection of persistent data, transacted data, and declarative processing.]
- Why a "query" language for XML?
  - Preserve logical/physical data independence
    - Based on an abstract data model, independent of physical data storage
  - Declarative programming
    - Describe the "what", not the "how"
  - Commonalities with functional, imperative and query languages
- Why a native query language? Why not SQL?
  - We need to deal with the peculiarities of XML
  - Hierarchical, ordered, textual, potentially schema-less structure
- Why another XML processing language? Why not XSLT?
  - The template nature of XSLT was not appealing to DB people
  - Not declarative enough

XQuery Introduction: XPath – XSLT – XQuery
[Figure: relationship between the standards. XML-based syntax: XSLT 1.0 (1999) uses XPath 1.0; XSLT 2.0 (2007) uses XPath 2.0. Non-XML-based syntax: XQuery 1.0 extends XPath 2.0. XPath 2.0 uses XML Schema and provides the library of functions & operators. The languages rest on a common data model.]

XQuery Introduction: XPath – XSLT – XQuery
- XPath 2.0
  - Common language for navigation, selection, extraction
  - Used in XSLT, XQuery, XPointer, XML Schema, XForms, etc.
- XSLT 2.0: XML ⇒ XML, HTML, Text
  - Loosely-typed scripting language
  - Format XML in HTML for display in browser
  - Must be highly tolerant of variability/errors in data
- XQuery 1.0: XML ⇒ XML
  - Strongly-typed query language – enforces input and output types
  - Must guarantee safety/correctness of operations on data – side-effect free
  - Large-scale database access

XQuery Introduction: History
- Main basis for XQuery was "Quilt"
- XML query language from IBM, INRIA and Software AG
[Figure: lineage of XQuery. Languages shown as influences: XQL, XPointer, XSL, XPath, XQL-99, SQL, OQL, XML-QL. Contributions labeled in the figure: expressions; variable bindings and flexible structuring of the result; navigation and path expressions. These feed into Quilt, which leads to XQuery.]

XQuery Introduction: XQuery Family of Standards
- W3C-REC Jan. 2007
  - XQuery 1.0 and XPath 2.0 Functions and Operators
    - the functions you can call in XPath expressions and the operations you can perform on XPath 2.0 data types
  - XQuery 1.0 and XPath 2.0 Data Model (XDM)
    - representation and access for both XML and non-XML sources
  - XSLT 2.0 and XQuery 1.0 Serialization
    - how to output the results of XSLT 2.0 and XML Query evaluation in XML, HTML or as text
  - XML Syntax for XQuery 1.0 (XQueryX)
    - an XML-aware syntax for querying collections of structured and semi-structured data both locally and over the Web
  - XQuery 1.0 and XPath 2.0 Formal Semantics
    - the type system used in XQuery and XSLT 2.0 via XPath, defined precisely for implementers
- W3C Working Drafts / Java Community Process
  - XQuery Update – Candidate Recommendation since August 2008!
  - XQuery and XPath Full Text Search
  - XQJ – Query API for Java (~ JDBC)
http://www.w3.org/TR/xquery/

XQuery Introduction: XQuery = 80% XPath 2.0 + 20% …
- FLWOR (for-let-where-order-return) expressions
  - ~ SQL's SELECT-FROM-WHERE
- XML construction
  - Adding new elements and attributes as well as transformations
- Sorting of the result
- User-defined functions
  - Modularize large queries
  - Process recursive data
- Operators on types
  - Compile & run-time type tests
- Strong typing
  - Guarantees result value conforms to output type
  - Enforced statically or dynamically

XQuery Introduction: FLWOR (pronounced "flower") Expression 1/2
Processing pipeline, starting from the XML document:

FOR/LET   Iteration (cf. FROM in SQL) and variable binding.
          Variables are bound to values of expressions (using XPath).
          → Ordered list of tuples of bound variables
WHERE     Selection (cf. WHERE in SQL), optional.
          Filtering of tuples on basis of predicates.
          → Filtered list of tuples of bound variables
ORDER     Ordering (cf. ORDER BY in SQL), optional.
          Ordering of tuples on basis of predicates.
          → Ordered list of tuples
RETURN    Construction (cf. SELECT in SQL).
          Composition of the result (single nodes, ordered forest of nodes or atomic value).
          → Result = instance of the XPath/XQuery Data Model

XQuery Introduction: FLWOR (pronounced "flower") Expression 2/2
[Syntax diagram: one or more comma-separated bindings "FOR $var IN expr" or "LET $var := expr", followed by an optional "WHERE expr", an optional "ORDER expr", and "RETURN expr". An expr may be a variable binding, a function call, an XPath expression, or a variable reference.]

FLWOR Expressions
- Allow sorting
- Allow joining
- Allow adding elements/attributes to results
- Verbose, but can be clearer

Path Expressions
- Great if just copying certain elements and attributes as is
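The FLWOR pipeline (bind, filter, order, construct) can be mirrored step by step in a general-purpose language. The sketch below is a Python analogue of a query of the form `for $p in //product let $n := $p/name where $p/@dept = "ACC" order by $n return <item>{...}</item>`. The catalog document, the department code, and the `<item>` result element are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog document used only for this illustration.
CATALOG = """
<catalog>
  <product dept="WMN"><number>557</number><name>Linen Shirt</name></product>
  <product dept="ACC"><number>563</number><name>Floppy Sun Hat</name></product>
  <product dept="ACC"><number>443</number><name>Deluxe Travel Bag</name></product>
</catalog>
"""
catalog = ET.fromstring(CATALOG)

selected = []
for product in catalog.findall("product"):   # FOR: iterate, binding $p
    name = product.findtext("name")          # LET: bind $n := $p/name
    if product.get("dept") == "ACC":         # WHERE: filter the tuples
        selected.append(name)

selected.sort()                              # ORDER: sort the remaining tuples

# RETURN: construct one new element per tuple.
result = []
for name in selected:
    item = ET.Element("item")
    item.text = name
    result.append(ET.tostring(item, encoding="unicode"))

print(result)
```

In XQuery the whole pipeline is a single declarative expression, which leaves the processor free to reorder or optimize the steps. The Python version fixes one evaluation order.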

XQuery Introduction: XQuery Syntax – Some Important Issues
- Nested expressions
- Compact, non-XML syntax
- BUT all names must be valid XML names
  - variables, functions, elements, etc.
  - can be associated with a namespace (NS)
- No reserved words
- Case-sensitive
  - keywords are written in lowercase
- No special end-of-line character
- XQuery comments are delimited by (: and :)
  - allowed anywhere (insignificant) whitespace is allowed
  - do not appear in the result
  - may extend over multiple lines
- Whitespace
  - allowed almost anywhere – has no significance
Introduction XPath XQuery XML & DB
XML Processing
Introduction
The XQuery Processing Model

[Figure: source document (XML) → XML processor → source tree; XQuery query + source tree → analysis and evaluation (XQuery processor) → result tree → serialize or pass on → result document (XML)]

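Running Example

Three XML documents are used throughout (structure taken from the UML diagram):
- Order.xml: Order (attributes num, date, cust) contains * Item (attributes dept, num, quantity, color)
- Catalog.xml: Catalog contains * Product (attribute dept); each Product contains 1 Number (text), 1 Name (attribute language; text), 0..1 ColorChoices (text), and 0..1 Desc (text)
- Prices.xml: Prices contains PriceList (attribute effDate); each PriceList contains 1..* Prod (attribute num); each Prod contains 1 Price (attribute currency; text) and 0..1 Discount (attribute type; text)
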
for/let and Enclosed Expressions

- Using a let clause with a range expression
- Using a range expression in a for clause
- Multiple for clauses
- Multiple variable bindings in one for clause

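The difference between for and let with a range expression can be sketched in Python. This is an illustrative analogy only; the element name oneEval and the XQuery snippets in the comments are assumptions, not the lost slide examples.

```python
from itertools import product

def for_range():
    # for $i in 1 to 3 return <oneEval>{$i}</oneEval>
    # "for" iterates: the return clause is evaluated once per item.
    return [f"<oneEval>{i}</oneEval>" for i in range(1, 4)]

def let_range():
    # let $i := (1 to 3) return <oneEval>{$i}</oneEval>
    # "let" binds the whole sequence once; adjacent atomic values are
    # separated by single spaces in the constructed element content.
    seq = list(range(1, 4))
    return "<oneEval>" + " ".join(str(i) for i in seq) + "</oneEval>"

def multiple_for():
    # for $i in (1, 2), $j in ("a", "b") return ...
    # Several bindings behave like nested loops (the first varies slowest).
    return list(product((1, 2), ("a", "b")))

print(for_range())
print(let_range())
print(multiple_for())
```

Writing the two bindings as two separate for clauses or as one for clause with a comma yields the same tuple stream.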
Adding Elements/Attributes to Results
Three Use Cases

(1) 1:1 copying of elements/attributes from the input document
  - Simple elements
  - Complex elements – along with their attributes and children, if any (not just their atomic values!)
  - No opportunity to change attributes, children, etc.
(2) Direct element/attribute constructors – a mixture of ...
  - Literal content ("hard-coded") – appears as is in the output document
  - Expressions within "{}" evaluating to any kind of node (elements, attributes, etc.) and to atomic values
  → Using XML syntax (proper nesting, case sensitivity, etc.)
(3) Computed constructors
  - Allow for dynamic names of nodes and dynamic values
  - Copying tags from the input document but making minor changes (e.g., adding an attribute)
  - Turning content from the input document into markup

Adding Elements/Attributes to Results
(1) 1:1 Copying from the Input Document

- Copy simple elements – name
- Copy complex elements – product

Adding Elements/Attributes to Results
(2) Direct Constructors 1/3

- Wrap the whole result (name elements) in a new ul element (the ul tags are literal content)
- In addition, wrap each resulting name element in an li element (the li tags are literal content)

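The effect of wrapping copied name elements in literal ul and li elements can be imitated with Python's ElementTree. This is a sketch with hypothetical sample data; the XQuery in the comment is an assumed counterpart, not the slide's original query.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample data modelled on the catalog of the running example.
CATALOG = ET.fromstring(
    '<catalog>'
    '<product dept="WMN"><number>557</number><name language="en">Fleece Pullover</name></product>'
    '<product dept="ACC"><number>563</number><name language="en">Floppy Sun Hat</name></product>'
    '</catalog>'
)

# XQuery counterpart (for orientation only):
#   <ul>{ for $n in doc("catalog.xml")//product/name return <li>{$n}</li> }</ul>

def names_as_list(catalog):
    ul = ET.Element("ul")                 # literal content: the wrapping element
    for name in catalog.iter("name"):
        li = ET.SubElement(ul, "li")      # literal content: one li per name
        li.append(copy.deepcopy(name))    # enclosed expression: the name node is copied 1:1
    return ul

ul = names_as_list(CATALOG)
print(ET.tostring(ul, encoding="unicode"))
```

The copied name elements keep their language attribute, because the enclosed expression evaluates to element nodes rather than to their atomic values.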
Adding Elements/Attributes to Results
(2) Direct Constructors 2/3

- Add new attributes, copy attribute values / element content
  - New attribute name & new value
  - New attribute name & copy of an existing value
- Copy element content (or attribute content), i.e., its typed value, via the data() function
- Copy element content and use it as attribute value with prefix "P"
  - data() function not necessary – automatic "atomization"

Adding Elements/Attributes to Results
(2) Direct Constructors 3/3

- Copy attributes/elements & eliminate certain elements
  - Eliminate the number subelements of product
  - Copy dept attributes to the new element new_product
  - Copy product elements and add them as subelements to new_product

Adding Elements/Attributes to Results
(3) Computed Constructors

- Turning content into markup
  - Attribute values → elements
  - Explicit element constructor

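Turning an attribute value into an element name can be sketched as follows. The sample data is hypothetical, and the XQuery in the comment is an assumed counterpart using a computed element constructor.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modelled on the catalog of the running example.
CATALOG = ET.fromstring(
    '<catalog>'
    '<product dept="WMN"><number>557</number><name>Fleece Pullover</name></product>'
    '<product dept="ACC"><number>563</number><name>Floppy Sun Hat</name></product>'
    '</catalog>'
)

# XQuery counterpart (for orientation only):
#   for $p in doc("catalog.xml")//product
#   return element {string($p/@dept)} {data($p/name)}

def dept_elements(catalog):
    result = []
    for p in catalog.iter("product"):
        elem = ET.Element(p.get("dept"))   # element name computed from the attribute value
        elem.text = p.findtext("name")     # content computed from the name element
        result.append(elem)
    return result

elements = dept_elements(CATALOG)
print([ET.tostring(e, encoding="unicode") for e in elements])
```

This only works when every attribute value is a valid XML name, which is the same restriction a computed element constructor imposes.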
Conditional Expressions

Joins 1/2

- Two-way join in a predicate
- Two-way join in a where clause

Joins 2/2

- Three-way join in a where clause
- Outer join

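A two-way join and an outer join between order items and catalog products can be sketched in Python. The data is hypothetical and modelled on the running example; the XQuery in the comment is an assumed counterpart. The sketch uses a dictionary lookup instead of the nested iteration that the where-clause formulation suggests.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modelled on the running example.
CATALOG = ET.fromstring(
    '<catalog>'
    '<product dept="WMN"><number>557</number><name>Fleece Pullover</name></product>'
    '<product dept="ACC"><number>563</number><name>Floppy Sun Hat</name></product>'
    '<product dept="ACC"><number>443</number><name>Deluxe Travel Bag</name></product>'
    '</catalog>'
)
ORDER = ET.fromstring(
    '<order num="00299432" date="2006-09-15" cust="0221A">'
    '<item dept="WMN" num="557" quantity="1" color="navy"/>'
    '<item dept="ACC" num="563" quantity="1"/>'
    '<item dept="WMN" num="557" quantity="2" color="black"/>'
    '</order>'
)

# XQuery counterpart of the two-way join (for orientation only):
#   for $i in doc("order.xml")//item, $p in doc("catalog.xml")//product
#   where $i/@num = $p/number
#   return <item num="{$i/@num}" name="{$p/name}" quan="{$i/@quantity}"/>

def inner_join(order, catalog):
    # Index the products by number, then look up each item.
    names = {p.findtext("number"): p.findtext("name") for p in catalog.iter("product")}
    return [(i.get("num"), names[i.get("num")], int(i.get("quantity")))
            for i in order.iter("item") if i.get("num") in names]

def outer_join(order, catalog):
    # Every product appears in the result, even without matching items.
    result = []
    for p in catalog.iter("product"):
        number = p.findtext("number")
        quantities = [int(i.get("quantity"))
                      for i in order.iter("item") if i.get("num") == number]
        result.append((number, quantities))
    return result

print(inner_join(ORDER, CATALOG))
print(outer_join(ORDER, CATALOG))
```

In the outer join, product 443 is returned with an empty list of quantities, which corresponds to an empty sequence in XQuery.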
Quantifiers

- Quantified expression using the some keyword
- Quantified expression using the every keyword
- Combining the not function with the some keyword
- Binding multiple variables in a quantified expression

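The four kinds of quantified expression map naturally onto Python's any() and all(). The following sketch uses hypothetical department codes; the XQuery in the comments is an assumed counterpart, not the lost slide content.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modelled on the catalog of the running example.
CATALOG = ET.fromstring(
    '<catalog>'
    '<product dept="WMN"><number>557</number></product>'
    '<product dept="ACC"><number>563</number></product>'
    '<product dept="ACC"><number>443</number></product>'
    '</catalog>'
)
depts = [p.get("dept") for p in CATALOG.iter("product")]

# some $d in doc("catalog.xml")//product/@dept satisfies ($d = "ACC")
some_acc = any(d == "ACC" for d in depts)

# every $d in doc("catalog.xml")//product/@dept satisfies ($d = "ACC")
every_acc = all(d == "ACC" for d in depts)

# not(some $d in doc("catalog.xml")//product/@dept satisfies ($d = "MEN"))
no_men = not any(d == "MEN" for d in depts)

# some $i in (1, 2), $j in (3, 4) satisfies $i + $j = 6
some_pair = any(i + j == 6 for i in (1, 2) for j in (3, 4))

print(some_acc, every_acc, no_men, some_pair)
```

As in XQuery, every over an empty sequence is true and some over an empty sequence is false; all([]) and any([]) behave the same way.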
Distinctness & Grouping

- Grouping by department

Sorting & Aggregating

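Grouping items by department, aggregating their quantities, and sorting the groups can be sketched as follows. The order data is hypothetical, and the XQuery 1.0 formulation in the comment (grouping via distinct-values) is an assumed counterpart.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical sample data modelled on the order of the running example.
ORDER = ET.fromstring(
    '<order num="00299432" date="2006-09-15" cust="0221A">'
    '<item dept="WMN" num="557" quantity="1" color="navy"/>'
    '<item dept="ACC" num="563" quantity="1"/>'
    '<item dept="WMN" num="557" quantity="2" color="black"/>'
    '</order>'
)

# XQuery counterpart (for orientation only):
#   let $items := doc("order.xml")//item
#   for $d in distinct-values($items/@dept)
#   order by $d
#   return <department code="{$d}"
#                      totQuantity="{sum($items[@dept = $d]/@quantity)}"/>

def quantity_by_dept(order):
    totals = defaultdict(int)
    for item in order.iter("item"):
        # group by the dept attribute and sum the quantity attribute
        totals[item.get("dept")] += int(item.get("quantity"))
    # sort the groups by department code
    return sorted(totals.items())

print(quantity_by_dept(ORDER))
```

The dictionary plays the role of distinct-values: each department code occurs once, and the aggregate is computed per code.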
Structure of an XQuery Program
Prolog, Body, Modules 1/3

- Prolog
  - Role
    - is the link between the XQuery expression and the environment where the expression is embedded
  - Parts
    - namespace declarations
    - schema imports
    - default element and function namespace
    - function declarations
    - function library imports
    - global and external variable definitions, etc.
    → each declaration separated by a semicolon
- Body
  - Contains the XQuery expression within { }
- Note!
  - a function does not inherit the context from the main body of the query – rather, the context has to be passed as a parameter

Structure of an XQuery Program
Prolog, Body, Modules 2/3

[Figure: Example 1 and Example 2, two sample queries, each annotated with its prolog and its body]

Structure of an XQuery Program
Prolog, Body, Modules 3/3

- Module

Useful functions available at: http://www.xqueryfunctions.com
XQuery style conventions: http://www.xqdoc.org/xquery-style.html

Appendix: for and let Clauses

- Simple for and let clause
- Intermingled for and let clauses

Appendix: Direct Constructors 1/3

- Wrap the content of each number and name element
- Get the content of each name element / order by

Appendix: Direct Constructors 2/3

- Aggregation function – no tags from the input document included
- Add attributes class & dep

Appendix: Direct Constructors 3/3

- Enclosed expressions that evaluate to elements
- Enclosed expressions that evaluate to attributes
- Enclosed expressions with multiple subexpressions

Appendix: Conditional Expressions

- Simple conditional expression
- Conditional expression returning multiple expressions

Appendix: Quantifiers

- A where clause with multiple expressions and an exists quantifier

Appendix: Ordering

- The order by clause
- Using multiple ordering specifications

Appendix: Distinctness & Aggregation 1/3

- Distinctness on a combination of values
- Aggregation – sum

Appendix: Distinctness & Aggregation 2/3

- Aggregation – count, sum

Appendix: Distinctness & Aggregation 3/3

- Aggregation on multiple values

Outline

- Introduction
- XPath
- XQuery
- XML & DB
  - Motivation
  - Storage Alternatives
  - Access Alternatives
  - SQL/XML – SQL:2003 Standard

The following slides are based (among others) on:
- Kay, Michael: XPath 2.0 Programmer's Reference (3rd ed.), Wiley, Aug. 2004.
- Walmsley, Priscilla: XQuery, O'Reilly, March 2007.
- Klettke, Meike; Meyer, Holger: XML & Datenbanken, dpunkt.verlag, Jan. 2003.

Motivation
XML and DB – Why?

[Figure: XML documents of the form <a> <b>...</b> <c d=.../> </a> flowing into and out of a database]

- Existing DBs store large amounts of data
  → Publish data as XML documents
- Existing DBs should store existing XML documents
  → Storage in the DB along with additional "meta" information
- Well-known benefits of DBs
  - Efficient storage of large amounts of well-structured data
  - Structured query language (SQL)
  - Optimization
  - Views and security mechanisms
  - Concurrency control / transactions – more fine-grained than just at the document level
  - Recovery techniques

DBs are essential cornerstones of today's IT infrastructures – the importance of DBs for Web applications is steadily increasing

"... The Web is one huge database..."
[The Asilomar Report on Database Research, SIGMOD Record 27(4), Dec. 1998]

Motivation
The Challenge: Different Categories of XML Documents

- Data-oriented
  - Well-known, fine-grained, typed structure
  - Ordering of subelements doesn't matter
  - Schema available, defining the structure
  - Examples: order, invoice

  <Order orderNr="1012">
    <CustomerNr>8596</CustomerNr>
    <Position posNr="1">
      <ProductNr>14896612</ProductNr>
      <Amount>2</Amount>...
    </Position>...
  </Order>

- Document-oriented
  - Semi-structured, coarse-grained, untyped
  - Ordering of subelements significant
  - Mixed content common
  - Schema often non-existent or very generic
  - Example: Claim

  <Claim>A severe <Reason>fire</Reason> damaged the building and claimed
  <DeathToll>12</DeathToll> lives. First investigations done by police indicate
  fire raising with <Motive>criminal intent</Motive>.
  </Claim>

- Mixture
  - Example: Email

  <Email>
    <Sender>[email protected]</Sender>...
    <Recipient>[email protected]</Recipient>
    <Content>All the best to your 110th birthday!</Content>
  </Email>

Storage Alternatives
Overview

- File system
  - XML documents stored as files at operating system level
  - Additional descriptive attributes and file references can be stored within a DBS
- DBS
  - XML document stored in the DBS as a whole or shredded, possibly together with descriptive attributes
- Hybrid
  - XML document or parts thereof stored across DBS and file system
  - Redundant or non-redundant storage possible

[Figure: classification of storage alternatives into file system, DBS (conventional or native), and hybrid; further dimensions are the data model (RM, OR, OO, XML), non-shredded vs. shredded storage, and the schema language (no schema, DTD, XML Schema)]

Storage Alternatives
Native Storage

- Conceptual XML mapping to a fine-grained storage structure
  - Transformation into an internal XML tree
  - The internal tree often resembles a DOM tree
  - Element names are replaced by means of a dictionary

http://www.idealliance.org/proceedings/xml05/ship/58/Native_XML_Databases.HTML

Storage Alternatives
Relational Storage – Heterogeneity 1/2

Level | Relational Concepts | XML Concepts
Data model level (M2) | Relation, Attribute | Element Type, Attribute
Schema level (M1) | Relational schema: Relation A, Relation B, ...; Attribute X, Attribute Y, ... | DTD / XML Schema (optional): Element Type a, Element Type b, ...; Attribute x, Attribute y, ...
Instance level (M0) | Relational DB: Tuple, Value | XML document: Element, Element Value, Attribute, Attribute Value

Legend: the diagram distinguishes "consistsOf" and "mayConsistOf" links between the concepts.

Storage Alternatives
Relational Storage – Heterogeneity 2/2

Aspect | RDBS | XML (DTD)
Structure | flat | nested
Datatypes | numerous | basically "STRING" only
Values | stored within attributes | stored within attributes and element types (ETs)
Order | tuples are not ordered | elements are ordered
Identification | composite key possible | just a single attribute of type ID
Relationships | foreign key – typed | IDREFs (untyped) and nested ETs (typed)
Schema | necessary; created prior to instances; not part of the instances | optional; can also be created after the instances; schema in the form of tags is part of the instance data – "self-describing"

Storage Alternatives
Relational Storage – Example

DTD:
<!ELEMENT hotelChain (hotel*)>
<!ELEMENT hotel (name, category, location, telephone*, room*)>
<!ATTLIST hotel
   hotelID CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT category (#PCDATA)>
<!ELEMENT location (#PCDATA)>
<!ELEMENT telephone (#PCDATA)>
<!ELEMENT room (roomCat, price)>
<!ELEMENT roomCat (#PCDATA)>
<!ELEMENT price (#PCDATA)>

UML diagram: hotelChain contains * hotel; each hotel has the attribute hotelID and contains 1 name, 1 category, 1 location, * telephone, and * room; each room contains 1 roomCat and 1 price.

Storage Alternatives
Relational Storage – Mapping Onto a Schema

[Figure: matrix of the schema on the DB side against the schema on the XML side, each being fixed, derived, or user-defined]

- Fixed Schema
  - Schema is independent of the domain (e.g., a mobile phone catalog) and independent of the target schema
  - no decomposition: XML document is stored as a whole
  - decomposition: XML document is "shredded"
  → Similarities with the generic XML API approach
- Derived Schema
  - Schema is derived from the other one
  → Similarities with the XML Data Binding approach
- User-Defined Schema
  - Schema is domain-dependent, but has been designed independently of the target schema

Storage Alternatives
Mapping Onto a Fixed RDB-Schema

Example: Decomposition of the document (content and schema) into a single table
- Element Name → DB Value
- Attribute Name → DB Value
- XML Value → DB Value

Object diagram: node 1 is :hotelChain, node 2 is :hotel, node 3 is :room. The children of :hotel are numbered by ordinal: 1 hotelID («attribute»), 2 :name, 3 :category, 4 :location, 5 :telephone, 6 :room. The children of :room are 1 :roomCat and 2 :price.

FixedMappingTable
Source | Ordinal | Name | Target/Value
... | ... | ... | ...
2 | 4 | location | Vienna
2 | 5 | telephone | 0043/732/2468
2 | 6 | room | 3
3 | 1 | roomCat | Suite

[cf. Florescu et al., IEEE Data Engineering, 1999]

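Shredding a document into one generic table can be sketched with Python and SQLite. This is an assumption-laden illustration rather than the method of the cited paper: the column names, the extra kind column (distinguishing a literal value from a reference to another node), the rule that elements without children and attributes are stored as values, and all sample values except "Vienna", "0043/732/2468", and "Suite" are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical instance of the hotelChain DTD.
DOC = (
    '<hotelChain><hotel hotelID="h1">'
    '<name>Sample Hotel</name><category>4</category><location>Vienna</location>'
    '<telephone>0043/732/2468</telephone>'
    '<room><roomCat>Suite</roomCat><price>120</price></room>'
    '</hotel></hotelChain>'
)

def shred(xml_text):
    """Return (source, ordinal, name, kind, target_or_value) rows.

    The root element is node 1; its own name is not stored in this sketch.
    """
    rows = []
    next_id = 1

    def visit(elem, elem_id):
        nonlocal next_id
        ordinal = 0
        for attr_name, attr_value in elem.attrib.items():
            ordinal += 1                      # attributes are numbered first
            rows.append((elem_id, ordinal, attr_name, "val", attr_value))
        for child in elem:
            ordinal += 1
            if len(child) == 0 and not child.attrib:
                # simple element: store its text as a value
                rows.append((elem_id, ordinal, child.tag, "val", child.text or ""))
            else:
                # complex element: give it a node id and store a reference
                next_id += 1
                child_id = next_id
                rows.append((elem_id, ordinal, child.tag, "ref", str(child_id)))
                visit(child, child_id)

    visit(ET.fromstring(xml_text), 1)
    return rows

def load(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE FixedMappingTable "
        "(source INTEGER, ordinal INTEGER, name TEXT, kind TEXT, target_or_value TEXT)"
    )
    conn.executemany("INSERT INTO FixedMappingTable VALUES (?, ?, ?, ?, ?)", rows)
    return conn

rows = shred(DOC)
conn = load(rows)
location = conn.execute(
    "SELECT target_or_value FROM FixedMappingTable WHERE name = 'location'"
).fetchone()[0]
print(location)
```

The table layout never changes, whatever the document looks like, but every step along a path costs one self-join, which is one reason queries and optimization are hard to realize on a fixed schema.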
Storage Alternatives
Mapping Onto a Derived RDB-Schema

Example: Decomposition of the XML Schema into tables ("Basic Inlining")
- Element Type → DB Relation
- Attribute → DB Attribute
- Foreign keys connect elements

Derived relations:
- hotelChain (hcID)
- hotel (hID, hcID, hotelID)
- name (nID, hID, value)
- category (cID, hID, value)
- location (lID, hID, value)
- telephone (tID, hID, value)
- room (rID, hID)
- roomCat (rcID, rID, value)
- price (pID, rID, value)

Problem: Fragmentation

[cf. Shanmugasundaram et al., VLDB, 1999]

Storage Alternatives
Mapping Onto a User-Defined RDB-Schema

Example: Mapping of the XML Schema into existing tables and attributes

XML side: the hotelChain structure (hotelChain, hotel with attribute hotelID, name, category, location, telephone, room, roomCat, price)

Existing tables on the DB side:
- Accommodation (ID, Name, Category, TownID)
- Phone (ID, Phone#, Desc)
- Town (TownID, TownName, Country)
- RoomRates (ID, RoomCat, Rate)

Storage Alternatives
Mapping Options – Advantages/Disadvantages

[Figure: matrix of the schema kind on the XML side (fixed, derived, user-defined) against the schema kind on the DB side; each cell names the resulting mapping (fixed, derived, or user-defined mapping) or is marked n.a.]

- Fixed
  - (−) Domain not represented in schema
  - (−) Queries/optimization hard to realize
  - (+) Fixed at DB side:
    - no schema at XML side necessary
    - best suited for document-oriented XML
- Derived
  - (−) The schema on the other side must exist
- User-Defined
  - (+) Schema can be designed independently of the target schema
  - (+) Data of existing DBs can be used!
  - (−) Heterogeneity problem!
- Derived / User-Defined
  - (+) Domain is represented in schema
  - (+) Optimization mechanisms usable
  - (+) Suited especially for data-oriented XML

Storage Alternatives
Representation of Mapping Knowledge

- "Template-Driven"
  - mapping knowledge hard-coded in
    - queries
    - transformation programs

  <?xml version="1.0" ?>
  <Accommodation xmlns:sql="urn:schemas-microsoft-com:xml-sql">
    <sql:query>
      SELECT * FROM Accommodation FOR XML AUTO,ELEMENTS
    </sql:query>
  </Accommodation>

- "Model-Driven"
  - mapping knowledge reified (i.e., stored as meta data)
    - as a file, e.g., as XML document
    - in the DB, usage of DB functionality

  <?xml version="1.0" ?>
  <Schema xmlns="urn:schemas-microsoft-com:xml-data"
          xmlns:dt="urn:schemas-microsoft-com:datatypes"
          xmlns:sql="urn:schemas-microsoft-com:xml-sql">
    <ElementType name="Phone" content="textOnly" />
    <ElementType name="Accommodation" sql:relation="Accommodation">
      <element type="Phone" sql:relation="Phone" sql:field="Number">
        <sql:relationship
          key-relation="Accommodation"
          key="AcID"
          foreign-key="AccID"
          foreign-relation="Phone" />
      </element>
    </ElementType>
  </Schema>

Access Alternatives
Read-only Query vs. Data Manipulation

- Read-only Query
  - XML-centered
    - Access via XML-based language
    - W3C XQuery Standard
  - DB-centered
    - Access via SQL-based language
    - SQL/XML – part of the current SQL:2003 Standard
  - Proprietary mechanism
    - Neither DB- nor XML-centered
- Data Manipulation
  - Current research area
  - XQuery Update Facility, W3C Candidate Rec. Aug. 2008
    http://www.w3.org/TR/xqupdate/

SQL/XML
First Edition: Part of SQL:2003 Standard

[Figure: XML documents are stored in the RDBS as XMLType and published again via SQL/XML functions]

- Storage of XML documents
  - Introduction of new datatype XMLType
  - Automatic shredding which can be customized
- Publishing stored data by extending SQL with XML functions
  - Functions for retrieving relational data and transforming it into XML (e.g., XMLGen, XMLElement, XMLAgg)
- Unfortunately, SQL:2003 pre-dated the XQuery standard
  - Therefore no full XQuery functionality available
  - cf. SQL:2007 ...

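What the publishing functions do can be imitated in Python on top of SQLite, which itself has no SQL/XML support. This is a sketch under stated assumptions: the table and column names (Accommodation.ID, Accommodation.Name, Phone.AccID, Phone.Number), the sample rows, and the wrapping Accommodations element are hypothetical, and the SQL/XML statement in the comment is an assumed counterpart for a DBS that implements the standard.

```python
import sqlite3
import xml.etree.ElementTree as ET

# SQL/XML counterpart (for orientation only; not executable in SQLite):
#   SELECT XMLELEMENT(NAME "Accommodation",
#            XMLATTRIBUTES(a.ID AS "id"),
#            XMLELEMENT(NAME "Name", a.Name),
#            (SELECT XMLAGG(XMLELEMENT(NAME "Phone", p.Number))
#               FROM Phone p WHERE p.AccID = a.ID))
#   FROM Accommodation a

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Accommodation (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Phone (AccID INTEGER, Number TEXT);
        INSERT INTO Accommodation VALUES (1, 'Sample Hotel');
        INSERT INTO Phone VALUES (1, '0043/732/2468');
    """)
    return conn

def publish(conn):
    root = ET.Element("Accommodations")   # wrapper added by this sketch only
    accommodations = conn.execute(
        "SELECT ID, Name FROM Accommodation ORDER BY ID").fetchall()
    for acc_id, name in accommodations:
        # XMLELEMENT with XMLATTRIBUTES: one element per row, ID as attribute
        acc = ET.SubElement(root, "Accommodation", {"id": str(acc_id)})
        ET.SubElement(acc, "Name").text = name
        # XMLAGG: collect one Phone element per matching row
        phones = conn.execute(
            "SELECT Number FROM Phone WHERE AccID = ? ORDER BY Number",
            (acc_id,)).fetchall()
        for (number,) in phones:
            ET.SubElement(acc, "Phone").text = number
    return root

xml_text = ET.tostring(publish(make_db()), encoding="unicode")
print(xml_text)
```

In a DBS with SQL/XML support the XML value is constructed inside the query itself, so the optimizer sees both the relational and the XML part.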
SQL/XML
Second Edition: Part of Forthcoming SQL:2007 Standard

- More complete integration of the XQuery Data Model
  - XML datatype will support the XQuery data model
    - heterogeneous sequences
    - non well-formed XML data
    - full XML Schema support and validation
- Advanced query capabilities
  - XMLQuery() function
    - create XML content using XQuery
  - XMLTable() function
    - shred XML to relational data using XQuery
- Mapping between SQL & XQuery data model
  - XMLCAST between XML and SQL types

[Figure "IBM DB2", from an article by Holger Seubert]

SQL/XML
What is it Good For …

- Benefits
  - Takes advantage of the entire SQL infrastructure (e.g., triggers, PL/SQL)
  - Transactional support
  - Scalability, clustering, reliability
  - Global optimization (XML and relational)
  - Standard implemented and supported by Microsoft, Oracle, IBM, etc.
- Drawbacks
  - Requires data to be loaded into the DB
    - not good for temporary XML data
    - not worth the effort for small volumes of data
  - Blending of the two languages (SQL, XQuery) isn't natural
  - XQuery not supported entirely by DB engines
    - No XML updates à la XQuery yet

Literature

- Standard specifications
  - http://www.w3.org/TR/xpath20/
  - http://www.w3.org/TR/xquery/
  - http://www.sqlx.org (SQL/XML Standard)
- Best source (!) on XML & DBS incl. an extensive overview of available systems:
  - http://www.rpbourret.com/xml/XMLDatabaseProds.htm
  - http://www.rpbourret.com/xml/XMLAndDatabases.htm
- Interesting collection of papers:
  - http://www.cs.cornell.edu/People/jai/pubs.html#PaperCategory:PublishingRelationalDataAsXML
- GI Working Group "Web und Datenbanken":
  - http://dbs.uni-leipzig.de/webdb/
- M. Koran, Evaluierung von XML Datenbanken, Master Thesis, Universität Zürich, October 2006 [http://www.ifi.uzh.ch/index.php?id=490&print=1&no_cache=1]
- Books
  - H. Katz et al., XQuery from the Experts, Addison Wesley, 2004.
  - J. Melton et al., Querying XML: XQuery, XPath, and SQL/XML in Context, Morgan Kaufmann/Elsevier, 2006.
  - M. Klettke, H. Meyer, XML & Datenbanken: Konzepte, Sprachen und Systeme, dpunkt, 2003. http://www.xml-und-datenbanken.de/
  - Erhard Rahm, Gottfried Vossen (eds.), Web & Datenbanken: Konzepte, Architekturen, Anwendungen, dpunkt, 2003.
  - Bastian Gorke, XML-Datenbanken in der Praxis, bomots Verlag, 2006.