Module 3: XML Processing

a.Univ.-Prof. Dr. Werner Retschitzegger
Information Systems Group (IFS), www.ifs.uni-linz.ac.at
Institute of Bioinformatics, www.bioinf.jku.at
Johannes Kepler University Linz, www.jku.ac.at

Outline
- Introduction
  - Motivation
  - XML Processing Alternatives – Overview
  - Extensions of Existing Languages
  - Interfaces to Existing Languages
  - Native XML Processing
- XPath
- XQuery
- XML & DB

The following slides are based (among others) on:
- Kay, Michael: XPath 2.0 Programmer's Reference (3rd ed.), Wiley, Aug. 2004.
- Walmsley, Priscilla: XQuery, O'Reilly, March 2007.
- Klettke, Meike; Meyer, Holger: XML & Datenbanken, dpunkt.verlag, Jan. 2003.

© 2011 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS)

Introduction: Motivation
- Huge amount of XML data, steadily growing
- We need to "process" it, including its "storage"
  - Filter, search, select, join, aggregate
  - Create new pieces of information
  - Clean, normalize the data
  - Update it
  - Verify the correctness
  - Take actions based on the existing data
  - Write complex execution flows
  - Store it efficiently
- No common architecture like for RDBS
  - Applications are too heterogeneous

Introduction: XML Processing Alternatives – Overview
(1) Existing Language Extensions
  - Procedural [UE IFS2]
    - JavaScript (ECMA), AJAX, PHP
  - Declarative
    - SQL/XML – part of the SQL:2003 standard
(2) Interfaces to Existing Languages
  - XML APIs – Generic Mapping [VO IFS2]
    - DOM, SAX, StAX
  - XML Data Binding – Non-Generic Mapping
    - JAXB 2.0 – Java Architecture for XML Binding
    - SDO – Service Data Objects (J2EE platform)
    - ADO – ActiveX Data Objects (.NET platform)
    - EMF – Eclipse Modeling Framework [VO/UE Model Engineering]
(3) Native XML Processing
  - Pure XML
    Type System [VO IFS2]
    - XPath, XSLT and XQuery

Introduction: (1) Extensions to Existing Languages
- Extension of the type system of existing languages with XML types
- Extension of the API
  - Import of XML data into this type system
  - XML retrieval and manipulation
  - XPath-based or XPath-inspired

Example: SQL/XML

Relational Data → XML Data (table EMPLOYEES with columns EMPLOYEE_ID, FIRST_NAME, LAST_NAME)

  SELECT e.employee_id,
         XMLElement("Emp", e.first_name || ' ' || e.last_name) AS result
  FROM employees e
  WHERE employee_id > 200;

  EMPLOYEE_ID  RESULT
  -----------  ----------------------------
  201          <Emp>Michael Hartstein</Emp>
  202          <Emp>Pat Fay</Emp>
  203          <Emp>Susan Mavris</Emp>

XML Data → Relational Data (table EMP_RESUMES with XML column RESUME)

  <RESUME>
    <FULL_NAME>S.King</FULL_NAME>
    <JOB_HISTORY>
      <JOB_ID>AD_PRES</JOB_ID>
    </JOB_HISTORY>
    …
  </RESUME>

  SELECT e.resume.extract('//JOB_ID/text()') result
  FROM emp_resumes e
  WHERE e.employee_id = 100;

  RESULT
  -------
  AD_PRES

Introduction: (2) Interfaces to Existing Languages – XML APIs
- Mapping of XML data to generic XML programmatic APIs
- Programming languages (e.g.
  Java, C#) are used to manipulate the data
- Re-serialize it at the end
- More details later on …

Example documents:

  <purchaseOrder>
    <lineItem> … </lineItem>
    <lineItem> … </lineItem>
  </purchaseOrder>

  <book>
    <author>…</author>
    <title>…</title>
    …
  </book>

Generic mappings: every document is accessed through the same node interface

  interface DomNode {
      String getNodeName();
      String getNodeValue();
      void setNodeValue(String nodeValue);
      short getNodeType();
  }

Introduction: (2) Interfaces to Existing Languages – XML Data Binding
- Mapping of the XML Schema of the XML data to appropriate code in the target language
- Based on this mapping, marshalling / unmarshalling between XML and objects
- Customization of translation possible

[Figure: data binding framework. A binding compiler translates the XML Schema into derived classes and interfaces. XML documents (instances of the schema, validated against it) are deserialized (unmarshalled) into objects (instances of the derived classes); objects are serialized (marshalled) back into XML documents.]

Advantages
- Abstraction from low-level APIs, getter/setter methods and the details of the parsing process
- Development effort and error-proneness can be reduced

Disadvantages
- High memory demands for large XML documents
- XML Schema evolution leads to a new generation of the corresponding classes

Non-generic mappings (see http://www.rpbourret.com/xml/XMLDataBinding.htm): the schema (simplified notation)

  <type name="book-type">
    <sequence>
      <attribute name="year" type="xs:integer"/>
      <element name="title" type="xs:string"/>
      <sequence minoccurs="0">
        <element name="author" type="xs:string"/>
      </sequence>
    </sequence>
  </type>
  <element name="book" type="book-type"/>

is mapped to a schema-specific interface

  interface BookType {
      int getYear();
      String getTitle();
      List<String> getAuthors();
  }

Introduction: (3) Native XML Processing
Most promising alternative for the future!
The only alternative such that …
- the data is modeled only once
- it is well integrated with the XML Schema type system
- it preserves the logical/physical data independence
- the code deals with non-generic structures
- the code can be optimized automatically

Data is stored
- in plain file systems or
- in dedicated data stores, e.g. XML extensions of RDBS

Missing pieces, under development
- procedural logic
- update language
- …

Outline
- Introduction
- XPath
  - Introduction
  - XPath 1.0
  - XPath 2.0
- XQuery
- XML & DB

XPath Introduction: Overview

Purpose
- Original goal: selecting document parts for layout purposes (XSL)
- Now used for various XML standards – XML Schema, XPointer
- No XML syntax used – proprietary syntax
- Various selection criteria, e.g., element/attribute names, content, type

Basic Processing Principle
- Tree-based navigation, similar to navigation in a file system
- Starting point is always a certain context – i.e., a tree node specified by an XPath expression
- Navigation and filters modify the context
- Result of an XPath expression = context computed in the last step

Read-only language
- It cannot create nodes or modify existing nodes, except by calling functions written in another language
- However, it can create new atomic values and sequences of existing nodes

W3C Standards
- XPath 1.0, Nov. 1999, ~ 44 pages
- XPath 2.0, Jan.
  2007, ~ 250 pages

XPath 1.0: XPath Data Model – 7 Node Types

[Figure: UML class diagram of the XPath 1.0 data model. Every Node has a StringValue. The seven node types are Root, Element, Text, Attribute, Comment, Processing Instruction and Namespace; Root and Element are nodes with children, the others are nodes without children. The Root has exactly one outermost Element; an Element has attributes and namespaces.]

Note: Root is NOT equal to the root (i.e. outermost) element but rather represents the whole XML document ("document entity")

XPath 1.0: XPath Data Model – Example

[Figure: UML object diagram of HandyCatalog1.xml (legend: name, node type, node value). The root node contains the outermost element HandyCatalog, which contains a comment and Producer elements (attribute name, e.g. NOKIA). A Producer contains a ProducerNo element (attribute no, e.g. h1234) and Type elements (attribute name, e.g. 8210 and 7110). A Type contains Weight (text, e.g. 141g) and Price (attribute contract = yes/no; text, e.g. 999 and 4999).]
XPath 1.0: XPath Navigation – 13 Axes
- Parts of an XML document represent nodes of a tree
- Processing direction of the XPath processor is depth-first
- Axes relative to the context node: self, parent, ancestor, ancestor-or-self, child, descendant, descendant-or-self, preceding, preceding-sibling, following, following-sibling
- Further axes: attribute, namespace

Example document tree used on the following slides:
  root → HandyCatalog → Producer (@name) → ProducerNo (@no), Type (@name) → Weight, Price (@contract)

XPath 1.0: Hierarchical Operators, Elements/Attributes

Hierarchical Operators / and //
- /                      root node
- //Type                 all Type elements at arbitrary depth
- //Type/Price           all Price child elements of Type elements at arbitrary depth

Access to Elements *
- /*                     root element
- //*                    all elements, including the root element
- /HandyCatalog/*/Type   all Type elements which are grandchildren of the HandyCatalog element

Access to Attributes @
- //@*                   all attributes

XPath 1.0: Filter
- //Type[Price]
  all Type elements containing a Price child element
- //Producer[ProducerNo]/Type[Price]
  all Type elements containing a Price child element, whereby the Type elements must be child elements of a Producer element which contains a ProducerNo child element
- //Producer[Type/Price]
  all Producer elements containing a Type child element which in turn contains a Price child element
- //Type[Weight and Price]
  all Type elements having Weight and Price child elements
- //Type[Weight = "141g"]
  all
  Type elements containing a Weight child element with value 141g
- //Type[@name = "7110"]
  all Type elements containing an attribute name with value 7110

XPath 1.0: Union, Index-based Access, Variables

Union |
- //Type/Weight | //Type/Price   all Weight and Price child elements of Type elements

Index-based access via the node's context position
- //Type[1]      the first Type child element of each parent (not the first Type element of the whole document)
- Type[last()]   the last Type child element of the context node

Variable $qname
- from within XPath 1.0, variables can be referenced only
- the variable $qname has to be defined by the application using XPath 1.0 (e.g., XSLT or XQuery)
- Note: XPath 2.0 can also bind values to variables ("for" clause)

XPath 1.0: Path Expressions 1/2

Chaining
- Relative Path: LocationStep[/LocationStep]*
  Processing starts at the current context node (determined e.g. by the preceding location step)
- Absolute Path: /Path
  Processing starts at the root node ("/") INDEPENDENT of the current context

Location Step: AxisName::NodeTest('['predicate']')*
- AxisName – navigation via axis name (ancestor, etc.)

Short forms for some axis names

  Long form                       Short form
  child::element-name             element-name
  attribute::attname              @attname
  /descendant-or-self::node()/    //
  self::node()                    .
  parent::node()                  ..
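The abbreviated path expressions above can be tried out with very little setup. The following is a minimal sketch in Python, assuming the standard-library module xml.etree.ElementTree, which supports only a subset of XPath 1.0 and expects relative paths (hence the leading "."). The miniature catalog is hypothetical: element and attribute names follow the HandyCatalog example, but the pairing of weights, prices and contract values is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of HandyCatalog1.xml; names follow the slides,
# the concrete value combinations are assumptions for this sketch.
CATALOG = """
<HandyCatalog>
  <Producer name="NOKIA">
    <ProducerNo no="h1234"/>
    <Type name="8210"><Weight>141g</Weight><Price contract="no">4999</Price></Type>
    <Type name="7110"><Price contract="yes">999</Price></Type>
  </Producer>
</HandyCatalog>
"""

# fromstring() returns the outermost element, not the XPath root node.
root = ET.fromstring(CATALOG)


def type_names(path):
    """Evaluate a path and return the name attribute of every match."""
    return [node.get("name") for node in root.findall(path)]


all_types = type_names(".//Type")                     # //Type
with_weight = type_names(".//Type[Weight='141g']")    # //Type[Weight = "141g"]
by_attribute = type_names(".//Type[@name='7110']")    # //Type[@name = "7110"]
last_type = type_names("./Producer/Type[last()]")     # Type[last()] per Producer
prices = [p.text for p in root.findall(".//Type/Price")]                 # //Type/Price
producers = [p.get("name") for p in root.findall(".//ProducerNo/..")]    # parent step ".."
```

Each result list is in document order. A full XPath 1.0 or 2.0 processor accepts the absolute forms shown in the comments directly.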
XPath 1.0: Path Expressions 2/2

::NodeTest – Node filtering (1)
- Name of the node, or
- Wildcard: "*" – arbitrary elements, "@*" – arbitrary attributes, or
- Type of the node on the basis of a function (text(), comment(), processing-instruction(), node())
- Result = set of nodes

[predicate] – Node filtering (2)
- Is a filter on all nodes selected by NodeTest – e.g., specification of the context position via the node's number
- Multiple predicates are processed from left to right
- Result = Boolean value
- Predicates may again contain location paths
  - E.g., selection of a node in case certain elements/attributes exist in the context of this node
  - //address[tel/@type="work"]

XPath 1.0: Operators and Functions

XPath Operators
- Node Set Operators: |, [expr], /, //
- Boolean and Comparison Operators: or, and, =, !=, <=, <, >=, >
- Arithmetic Operators: +, -, *, div, mod

XPath Core Function Library → 27 functions available
- Node Set Functions (7): last(), position(), count(), id(), local-name()
- String Functions (10): contains(string s1, string s2), concat(string s1, string s2, string sn*)
- Boolean Functions (5): boolean true(), boolean false()
- Number Functions (5): number round(number), number sum(node-set)

XPath 2.0: Goals of XPath 2.0
- Simplify manipulation of XML Schema-typed content
  - Introduction of a type system based on XML Schema
- Simplify manipulation of string content
  - Regular expressions, changing strings to upper and lower case, etc.
- Support related XML standards
  - Supports common underlying semantics for XSLT 2.0 and XQuery 1.0
  - Data model based on the InfoSet W3C standard
- Improve ease of use
  - New string / aggregation functions, conditional expression, etc.
- Improve interoperability
  - Different implementations of specifications should produce same result
- Improve i18n support
  - Support the needs of different languages and cultures worldwide
- Maintain backward compatibility
  - Large gratuitous incompatibilities were avoided
  - Ability to run in backward compatibility mode
- Enable improved processor efficiency

XPath 2.0: XPath 2.0 vs. XPath 1.0
- 70% more language concepts than XPath 1.0
- Number of operators has doubled
- Number of functions in the standard function library has grown by a factor of four
- Minor changes in core syntax
- Introduction of a new type system based on XML Schema
  - represents a pretty radical overhaul of the language semantics

XPath 2.0: New Features in XPath 2.0 – Overview
- Everything is a "sequence" and Sequence Processing
  - Construction operators
  - Filter
  - New set operators in addition to UNION
  - Functions for list manipulation
  - Aggregation functions
- Support of XML Schema's Type System
  - Type annotations
  - Typed values
  - Type expressions
- Changes to Path Expressions
  - Node tests now also on basis of XML Schema types
  - Location steps can now be defined by function calls
- New Expressions
  - Control primitives: «for» and «if»
  - Quantifiers: «some» and «every»
- New Operators and New Functions

XPath 2.0: "Everything is a sequence" 1/2
NOTE: Although syntactically correct, nested sequences become unnested
[Figure: a Sequence contains any number (*) of Items; an Item {abstract} is either a Node or an Atomic Value]

XPath 1.0: Sets of nodes only
- Unordered
- Can't contain duplicates

Sequences
- Are ordered: (1, 2, 3, 4) is different from (4, 3, 2, 1)
- Can have duplicates: (1, 2, 3, 4) is different from (1, 1, 2, 3, 4)
- Can have heterogeneous items: (1, 2, 3, "foo")
- Can't be nested: (1, 2, (3, 4)) is the same as (1, 2, 3, 4)

Identity
- YES: Nodes
- NONE: Atomic values and sequences; 1 is the same as (1)

Remember Lisp?

XPath 2.0: "Everything is a sequence" 2/2

Consequence of "everything is a sequence"
- Every operand of an expression is a sequence
- Every result of an expression is a sequence

2 characteristics: closure and composability
- The language is closed → every possible operation applied to a sequence generates again a sequence
- Therefore expressions can be nested arbitrarily – composability

Example
- sum(//Type/Price)   Result = Sequence

XPath 2.0: Sequence Processing 1/2

Union (alternative: | as in XPath 1.0)
- (A, B) union (A, B) → (A, B)
- (A, B) union (B, C) → (A, B, C)

Intersection
- (A, B) intersect (A, B) → (A, B)
- (A, B) intersect (B, C) → (B)
- XPath 1.0 versus XPath 2.0: determine whether the node $x is included in the /foo/bar node-set
  - XPath 1.0: count(/foo/bar) = count(/foo/bar | $x)
  - XPath 2.0: $x intersect /foo/bar

Difference
- (A, B) except (A, B) → ()
- (A, B) except (B, C) → (A)
- XPath 1.0 versus XPath 2.0: select all attributes except the one with a given namespace-qualified name
  - XPath 1.0: @*[not(namespace-uri()='http://example.com' and local-name()='foo')]
  - XPath 2.0: @* except @exc:foo
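The set operators above work on node identity and return their result without duplicates in document order. Python's standard library has no XPath 2.0 processor, so the following is only an emulation of that behaviour, not an XPath implementation: the helper names union, intersect and except_ are hypothetical, and the three bar elements stand in for the nodes A, B and C.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: three bar elements play the roles of nodes A, B and C.
doc = ET.fromstring("<foo><bar id='A'/><bar id='B'/><bar id='C'/></foo>")

# Position of every node in document order, keyed by object identity.
DOCUMENT_ORDER = {id(node): pos for pos, node in enumerate(doc.iter())}


def _normalize(nodes):
    """Drop duplicate nodes (by identity) and sort the rest into document order."""
    unique = {id(node): node for node in nodes}
    return sorted(unique.values(), key=lambda node: DOCUMENT_ORDER[id(node)])


def union(left, right):
    return _normalize(list(left) + list(right))


def intersect(left, right):
    right_ids = {id(node) for node in right}
    return _normalize(node for node in left if id(node) in right_ids)


def except_(left, right):
    right_ids = {id(node) for node in right}
    return _normalize(node for node in left if id(node) not in right_ids)


def ids(nodes):
    return [node.get("id") for node in nodes]


a, b, c = doc.findall("bar")
bars = doc.findall("bar")

# "Is node $x contained in /foo/bar?" in both styles from the slide:
xpath1_style = len(bars) == len(union(bars, [b]))   # count(/foo/bar) = count(/foo/bar | $x)
xpath2_style = bool(intersect([b], bars))           # $x intersect /foo/bar
```

With these helpers, ids(union([a, b], [b, c])) yields the three identifiers in document order, mirroring (A, B) union (B, C) → (A, B, C).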
XPath 2.0: Sequence Processing 2/2

List functions
- insert-before((1, 3, 4), 2, 2) → (1, 2, 3, 4)
- remove((1, 2, 3), 2) → (1, 3)
- index-of((10, 20, 30), 20) → 2
- empty(()) → true
- exists((1, 2, 3)) → true

Aggregation functions (the argument is a single sequence)
- sum((1, 2, 3)) → 6     (sum already supported in XPath 1.0)
- count((1, 2, 3)) → 3   (count already supported in XPath 1.0)
- avg((1, 2, 3)) → 2
- min((1, 2, 3)) → 1
- max((1, 2, 3)) → 3

XPath 2.0: Type System

XPath 1.0 supports
- Node-sets
- Booleans
- Strings
- A single numeric data type (double precision floating point)
→ Weakly typed language

XPath 2.0 supports
- Sequences as a data type
- All 19 primitive simple types built into XML Schema like integers, decimals, single precision, dates, times, durations, …
- User-defined data types
- Strong type checking as well as weak type checking → hybrid language → satisfies data-oriented and document-oriented world

XPath 2.0: Type System – Changes to XPath 1.0 Data Model

[Figure: UML class diagram of the XPath 2.0 data model. It is the XPath 1.0 model with the root node renamed to Document and a new class TypedNode (Name: QName?, TypedValue: AtomicValue*, TypeAnnotation: QName?). Type annotations of nodes and of atomic values refer to XML Schema types (complex types and simple types).]
XPath 2.0: Path Expressions – Node Test by Schema Type

Node tests in XPath 1.0
- On basis of the node's name and its predefined 7 types

Node tests in XPath 2.0
- Also on basis of the node's type defined by XML Schema
- For example, select all elements of type Person, regardless of the name
- Useful especially when using a schema with a rich type hierarchy in which many elements can be derived from the same type definition

XPath 2.0: Path Expressions – Function as Location Step
- Now, a function call can be used as a location step
- Allows following logical relationships in the document's structure, not just physical relationships given by the hierarchy
- Example: customer[@id="123"]/find-orders(.)/order-value
- The person writing a path expression doesn't necessarily need to know how the orders for a customer are found
  - supports some kind of information hiding → encapsulation
  - the way that they are found can change without invalidating the expression → locality of change
- XPath itself does not allow writing the find-orders() function
  - you can do this on basis of XQuery or XSLT

XPath 2.0: «for» Expression

Example document tree: PurchaseOrder → Seller, OrderLines → Line → Price, Quantity, Code

Enables iteration over sequences, returning a new value for
  each member in the argument sequence

    for $line in /po:PurchaseOrder/po:OrderLines/po:Line
    return $line/po:Price * $line/po:Quantity

Similar to xsl:for-each, but it is different in that it is an actual expression that returns a sequence which can, in turn, be processed as such

    fn:sum(
      for $line in /po:PurchaseOrder/po:OrderLines/po:Line
      return $line/po:Price * $line/po:Quantity
    )

XPath 2.0: «if» Expression

Depending on whether the expression in parentheses evaluates to true or false, the expression returns the then or else section

    if (/po:PurchaseOrder/po:Seller = 'Bookstore') then 'ok' else 'ko'

Power of XPath 2.0 comes from the ability to combine expressions to create sophisticated requests

    fn:sum(
      for $line in /po:PurchaseOrder/po:OrderLines/po:Line
      return if ($line/po:Code)
             then $line/po:Price * $line/po:Quantity
             else ()
    )

XPath 2.0: Existential «some» and Universal «every» Quantifiers

XPath 1.0 equals operator (=) could compare node-sets
- /students/student/name = "Fred" → returns true if any student name is equal to "Fred" → existential quantification
- The same applies to !=, <, >, …; e.g. /students/student/name != "Fred" → returns true if any student name is not equal to "Fred"

XPath 2.0 makes it possible to write explicit quantified expressions – existentially and universally quantified
- some $x in /students/student/name satisfies $x = "Fred"
- every $x in /students/student/name satisfies $x = "Fred"

This formulation is more powerful, because the constraining condition can be anything (not just =, !=, < and so on)
- some $item in //LineItem satisfies (($item/Price * $item/Quantity) > 100)
- some $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 4

XPath 2.0: String Support Improved
- Case conversion
  - upper-case('Michael') → 'MICHAEL'
- String concatenation
  - concat('Jane', ' ', 'Brown') → 'Jane Brown'
- Complementing the starts-with() function of XPath 1.0
  - ends-with() function
- Regular expressions supported by 3 functions
  - matches(), replace(), and tokenize()
  - Example: matches(SSNumber, '\d{3}-\d{2}-\d{4}')
- All functions that perform comparison of strings can now use a user-specified collation to do the string comparison
  - This allows more intelligent localization of string matching according to the conventions of different languages

XPath 2.0: XPath Functions by Category 1/2
- Boolean Functions: boolean(), false(), true()
- Numeric Functions: abs(), avg(), max(), min()
- String Functions: compare(), concat(), contains()
- Date and Time Functions: current-date(), current-time()
- Duration Functions: days-from-duration(), hours-from-duration()
- Aggregation Functions: avg(), count(), max(), min(), sum()
- Functions on URIs: base-uri(), collection(), doc()
- Functions on QNames: QName(), local-name-from-QName()
XPath Functions by Category 2/2
- Functions on Sequences: empty(), exists()
- Functions that Return Properties of Nodes: base-uri(), data(), document-uri()
- Functions that Find Nodes: collection(), doc(), id(), root()
- Functions that Return Context Information: base-uri(), collection(), current-date()
- Diagnostic Functions: error(), trace()
- Functions that Assert a Static Type: exactly-one(), one-or-more(), zero-or-one()

Outline
- Introduction
- XPath
- XQuery
  - Introduction
  - For and let clauses
  - Adding Elements/Attributes to Results
  - Conditional Expressions
  - Joins
  - Quantifiers
  - Distinctness & Grouping
  - Sorting & Aggregating
  - Structure of an XQuery Program
  - Appendix
- XML & DB

XQuery Introduction: Why XQuery?

Why a "query" language for XML?
- Preserve logical/physical data independence
  - Based on an abstract data model, independent of physical data storage
- Declarative programming
  - Describe the "what", not the "how"
- Commonalities with functional, imperative and query languages

[Figure: SQL and XQuery side by side, each characterized by persistent data, transacted data and declarative processing]

Why a native query language? Why not SQL?
- We need to deal with the peculiarities of XML
  - Hierarchical, ordered, textual, potentially schema-less structure

Why another XML processing language? Why not XSLT?
- The template nature of XSLT was not appealing to DB people
- Not declarative enough

XQuery Introduction: XPath – XSLT – XQuery

[Figure: relationship of the standards. 1999: XSLT 1.0 (XML-based syntax) uses XPath 1.0 (non-XML-based syntax). 2007: XSLT 2.0 (XML-based syntax) and XQuery 1.0 (non-XML-based syntax) build on XPath 2.0, with XQuery 1.0 extending XPath 2.0; labels shown around XPath 2.0: common data model, XML Schema, library of functions & operators.]

XPath 2.0
- Common language for navigation, selection, extraction
- Used in XSLT, XQuery, XPointer, XML Schema, XForms, etc.

XSLT 2.0: XML ⇒ XML, HTML, Text
- Loosely-typed scripting language
- Format XML in HTML for display in browser
- Must be highly tolerant of variability/errors in data

XQuery 1.0: XML ⇒ XML
- Strongly-typed query language – enforces input and output types
- Must guarantee safety/correctness of operations on data – side-effect free
- Large-scale database access

XQuery Introduction: History

Main basis for XQuery was "Quilt", an XML query language from IBM, INRIA and Software AG

[Figure: ancestry of XQuery. Languages shown: XQL, XPointer, XSL, XPath, XQL-99, SQL, OQL, XML-QL, leading via Quilt to XQuery. Contributions labelled in the figure: navigation and path expressions; expressions; variable bindings and flexible structuring of the result.]

XQuery Introduction: XQuery Family of Standards

W3C Recommendations, Jan.
  2007
- XQuery 1.0 and XPath 2.0 Functions and Operators
  - the functions you can call in XPath expressions and the operations you can perform on XPath 2.0 data types
- XQuery 1.0 and XPath 2.0 Data Model (XDM)
  - representation and access for both XML and non-XML sources
- XSLT 2.0 and XQuery 1.0 Serialization
  - how to output the results of XSLT 2.0 and XML Query evaluation in XML, HTML or as text
- XML Syntax for XQuery 1.0 (XQueryX)
  - an XML-aware syntax for querying collections of structured and semi-structured data both locally and over the Web
- XQuery 1.0 and XPath 2.0 Formal Semantics
  - the type system used in XQuery and XSLT 2.0 via XPath, defined precisely for implementers

W3C Working Drafts / Java Community Process
- XQuery Update – Candidate Recommendation since August 2008!
- XQuery and XPath Full Text Search
- XQJ – XQuery API for Java (~ JDBC)

http://www.w3.org/TR/xquery/

XQuery Introduction: XQuery = 80% XPath 2.0 + 20% …
- FLWOR (for-let-where-order-return) expressions
  - ~ SQL's SELECT-FROM-WHERE
- XML construction
  - Adding new elements and attributes as well as transformations
- Sorting of the result
- Operators on types
  - Compile & run-time type tests
- User-defined functions
  - Modularize large queries
  - Process recursive data
- Strong typing
  - Guarantees result value conforms to output type
  - Enforced statically or dynamically

XQuery Introduction: FLWOR ['floωer] Expression 1/2

Processing pipeline:
  XML document → FOR/LET → ordered list of tuples of bound variables → WHERE → filtered list of tuples of bound variables → ORDER → ordered list of tuples → RETURN → result

Clauses:
- FOR/LET: Iteration (cf. FROM in SQL) and variable binding. Variables are bound to values of expressions (using XPath)
- WHERE: Selection (cf.
  WHERE in SQL). Filtering of tuples on the basis of predicates (optional)
- ORDER: Ordering (cf. ORDER BY in SQL). Ordering of tuples on the basis of sort expressions (optional)
- RETURN: Construction (cf. SELECT in SQL). Composition of the result (single nodes, ordered forest of nodes or atomic value)

Result = instance of the XPath/XQuery Data Model

XQuery Introduction: FLWOR ['floωer] Expression 2/2

[Figure: FLWOR syntax diagram. One or more clauses "FOR $var IN expr" (variable binding) or "LET $var := expr", separated by ","; then optionally "WHERE expr"; then optionally "ORDER expr"; finally "RETURN expr". An expr can be an XPath expression, a function call or a variable reference.]

FLWOR Expressions
- Allow sorting
- Allow joining
- Allow adding elements/attributes to results
- Verbose, but can be clearer

Path Expressions
- Great if just copying certain elements and attributes as is

XQuery Introduction: XQuery Syntax – Some Important Issues
- Nested expressions
- Compact, non-XML syntax
- No reserved words
- Case-sensitive
  - keywords are written in lowercase
- No special end-of-line character
- XQuery comments are delimited by (: and :)
- BUT all names must be valid XML names
  - variables, functions, elements, etc.
can be associated with a NS anywhere (insignificant) whitespace is allowed do not appear in the result expansion over multiple lines allowed Whitespaces z allowed almost anywhere – have no significance © 2010 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS) M3-46 Introduction XPath XQuery XML & DB XML Processing Introduction The XQuery Processing Model XQuery Query Source Document (XML) Result Tree XML Processor Source Tree Serialize or pass on Result Document (XML) Analysis and Evaluation (XQuery Processor) © 2010 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS) M3-47 Introduction XPath XQuery XML & DB XML Processing Prices Running Example Order 1 num date cust Catalog PriceList effDate * 1 Number Name 1 Text language 1 Text Prod Item dept 1 1..* * Product 0..1 Color Choices 1 Text 0..1 Desc dept num quantity color num 1 Price currency 1 1 Text Text 0..1 Discount type 1 Text Order.xml © 2010 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS) Catalog.xml Prices.xml M3-48 Catalog Introduction XPath XQuery XML & DB XML* Processing Product for/let and Enclosed Expressions dept 1 1 Number Name 1 Using a let clause with a range expression Using a range expression in a for clause Multiple for clauses Multiple variable bindings in one for clause Text Adding Elements/Attributes to Results 1 1 Text © 2010 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS) Introduction XPath XQuery XML & DB 0..1 Color Choices language Text z z z z z ¾ Simple elements Complex elements – along with their attributes and children if any (not just their atomic values!) No opportunity to change attributes, children, etc. element/attribute constructors – a mixture of ... Literal content („hard-coded“) – appears as is in the output document Expressions within „{}“ evaluating to any kind of node (elements, attributes, etc.) 
and to atomic values Using XML syntax (proper nesting, case sensitivity, etc.) (3) Computed z z z constructors Allows for dynamic names of nodes and dynamic values Copying tags from the input document but making minor changes (e.g., add an attribute) Turning content from the input document into markup © 2010 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS) 1 Text XML Processing copying of elements/attributes from the input document (2) Direct Desc M3-49 Three Use Cases (1) 1:1 0..1 M3-50 Catalog Introduction XPath XQuery XML & DB XML* Processing Adding Elements/Attributes to Results (1) 1:1 Copying from the Input Document Product dept 1 1 Number Name 1 Copy simple elements – name Copy complex elements – product Text © 2010 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS) 1 Text Catalog Product dept 1 1 Number Name 1 Text language 1 Text 0..1 Color Choices 1 Text 0..1 Desc 1 Text Wrap whole result (name elements) in new ul elements Literal content Desc XML* Processing Adding Elements/Attributes to Results Text 0..1 M3-51 Introduction XPath XQuery XML & DB (2) Direct Constructors 1/3 1 1 Text 0..1 Color Choices language In addition, wrap each resulting name element in an li element Literal content © 2010 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS) M3-52 Catalog Introduction XPath XQuery XML & DB XML* Processing Adding Elements/Attributes to Results (2) Direct Constructors 2/3 Product dept 1 1 Number Name 1 1 Text 0..1 Color Choices language 0..1 Desc 1 Text Text 1 Text Add new attributes, copy attribute values / element New attribute name & content new value New attribute name & copy existing value Copy element content (or attribute content) (its typed value) via data()-function Copy element content and use as attribute values with prefix „P“ data-()function not necessary – automatic „atomization“ © 2010 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme 
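The direct-constructor use case (literal wrapper elements, enclosed expressions, a copied attribute value, and an attribute value with prefix "P") can be illustrated with a small runnable analogue. The sketch below uses Python's standard `xml.etree.ElementTree` instead of an XQuery processor; the sample catalog, the lowercase element names, and the function name `names_as_html_list` are assumptions made for illustration and are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Assumed sample data, shaped like the running example's Catalog.xml.
CATALOG_XML = """
<catalog>
  <product dept="WMN">
    <number>557</number>
    <name language="en">Fleece Pullover</name>
  </product>
  <product dept="ACC">
    <number>563</number>
    <name language="en">Floppy Sun Hat</name>
  </product>
</catalog>
"""


def names_as_html_list(catalog_xml: str) -> str:
    """Wrap every product name in <li>, and all of them in one <ul>.

    Roughly corresponds to an XQuery direct constructor such as
      <ul>{ for $p in doc("catalog.xml")/catalog/product
            return <li class="{$p/@dept}" id="P{$p/number}">{data($p/name)}</li> }</ul>
    """
    catalog = ET.fromstring(catalog_xml)
    ul = ET.Element("ul")  # literal ("hard-coded") wrapper element
    for product in catalog.findall("product"):
        li = ET.SubElement(ul, "li")  # literal element per result item
        li.set("class", product.get("dept"))  # new attribute, copied value
        li.set("id", "P" + product.findtext("number"))  # element content with prefix "P"
        li.text = product.findtext("name")  # content only, not the name element itself
    return ET.tostring(ul, encoding="unicode")
```

Calling `names_as_html_list(CATALOG_XML)` returns the serialized `ul` element; the copied values are plain strings, which mirrors the atomization that XQuery applies inside attribute constructors.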
Adding Elements/Attributes to Results – (2) Direct Constructors 3/3
Copy attributes/elements & eliminate certain elements
• Eliminate the number subelements of product
• Copy the dept attributes to the new element new_product
• Copy the product elements and add them as subelements to new_product

Adding Elements/Attributes to Results – (3) Computed Constructors
Turning content into markup
• Attribute values → elements
• Explicit element constructor

Conditional Expressions

Joins 1/2
• Two-way join in a predicate
• Two-way join in a where clause

Joins 2/2
• Three-way join in a where clause
• Outer join
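The two-way join and the outer join named above can be sketched as follows. This is a Python analogue rather than XQuery, and it joins order items with catalog products on the product number; the sample documents, the lowercase element names, and the function `join_items_with_names` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumed sample data, shaped like the running example (Order.xml, Catalog.xml).
ORDER_XML = """
<order num="00299432" date="2006-09-15" cust="0221A">
  <item dept="WMN" num="557" quantity="1" color="navy"/>
  <item dept="ACC" num="563" quantity="1"/>
  <item dept="MEN" num="999" quantity="2"/>
</order>
"""

CATALOG_XML = """
<catalog>
  <product dept="WMN"><number>557</number><name language="en">Fleece Pullover</name></product>
  <product dept="ACC"><number>563</number><name language="en">Floppy Sun Hat</name></product>
</catalog>
"""


def join_items_with_names(order_xml: str, catalog_xml: str, outer: bool = False):
    """Join item/@num with product/number.

    outer=False: only items with a matching product (two-way join).
    outer=True: every item is kept; unmatched items get None as name (outer join).
    """
    order = ET.fromstring(order_xml)
    catalog = ET.fromstring(catalog_xml)
    # Build a lookup table on the join key (product number -> product name).
    name_by_number = {
        product.findtext("number"): product.findtext("name")
        for product in catalog.findall("product")
    }
    result = []
    for item in order.findall("item"):
        name = name_by_number.get(item.get("num"))
        if name is None and not outer:
            continue  # inner join drops items without a catalog entry
        result.append((item.get("num"), name, int(item.get("quantity"))))
    return result
```

In XQuery the same condition would typically sit in a predicate or in a where clause; the dictionary lookup here is just one way to evaluate it.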
Quantifiers
• Quantified expression using the some keyword
• Quantified expression using the every keyword
• Combining the not function with the some keyword
• Binding multiple variables in a quantified expression

Distinctness & Grouping
• ... by department

Sorting & Aggregating

Structure of an XQuery Program – Prolog, Body, Modules 1/3
Prolog
• Role
    ◦ Is the link between the XQuery expression and the environment in which the expression is embedded
• Parts
    ◦ Namespace declarations
    ◦ Schema imports
    ◦ Default element and function namespace
    ◦ Function declarations
    ◦ Function library imports
    ◦ Global and external variable definitions, etc.
    ➢ Each declaration ends with a semicolon
Body
• Contains the XQuery expression within { }
Note!
• A function does not inherit the context from the main body of the query; rather, the context has to be passed as a parameter

Structure of an XQuery Program – Prolog, Body, Modules 2/3

[Figure: two example queries (Example 1, Example 2), each annotated with its Prolog and Body.]

Structure of an XQuery Program – Prolog, Body, Modules 3/3

[Figure: example of a Module.]
• Useful functions available at: http://www.xqueryfunctions.com
• XQuery style conventions: http://www.xqdoc.org/xquery-style.html

Appendix – for and let Clauses
• Simple for and let clause
• Intermingled for and let clauses

Appendix – Direct Constructors 1/3
• Wrap the content of each number and name element
• Get the content of each name element / order by

Appendix – Direct Constructors 2/3
• Aggregation function – no tags from the input document included
• Add attributes class & dep

Appendix – Direct Constructors 3/3
• Enclosed expressions that evaluate to elements
• Enclosed expressions that evaluate to attributes
• Enclosed expressions with multiple subexpressions

Appendix – Conditional Expressions
• Simple conditional expression
• Conditional expression returning multiple expressions

Appendix – Quantifiers
• A where clause with multiple expressions and an exists quantifier

Appendix – Ordering
• The order by clause
• Using multiple ordering specifications

Appendix – Distinctness & Aggregation 1/3
• Distinctness on a combination of values
• Aggregation – sum
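Distinctness on a combination of values and aggregation with sum can be illustrated on the order document of the running example. The following is a minimal Python sketch, not XQuery; the sample data and the function names `distinct_dept_num` and `quantity_per_dept` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample data, shaped like the running example's Order.xml.
ORDER_XML = """
<order num="00299432" date="2006-09-15" cust="0221A">
  <item dept="WMN" num="557" quantity="1" color="navy"/>
  <item dept="ACC" num="563" quantity="1"/>
  <item dept="ACC" num="443" quantity="2"/>
  <item dept="MEN" num="784" quantity="1" color="white"/>
  <item dept="MEN" num="784" quantity="1" color="gray"/>
  <item dept="WMN" num="557" quantity="1" color="black"/>
</order>
"""


def distinct_dept_num(order_xml: str):
    """Distinctness on a combination of values: unique (dept, num) pairs."""
    order = ET.fromstring(order_xml)
    pairs = {(item.get("dept"), item.get("num")) for item in order.findall("item")}
    return sorted(pairs)


def quantity_per_dept(order_xml: str):
    """Grouping by department with a sum aggregate over quantity."""
    order = ET.fromstring(order_xml)
    totals = {}
    for item in order.findall("item"):
        dept = item.get("dept")
        totals[dept] = totals.get(dept, 0) + int(item.get("quantity"))
    return sorted(totals.items())  # sorted by department, cf. order by
```

In XQuery 1.0 the grouping step is usually written with distinct-values() over the department values and a sum() per value; the dictionary above plays the role of that grouping.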
Appendix – Distinctness & Aggregation 2/3
• Aggregation – count, sum

Appendix – Distinctness & Aggregation 3/3
• Aggregation on multiple values

Outline
Introduction
XPath
XQuery
XML & DB
• Motivation
• Storage Alternatives
• Access Alternatives
• SQL/XML – SQL:2003 Standard

The following slides are based (among others) on:
Kay, Michael: XPath 2.0 Programmer's Reference (3rd ed.), Wiley, Aug. 2004.
Walmsley, Priscilla: XQuery, O'Reilly, March 2007.
Klettke, Meike; Meyer, Holger: XML & Datenbanken, dpunkt.verlag, Jan. 2003.

Motivation – XML and DB – Why?

[Figure: XML documents of the form <a> <b>...</b> <c d=.../> </a> are published from a DB and stored into a DB.]
• Existing DBs store large amounts of data → publish data as XML documents
• Existing DBs should store existing XML documents → storage in the DB along with additional "meta" information

Well-known benefits of DBs
• Efficient storage of large amounts of well-structured data
• Structured query language (SQL)
• Optimization
• Views and security mechanisms
• Concurrency control / transactions, more fine-grained than just on a document level
• Recovery techniques

DBs are essential cornerstones of today's IT infrastructures; the importance of DBs for Web applications is steadily increasing.
"... The Web is one huge database..." [The Asilomar Report on Database Research, SIGMOD Record 27(4), Dec. 1998]

Motivation – The Challenge: Different Categories of XML Documents
Data-oriented
• Well-known, fine-grained, typed structure
• Ordering of subelements doesn't matter
• Schema available, defining the structure
• Examples: order, invoice

    <Order orderNr="1012">
      <CustomerNr>8596</CustomerNr>
      <Position posNr="1">
        <ProductNr>14896612</ProductNr>
        <Amount>2</Amount>...
      </Position>...
    </Order>

Document-oriented
• Semi-structured, coarse-grained, untyped
• Ordering of subelements significant
• Mixed content common
• Schema often non-existent or very generic
• Example: Claim

    <Claim>A severe <Reason>fire</Reason> damaged the building and claimed
    <DeathToll>12</DeathToll> lives. First investigations done by police indicate
    fire raising with <Motive>criminal intent</Motive>. </Claim>

Mixture
• Example: Email

    <Email>
      <Sender>[email protected]</Sender>...
      <Recipient>[email protected]</Recipient>
      <Content>All the best to your 110th birthday!</Content>
    </Email>

Storage Alternatives – Overview
File system
• XML documents stored as files at operating system level
• Additional descriptive attributes and file references may be stored within a DBS
DBS
• XML document stored in the DBS as a whole or shredded, possibly together with descriptive attributes
Hybrid
• XML document or parts thereof stored across DBS and file system
• Redundant or non-redundant storage possible

[Figure: classification of storage alternatives into File system, DBS (Conventional DBS, Native DBS), and Hybrid, along the dimensions Datamodel (XML, OO, OR, RM), non-shredded vs. shredded, and Schema Language (no schema, DTD, XML Schema).]

Storage Alternatives – Native Storage
Conceptual XML mapping to a fine-grained storage structure
• Transformation into an internal XML tree
• The internal tree often resembles a DOM tree
• Element names are replaced by means of a dictionary
http://www.idealliance.org/proceedings/xml05/ship/58/Native_XML_Databases.HTML

Storage Alternatives – Relational Storage – Heterogeneity 1/2
Datamodel Level (M2)
    Relational concepts: Relation, Attribute
    XML concepts: Element Type, Attribute
Schema Level (M1)
    Relational schema: Relation A, Relation B, ...; Attribute X, Attribute Y, ...
    DTD / XML Schema (optional): Element Type a, Element Type b, ...; Attribute x, Attribute y, ...
Instance Level (M0)
    Relational DB: Tuple, Value
    XML document: Element, Element Value, Attribute, Attribute Value
Legend: consistsOf, mayConsistOf
Storage Alternatives – Relational Storage – Heterogeneity 2/2
• Structure: RDBS flat; XML (DTD) nested
• Datatypes: RDBS numerous; XML (DTD) basically "STRING" only
• Values: RDBS stored within attributes; XML (DTD) stored within attributes and element types (ETs)
• Order: RDBS tuples are not ordered; XML (DTD) elements are ordered
• Identification: RDBS composite key possible; XML (DTD) just a single attribute of type ID
• Relationships: RDBS foreign key (typed); XML (DTD) IDREFs (untyped) and nested ETs (typed)
• Schema: RDBS necessary, created prior to instances, not part of the instances; XML (DTD) optional, may also be created after the instances, and the schema in the form of tags is part of the instance data ("self-describing")

Storage Alternatives – Relational Storage – Example
DTD:

    <!ELEMENT hotelChain (hotel*)>
    <!ELEMENT hotel (name, category, location, telephone*, room*)>
    <!ATTLIST hotel hotelID CDATA #REQUIRED>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT category (#PCDATA)>
    <!ELEMENT location (#PCDATA)>
    <!ELEMENT telephone (#PCDATA)>
    <!ELEMENT room (roomCat, price)>
    <!ELEMENT roomCat (#PCDATA)>
    <!ELEMENT price (#PCDATA)>

UML diagram: hotelChain contains * hotel; hotel has the «attribute» hotelID and the subelements 1 name, 1 category, 1 location, * telephone, * room; room has 1 roomCat and 1 price.

Storage Alternatives – Relational Storage – Mapping Onto a Schema
Fixed Schema
• Schema is domain-independent, fixed, and independent of the target schema
    ◦ No decomposition: XML document is stored as a whole
    ◦ Decomposition: XML document is "shredded"
➢ Similarities with the generic XML API approach
Derived Schema
• Schema is derived from the other one
➢ Similarities with the XML Data Binding approach
User-Defined Schema
• Schema is domain-dependent, but has been designed independently of the target schema

[Figure: schema at the DB side vs. schema at the XML side, each either fixed, derived, or user-defined (e.g., Handy-Catalog).]

Storage Alternatives – Mapping Onto a Fixed RDB-Schema
Example: Decomposition of the document (content and schema) into a single table
• Element name → DB value
• Attribute name → DB value
• XML value → DB value

[Figure: document tree :hotelChain (node 1), :hotel (node 2) with «attribute» :hotelID, :name, :category, :location, :telephone, and :room (node 3) with :roomCat and :price; children are numbered by their ordinal position.]

FixedMappingTable
    Source   Ordinal   Name        Target/Value
    ...      ...       ...         ...
    2        4         location    Vienna
    2        5         telephone   0043/732/2468
    2        6         room        3
    3        1         roomCat     Suite
[cf. Florescu et al., IEEE Data Engineering, 1999]

Storage Alternatives – Mapping Onto a Derived RDB-Schema
Example: Decomposition of the XML Schema into tables ("Basic Inlining")
• Element type → DB relation
• Attribute → DB attribute
• Foreign keys connect elements

Resulting relations:
    hotelChain (hcID)
    hotel (hID, hcID, hotelID)
    name (nID, hID, value)
    category (cID, hID, value)
    location (lID, hID, value)
    telephone (tID, hID, value)
    room (rID, hID)
    roomCat (rcID, rID, value)
    price (pID, rID, value)
Problem: Fragmentation
[cf. Shanmugasundaram et al., VLDB, 1999]

Storage Alternatives – Mapping onto a User-defined RDB-Schema
Example: Mapping of the XML Schema into existing tables and attributes

Existing relations:
    Accommodation (ID, Name, Category, TownID)
    Phone (ID, Phone#, Desc)
    Town (TownID, TownName, Country)
    RoomRates (ID, RoomCat, Rate)

Storage Alternatives – Mapping Options – Advantages/Disadvantages
Possible combinations of the schema kind at the XML side and at the DB side:
                    fixed            derived            user-defined
    fixed           n.a.             n.a.               fixed mapping
    derived         n.a.             n.a.               derived mapping
    user-defined    fixed mapping    derived mapping    user-defined mapping

Fixed
• − Domain not represented in schema
• − Queries/optimization hard to realize
• + Fixed at DB side:
    ◦ No schema at XML side necessary
    ◦ Best suited for document-oriented XML
Derived
• − The schema on the other side must exist
User-Defined
• + Schema can be designed independently of the target schema
• + Data of existing DBs can be used!
• − Heterogeneity problem!
Derived / User-Defined
• + Domain is represented in schema
• + Optimization mechanisms usable
• + Suited especially for data-oriented XML

Storage Alternatives – Representation of Mapping Knowledge
"Template-Driven"
• Mapping knowledge is hard-coded in
    ◦ queries
    ◦ transformation programs

    <?xml version="1.0" ?>
    <Accommodation xmlns:sql="urn:schemas-microsoft-com:xml-sql">
      <sql:query>
        SELECT * FROM Accommodation FOR XML AUTO,ELEMENTS
      </sql:query>
    </Accommodation>

"Model-Driven"
• Mapping knowledge is reified (i.e., stored as meta data)
    ◦ as a file, e.g., as an XML document
    ◦ in the DB, using DB functionality

    <?xml version="1.0" ?>
    <Schema xmlns="urn:schemas-microsoft-com:xml-data"
            xmlns:dt="urn:schemas-microsoft-com:datatypes"
            xmlns:sql="urn:schemas-microsoft-com:xml-sql">
      <ElementType name="Phone" content="textOnly" />
      <ElementType name="Accommodation" sql:relation="Accommodation">
        <element type="Phone" sql:relation="Phone" sql:field="Number">
          <sql:relationship key-relation="Accommodation" key="AcID"
                            foreign-key="AccID" foreign-relation="Phone" />
        </element>
      </ElementType>
    </Schema>

Access Alternatives – Read-only Query vs. Data Manipulation
Read-only Query
• XML-centered
    ◦ Access via XML-based language
    ◦ W3C XQuery Standard
• DB-centered
    ◦ Access via SQL-based language
    ◦ SQL/XML – part of the current SQL:2003 Standard
• Proprietary mechanism
    ◦ Neither DB- nor XML-centered
Data Manipulation
• Current research area
• XQuery Update Facility, W3C Candidate Rec. Aug. 2008, http://www.w3.org/TR/xqupdate/

SQL/XML – First Edition: Part of SQL:2003-Standard
Storage of XML documents
• Introduction of the new datatype XMLType
• Automatic shredding, which can be customized
Publishing stored data by extending SQL with XML functions
• Functions for retrieving relational data and transforming it into XML (e.g., XMLGen, XMLElement, XMLAgg)
Unfortunately, SQL:2003 pre-dated the XQuery standard
• Therefore no full XQuery functionality available
• cf. SQL:2007 ...

[Figure: XML documents are stored into and published from an RDBS with XMLType via SQL/XML and SQL functions.]

SQL/XML – Second Edition: Part of Forthcoming SQL:2007-Standard
More complete integration of the XQuery Data Model
• XML datatype will support the XQuery data model
    ◦ Heterogeneous sequences
    ◦ Non-well-formed XML data
    ◦ Full XML Schema support and validation
Advanced query capabilities
• XMLQuery() function
    ◦ Create XML content using XQuery
• XMLTable() function
    ◦ Shred XML to relational data using XQuery
Mapping between SQL & XQuery data model
• XMLCAST between XML and SQL types

[Figure "IBM DB2" from an article by Holger Seubert.]

SQL/XML – What is it Good For …
Benefits
• Takes advantage of the entire SQL infrastructure (e.g., triggers, PL/SQL)
• Transactional support
• Scalability, clustering, reliability
• Global optimization (XML and relational)
• Standard implemented and supported by Microsoft, Oracle, IBM, etc.
Drawbacks
• Requires data to be loaded into the DB
    ◦ Not good for temporary XML data
    ◦ Not worth the effort for small volumes of data
• Blending of the two languages (SQL, XQuery) isn't natural
• XQuery not supported entirely by DB engines
    ◦ No XML updates à la XQuery yet

Literature
Standard specifications
• http://www.w3.org/TR/xpath20/
• http://www.w3.org/TR/xquery/
• http://www.sqlx.org (SQL/XML Standard)
Best source (!) on XML & DBS incl. an extensive overview of available systems:
• http://www.rpbourret.com/xml/XMLDatabaseProds.htm
• http://www.rpbourret.com/xml/XMLAndDatabases.htm
Interesting collection of papers:
• http://www.cs.cornell.edu/People/jai/pubs.html#PaperCategory:PublishingRelationalDataAsXML
GI-Working Group "Web und Datenbanken":
• http://dbs.uni-leipzig.de/webdb/
M. Koran, Evaluierung von XML Datenbanken, Master Thesis, Universität Zürich, October 2006 [http://www.ifi.uzh.ch/index.php?id=490&print=1&no_cache=1]
Books
• H. Katz et al., XQuery from the Experts, Addison Wesley, 2004.
• J. Melton et al., Querying XML: XQuery, XPath, and SQL/XML in Context, Morgan Kaufmann/Elsevier, 2006.
• M. Klettke, H. Meyer, XML & Datenbanken: Konzepte, Sprachen und Systeme, dpunkt, 2003, http://www.xml-und-datenbanken.de/
• E. Rahm, G. Vossen (eds.), Web & Datenbanken: Konzepte, Architekturen, Anwendungen, dpunkt, 2003.
• B. Gorke, XML-Datenbanken in der Praxis, bomots Verlag, 2006.
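As a closing illustration of the fixed-schema mapping from the storage slides (decomposition of a document into one generic table with the columns Source, Ordinal, Name, Target/Value), here is a minimal Python sketch using `sqlite3` and `xml.etree.ElementTree`. It is an assumption-laden illustration, not the method from the cited paper: the table name `fixed_mapping`, the split of Target/Value into two columns, the "@" prefix for attribute names, the function names, and most sample values are invented for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

# Assumed sample document following the hotelChain DTD; hotel name, hotelID
# and price are made-up values.
HOTEL_XML = """
<hotelChain>
  <hotel hotelID="H1">
    <name>Hotel Example</name>
    <category>4</category>
    <location>Vienna</location>
    <telephone>0043/732/2468</telephone>
    <room>
      <roomCat>Suite</roomCat>
      <price>200</price>
    </room>
  </hotel>
</hotelChain>
"""


def shred(xml_text: str) -> sqlite3.Connection:
    """Store every element and attribute as one row of a generic table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fixed_mapping ("
        "source INTEGER, ordinal INTEGER, name TEXT, target INTEGER, value TEXT)"
    )
    node_ids = count(1)  # ids for elements that have attributes or children

    def insert(source, ordinal, name, target, value):
        conn.execute(
            "INSERT INTO fixed_mapping VALUES (?, ?, ?, ?, ?)",
            (source, ordinal, name, target, value),
        )

    def visit(elem, source, ordinal):
        if len(elem) == 0 and not elem.attrib:
            # Leaf element: store its text directly as the value.
            insert(source, ordinal, elem.tag, None, (elem.text or "").strip())
            return
        # Inner element: store a reference to a new node id.
        # Note: text of mixed-content elements is ignored in this sketch.
        node_id = next(node_ids)
        insert(source, ordinal, elem.tag, node_id, None)
        position = 0
        for attr_name, attr_value in elem.attrib.items():
            position += 1
            insert(node_id, position, "@" + attr_name, None, attr_value)
        for child in elem:
            position += 1
            visit(child, node_id, position)

    visit(ET.fromstring(xml_text), 0, 1)  # source 0 stands for the document
    return conn


def room_categories(conn: sqlite3.Connection):
    """Path query room/roomCat expressed as a self-join on the generic table."""
    rows = conn.execute(
        "SELECT leaf.value FROM fixed_mapping AS room "
        "JOIN fixed_mapping AS leaf ON leaf.source = room.target "
        "WHERE room.name = 'room' AND leaf.name = 'roomCat' "
        "ORDER BY leaf.value"
    )
    return [value for (value,) in rows]
```

The self-join in `room_categories` hints at why queries and optimization are listed as hard to realize for fixed mappings: every additional path step costs another join on the same table.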