XML
Extensible Markup Language
Worzyk FH Anhalt Telemedizin WS 09/10

XML
• Metalanguage
  – A language that describes other languages
  – The described languages define formats for data exchange

Example
  Hans Meyer
  Lohmannstrasse 23
  06366 Köthen

  Dr. Else Müller
  Bernburger Strasse 56
  06366 Köthen

Example
  <Patient>
    <Name>Hans Meyer</Name>
    <Strasse>Lohmannstrasse 23</Strasse>
    <Ort>06366 Köthen</Ort>
  </Patient>
  <Arzt>
    <Name>Dr. Else Müller</Name>
    <Strasse>Bernburger Strasse 56</Strasse>
    <Ort>06366 Köthen</Ort>
  </Arzt>

Structure of XML documents
• Prolog
  – Declaration of the document type
  – DTD (Document Type Definition)
• Elements
http://www.w3schools.com/xml/default.asp
http://de.selfhtml.org/

Document Type Definition (DTD)
• It describes the grammar of an XML document
• It describes the permitted elements and attributes
  – their data type and range of values
  – their nesting
• An XML document that conforms to a DTD is called valid

Example DTD
  <?xml version="1.0" encoding="ISO-8859-1"?>
  <!DOCTYPE Personen [
    <!ELEMENT Personen (Patient)>
    <!ELEMENT Patient (#PCDATA)>
  ]>
  <Personen>
    <Patient>
      Hans Meyer
      Lohmannstrasse 23
      06366 Köthen
    </Patient>
  </Personen>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten1.xml

Structure of XML documents
• The DTD describes the characteristics of the elements
• An element begins with a start tag <Elementname> and ends with a closing tag </Elementname>
• XML tags are case sensitive
• Elements can contain other elements
• #PCDATA (parsed character data): the element contains text that the parser scans for markup; its characters must belong to the declared character set
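The rules for start tags, closing tags and case sensitivity can be tried out with a small script. The following is a minimal sketch in Python using the standard-library module xml.etree.ElementTree; the names GOOD, BAD, is_well_formed and child_names are chosen here for illustration and do not come from the lecture. Note that this parser only checks well-formedness; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# Patient record from the slides, wrapped in a single root element.
GOOD = """<Personen>
  <Patient>
    <Name>Hans Meyer</Name>
    <Strasse>Lohmannstrasse 23</Strasse>
    <Ort>06366 Köthen</Ort>
  </Patient>
</Personen>"""

# The closing tag differs in case from the start tag, so the tags do not match.
BAD = "<Personen><Patient>Hans Meyer</patient></Personen>"


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def child_names(text):
    """List the tag names nested directly inside the first child of the root."""
    root = ET.fromstring(text)
    return [child.tag for child in root[0]]


if __name__ == "__main__":
    print(is_well_formed(GOOD))   # matching tags, proper nesting
    print(is_well_formed(BAD))    # <Patient> closed by </patient>
    print(child_names(GOOD))      # elements contained in <Patient>
```

The second document is rejected only because of the lowercase p in the closing tag, which illustrates that Patient and patient are different element names.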

Names of Elements
• Names can contain letters, digits, and other characters
• Names must not start with a digit or a punctuation character (an underscore is allowed)
• Names must not start with the letters xml (or XML, Xml, ...); such names are reserved
• Names cannot contain spaces

Sequence of Elements
Child elements are listed in the declaration in parentheses, separated by commas. They must appear in the document in this order.
Example:
  <?xml version="1.0" encoding="ISO-8859-1"?>
  <!DOCTYPE Personen [
    <!ELEMENT Personen (Patient,Arzt)>
    <!ELEMENT Patient (Name,Adresse)>
    <!ELEMENT Arzt (Name,Adresse)>
    <!ELEMENT Name (#PCDATA)>
    <!ELEMENT Adresse (#PCDATA)>
  ]>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten2.xml
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten3.xml

Selection list
• Selection of exactly one element: the alternatives are separated by |
• Example:
  <!DOCTYPE Personen [
    <!ELEMENT Personen (Patient|Arzt)>
    <!ELEMENT Patient (Name,Adresse,Diagnose)>
    <!ELEMENT Arzt (Name,Adresse,Fachgebiet)>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten4.xml

Multiple occurrence
* The element can appear zero or more times
+ The element can appear one or more times
?
The element can appear zero times or once

Attributes
  <!ATTLIST element-name attribute-name attribute-type default-value>
Attribute types: CDATA, (en1|en2|..), ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, ENTITIES, NOTATION, xml:
Default value: value, #REQUIRED, #IMPLIED, #FIXED value
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten5.xml
http://www.w3schools.com/xml/xml_attributes.asp

Comments
Comments are enclosed in <!-- and -->
  <!-- This is a comment -->

Well-formed XML File
• The file starts with the XML declaration, which identifies it as XML (optional in XML 1.0, but recommended)
• There is at least one element
• There is exactly one root element, which contains all other elements
• The elements are nested properly
• All required attributes are defined
• All elements have the right content
(Strictly speaking, the last two points can only be checked against a DTD, so they belong to validity rather than to well-formedness.)

Valid XML File
• The file is well-formed
• A DTD is assigned to the file
• The content of the file conforms to the assigned DTD

Parser
A validating parser checks whether an XML document is valid:
  <html>
  <body>
  <script type="text/javascript">
  // Runs only in Internet Explorer: ActiveXObject gives access to the MSXML parser
  var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
  xmlDoc.async = false;            // load synchronously
  xmlDoc.validateOnParse = true;   // validate against the DTD while parsing
  xmlDoc.load("Patienten5.xml");
  // parseError describes the first error found; errorCode 0 means no error
  document.write("<br />Error Code: ");
  document.write(xmlDoc.parseError.errorCode);
  document.write("<br />Error Reason: ");
  document.write(xmlDoc.parseError.reason);
  document.write("<br />Error Line: ");
  document.write(xmlDoc.parseError.line);
  </script>
  </body>
  </html>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Parser.htm

DTD Disadvantages
• Few data types
• The specification is not written in XML syntax
  – The specification itself cannot be validated with an XML parser

XML -
Schema
An XML Schema:
• defines elements that can appear in a document
• defines attributes that can appear in a document
• defines which elements are child elements
• defines the order of child elements
• defines the number of child elements
• defines whether an element is empty or can include text
• defines data types for elements and attributes
• defines default and fixed values for elements and attributes
http://www.w3schools.com/schema/schema_intro.asp

XML Schema Advantages over DTD
• XML Schemas are extensible to future additions
• XML Schemas are richer and more expressive than DTDs
• XML Schemas are written in XML
• XML Schemas support data types
  – xs:date, xs:dateTime, xs:string
• XML Schemas support namespaces
  – xmlns:xs="http://www.w3.org/2001/XMLSchema"

Dublin Core Standard
Dublin Core Metadata Initiative
• A conference in 1995 in Dublin, Ohio, defined a set of descriptive attributes for categorizing documents on the Internet
• 15 core elements are recommended in the "Dublin Core Metadata Element Set, Version 1.1" (ISO 15836)
http://dublincore.org/documents/dces/

How to create an XML structure
• Create a tree structure of the data
• Convert that structure to a DTD
• Add data elements
• Test

Example Quarterly billing
• One file consists of exactly one physician and at least one patient
• A physician is either a general practitioner or a dentist
• A general practitioner has an address and, optionally, a profession
• A dentist has an address
• A patient has an address and zero or more diagnoses
• An address consists of name, city, street
• A name has a salutation, Mr. or Ms.

Example Quarterly billing
[Figure: tree diagram. Billing contains Physician and Patient (+). Physician is General Practitioner | Dentist. General Practitioner has Address and Profession (?). Dentist has Address. Patient has Address and Diagnosis (*). Address has Name, City, Street. Name carries the salutation Mr or Ms.]

Example - DTD
  <?xml version="1.0" encoding="ISO-8859-1"?>
  <!DOCTYPE Billing [
    <!ELEMENT Billing (Physician, Patient+)>
    <!ELEMENT Physician (General_Practitioner | Dentist)>
    <!ELEMENT General_Practitioner (Address, Profession?)>
    <!ELEMENT Dentist (Address)>
    <!ELEMENT Patient (Address, Diagnosis*)>
    <!ELEMENT Address (Name, City, Street)>
    <!ELEMENT Profession (#PCDATA)>
    <!ELEMENT Diagnosis (#PCDATA)>
    <!ELEMENT Name (#PCDATA)>
    <!ELEMENT City (#PCDATA)>
    <!ELEMENT Street (#PCDATA)>
    <!ATTLIST Name Salutation (Mr|Ms) "Ms">
  ]>

Example - Data
  <Billing>
    <Physician>
      <General_Practitioner>
        <Address>
          <Name>Dr. Erpel</Name>
          <City>Entenhausen</City>
          <Street>Am Krankenhaus 1</Street>
        </Address>
        <Profession>Geriatrics</Profession>
      </General_Practitioner>
    </Physician>
    <Patient>
      <Address>
        <Name Salutation="Mr">Daniel</Name>
        <City>Entenhausen</City>
        <Street>Bahnhofstrasse 3a</Street>
      </Address>
      <Diagnosis>Bettflucht</Diagnosis>
    </Patient>
    <Patient>
      <Address>
        <Name>Daisy</Name>
        <City>Entenhausen</City>
        <Street>Am Stadtpark</Street>
      </Address>
      <Diagnosis>Sonnenbrand</Diagnosis>
      <Diagnosis>Migräne</Diagnosis>
    </Patient>
  </Billing>

Queries to XML Files
• XPath
• XQuery

XPath
The XPath language is used to address parts of an XML document. It was designed for use in both XSLT and XPointer. XPath models an XML document as a tree of nodes.
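
The tree model can be made visible by walking a parsed document and printing the path of every element node. The following is a minimal sketch in Python with the standard-library module xml.etree.ElementTree, which supports only a limited subset of XPath. The shortened billing document, the explicit Salutation attribute on Daisy (the parser used here does not apply DTD defaults) and the helper names node_paths and diagnoses are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened version of the quarterly billing data (one physician, one patient).
BILLING = """<Billing>
  <Physician>
    <General_Practitioner>
      <Address>
        <Name>Dr. Erpel</Name>
        <City>Entenhausen</City>
        <Street>Am Krankenhaus 1</Street>
      </Address>
      <Profession>Geriatrics</Profession>
    </General_Practitioner>
  </Physician>
  <Patient>
    <Address>
      <Name Salutation="Ms">Daisy</Name>
      <City>Entenhausen</City>
      <Street>Am Stadtpark</Street>
    </Address>
    <Diagnosis>Sonnenbrand</Diagnosis>
    <Diagnosis>Migräne</Diagnosis>
  </Patient>
</Billing>"""


def node_paths(elem, prefix=""):
    """Yield the path of every element node, depth first (document order)."""
    path = f"{prefix}/{elem.tag}"
    yield path
    for child in elem:
        yield from node_paths(child, path)


def diagnoses(root):
    """Text of all Diagnosis children of Patient, selected by a path expression."""
    return [d.text for d in root.findall("./Patient/Diagnosis")]


if __name__ == "__main__":
    root = ET.fromstring(BILLING)
    for p in node_paths(root):
        print(p)
    print(diagnoses(root))
    # Attribute predicate: names whose Salutation attribute is "Ms"
    print([n.text for n in root.findall(".//Name[@Salutation='Ms']")])
```

Each printed path corresponds to one element node of the tree; a path expression such as ./Patient/Diagnosis simply selects all nodes reachable along that path.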
http://www.informatik.hu-berlin.de/~obecker/obqo/w3c-trans/xpath-de-20010702/

Example
  <?xml version="1.0" encoding="ISO-8859-1"?>
  <bookstore>
    <book category="COOKING">
      <title lang="en">Everyday Italian</title>
      <author>Giada De Laurentiis</author>
      <year>2005</year>
      <price>30.00</price>
    </book>
    <book category="CHILDREN">
      <title lang="en">Harry Potter</title>
      <author>J K. Rowling</author>
      <year>2005</year>
      <price>29.99</price>
    </book>
    <book category="WEB">
      <title lang="en">XQuery Kick Start</title>
      <author>James McGovern</author>
      <author>Per Bothner</author>
      <author>Kurt Cagle</author>
      <author>James Linn</author>
      <author>Vaidyanathan Nagarajan</author>
      <year>2003</year>
      <price>49.99</price>
    </book>
    <book category="WEB">
      <title lang="en">Learning XML</title>
      <author>Erik T. Ray</author>
      <year>2003</year>
      <price>39.95</price>
    </book>
  </bookstore>

Queries with XPath
Select all titles:
  /bookstore/book/title
Select the title of the first book:
  /bookstore/book[1]/title
Select the text of all prices:
  /bookstore/book/price/text()
Select the titles of all books with a price greater than 35:
  /bookstore/book[price>35]/title
http://www.w3schools.com/xpath/xpath_examples.asp

XQuery
• Query language for XML data
• Uses XPath expressions
• Analogous to SQL

XQuery Example
  <?xml version="1.0" encoding="ISO-8859-1"?>
  <bib>
    <book year="1994">
      <title>TCP/IP Illustrated</title>
      <author><last>Stevens</last><first>W.</first></author>
      <publisher>Addison-Wesley</publisher>
      <price>65.95</price>
    </book>
    <book year="1992">
      <title>Advanced Programming in the Unix environment</title>
      <author><last>Stevens</last><first>W.</first></author>
      <publisher>Addison-Wesley</publisher>
      <price>65.95</price>
    </book>
    <book year="2000">
      <title>Data on the Web</title>
      <author><last>Abiteboul</last><first>Serge</first></author>
      <author><last>Buneman</last><first>Peter</first></author>
      <author><last>Suciu</last><first>Dan</first></author>
      <publisher>Morgan Kaufmann Publishers</publisher>
      <price>39.95</price>
    </book>
    <book year="1999">
      <title>The Technology and Content for Digital TV</title>
      <editor>
        <last>Gerbarg</last><first>Darcy</first>
        <affiliation>CITI</affiliation>
      </editor>
      <publisher>Kluwer Academic Publishers</publisher>
      <price>129.95</price>
    </book>
  </bib>

XQuery Example
Query:
  doc("books.xml")/bib/book[price<50]
Results:
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>

FLWOR
For, Let, Where, Order by, Return
  for $x in doc("books.xml")/bib/book
  where $x/price>50
  order by $x/title
  return $x/title
Results:
  <title>Advanced Programming in the Unix environment</title>
  <title>TCP/IP Illustrated</title>
  <title>The Technology and Content for Digital TV</title>

XML Documents in Databases
XML documents can be
• Focused on data
• Focused on text
• Semi-structured

Alternatives to store XML Documents
• Storage as a whole
• Storage within the XML structure
• Transformation to structures of the database

Storage of XML documents as a whole
• The original is stored in a file system or as a CLOB in a database
• Full-text index
• Structure index

Example
  <hotel url="http://www.hotel-huebner.de" id="h0001"
         erstellt-am="03/02/2003" Autor="Hans Müller">
    <hotelname>Hotel Hübner</hotelname>
    <kategorie>4</kategorie>
    <adresse>
      <plz>18199</plz>
      <ort>Warnemünde</ort>
      <strasse>Seestraße</strasse>
    </adresse>
    <telefon>0381 / 5434-0</telefon>
    <fax> 0381 / 5434-444</fax>
    <anreisebeschreibung>Aus Richtung Rostock kommend ...
    </anreisebeschreibung>
  </hotel>

Full-text index
  Term         References
  hotel        * * *
  Warnemünde   *
  Rostock      *
  ort          * *
(Each * marks a pointer into the hotel document shown in the preceding example.)

Full-text and structure index
  Term         Reference   Element
  Warnemünde   *           *
  Seestrasse   *           *
  Rostock      *           *

  Element               Reference   Order   Predecessor
  hotel                 *           1
  adresse               *           2       *
  ort                   *           3       *
  strasse               *           3       *
  anreisebeschreibung   *           2       *

  <hotel url="http://www.hotel-huebner.de" id="h0001"
         erstellt-am="03/02/2003" Autor="Hans Müller">
    <hotelname>Hotel Hübner</hotelname>
    <kategorie>4</kategorie>
    <adresse>
      <plz>18199</plz>
      <ort>Warnemünde</ort>
      <strasse>Seestraße</strasse>
    </adresse>
    <telefon>0381 / 5434-0</telefon>
    <fax> 0381 / 5434-444</fax>
    <anreisebeschreibung>Aus Richtung Rostock kommend ...
    </anreisebeschreibung>
  </hotel>

Queries
Full-text index:
  hotel AND warnemünde
  (hotel OR pension) AND (rostock OR warnemünde)
Full-text and structure index:
  hotel.adresse.ort CONTAINS ("warnemünde") AND
  hotel.freizeitmoeglichkeit CONTAINS ("swimming pool")

Characteristics: full-text index
  Description of schema         Not required
  Reconstruction of document    The document remains in its original form
  Queries                       Information retrieval; SQL
  Further characteristics       The structure can be evaluated
  Use                           Document-centered applications

Generic storage
Storage within the XML structure
All information of the XML document is stored
  – Simple generic storage
  – Document Object Model

Example
  Elements
  DocID   Element name   ID    Predecessor   Order   Value
  h0001   hotel          101                 1
  h0001   hotelname      102   101           1       Hotel Hübner
  h0001   kategorie      103   101           2       4
  h0001   adresse        104   101           3
  h0001   plz            105   104           1       18119
  h0001   ort            106   104           2       Warnemünde
  ...

  Attributes
  DocID   Attribute name   ID    Element   Value
  h0001   url              102   101       http://www.hotel-huebner.de
  h0001   id               101   101       h0001
  ...

Document Object Model
• The tree structure is transformed into a class hierarchy
• Storage in object-relational or object-oriented databases

Queries
• XPath
• XQuery
• XQL (query language of Software AG)
• SQL

Characteristics: generic storage
  Description of schema         Not required
  Reconstruction of document    Possible, but expensive
  Queries                       XQuery, XQL; QL takes the storage structures into account
  Further characteristics       Queries and updates are possible with DOM
  Use                           For documents that are focused on data, focused on text, or semi-structured

Transformation to structures of databases
• A DTD or schema must be available
• Automatic or user-driven procedures
• Transformation to
  – relational
  – object-relational
  – object-oriented
  databases

Transformation
  XML information                      Database information
  Element
    Root element                       Relation
    XML element                        Attribute of a relation
    Sequence of elements               Attributes of a relation
    Alternative of elements            Attributes of a relation
    Element with qualifier ?           Attribute, null value possible
    Element with qualifier + or *      SET or LIST
    Complex structured element         ROW
  Attribute
    XML attribute                      Attribute of a relation
    #IMPLIED                           Null value allowed
    #REQUIRED                          Null value not allowed
    Default value                      Default value

Example
  Hotelname      url       id      erstellt-am   autor         kategorie   fax    anreisebeschreibung
  Hotel Hübner   http://   h0001   03/02/2003    Hans Müller   4           0381   Aus Richtung Rostock

  id      plz     ort          strasse      nummer
  h0001   18119   Warnemünde   Seestrasse   12

  id      telefon           Ordnung
  h0001   0381 / 5434 - 0   1

Queries
• SQL with
  – Joins
  – Aggregate functions
  – Query optimization
  – Update

Characteristics: structures of databases
  Description of schema         Required
  Reconstruction of document    Only partly possible
  Queries                       SQL and XML
  Further characteristics       Keeps the order of elements only with additional attributes
  Use                           Data-centered applications
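
The transformation of an XML document to relations can be sketched with a small script. The following is a minimal sketch in Python using the standard-library modules sqlite3 and xml.etree.ElementTree. It assumes a shortened hotel document and a simplified layout with two tables, hotel and adresse; the table layout, the column choice and the helper names load_hotel and hotels_in are illustrative assumptions and not the mapping procedure used in the lecture.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Shortened hotel document based on the example in the slides.
HOTEL_XML = """<hotel url="http://www.hotel-huebner.de" id="h0001">
  <hotelname>Hotel Hübner</hotelname>
  <kategorie>4</kategorie>
  <adresse>
    <plz>18199</plz>
    <ort>Warnemünde</ort>
    <strasse>Seestraße</strasse>
  </adresse>
</hotel>"""


def load_hotel(conn, xml_text):
    """Map one hotel document to two relations (assumed layout)."""
    root = ET.fromstring(xml_text)
    # Root element -> relation; simple child elements and XML attributes -> columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hotel "
        "(id TEXT PRIMARY KEY, url TEXT, hotelname TEXT, kategorie INTEGER)"
    )
    # The complex element <adresse> gets its own relation, linked by the hotel id.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS adresse "
        "(id TEXT REFERENCES hotel(id), plz TEXT, ort TEXT, strasse TEXT)"
    )
    hotel_id = root.get("id")
    conn.execute(
        "INSERT INTO hotel VALUES (?, ?, ?, ?)",
        (hotel_id, root.get("url"), root.findtext("hotelname"),
         int(root.findtext("kategorie"))),
    )
    adr = root.find("adresse")
    conn.execute(
        "INSERT INTO adresse VALUES (?, ?, ?, ?)",
        (hotel_id, adr.findtext("plz"), adr.findtext("ort"),
         adr.findtext("strasse")),
    )


def hotels_in(conn, ort):
    """Names of all hotels in the given town, found with an SQL join."""
    rows = conn.execute(
        "SELECT h.hotelname FROM hotel h JOIN adresse a ON a.id = h.id "
        "WHERE a.ort = ?",
        (ort,),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_hotel(conn, HOTEL_XML)
    print(hotels_in(conn, "Warnemünde"))
```

Once the data is in relations, queries use ordinary SQL joins. The original document, however, can no longer be reconstructed exactly from these two tables, for example the order of the child elements is lost unless it is stored in an additional column.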