XML
Extensible Markup Language
Worzyk, FH Anhalt, Telemedizin WS 09/10

XML
• Metalanguage
  – A language that describes languages
  – The described languages define formats for data exchange

Example

    Hans Meyer
    Lohmannstrasse 23
    06366 Köthen

    Dr. Else Müller
    Bernburger Strasse 56
    06366 Köthen

Data is transmitted from one hospital to another. The character strings may be patient data or doctor data. The recipient must know the meaning of the data.

Example

    <Patient>
      <Name>Hans Meyer</Name>
      <Strasse>Lohmannstrasse 23</Strasse>
      <Ort>06366 Köthen</Ort>
    </Patient>
    <Arzt>
      <Name>Dr. Else Müller</Name>
      <Strasse>Bernburger Strasse 56</Strasse>
      <Ort>06366 Köthen</Ort>
    </Arzt>

The XML tags give meaning to the character strings. XML tags normally appear in pairs: an opening tag, for example <Patient>, and the matching closing tag </Patient>. XML elements such as <Patient> ... </Patient> may contain child elements, which must themselves be valid XML elements.

Structure of XML documents
• Prolog
  – Declaration of the document type
  – DTD (Document Type Definition)
• Elements
http://www.w3schools.com/xml/default.asp
http://de.selfhtml.org/

Document Type Definition (DTD)
• It describes the grammar of an XML document
• It describes the permitted elements and attributes
  – their data type and range of values
  – their nesting
• An XML document that conforms to a DTD is called valid

Example DTD

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE Personen [
      <!ELEMENT Personen (Patient)>
      <!ELEMENT Patient (#PCDATA)>
    ]>
    <Personen>
      <Patient>
        Hans Meyer
        Lohmannstrasse 23
        06366 Köthen
      </Patient>
    </Personen>

http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten1.xml

Structure of XML documents
• The DTD describes the characteristics of the elements
• Elements begin with a start tag <Elementname> and end with a closing tag </Elementname>
• XML tags are case sensitive
• Elements can contain elements
• #PCDATA (parsed character data): the element consists of character strings whose characters are part of the defined character set
Note: / is called slash, \ is called backslash.

Names of Elements
• Names can contain letters, numbers, and other characters
• Names must not start with a number or a punctuation character
• Names must not start with the letters xml (or XML, Xml, ...)
• Names cannot contain spaces

Sequence of Elements
Subordinate elements are separated in the declaration by commas and enclosed in parentheses.
Example:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE Personen [
      <!ELEMENT Personen (Patient,Arzt)>
      <!ELEMENT Patient (Name,Adresse)>
      <!ELEMENT Arzt (Name,Adresse)>
      <!ELEMENT Name (#PCDATA)>
      <!ELEMENT Adresse (#PCDATA)>
    ]>

http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten2.xml
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten3.xml

Selection list
• Selection of exactly one element: the available elements are separated by |
• Example (excerpt):

    <!DOCTYPE Personen [
      <!ELEMENT Personen (Patient|Arzt)>
      <!ELEMENT Patient (Name,Adresse,Diagnose)>
      <!ELEMENT Arzt (Name,Adresse,Fachgebiet)>

http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten4.xml

Multiple occurrence
    *   The element can appear zero or more times
    +   The element can appear one or more times
    ?   The element can appear zero times or exactly once

Attributes

    <!ATTLIST element-name attribute-name attribute-type default-value>

Types of attributes: CDATA, (en1|en2|..), ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, ENTITIES, NOTATION, xml:
Default value: value, #REQUIRED, #IMPLIED, #FIXED value
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten5.xml
http://www.w3schools.com/xml/xml_attributes.asp

Attribute types:
    CDATA           The value is character data
    (en1|en2|..)    The value must be one from an enumerated list
    ID              The value is a unique id
    IDREF           The value is the id of another element
    IDREFS          The value is a list of other ids
    NMTOKEN         The value is a valid XML name
    NMTOKENS        The value is a list of valid XML names
    ENTITY          The value is an entity
    ENTITIES        The value is a list of entities
    NOTATION        The value is the name of a notation
    xml:            The value is a predefined xml value

Default values:
    value           The default value of the attribute
    #REQUIRED       The attribute value must be included in the element
    #IMPLIED        The attribute does not have to be included
    #FIXED value    The attribute value is fixed

Comments
Comments are enclosed by <!-- and -->

    <!-- This is a comment -->

Well-formed XML File
• The file starts with the XML declaration, which establishes the reference to XML
• There is at least one data element
• There is exactly one root element, which contains all other data elements
• All required attributes are defined
• All elements have the right content
• The elements are nested properly

Valid XML File
• The file is well-formed
• A DTD is assigned to the file
• The content of the file conforms to the assigned DTD

Parser
A parser checks whether an XML document is valid:

    <html>                                    <!-- an HTML file follows -->
    <body>                                    <!-- the page content starts -->
    <script type="text/javascript">
    // The JavaScript interpreter is called.
    // ActiveXObject is available only in Internet Explorer.
    var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");   // the parser is called
    xmlDoc.async = false;                     // load synchronously
    xmlDoc.validateOnParse = true;            // the parser also validates
    xmlDoc.load("Patienten5.xml");            // name of the file to be validated
    document.write("<br />Error Code: ");     // output the error code
    document.write(xmlDoc.parseError.errorCode);
    document.write("<br />Error Reason: ");   // output the error reason
    document.write(xmlDoc.parseError.reason);
    document.write("<br />Error Line: ");     // output the error line
    document.write(xmlDoc.parseError.line);
    </script>                                 <!-- end of JavaScript -->
    </body>                                   <!-- end of the page content -->
    </html>                                   <!-- end of the HTML file -->

http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Parser.htm

DTD Disadvantages
• Few data types
• The specification is not written in XML syntax
  – The specification cannot be validated with a parser

XML Schema
An XML Schema:
• defines elements that can appear in a document
• defines attributes that can appear in a document
• defines which elements are child elements
• defines the order of child elements
• defines the number of child elements
• defines whether an element is empty or can include text
• defines data types for elements and attributes
• defines default and fixed values for elements and attributes
http://www.w3schools.com/schema/schema_intro.asp

XML Schema Advantages over DTD
• XML Schemas are extensible to future additions
• XML Schemas are richer and more powerful than DTDs
• XML Schemas are written in XML
• XML Schemas support data types
  – xs:date, xs:dateTime, xs:string
• XML Schemas support namespaces
  – xmlns:xs="http://www.w3.org/2001/XMLSchema"

Dublin Core Standard
Dublin Core Metadata Initiative: a conference in 1995 in Dublin, Ohio, defined a set of descriptive attributes to categorize documents on the internet.
15 core elements are recommended in "Dublin Core Metadata Element Set, Version 1.1" (ISO 15836).
http://dublincore.org/documents/dces/

How to create an XML structure
• Create a tree structure of the data
• Convert that structure to a DTD
• Add data elements
• Test

Example: Quarterly billing
• One file consists of exactly one physician and at least one patient
• A physician is either a general practitioner or a dentist
• A general practitioner has an address and, optionally, a profession
• A dentist has an address
• A patient has an address and zero or more diagnoses
• An address consists of name, city, street
• A name has a salutation, Mr or Ms

Example: Quarterly billing (tree)

    Billing
      Physician
        General Practitioner | Dentist
          General Practitioner: Address, Profession ?
          Dentist: Address
      Patient +
        Address
        Diagnosis *
      Address
        Name (Salutation: Mr | Ms)
        City
        Street

Example: DTD

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE Billing [
      <!ELEMENT Billing (Physician, Patient+)>
      <!ELEMENT Physician (General_Practitioner | Dentist)>
      <!ELEMENT General_Practitioner (Address, Profession?)>
      <!ELEMENT Dentist (Address)>
      <!ELEMENT Patient (Address, Diagnosis*)>
      <!ELEMENT Address (Name, City, Street)>
      <!ELEMENT Profession (#PCDATA)>
      <!ELEMENT Diagnosis (#PCDATA)>
      <!ELEMENT Name (#PCDATA)>
      <!ELEMENT City (#PCDATA)>
      <!ELEMENT Street (#PCDATA)>
      <!ATTLIST Name Salutation (Mr|Ms) "Ms">
    ]>

Example: Data

    <Billing>
      <Physician>
        <General_Practitioner>
          <Address>
            <Name>Dr. Erpel</Name>
            <City>Entenhausen</City>
            <Street>Am Krankenhaus 1</Street>
          </Address>
          <Profession>Geriatrics</Profession>
        </General_Practitioner>
      </Physician>
      <Patient>
        <Address>
          <Name Salutation="Mr">Daniel</Name>
          <City>Entenhausen</City>
          <Street>Bahnhofstrasse 3a</Street>
        </Address>
        <Diagnosis>Bettflucht</Diagnosis>
      </Patient>
      <Patient>
        <Address>
          <Name>Daisy</Name>
          <City>Entenhausen</City>
          <Street>Am Stadtpark</Street>
        </Address>
        <Diagnosis>Sonnenbrand</Diagnosis>
        <Diagnosis>Migräne</Diagnosis>
      </Patient>
    </Billing>

Queries to XML Files
• XPath
• XQuery

XPath
The XPath language is used to address parts of an XML document. It was designed for use in both XSLT and XPointer. XPath models an XML document as a tree that consists of nodes.
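The tree model can be made concrete with a small sketch. Python and its standard-library module xml.etree.ElementTree are assumptions of this sketch, not tools used in the lecture, and ElementTree supports only a subset of XPath. The document string is a copy of the quarterly billing data, and the helper node_paths is a hypothetical name introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Copy of the quarterly billing data; element names follow the Billing DTD.
BILLING_XML = """
<Billing>
  <Physician>
    <General_Practitioner>
      <Address>
        <Name>Dr. Erpel</Name>
        <City>Entenhausen</City>
        <Street>Am Krankenhaus 1</Street>
      </Address>
      <Profession>Geriatrics</Profession>
    </General_Practitioner>
  </Physician>
  <Patient>
    <Address>
      <Name Salutation="Mr">Daniel</Name>
      <City>Entenhausen</City>
      <Street>Bahnhofstrasse 3a</Street>
    </Address>
    <Diagnosis>Bettflucht</Diagnosis>
  </Patient>
  <Patient>
    <Address>
      <Name>Daisy</Name>
      <City>Entenhausen</City>
      <Street>Am Stadtpark</Street>
    </Address>
    <Diagnosis>Sonnenbrand</Diagnosis>
    <Diagnosis>Migräne</Diagnosis>
  </Patient>
</Billing>
"""


def node_paths(element, prefix=""):
    """Return the path of every element node, in document order."""
    path = f"{prefix}/{element.tag}"
    paths = [path]
    for child in element:
        paths.extend(node_paths(child, path))
    return paths


root = ET.fromstring(BILLING_XML)
all_paths = node_paths(root)

# Location paths address element nodes relative to the root node.
patient_names = [n.text for n in root.findall("./Patient/Address/Name")]
diagnoses = [d.text for d in root.findall("./Patient/Diagnosis")]

# The predicate [@Salutation] keeps only Name nodes that carry that attribute.
with_salutation = [n.text for n in root.findall(".//Name[@Salutation]")]

print(all_paths[:3])
print(patient_names, diagnoses, with_salutation)
```

Each entry of all_paths corresponds to one element node of the tree; attribute and text nodes are reached through the element that owns them.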
http://www.informatik.hu-berlin.de/~obecker/obqo/w3c-trans/xpath-de-20010702/

Example

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <bookstore>
      <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
      </book>
      <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
      </book>
      <book category="WEB">
        <title lang="en">XQuery Kick Start</title>
        <author>James McGovern</author>
        <author>Per Bothner</author>
        <author>Kurt Cagle</author>
        <author>James Linn</author>
        <author>Vaidyanathan Nagarajan</author>
        <year>2003</year>
        <price>49.99</price>
      </book>
      <book category="WEB">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
      </book>
    </bookstore>

Glossary note: Spielfilm = feature film, Film = movie, Regie = direction, Beschreibung = description

Queries with XPath
Select all titles:
    /bookstore/book/title
Select the title of the first book:
    /bookstore/book[1]/title
Select the text of all prices:
    /bookstore/book/price/text()
Select the titles of all books with a price greater than 35:
    /bookstore/book[price>35]/title
http://www.w3schools.com/xpath/xpath_examples.asp

XQuery
• Query language for XML data
• Uses XPath expressions
• Analogous to SQL

XQuery Example

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <bib>
      <book year="1994">
        <title>TCP/IP Illustrated</title>
        <author><last>Stevens</last><first>W.</first></author>
        <publisher>Addison-Wesley</publisher>
        <price>65.95</price>
      </book>
      <book year="1992">
        <title>Advanced Programming in the Unix environment</title>
        <author><last>Stevens</last><first>W.</first></author>
        <publisher>Addison-Wesley</publisher>
        <price>65.95</price>
      </book>
      <book year="2000">
        <title>Data on the Web</title>
        <author><last>Abiteboul</last><first>Serge</first></author>
        <author><last>Buneman</last><first>Peter</first></author>
        <author><last>Suciu</last><first>Dan</first></author>
        <publisher>Morgan Kaufmann Publishers</publisher>
        <price>39.95</price>
      </book>
      <book year="1999">
        <title>The Technology and Content for Digital TV</title>
        <editor>
          <last>Gerbarg</last><first>Darcy</first>
          <affiliation>CITI</affiliation>
        </editor>
        <publisher>Kluwer Academic Publishers</publisher>
        <price>129.95</price>
      </book>
    </bib>

XQuery Example
Query:
    doc("books.xml")/bib/book[price<50]
Result:

    <book year="2000">
      <title>Data on the Web</title>
      <author><last>Abiteboul</last><first>Serge</first></author>
      <author><last>Buneman</last><first>Peter</first></author>
      <author><last>Suciu</last><first>Dan</first></author>
      <publisher>Morgan Kaufmann Publishers</publisher>
      <price>39.95</price>
    </book>

FLWOR
For, Let, Where, Order by, Return

    for $x in doc("books.xml")/bib/book
    where $x/price>50
    order by $x/title
    return $x/title

Results:

    <title>Advanced Programming in the Unix environment</title>
    <title>TCP/IP Illustrated</title>
    <title>The Technology and Content for Digital TV</title>

XML Documents in Databases
XML documents can be
• focused on data
• focused on text
• semi-structured

Alternatives to store XML Documents
• Storage as a whole
• Storage within the XML structure
• Transformation to structures of the database

Storage of XML documents as a whole
The original is stored in a file system or as a CLOB in a database. Two kinds of index support queries:
• full-text index
• structure index

Example

    <hotel url="http://www.hotel-huebner.de" id="h0001"
           erstellt-am="03/02/2003" Autor="Hans Müller">
      <hotelname>Hotel Hübner</hotelname>
      <kategorie>4</kategorie>
      <adresse>
        <plz>18119</plz>
        <ort>Warnemünde</ort>
        <strasse>Seestraße</strasse>
      </adresse>
      <telefon>0381 / 5434-0</telefon>
      <fax>0381 / 5434-444</fax>
      <anreisebeschreibung>Aus Richtung Rostock kommend ...
      </anreisebeschreibung>
    </hotel>

Full-text index (each * is a reference into the hotel document above)

    Term          Reference
    hotel         ***
    Warnemünde    *
    Rostock       *
    ort           **

Full-text and structure index (each * is a reference into the hotel document above)

    Term          Reference   Element
    Warnemünde    *           *
    Seestrasse    *           *
    Rostock       *           *

    Element               Reference   Order   Predecessor
    hotel                 *           1
    adresse               *           2       *
    ort                   *           3       *
    strasse               *           3       *
    anreisebeschreibung   *           2       *

Queries
Full-text index:
    hotel AND warnemünde
    (hotel OR pension) AND (rostock OR warnemünde)
Full-text and structure index:
    hotel.adresse.ort CONTAINS ("warnemünde") AND
    hotel.freizeitmoeglichkeit CONTAINS ("swimming pool")

Characteristics: full-text index
    Description of schema         Not required
    Reconstruction of document    The document remains in its original form
    Queries                       Information retrieval, SQL
    Further characteristics       The evaluation of the structure is possible
    Use                           Document-centered applications

Generic storage
Storage within the XML structure: all information of the XML document is stored
  – simple generic storage
  – Document Object Model
Note: generic = adapted to the matter at hand

Example

    DocID   Element name   ID    Predecessor   Order   Value
    h0001   hotel          101                 1
    h0001   hotelname      102   101           1       Hotel Hübner
    h0001   kategorie      103   101           2       4
    h0001   adresse        104   101           3
    h0001   plz            105   104           1       18119
    h0001   ort            106   104           2       Warnemünde
    ...

    DocID   Attribute name   ID    Element   Value
    h0001   url              101   101       http://www.hotel-huebner.de
    h0001   id               102   101       h0001
    ...

Document Object Model
The structure of the tree is transformed into a class hierarchy.
Storage in object-relational or object-oriented databases.

Queries
• XPath
• XQuery
• XQL, the query language of Software AG
• SQL

Characteristics: generic storage
    Description of schema         Not required
    Reconstruction of document    Possible, but expensive
    Queries                       XQuery, XQL; SQL takes the storage structures into account
    Further characteristics       Queries and updates are possible with DOM
    Use                           For documents that are focused on data, focused on text, or semi-structured

Transformation to structures of databases
• A DTD or schema must be available
• Automatic or user-driven procedures
• Transformation to relational, object-relational, or object-oriented databases

Transformation

    XML information                      Database information
    Element
      Root element                       Relation
      XML element                        Attribute of a relation
      Sequence of elements               Attributes of a relation
      Alternative of elements            Attribute of a relation
      Element with qualifier ?           Attribute, null value possible
      Element with qualifier + or *      SET or LIST
      Complex structured element         ROW
    Attribute
      XML attribute                      Attribute of a relation
      #IMPLIED                           Null value allowed
      #REQUIRED                          Null value not allowed
      Default value                      Default value

Example

    Hotelname      url          id      erstellt-am   autor         kategorie   fax        anreisebeschreibung
    Hotel Hübner   http://...   h0001   03/02/2003    Hans Müller   4           0381 ...   Aus Richtung Rostock ...

    id      plz     ort          strasse      nummer
    h0001   18119   Warnemünde   Seestrasse   12

    id      telefon           Order
    h0001   0381 / 5434 - 0   1

Queries
• SQL with
  – joins
  – aggregate functions
  – query optimization
  – updates

Characteristics: structures of databases
    Description of schema         Required
    Reconstruction of document    Only partly possible
    Queries                       SQL and XML
    Further characteristics       Keeps the order of elements with additional attributes
    Use                           For data-centered applications
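
The transformation to relational structures can be sketched for the hotel document as follows. Python, sqlite3 and xml.etree.ElementTree are assumptions of this sketch, not tools named in the lecture. No DTD for the hotel document is given, so the choice of which columns forbid null values is also an assumption, and the helpers text_of and store_hotel are hypothetical names. Because SQLite has no SET, LIST or ROW types, the complex element adresse and the repeatable element telefon are stored in separate relations, as in the example tables.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Shortened copy of the hotel document.
HOTEL_XML = """
<hotel url="http://www.hotel-huebner.de" id="h0001">
  <hotelname>Hotel Hübner</hotelname>
  <kategorie>4</kategorie>
  <adresse>
    <plz>18119</plz>
    <ort>Warnemünde</ort>
    <strasse>Seestraße</strasse>
  </adresse>
  <telefon>0381 / 5434-0</telefon>
  <fax>0381 / 5434-444</fax>
</hotel>
"""

# Assumed mapping: root element -> relation, simple elements and XML
# attributes -> columns. Which ones are mandatory is an assumption.
SCHEMA = """
CREATE TABLE hotel (
    id        TEXT PRIMARY KEY,  -- treated as #REQUIRED: no null value
    hotelname TEXT NOT NULL,     -- treated as a mandatory element
    url       TEXT,              -- treated as #IMPLIED: null value allowed
    kategorie TEXT,              -- treated as optional (?): null value allowed
    fax       TEXT
);
CREATE TABLE adresse (           -- complex element, stored in its own relation
    id TEXT REFERENCES hotel(id), plz TEXT, ort TEXT, strasse TEXT
);
CREATE TABLE telefon (           -- repeatable element: one row per occurrence
    id TEXT REFERENCES hotel(id), telefon TEXT, ordnung INTEGER
);
"""


def text_of(parent, tag):
    """Return the stripped text of a child element, or None if it is absent."""
    child = parent.find(tag)
    return child.text.strip() if child is not None and child.text else None


def store_hotel(conn, root):
    """Distribute one hotel element over the three relations."""
    hotel_id = root.get("id")
    conn.execute(
        "INSERT INTO hotel VALUES (?, ?, ?, ?, ?)",
        (hotel_id, text_of(root, "hotelname"), root.get("url"),
         text_of(root, "kategorie"), text_of(root, "fax")),
    )
    adresse = root.find("adresse")
    if adresse is not None:
        conn.execute(
            "INSERT INTO adresse VALUES (?, ?, ?, ?)",
            (hotel_id, text_of(adresse, "plz"), text_of(adresse, "ort"),
             text_of(adresse, "strasse")),
        )
    # The extra column ordnung keeps the order of repeated elements.
    for position, telefon in enumerate(root.findall("telefon"), start=1):
        conn.execute("INSERT INTO telefon VALUES (?, ?, ?)",
                     (hotel_id, (telefon.text or "").strip(), position))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_hotel(conn, ET.fromstring(HOTEL_XML))

# A query over the document structure becomes an SQL join.
result = conn.execute("""
    SELECT h.hotelname, a.ort, t.telefon
    FROM hotel h
    JOIN adresse a ON a.id = h.id
    JOIN telefon t ON t.id = h.id
    WHERE a.ort = 'Warnemünde'
    ORDER BY t.ordnung
""").fetchall()
print(result)
```

The sketch records the order only for telefon. The relative order of the other child elements is not stored, which illustrates why the original document can be reconstructed only partly from such a mapping.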