XML

Extensible Markup Language
Worzyk
FH Anhalt
Telemedizin WS 09/10
XML
• Metalanguage
– A language that describes other languages
– These languages describe formats for data
exchange
Example
Hans Meyer
Lohmannstrasse 23
06366 Köthen
Dr. Else Müller
Bernburger Strasse 56
06366 Köthen
Data is to be transmitted from one hospital to another. The character
strings may be patient data or doctor data. The recipient must know the
meaning of the data.
Example
<Patient>
<Name>Hans Meyer</Name>
<Strasse>Lohmannstrasse 23</Strasse>
<Ort>06366 Köthen</Ort>
</Patient>
<Arzt>
<Name>Dr. Else Müller</Name>
<Strasse>Bernburger Strasse 56</Strasse>
<Ort>06366 Köthen</Ort>
</Arzt>
The XML tags give meaning to the character strings. XML tags normally
appear in pairs: the opening tag, for example <Patient>, and the matching
closing tag </Patient>.
XML elements like <Patient> ... </Patient> may contain child elements, which
must themselves be valid XML elements.
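How the tags attach meaning to the strings can be tried out with a short Python sketch. It assumes that the two records are wrapped in a root element <Personen> (borrowed from the later DTD examples, not part of this slide) so that the text forms one document, and it uses the parser from Python's standard library.

```python
import xml.etree.ElementTree as ET

# The two records from the example, wrapped in an assumed root element.
DOCUMENT = """
<Personen>
  <Patient>
    <Name>Hans Meyer</Name>
    <Strasse>Lohmannstrasse 23</Strasse>
    <Ort>06366 Köthen</Ort>
  </Patient>
  <Arzt>
    <Name>Dr. Else Müller</Name>
    <Strasse>Bernburger Strasse 56</Strasse>
    <Ort>06366 Köthen</Ort>
  </Arzt>
</Personen>
"""


def describe(document: str) -> list[str]:
    """Return one line per person: the role (tag name) and the name."""
    root = ET.fromstring(document)
    return [f"{person.tag}: {person.findtext('Name')}" for person in root]


for line in describe(DOCUMENT):
    print(line)
```

The recipient no longer has to guess which string is a patient and which is a doctor: the role is read from the tag name.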
Structure of XML documents
• Prolog
– Declaration of the document type
– DTD (Document Type Definition)
• Elements
http://www.w3schools.com/xml/default.asp
http://de.selfhtml.org/
Document Type Definition
DTD
• It describes the grammar of an XML document
• It describes the permitted elements and
attributes
– their data types and ranges of values
– their nesting
• An XML document that conforms to
a DTD is called valid
Example DTD
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Personen [
<!ELEMENT Personen (Patient)>
<!ELEMENT Patient (#PCDATA)>
]>
<Personen>
<Patient>
Hans Meyer
Lohmannstrasse 23
06366 Köthen
</Patient>
</Personen>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten1.xml
Structure of XML documents
• The DTD describes the characteristics of the
elements
• Elements begin with a start tag
<Elementname> and end with a
closing tag </Elementname>.
• XML tags are case sensitive
• Elements can contain other elements.
• #PCDATA (parsed character data): the
element contains character strings whose
characters belong to the declared character
set; the parser scans this text for markup.
/ is called slash
\ is called backslash
Names of Elements
• Names can contain letters, numbers,
and other characters
• Names must not start with a number or
punctuation character
• Names must not start with the letters
xml (or XML or Xml ..)
• Names cannot contain spaces
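The naming rules can be approximated by a small check. This is a simplified sketch, not the full XML grammar: it assumes ASCII names only, allows a letter or underscore as the first character, and allows letters, digits, hyphens, underscores and periods afterwards. The function name is made up for this illustration.

```python
import re

# Simplified pattern: first character is a letter or underscore, the rest
# may be letters, digits, periods, underscores or hyphens (ASCII only).
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_element_name(name: str) -> bool:
    """Check a name against the rules listed on the slide (simplified)."""
    if name.lower().startswith("xml"):
        return False  # names starting with xml, XML, Xml ... are reserved
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["Patient", "General_Practitioner", "1Patient", "xmlData", "Patient Name"]:
    print(candidate, is_valid_element_name(candidate))
```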
Sequence of Elements
Subordinate elements are separated in the declaration
by commas and included in parentheses.
Example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Personen [
<!ELEMENT Personen (Patient,Arzt)>
<!ELEMENT Patient (Name,Adresse)>
<!ELEMENT Arzt (Name, Adresse)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Adresse (#PCDATA)>
]>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten2.xml
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten3.xml
Selection list
• Selection of exactly one element: the
available elements are separated by |
• Example:
<!DOCTYPE Personen [
<!ELEMENT Personen (Patient|Arzt)>
<!ELEMENT Patient (Name,Adresse,Diagnose)>
<!ELEMENT Arzt (Name, Adresse,Fachgebiet)>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten4.xml
Multiple occurrence
* The element can appear zero or more
times
+ The element must appear at least once
and can appear arbitrarily often
? The element can appear at most once
(zero or one time)
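The three indicators behave like the quantifiers of regular expressions. The following sketch makes that analogy concrete; it is an illustration only and not how a DTD parser is built. The content models are translated by hand into regular expressions over the sequence of child tag names, and the element names Name and Diagnose are taken from the Patienten examples.

```python
import re
import xml.etree.ElementTree as ET


def children_match(element: ET.Element, model: str) -> bool:
    """Match the sequence of child tag names against a regular expression.

    Each child contributes its tag name followed by one space, so the
    model is written over tokens such as 'Name '.
    """
    sequence = "".join(child.tag + " " for child in element)
    return re.fullmatch(model, sequence) is not None


# Hand-translated content models (hypothetical examples):
ZERO_OR_MORE = r"(Name )(Diagnose )*"  # (Name, Diagnose*)
ONE_OR_MORE = r"(Name )(Diagnose )+"   # (Name, Diagnose+)
OPTIONAL = r"(Name )(Diagnose )?"      # (Name, Diagnose?)

no_diagnosis = ET.fromstring("<Patient><Name>Hans Meyer</Name></Patient>")
two_diagnoses = ET.fromstring(
    "<Patient><Name>Hans Meyer</Name>"
    "<Diagnose>A</Diagnose><Diagnose>B</Diagnose></Patient>"
)

for label, model in [("*", ZERO_OR_MORE), ("+", ONE_OR_MORE), ("?", OPTIONAL)]:
    print(label, children_match(no_diagnosis, model), children_match(two_diagnoses, model))
```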
Attributes
<!ATTLIST element-name attribute-name attribute-type defaultvalue>
Types of attributes:
CDATA, (en1|en2|..), ID, IDREF, IDREFS, NMTOKEN, NMTOKENS,
ENTITY, ENTITIES, NOTATION, xml:
Default value:
value
#REQUIRED, #IMPLIED, #FIXED value
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten5.xml
http://www.w3schools.com/xml/xml_attributes.asp
CDATA: The value is character data
(en1|en2|..): The value must be one from an enumerated list
ID: The value is a unique id
IDREF: The value is the id of another element
IDREFS: The value is a list of other ids
NMTOKEN: The value is a valid XML name
NMTOKENS: The value is a list of valid XML names
ENTITY: The value is an entity
ENTITIES: The value is a list of entities
NOTATION: The value is a name of a notation
xml: The value is a predefined xml value
Default values:
value: The default value of the attribute
#REQUIRED: The attribute value must be included in the element
#IMPLIED: The attribute does not have to be included
#FIXED value: The attribute value is fixed
Comments
Comments are enclosed in
<!-- and -->
<!-- This is a comment -->
Well-formed XML file
• The file starts with the XML declaration,
which establishes the reference to XML
• There is at least one data element
• There is exactly one root element,
which contains all other data elements
• All required attributes are defined
• All elements have the right content
• The elements must be nested properly
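The purely syntactic conditions (one root element, proper nesting) can be checked with any XML parser. The sketch below uses Python's standard-library parser, which is non-validating: it reports syntax problems but does not check required attributes or element content against a DTD. The function name and the sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(text: str):
    """Return None if the text parses as XML, else a short error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as error:
        line, column = error.position
        return f"line {line}, column {column}: {error}"
    return None


good = "<Personen><Patient>Hans Meyer</Patient></Personen>"
bad_nesting = "<Personen><Patient>Hans Meyer</Personen></Patient>"
two_roots = "<Patient>Hans Meyer</Patient><Arzt>Dr. Else Müller</Arzt>"

for sample in (good, bad_nesting, two_roots):
    print(well_formedness_error(sample))
```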
Valid XML file
• The file is well-formed
• A DTD is assigned to the file
• The content of the file conforms to
the assigned DTD
Parser
A validating parser checks whether an XML document is valid:
<html>
<body>
<script type="text/javascript">
// Runs only in Internet Explorer, which exposes the MSXML parser via ActiveX.
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;           // load synchronously
xmlDoc.validateOnParse = true;  // validate against the DTD while parsing
xmlDoc.load("Patienten5.xml");
document.write("<br />Error Code: ");
document.write(xmlDoc.parseError.errorCode);
document.write("<br />Error Reason: ");
document.write(xmlDoc.parseError.reason);
document.write("<br />Error Line: ");
document.write(xmlDoc.parseError.line);
</script>
</body>
</html>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Parser.htm
<!-- An HTML file follows -->
<html>
<!-- The file starts now -->
<body>
<!-- The JavaScript interpreter is called -->
<script type="text/javascript">
// The parser is called
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
// The parser starts now
xmlDoc.async = false;
// The parser also validates
xmlDoc.validateOnParse = true;
// Name of the file to be validated
xmlDoc.load("Patienten5.xml");
// Output of the error code
document.write("<br />Error Code: ");
document.write(xmlDoc.parseError.errorCode);
// Output of the error reason
document.write("<br />Error Reason: ");
document.write(xmlDoc.parseError.reason);
// Output of the error line
document.write("<br />Error Line: ");
document.write(xmlDoc.parseError.line);
</script>
<!-- End of JavaScript -->
</body>
<!-- End of HTML data -->
</html>
<!-- End of HTML file -->
DTD - Disadvantages
• Few data types
• The specification is not written in XML syntax
– The specification cannot be checked with an
XML parser
XML Schema
An XML Schema:
• defines elements that can appear in a document
• defines attributes that can appear in a document
• defines which elements are child elements
• defines the order of child elements
• defines the number of child elements
• defines whether an element is empty or can include
text
• defines data types for elements and attributes
• defines default and fixed values for elements and
attributes
http://www.w3schools.com/schema/schema_intro.asp
XML Schema
Advantages over DTD
• XML Schemas are extensible to future
additions
• XML Schemas are richer and more useful
than DTDs
• XML Schemas are written in XML
• XML Schemas support data types
– xs:date, xs:dateTime, xs:string
• XML Schemas support namespaces
– xmlns:xs="http://www.w3.org/2001/XMLSchema"
Dublin Core Standard
Dublin Core Metadata Initiative
A conference in 1995 in Dublin, Ohio,
defined a set of descriptive attributes for
categorizing documents on the Internet.
15 core elements are recommended in
"Dublin Core Metadata Element Set,
Version 1.1" (ISO 15836)
http://dublincore.org/documents/dces/
How to create
an XML structure
• Create a tree structure of the data
• Convert that structure to a DTD
• Add data elements
• Test
Example
Quarterly billing
• One file consists of exactly one physician and
at least one patient
• A physician is either a general practitioner or
a dentist
• A general practitioner has an address and,
optionally, a profession
• A dentist has an address
• A patient has an address and zero or more
diagnoses
• An address consists of name, city, street
• A name has a salutation, Mr. or Ms.
Example
Quarterly billing
Billing
  Physician
    General Practitioner | Dentist
      General Practitioner: Address, Profession ?
      Dentist: Address
  Patient +
    Address
    Diagnosis *
Address
  Name (salutation: Mr | Ms)
  City
  Street
Example - DTD
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Billing [
<!ELEMENT Billing (Physician, Patient+)>
<!ELEMENT Physician (General_Practitioner | Dentist)>
<!ELEMENT General_Practitioner (Address, Profession?)>
<!ELEMENT Dentist (Address)>
<!ELEMENT Patient (Address, Diagnosis*)>
<!ELEMENT Address (Name, City, Street)>
<!ELEMENT Profession (#PCDATA)>
<!ELEMENT Diagnosis (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT City (#PCDATA)>
<!ELEMENT Street (#PCDATA)>
<!ATTLIST Name Salutation (Mr|Ms) "Ms">
]>
Example - Data
<Billing>
<Physician>
<General_Practitioner>
<Address>
<Name>Dr. Erpel</Name>
<City>Entenhausen</City>
<Street>Am Krankenhaus 1</Street>
</Address>
<Profession>Geriatrics</Profession>
</General_Practitioner>
</Physician>
<Patient>
<Address>
<Name Salutation="Mr">Daniel</Name>
<City>Entenhausen</City>
<Street>Bahnhofstrasse 3a</Street>
</Address>
<Diagnosis>Bettflucht</Diagnosis>
</Patient>
<Patient>
<Address>
<Name>Daisy</Name>
<City>Entenhausen</City>
<Street>Am Stadtpark</Street>
</Address>
<Diagnosis>Sonnenbrand</Diagnosis>
<Diagnosis>Migräne</Diagnosis>
</Patient>
</Billing>
Queries to
XML - Files
• XPath
• XQuery
XPath
The XPath language is used to address
parts of an XML document.
It was designed for use in both XSLT
and XPointer.
XPath models an XML document as a tree
of nodes.
http://www.informatik.hu-berlin.de/~obecker/obqo/w3c-trans/xpath-de-20010702/
Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
Spielfilm: feature films
Film: movie
Regie: direction
Beschreibung: description
Queries with XPath
Select all titles:
/bookstore/book/title
Select the title of the first book:
/bookstore/book[1]/title
Select the text of all prices:
/bookstore/book/price/text()
Select the titles of all books with a price greater than 35:
/bookstore/book[price>35]/title
http://www.w3schools.com/xpath/xpath_examples.asp
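These queries can be tried out with Python's standard library, which implements only a subset of XPath. The sketch below uses a shortened copy of the bookstore document (authors and years omitted). Because the subset has no numeric comparisons, the price condition is evaluated in Python instead of inside the path expression.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <price>49.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE)  # root is the <bookstore> element

# /bookstore/book/title
all_titles = [title.text for title in root.findall("book/title")]

# /bookstore/book[1]/title
first_title = root.find("book[1]/title").text

# /bookstore/book/price/text()
all_prices = [price.text for price in root.findall("book/price")]

# /bookstore/book[price>35]/title, with the comparison done in Python
expensive_titles = [
    book.findtext("title")
    for book in root.findall("book")
    if float(book.findtext("price")) > 35
]

print(all_titles, first_title, all_prices, expensive_titles, sep="\n")
```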
XQuery
• Query language for XML data
• Uses XPath expressions
• Analogous to SQL
XQuery Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<bib>
<book year="1994">
<title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price>65.95</price>
</book>
<book year="1992">
<title>Advanced Programming in the Unix environment</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price>65.95</price>
</book>
<book year="2000">
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann Publishers</publisher>
<price>39.95</price>
</book>
<book year="1999">
<title>The Technology and Content for Digital TV</title>
<editor>
<last>Gerbarg</last><first>Darcy</first>
<affiliation>CITI</affiliation>
</editor>
<publisher>Kluwer Academic Publishers</publisher>
<price>129.95</price>
</book>
</bib>
XQuery Example
Query:
doc("books.xml")/bib/book[price<50]
results:
<book year="2000">
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann Publishers</publisher>
<price>39.95</price>
</book>
FLWOR
For, Let, Where, Order by, Return
for $x in doc("books.xml")/bib/book
where $x/price>50
order by $x/title
return $x/title
Results:
<title>Advanced Programming in the Unix environment</title>
<title>TCP/IP Illustrated</title>
<title>The Technology and Content for Digital TV</title>
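For readers without an XQuery processor, the same logic can be followed step by step in Python. This is only an analogy to the FLWOR clauses, not an XQuery implementation. It uses a shortened copy of the bib document (title and price only) and returns the title texts rather than the title elements; the function name is made up for this illustration.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title><price>65.95</price></book>
  <book year="2000"><title>Data on the Web</title><price>39.95</price></book>
  <book year="1999"><title>The Technology and Content for Digital TV</title><price>129.95</price></book>
</bib>
"""


def titles_over(document: str, limit: float) -> list[str]:
    root = ET.fromstring(document)
    # for $x in .../bib/book
    books = root.findall("book")
    # where $x/price > limit
    selected = [book for book in books if float(book.findtext("price")) > limit]
    # order by $x/title
    selected.sort(key=lambda book: book.findtext("title"))
    # return $x/title (here: the text of the title)
    return [book.findtext("title") for book in selected]


for title in titles_over(BIB, 50):
    print(title)
```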
XML documents
in databases
XML documents can be
• focused on data
• focused on text
• semi-structured
Alternatives for storing
XML documents
• Storage as a whole
• Generic storage of the XML structure
• Transformation to structures of the
database
Storage of XML documents
as a whole
The original document is stored in a file system or
as a CLOB in a database
Full-text index
Structure index
Example
<hotel
url="http://www.hotel-huebner.de"
id="h0001"
erstellt-am="03/02/2003"
Autor="Hans Müller">
<hotelname>Hotel Hübner</hotelname>
<kategorie>4</kategorie>
<adresse>
<plz>18199</plz>
<ort>Warnemünde</ort>
<strasse>Seestraße</strasse>
</adresse>
<telefon>0381 / 5434-0</telefon>
<fax> 0381 / 5434-444</fax>
<anreisebeschreibung>Aus Richtung
Rostock kommend ...
</anreisebeschreibung>
</hotel>
Full-text index
Term         Reference
hotel        ***
Warnemünde   *
Rostock      *
ort          **
Full-text and structure index
Term index:
Term         Reference   Element
Warnemünde   *           *
Seestrasse   *           *
Rostock      *           *
Structure index:
Element               Reference   Order   Predecessor
hotel                 *           1
adresse               *           2       *
ort                   *           3       *
strasse               *           3       *
anreisebeschreibung   *           2       *
Queries
Full-text index:
hotel AND warnemünde
(hotel OR pension) AND (rostock OR warnemünde)
Full-text and structure index:
hotel.adresse.ort CONTAINS ("warnemünde") AND
hotel.freizeitmoeglichkeit CONTAINS
("swimming pool")
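The two full-text queries can be reproduced with a minimal inverted index: every term points to the set of documents that contain it, AND becomes set intersection and OR becomes set union. The sketch assumes a tiny made-up collection; only the first entry is based on the hotel example, the other two are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical mini collection: document id -> plain text of the document.
DOCUMENTS = {
    "h0001": "Hotel Hübner Warnemünde Seestraße Aus Richtung Rostock kommend",
    "h0002": "Pension Seeblick Rostock",
    "h0003": "Hotel Stadt Köthen",
}


def build_index(documents):
    """Map every lower-cased word to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


index = build_index(DOCUMENTS)

# hotel AND warnemünde
q1 = index["hotel"] & index["warnemünde"]

# (hotel OR pension) AND (rostock OR warnemünde)
q2 = (index["hotel"] | index["pension"]) & (index["rostock"] | index["warnemünde"])

print(sorted(q1), sorted(q2))
```

A structure index would additionally record the element in which each term occurs, which is what a condition such as hotel.adresse.ort CONTAINS (...) needs.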
Characteristics: full-text index
Description of schema:       not required
Reconstruction of document:  the document remains in its original form
Queries:                     information retrieval, SQL
Further characteristics:     evaluation of the structure is possible
Use:                         document-centered applications
Generic storage
Storage of the XML structure
All information in the XML document
is stored
– simple generic storage
– Document Object Model
generic = adapted to the subject matter
Example
Elements:
DocID   Element name   ID    Predecessor   Order   Value
h0001   hotel          101                 1
h0001   hotelname      102   101           1       Hotel Hübner
h0001   kategorie      103   101           2       4
h0001   adresse        104   101           3
h0001   plz            105   104           1       18119
h0001   ort            106   104           2       Warnemünde
...
Attributes:
DocID   Attribute name   ID    Element   Value
h0001   url              101   101       http://www.hotelhuebner.de
h0001   id               102   101       h0001
...
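How such an element table could be filled is sketched below with Python and an in-memory SQLite database. The table and column names (element, parent, ord, ...) are chosen for this illustration and are not prescribed by the slides; attributes are left out to keep the sketch short. Every element becomes one row holding its own id, the id of its predecessor (parent) and its order among its siblings.

```python
import sqlite3
import xml.etree.ElementTree as ET

HOTEL = """
<hotel id="h0001">
  <hotelname>Hotel Hübner</hotelname>
  <kategorie>4</kategorie>
  <adresse>
    <plz>18119</plz>
    <ort>Warnemünde</ort>
  </adresse>
</hotel>
"""


def store(connection, doc_id, root):
    """Write every element as one row: id, parent id, order, text value."""
    connection.execute(
        "CREATE TABLE element (doc_id TEXT, name TEXT, id INTEGER, "
        "parent INTEGER, ord INTEGER, value TEXT)"
    )
    next_id = 101  # numbering starts at 101, as in the example

    def walk(node, parent, order):
        nonlocal next_id
        node_id = next_id
        next_id += 1
        text = (node.text or "").strip() or None  # no value for pure containers
        connection.execute(
            "INSERT INTO element VALUES (?, ?, ?, ?, ?, ?)",
            (doc_id, node.tag, node_id, parent, order, text),
        )
        for index, child in enumerate(node, start=1):
            walk(child, node_id, index)

    walk(root, None, 1)


connection = sqlite3.connect(":memory:")
store(connection, "h0001", ET.fromstring(HOTEL))

# Find the value of <ort> below <adresse> with a self-join on the parent id.
row = connection.execute(
    "SELECT child.value FROM element AS child "
    "JOIN element AS parent ON child.parent = parent.id "
    "WHERE parent.name = 'adresse' AND child.name = 'ort'"
).fetchone()
print(row[0])
```

Each step along a path costs one self-join, which illustrates why reconstructing a whole document from generic storage is possible but expensive.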
Document Object Model
The structure of the tree is
mapped to a class hierarchy
Storage in object-relational or
object-oriented databases
Queries
• XPath
• XQuery
• XQL
– query language of Software AG
• SQL
Characteristics: generic storage
Description of schema:       not required
Reconstruction of document:  possible, but expensive
Queries:                     XQuery, XQL; QL considers the storage structures
Further characteristics:     queries and updates possible with DOM
Use:                         for documents focused on data, focused on text, or semi-structured
Transformation to
database structures
A DTD or schema must be available
Automatic or user-driven procedures
Transformation to
relational,
object-relational, or
object-oriented
databases
Transformation
XML information                      Database information
Element
  Root element                       Relation
  XML element                        Attribute of a relation
  Sequence of elements               Attribute of a relation
  Alternative of elements            Attribute of a relation
  Element with qualifier ?           Attribute, null value possible
  Element with qualifier + or *      SET or LIST
  Complex structured element         ROW
Attribute
  XML attribute                      Attribute of a relation
  #IMPLIED                           Null value allowed
  #REQUIRED                          Null value not allowed
  Default value                      Default value
Example
Hotelname      url       id      erstellt-am   autor         kategorie   fax    anreisebeschreibung
Hotel Hübner   http://   h0001   03/02/2003    Hans Müller   4           0381   Aus Richtung Rostock

id      plz     ort          strasse      nummer
h0001   18119   Warnemünde   Seestrasse   12

id      telefon           Ordnung
h0001   0381 / 5434 - 0   1
Queries
• SQL with
– joins
– aggregate functions
– query optimization
– updates
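Once the document has been mapped to relations, ordinary SQL applies. The sketch below uses an in-memory SQLite database; the table names hotel, adresse and telefon and the reduced set of columns are assumptions made for this illustration, loosely following the example tables.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript("""
CREATE TABLE hotel (id TEXT PRIMARY KEY, hotelname TEXT, kategorie INTEGER);
CREATE TABLE adresse (id TEXT, plz TEXT, ort TEXT, strasse TEXT);
CREATE TABLE telefon (id TEXT, telefon TEXT, ordnung INTEGER);
INSERT INTO hotel VALUES ('h0001', 'Hotel Hübner', 4);
INSERT INTO adresse VALUES ('h0001', '18119', 'Warnemünde', 'Seestrasse');
INSERT INTO telefon VALUES ('h0001', '0381 / 5434 - 0', 1);
""")

# Join: name and city of every hotel.
join_rows = connection.execute(
    "SELECT h.hotelname, a.ort FROM hotel AS h JOIN adresse AS a ON a.id = h.id"
).fetchall()

# Aggregate function: number of telephone numbers per hotel.
count_rows = connection.execute(
    "SELECT id, COUNT(*) FROM telefon GROUP BY id"
).fetchall()

# Update: change the category of one hotel.
connection.execute("UPDATE hotel SET kategorie = 5 WHERE id = 'h0001'")
new_category = connection.execute(
    "SELECT kategorie FROM hotel WHERE id = 'h0001'"
).fetchone()[0]

print(join_rows, count_rows, new_category)
```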
Characteristics: structures of databases
Description of schema:       required
Reconstruction of document:  only partly possible
Queries:                     SQL and XML
Further characteristics:     keeps the order of elements by means of additional attributes
Use:                         data-centered applications