Seminar `Datenbanksysteme` im Rahmen des MSE - HSR-Wiki

Werbung
How to effectively store the history
of data in a relational DBMS
Database systems
MSE-Seminar
14.12.2008
© Raphael Gfeller, [email protected]
1
Agenda
• Time
– Definition
– Storing within a DMBS
– General problems
• History
– Motivation
– On Data warehouses, Online Analytical Processing system, OLAP systems
• Common patterns available: slowly changing dimensions, SDC
– On Online transaction Processing system, OLTP systems
• No common patterns available
• Analyzed data models
– Performed tests
– Results
– Conclusion
• Questions
• References
14.12.2008
© Raphael Gfeller, [email protected]
2
Time is defined as
one second:
“the duration of 9 192 631 770 periods of the
radiation corresponding to the transition
between the two hyperfine levels of the
ground state of the caesium 133 atom.”
14.12.2008
© Raphael Gfeller, [email protected]
3
Storing within a DMBS
• a date
is represented by an offset with a defined
accuracy on a reference point
• an interval
is represented by a value with a defined accuracy
• a duration
– is represented by a composite value of two dates or
– a date value and an interval value
14.12.2008
© Raphael Gfeller, [email protected]
4
Storing within a DMBS \ Example
Value to store:
2008-12-02 10:00
Used date type:
smalldatetime (SQL 2), resolution 1 min,
reference point: 1900-01-01, based on the
Gregorian calendar
Internal stored Value:
57286680
(„Value to store“ - reference point [min])
14.12.2008
© Raphael Gfeller, [email protected]
5
Storing within a DMBS \ Date Types
Available date types on Microsoft SQL Server 2008
14.12.2008
© Raphael Gfeller, [email protected]
6
Time \ General Problems
•
•
•
•
Different time zones
Different implementation of data types
Different calendars
Time synchronization
– local
– networked
• Summer, winter time problematic
• Choosing the date type
– Resolution is to small
– Resolution is to high
– Range is to small
14.12.2008
© Raphael Gfeller, [email protected]
7
History \ Motivation
Why do we need to know the history of our data?
For example for:
• Legal requirements
– A bank has to know at each time what the exact balance of the customer was
– A internet provider has to be able to store all traffic from a user over a given
time
• Business requirements
– A version control has to be able to manage multiple revisions of the same unit
of information
– A customer relationship management (CRM) software has to be able to
present the volume of sales of a costumer over time
• Entertainment requirements
– A chat program has to be able to present the conversation between two
person over time
• Other requirements
– The “Time Machine” function in the Mac OS X has to be able to go back in
time for locating older version of your files
14.12.2008
© Raphael Gfeller, [email protected]
8
Main approaches to store data
Data warehouses (Online Analytical Processing system, OLAP)
• Designed for
– Reporting
– Analysis
– Speed of data retrieval
•
Uses to following approaches
–
data are stored denormalised based on a dimension-based model (logical data grouped
together)
• Include often business intelligence tools to retrieve and analyze data
• History
– Common pattern exits for storing historical data (Slow Changing Dimensions (SCD))
Online transaction Processing system, OLTP
• Designed for
– Perform day-to-day transaction processing
– Preservation of data integrity
– Speed of recording of business transactions
•
Uses to following approaches
– Database normalization
•
Codd rules of data normalization
– Entity-relationship model
•
History
–
14.12.2008
No common pattern available for storing historical data
© Raphael Gfeller, [email protected]
9
History \ Slow Changing Dimensions [0/2]
Type 0: an attribute of a dimension is fixed, no history available, is not
frequently used yet
Type 1: overwrites the old data with the new data, no history available
Type 2: tracks historical data by creating multiple records with a
separated key, unlimited history is possible
Type 3: additional columns in the tables track changes, limited history
is available
Type 4: creates separate historical tables that stores the historical data
Type 6: is a hybrid approach that combines SCD 1, 2 and 3 , unlimited
history is possible, is not frequently used yet
 Type 1,2 and 3 are the most common
14.12.2008
© Raphael Gfeller, [email protected]
10
History \ Slow Changing Dimensions [1/2]
SCD 1, overwrites the old data with the new data, no history
available
Results in
SCD 2, tracks historical data by creating multiple records with a
separated key, unlimited history is possible
14.12.2008
© Raphael Gfeller, [email protected]
11
History \ Slow Changing Dimensions [2/2]
SCD 3, Additional columns in the tables track changes, limited
history is available
SCD 4, creates separate historical tables that stores the historical
data
14.12.2008
© Raphael Gfeller, [email protected]
12
History \ OnLine Transaction Processing system, OLTP
No common patterns available for storing historical data
 Common used rational data models are analyzed
(all based on SCD type 2 (unlimited history is possible))
–
–
–
–
Method “Duplication”
Method “Transaction”
Method “Linked history items”
Method “Bidirectional linked history items”
Based on the following rational data model:
14.12.2008
© Raphael Gfeller, [email protected]
13
History \ OLTP \ Duplication
Method “Duplication”
Focused on
• Fast access of historical data
• Easy implementation
• Data integrity
ID
Name
Salary
ID_Change
ID_Company
ID
DateTime
1
Gfeller Raphael
1000
1
2
1
2008-12-2
1
Gfeller Raphael
2500
2
1
2
2008-12-3
2
Hans Meier
500
1
1
2
Hans Meier
550
2
1
3
Fritz Müller
1750
2
2
14.12.2008
ID
Name
ID_Change
1
UBS
1
2
HSR
1
1
UBS
2
2
HSR
2
© Raphael
Gfeller,
[email protected]
14
History \ OLTP \ Transaction
Method “Transaction”
Focused on
• Less used data storage
• Precious information about
history at every point on
time
ID
Name
Salary
ID_Company
1
Gfeller Raphael
2500
1
2
Hans Meier
500
1
3
Fritz Müller
1750
2
ID
Name
2
HSR
1
ID
DateTime
Entry ID
NewStringValue
Action
2
2008-12-2
1
Gfeller Raphael
Person.SetName
1
2008-12-1
1
Gfeller Raphal, 2500, 1
Person.Create
UBS
14.12.2008
© Raphael Gfeller, [email protected]
15
History \ OLTP \ Linked history items
Method “Linked history items”
Focused on
• Avoid huge changes to the underlying database
• Easy to implement
• Fast insertion of new entries
ID
Name
Salary
ID_Company
FK_Old_ID
1
Gfeller
Raphael
2500
1
5
2
Hans Meier
500
1
NULL
3
Gfeller
Raphael
1500
1
NULL
4
Gfeller
Raphael
1000
2
3
5
Fritz Müller
1750
2
NULL
14.12.2008
© Raphael Gfeller, [email protected]
16
History \ OLTP \ Bidirectional Linked history items
Method “Bidirectional Linked history
items”
Focused on
• Avoid huge changes to the
underlying database
• Extendibility by adding additional
metadata to the separated table
• Fast insertion of new entries
• Providing additional backward and
forward navigation
Old Person
New Person
DateTime
User
3
4
2008-12-3
User 1
4
1
2008-12-4
User 2
14.12.2008
ID
Name
Salary
ID_Company
1
Gfeller
Raphael
2500
1
2
Hans Meier
500
1
3
Gfeller
Raphael
1500
1
4
Gfeller
Raphael
1000
2
5
Fritz Müller
1750
2
© Raphael Gfeller, [email protected]
17
History \ OLTP \ Analyzed criteria’s
Analyzing criteria's
• Insert an entry
• Updating an entry
• Storage cost
• Get an entry at (Time – 1)
• Get en entry at (Time – n)
• Entry at time x
• Get an integrity state over all entries
• Get the next entry by a entry at the past
• Get the previous entry by a entry at the past
• A person by a company at the past
14.12.2008
© Raphael Gfeller, [email protected]
18
History \ OLTP \ Theory
14.12.2008
© Raphael Gfeller, [email protected]
19
History \ OLTP \ Test
Test environment
CPU: Intel Core 2, 2Ghz
Memory: 2 Gb
Operating System: Windows XP, Sp3,
Database: Microsoft SQL Server 2005, Express Edition with SP1,
Benchmark written in C#
Benchmark input
• Count inserted companies
• Count inserted persons
• Count companies to change
• Count persons to change
Benchmark steps
0. Insert companies
1. Insert persons
2. Change companies
3. Change persons
4. Find a person by its parent person
5. Collect all persons and companies that are valid at a specific time.
6. Find a person in the past by a datetime value
7. Find a person by a company by a datetime value.
14.12.2008
© Raphael Gfeller, [email protected]
20
History \ OLTP \ Results
The measurements confirm the theory
Possible optimizations
• Method Change Set based on Duplication
– Only changed entries are duplicated
 acceptable overhead in reading
 fewer data storage used
• Method “Transaction with anchors “
– Using “anchors transaction”, they resave the entire state of the entries
 Fewer network traffic
 Restoring an entry becomes linear
O(maxChangesBetweenTwoAnchors) instead of O(nChanges)
14.12.2008
© Raphael Gfeller, [email protected]
21
History \ OLTP \ Result \ Conclusion
Advices based on these tests
•
If storage is limited use the methods in the following order:
–
–
–
–
–
–
•
If network bandwidth is limited, use the methods in the following order:
–
–
–
–
–
–
•
transaction mechanism
Linked history items
Bidirectional linked history items
Transaction with anchors
Change Set based on Duplication
Duplication
Change Set based on Duplication
Duplication
Linked history items
Bidirectional Linked history items
Transaction with anchors
transaction mechanism
If the knowledge of the developers is low
•
use either method duplication or Linked history items
14.12.2008
© Raphael Gfeller, [email protected]
22
History \ OLTP \ Result \ Conclusion
Advices based on these tests
•
If data volume is high, use the methods in the following order:
–
–
–
–
–
–
•
transaction mechanism
Linked history items
Bidirectional linked history items
Transaction with anchors
Change Set based on Duplication
Duplication
If change frequency of the data is high, use the methods in the following order:
–
–
–
–
–
–
Transaction with anchors
transaction mechanism
Linked history items
Bidirectional linked history items
Change Set based on Duplication
Duplication
14.12.2008
© Raphael Gfeller, [email protected]
23
History \ OLTP \ Result \ Conclusion
Practical example of the usage of the methods
14.12.2008
© Raphael Gfeller, [email protected]
24
Questions?
14.12.2008
© Raphael Gfeller, [email protected]
25
References
• Gfeller Raphael, How to effectively store the history of data in a ration
DBMS, [Online] 2.12.2008. [Citied 2.12.2009]
http://wiki.hsr.ch/Datenbanken/wiki.cgi?HistorieUndZeitInDatenbanken.
14.12.2008
© Raphael Gfeller, [email protected]
26
Herunterladen