A×B - Database Technology Group

Werbung
Hannes Voigt, Michael Rudolf, Marcus Paradies
Forschungskolleg –
Schema Flexibility and Graph Data
Management
© Prof. Dr.-Ing. Wolfgang Lehner |
> Example: Product Catalogs
?
Hannes Voigt |
Forschungskolleg
|
1
> Example: Social Network
Hannes Voigt |
Forschungskolleg
|
2
> Challenges
1
2
Management of
Management of
 Structured
 Schema-annotated
 multi-associative,
 cross-linked
data, if schema is not known in
advance
data, where association pattern is
not known in advance
Schema Flexibility
Graph Data Management
Hannes Voigt |
Forschungskolleg
|
3
>
Schema Flexibility
Hannes Voigt
[email protected]
INF3050
Hannes Voigt |
Forschungskolleg
|
4
> Schema Flexibility
Problem
?
Solution A: Alter the Data
Hannes Voigt |
Solution B: Alter the Data Storage
Forschungskolleg
|
5
Tenant 1
Tenant 2
PRODUCT
PRODUCT
Product
Name
Sensor
Product
Name
Screen
1
Camera 5I
12MP
1
TV 6000
42”
2
…
Camera 5II 18MP
Extension Table
• Copeland, G.P., Khoshafian, S.: A decomposition storage model. In: S.B. Navathe (ed.) SIGMOD’85,
Proceedings of the 1985 ACM SIGMOD International Conference on Management of Data, Austin,
Texas, May
Virtual
Schemas
𝜎Tenant=2
𝜎Tenant
28-31,
1985, pp.=1
268–279. ACM Press (1985)
System Schema
Used for OR-Mapping, too
⋈
• Product,Tenant
Cabibbo, L., Carosi, A.: Managing inheritance hierarchies in object/relational mapping tools. In: O. Pastor, J.F.
e Cunha (eds.) CAiSE’05, Advanced Information Systems Engineering, 17th International Conference, CAiSE
2005, Porto, Portugal, June 13-17, 2005, Proceedings, Lecture
Notes in Computer Science, vol. 3520, pp.
⋈Product,Tenant
135–150. Springer (2005)
CAMERAEXTENSION
Tenant
Product Sensor
1
1
12MP
1
2
18MP
PRODUCT
TVEXTENSION
Tenant
Product Name
1
1
Camera 5I
2
1
TV 6000
Tenant
Product Screen
1
2
Camera 5II
2
1
3
1
Notebook X7
42”
Tenant 1
Tenant 2
PRODUCT
PRODUCT
Product
Name
Sensor
Product
Name
Screen
1
Camera 5I
12MP
1
TV 6000
42”
2
…
Camera 5II 18MP
Universal Table
• Maier, D., Ullman, J.D.: Maximal objects and the semantics of universal relation databases. ACM Transactions
on Database Systems 8(1), 1–14 (1983)
Virtual Schemas
𝜌Col1:Name,Col2:Sensor
𝜌Col1:Name,Col2:Screen
Used by Salesforce on force.com
System Schema
• Weissman, C.D., Bobrowski, S.: The design of the force.com multitenant internet application development
platform. In: U. C¸ etintemel, S.B. Zdonik, D. Kossmann, N. Tatbul (eds.) SIGMOD’09, Proceedings of the ACM
𝜎Tenant
𝜎Tenant
=1,Table=1
Table
=1 Rhode Island, USA,
SIGMOD International
Conference on Management of Data,
SIGMOD=2,
2009,
Providence,
June 29 - July 2, 2009, pp. 889–896. ACM (2009)
UNIVERSALTABLE
Tenant
Table
Col1
Col2
…
1
1
Camera 5I
12MP
…
2
1
TV 6000
32”
…
1
1
Camera 5II 18MP
3
2
…
…
Tenant 1
Tenant 2
PRODUCT
PRODUCT
Product
Name
Sensor
Product
Name
Screen
1
Camera 5I
12MP
1
TV 6000
42”
2
Camera 5II
Pivot Table
18MP
…
• Agrawal, R., Somani, A., Xu, Y.: Storage and querying of e-commerce data. In: P.M.G. Apers, P. Atzeni, S. Ceri,
S. Paraboschi, K. Ramamohanarao, R.T. Snodgrass (eds.) VLDB’01, Proceedings of 27th International
Virtual Schemas
𝜎Tenant
Conference
on
Very
Large
Data
Bases,
September
11-14,
2001,
Roma,
Italy,
pp.
149–158.
Morgan
Kaufmann
=1,Table=1
(2001)
System Schema
• Cunningham, C., Graefe, G., Galindo-Legaria, C.A.: Pivot and unpivot: Optimization and execution strategies
⋈Tenant
in an rdbms. In:M.A.
Nascimento,M.T.
O¨ zsu,
D. Kossmann, R.J.Miller, J.A. Blakeley, K.B. Schiefer (eds.)
,Table
,Row
VLDB’04, Proceedings of 30th International Conference on Very Large Data Bases, August 31 - September 3,
2004, Toronto, Canada, pp. 998–1009 (2004)
…
𝜌Value:Product
𝜌Value:Name
𝜌Value:Sensor
in clinical systems
𝜎ColUsed
𝜎Col=′Name′
=′Product′
𝜎Col=′Senosor′
• Friedman, C., Hripcsak, G., Johnson, S.B., Cimino, J.J., Clayton, P.D.: A generalized relational schema for an
integrated clinical patient database. In: R.A. Miller (ed.) SCAMC’90, Proceedings of the 14th Annual
Symposium on Computer Application inPMedical
Care, November 4-7, 1990, Washington DC, pp. 335–339
IVOTTABLE
(1990)
Tenant
Table
Row
Col
Value
1
1
1
Product
1
1
1
1
Name
Camera 5I
1
1
1
Sensor
12MP
2
1
2Forschungskolleg…
…
Tenant 1
Tenant 2
PRODUCT
PRODUCT
Product
Name
Sensor
Product
Name
Screen
1
Camera 5I
12MP
1
TV 6000
42”
2
Camera 5II
Chunk Table
18MP
…
• Aulbach, S., Grust, T., Jacobs, D., Kemper, A., Rittinger, J.: Multi-tenant databases for software as a service:
schema-mapping techniques. In: J.T.L. Wang (ed.) SIGMOD’08, Proceedings of the 2008 ACM SIGMOD
Virtual Schemas
𝜎Tenant
𝜎
International
Conference
on
Management
of
Data,
Vancouver,
British
Columbia,
Canada,
June
10-12,
2008.,
=1,Table=1
Tenant=2,Table=1
pp. 1195–1206. ACM (2008)
System Schema
…
⋈Tenant,Table,Chunk
𝜌Int:Product,Str:Name
⋈Tenant,Table,Chunk
𝜌Int:Sensor 𝜌Int:Screen
𝜎Chunk=2
𝜎Chunk=1
𝜌Int:Product,Str:Name
𝜎Chunk=2
𝜎Chunk=1
CHUNKTABLE
Tenant
Table
Row
Chunk
Int
Str
1
1
1
1
1
Camera 5I
1
1
1
2
12MP
%
2
1
1
1
1
TV 6000
2
1
1
2
42”
%
Tenant 1
Tenant 2
PRODUCT
PRODUCT
Product
Name
Sensor
Product
Name
Screen
1
X-Tablet
5MP
1
X-Tablet
11”
4
TV 6000
42”
2
3
Camera 5I 12MP
Polymorphic
Table18MP
Camera 5II
…
• Aulbach, S., Grust, T., Jacobs, D., Kemper, A., Rittinger, J.: Multi-tenant databases for software as a service:
schema-mapping techniques. In: J.T.L. Wang (ed.) SIGMOD’08, Proceedings of the 2008 ACM SIGMOD
Virtual Schemas
International Conference on Management of Data, Vancouver, British Columbia, Canada, June 10-12, 2008.,
pp.⋈
1195–1206. ACM (2008)
⋈
System Schema
Product
Product
TENANT1
∪
∪
∪
Product Sensor
1
5MP
SCREENEXTENSION
2
12MP
Product
Name
Product Screen
3
18MP
1
X-Tablet
1
PRODUCTBASE
PRODUCTBASETENANT1
11”
TENANT2
Product
Name
PRODUCTBASETENANT2
Product
SCREENEXTENSIONTENANT2
2
Camera 5I
Product
Name
1
Product
Screen
3
Camera 5II
4
TV 6000
4
4
42”
> Schema Flexibility
Problem
?
Solution A: Alter the Data
Hannes Voigt |
Solution B: Alter the Data Storage
Forschungskolleg
|
13
> How does the data look like?
Abstraction for Structured Data
A
AT
E
ET
Example
Person
Firstname Lastname
hv Hannes
Voigt
Hannes Voigt |
ET={Person}
AT={Firstname, Lastname}
E={hv}
A={Hannes, Voigt}
Forschungskolleg
|
14
> How does the data look like?
Abstraction for Structured Data
TA A
SD
AT
E
ST ET TE
Example
Person
Firstname Lastname
hv Hannes
Voigt
Hannes Voigt |
ET={Person}
AT={Firstname, Lastname}
E={hv}
A={Hannes, Voigt}
ST={(Person,Firstname), (Person,Lastname)}
TE={(Person,hv)}
SD={(hv,Hannes), (hv,Voigt)}
TA={(Firstname,Hannes), (Lastname,Voigt)}
Forschungskolleg
|
15
> How does the data look like?
Incompleteness
 Entities without an entity type
 Attribute types without an entity type
 Entity types without attributes
Ambiguity
 Entities with multiple entity types
 Attributes with multiple attribute types
 Attribute types with multiple entity types
Sparseness
 Entities instantiate attribute types that do not belong to their entity type
 Entities do not instantiate attribute types that belong to their entity type
Multivalence
 Entities instantiate an attribute multiple times
Hannes Voigt |
Forschungskolleg
|
16
> How does the storage look like?
Presenting Relation R : A×B




Example: CSV
R = {(a1,b11), (a1,b12), (a2,b2), (a3,b3), …}
Direct:
a1b11 a1b12 a2b2 a3b3 …
Grouped:
a1b11b12 a2b2 a3b3 …
Ordered:
a1a1a2a3… b11b12b2b3…
<A,B>
<A<B>>
[A][B]
Presenting R : A×B
and S : B×C
[AT]
Year,Make,Model,Description,Price
<E … > 1997,Ford,E350,"ac, abs, moon",3000.00
[A] 1999,Chevy,"Venture Extended Edition","",4900.00
1999,Chevy,"Venture Extended Edition, Very Large","",5000.00
Existing Storage Designs
Hannes Voigt |
Forschungskolleg
|
17
> How can we change the storage?
Example: Interpreted Record
 <ET <E <AT, A>>>
 Incompleteness 
 Ambiguity 
 Sparseness 
 Multivalence 
[Beckmann, J. L.; Halverson, A.; Krishnamurthy, R. & Naughton, J. F., Extending RDBMSs To Support Sparse Datasets Using An Interpreted Attribute Storage
Format, ICDE'06, Proceedings of the 22nd International Conference on Data Engineering, 3-8 April 2006, Atlanta, GA, USA, IEEE Computer Society, 2006, 58 ]
Hannes Voigt |
Forschungskolleg
|
18
> Thesis Topics
Performance Implication of Storage Design
 How does the Storage Design affect the performance of a database system?
Query Processing with variable Storage Design
 What are the effects of a configurable storage design on query processing?
Adaptive Storage Design
 Can a database system itself adapt its storage organization automatically?
Adaptive Data Rework
 Can a database system itself adapt the data automatically?
SQL and Schema Flexibility
 How can SQL deal with Incompleteness, Ambiguity, Sparseness, Multivalence?
Tagging in Relational DBMS
 How can tagging (Ambiguity) be integrated in relational database systems?
 What are the effects on Storage Design, Query Processing, SQL
Vectorized Relational DBMS
 How can vectors (Multivalence) be integrated in relational database systems?
 What are the effects on Storage Design, Query Processing, SQL
Hannes Voigt |
Forschungskolleg
|
19
> Thesis Topics
Data Irregularities
Incompleteness
Query Language
Ambiguity
Sparseness
Multivalence
SQL and Schema Flexibility
DBMS Architecture
Tagging
Data System
Adaptive Data Rework
Query Processing with
variable Storage Design
Performance Study
Access System
Adaptive Storage Design
Vectors
Storage System
Hannes Voigt |
Forschungskolleg
|
20
>
Graph Data Management
Michael Rudolf
[email protected]
Hannes Voigt |
Marcus Paradies
[email protected]
Forschungskolleg
|
21
Schema Flexibility and Graph Data Management
Hannes Voigt – [email protected]
Michael Rudolf – [email protected]
Marcus Paradies – [email protected]
© Prof. Dr.-Ing. Wolfgang Lehner |
Herunterladen