Hannes Voigt, Michael Rudolf, Marcus Paradies Forschungskolleg – Schema Flexibility and Graph Data Management © Prof. Dr.-Ing. Wolfgang Lehner | > Example: Product Catalogs ? Hannes Voigt | Forschungskolleg | 1 > Example: Social Network Hannes Voigt | Forschungskolleg | 2 > Challenges 1 2 Management of Management of Structured Schema-annotated multi-associative, cross-linked data, if schema is not known in advance data, where association pattern is not known in advance Schema Flexibility Graph Data Management Hannes Voigt | Forschungskolleg | 3 > Schema Flexibility Hannes Voigt [email protected] INF3050 Hannes Voigt | Forschungskolleg | 4 > Schema Flexibility Problem ? Solution A: Alter the Data Hannes Voigt | Solution B: Alter the Data Storage Forschungskolleg | 5 Tenant 1 Tenant 2 PRODUCT PRODUCT Product Name Sensor Product Name Screen 1 Camera 5I 12MP 1 TV 6000 42” 2 … Camera 5II 18MP Extension Table • Copeland, G.P., Khoshafian, S.: A decomposition storage model. In: S.B. Navathe (ed.) SIGMOD’85, Proceedings of the 1985 ACM SIGMOD International Conference on Management of Data, Austin, Texas, May Virtual Schemas 𝜎Tenant=2 𝜎Tenant 28-31, 1985, pp.=1 268–279. ACM Press (1985) System Schema Used for OR-Mapping, too ⋈ • Product,Tenant Cabibbo, L., Carosi, A.: Managing inheritance hierarchies in object/relational mapping tools. In: O. Pastor, J.F. e Cunha (eds.) CAiSE’05, Advanced Information Systems Engineering, 17th International Conference, CAiSE 2005, Porto, Portugal, June 13-17, 2005, Proceedings, Lecture Notes in Computer Science, vol. 3520, pp. ⋈Product,Tenant 135–150. Springer (2005) CAMERAEXTENSION Tenant Product Sensor 1 1 12MP 1 2 18MP PRODUCT TVEXTENSION Tenant Product Name 1 1 Camera 5I 2 1 TV 6000 Tenant Product Screen 1 2 Camera 5II 2 1 3 1 Notebook X7 42” Tenant 1 Tenant 2 PRODUCT PRODUCT Product Name Sensor Product Name Screen 1 Camera 5I 12MP 1 TV 6000 42” 2 … Camera 5II 18MP Universal Table • Maier, D., Ullman, J.D.: Maximal objects and the semantics of universal relation databases. ACM Transactions on Database Systems 8(1), 1–14 (1983) Virtual Schemas 𝜌Col1:Name,Col2:Sensor 𝜌Col1:Name,Col2:Screen Used by Salesforce on force.com System Schema • Weissman, C.D., Bobrowski, S.: The design of the force.com multitenant internet application development platform. In: U. C¸ etintemel, S.B. Zdonik, D. Kossmann, N. Tatbul (eds.) SIGMOD’09, Proceedings of the ACM 𝜎Tenant 𝜎Tenant =1,Table=1 Table =1 Rhode Island, USA, SIGMOD International Conference on Management of Data, SIGMOD=2, 2009, Providence, June 29 - July 2, 2009, pp. 889–896. ACM (2009) UNIVERSALTABLE Tenant Table Col1 Col2 … 1 1 Camera 5I 12MP … 2 1 TV 6000 32” … 1 1 Camera 5II 18MP 3 2 … … Tenant 1 Tenant 2 PRODUCT PRODUCT Product Name Sensor Product Name Screen 1 Camera 5I 12MP 1 TV 6000 42” 2 Camera 5II Pivot Table 18MP … • Agrawal, R., Somani, A., Xu, Y.: Storage and querying of e-commerce data. In: P.M.G. Apers, P. Atzeni, S. Ceri, S. Paraboschi, K. Ramamohanarao, R.T. Snodgrass (eds.) VLDB’01, Proceedings of 27th International Virtual Schemas 𝜎Tenant Conference on Very Large Data Bases, September 11-14, 2001, Roma, Italy, pp. 149–158. Morgan Kaufmann =1,Table=1 (2001) System Schema • Cunningham, C., Graefe, G., Galindo-Legaria, C.A.: Pivot and unpivot: Optimization and execution strategies ⋈Tenant in an rdbms. In:M.A. Nascimento,M.T. O¨ zsu, D. Kossmann, R.J.Miller, J.A. Blakeley, K.B. Schiefer (eds.) ,Table ,Row VLDB’04, Proceedings of 30th International Conference on Very Large Data Bases, August 31 - September 3, 2004, Toronto, Canada, pp. 998–1009 (2004) … 𝜌Value:Product 𝜌Value:Name 𝜌Value:Sensor in clinical systems 𝜎ColUsed 𝜎Col=′Name′ =′Product′ 𝜎Col=′Senosor′ • Friedman, C., Hripcsak, G., Johnson, S.B., Cimino, J.J., Clayton, P.D.: A generalized relational schema for an integrated clinical patient database. In: R.A. Miller (ed.) SCAMC’90, Proceedings of the 14th Annual Symposium on Computer Application inPMedical Care, November 4-7, 1990, Washington DC, pp. 335–339 IVOTTABLE (1990) Tenant Table Row Col Value 1 1 1 Product 1 1 1 1 Name Camera 5I 1 1 1 Sensor 12MP 2 1 2Forschungskolleg… … Tenant 1 Tenant 2 PRODUCT PRODUCT Product Name Sensor Product Name Screen 1 Camera 5I 12MP 1 TV 6000 42” 2 Camera 5II Chunk Table 18MP … • Aulbach, S., Grust, T., Jacobs, D., Kemper, A., Rittinger, J.: Multi-tenant databases for software as a service: schema-mapping techniques. In: J.T.L. Wang (ed.) SIGMOD’08, Proceedings of the 2008 ACM SIGMOD Virtual Schemas 𝜎Tenant 𝜎 International Conference on Management of Data, Vancouver, British Columbia, Canada, June 10-12, 2008., =1,Table=1 Tenant=2,Table=1 pp. 1195–1206. ACM (2008) System Schema … ⋈Tenant,Table,Chunk 𝜌Int:Product,Str:Name ⋈Tenant,Table,Chunk 𝜌Int:Sensor 𝜌Int:Screen 𝜎Chunk=2 𝜎Chunk=1 𝜌Int:Product,Str:Name 𝜎Chunk=2 𝜎Chunk=1 CHUNKTABLE Tenant Table Row Chunk Int Str 1 1 1 1 1 Camera 5I 1 1 1 2 12MP % 2 1 1 1 1 TV 6000 2 1 1 2 42” % Tenant 1 Tenant 2 PRODUCT PRODUCT Product Name Sensor Product Name Screen 1 X-Tablet 5MP 1 X-Tablet 11” 4 TV 6000 42” 2 3 Camera 5I 12MP Polymorphic Table18MP Camera 5II … • Aulbach, S., Grust, T., Jacobs, D., Kemper, A., Rittinger, J.: Multi-tenant databases for software as a service: schema-mapping techniques. In: J.T.L. Wang (ed.) SIGMOD’08, Proceedings of the 2008 ACM SIGMOD Virtual Schemas International Conference on Management of Data, Vancouver, British Columbia, Canada, June 10-12, 2008., pp.⋈ 1195–1206. ACM (2008) ⋈ System Schema Product Product TENANT1 ∪ ∪ ∪ Product Sensor 1 5MP SCREENEXTENSION 2 12MP Product Name Product Screen 3 18MP 1 X-Tablet 1 PRODUCTBASE PRODUCTBASETENANT1 11” TENANT2 Product Name PRODUCTBASETENANT2 Product SCREENEXTENSIONTENANT2 2 Camera 5I Product Name 1 Product Screen 3 Camera 5II 4 TV 6000 4 4 42” > Schema Flexibility Problem ? Solution A: Alter the Data Hannes Voigt | Solution B: Alter the Data Storage Forschungskolleg | 13 > How does the data look like? Abstraction for Structured Data A AT E ET Example Person Firstname Lastname hv Hannes Voigt Hannes Voigt | ET={Person} AT={Firstname, Lastname} E={hv} A={Hannes, Voigt} Forschungskolleg | 14 > How does the data look like? Abstraction for Structured Data TA A SD AT E ST ET TE Example Person Firstname Lastname hv Hannes Voigt Hannes Voigt | ET={Person} AT={Firstname, Lastname} E={hv} A={Hannes, Voigt} ST={(Person,Firstname), (Person,Lastname)} TE={(Person,hv)} SD={(hv,Hannes), (hv,Voigt)} TA={(Firstname,Hannes), (Lastname,Voigt)} Forschungskolleg | 15 > How does the data look like? Incompleteness Entities without an entity type Attribute types without an entity type Entity types without attributes Ambiguity Entities with multiple entity types Attributes with multiple attribute types Attribute types with multiple entity types Sparseness Entities instantiate attribute types that do not belong to their entity type Entities do not instantiate attribute types that belong to their entity type Multivalence Entities instantiate an attribute multiple times Hannes Voigt | Forschungskolleg | 16 > How does the storage look like? Presenting Relation R : A×B Example: CSV R = {(a1,b11), (a1,b12), (a2,b2), (a3,b3), …} Direct: a1b11 a1b12 a2b2 a3b3 … Grouped: a1b11b12 a2b2 a3b3 … Ordered: a1a1a2a3… b11b12b2b3… <A,B> <A<B>> [A][B] Presenting R : A×B and S : B×C [AT] Year,Make,Model,Description,Price <E … > 1997,Ford,E350,"ac, abs, moon",3000.00 [A] 1999,Chevy,"Venture Extended Edition","",4900.00 1999,Chevy,"Venture Extended Edition, Very Large","",5000.00 Existing Storage Designs Hannes Voigt | Forschungskolleg | 17 > How can we change the storage? Example: Interpreted Record <ET <E <AT, A>>> Incompleteness Ambiguity Sparseness Multivalence [Beckmann, J. L.; Halverson, A.; Krishnamurthy, R. & Naughton, J. F., Extending RDBMSs To Support Sparse Datasets Using An Interpreted Attribute Storage Format, ICDE'06, Proceedings of the 22nd International Conference on Data Engineering, 3-8 April 2006, Atlanta, GA, USA, IEEE Computer Society, 2006, 58 ] Hannes Voigt | Forschungskolleg | 18 > Thesis Topics Performance Implication of Storage Design How does the Storage Design affect the performance of a database system? Query Processing with variable Storage Design What are the effects of a configurable storage design on query processing? Adaptive Storage Design Can a database system itself adapt its storage organization automatically? Adaptive Data Rework Can a database system itself adapt the data automatically? SQL and Schema Flexibility How can SQL deal with Incompleteness, Ambiguity, Sparseness, Multivalence? Tagging in Relational DBMS How can tagging (Ambiguity) be integrated in relational database systems? What are the effects on Storage Design, Query Processing, SQL Vectorized Relational DBMS How can vectors (Multivalence) be integrated in relational database systems? What are the effects on Storage Design, Query Processing, SQL Hannes Voigt | Forschungskolleg | 19 > Thesis Topics Data Irregularities Incompleteness Query Language Ambiguity Sparseness Multivalence SQL and Schema Flexibility DBMS Architecture Tagging Data System Adaptive Data Rework Query Processing with variable Storage Design Performance Study Access System Adaptive Storage Design Vectors Storage System Hannes Voigt | Forschungskolleg | 20 > Graph Data Management Michael Rudolf [email protected] Hannes Voigt | Marcus Paradies [email protected] Forschungskolleg | 21 Schema Flexibility and Graph Data Management Hannes Voigt – [email protected] Michael Rudolf – [email protected] Marcus Paradies – [email protected] © Prof. Dr.-Ing. Wolfgang Lehner |