Seminar Datenbanksysteme, Autumn 2016
Kickoff-Meeting, 19.9.2016
Stefan Keller, HSR

Data Stream Management Systems (DSMS), using the example of PipelineDB and others
Organisation: http://wiki.hsr.ch/Datenbanken/wiki.cgi?SeminarDatenbanksystemeHS16
Introduction:
– NoSQL definitions and classifications
– Polyglot persistence
– Topic / motivation of the seminar
– Organisation and information about the seminar

SQL vs. NoSQL vs. NewSQL
SQL
– Commercial example: Oracle | Open-source example: (Oracle) MySQL
NoSQL
– “Mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.”
– “Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable.”
– NoSQL systems are also sometimes called "Not only SQL".
– SQL? ACID? Relations? Distributed?
– Commercial example: DynamoDB | FOSS example: MongoDB
NewSQL
– Modern relational database management systems that seek to provide the same scalable performance as NoSQL systems for online transaction processing (OLTP) read-write workloads while still maintaining the ACID guarantees of a traditional database system.
– FOSS example: VoltDB
(Credits: Javier García Magna)

More Database classifications
On premises vs. Cloud “As a service” (Azure Document…)
Memory / Disk vs. Only in memory (OrigoDB, Redis, SQL S…)
OLTP vs. OLAP
Databases vs. Not a database but a data store (Zookeeper, Kafka…)
CAP classifications
(Credits: Javier García Magna)

Products
Key-value stores (Redis)
Document stores (MongoDB)
Wide column stores (Cassandra)
Graph stores (Neo4j)
Search engines (Elasticsearch)
OODBMS
Streaming Databases / Time Series Stores (InfluxDB, Event Store, PipelineDB)
(Credits: Javier García Magna)

Polyglot persistence
Any decent-sized enterprise will have a variety of different data storage technologies for different kinds of data.
(Credits: Martin Fowler and Javier García Magna)

Linked___ before
(Credits: Martin Fowler and Javier García Magna)

Linked___ after

Some Stats (from db-engines.com, 2015)

Key Takeaways
(Source: “Data Stores: beyond relational databases” by Javier García Magna (@ndsrf), Head of Development, www.sequel.com)
Always think about the schema (even with schemaless DBs)
Best DB? “It depends”
– Prototyping?
– Domain?
– How is the data going to be used?
Most of us don’t work with “big data” but with “small or medium” data

Motivation MSE-Seminar HS16/17…
With the Internet of Things (IoT), data streams are becoming ever more relevant, since large volumes of data are generated.
Massive data streams call for specialised systems and data structures as well as demand-driven processing of finite input sets:
– continuous queries (views, transformations, triggers)
– that run permanently over data streams and whose results are persisted where needed
Data-driven processing continuously delivers results that are based, for example, on a "window" over the elements consumed so far.
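
The idea of a continuous query whose result is based on a "window" over the elements consumed so far can be sketched in a few lines of Python. This is only an illustration of the concept, not the API of PipelineDB or any other DSMS: the class `SlidingWindowAverage`, its `consume` method and the sample readings are hypothetical, and the sketch assumes that elements arrive in timestamp order.

```python
from collections import deque


class SlidingWindowAverage:
    """Hypothetical continuous query: average over the last `window` time units."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.items = deque()  # (timestamp, value) pairs still inside the window
        self.total = 0.0      # running sum of the values in self.items

    def consume(self, timestamp: float, value: float) -> float:
        """Consume one stream element and return the updated result.

        Assumes timestamps are non-decreasing (in-order arrival).
        """
        self.items.append((timestamp, value))
        self.total += value
        # Evict elements that have fallen out of the window.
        while self.items[0][0] <= timestamp - self.window:
            _, old_value = self.items.popleft()
            self.total -= old_value
        return self.total / len(self.items)


# Made-up sensor readings as (timestamp, value) pairs.
readings = [(0, 10.0), (1, 20.0), (2, 30.0), (5, 40.0)]
query = SlidingWindowAverage(window=3)
results = [query.consume(ts, value) for ts, value in readings]
print(results)
```

Each arriving element triggers an updated result (data-driven processing), and only the elements inside the window are kept in memory.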

MSE-Seminar Datenbanksysteme HS16/17
Process streaming data in real time with standard SQL, without having to learn new programming languages or processing frameworks.
These DSMS are to be characterised on the basis of current products and application scenarios.
Scenarios:
– "Realtime Reporting / Dashboards"
– "Realtime Monitoring / Alerting"
Directly adjacent topic: "Immutable Databases", persisting "append-only" data streams.

Observations
(Credits: "PipelineDB The Streaming SQL Database" by Derek Nelson)
Data-processing demands are outpacing hardware innovation (disks)
Storing critical data in main memory is an obvious workaround for the disk bottleneck
If fast query results are required, then the query itself is often already known
If the query is known in advance, we can efficiently compute the result continuously as new data arrives
No need to store granular data after results are incrementally updated (aggregate before writing to disk)

Benefits of continuous SQL
Streaming analytics with pure SQL
No application code
Very low engineering overhead
Add new continuous queries with no downtime
Eliminates the need for ETL

Architectures and Data Source
Continuous Query and Architecture explained...
=> See slides 14 - 61 of "PipelineDB The Streaming SQL Database" by Derek Nelson.
Architecture and Data Source of Seminar (by Samuel)
=> See Wiki…

HAPPY CODING AND WRITING!
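
As a small illustration of the observations on known queries and incremental updates, the following Python sketch keeps only per-key aggregates and discards each raw event once it has been folded in ("aggregate before writing to disk"). It is a conceptual sketch under assumptions, not how PipelineDB is implemented: the class `ContinuousView`, its methods and the sample events are hypothetical names chosen for this example.

```python
from collections import defaultdict


class ContinuousView:
    """Hypothetical continuously maintained aggregate: count and average per key."""

    def __init__(self) -> None:
        self.count = defaultdict(int)    # number of events seen per key
        self.total = defaultdict(float)  # sum of values per key

    def insert(self, key: str, value: float) -> None:
        """Fold one event into the aggregates; the raw event is not stored."""
        self.count[key] += 1
        self.total[key] += value

    def rows(self) -> dict:
        """Return the current result as {key: (count, average)}."""
        return {k: (self.count[k], self.total[k] / self.count[k]) for k in self.count}


# Made-up event stream of (key, value) pairs.
events = [("sensor_a", 2.0), ("sensor_b", 10.0), ("sensor_a", 4.0)]
view = ContinuousView()
for key, value in events:
    view.insert(key, value)
print(view.rows())
```

The state grows with the number of distinct keys rather than with the number of events, which is what makes it unnecessary to retain the granular data for a query that is known in advance.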