Data Management MSE - HSR-Wiki

Stefan Keller
Seminar Datenbanksysteme HSR
Seminar
Datenbanksysteme
Autumn 2016
Kickoff meeting, 19.9.2016
Stefan Keller, HSR
Data Stream Management Systems (DSMS),
using the example of PipelineDB and others
• Organisation:
http://wiki.hsr.ch/Datenbanken/wiki.cgi?SeminarDatenbanksystemeHS16
• Introduction:
– NoSQL definitions and classifications
– Polyglot persistence
– Topic / motivation of the seminar
– Organisation of and information about the seminar
SQL vs. NoSQL vs. NewSQL

SQL
– Commercial example: Oracle | FOSS example: MySQL (Oracle)

NoSQL
– “Mechanism for storage and retrieval of data that is modeled in
means other than the tabular relations used in relational databases.”
– “Next Generation Databases mostly addressing some of the points:
being non-relational, distributed, open-source and horizontally
scalable.”
– NoSQL systems are also sometimes called "Not only SQL".
– SQL? ACID? Relations? Distributed?
– Commercial example: DynamoDB | FOSS example: MongoDB

NewSQL
– Modern relational database management systems that seek to provide
the same scalable performance as NoSQL systems for online transaction
processing (OLTP) read-write workloads while still maintaining the
ACID guarantees of a traditional database system.
– FOSS example: VoltDB
(Credits: Javier García Magna)
More database classifications
– On premises vs. cloud “as a service” (Azure Document…)
– Memory / disk vs. only in memory (OrigoDB, Redis, SQL S…)
– OLTP vs. OLAP
– Databases vs. not a database but a data store (Zookeeper, Kafka…)
– CAP classifications
(Credits: Javier García Magna)
Products
– Key-value stores (Redis)
– Document stores (MongoDB)
– Wide column stores (Cassandra)
– Graph stores (Neo4j)
– Search engines (Elasticsearch)
– OODBMS
– Streaming databases / time series stores (InfluxDB, Event Store,
PipelineDB)
(Credits: Javier García Magna)
Polyglot persistence
Any decent sized enterprise will have a
variety of different data storage technologies
for different kinds of data
(Credits: Martin Fowler and Javier García Magna)
Linked___ before
(Credits: Martin Fowler and Javier García Magna)
Linked___ after
Some Stats (from db-engines.com, 2015)
Key Takeaways
(Source: “Data Stores: beyond relational databases”
by Javier García Magna (@ndsrf), Head of
Development, www.sequel.com)
• Always think about the schema
(even with schemaless DBs)
• Best DB? “It depends”:
– Prototyping?
– Domain?
– How is the data going to be used?
• Most of us don’t work with “big data” but with
“small” or “medium” data
Motivation MSE-Seminar HS16/17…
• With the Internet of Things (IoT), data streams are becoming
increasingly relevant, since they produce large volumes
of data
• Massive data streams call for specialized
systems and data structures, as well as for
on-demand processing of finite
input sets:
– continuous queries (views, transformations,
triggers),
– which run permanently over data streams (or …)
and are persisted where needed
• Through this data-driven processing, results are
provided continuously, based, for example, on a "window"
over the elements consumed so far.
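The idea of a continuous query whose result is based on a "window" of recently consumed elements can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions (a count-based window over numeric readings, a hypothetical function name `sliding_window_avg`), not how PipelineDB or any other DSMS actually implements windows; real systems more often use time-based windows.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_window_avg(stream: Iterable[float], size: int) -> Iterator[Tuple[int, float]]:
    """Continuous query over a count-based sliding window (hypothetical example).

    After every arriving element, emit (position, average of the last `size`
    elements). Only the window is kept in memory, never the whole stream.
    """
    if size <= 0:
        raise ValueError("window size must be positive")
    window: Deque[float] = deque()
    total = 0.0
    for position, value in enumerate(stream, start=1):
        window.append(value)
        total += value
        if len(window) > size:
            # Evict the oldest element so the window never exceeds `size`.
            total -= window.popleft()
        yield position, total / len(window)


if __name__ == "__main__":
    readings = [20.0, 22.0, 24.0, 30.0]  # hypothetical sensor values
    for position, avg in sliding_window_avg(readings, size=3):
        print(position, round(avg, 2))
```

The result is produced data-driven: a new value is emitted as soon as an element arrives, instead of when someone issues a query. Keeping a running sum makes each update O(1), at the cost of slight floating-point drift on very long streams.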
MSE-Seminar Datenbanksysteme HS16/17
• Process streaming data in real time with standard SQL, without having to
learn new programming languages or processing frameworks
• DSMS are to be characterized on the basis of current
products and application
scenarios.
• Scenarios:
– "Realtime Reporting / Dashboards"
– "Realtime Monitoring / Alerting"
• Closely related topic: "Immutable
Databases", i.e. persisting "append-only" data
streams.
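The related topic of persisting "append-only" data streams can be illustrated with a tiny in-memory log in which events are only ever appended and the "current" state is derived by replaying them. This is a minimal sketch; `Event`, `AppendOnlyLog` and `latest_values` are hypothetical names, and a real immutable database or log store (e.g. Kafka) adds durability, partitioning and compaction that are omitted here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """An immutable fact: once appended it is never updated or deleted."""
    sensor: str
    value: float


class AppendOnlyLog:
    """Hypothetical in-memory stand-in for a persisted, append-only stream."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        """Add an event at the end of the log and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int) -> List[Event]:
        """Return a copy of all events from `offset` on, e.g. for a consumer resuming later."""
        return self._events[offset:]


def latest_values(log: AppendOnlyLog) -> Dict[str, float]:
    """Derive the current state by replaying the whole log from offset 0.

    Nothing in the log is overwritten; the "current value" is a view over history.
    """
    state: Dict[str, float] = {}
    for event in log.read_from(0):
        state[event.sensor] = event.value
    return state


if __name__ == "__main__":
    log = AppendOnlyLog()
    log.append(Event("s1", 20.5))
    log.append(Event("s2", 18.0))
    log.append(Event("s1", 21.0))  # a newer reading; the old one stays in the log
    print(latest_values(log))
```

Because history is never overwritten, any other view (averages, windows, audits) can later be rebuilt from the same log.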
Observations
• Data-processing demands are outpacing
hardware innovation (disks)
• Storing critical data in main memory is an
obvious workaround for the disk bottleneck
• If fast query results are required, the query
itself is often already known
• If the query is known in advance, its result can be
computed efficiently and continuously as new data arrives
• There is no need to store granular data once results have been
incrementally updated (aggregate before writing to disk)
(Credits: "PipelineDB The Streaming SQL
Database" by Derek Nelson)
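The last two observations (a known query computed continuously, with aggregation happening before anything is written) can be sketched as an incrementally maintained GROUP BY. This is an illustrative Python sketch under our own assumptions; `RunningAgg` and `continuous_group_by` are hypothetical names and do not reflect PipelineDB's internals. Each event updates a small, constant-size state per group and is then discarded, so only the aggregates would need to reach disk.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, Iterable, Tuple


@dataclass
class RunningAgg:
    """Constant-size state per group; the raw values themselves are not kept."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def update(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def avg(self) -> float:
        # Derived at read time from count and total.
        return self.total / self.count


def continuous_group_by(events: Iterable[Tuple[str, float]]) -> Dict[str, RunningAgg]:
    """Incrementally maintain the equivalent of
    SELECT key, count(*), avg(v), min(v), max(v) FROM stream GROUP BY key.
    Each event updates its group's state and is then discarded."""
    view: DefaultDict[str, RunningAgg] = defaultdict(RunningAgg)
    for key, value in events:
        view[key].update(value)
    return dict(view)


if __name__ == "__main__":
    # Hypothetical page-load times in milliseconds, keyed by URL path.
    view = continuous_group_by([("/home", 120.0), ("/cart", 80.0), ("/home", 100.0)])
    for key, agg in view.items():
        print(key, agg.count, agg.avg, agg.minimum, agg.maximum)
```

Count, sum, min and max combine cheaply this way; aggregates such as an exact median or count(DISTINCT) do not fit in constant space per group, which is why streaming systems typically fall back to probabilistic summaries such as HyperLogLog for them.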
Benefits of continuous SQL
• Streaming analytics with pure SQL
• No application code
• Very low engineering overhead
• Add new continuous queries with no downtime
• Eliminates the need for ETL
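The point about adding new continuous queries without downtime can be illustrated with a toy router that feeds every inserted event to all currently registered queries. This is a hypothetical Python sketch of the general mechanism (`StreamRouter` and `CountQuery` are made-up names), not PipelineDB's API; in this sketch a newly registered query sees only events inserted after its registration, since nothing is backfilled.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]


class CountQuery:
    """A trivial continuous query: count the events matching a predicate."""

    def __init__(self, predicate: Callable[[Event], bool] = lambda e: True) -> None:
        self.predicate = predicate
        self.count = 0

    def on_event(self, event: Event) -> None:
        if self.predicate(event):
            self.count += 1


class StreamRouter:
    """Hypothetical router that feeds each inserted event to all registered queries.

    Queries can be registered while ingestion continues; no restart is needed.
    """

    def __init__(self) -> None:
        self._queries: Dict[str, CountQuery] = {}

    def register(self, name: str, query: CountQuery) -> None:
        self._queries[name] = query

    def insert(self, event: Event) -> None:
        # Snapshot the registered queries so registration during dispatch is safe.
        queries: List[CountQuery] = list(self._queries.values())
        for query in queries:
            query.on_event(event)


if __name__ == "__main__":
    router = StreamRouter()
    all_events = CountQuery()
    router.register("all_events", all_events)
    router.insert({"level": "info"})
    errors = CountQuery(lambda e: e["level"] == "error")
    router.register("errors", errors)  # added without stopping ingestion
    router.insert({"level": "error"})
    print(all_events.count, errors.count)
```

Ingestion never pauses while `errors` is added; the trade-off is that the new query starts from an empty state unless historical data is replayed into it.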
Architectures and Data Source
• Continuous queries and architecture
explained: see slides 14-61 of
"PipelineDB The Streaming SQL
Database" by Derek Nelson.
• Architecture and data source of the seminar
(by Samuel): see the wiki…
HAPPY CODING AND WRITING!