Prof_Stefan_Edlich_OSDC_2011

Prof. Dr. Stefan Edlich
NoSQL in the Cloud
nosqlberlin.de
nosqlfrankfurt.de
http://nosql-database.org
NoSQL is specialization!
• Big Data
• Massive Write Performance
• Fast KV Access
• Write Availability
• Flexible Schema (Migration) + Flexible Datatypes
• Easier maintainability, administration and operations
• No single point of failure
• Programmer ease of use
Theory?!
• Map/Reduce: successors!
• ACID / BASE & CAP: P (a partition) as a rule never occurs!
• Consistent Hashing: the basis of scalable K/V stores
• MVCC: non-blocking advantages
• Vector Clocks

Google Protocol Buffers => Apache Avro!
• JSON
• Binary data transfer
• automatic RPC generation
• no code generation
• Client + server exchange the schema when it changes
Definitely worth evaluating!
Data Models
Column Family
DocumentDBs
Key/ValueDBs: Voldemort, Chordless, Scalaris, Dynamo / Dynomite
GraphDBs
others: db4o, Versant, Objectivity, Gemstone, Progress, MarkLogic, EMC Momentum, Tamino, GigaSpaces, Hazelcast, Terracotta, …
Cassandra
+ scaling = new node
+ replication
+ configuration (r, w)
- documentation
- queries
HBase
+ scaling = new node
+ community
+ API
- replication
- setup, tuning, maintenance
SimpleDB
+ stress-free SaaS solution
+ transparent scaling
- UTF-8 String
- data resides at Amazon
+- no tuning / config
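The (r, w) configuration listed above can be illustrated with the usual Dynamo-style quorum rule. This is a minimal sketch, not a vendor API: `n` (number of replicas), `r` (replicas that must answer a read) and `w` (replicas that must acknowledge a write) are assumed names, and the rule shown is the common "read and write sets must overlap" condition.

```javascript
// Assumption: Dynamo-style quorum with n replicas, r read acks, w write acks.
// A read is guaranteed to see the latest acknowledged write when the
// read set and the write set overlap in at least one replica: r + w > n.
function quorumOverlaps(n, r, w) {
  return r + w > n;
}

const n = 3; // hypothetical replication factor
const settings = [
  { r: 1, w: 1 }, // fast, but reads may be stale
  { r: 2, w: 2 }, // balanced, overlapping quorums
  { r: 1, w: 3 }, // cheap reads, expensive writes
];

const report = settings.map((s) => ({ ...s, overlaps: quorumOverlaps(n, s.r, s.w) }));
console.log(report);
```

Lower values of r and w favor latency and availability; overlapping quorums favor consistent reads.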
Document Databases
any JS-Client
no Middleware!
DB+WebServer
+evolving App
- not normalized (duplicates, delete orphans, ...)
- (crash-prone for a configurable period of time) (journaling)
- eventually consistent
- real scaling only via sharding
- (not yet kill -9 safe)
67 GB
Index 
Data 
11 hours + 1 day of
+- not normalized
+ schema agility
+ excellent documentation
+ speed (memory-mapped files)
+ installation + save = 28 sec!
+ arbitrary indexes
+ MapReduce
+ rich query language
+ GridFS (instead of HDFS)
+ simple replication (master-slave, replica sets)
db.system.indexes.find();
db.friends.getIndexes();
db.friends.ensureIndex({friend: 1});
db.friends.ensureIndex({friend: 1, zip: 1}); //compound
db.friends.find({friend: "Mario", zip: "13755"}).explain();
Queries: age: {$gt: 10} food: {$all: ["pizza", "noodles"]}
$gt, $lt, $lte, $ne, $in, $nin, $mod, $all, $size, $exists,
$type, $or, $elem, $elemMatch, regexp, ...
NoSQL Query LockIn?!
Evolving schema
Migration architecture patterns:
A) Blacklist rename
try { ...
} catch (FirstException | SecondException ex) {
    // newName = BlackList.checkName(oldName)
}
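Pattern A can be sketched in a few lines: code that still asks for a retired field name consults a blacklist and retries with the current name. This is only an illustration under assumptions; the field names `surname` and `lastName`, the `blackList` map and the `readField` helper are hypothetical and not taken from the talk.

```javascript
// Hypothetical blacklist: maps retired field names to their current names.
const blackList = new Map([["surname", "lastName"]]);

// Resolve a possibly outdated field name; unknown names pass through unchanged.
function checkName(oldName) {
  return blackList.has(oldName) ? blackList.get(oldName) : oldName;
}

// Read a field from a schemaless document, falling back to the renamed field.
function readField(doc, name) {
  if (name in doc) return doc[name];
  const newName = checkName(name);
  if (newName !== name && newName in doc) return doc[newName];
  throw new Error(`unknown field: ${name}`);
}

const legacyDoc = { surname: "Mario" };    // written before the rename
const migratedDoc = { lastName: "Mario" }; // written after the rename

console.log(readField(legacyDoc, "surname"));   // found directly
console.log(readField(migratedDoc, "surname")); // resolved via the blacklist
```

The documents themselves stay untouched; only the access layer knows about the rename.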

B) "Rails" migration
[Figure: records shown with "old name" and "new name" labels]
(not if replicated too often)
Duplicates = space
Data freshness
"Pre-joined" data!
"Pre-computed"
Growing data: move it out or pre-space it
Into the cloud…
Clients
Config Servers
mongos (router)
Shard A
Shard B
Shard C
RAM+
DISK+
Replica Set
64 bit [extra | double | quadruple] Large
Possible arbiter: micro
Lessons learned…
• RAID configurations (00,01,10,03,05, …)
• Journaling file systems (ext4, xfs, …)
• (Security) ports, file descriptors, snapshots, …
K/V-Stores
+ very fast: > 100,000 / sec
+ configurable disk sync
+ API for custom integration
+ simple replication
+ maps data structures: hash, list, set, sorted set, messages
+ installation: UNIX 38 sec, Windows 18 sec
Sorted Set
• memcached API
• simple dynamic scaling (up & down)
• scales linearly
• bulletproof: proven by Zynga.com
• limited membase protocol
• Membase Tap (protocol interception)
Code-Node:
Membase in the cloud
• Ready-made RightScale & AMI templates
• Open various ports
• DNS entry and no changing IPs
• Specify the master node
• It sets the quota for the nodes that inherit from it
• Backups for EBS
Property Graph
Graph DBs in the cloud
• > N billion nodes? Sharding!
• but usually no "predictable lookup"
• possible only with domain-specific knowledge
• balanced DBs without sweet spots are hardly possible
• Access patterns + heuristics (insert sharding / runtime sharding) => partitioning algorithms
> 220 DBs
quite frustrating consulting…
Data
Transactions
Performance
Queries
Architecture
other Non-Functional Requirements
Analyse your Data
Domain-Data, Log-Data, Event-Data, Message-Data, critical Data, Business-Data,
Meta-Data, temp Data, Session-Data, Geo Data, etc.
Data- / Storage-Model:
relational, column-o, doc-alike, graphs, objects, etc.
What Types / Type-System?
Data Navigation, Data Amount, Data Complexity (Deep XML?)
ACID vs. BASE vs. Mixture?
CAP decisions
Performance Dimension Analysis
Latency, Request behaviour, Throughput
Scale-Up vs Scale-Out
Query Requirements
Typical queries, Tools, Ad-Hoc Queries, SQL / LINQ needed, Map/Reduce? …
Distribution Architecture
local, parallel, distributed / grid, service, cloud, mobile, p2p, …
Data Access Patterns
read / write distribution, random / sequential, Access Design Patterns
Non Functional Requirements:
Replication, Refactoring Frequency, DB-Support, Qualification / simplicity, Company restrictions, DB diversity (allowed?), Security, Safety / Backup & Restore, Crash Resistance, Licence…
NoSQL: conclusion
Be sure to embrace RAM & SSD!
Lots of >1 PT RAM DBs in California!
SAP strategy?
Service, RAM, Cloud, Mobile
The DaaS era
For MongoDB alone, well over 100 "Database-as-a-Service" providers!
Amazon: SimpleDB, Hadoop, etc.
Many clever hybrid solutions!
CouchBase, Hadoop+MySQL
Database-aaS => best Mix!
Non-critical data (view, domain, master, meta, log, …): by Couch, MongoDB, Redis, Membase, …
Critical data (payment data, personal data, …): by classic RDBMS, Vertica, VoltDB, Database.com, GenieDB, …
Management
Hadoop* BI
OLAP BI
Dwight Merriman (10gen)
Analytics
Links
• nosql-database.org
• nosqltapes.com
• mynosql.com
http://edlich.de
pi -> 1015 -> 1000
Stonebraker: "A giant step back! Incompatible, missing features, not new, …"
Strong competition: Stratosphere (TUB), ePic, SwissBox, etc.
Parallelization Contracts
compile, analyze, optimize
on a breathing …
Eventually Consistent
ACID
WATER
• Amazon Dynamo
• MySQL replication
BASE
Consistency Models
© Wilfried Springer NoSQL
CAP Theorem
Pick 2!
Availability: system is always "on"
Clients find replicas
Classics / NoSQL
ACID / Isolation
Consistency: clients see equal data
Partition Tolerance
"Don't throw C away so easily! It's complex."
What you really have is:
1. Application errors
2. Repeatable DBMS errors
3. Unrepeatable DBMS errors
4. Operating System errors
5. Hardware failure in cluster
6. Network partition in local cluster
7. A disaster
8. WAN failure
• 6 = network partition is rare
• 3, 4, 5, 6 mostly affect a single node
• Algorithms can help!
"Give up P rather than sacrificing C. Use VoltDB or NimbusDB."
Consistent Hashing
Ring: M:[0,5)  N:[5,10)  O:[10,15)  P:[15,20)  Q:[20,25)  R:[25,30)

HASH  NODE  REPLICAS
2     M     N, O
8     N     O, P
10    O     P, Q
17    P     Q, R
22    Q     R, M
26    R     M, N
W = 2*W
R = 1*R
• fault-tolerant
• easily extensible
• well distributed
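The ring on this slide (ranges M:[0,5) through R:[25,30), each value replicated to the next two nodes clockwise) can be reproduced with a short lookup. This is a sketch under assumptions: the ring size of 30 is inferred from the ranges, hash values are taken as given rather than computed from keys, and `lookup` is a hypothetical helper name.

```javascript
// Ring from the slide: each node owns the half-open range [start, next start).
const RING_SIZE = 30; // assumption: inferred from the ranges M:[0,5) ... R:[25,30)
const nodes = [
  { name: "M", start: 0 },
  { name: "N", start: 5 },
  { name: "O", start: 10 },
  { name: "P", start: 15 },
  { name: "Q", start: 20 },
  { name: "R", start: 25 },
]; // must stay sorted by start

// Find the owning node for a hash value and the next `replicas` nodes clockwise.
function lookup(hash, replicas = 2) {
  const h = ((hash % RING_SIZE) + RING_SIZE) % RING_SIZE; // wrap onto the ring
  let idx = 0;
  for (let i = 0; i < nodes.length; i++) {
    if (nodes[i].start <= h) idx = i; // last node whose range starts at or before h
  }
  const replicaNodes = [];
  for (let k = 1; k <= replicas; k++) {
    replicaNodes.push(nodes[(idx + k) % nodes.length].name);
  }
  return { node: nodes[idx].name, replicas: replicaNodes };
}

for (const h of [2, 8, 10, 17, 22, 26]) {
  console.log(h, lookup(h));
}
```

Adding a node only splits one existing range, which is why such a ring is easy to extend: only the values in the split range move.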
MVCC
Multi Version Concurrency Control
pessimistic locking
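The contrast with pessimistic locking can be made concrete with a toy multi-version store: writes append new versions instead of overwriting, reads never block, and a write based on a stale revision is rejected rather than made to wait. This is a minimal sketch under assumptions; `MvccStore` and its methods are hypothetical names, not the API of any particular database.

```javascript
// Minimal multi-version store: every write appends a new version.
class MvccStore {
  constructor() {
    this.versions = new Map(); // key -> array of values; revision = index + 1
  }

  // Non-blocking read: latest value plus its revision (0 if the key is absent).
  read(key) {
    const v = this.versions.get(key) || [];
    return { rev: v.length, value: v.length ? v[v.length - 1] : undefined };
  }

  // Older versions stay readable.
  readAt(key, rev) {
    const v = this.versions.get(key) || [];
    return v[rev - 1];
  }

  // The write succeeds only if the writer has seen the latest revision.
  write(key, value, expectedRev) {
    const v = this.versions.get(key) || [];
    if (expectedRev !== v.length) return { ok: false, rev: v.length }; // conflict
    v.push(value);
    this.versions.set(key, v);
    return { ok: true, rev: v.length };
  }
}

const store = new MvccStore();
const seenByA = store.read("plan"); // both writers read revision 0
const seenByB = store.read("plan");
const first = store.write("plan", "laufen", seenByA.rev);    // accepted
const conflict = store.write("plan", "surfen", seenByB.rev); // rejected: stale revision
const retry = store.write("plan", "surfen", store.read("plan").rev); // accepted after re-read
console.log(first, conflict, retry, store.readAt("plan", 1));
```

No writer ever holds a lock; the price is that the losing writer must re-read and retry or merge.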
[Figure: vector clock example with Anna (A), Laura (L) and Paul (P) choosing between "laufen" (running) and "surfen" (surfing); clock entries range from A:0 and P:0 up to the final state surfen => P:2, A:1, L:2]
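Clock entries such as A:1, L:2, P:2 are vector clocks: one counter per participant, incremented on local events and merged on message receipt. The sketch below shows the three basic operations. The order of events in the usage part is an assumption chosen so that it ends in the final clock from the example (P:2, A:1, L:2); the function names are hypothetical.

```javascript
// A vector clock maps participant ids to counters, e.g. { A: 1, L: 2, P: 2 }.

// Record a local event for `id`: returns a copy with that counter incremented.
function tick(clock, id) {
  return { ...clock, [id]: (clock[id] || 0) + 1 };
}

// Merge two clocks on message receipt: element-wise maximum.
function merge(a, b) {
  const out = { ...a };
  for (const [id, n] of Object.entries(b)) out[id] = Math.max(out[id] || 0, n);
  return out;
}

// Compare two clocks: "before", "after", "equal" or "concurrent" (a conflict).
function compare(a, b) {
  let aLess = false;
  let bLess = false;
  for (const id of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[id] || 0;
    const y = b[id] || 0;
    if (x < y) aLess = true;
    if (x > y) bLess = true;
  }
  if (aLess && bLess) return "concurrent";
  if (aLess) return "before";
  if (bLess) return "after";
  return "equal";
}

// Assumed event order, not taken from the figure:
const anna = tick({}, "A");                       // Anna proposes "laufen"
const laura = tick(merge({}, anna), "L");         // Laura answers "surfen"
const paul = tick(merge({}, anna), "P");          // Paul answers independently
const laura2 = tick(merge(laura, paul), "L");     // Laura sees Paul's answer
const paul2 = tick(merge(paul, laura2), "P");     // Paul agrees on "surfen"
console.log(compare(laura, paul), paul2);
```

Two clocks that compare as "concurrent" signal a conflict that the application or a later message has to resolve.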