Prof. Dr. Stefan Edlich
NoSQL in the Cloud
nosqlberlin.de, nosqlfrankfurt.de, http://nosql-database.org

NoSQL is specialization!
• Big Data
• Massive Write Performance
• Fast KV Access
• Write Availability
• Flexible Schema (Migration) + Flexible Datatypes
• Easier maintainability, administration and operations
• No single point of failure
• Programmer ease of use

Theory?!
• Map/Reduce: successors!
• ACID / BASE & CAP: P as a rule never occurs!
• Consistent Hashing: basis of scalable K/V stores
• MVCC: non-blocking advantages
• Vector Clocks: [122:1]

Google Protocol Buffers => Apache Avro!
• JSON
• Binary data transfer
• automatic RPC generation
• no code generation
• Client + server exchange the schema on changes
Definitely evaluate!

Data Models
• Column Family
• Document DBs
• Key/Value DBs: Voldemort, Chordless, Scalaris, Dynamo / Dynomite
• Graph DBs
• Others: db4o, Versant, Objectivity, Gemstone, Progress, Mark Logic, EMC Momentum, Tamino, GigaSpaces, Hazelcast, Terracotta, …

Cassandra / HBase / SimpleDB

HBase
+ Scaling = new node
+ Community
+ API
- Replication
- Setup, tuning, maintenance

SimpleDB
+ Stress-free SaaS solution
+ Transparent scaling
- UTF-8 strings
- Data resides at Amazon
+- No tuning / config

Cassandra
+ Scaling = new node
+ Replication
+ Configuration (r, w)
- Documentation
- Queries

Document Databases
• any JS client
• no middleware! DB + web server
+ evolving app

- not normalized (duplicates, delete orphans, ...)
- (crash-prone for a configurable period) (journaling)
- eventually consistent
- real scaling only via sharding
- (not yet kill -9 safe)

67 GB Index Data, 11 hours + 1 day of

+/- not normalized
+ schema agility
+ excellent documentation
+ speed (memory-mapped files)
+ installation + save = 28 sec!
+ arbitrary indexes
+ MapReduce
+ rich query language
+ GridFS (instead of HDFS)
+ simple replication (master-slave, replica sets)

db.system.indexes.find();                            // list all indexes in the database
db.friends.getIndexes();                             // list the indexes of one collection
db.friends.ensureIndex({friend: 1});                 // single-field index
db.friends.ensureIndex({friend: 1, zip: 1});         // compound index
db.friends.find({friend: "Mario", zip: "13755"}).explain();  // show the query plan

Queries:
age: {$gt: 10}
food: {$all: ["pizza", "noodles"]}
Operators: $gt, $lt, $lte, $ne, $in, $nin, $mod, $all, $size, $exists, $type, $or, $elem, $elemMatch, regexp, ...

NoSQL Query Lock-In?!

Changing Schema
Migration architecture patterns:
A) Blacklist rename
try { ... } catch (FirstException | SecondException ex) {
    // newName = BlackList.checkName(OldName)
}
B) "Rails" migration
[Diagram: entries carrying "old name" are migrated to "new name"]

• "Pre-joined" data! (not if replicated too often)
• Duplicates = space
• Data freshness
• "Pre-computed"
• Growing data: move out or "pre-spaced"

Into the Cloud…
[Architecture diagram: Clients, Config Servers, mongos router, Shard A, Shard B, Shard C, Replica Set]
• Shards: RAM+, DISK+
• Instances: 64 bit [extra | double | quadruple] Large
• Possible arbiter: micro

Experiences…
• RAID configurations (00, 01, 10, 03, 05, …)
• Journaling file systems (ext4, xfs, …)
• (Security) ports, file descriptors, snapshots, …

K/V-Stores
+ very fast: > 100,000 /sec
+ configurable disk sync
+ API for custom bindings
+ simple replication
+ maps data structures: hash, list, set, sorted set, messages
+ installation: UNIX 38 sec, Windows 18 sec

Sorted Set

• memcached API
• simple dynamic scaling (up & down)
• scales linearly
• bullet-proofed by Zynga.com
• limited membase protocol
• Membase Tap (protocol interception)
Code-Node:

Membase in the Cloud
• Ready-made RightScale & AMI templates
• Open various ports
• DNS entry and no changing IPs
• Specify the master node
• It sets the quota for the nodes that inherit from it
• Backups for EBS

Property Graph

Graph DBs in the Cloud
• > N billion nodes? Sharding!
• but mostly no "predictable lookup"
• possible only with domain-specific knowledge
• balanced DBs without "sweet spots" are hardly possible
• access patterns + heuristics (insert sharding / runtime sharding) => partitioning algorithms

> 220 DBs
Consulting can be quite frustrating…

Data, Transactions, Performance, Queries, Architecture, other Non-Functional Requirements

Analyse your Data
• Domain data, log data, event data, message data, critical data, business data, meta data, temp data, session data, geo data, etc.
• Data / storage model: relational, column-oriented, document-like, graphs, objects, etc.
• What types / type system?
• Data navigation, data amount, data complexity (deep XML?)

ACID vs. BASE vs. Mixture?
CAP decisions

Performance Dimension Analysis
• Latency, request behaviour, throughput
• Scale-up vs scale-out

Query Requirements
• Typical queries, tools, ad-hoc queries, SQL / LINQ needed, Map/Reduce? …

Distribution Architecture
• local, parallel, distributed / grid, service, cloud, mobile, p2p, …

Data Access Patterns
• read / write distribution, random / sequential, access design patterns

Non-Functional Requirements:
Replication, refactoring frequency, DB support, qualification / simplicity, company restrictions, DB diversity (allowed?), security, safety / backup & restore, crash resistance, licence…

NoSQL Conclusion
• Definitely assume RAM & SSD!
• Lots of >1 PT RAM DBs in California!
• SAP strategy?
• Service, RAM, Cloud, Mobile

The DaaS Age
• For MongoDB alone, well over 100 "Database-as-a-Service" providers!
• Amazon: SimpleDB, Hadoop, etc.
• Many clever hybrid solutions! CouchBase, Hadoop+MySQL
• Database-aaS => best mix!
• Non-critical data (view, domain, master, meta, log, …): by Couch, MongoDB, Redis, Membase, …
• Critical data (management, payment data, personal data, …): by classic RDBMS, Vertica, VoltDB, Database.com, GenieDB, …
• Analytics: Hadoop*, BI, OLAP
Dwight Merriman (10gen)

Links
• nosql-database.org
• nosqltapes.com
• mynosql.com
http://edlich.de

pi -> 1015 -> 1000

Stonebraker: "A giant step back! Incompatible, missing features, not new, …"
Strong competition: Stratosphere (TUB), ePic, SwissBox, etc.
Parallelization Contracts: compile, analyze, optimize on a "breathing" …

Consistency Models
• ACID
• WATER
• BASE (eventually consistent): Amazon Dynamo, MySQL replication
© Wilfried Springer

CAP Theorem: Pick 2!
• Availability: system is always "on"
• Consistency: clients see equal data
• Partition Tolerance
Further diagram labels: clients find replicas; classics; NoSQL; ACID / Isolation

"Don't throw C away so easy! It's complex."
What you really have is:
1. Application errors
2. Repeatable DBMS errors
3. Unrepeatable DBMS errors
4. Operating system errors
5. Hardware failure in cluster
6. Network partition in local cluster
7. A disaster
8. WAN failure
• 6 = network partition is rare
• 3, 4, 5, 6 is mostly a single node
• Algorithms can help!
"give up P rather than sacrificing C. Use VoltDB or NimbusDB"

Consistent Hashing
Ring ranges: M:[0,5)  N:[5,10)  O:[10,15)  P:[15,20)  Q:[20,25)  R:[25,30)

HASH  NODE  REPLICAS
2     M     N, O
8     N     O, P
10    O     P, Q
17    P     Q, R
22    Q     R, M
26    R     M, N

W = 2*W   R = 1*R
• fail-safe
• easily extensible
• well distributed

MVCC
Multi Version Concurrency Control
pessimistic locking
[Diagram: Anna, Paul and Laura exchange the values "running" and "surfing" with per-person version counters (A, L, P); final state: "surfing" => P:2 A:1 L:2]
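The Consistent Hashing slide above can be turned into a small sketch. It assumes what the slide shows: six nodes M to R, each owning a half-open range of width 5 on a ring of size 30, with the two replicas placed on the next two nodes clockwise. The names lookup and addNode and the extra node "S" are hypothetical and only serve the illustration; the input is taken to be an already-hashed ring position, as in the slide's HASH column.

```javascript
// Ring from the slide: each node owns [start, nextStart) on a ring of size 30.
const RING_SIZE = 30;
const RING = [
  { node: "M", start: 0 },
  { node: "N", start: 5 },
  { node: "O", start: 10 },
  { node: "P", start: 15 },
  { node: "Q", start: 20 },
  { node: "R", start: 25 },
]; // must stay sorted by start

// Find the owner of a ring position and the replicas that follow it clockwise.
// (Hypothetical helper; replicaCount = 2 matches the slide's REPLICAS column.)
function lookup(ring, hash, replicaCount = 2) {
  const h = ((hash % RING_SIZE) + RING_SIZE) % RING_SIZE; // wrap onto the ring
  let owner = 0;
  for (let i = 0; i < ring.length; i++) {
    if (ring[i].start <= h) owner = i; // last node whose range starts at or before h
  }
  const replicas = [];
  for (let k = 1; k <= replicaCount; k++) {
    replicas.push(ring[(owner + k) % ring.length].node);
  }
  return { node: ring[owner].node, replicas };
}

// Adding a node only splits one existing range; all other ranges are untouched.
// (Hypothetical helper; returns a new ring instead of mutating the old one.)
function addNode(ring, node, start) {
  return [...ring, { node, start }].sort((a, b) => a.start - b.start);
}

// Reproduce the rows of the slide's table.
for (const h of [2, 8, 10, 17, 22, 26]) {
  const { node, replicas } = lookup(RING, h);
  console.log(h, node, replicas.join(","));
}

// "Easily extensible": a new node S at position 12 takes over [12,15) from O only.
const bigger = addNode(RING, "S", 12);
console.log(lookup(bigger, 13).node, lookup(bigger, 10).node, lookup(bigger, 17).node);
```

The linear scan keeps the sketch short; with many nodes a binary search over the sorted start positions would be the more usual choice.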