Towards a Web-scale Data Management Ecosystem Demonstrated by SAP HANA
Stefan Bäuerle, Jonathan Dees, Franz Faerber, Wolfgang Lehner

Agenda
• Motivation & Requirements
• Different Processing Engines and Integration
• Scale-out edition engine

© 2015 SAP SE or an SAP affiliate company. All rights reserved. Public

Application requirements for a modern DBMS
Different:
• data types
• consumption models
• data models
• notions of consistency
• application and query languages
• levels of scaling
• hardware capabilities

HANA Platform (diagram)

HANA System (diagram)

Beyond relational data processing (1/3)
• Integrate as deeply as possible into the engine
Bringing OLAP and OLTP together
• Proven: works in thousands of customer systems
• Simplicity: get rid of extracts, loads and redundancy; one system
• OLAP dominates OLTP in real-world systems: optimize accordingly
Data mining and prediction
• Examples: basket analysis, different forecasting algorithms, …
• Easy interaction with R and SAS
Unstructured data
• Text search support for more than 30 languages, including:
  • Stemming, part-of-speech tagging, noun extraction, …
  • Classification, clustering, named entity recognition, sentiment analysis
Planning extensions
• Planning: define and align business figures for the foreseeable future
• Data-heavy operators such as disaggregation or logical snapshots

Beyond relational data processing (2/3)
Graph processing
• Real-world business data often resembles graphs
• Modeling data as a graph enables more explicit and more efficient operators
• Distance, siblings, shortest path, reachability, transitive closure, …
Hierarchy processing
• A special type of general graph
• Used by almost every business application
• Support for time-dependent and versioned hierarchies
• Extended graph operators: level, neighbor, is_ancestor, …
Geospatial processing & Time series
• Native relational data types
• Existing compression techniques plus powerful specializations for sensor data
• Spatial: WithinDistance, Contains, Area, …
• Time series: Group by time interval, Interpolate Missing Values, …

Beyond relational data processing (3/3)
Scientific processing
• Bring prominent operators into the engine
• Simplifies and speeds up operations in the scientific and financial areas
• Matrix operators: Eigenvalue, Multiply, …
• Financial operators: Interest Rates, Garman-Kohlhagen process, …
NoSQL processing
• Document-based models: XML, JSON, …
• Key-value stores
• Flexible schema, supported in HANA via a specific flexible table type
Massive scale-out
• Conventional business applications fit on a single box, but a new kind of application requires massive scale-out
• Deep and seamless integration with Hadoop
• Scale-out and single-box applications act as one system

Application integration (examples)
• Currency conversion
• Hierarchy handling
• Aging / dynamic tiering
• Dictionary maintenance
• Graph optimizations

HANA Data Platform: Dynamic Tiering
HANA Dynamic Tiering
• Declare a table to use disk storage
• Cost-efficient for big data
• Optimized disk-based processing, powered by SAP IQ
• New warm option besides hot (in-memory) and cold (Near-Line Storage)

    CREATE TABLE "demo"."SalesOrders_WARM" (
      ID         INTEGER NOT NULL,
      CustomerID INTEGER NOT NULL,
      OrderDate  DATE    NOT NULL,
      …,
      PRIMARY KEY (ID)
    ) USING EXTENDED STORAGE;

    INSERT INTO "demo"."SalesOrders_WARM" VALUES ( … );

HANA Data Platform: BigData Vision
(Diagram: HANA Data Management Platform with Information Management | Text | Search | Graph | Geospatial | Predictive; tiers labeled SAP HANA In-Memory (0.1 s, instant results), HANA Dynamic Tiering (warm data), and HANA Scale Out / Hadoop (∞, infinite storage, raw data), with NoSQL | Graph | Geo | TimeSeries and Smart Data Streaming; HANA native BigData and HANA & Hadoop via SDA to Hive | Spark, MapReduce | HDFS; shared Administration | Monitoring | Operations | User Management | Security; Hadoop extension: Velocity engine, integrated with HANA and Hadoop.)

SAP HANA Massive Scale Out Edition (Velocity)
Motivation:
• Engine for massive scale-out and big data
Key features:
• Scale to thousands of nodes
• Different data freshness and consistency levels
• Efficient fail-safety design
• First-class citizen within Hadoop (Spark)
• Support for a variety of hardware and operating systems
• Extreme query performance by compiling SQL to native code

SAP HANA SOE (Velocity) and Hadoop (1/2)
Hadoop ecosystem:
• Ambari: cluster management
• MLlib: machine learning
• Hive: SQL
• Spark SQL: SQL
• YARN: processing
• Spark: processing
• HDFS: distributed file system
• HBase: database
• ZooKeeper: coordination
• Pig: scripting

SAP HANA SOE (Velocity) and Hadoop (2/2)
Steps:
• Stage 1: Integration with Spark (2015)
• Stage 2: Independent execution cluster
Benefits:
• Integration of SAP data with data lakes
• HANA features add value to Hadoop (e.g. SQL extensions such as time series, hierarchies, …)
• Performance
• Holistic data platform

Architecture to Support Different Data Freshness Levels
• Separate component for transactions
• Options:
  • read your own writes
  • up-to-date data vs. data of a certain age
(Diagram: connections 1 … n with session data, DQP, transaction broker with version table, DTX, query engines 1, 2 and 3 holding data A, B, C; A, D; and A, C, D, storage 1 … n, distributed log, and checkpoint storage.)

SAP HANA scale out integration (diagram)

Conclusion
• Today's applications have a multidimensional set of specialized requirements
• Gains from moving these requirements into a (single) DBMS:
  • Simplified and more explicit data modeling and processing for applications
  • Increased performance
  • No complicated data transfer between specialized engines
• Powerful orchestration is required
• Web-scale processing is key to supporting new applications
SAP HANA strives to answer all these requirements in a single data management platform.

Thank you
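The data freshness options on the architecture slide (read your own writes; up-to-date data vs. data of a certain age) can be illustrated with a small routing sketch. This is not SAP HANA code: the classes, the functions `write`, `staleness` and `choose_engine`, and the mode strings are hypothetical. The sketch assumes that each query engine reports the highest commit version it has applied, and that a transaction broker keeps a version table mapping commit versions to commit timestamps.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QueryEngine:
    # Hypothetical query engine that has applied all commits up to applied_version.
    name: str
    applied_version: int


@dataclass
class TransactionBroker:
    # Hypothetical broker: assigns commit versions and records them in a version table.
    latest_version: int = 0
    version_table: Dict[int, float] = field(default_factory=dict)  # version -> commit time (s)

    def commit(self, now: float) -> int:
        self.latest_version += 1
        self.version_table[self.latest_version] = now
        return self.latest_version


@dataclass
class Session:
    # Per-connection session data: the version of this session's last write.
    last_write_version: int = 0


def write(session: Session, broker: TransactionBroker, now: float) -> int:
    """Commit a write for this session and remember its version."""
    session.last_write_version = broker.commit(now)
    return session.last_write_version


def staleness(engine: QueryEngine, broker: TransactionBroker, now: float) -> float:
    """Seconds since the first commit this engine has not applied yet (0 if current)."""
    if engine.applied_version >= broker.latest_version:
        return 0.0
    first_missing = engine.applied_version + 1
    return now - broker.version_table[first_missing]


def choose_engine(engines: List[QueryEngine], broker: TransactionBroker, session: Session,
                  mode: str, now: float = 0.0, max_age: float = 0.0) -> Optional[QueryEngine]:
    """Return the first engine (in list order) that satisfies the freshness mode, else None."""
    for engine in engines:
        if mode == "read_your_own_writes":
            # The engine must already contain this session's last write.
            ok = engine.applied_version >= session.last_write_version
        elif mode == "up_to_date":
            # The engine must contain every committed version.
            ok = engine.applied_version >= broker.latest_version
        elif mode == "bounded_age":
            # Lagging data is acceptable as long as it is at most max_age seconds stale.
            ok = staleness(engine, broker, now) <= max_age
        else:
            raise ValueError(f"unknown freshness mode: {mode}")
        if ok:
            return engine
    return None
```

For example, after commits at t = 10, t = 20 (made by the session) and t = 30, with engines at versions 1, 2 and 3, read-your-own-writes routes to the version-2 engine, up-to-date routes to the version-3 engine, and a 10-second age bound at t = 35 again selects the version-2 engine, because the version-1 engine has been stale for 15 s. Returning the first acceptable engine lets a caller order the list by load or locality, so staler but cheaper engines are used whenever the requested freshness level permits.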