More than Reporting: Data Analysis with Oracle R Enterprise

Dr. Nadine Schöne
Systems Consultant
Oracle Direct, Sales Consulting

Dr. Michael Haupt
Principal Member of Technical Staff
Oracle Labs, Virtual Machine Research Group

June 04, 2014

Copyright © 2014 Oracle and/or its affiliates. All rights reserved.

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Agenda
1 More than Standard Reporting?
2 Advanced Data Analysis
3 R and Oracle R Enterprise (ORE)
4 Demo
5 Benefits
6 Outlook: More Performance for R

More than Standard Reporting?

Reporting

Advanced Data Analysis

Sensor Data Analysis I
• 200,000 households
• 1 measurement per hour
• 3 years
• 5.256 billion measurements (26,280 measurements per customer)

Sensor Data Analysis II
• 200,000 households ➔ 200,000 models
• 10 s per model: 23 days + 4 hours in total
• Oracle R Enterprise: 4.3 hours

R Screenshots
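The figures on the two sensor data slides can be checked with a few lines of arithmetic. The sketch below assumes 365-day years, models fitted strictly one after another for the 23-day figure, and reads the 4.3 hours as the Oracle R Enterprise runtime; all function and constant names are made up for this illustration. Hourly readings over three such years come to 26,280 per household, which is what the 5.256 billion total implies.

```python
# Back-of-the-envelope check of the sensor data slides.
# Assumptions (not stated on the slides): 365-day years, sequential model fitting.

HOUSEHOLDS = 200_000
YEARS = 3
SECONDS_PER_MODEL = 10
ORE_HOURS = 4.3  # runtime quoted for Oracle R Enterprise


def readings_per_household(years: int) -> int:
    """One reading per hour, 24 hours a day, 365 days a year."""
    return 24 * 365 * years


def total_readings(households: int, years: int) -> int:
    """Total number of measurements across all households."""
    return households * readings_per_household(years)


def sequential_runtime(households: int, seconds_per_model: int) -> tuple[int, float]:
    """Return (whole days, remaining hours) when fitting one model per household in sequence."""
    total_seconds = households * seconds_per_model
    days, remainder = divmod(total_seconds, 86_400)
    return days, remainder / 3600


def implied_speedup(households: int, seconds_per_model: int, parallel_hours: float) -> float:
    """Ratio of sequential runtime to the quoted parallel runtime."""
    sequential_hours = households * seconds_per_model / 3600
    return sequential_hours / parallel_hours


if __name__ == "__main__":
    print(readings_per_household(YEARS))
    print(total_readings(HOUSEHOLDS, YEARS))
    print(sequential_runtime(HOUSEHOLDS, SECONDS_PER_MODEL))
    print(implied_speedup(HOUSEHOLDS, SECONDS_PER_MODEL, ORE_HOURS))
```

Under these assumptions the sequential run takes 23 days plus roughly 3.6 hours, in line with the rounded "23 days + 4 hours" on the slide, and the 4.3-hour figure corresponds to a speedup of roughly two orders of magnitude.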
Oracle Advanced Analytics
Wide range of in-database data mining and statistical functions

Areas covered: descriptive data analysis & visualization; data preparation & transformations; classification & regression models; clustering; use of open source R packages

• Data Understanding & Visualization
  – Summary & descriptive statistics
  – Histograms, scatter plots, box plots, bar charts
  – R graphics: 3-D plots, link plots, special R graph types
  – Cross tabulations
  – Tests for correlations (t-test, Pearson’s, ANOVA)
  – Selected Base SAS equivalents
• Data Selection, Preparation and Transformations
  – Joins, tables, views, data selection, data filter, SQL time windows, multiple schemas
  – Sampling techniques
  – Re-coding, missing values
  – Aggregations
  – Spatial data
  – R to SQL transparency and push down
• Classification Models
  – Logistic Regression (GLM)
  – Naive Bayes
  – Decision Trees
  – Support Vector Machines (SVM)
  – Neural Networks (NNs)
• Regression Models
  – Multiple Regression (GLM)
  – Support Vector Machines
• Clustering
  – Hierarchical K-means
  – Orthogonal Partitioning
  – Expectation Maximization
• Anomaly Detection
  – Special case Support Vector Machine (1-Class SVM)
• Associations / Market Basket Analysis
  – Apriori algorithm
• Feature Selection and Reduction
  – Attribute Importance (Minimum Description Length)
  – Principal Components Analysis (PCA)
  – Non-negative Matrix Factorization
  – Singular Value Decomposition
• Text Mining
  – Most OAA algorithms support unstructured data (e.g. customer comments, email, abstracts)
• Transactional Data
  – Most OAA algorithms support transactional data (e.g. purchase transactions, repeated measures over time)
• R packages: ability to run open source
  – A broad range of R CRAN packages can be run as part of the database process via R to SQL transparency and/or via Embedded R mode

* included in every Oracle Database

Key Topics for Enterprise Data Analytics
1. Scalability
2. Performance
3. Development & Production

R and Oracle R Enterprise (ORE)

Aspects of Conventional R/Database Interaction
R logo © R Foundation, from http://www.r-project.org

“Collaborative Execution” Model
1 User R engine (desktop)
  – Oracle R Enterprise packages and other R packages
  – Execution in collaboration with the Oracle DB
  – Post-processing of results
2 Database compute engine (Oracle DB)
  – SQL
  – User tables, results
3 R engine(s) managed by the Oracle DB
  – Oracle R Enterprise packages and other R packages
  – Analyses that are not available in the Oracle DB
  – R results

Demo

Benefits

Benefits I
• 5,566 R packages

Benefits II
• High-performance enterprise predictive analytics applications
• Low total cost of ownership
• Integration
• Performance & scalability

Outlook: More Performance for R
FastR
• Reimplementation of R in Java
  – Uses Graal (compiler) and Truffle (AST interpreter framework)
  – Dynamic compilation, scaling on heterogeneous architectures
  – Participants: Oracle Labs (Germany, USA, Austria), JKU Linz, Purdue University, TU Dortmund

[Figure: AST Interpreter, Uninitialized Nodes ➔ Node Rewriting for Profiling Feedback ➔ AST Interpreter, Rewritten Nodes ➔ Compilation using Partial Evaluation ➔ Compiled Code. Node transitions: U = Uninitialized, I = Integer, S = String, D = Double, G = Generic.]

“R is a powerful and interesting tool for data analysis! ORE brings R into a scalable DB engine (solving problems of data management, analysis and scalability). We actually can obtain information and added value from not so actively used data.”
– Stefano Alberto Russo, Researcher at CERN Openlab

Further Information
• ORE discussion forum: https://community.oracle.com/community/developer/english/business_intelligence/data_warehousing/r
• Oracle Advanced Analytics: http://www.oracle.com/technetwork/database/options/advanced-analytics/index.html
• ORE blog: https://blogs.oracle.com/R/
• FastR: https://bitbucket.org/allR/fastR
• Graal/Truffle: https://wiki.openjdk.java.net/display/Graal/Main
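The node rewriting idea from the FastR slide can be illustrated with a toy self-specializing addition node. This is a minimal sketch in Python, not FastR or Truffle code: the class `AddNode`, its methods, and the addition semantics for mixed types are all invented for this illustration, and the compilation step (partial evaluation) is not modelled. The state letters follow the slide's legend (U, I, D, S, G).

```python
class AddNode:
    """A toy '+' node that rewrites itself according to the operand types it sees.

    States follow the slide's legend: U = uninitialized, I = integer,
    D = double, S = string, G = generic. Hypothetical illustration only.
    """

    def __init__(self):
        self.state = "U"
        self.transitions = ["U"]  # history of states, kept for inspection

    @staticmethod
    def _matches(state, a, b):
        """Guard: does the given specialization accept these operands?"""
        if state == "I":
            return type(a) is int and type(b) is int
        if state == "D":
            return type(a) is float and type(b) is float
        if state == "S":
            return type(a) is str and type(b) is str
        return state == "G"  # generic accepts everything, uninitialized nothing

    def _rewrite(self, a, b):
        """Replace the current specialization after a failed guard."""
        new_state = "G"  # a specialized node that fails its guard goes generic
        if self.state == "U":
            # First execution: pick the most specific state that fits.
            for candidate in ("I", "D", "S"):
                if self._matches(candidate, a, b):
                    new_state = candidate
                    break
        self.state = new_state
        self.transitions.append(new_state)

    @staticmethod
    def _generic_add(a, b):
        """Slow path (made-up semantics): concatenate if a string is involved."""
        if isinstance(a, str) or isinstance(b, str):
            return str(a) + str(b)
        return a + b

    def execute(self, a, b):
        if not self._matches(self.state, a, b):
            self._rewrite(a, b)
        if self.state == "G":
            return self._generic_add(a, b)
        return a + b  # fast path: the guard guarantees matching operand types


if __name__ == "__main__":
    node = AddNode()
    print(node.execute(1, 2), node.transitions)
    print(node.execute(1.5, 2), node.transitions)
```

A node that only ever sees integers stays in state I and keeps using the fast path; the first operand pair that violates the guard pushes it to G. In the approach outlined on the slide, a tree whose nodes have settled into stable states is what gets handed to the compiler.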