More than Reporting:
Data Analysis with Oracle R Enterprise
Dr. Nadine Schöne
Systems Consultant
Oracle Direct, Sales Consulting
Dr. Michael Haupt
Principal Member of Technical Staff
Oracle Labs, Virtual Machine Research Group
June 04, 2014
Copyright © 2014 Oracle and/or its affiliates. All rights reserved.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda
1. More than Standard Reporting?
2. Advanced Data Analysis
3. R and Oracle R Enterprise (ORE)
4. Demo
5. Benefits
6. Outlook: More Performance for R
More than Standard Reporting?
Reporting
Advanced Data Analysis
Sensor Data Analysis I
200,000 households
5.256 billion measurements
(26,280 measurements per customer)
1 measurement per hour
3 years
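The figures on this slide can be cross-checked with a few lines of arithmetic: one reading per hour over three years gives 26,280 readings per household, and 200,000 households bring the total to 5.256 billion. The sketch below assumes 365-day years (leap days ignored); the variable names are illustrative and do not come from the slides.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: leap days are ignored

households = 200_000
years = 3

# One measurement per hour for three years, per household.
measurements_per_household = HOURS_PER_DAY * DAYS_PER_YEAR * years

# Total number of measurements across all households.
total_measurements = households * measurements_per_household

print(f"{measurements_per_household:,} measurements per household")
print(f"{total_measurements / 1e9:.3f} billion measurements in total")
```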
Sensor Data Analysis II
200,000 households ➔ 200,000 models
10 s per model
Sequential: 23 days + 4 hours
Oracle R Enterprise: 4.3 hours
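The timings follow from simple arithmetic: 200,000 models at 10 s each add up to roughly 23 days when fitted one after another. The sketch below recomputes that figure and derives the speedup implied by the 4.3 hours quoted for Oracle R Enterprise. The speedup factor is a derived value, not a number from the slides, and the variable names are illustrative.

```python
from datetime import timedelta

models = 200_000
seconds_per_model = 10
ore_hours = 4.3  # runtime quoted for Oracle R Enterprise

# Fitting all models one after another.
sequential = timedelta(seconds=models * seconds_per_model)
sequential_days = sequential.days
remaining_hours = sequential.seconds / 3600

# Speedup implied by the two runtimes on the slide.
sequential_hours = sequential.total_seconds() / 3600
implied_speedup = sequential_hours / ore_hours

print(f"Sequential: {sequential_days} days + {remaining_hours:.1f} hours")
print(f"Implied speedup: about {implied_speedup:.0f}x")
```

A factor of this size is what one would expect from fitting many models in parallel, although the slides do not state the degree of parallelism that was used.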
R Screenshots
Oracle Advanced Analytics
A broad range of in-database data mining and statistical functions
Descriptive Data Analysis & Visualization
• Data Understanding & Visualization
– Summary & Descriptive Statistics
– Histograms, scatter plots, box plots, bar charts
– R graphics: 3-D plots, link plots, special R graph types
– Cross tabulations
– Tests for Correlations (t-test, Pearson’s, ANOVA)
– Selected Base SAS equivalents
Data Preparation & Transformations
• Data Selection, Preparation and Transformations
– Joins, Tables, Views, Data Selection, Data Filter, SQL time windows, Multiple schemas
– Sampling techniques
– Re-coding, Missing values
– Aggregations
– Spatial data
– R to SQL transparency and push down
Classification & Regression Models
• Classification Models
– Logistic Regression (GLM)
– Naive Bayes
– Decision Trees
– Support Vector Machines (SVM)
– Neural Networks (NNs)
• Regression Models
– Multiple Regression (GLM)
– Support Vector Machines
Clustering
• Clustering
– Hierarchical K-means
– Orthogonal Partitioning
– Expectation Maximization
• Anomaly Detection
– Special case Support Vector Machine (1-Class SVM)
• Associations / Market Basket Analysis
– Apriori algorithm
• Feature Selection and Reduction
– Attribute Importance (Minimum Description Length)
– Principal Components Analysis (PCA)
– Non-negative Matrix Factorization
– Singular Value Decomposition
• Text Mining
– Most OAA algorithms support unstructured data (e.g. customer comments, email, abstracts)
• Transactional Data
– Most OAA algorithms support transactional data (e.g. purchase transactions, repeated measures over time)
Use of Open Source R Packages
• R packages: ability to run open source
– A broad range of R CRAN packages can be run as part of the database process via R to SQL transparency and/or via Embedded R mode
* included in every Oracle Database
Key Topics for Enterprise Data Analytics
1. Scalability
2. Performance
3. Development & Production
R and Oracle R Enterprise (ORE)
Aspects of Conventional R/Database Interaction
R logo © R Foundation, from http://www.r-project.org
"Collaborative Execution" Model
1. User R engine (desktop)
2. Database compute engine
3. R engine(s) managed by the Oracle DB
[Diagram labels: Oracle DB, SQL, user tables, results; R engine with Oracle R Enterprise packages and other R packages; post-processing of results; execution in collaboration with the Oracle DB; analyses that are not available in the Oracle DB]
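One way to picture the collaborative model is to compare pulling every row to the client with letting the database compute the aggregate and return only the result. The sketch below uses Python's built-in sqlite3 module as a stand-in for the Oracle DB. The table readings, its columns, and the sample values are hypothetical, and none of this is the Oracle R Enterprise API.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle DB (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (household_id INTEGER, kwh REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 0.5), (1, 1.5), (2, 2.0), (2, 4.0), (2, 6.0)],  # hypothetical sample data
)

# Conventional interaction: transfer every row, then aggregate on the client.
sums = {}
for household_id, kwh in conn.execute("SELECT household_id, kwh FROM readings"):
    total, count = sums.get(household_id, (0.0, 0))
    sums[household_id] = (total + kwh, count + 1)
client_means = {hid: total / count for hid, (total, count) in sums.items()}

# Push-down: the database aggregates and only the small result set is transferred.
db_means = dict(
    conn.execute(
        "SELECT household_id, AVG(kwh) FROM readings GROUP BY household_id"
    )
)

print(client_means)
print(db_means)
conn.close()
```

Both paths give the same means, but the second moves one row per household instead of one row per measurement. That difference matters at the scale of billions of measurements.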
Demo
Benefits
Benefits I
5,566 R packages
Benefits II
High-performance enterprise predictive analytics applications
Low total cost of ownership
Integration
Performance & Scalability
Outlook: More Performance for R
FastR
• Reimplementation of R in Java
– Uses Graal (compiler) and Truffle (AST interpreter)
– Dynamic compilation, scaling on heterogeneous architectures
– Contributors: Oracle Labs (Germany, USA, Austria), JKU Linz, Purdue University, TU Dortmund
[Figure: The AST interpreter starts with uninitialized nodes (U). Node rewriting for profiling feedback specializes them through node transitions into Integer (I), Double (D), String (S), or Generic (G) nodes. Compilation using partial evaluation then turns the AST with rewritten nodes into compiled code.]
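The node rewriting idea can be illustrated with a small self-specializing node: it starts uninitialized, specializes to the operand types it actually observes, and moves only toward more general states. The sketch below is a simplified Python illustration, not FastR or Truffle code. The class AddNode, the helper required_state, and the rank table are hypothetical, and the String state is left out for brevity.

```python
# Specialization states, ordered from most specific to most general.
# The String (S) state is omitted to keep the sketch short.
RANK = {"U": 0, "I": 1, "D": 2, "G": 3}


def required_state(left, right):
    """Return the most specific state that can handle both operands."""
    if type(left) is int and type(right) is int:
        return "I"
    if type(left) in (int, float) and type(right) in (int, float):
        return "D"
    return "G"


class AddNode:
    """Hypothetical addition node that rewrites itself as it sees new operand types."""

    def __init__(self):
        self.state = "U"          # every node starts uninitialized
        self.transitions = ["U"]  # record of node rewrites, for inspection

    def execute(self, left, right):
        needed = required_state(left, right)
        # Rewrite only toward a more general state, never back.
        if RANK[needed] > RANK[self.state]:
            self.state = needed
            self.transitions.append(needed)
        if self.state == "I":
            return left + right                # integer fast path
        if self.state == "D":
            return float(left) + float(right)  # double fast path
        return left + right                    # generic path: fully dynamic


node = AddNode()
results = [node.execute(1, 2), node.execute(1.5, 2), node.execute("a", "b")]
print(node.transitions)
print(results)
```

In a real system, the point of the specialized states is that a compiler can assume the observed types and generate tight machine code for them. The sketch models only the state transitions, not the compilation step.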
“R is a powerful and interesting tool for
data analysis! ORE brings R into a
scalable DB engine (solving problems
of data management, analysis and
scalability). We actually can obtain
information and added value from not
so actively used data.”
– Stefano Alberto Russo, Researcher at CERN Openlab
Further Information
ORE discussion forum:
https://community.oracle.com/community/developer/english/business_intelligence/data_warehousing/r
Oracle Advanced Analytics:
http://www.oracle.com/technetwork/database/options/advanced-analytics/index.html
ORE blog:
https://blogs.oracle.com/R/
FastR:
https://bitbucket.org/allR/fastR
Graal/Truffle:
https://wiki.openjdk.java.net/display/Graal/Main