Data Mining Overview

This document is for informational purposes. It is not a commitment to
deliver any material, code, or functionality, and should not be relied
upon in making purchasing decisions. The development, release, and
timing of any features or functionality described in this document
remains at the sole discretion of Oracle. This document in any form,
software or printed matter, contains proprietary information that is the
exclusive property of Oracle. This document and information contained
herein may not be disclosed, copied, reproduced or distributed to
anyone outside Oracle without prior written consent of Oracle. This
document is not part of your license agreement nor can it be
incorporated into any contractual agreement with Oracle or its
subsidiaries or affiliates.
In-Database Analytics: Predictive Analytics,
Data Mining & R …
Detlef E. Schröder
Principal Systems Consultant, STCC DB Mitte
Oracle Advanced Analytics Option
Oracle: hardware and software
R (open source)
Oracle Data Mining
• 12 in-database data mining algorithms
• In-database model building
• In-database model scoring
• In-database text mining
• 50+ in-database statistical functions
Oracle R Enterprise
• R for all data
What Is Data Mining?
• Automated search through data to discover structures, explore relationships, and make predictions
• Data mining delivers results for:
  • Predicting customer behavior (Classification)
  • Predicting or estimating a value (Regression)
  • Segmentation (Clustering)
  • Discovering the factors that are relevant to a question (Attribute Importance)
  • Finding profiles, target groups, or target items (Decision Trees)
  • Discovering relationships and market basket analysis (Associations)
  • Outliers in the data (Anomaly Detection)
Oracle Data Mining Algorithms
Problem / Algorithm / Applicability
Classification
• Logistic Regression (GLM): classical statistical technique
• Decision Trees: popular / rules / transparency
• Naïve Bayes: embedded app
• Support Vector Machine: wide / narrow data / text
Regression
• Multiple Regression (GLM): classical statistical technique
• Support Vector Machine: wide / narrow data / text
Anomaly Detection
• One-Class SVM: lack of examples of the target field
Attribute Importance
• Minimum Description Length (MDL): attribute reduction, identifying useful data, reducing data noise
Association Rules
• Apriori: market basket analysis, link analysis
Clustering
• Hierarchical K-Means, Hierarchical O-Cluster: product grouping, text mining, gene and protein analysis
Feature Extraction
• Nonnegative Matrix Factorization: text analysis, feature reduction
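Market basket analysis with association rules rests on two measures, support and confidence. The following is a minimal Python sketch of those two measures only, assuming a made-up list of baskets. It is not the Apriori implementation in Oracle Data Mining, and the names `baskets`, `support`, and `confidence` are illustrative.

```python
# Hypothetical transactions; each basket is the set of items bought together.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


# Rule under test: bread -> butter
sup = support({"bread", "butter"}, baskets)
conf = confidence({"bread"}, {"butter"}, baskets)
print(f"support={sup:.2f} confidence={conf:.2f}")
```

An association-rule algorithm would typically keep only rules whose support and confidence exceed chosen thresholds; the thresholds themselves are a modeling decision and are not given in the slides.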
SQL Developer 3.0/Oracle Data Miner 11g Release 2 GUI
• GUI for data analysts
• SQL Developer extension (OTN download)
• Explore data and discover new relationships
• Build and apply models
• Model predictions
• Build and deploy workflows and SQL code
Oracle Data Miner Nodes (Partial List)
Tables and views
Transformations
Data analysis
Model building
Text
Oracle Data Miner 11g Release 2 GUI
Churn Demo—Simple Conceptual Workflow
Churn models to predict and “profile” likely churners
In-Database Data Mining
Traditional Analytics (hours, days, or weeks)
• Steps: data extraction, data preparation and transformation, data mining model building, data mining model “scoring”, data import
• Data flow: source data, datasets/work area, analytical processing, process output, target
Oracle Data Mining (seconds, minutes, or hours)
• Steps: embedded data preparation, model building, model “scoring”
• Data remains in the database
• Cutting-edge machine learning algorithms inside the SQL kernel of the database
• SQL: most powerful language for data preparation and transformation
Results
• Savings: seconds, minutes, or hours instead of hours, days, or weeks
• Faster time from “data” to “insights”
• Lower TCO: eliminates data movement and data duplication
• Maintains security
11g Statistical and Analytic Functions (Free)
• Ranking functions: rank, dense_rank, cume_dist, percent_rank, ntile
• Window aggregate functions (moving and cumulative): avg, sum, min, max, count, variance, stddev, first_value, last_value
• LAG/LEAD functions: direct inter-row reference using offsets
• Reporting aggregate functions: sum, avg, min, max, variance, stddev, count, ratio_to_report
• Statistical aggregates: correlation, linear regression family, covariance
• Linear regression: fitting of an ordinary-least-squares regression line to a set of number pairs; frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions
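The link between the ordinary-least-squares line and population covariance can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the database implementation: the slope is the population covariance of x and y divided by the population variance of x, and the intercept follows from the means. The helper names `covar_pop` and `ols_line` and the sample pairs are assumptions made for this example.

```python
from statistics import mean


def covar_pop(xs, ys):
    """Population covariance of two equally long number sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def ols_line(xs, ys):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    slope = covar_pop(xs, ys) / covar_pop(xs, xs)  # covar_pop(xs, xs) is the variance of xs
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept


# Hypothetical number pairs that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = ols_line(xs, ys)
print(slope, intercept)
```

In the database the same quantities would come from the linear regression family of SQL aggregates rather than from client-side code, so the data never has to leave the database.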
Statistics
• Descriptive statistics: DBMS_STAT_FUNCS summarizes numerical columns of a table and returns count, min, max, range, mean, median, stats_mode, variance, standard deviation, quantile values, +/- n sigma values, top/bottom 5 values
• Correlations: Pearson's correlation coefficients, Spearman's and Kendall's (both nonparametric)
• Cross tabs: enhanced with % statistics: chi-squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa
• Hypothesis testing: Student t-test, F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA
• Distribution fitting: Kolmogorov-Smirnov test, Anderson-Darling test, Chi-squared test, Normal, Uniform, Weibull, Exponential
Note: Statistics and SQL Analytics are included in Oracle Database Standard Edition
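To make the difference between Pearson's and Spearman's correlation concrete, here is a small Python sketch under stated assumptions: the data is invented, and the helpers `pearson`, `ranks`, and `spearman` are written for this illustration rather than taken from any Oracle API. Spearman's coefficient is computed as Pearson's coefficient on the ranks, which is why it is insensitive to the shape of a monotonic relationship.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: y = x**3 is monotonic in x but not linear.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
pearson_r = pearson(xs, ys)
spearman_r = spearman(xs, ys)
print(round(pearson_r, 3), round(spearman_r, 3))
```

On this data the rank-based coefficient reaches its maximum while Pearson's stays below it, which is the practical reason to have a nonparametric measure available alongside the parametric one.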
Oracle Data Mining and Unstructured Data
• Oracle Data Mining also analyzes unstructured data such as text
• Free text and comments can be included in ODM models
• Clustering and classification of documents
• Oracle Text for preprocessing
Real-Time Classification of Customer Data
On the fly, applied to individual records (e.g. from the call center)
-- Probability that this single record is classified as 'Yes' by model CLAS_DT_5_2
SELECT prediction_probability(CLAS_DT_5_2, 'Yes'
         USING 7800 AS bank_funds, 125 AS checking_amount, 20 AS credit_balance,
               55 AS age, 'Married' AS marital_status,
               250 AS MONEY_MONTLY_OVERDRAWN, 1 AS house_ownership)
FROM dual;
Figure labels: Call Center, Social Media, Branch, ECM, BI, Get Advice, Web, Email, CRM, Mobile
Exadata + Data Mining 11g Release 2
“DM Scoring” is pushed down to storage: faster
In 11g Release 2, SQL predictions and Oracle Data Mining models are offloaded to the storage cells
Example: customers in the US who are likely to churn:
select cust_id
from customers
where region = 'US'
and prediction_probability(churnmod, 'Y' using *) > 0.8;
Oracle Communications Industry Data Model Example
Better information for OBIEE dashboards
ODM predictions and probabilities are available directly from the database
Further Examples of ODM Use
• Police: crime prediction
• Money laundering: concept for detection using ODM
• Oracle CRM: support for campaign management
• Oracle telecommunications data model: churn and CLTV
• Process analysis: in collaboration with Robotron
• ...
Further examples at OBE
Summary
Advanced Analytics directly in the database
Advantages …
Data transformation without materialization
Definition of views
Pipelined table functions
Support for "exotic" data types as well
Use of the optimizer
Scalability even with large data volumes
End-to-end security concepts
Virtual Private Database / Row Level Security
Protection from the DBA through Database Vault
Also applies to the use of the results
Control over resource consumption
Resource Manager / Enterprise Manager Grid Control
Questions &
Answers