This document is for informational purposes. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described in this document remains at the sole discretion of Oracle. This document in any form, software or printed matter, contains proprietary information that is the exclusive property of Oracle. This document and information contained herein may not be disclosed, copied, reproduced or distributed to anyone outside Oracle without prior written consent of Oracle. This document is not part of your license agreement nor can it be incorporated into any contractual agreement with Oracle or its subsidiaries or affiliates.

In-Database Analytics: Predictive Analytics, Data Mining & R …
Detlef E. Schröder
Lead Systems Consultant, STCC DB Mitte

R (Open Source), Oracle Data Mining, Oracle: Hardware and Software

Oracle Advanced Analytics Option
• Oracle Data Mining
  • 12 in-DB data mining algorithms
  • In-DB model building
  • In-DB model scoring
  • In-DB text mining
  • 50+ in-DB statistical functions
• Oracle R Enterprise
  • R for all data

What Is Data Mining?
• Automated search through data to discover structures, explore relationships, and make predictions
• Data mining delivers results for:
  • Predicting customer behavior (Classification)
  • Predicting or estimating a value (Regression)
  • Segmentation (Clustering)
  • Identifying the factors relevant to a question (Attribute Importance)
  • Finding profiles, target groups, or target items (Decision Trees)
  • Discovering relationships and market basket analysis (Associations)
  • Outliers in the data (Anomaly Detection)

Oracle Data Mining Algorithms

Problem | Algorithm | Applicability
Classification | Logistic Regression (GLM) | Classical statistical technique
Classification | Decision Trees | Popular / Rules / transparency
Classification | Naïve Bayes | Embedded app
Classification | Support Vector Machine | Wide / narrow data / text
Regression | Multiple Regression (GLM) | Classical statistical technique
Regression | Support Vector Machine | Wide / narrow data / text
Anomaly Detection | One Class SVM | Lack examples of target field
Attribute Importance | Minimum Description Length (MDL) | Attribute reduction; identify useful data; reduce data noise
Association Rules | Apriori | Market basket analysis; link analysis
Clustering | Hierarchical K-Means; Hierarchical O-Cluster | Product grouping; text mining; gene and protein analysis
Feature Extraction | Nonnegative Matrix Factorization | Text analysis; feature reduction

SQL Developer 3.0 / Oracle Data Miner 11g Release 2 GUI
• GUI for data analysts
• SQL Developer extension (OTN download)
• Explore data and discover new relationships
• Build and apply models
• Model predictions
• Build and deploy workflows and SQL code

Oracle Data Miner Nodes (Partial List)
• Tables and views
• Transformations
• Data analysis
• Model building
• Text

Oracle Data Miner 11g Release 2 GUI
Churn Demo: Simple Conceptual Workflow
• Churn models to predict and "profile" likely churners

In-Database Data Mining

Traditional Analytics (hours, days, or weeks)
• Steps: data extraction, data prep & transformation, data mining model building, data preparation and transformation, data mining model "scoring", data import
• Data path: source data, datasets / work area, analytical processing, process output, target

Oracle Data Mining (seconds, minutes, or hours)
• Steps: data preparation, model building, embedded data prep, model "scoring", results
• Data remains in the database
• Embedded data preparation
• Cutting-edge machine learning algorithms inside the SQL kernel of the database
• SQL: the most powerful language for data preparation and transformation

Savings
• Faster time from "data" to "insights"
• Lower TCO: eliminates data movement and data duplication
• Maintains security

In-Database Mining 11g: Statistical & Analytical Functions (Free)
• Ranking functions: rank, dense_rank, cume_dist, percent_rank, ntile
• Window aggregate functions (moving and cumulative): avg, sum, min, max, count, variance, stddev, first_value, last_value
• LAG/LEAD functions: direct inter-row reference using offsets
• Reporting aggregate functions: sum, avg, min, max, variance, stddev, count, ratio_to_report
• Statistical aggregates: correlation, linear regression family, covariance
• Linear regression: fitting of an ordinary-least-squares regression line to a set of number pairs; frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions

Statistics
• Descriptive statistics: DBMS_STAT_FUNCS summarizes numerical columns of a table and returns count, min, max, range, mean, median, stats_mode, variance, standard deviation, quantile values, +/- n sigma values, top/bottom 5 values
• Correlations: Pearson's correlation coefficients, Spearman's and Kendall's (both nonparametric)
• Cross tabs: enhanced with % statistics: chi-squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa
• Hypothesis testing: Student's t-test, F-test, binomial test, Wilcoxon signed-rank test, chi-square, Mann-Whitney test, Kolmogorov-Smirnov test, one-way ANOVA
• Distribution fitting: Kolmogorov-Smirnov test, Anderson-Darling test, chi-squared test; Normal, Uniform, Weibull, Exponential

Note: Statistics and SQL Analytics are included in Oracle Database Standard Edition

Oracle Data Mining and Unstructured Data
• Oracle Data Mining also analyzes unstructured data such as text
• Free text and comments can be included in ODM models
• Clustering and classification of documents
• Oracle Text for preprocessing

Real-Time Classification of Customer Data
• On the fly, applied to individual records (e.g. from the call center)

    select prediction_probability(CLAS_DT_5_2, 'Yes'
             USING 7800 as bank_funds,
                   125 as checking_amount,
                   20 as credit_balance,
                   55 as age,
                   'Married' as marital_status,
                   250 as MONEY_MONTLY_OVERDRAWN,
                   1 as house_ownership)
    from dual;

Diagram labels: Call Center, Social Media, Branch, ECM, BI, Get Advice, Web, Email, CRM, Mobile

Exadata + Data Mining 11g Release 2
"DM Scoring" pushed down to storage: faster!
• In 11g Release 2, SQL predictions and Oracle Data Mining models are pushed down to the storage cells
• Example: customers in the US who are likely to churn:

    select cust_id
    from customers
    where region = 'US'
      and prediction_probability(churnmod, 'Y' using *) > 0.8;

Oracle Communications Industry Data Model Example
• Better information for OBIEE dashboards
• ODM predictions and probabilities are available directly from the database

Further Examples of ODM Use
• Police: crime prediction
• Money laundering: concept using ODM for detection
• Oracle CRM: support for campaign management
• Oracle telecommunications data model: churn and CLTV
• Process analysis: in collaboration with Robotron ...
• Further examples at OBE

Summary
Advanced Analytics Directly in the Database: Advantages …
• Data transformation without materialization
  • Definition of views
  • Pipelined table functions
• Support for even "unusual" data types
• Use of the optimizer
• Scalability even with large data volumes
• End-to-end security concepts
  • Virtual Private Database / Row Level Security
  • Protection from the DBA through Database Vault
  • Also applies when the results are used
• Control over resource consumption
  • Resource Manager / Enterprise Manager Grid Control

Questions & Answers
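
Appendix sketch: the statistics slide describes linear regression as fitting an ordinary-least-squares line to a set of number pairs, often combined with COVAR_POP, COVAR_SAMP, and CORR. The Python sketch below reproduces that arithmetic outside the database to show what those aggregates compute. It is not Oracle code: the function name `ols_summary`, the dictionary keys, and the sample pairs are assumptions made for illustration only.

```python
from math import sqrt


def ols_summary(pairs):
    """Covariances, correlation, and least-squares line for (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two (x, y) pairs")

    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n

    # Sums of centered cross products and squares
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")

    slope = sxy / sxx  # least-squares slope of y on x
    return {
        "covar_pop": sxy / n,          # population covariance
        "covar_samp": sxy / (n - 1),   # sample covariance
        "corr": sxy / sqrt(sxx * syy) if syy > 0 else None,  # Pearson r
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


# Hypothetical sample data lying exactly on the line y = 2x + 1
result = ols_summary([(1, 3), (2, 5), (3, 7), (4, 9)])
print(result)
```

Inside the database the same quantities would come from a single aggregate query, so the number pairs never have to leave the table; the sketch only makes the underlying formulas explicit.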