IBM Smarter Analytics für „Big Data” Technologie Workshop „Big Data” FZI Forschungszentrum Informatik am Karlsruher Institut für Technologie 22. Juni 2015 Axel J. Schwarz Mobil: +49 171 5619419 E-Mail: [email protected] IBM Smarter Analytics für „Big Data” 1 v1.3 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Paradigm shift enabled by „Big Data“ Analytics is expanding from enterprise data to big data, creating new opportunities for competitive advantage Traditional Approach Structured, analytical, logical New Approach Creative, holistic thought, intuition Hadoop & Streaming Data Data Warehouse Web Logs, URLs Transaction Data Contact Center notes Customer Data Core System Data Structured Repeatabl e Linear Enterprise Integration Unstructured Exploratory Iterative Text Data: emails, chats Social Data Log data 3rd Party Data Traditional Sources 2 Geolocation data New Sources © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Paradigm shift enabled by „Big Data“ Leverage more of the data being captured TRADITIONAL APPROACH All available information Analyze small subsets of information 3 BIG DATA APPROACH Analyzed information All available information analyzed Analyze all information © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Paradigm shift enabled by „Big Data“ Reduce effort required to leverage data TRADITIONAL APPROACH Small amount of carefully organized information Carefully cleanse information before any analysis 4 BIG DATA APPROACH Large amount of messy information Analyze information as is, cleanse as needed © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Paradigm shift enabled by „Big Data“ Data leads the way—and sometimes correlations are good enough TRADITIONAL APPROACH Hypothesis Question Data Exploration Answer Data Insight Correlation Start with hypothesis and test against selected data 5 BIG DATA APPROACH Explore all data and identify correlations © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Paradigm shift enabled by „Big Data“ Leverage data as it is captured TRADITIONAL APPROACH BIG DATA APPROACH Data Analysis Data Repository Analysis Insight Insight Analyze data after it’s been processed and landed in a warehouse or mart 6 Analyze data in motion as it’s generated, in real-time © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Paradigm shift enabled by „Big Data“ - Summary Analyze small subsets of information Analyzed information All available information analyzed BIG DATA APPROACH TRADITIONAL APPROACH All available information Analyze all information Hypothesis Question Data Exploration Answer Data Insight Correlation Start with hypothesis and test against selected data 7 Explore all data and identify correlations © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Paradigm shift enabled by „Big Data“ - Summary Analyze small subsets of information EXACT EVERY THING SLOW LACK OF CONTROL Analyzed information Hypothesis Question Answer Data Start with hypothesis and test against selected data 8 Data Exploration Insight Correlation All available information analyzed BIG DATA APPROACH TRADITIONAL APPROACH All available information Analyze all information Explore all data and identify correlations © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” IBM Big Data & Analytics Referenz-Architektur Komponenten - Überblick 9 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” IBM Big Data & Analytics Referenz-Architektur Komponenten - Detaillierung 10 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” IBM Big Data & Analytics Referenz-Architektur Komponenten – Fokus 11 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” In-Memory „Big Data“ Processing - Apache Spark IBM Investing in Four Catalysts for Big Data Adoption 12 Open Source Innovation Technical Standards Familiar Interfaces & Integration with Established Tools New Analytics Capabilities © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” In-Memory „Big Data“ Processing - Apache …potentially the most significant open source project of the next decade. To further accelerate open source innovation for the Spark ecosystem, IBM is taking the following actions: IBM will build Spark into the core of the company’s analytics and commerce platforms. IBM's Watson Health Cloud will leverage Spark as a key under- pinning for its insight platform, helping to deliver faster time to value for medical providers and researchers as they access new analytics around population health data. IBM will open source its breakthrough IBM SystemML machine learning technology and collaborate with Databricks to advance Spark’s machine learning capabilities. IBM will offer Spark as a Cloud service on IBM Bluemix to make it possible for app developers to quickly load data, model it, and derive the predictive artifact to use in their app. IBM will commit more than 3,500 researchers and developers to work on Spark-related projects […], and open a Spark Technology Center in San Francisco […]. IBM will educate more than 1 million data scientists and data engineers on 13 Spark through extensive partnerships with AMPLab, DataCamp, MetiStream, Galvanize and BigData University MOOC. © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” In-Memory „Big Data“ Processing - Apache Spark The Combination: The Flexibility of Spark on a Stable Hadoop Platform 14 Unlimited Scale Ease of Development In-Memory Performance Enterprise Platform Wide Range of Data Formats Combine Workflows © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” In-Memory „Big Data“ Processing - Apache Spark Die wichtigsten Bewohner des „Hadoop-Zoos” Quelle: Uwe Seiler: Zoo voller Gehege, in: iX Developer Big Data, 2/2015, S.37 15 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” In-Memory „Big Data“ Processing - Apache Spark What’s the scoop with Hadoop? Der Link zum Video findet sich im Anhang dieser Präsentation. 16 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” In-Memory „Big Data“ Processing - Apache Spark Spark Libraries Spark SQL Spark Streaming GraphX MLlib SparkR Apache Spark 17 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” In-Memory „Big Data“ Processing - Apache Spark Spark on Hadoop Spark SQL Spark Streaming GraphX MLlib SparkR Apache Spark Slave node 1 18 Apache Hadoop-YARN Resource management Apache Hadoop-HDFS Storage management Slave node 2 … Slave node n Compute layer © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” In-Memory „Big Data“ Processing - Apache Spark IBM Open Platform with Apache Hadoop 100% open source code Commitment to currency: “days, not months” Includes Spark Free for production use Decoupled Apache Hadoop from IBM analytics and data science technologies Production support offering available IBM Open Platform with Apache Hadoop HDFS MapReduce Spark Hive HCatalog Pig YARN Ambari HBase Flume Sqoop Solr/Lucene Apache Open Source Components 19 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” In-Memory „Big Data“ Processing - Apache Spark IBM is Committed to Open Source Open source technologies are the base for IBM software and solutions IBM’s long history of deep open source commitment Apache Software Foundation: Founding member in 1999 Cloud Foundry: #1 contributor; Basis for Bluemix OpenStack: #4 contributor; Basis for IBM’s IaaS Linux: #3 contributor; IBM first enterprise backer of Linux Hadoop/Spark: Extensive investment in open source contribution; Integration with Analytics software Application Systems Infrastructure 20 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” In-Memory „Big Data“ Processing - Apache Spark IBM BigInsights for Apache Hadoop for Apache Hadoop IBM BigInsights Data Scientist Text Analytics IBM BigInsights Analyst Machine Learning with Big R IBM BigInsights Enterprise Management Big R POSIX Distributed File System Big SQL Big SQL BigSheets BigSheets Multi-workload, Multi-tenant scheduling IBM Open Platform with Apache Hadoop 21 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Link: http://www.sidap.de 22 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: SIDAP Echtzeit-Betriebsanalyse – Betrieb auf Basis globaler Erfahrung Otto von Bismarck, Winston Churchill: „Der Weise lernt von den Fehlern der Anderen, der Dumme aus den Eigenen” Device: Gerät, Apparat, Einrichtung, Verfahren, Einheit 23 Quelle: Dr. T. Pötter, Dr. M. Steffen (beide Bayer Technology Services) © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: SIDAP Echtzeit-Betriebsanalyse – Betrieb auf Basis globaler Erfahrung Quelle: Dr. T. Pötter, Dr. M. Steffen (beide Bayer Technology Services) 24 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: SIDAP Echtzeit-Betriebsanalyse – Betrieb auf Basis globaler Erfahrung Quelle: Dr. T. Pötter, Dr. M. Steffen (beide Bayer Technology Services) 25 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: SIDAP Echtzeit-Betriebsanalyse – Betrieb auf Basis globaler Erfahrung Die Qualität der Vorhersagen von Fehlern wächst mit der Anzahl der Einträge in die Datenbasis. Quelle: Dr. T. Pötter, Dr. M. Steffen (beide Bayer Technology Services) 26 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: SIDAP Datenaggregation im Rahmen von SIDAP Quelle: Dr. T. Pötter, Dr. M. Steffen (beide Bayer Technology Services) 27 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: SIDAP Use-Cases 28 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: SIDAP Use-Cases und Aufgabenbereiche der Partner 29 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: SIDAP Echtzeit-Betriebsanalyse (Operations Analysis) Raw Logs and Machine Data Real-time Monitoring InfoSphere Streams Capture Data Stream SPSS Modeler Identify Anomaly Decision Management Historical Reporting and Analysis InfoSphere BigInsights Raw Data SPSS Modeler Aggregate Results Dashboard/BI Predict and Classify Data Warehouse SPSS Modeler Predict and Score 30 Store Results Federated Navigation and Discovery © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: Internet of Things IBM’s Internet of Things Foundation Secure Device Registration Scalable Device Connectivity Historian Visual wiring PAYG SaaS pricing Powered by IBM MessageSight technology Manage Assemble Collect Connect 31 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: Internet of Things IoT becomes a Composable Business IoT end-end solutions Connected appliance solutions, Smarter home solutions… IoT-related Bluemix services Rules, Push, Geo location, Analytics, Asset management, Predictive Maintenance… App tips open community IoT Foundation Secure Device Registration, Scalable Device Connectivity, Historian, Visual wiring Device recipe open community Devices & Gateways 32 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: Internet of Things IBM Bluemix – How to build a smarter App Der Link zum Video findet sich im Anhang dieser Präsentation. 33 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: Internet of Things IBM Internet of Things Foundation 34 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: Internet of Things IBM Internet of Things Foundation – Status Quo 35 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: Internet of Things IBM Internet of Things Foundation – Status Quo 36 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: Internet of Things IBM Internet of Things Foundation – Status Quo 37 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: Internet of Things IBM Internet of Things Foundation – Status Quo 38 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: Internet of Things IBM Internet of Things Foundation – Status Quo 39 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: Internet of Things Device Recipes make it faster Connect 40 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Exkurs: Internet of Things Connect Application Connection Publish the same data to many applications with MQTT Visualisation on app or Brower based interface App on Bluemix Access control via Application Registration & Secure Token Compose with other IoT Services in Bluemix e.g. HD Insights, Notification services, Twitter analytics 41 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” IBM Smarter Analytics für „Big Data” Exkurs: Internet of Things Beispiel 42 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” IBM Smarter Analytics für „Big Data” Exkurs: Internet of Things Beispiel 43 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Empfehlung Ia IBM Bluemix Link: http://www.ibm.com/bluemix 44 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Empfehlung Ib IBM Internet of Things Zone auf Bluemix IBM Internet of Things Foundation IoT Zone in Bluemix ibm.biz/try_iot oder https://bluemix.net/solutions/iot 45 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Empfehlung II IBM Watson Analytics Link: http://www.ibm.co m/analytics/watson -analytics/ 46 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Empfehlung III „Big Data“ University Link: http://bigdatauniversity.co m/ 47 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Dipl. Wirtsch.-Ing. Axel J. Schwarz Mobik: +49 171 5619419 E-Mail: [email protected] Software Client Architect Vielen Dank! 48 © 2015 IBM Corporation IBM Smarter Analytics für „Big Data” Quellen und Referenzen Literatur/Blog-Beiträge Ryan Baxter: Bluemix and the Internet of Things http://ryanjbaxter.com/2014/07/16/bluemix-and-the-internet-of-things/ Bernard Marr: Spark Or Hadoop -- Which Is The Best Big Data Framework? http://www.forbes.com/sites/bernardmarr/2015/06/22/spark-or-hadoop-which-is-the-best-big-data-framework/2/ Paul Miller - IBM Backs Apache Spark For Big Data Analytics http://www.forbes.com/sites/paulmiller/2015/06/15/ibm-backs-apache-spark-for-big-data-analytics/ Videos IBM IBM Big Data: How it works - https://www.youtube.com/watch?v=u5jWC89xBzI What’s the Scoop with Hadoop? - https://www.youtube.com/watch?v=BcGqiJXcDo8 How to build a smarter app - https://www.youtube.com/watch?v=E5TKNbWl2Co Learn how to create a mobile app quickly using IBM BlueMix! - https://www.youtube.com/watch?v=DbXN2mz90b0 Links IBM Emerging Technology - Node-RED: http://nodered.org IBM developerWorks - IoT: https://www.ibmdw.net/iot/ IBM developerWorks - Build a cloud-ready temperature sensor with the Arduino Uno and the IBM IoT Foundation: http://www.ibm.com/developerworks/cloud/library/cl-bluemix-arduino-iot1/index.html Bluemi Bildquellen IBM sowie lizenzfreie digitale Bilder der Medien-Bibliotheken: 123RF (http://www.123rf.com) und Stock.XCHNG (http:///www.sxc.hu). 49 © 2015 IBM Corporation