Analyze Big Data Live Yourself
Hands-on Workshop on IBM InfoSphere BigInsights
Harald Gröger, Wilfried Hoge, Gerhard Wenzel
IBM
© 2013 IBM Corporation

Agenda
15:00 - 15:10  Introduction to the IBM Big Data Platform and BigInsights
15:15 - 15:25  Lab 1: Managing your big data environment
15:25 - 16:05  Lab 2: Analyzing big data with BigSheets
16:05 - 16:10  Demo: BigSheets Highlights
16:10 - 16:20  Demo: Text Analytics Highlights

What Is Big Data?
• Volume (Data at Scale): Terabytes to petabytes of data
• Variety (Data in Many Forms): Structured, unstructured, text, multimedia
• Velocity (Data in Motion): Analysis of streaming data to enable decisions within fractions of a second.
• Veracity (Data Uncertainty): Managing the reliability and predictability of inherently imprecise data types.

The IBM Big Data Zone Architecture
[Architecture diagram. Labels: Real-time Analytics; Intelligence Analysis; Data in Motion; Integrated Exploration; Ingestion and Integration; Decision Management; Streams; Data at Rest; ETL, Quality, MDM; Data in Many Forms; Landing, Analytics and Archive; BI and Predictive Analytics; Warehouse / Marts; Navigation and Discovery; MapReduce; Hadoop; Information Governance, Security and Business Continuity]

What Is Hadoop?
Apache™ Hadoop® is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers.
MapReduce - The framework that understands and assigns work to the nodes in a cluster.
HDFS - A file system that spans all the nodes in a Hadoop cluster for data storage. It links together the file systems on many local nodes to make them into one big file system.
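To make the MapReduce idea above a little more concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases, using word counting as the task. It is only an illustration and not Hadoop or BigInsights code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are assumptions made for this example. In a real cluster the framework distributes these phases across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence (hypothetical driver)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Sample input invented for this sketch.
    sample = ["big data at scale", "data in motion", "Data at rest"]
    print(word_count(sample))
```

Each call to `map_phase` depends only on its own input line, which is what lets a framework run many map tasks in parallel on different nodes before the grouped results are reduced.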
HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes.
• Scalable: add nodes without changing data formats, how data is loaded, how jobs are written, or the applications on top.
• Cost effective: massively parallel computing on commodity servers with a sizeable decrease in storage cost, which makes it affordable to model all your data.
• Flexible: schema-less, can absorb any type of data; data from multiple sources can be joined and aggregated in arbitrary ways, enabling deep analyses.
• Fault tolerant: when a node is lost, work is redirected to another location of the data and processing continues.

Scope of the IBM BigInsights Hadoop Distribution
[Chart axes: Enterprise class; Breadth of capabilities]
Editions:
• PureData for Hadoop: Appliance simplicity
• Enterprise Edition: Sold by # of terabytes managed
• Quick Start Edition: New for V2.1. Free. Non-production only
• Basic Edition: Free download
Capabilities:
• Apache Hadoop, Jaql, Integrated install
• Enterprise ready
  - Integrated web console
  - Administrative tools, security
  - RDBMS, warehouse connectivity
  - Enterprise Integration
  - Performance Optimization
  - Pre-built applications
• Analytics included
  - Visualization Capabilities
  - Spreadsheet-style tool
  - Big SQL
  - Text analytics
  - Eclipse development
  - Accelerators
PureData for Hadoop brings BigInsights to the market in an appliance form factor.

General Information
• Name
  • VM hostname = bivm
• Login
  • User = biadmin
  • Password = biadmin

Tutorial - Managing your Big Data environment
• Duration: approx. 10 minutes
• Start the "BigInsights Web Console" via the desktop icon,
• then continue with Chapter 2 / Lesson 1 / Step 3 (page 4).

Tutorial - Analyzing Big Data with BigSheets
• Duration: approx. 40 minutes
• All prerequisites are already met.
• The data has been downloaded and imported.
• Start in the Files tab of the BigInsights Web Console
• with Lesson 1 / Step 3 (page 14), (hdfs/biginsights/sheets/Watson_data_preloaded)
• End after Lesson 6 / Step 3 (page 21).

Console Demo

BigSheets Demo
From unstructured text to formatted spreadsheets and charts
[Diagram labels: Blog; News; Spreadsheet Format; Chart]

Text Analytics Demo
From unstructured text documents to text analytics result table
[Diagram labels: unstructured text; text highlight; generate; create; Labels / Examples; AQL Regex / Dictionary; AQL Candidates: combination of regex and dictionaries plus distance, case, ...; AQL Filter: duplicates, irrelevant candidates, ...; Result Table]

Thank You!