Faculty of Science, Department of Informatics, University of Fribourg, Switzerland

Inductive Fuzzy Classification in Marketing Analytics

Doctoral Thesis presented to the Faculty of Science of the University of Fribourg, Switzerland, for the award of the academic grade of Doctor of Computer Science, Doctor scientiarum informaticarum, Dr. sc. inf.

by Michael Alexander Kaufmann from Recherswil SO

Thesis No: 1751
Bern: Ruf 2012

For Alice. Follow the white rabbit.

"The Caterpillar took the hookah out of its mouth and yawned once or twice, and shook itself. Then it got down off the mushroom, and crawled away into the grass, merely remarking as it went, 'One side will make you grow taller, and the other side will make you grow shorter.' 'One side of what? The other side of what?' thought Alice to herself. 'Of the mushroom,' said the Caterpillar, just as if she had asked it aloud; and in another moment it was out of sight." Lewis Carroll (1916, p. 29)

Acknowledgements

A scientific thesis organically evolves from a primeval soup of ideas originating from countless individuals past and present. I merely tamed and organized these ideas and developed them further. In this light, I thank all fellow human beings who made my thesis possible. I thank my first advisor, Professor Andreas Meier, for teaching me how to conduct scientific research. I will always remember those great doctoral seminars we had all across Switzerland. Also, I thank my second and third advisors, Professors Laurent Donzé and Kilian Stoffel, for agreeing to evaluate my dissertation. I thank my fellow doctoral students in the Information Systems Research Group, Daniel Fasel, Edy Portmann, Marco Savini, Luis Teràn, Joël Vogt, and Darius Zumstein, for inspiring me with their research and for all the good times we had at the educational and social events we attended together. Furthermore, I thank my students Cédric Graf and Phillipe Mayer, who implemented my research ideas as prototypes in their master's theses. The former achieved such quality that we decided to publish his results in a book chapter. Last, but not least, I thank David Wyder, professional data miner, for allowing me to use PostFinance's information system as a research lab for the first experiments using my research approach, which laid the foundation for my thesis proposal and for my first publication. Special thanks go to friends and family for supporting me through years of hard work. I know it wasn't always easy with me.

Rubigen, Switzerland, February 29th, 2012
Michael Alexander Kaufmann

Abstract

"Inductive fuzzy classification" (IFC) is the process of assigning individuals to fuzzy sets whose membership functions are based on inductive inferences. In this thesis, different methods for membership function induction and multivariate aggregation are analyzed. For univariate membership function induction, the current thesis proposes normalized comparisons (ratios and differences) of likelihoods. For example, a normalized likelihood ratio can represent a membership degree to an inductive fuzzy class. If the domain of the membership function is numeric, continuous membership functions can be derived using piecewise affine interpolation. If a target attribute is continuous, it can be mapped into the "Zadehan" domain of numeric truth-values between 0 and 1, and membership degrees can be computed by a normalized ratio of likelihoods of fuzzy events.
A methodology for multivariate IFC for prediction has been developed as part of this thesis: First, data is prepared into a matrix format. Second, the relevant attributes are selected. Third, for each relevant attribute, a membership function to the target class is induced. Fourth, the attributes are fuzzified by transforming their values into membership degrees in the inductive fuzzy target class. Fifth, for every data record, the membership degrees of the single attribute values are aggregated into a membership degree in the target class. Finally, the prediction accuracy of the inductive membership degrees in comparison to the target variable is evaluated.

The proposed membership function induction method can be applied to analytics for selection, visualization, and prediction. First, transformation of attributes into inductive membership degrees in fuzzy target classes provides a way to test the strength of target associations of categorical and numerical variables using the same measure. Thus, relevant attributes with a high target correlation can be selected. Second, the resulting membership functions can be visualized as a graph. This provides an intuitive depiction of target associations for different values of relevant attributes and allows human decision makers to recognize interesting parts of attribute domains regarding the target. Third, transformation of attribute values into membership degrees in inductive fuzzy classes can improve the predictions of statistical models because nonlinear associations can be represented in membership functions.

In marketing analytics, these methods can be applied in several domains. Customer analytics on existing data using attribute selection and visualization of inductive membership functions provides insights into different customer classes. Product analytics can be improved by evaluating the likelihood of product usage in the data for different customer characteristics with inductive membership functions. Transforming customer attributes into inductive membership degrees, which are based on predictive models, can optimize target selection for individual marketing and enhance the response rate of campaigns. This can be embedded in integrated analytics for individual marketing. A case study is presented, in which IFC was applied in online marketing at a Swiss financial service provider. Fuzzy classification was compared to crisp classification and to random selection. The case study showed that, for individual marketing, a scoring approach can lead to better response rates than a segmentation approach because it compensates for threshold effects.

A prototype was implemented that supports all steps of the prediction methodology. It is based on a script interpreter that translates inductive fuzzy classification language (IFCL) statements into corresponding SQL commands and executes them on a database server. This software supports all steps of the proposed methodology, including data preparation, membership function induction, attribute selection, multivariate aggregation, data classification and prediction, and evaluation of predictive models. The IFCL software was applied in an experiment in order to evaluate the properties of the proposed methods for membership function induction. These algorithms were applied to 60 sets of real data, of which 30 had a binary target variable and 30 had a gradual target variable, and they were compared to existing methods. Different parameters were tested in order to induce an optimal configuration of IFC.
Experiments showed a significant improvement in average predictive performance of logistic regression for binary targets and regression trees for gradual targets when, prior to model computation, the attributes were inductively fuzzified using normalized likelihood ratios or normalized likelihood differences respectively. 8 Zusammenfassung Induktive unscharfe Klassifikation (inductive fuzzy classification, IFC) ist der Prozess der Zuordnung von Individuen zu unscharfen Mengen, deren Zugehörigkeitsfunktionen auf induktiven Schlussfolgerungen basieren. In der vorliegenden These werden verschiedene Methoden für die Induktion von Zugehörigkeitsfunktionen und deren multivariaten Aggregation analysiert. Es wird vorgeschlagen, für die Induktion von univariaten Zugehörigkeitsfunktionen Vergleiche (Verhältnisse und Differenzen) von Wahrscheinlichkeiten (Likelihoods) zu normalisieren. Zum Beispiel kann das normalisierte Wahrscheinlichkeitsverhältnis (normalized Likelihood Ratio, NLR) einen Zugehörigkeitsgrad zu einer induktiven unscharfen Klasse repräsentieren. Wenn der Wertebereich der Zugehörigkeitsfunktion nummerisch ist, können stetige Zugehörigkeitsfunktionen über eine stückweise affine Interpolation hergeleitet werden. Wenn die Zielvariable stetig ist, kann sie in den „Zadeh’schen“ Wertebereich der nummerischen Wahrheitswerte zwischen 0 und 1 abgebildet werden; und die Zugehörigkeitsgrade können als normalisierte Verhältnisse von empirischen bedingten Wahrscheinlichkeiten unscharfer Ereignisse (Likelihoods of fuzzy events) berechnet werden. Eine Methode für multivariate induktive unscharfe Klassifikation wurde in dieser These entwickelt. Als erstes werden die Daten in einem Matrix-Format aufbereitet. Als zweites werden die relevanten Attribute ausgewählt. Drittens wird für jedes relevante Attribut eine Zugehörigkeitsfunktion zur Zielklasse mittels normalisierten Likelihood-Vergleichen induziert. Viertens werden die Attributwerte in Zugehörigkeitsgrade zur induktiven unscharfen Zielklasse transformiert. Fünftens werden die Zugehörigkeitsgrade der einzelnen Attributwerte jedes Datensatzes in einen Zugehörigkeitsgrad in der Zielklasse aggregiert. Schliesslich wird die Vorhersageleistung der induktiven Zugehörigkeitsgrade im Vergleich mit der effektiven Zielvariable evaluiert. Die vorgeschlagenen Methoden können in der Analytik für Selektion, Visualisierung und Prognose angewendet werden. Erstens bietet die Transformation von Attributen in Zugehörigkeitswerte zu induktiven unscharfen Mengen ein Mittel, um die Stärke der Zielassoziation von nummerischen und kategorischen Variablen mit der gleichen Metrik zu testen. So können relevante Attribute mit einer hohen Korrelation mit dem Ziel selektiert werden. Zweitens können die resultierenden Zugehörigkeitsfunktionen als Graphen visualisiert werden. Das bietet eine intuitive Darstellung der Zielassoziation von 9 verschiedenen Werten von relevanten Attributen, und ermöglicht es menschlichen Entscheidungsträgern, die interessanten Teilbereiche der Attributdomänen bezüglich des Ziels zu erkennen. Drittens kann die Transformation der Attributwerte in Zugehörigkeitswerte zu induktiven unscharfen Klassen die Prognose von statistischen Modellen verbessern, weil nicht-lineare Zusammenhänge in den Zugehörigkeitsfunktionen abgebildet werden können. In der Marketing-Analytik können diese Methoden in verschiedenen Bereichen angewendet werden. 
Kundenanalytik aufgrund von bestehenden Daten mittels Attributselektion und Visualisierung von induktiven Zugehörigkeitsfunktionen liefert Erkenntnisse über verschiedene Kundenklassen. Produktanalytik kann verbessert werden, indem die Wahrscheinlichkeit der Produktnutzung in den Daten für verschiedene Kundenmerkmale mit induktiven Zugehörigkeitsfunktionen evaluiert wird. Zielgruppenselektion im individualisierten Marketing kann optimiert werden, indem Kundenattribute in induktive Zugehörigkeitsgrade transformiert werden, um die Rücklaufrate von Kampagnen zu erhöhen, welche auf prädiktiven Modellen basieren. Dies kann für individualisiertes Marketing in die integrierte Analytik eingebettet werden. Eine Fallstudie wird vorgestellt, in der die induktive unscharfe Klassifikation bei einem Schweizer Finanzdienstleister im online Marketing angewendet wurde. Die unscharfe Klassifikation wurde mit der scharfen Klassifikation und einer Zufallsauswahl verglichen. Dies zeigte, dass im individualisierten Marketing ein Scoring-Ansatz zu besseren Rücklaufquoten führen kann als ein Segmentierungsansatz, weil Schwellenwerteffekte kompensiert werden können. Ein Prototyp wurde implementiert, welcher alle Schritte der multivariaten IFC-Methode unterstützt. Er basiert auf einem SkriptInterpreter, welcher Aussagen der Sprache für die induktive unscharfe Klassifikation (Inductive Fuzzy Classification Language, IFCL) in entsprechende SQL-Kommandos übersetzt und auf einem Datenbankserver ausführt. Diese Software unterstützt sämtliche Schritte der vorgeschlagenen Methode, einschliesslich Datenvorbereitung, Induktion von Zugehörigkeitsfunktionen, Attributselektion, multivariate Aggregation, Datenklassifikation, Prognose, und die Evaluation von Vorhersagemodellen. Die Software wurde in einem Experiment angewendet, um die Eigenschaften der vorgeschlagenen Methoden der Induktion von Zugehörigkeitsfunktionen zu testen. Diese Algorithmen wurden bei sechzig Dateien mit realen Daten angewendet, davon bei dreissig mit binärer Zielvariable und dreissig mit einer graduellen Zielvariable. Sie wurden mit existierenden Methoden verglichen. Verschiedene Parameter wurden getestet, um eine optimale Konfiguration der 10 induktiven unscharfen Klassifikation zu induzieren. Die Experimente zeigten, dass die durchschnittliche Prognoseleistung der logistischen Regression für binäre Zielvariablen und der Regressionsbäume für graduelle Zielvariablen signifikant verbessert werden kann, wenn vor der Berechnung der Prognosemodelle die Attribute mit normalisierten Likelihood-Verhältnissen respektive mit normalisierten LikelihoodDifferenzen in Zugehörigkeitsgrade zu induktiven unscharfen Mengen transformiert werden. 11 12 Table of Contents Acknowledgements ..................................................................................... 5 Abstract ....................................................................................................... 7 Zusammenfassung ...................................................................................... 9 Table of Contents ...................................................................................... 13 List of Figures ........................................................................................... 17 List of Tables ............................................................................................. 19 1 A Gradual Concept of Truth ............................................................... 
21 1.1 Research Questions ...................................................................... 22 1.2 Thesis Structure ........................................................................... 24 2 Fuzziness & Induction ........................................................................ 27 2.1 Deduction ...................................................................................... 27 2.1.1 Logic ....................................................................................... 27 2.1.2 Classification.......................................................................... 30 2.2 Fuzziness ...................................................................................... 33 2.2.1 Fuzzy Logic ............................................................................ 35 2.2.2 Fuzzy Classification............................................................... 40 2.3 Induction ....................................................................................... 43 2.3.1 Inductive Logic....................................................................... 43 2.3.2 Inductive Classification ......................................................... 46 2.4 Inductive Fuzzy Classification ☼ ................................................ 51 2.4.1 Univariate Membership Function Induction ....................... 52 2.4.2 Multivariate Membership Function Induction ☺ ................ 60 3 Analytics & Marketing ....................................................................... 63 3.1 Analytics ☼ ................................................................................... 63 3.1.1 Selection ................................................................................. 67 3.1.2 Visualization .......................................................................... 69 3.1.3 Prediction ............................................................................... 71 3.2 Marketing Analytics ☼ ................................................................ 75 3.2.1 Customer Analytics ............................................................... 76 3.2.2 Product Analytics .................................................................. 78 13 3.2.3 Fuzzy Target Groups ............................................................. 80 3.2.4 Integrated Analytics .............................................................. 82 3.2.5 Case Study: IFC in Online Marketing at PostFinance ☺ ... 85 4 Prototyping & Evaluation .................................................................. 93 4.1 Software Prototypes ..................................................................... 93 4.1.1 IFCQL..................................................................................... 93 4.1.2 IFCL ....................................................................................... 94 4.1.3 IFC-Filter for Weka ☼ ........................................................... 99 4.2 Empirical Tests .......................................................................... 105 4.2.1 Experiment Design .............................................................. 105 4.2.2 Results .................................................................................. 107 5 Precisiating Fuzziness by Induction ................................................ 115 5.1 Summary of Contribution .......................................................... 115 5.2 Discussion of Results ................................................................. 
117 5.3 Outlook and Further Research .................................................. 119 6 Literature References ....................................................................... 123 A. Appendix .......................................................................................... 129 A.1. Datasets used in the Experiments ........................................... 129 A.1.1. Attribute Selection Filters ................................................. 133 A.1.2. Record Selection Filters ..................................................... 134 A.2. Database Queries...................................................................... 135 A.2.1. Database Queries for Experiments with Sharp Target Variables .......................................................................................... 135 A.2.2. Database Queries for Experiments with Numerical Target Variables .......................................................................................... 138 A.2.3. Database Queries for Both Types of Experiments ........... 143 A.3. Experimental Results Tables ................................................... 149 A.4. IFCL Syntax and Application .................................................. 157 A.4.1. Grammar Notation............................................................. 157 A.4.2. IFCL File Structure ........................................................... 157 A.4.3. Executing IFCL Files in Batch Mode ................................ 158 A.4.4. Connecting to the Database .............................................. 158 A.4.5. Droping a Database Table ................................................. 159 14 A.4.6. Executing an SQL Script ................................................... 159 A.4.7. Loading Data into The Database ...................................... 159 A.4.8. Inducing a Membership Function ..................................... 161 A.4.9. Classification of Data ......................................................... 162 A.4.10. Aggregating Multiple Variables ...................................... 162 A.4.11. Evaluating Predictions .................................................... 163 A.4.12. Data Preparation ............................................................. 164 A.4.13. Attribute Selection ........................................................... 165 A.5. Curriculum Vitae ...................................................................... 167 A.6. Key Terms and Definitions ...................................................... 167 ☺ This section is based on a publication by Kaufmann & Meier (2009) ☼ This section is based on a publication by Kaufmann & Graf (2012) 15 16 List of Figures Figure 1: The motivation of the current research was to develop and evaluate inductive methods for the automated derivation of membership functions for fuzzy classification and to propose possible applications to marketing analytics. ......................................... 23 Figure 2: Thesis structure. This thesis has three thematic categories that are reflected in the structure: theory, application, and technology. ......................................................................................... 25 Figure 3: There is no square. Adapted from “Subjective Contours” by G. Kaniza, 1976, ................................................................................... 33 Figure 4: Black or white: conventional yin and yang symbol with a sharp distinction between opposites, representing metaphysical dualism. 
..................................................................................................... 34 Figure 5: Shades of grey: fuzzy yin and yang symbol with a gradation between opposites, representing metaphysical monism. ...... 34 Figure 6: A visualization of a classical set with sharp boundaries. ....... 36 Figure 7: A visualization of a fuzzy set.................................................... 36 Figure 8: Fuzzy set theory applied to the sorites paradox. .................... 37 Figure 9: Unsupervised IFC by percentile rank. .................................... 53 Figure 10: Computation of membership functions for numerical variables. ................................................................................................... 60 Figure 11: Proposed method for multivariate IFC. ................................. 60 Figure 12: Visualization of relevant attributes and their association with the target as an inductive membership function. ....... 70 Figure 14: Proposed schema for multivariate IFC for prediction. ......... 72 Figure 15: IFC prediction with binary target classes based on categorical attributes. ............................................................................... 73 Figure 16: IFC prediction with Zadehan target classes based on numerical attributes. ................................................................................ 73 Figure 17: Proposed areas of application of IFC to marketing analytics. ................................................................................................... 76 Figure 18: Schema of a fuzzy customer profile based on IFC-NLR, together with two examples of a fuzzy customer profile. ....................... 78 17 Figure 19: Inductive fuzzy classification for individual marketing. ...... 84 Figure 20: Analytics applied to individual marketing in the online channel at PostFinance. ........................................................................... 86 Figure 21: Membership function induction for a categorical attribute. ................................................................................................... 88 Figure 22: Membership function induction for a continuous attribute. ................................................................................................... 89 Figure 23: Architecture of the iFCQL interpreter. ................................. 94 Figure 24: Inductive fuzzy classification language (IFCL) system architecture. .............................................................................................. 95 Figure 25: Software architecture of the IFC-Filter............................... 100 Figure 26: Knowledge flow with IFC-Filters. ........................................ 101 Figure 27: Visualization of membership functions with the IFCFilter. ....................................................................................................... 103 Figure 28: Average prediction correlation for different aggregation methods for binary targets. .................................................................... 107 Figure 29: Average prediction correlation for different aggregation methods for fuzzified numerical targets. ............................................... 108 Figure 30: Average prediction correlation for combinations of membership function induction and aggregation methods for binary targets. ......................................................................................... 
109 Figure 31: Average prediction correlation of predictions with linear versus percentile ITF, given regression trees as the aggregation method. ............................................................................... 109 Figure 32: Average prediction correlation for combinations of membership function induction and aggregation methods for fuzzified numerical targets. ................................................................... 110 Figure 33: Relationship between target linearity and IAF benefit for binary target variables. .................................................................... 111 Figure 34: Relationship between target linearity and IAF benefit for fuzzified numerical target variables. ............................................... 112 18 List of Tables Table 1 Proposed Formulae for Induction of Membership Degrees ....... 57 Table 2 Example of an Attribute Ranking regarding NLR/Target Correlation ................................................................................................ 68 Table 3 Conditional Probabilities and NLRs for a Categorical Attribute .................................................................................................... 87 Table 4 Optimization of the Gamma Parameter for Multivariate Fuzzy Aggregation .................................................................................... 90 Table 5 Resulting Product Selling Ratios per Target Group ................. 92 Table 6 Different Aggregation Methods used for Prediction Experiments ............................................................................................ 106 Table 7 Correlation and p-Value of Dataset Parameters and Improvement of Logistic Regression with IAF for Binary Target Variables ................................................................................................. 111 Table 8 Correlation and p-Value of Dataset Parameters and Improvement of Regression Trees with IAF for Zadean Target Variables ................................................................................................. 112 Table 9 Data Sets with Categorical Target Variable, Transformed into a Binary Target ............................................................................... 129 Table 10 Data Sets with Numerical Target Variable, Transformed into a Fuzzy Proposition µ (see Formula 46 and Formula 47) ............ 131 ↑ Table 11 Average Prediction Correlations per Dataset/Aggregation Method for Binary Targets ..................................................................... 149 Table 12 Average Prediction Correlations per Dataset and Aggregation Method for Numerical Target Variables .......................... 150 Table 13: Average Prediction Correlations for IAF on Data with Numerical Target Variables using Linear versus Percentile ITF ....... 151 Table 14 Average Prediction Correlations per Dataset and Three Different Combinations of Supervised Aggregation and IAF for Binary Target Variables ......................................................................... 152 Table 15 Average Prediction Correlations per Dataset and Three Different Combinations of Supervised Aggregation and IAF for Numerical Target Variables ................................................................... 153 19 Table 16 Relationship between Target Linearity and Improvement of Logistic Regression by IAF NLR for Binary Target Variables ......... 
154 Table 17 Relationship between Target Linearity and Improvement of Regression Trees by IAF NLD for Numerical Target Variables ...... 155 20 1 A Gradual Concept of Truth “Would you describe a single grain of wheat as a heap? No. Would you describe two grains of wheat as a heap? No. … You must admit the presence of a heap sooner or later, so where do you draw the line?” (Hyde, 2008) How many grains does it take to constitute a heap? This question is known as the sorites paradox (Hyde, 2008). It exemplifies that our semantic universe is essentially vague, and with any luck, this vagueness is ordinal and gradual. This applies to all kinds of statements. Especially in science, different propositions or hypotheses can only be compared to each other with regard to their relative accuracy or predictive power. Fuzziness is a term that describes vagueness in the form of boundary imprecision. Fuzzy concepts are those that are not clearly delineated, such as the concept of a “heap of grain.” The classical notion of truth claims metaphysical dualism, and thus divides thought into exactly two categories: true and false. A gradual concept of truth leads our consciousness toward a metaphysical monism: all possible statements belong to the same class; but there is a gradation of degree. The continuum of propositions, ranging from completely false to completely true, contains all the information that is in-between. Fuzzy set theory provides a tool for mathematically precise definitions of fuzzy concepts, if those concepts can be ordered: assigning gradual membership degrees to their elements. This gradual concept of truth is the basis for fuzzy logic or approximate reasoning, as proposed by Zadeh (1975a) and Bellmann and Zadeh (1977). Fuzzy logic, based on the concept of fuzzy sets introduced by Zadeh (1965), allows propositions with a gradual truth-value and, thus, supports approximate reasoning, gradual and soft consolidation. Fuzzy propositions define fuzzy classes, which allow gradual, fuzzy class boundaries. In data analysis, or “the search for structure in data” (Zimmermann H. J., 1997), fuzzy classification is a method for gradation in data consolidation, as presented by Meier, Schindler, and Werro (2008) and Del Amo, Montero, and Cutello (1999). The application of fuzzy classification to marketing analytics (Spais & Veloutsou, 2005) has the advantage of precisiation (sic; Zadeh, 2008) of fuzzy concepts in the context of decision support for direct customer contact, as proposed by Werro (2008). This precisiation can be achieved by inducing membership functions to fuzzy target classes (Setnes, Kaymak, & van Nauta Lemke, 1998). 21 Inductive, or probabilistic, inference and fuzzy, or gradual, logic have been seen as incompatible, for example, by Elkan (1993). Whereas probabilistic induction can amplify experience, membership functions can precisiate vagueness. Combined, measured probabilities can be applied to precisely define the semantics of vague or fuzzy concepts by membership function induction. This thesis, eventually, demonstrates how probabilistic and fuzzy logics can be synthesized to constitute a method of inductive gradual reasoning for classification. 1.1 Research Questions Werro (2008) and Meier et al. (2008) proposed the application of Fuzzy classification to customer relationship management (CRM). A suggestion for further research by Werro (2008), namely the integration of data mining techniques into fuzzy classification software, has inspired the present thesis. 
As illustrated in Figure 1, the motivation of the current research was to develop and evaluate inductive methods for the automated derivation of membership functions for fuzzy classification and to propose possible applications to marketing analytics. The first motivation for research on “inductive fuzzy classification” (IFC) was the development of methods for automated derivations of understandable and interpretable membership functions to fuzzy classes. The aim was to develop and evaluate algorithms and methods to induce functions that indicate inductively inferred target class memberships. The second motivation was to improve marketing analytics with IFC. Kaufmann and Meier (2009) have presented a methodology and a case study for the application of IFC to predictive product affinity scoring for target selection. This thesis extends the application of this methodology to marketing in the field of integrated customer and product analytics in order to provide means to deal with fuzziness in marketing decisions and to enhance accountability by application of analytics to precisiate fuzzy marketing concepts, as proposed by Spais and Veloutsou (2005). The third motivation was to develop a prototype implementation of software for computing IFCs, showing the feasibility of the proposed algorithms as a proof of concept and allowing an evaluation of the proposed methodology. Seven research questions were developed at the beginning of the current research (Kaufmann, 2008). These questions guided the development of the thesis from the beginning. The answers are 22 presented in the subsequent document and summarized in the conclusions of the thesis. 1. What is the theoretical basis of IFC and what is its relation to inductive logic? 2. How can membership functions be derived inductively from data? 3. How can a business enterprise apply IFC in practice? 4. How can the proposed methods be implemented in a computer program? 5. How is IFC optimally applied for prediction? 6. Which aggregation methods are optimal for the multivariate combination of fuzzy classes for prediction? 7. Can it be statistically supported that the proposed method of IFC improves predictive performance? Figure 1: The motivation of the current research was to develop and evaluate inductive methods for the automated derivation of membership functions for fuzzy classification and to propose possible applications to marketing analytics. Following a constructive approach to business informatics and information systems research (Oesterle, et al., 2010), this thesis presents a design of new methods for IFC and proposes applications of 23 these new methods to analytics in marketing. Thus, the methodology for developing the thesis includes the following: The theoretical background is analyzed prior to presentation of the constructive design, a case study exemplifies the designed approach, software prototyping shows technical feasibility and enables experimental evaluation, and empirical data collection in combination with statistical inference is applied to draw conclusions about the proposed method. 1.2 Thesis Structure As shown in Figure 2, this thesis has three thematic categories that are reflected in the structure. First, the theoretical foundations of IFC are examined. This theory is based on a synthesis of fuzzy (gradual) and inductive (probabilistic) logics. Second, business applications of this approach are analyzed. A possible application is studied for analytic quantitative decision support in marketing. 
Third, technological aspects of the proposed method are examined by software prototypes for evaluation of the proposed constructs. The second chapter, which addresses theory, analyzes the theoretical foundations of IFC by approaching it from the viewpoints of logic, fuzziness, and induction. It presents proposals of likelihood-based methods for membership function induction. In Section 2.1, the basic concepts of logic and classification are analyzed. In Section 2.2, cognitive and conceptual fuzziness are discussed and approaches for resolution of conceptual fuzziness, fuzzy logic, and fuzzy classification are outlined. In Section 2.3, inductive classification, and induction automation are examined. In Section 2.4, the application of induction to fuzzy classification is explained, and a methodology for membership function induction using normalized ratios and differences of empirical conditional probabilities and likelihoods is proposed. The third chapter, which deals with designing applications of IFC, presents a study of analytics and examines the application of membership function induction to this discipline in marketing. In Section 3.1, the general methodology of logical data analysis, called analytics, is analyzed, and applications of IFC to three sub-disciplines, selection, visualization, and prediction, are proposed. In Section 3.2, applications of IFC to analytic marketing decision support, or marketing analytics (MA), are listed. A case study is presented, in which the proposed methods are applied to individual online marketing. The fourth chapter, which presents designs of technology for IFC, explains prototype implementations of the proposed IFC methodology and shows evaluation results. In Section 4.1, three prototype implementations of IFC are presented. The prototype of an inductive 24 fuzzy classification language (IFCL) is described. This description encompasses the architecture of the software and its functionality. Two additional prototypes that were developed in collaboration with master’s students guided by the author, iFCQL (Mayer, 2010) and IFC-Filter (Graf, 2010), are briefly discussed. In Section 4.2, a systematic experimental application of the IFCL prototype is described and the empirical results of testing the predictive performance of different parameters of the proposed methodology are statistically analyzed. Figure 2: Thesis structure. This thesis has three thematic categories that are reflected in the structure: theory, application, and technology. The fifth chapter, which concludes the thesis, summarizes the scientific contributions of the current research, discusses the results, and proposes further research topics for IFC. Literature references are indicated in the sixth chapter. Research details can be found in Chapter A, the appendix at the end of the thesis. 25 26 2 Fuzziness & Induction This chapter examines the foundations of IFC by analyzing the concepts of deduction, fuzziness, and induction. The first subsection explains the classical concepts of sharp and deductive logic and classification; in this section, it is presupposed that all terms are clearly defined. The second section explains what happens when those definitions have fuzzy boundaries and provides the tools, fuzzy logic and fuzzy classification, to reason about this. However, there are many terms that do not only lack a sharp boundary of term definition but also lack a priori definitions. 
Therefore, the third subsection discusses how such definitions can be inferred through inductive logic and how such inferred propositional functions define inductive fuzzy classes. Finally, this chapter proposes a method to derive precise definitions of vague concepts—membership functions—from data. It develops a methodology for membership function induction using normalized likelihood comparisons, which can be applied to fuzzy classification of individuals.

2.1 Deduction

This subsection discusses deductive logic and classification, analyzes the classical as well as the mathematical (Boolean) approaches to propositional logic, and shows their application to classification. Deduction provides a set of tools for reasoning about propositions with a priori truth-values—or inferences of such values. Thus, in the first subsection, the concepts of classical two-valued logic and algebraic Boolean logic are summarized. The second subsection explains how propositional functions imply classes and, thus, provide the mechanism for classification.

2.1.1 Logic

In the words of John Stuart Mill (1843), logic is "the science of reasoning, as well as an art, founded on that science" (p. 18). He points out that the most central entity of logic is the statement, called a proposition:

The answer to every question which it is possible to frame, is contained in a proposition, or assertion. Whatever can be an object of belief, or even of disbelief, must, when put into words, assume the form of a proposition. All truth and all error lie in propositions. What, by a convenient misapplication of an abstract term, we call a truth, is simply a true proposition. (p. 27)

The central role of propositions indicates the importance of linguistics in philosophy. Propositions are evaluated for their truth and thus assigned a truth-value, because knowledge and insight are based on true statements. Consider the universe of discourse in logic: the set of possible statements or propositions, 𝒫. Logicians believe that there are different levels of truth, usually two (true or false); in the general case, there is a set, 𝒯, of possible truth-values that can be assigned to propositions. Thus, a proposition p ∈ 𝒫 is a meaningful piece of information to which a truth-value, τ(p) ∈ 𝒯, can be assigned. The corresponding mapping τ: 𝒫 ⟶ 𝒯 from propositions 𝒫 to truth-values 𝒯 is called a truth function. In general logic, operators can be applied to propositions. A unary operator, O₁: 𝒫 ⟶ 𝒯, maps a single proposition to a transformed truth-value. Accordingly, a binary operator, O₂: 𝒫 × 𝒫 ⟶ 𝒯, assigns a truth-value to a combination of two propositions, and an n-ary operator, Oₙ: 𝒫 × ⋯ × 𝒫 ⟶ 𝒯, is a mapping of a combination of n propositions to a new truth-value.

The logic of two-valued propositions is the science and art of reasoning about statements that can be either true or false. In the case of two-valued logic, or classical logic (CL), the set of possible truth values, 𝒯_CL := {true, false}, contains only two elements, which partitions the class of imaginable propositions 𝒫 into exactly two subclasses: the class of false propositions and the class of true ones. With two truth-values, there is only one possible unary operator other than identity: a proposition, p ∈ 𝒫, can be negated (not p), which inverts the truth-value of the original proposition. Accordingly, for a combination of two propositions, p and q, each with two truth-values, there are 16 (2⁴) possible binary operators, because each of the four possible input combinations can be mapped to one of two truth-values.
The most common binary logical operators are disjunction, conjunction, implication, and equivalence: A conjunction of two propositions, p and q, is true if both propositions are true. A disjunction of two propositions, p or q, is true if one of the propositions is true. An implication of q by p is true if, whenever p is true, q is true as well. An equivalence of two propositions is true if p implies q and q implies p.

Classical logic is often formalized in the form of a propositional calculus. The syntax of classical propositional calculus is described by the concept of variables, unary and binary operators, formulae, and truth functions. Every proposition is represented by a variable (e.g., p); every proposition and every negation of a proposition is a term; every combination of terms by logical operators is a formula; terms and formulae are themselves propositions; negation of the proposition p is represented by ¬p; conjunction of the two propositions p and q is represented by p ∧ q; disjunction of the two propositions p and q is represented by p ∨ q; implication of the proposition q by the proposition p is represented by p ⇒ q; equivalence between the two propositions p and q is represented by p ≡ q; and there is a truth function, τ_CL: 𝒫 ⟶ 𝒯_CL, mapping from the set of propositions 𝒫 into the set of truth values 𝒯_CL. The semantics of propositional calculus are defined by the values of the truth function, as formalized in Formula 1 through Formula 5.

Formula 1: $\tau_{CL}(\neg p) := \mathrm{false}$ if $\tau_{CL}(p) = \mathrm{true}$, and $\mathrm{true}$ otherwise

Formula 2: $\tau_{CL}(p \wedge q) := \mathrm{true}$ if $\tau_{CL}(p) = \tau_{CL}(q) = \mathrm{true}$, and $\mathrm{false}$ otherwise

Formula 3: $\tau_{CL}(p \vee q) := \mathrm{false}$ if $\tau_{CL}(p) = \tau_{CL}(q) = \mathrm{false}$, and $\mathrm{true}$ otherwise

Formula 4: $\tau_{CL}(p \Rightarrow q) := \tau_{CL}(\neg p \vee q)$

Formula 5: $\tau_{CL}(p \equiv q) := \tau_{CL}((p \Rightarrow q) \wedge (q \Rightarrow p))$

George Boole (1847) realized that logic can be calculated using the numbers 0 and 1 as truth values. His conclusion was that logic is mathematical in nature:

I am then compelled to assert, that according to this view of the nature of Philosophy, Logic forms no part of it. On the principle of a true classification, we ought no longer to associate Logic and Metaphysics, but Logic and Mathematics. (p. 13)

In Boole's mathematical definition of logic, the numbers 1 and 0 represent the truth-values, and logical connectives are derived from arithmetic operations: subtraction from 1 as negation and multiplication as conjunction. All other operators can be derived from these two operators through application of the laws of logical equivalence. Thus, in Boolean logic (BL), the corresponding propositional calculus is called Boolean algebra, stressing the conceptual switch from metaphysics to mathematics. Its syntax is defined in the same way as that of CL, except that the Boolean truth function, τ_BL: 𝒫 ⟶ 𝒯_BL, maps from the set of propositions into the set of Boolean truth values, 𝒯_BL := {0, 1}, that is, the set of the two numbers 0 and 1. The Boolean truth function τ_BL defines the semantics of Boolean algebra. It is calculated using multiplication as conjunction and subtraction from 1 as negation, as formalized in Formula 6 through Formula 8. Implication and equivalence can be derived from negation and disjunction in the same way as in classical propositional calculus.

Formula 6: $\tau_{BL}(\neg p) := 1 - \tau_{BL}(p)$

Formula 7: $\tau_{BL}(p \wedge q) := \tau_{BL}(p) \cdot \tau_{BL}(q)$

Formula 8: $\tau_{BL}(p \vee q) := \tau_{BL}(\neg(\neg p \wedge \neg q)) = 1 - (1 - \tau_{BL}(p)) \cdot (1 - \tau_{BL}(q))$
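To make Boole's arithmetic reading of the connectives concrete, the following minimal Python sketch (an illustration, not code from the thesis) evaluates Formulas 6 through 8 on the truth values 0 and 1. Because the same arithmetic accepts any value in the interval [0, 1], these functions already anticipate the Zadehan generalization discussed in Section 2.2.

```python
# Illustrative sketch: Boolean logic computed arithmetically on {0, 1},
# following Formulas 6-8. Implication and equivalence are derived from
# negation and disjunction, as in classical propositional calculus.

def neg(p: int) -> int:          # Formula 6: subtraction from 1
    return 1 - p

def conj(p: int, q: int) -> int:  # Formula 7: multiplication
    return p * q

def disj(p: int, q: int) -> int:  # Formula 8: via negation and conjunction
    return neg(conj(neg(p), neg(q)))

def impl(p: int, q: int) -> int:  # p => q := not p or q (Formula 4)
    return disj(neg(p), q)

def equiv(p: int, q: int) -> int:  # p == q := (p => q) and (q => p) (Formula 5)
    return conj(impl(p, q), impl(q, p))

if __name__ == "__main__":
    # Enumerate the truth table for implication.
    for p in (0, 1):
        for q in (0, 1):
            print(f"{p} => {q} : {impl(p, q)}")
```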
2.1.2 Classification

Class logic, as defined by Glubrecht, Oberschelp, and Todt (1983), is a logical system that supports statements applying a classification operator. Classes of objects can be defined according to logical propositional functions. According to Oberschelp (1994), a class, C = {i ∈ U | Π(i)}, is defined as a collection of individuals, i, from a universe of discourse, U, satisfying a propositional function, Π, called the classification predicate. The domain of the classification operator, {. | .}: ℙ ⟶ U*, is the class of propositional functions ℙ, and its range is the power class of the universe of discourse, U*, which is the class of possible subclasses of U. In other words, the class operator assigns subsets of the universe of discourse to propositional functions. A universe of discourse is the set of all possible individuals considered, and an individual is a real object of reference. In the words of Bertrand Russell (1919), a propositional function is "an expression containing one or more undetermined constituents, such that, when values are assigned to these constituents, the expression becomes a proposition" (p. 155).

In contrast, classification is the process of grouping individuals who satisfy the same predicate into a class. A (Boolean) classification corresponds to a membership function, μ_C: U ⟶ {0, 1}, which indicates with a Boolean truth-value whether an individual is a member of a class, given the individual's classification predicate. As shown by Formula 9, the membership μ_C of individual i in class C = {i ∈ U | Π(i)} is defined by the truth-value τ of the classification predicate Π(i). In Boolean logic, the truth-values are assumed to be certain. Therefore, classification is sharp because the truth values are either exactly 0 or exactly 1.

Formula 9: $\mu_C(i) := \tau(\Pi(i)) \in \{0, 1\}$

Usually, the classification predicate that defines classes refers to attributes of individuals. For example, the class "tall people" is defined by the predicate "tall," which refers to the attribute "height." An attribute, X, is a function that characterizes individuals by mapping from the universe of discourse U to the set of possible characteristics χ (Formula 10).

Formula 10: $X: U \longrightarrow \chi$

Formula 11: $\mu_C(i) := \tau(X(i) = c)$

There are different types of values encoding characteristics. Categorical attributes have a discrete range of symbolic values. Numerical attributes have a range of numbers, which can be natural or real. Boolean attributes have Boolean truth-values {0, 1} as a range. Ordinal attributes have a range of categories that can be ordered.

On one hand, the distinction between univariate and multivariate classification, the variety, depends on the number of attributes considered for the classification predicate. The dimensionality of the classification, on the other hand, depends on the number of dimensions, or linearly independent attributes, of the classification predicate domain. In a univariate classification (UC), the classification predicate Π refers to one attribute, X, and it is true for an individual, i, if the feature X(i) equals a certain characteristic, c ∈ χ (Formula 11).
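As a minimal illustration of Formulas 9 through 11, a sharp univariate classification can be sketched as a membership function that returns exactly 0 or 1. The individuals and the threshold predicate for "tall" below are invented examples; an equality test on a categorical attribute, as in Formula 11, works the same way.

```python
# Sketch of a sharp (Boolean) univariate classification. The universe of
# discourse U is a list of individuals with attributes; the individuals
# and the predicate "height >= 180 cm" are invented for illustration.

U = [
    {"name": "a", "height_cm": 192},
    {"name": "b", "height_cm": 171},
    {"name": "c", "height_cm": 180},
]

def mu_tall(i: dict) -> int:
    """Formula 9: membership is the Boolean truth-value of the predicate."""
    return 1 if i["height_cm"] >= 180 else 0

tall_people = [i["name"] for i in U if mu_tall(i) == 1]
print(tall_people)  # ['a', 'c'] -- every individual is either in the class or not
```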
In a multivariate classification (MVC), the classification predicate refers to multiple attributes. The classification predicate is true for an individual, i, if an aggregation, a, of several characteristic constraints has a given value, c ∈ χ (Formula 12).

Formula 12: $\mu_C(i) := \tau\bigl(a(X_1(i), \dots, X_n(i)) = c\bigr)$

A multidimensional classification (MDC) is a special case of a multivariate classification that refers to n-tuples of attributes, such that the resulting class is functionally dependent on the combination of all n attributes (Formula 13).

Formula 13: $\mu_C(i) := \tau\bigl((X_1(i), \dots, X_n(i)) = (c_1, \dots, c_n)\bigr)$

This distinction between multivariate and multidimensional classification is necessary for the construction of classification functions. Multivariate classifications can be derived as functional aggregates of one-dimensional membership functions, in which the influence of one attribute on the resulting aggregate does not depend on the other attributes. In contrast, in multidimensional classification, the combination of all attributes determines the membership value, and thus, one attribute has different influences on the membership degree for different combinations with other attribute values. Therefore, multidimensional classifications need multidimensional membership functions that are defined on n-tuples of possible characteristics.

2.2 Fuzziness

"There are many misconceptions about fuzzy logic. Fuzzy logic is not fuzzy. Basically, fuzzy logic is a precise logic of imprecision and approximate reasoning." (Zadeh, 2008, p. 2751)

Fuzziness, or vagueness (Sorensen, 2008), is an uncertainty regarding concept boundaries. In contrast to ambiguous terms, which have several meanings, vague terms have one meaning, but the extent of it is not sharply distinguishable. For example, the word tall can be ambiguous, because a tall cat is usually smaller than a small horse. Nevertheless, the disambiguated predicate "tall for a cat" is vague, because its linguistic concept does not imply a sharp border between tall and small cats.

Our brains seem to love boundaries. Perhaps making sharp distinctions quickly was a key cognitive ability in evolution. Our brains are so good at recognizing limits that they construct limits where there are none. This is what many optical illusions are based on: for example, Kaniza's (1976) illusory square (Figure 3).

Figure 3: There is no square. Adapted from "Subjective Contours" by G. Kaniza, 1976. Copyright 1976 by Scientific American, Inc.

An ancient symbol of sharp distinction between classes is the yin and yang symbol (Figure 4). It symbolizes a dualistic worldview—the cosmos divided into light and dark, day and night, and so on.

Figure 4: Black or white: conventional yin and yang symbol with a sharp distinction between opposites, representing metaphysical dualism. Adapted from http://www.texample.net/tikz/examples/yin-and-yang/ (accessed 02.2012) with permission (creative commons license CC BY 2.5).

Figure 5: Shades of grey: fuzzy yin and yang symbol with a gradation between opposites, representing metaphysical monism.

Nevertheless, in reality, the transition between light and dark is gradual during the 24 hours of a day. This idea of gradation of our perceptions can be visualized by a fuzzy yin and yang symbol (Figure 5). Sorensen (2008) explains that many-valued logics have been proposed to solve the philosophical implications of vagueness. One many-valued approach to logic is fuzzy logic, which allows infinite truth-values in the interval between 0 and 1.
In the next section, membership functions, fuzzy sets, and fuzzy propositions are introduced; these are the bases for fuzzy logic, which, in fact, is a precise logic for fuzziness. Additionally, it is shown how fuzzy classifications are derived from fuzzy propositional functions.

2.2.1 Fuzzy Logic

Lotfi Zadeh (2008) said, "Fuzzy logic is not fuzzy" (p. 2751). Indeed, it is a precise mathematical concept for reasoning about fuzzy (vague) concepts. If the domain of those concepts is ordinal, membership can be distinguished by its degree. In classical set theory, an individual, i, of a universe of discourse, U, is either completely a member of a set or not at all. As previously explained, according to Boolean logic, the membership function μ_S: U ⟶ {0, 1} for a crisp set S maps from individuals to sharp truth-values. As illustrated in Figure 6, a sharp set (the big dark circle) has a clear boundary, and individuals (the small bright circles) are either a member of it or not. However, one individual is not entirely covered by the big dark circle, but is also not outside of it. In contrast, a set is called fuzzy by Zadeh (1965) if individuals can have a gradual degree of membership to it. In a fuzzy set, as shown by Figure 7, the limits of the set are blurred. The degree of membership of the elements in the set is gradual, illustrated by the fuzzy gray edge of the dark circle. The membership function, μ_F: U → [0, 1], for a fuzzy set, F, indicates the degree to which individual i is a member of F in the interval between 0 and 1. In Figure 6, the degree of membership of the small circles i is defined by a normalization n of their distance d from the center c of the big dark circle b, μ_b(i) = n(d(i, c)). In the same way as in classical set theory, set operators can construct complements of sets and combine two sets by union and intersection. Those operators are defined by the fuzzy membership function. In the original proposal of Zadeh (1965), the set operators are defined by subtraction from 1, minimum, and maximum. The complement of a fuzzy set, F, is derived by subtracting its membership function from 1; the union of two sets, F ∪ G, is derived from the maximum of the membership degrees; and the intersection of two sets, F ∩ G, is derived from the minimum of the membership degrees.

Figure 6: A visualization of a classical set with sharp boundaries.

Figure 7: A visualization of a fuzzy set.

Accordingly, fuzzy subsets and fuzzy power sets can be constructed. Consider the two fuzzy sets A and B on the universe of discourse U. In general, A is a fuzzy subset of B if the membership degrees of all its elements are smaller than or equal to the membership degrees of the elements in B (Formula 14). Thus, a fuzzy power set, B*, of a (potentially fuzzy) set B is the class of all its fuzzy subsets (Formula 15).

Formula 14: $A \subseteq B := \forall x \in U: \mu_A(x) \leq \mu_B(x)$

Formula 15: $B^* := \{A \subseteq U \mid A \subseteq B\}$

With the tool of fuzzy set theory in hand, the sorites paradox cited in the introduction (Chapter 1) can be tackled in a much more satisfying manner. A heap of wheat grains can be defined as a fuzzy subset, Heap ⊆ ℕ, of natural numbers ℕ of wheat grains. A heap is defined in the English language as "a great number or large quantity" (merriamwebster.com, 2012b). For instance, one could agree that 1,000 grains of wheat is a large quantity, and between 1 and 1,000, the "heapness" of a grain collection grows logarithmically. Thus, the membership function of the number of grains n ∈ ℕ in the fuzzy set Heap can be defined according to Formula 16. The resulting membership function is plotted in Figure 8.

Formula 16: $\mu_{Heap}(n) := \begin{cases} 0 & \text{if } n = 0 \\ 1 & \text{if } n > 1000 \\ 0.1448 \cdot \ln(n) & \text{otherwise} \end{cases}$

Figure 8: Fuzzy set theory applied to the sorites paradox.
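A minimal Python sketch (an illustration, not code from the thesis) evaluates this heap membership function together with Zadeh's original set operators of complement, intersection, and union on membership degrees:

```python
import math

def mu_heap(n: int) -> float:
    """Membership of n grains in the fuzzy set Heap (Formula 16)."""
    if n == 0:
        return 0.0
    if n > 1000:
        return 1.0
    # 0.1448 is approximately 1/ln(1000), so the degree reaches 1 at 1,000 grains.
    return 0.1448 * math.log(n)

# Zadeh's original fuzzy set operators, expressed on membership degrees:
def complement(mu: float) -> float:          # subtraction from 1
    return 1.0 - mu

def intersection(mu_f: float, mu_g: float) -> float:  # minimum
    return min(mu_f, mu_g)

def union(mu_f: float, mu_g: float) -> float:          # maximum
    return max(mu_f, mu_g)

if __name__ == "__main__":
    for n in (0, 1, 10, 100, 500, 5000):
        print(n, round(mu_heap(n), 3))   # e.g. 10 -> 0.333, 100 -> 0.667
    # 100 grains belong to Heap to degree ~0.667 and to "not a heap" to ~0.333.
    print(round(complement(mu_heap(100)), 3))
```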
Based on the concept of fuzzy sets, Zadeh (1975a) derived fuzzy propositions (FP) for approximate reasoning: A fuzzy proposition has the form "x is L," where x is an individual of a universe of discourse U and L is a linguistic term, defined as a fuzzy set on U. As stated by Formula 17, the truth-value τ of a fuzzy proposition is defined by the degree of membership μ_L of x in the linguistic term L.

Formula 17: $\tau(x \text{ is } L) := \mu_L(x)$

If A is an attribute of x, a fuzzy proposition can also refer to the corresponding attribute value, such as x is L := A(x) is L. The fuzzy set L on U is equivalent to the fuzzy set L on the domain of the attribute, or dom(A). In fact, the set can be defined on arbitrarily deep-nested attribute hierarchies concerning the individual. As an example, let us look at the fuzzy proposition, "Mary is blond." In this sentence, the linguistic term "blond" is a fuzzy set on the set of people, which is equivalent to a fuzzy set blond on the color of people's hair (Formula 18).

Formula 18: $\tau(\text{"Mary is blond"}) = \mu_{blond}(\text{Mary}) \equiv \mu_{blond}(\text{color}(\text{hair}(\text{Mary})))$

Fuzzy propositions can be combined to construct fuzzy formulae using the usual logic operators not (¬), and (∧), and or (∨), for which the semantics are defined by the fuzzy truth function τ_FP: ℱ ⟶ [0, 1], mapping from the class of fuzzy propositions ℱ into the set of Zadehan truth values in the interval between 0 and 1. Let "x is P" and "x is Q" be two fuzzy propositions on the same individual. Then their combination to fuzzy formulae is defined as follows (Formula 19 through Formula 21): negation by the complement of the corresponding fuzzy set, conjunction by intersection of the corresponding fuzzy sets, and disjunction by union of the corresponding fuzzy sets.

Formula 19: $\tau(\neg(x \text{ is } P)) := \mu_{\bar{P}}(x)$

Formula 20: $\tau(x \text{ is } P \wedge x \text{ is } Q) := \mu_{P \cap Q}(x)$

Formula 21: $\tau(x \text{ is } P \vee x \text{ is } Q) := \mu_{P \cup Q}(x)$

Zadeh's fuzzy propositions are derived from statements of the form "X is Y." They are based on the representation operator is: U × U* ⟶ ℱ, mapping from the universe of discourse U and its fuzzy power set U* to the class of fuzzy propositions ℱ. Consequently, fuzzy propositions in the sense of Zadeh are limited to statements about degrees of membership in a fuzzy set. Generally, logic with fuzzy propositions—or more precisely, a propositional logic with Zadehan truth values in the interval between 0 and 1, a "Zadehan logic" (ZL)—can be viewed as a generalization of Boole's mathematical analysis of logic to a gradual concept of truth. In that sense, ZL is a simple generalization of Boolean logic (BL), in which the truth value of any proposition is not only represented by numbers, but also can be anywhere in the interval between 0 and 1. According to the Stanford Encyclopedia of Philosophy (Hajek, 2006), fuzzy logic, in the narrow sense, is a "symbolic logic with a comparative notion of truth developed fully in the spirit of classical logic" ("Fuzzy Logic," paragraph 3). If ZL is viewed as a generalization of BL, fuzzy propositions of the form "X is Y" are a special case, and propositions and propositional functions of any form can have gradual values of truth.
Accordingly, ZL is defined by the truth function τ : 𝒫 ⟶ 𝒯, mapping from the class of propositions 𝒫 to the set of Zadehan truth-values 𝒯 := [0,1]. Consequently, fuzzy set membership is a special case of a fuzzy proposition, and the degree of membership of individual x in another individual y can be defined as the truth-value of the fuzzy proposition x ∈ y (Formula 22).

Formula 22: $\mu_y(x) := \tau(x \in y)$

Formula 23: $\tau(\neg p) := 1 - \tau(p)$

Formula 24: $\tau(p \wedge q) := \tau(p) \cdot \tau(q)$

The Zadehan truth function τ defines the semantics of ZL. As in Boolean algebra, its operators can be defined by subtraction from 1 as negation and by multiplication as conjunction, as formalized in Formula 23 and Formula 24. Disjunction, implication, and equivalence can be derived from negation and conjunction in the same way as in Boolean logic.

In that light, any proposition with an uncertain truth-value greater than 0 and smaller than 1 is a fuzzy proposition. Additionally, every function with the range [0,1] can be thought of as a truth function for a propositional function. For example, the statistical likelihood L(y|x) can be seen as a truth function for the propositional function “y is likely if x,” as a function of x. This idea is the basis for the IFC approach proposed in the next section. The usefulness of this generalization is shown in the chapter on applications, in which fuzzy propositions such as “customers with characteristic X are likely to buy product Y” are assigned truth-values that are computed using quantitative prediction modeling.

2.2.2 Fuzzy Classification

A fuzzy class, C := ~{ i ∈ U | Π(i) }, is defined as a fuzzy set C of individuals i whose membership degree is defined by the Zadehan truth-value of the proposition Π(i). The classification predicate Π is a propositional function interpreted in ZL. The domain of the fuzzy class operator, ~{ . | . } : ℙ ⟶ U*, is the class of propositional functions ℙ, and its range is the fuzzy power set U* (the set of fuzzy subsets) of the universe of discourse U. In other words, the fuzzy class operator assigns fuzzy subsets of the universe of discourse to propositional functions. Fuzzy classification is the process of assigning individuals a membership degree to a fuzzy set, based on their degrees of truth of the classification predicate. It has been discussed, for example, by Zimmermann (1997), Del Amo et al. (1999), and Meier et al. (2008). A fuzzy classification is achieved by a membership function, μ_C : U ⟶ [0,1], that indicates the degree to which an individual is a member of a fuzzy class C, given the corresponding fuzzy propositional function Π. This membership degree is defined by the Zadehan truth-value of the corresponding proposition Π(i), as formalized in Formula 25.

Formula 25: $\mu_C(i) := \tau(\Pi(i))$

In the same way as in crisp classification, the fuzzy classification predicate refers to attributes of individuals. Additionally, Zadehan logic introduces two new types of characteristics: Zadehan attributes have a range of truth-values represented by 𝒯 := [0,1]; linguistic attributes have a range of linguistic terms (fuzzy sets) together with the Zadehan truth-value of membership in those terms (Zadeh, 1975b). In a univariate fuzzy classification (UFC), the fuzzy classification predicate Π refers to one attribute, X, and the membership degree corresponds to the degree of membership of the attribute characteristic X(i) in a given fuzzy restriction (Zadeh, 1975a), R ∈ χ*, which is a fuzzy subset of the set of possible characteristics χ (Formula 26).

Formula 26: $\mu_C(i) := \tau(X(i) \text{ is } R)$
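A minimal Python sketch (illustrative, not the thesis prototype) of the Zadehan connectives of Formulas 23 and 24, with disjunction derived from them via De Morgan, looks as follows.

```python
def z_not(p: float) -> float:
    """Zadehan negation (Formula 23)."""
    return 1.0 - p

def z_and(p: float, q: float) -> float:
    """Zadehan conjunction as a product (Formula 24)."""
    return p * q

def z_or(p: float, q: float) -> float:
    """Disjunction derived from negation and conjunction (De Morgan): p + q - p*q."""
    return z_not(z_and(z_not(p), z_not(q)))

# Two fuzzy propositions with truth-values 0.8 and 0.4:
print(z_and(0.8, 0.4))  # 0.32
print(z_or(0.8, 0.4))   # 0.88
```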
In a multivariate fuzzy classification (MFC), Π refers to multiple attributes. The truth function of the classification predicate for an individual i equals an aggregation, a, of several fuzzy restrictions of multiple attribute characteristics, X_j(i), j = 1, …, n (Formula 27).

Formula 27: $\mu_C(i) := a(\tau(X_1(i) \text{ is } R_1), \ldots, \tau(X_n(i) \text{ is } R_n))$

In a multidimensional fuzzy classification (MDFC), Π refers to n-tuples of functionally independent attributes. The membership degree of individuals in a multidimensional class is based on an n-dimensional fuzzy restriction R (Formula 28), which is a multidimensional fuzzy set on the Cartesian product of the attribute ranges with a multidimensional membership function μ_R : range(X_1) × … × range(X_n) ⟶ [0,1].

Formula 28: $\mu_C(i) := \tau((X_1(i), \ldots, X_n(i)) \text{ is } R)$

2.3 Induction

Given a set of certainly true statements, deduction works fine. The problem is that the only certainty philosophy can offer is Descartes’s “I think, therefore I am”; however, postmodern philosophers are not so sure about the I anymore (Precht, 2007, p. 62 ff). Therefore, one needs a tool for reasoning under uncertainty, and this tool is induction. In this chapter, inductive logic is analyzed, the application of induction to fuzzy classification is discussed, and a methodology for membership function induction using normalized ratios and differences of empirical conditional probabilities and likelihoods is proposed.

2.3.1 Inductive Logic

Traditionally, induction is defined as drawing general conclusions from particular observations. Contemporary philosophy has shifted to a different view because, not only are there inductions that lead to particular conclusions, but there are also deductions that lead to general conclusions. According to Vickers (2009) in the Stanford Encyclopedia of Philosophy (SEP), it is agreed that induction is a form of inference that is “contingent” and “ampliative” (“The contemporary notion of induction,” paragraph 3), in contrast to deductive inference, which is necessary and explicative. Induction is contingent because inductively inferred propositions are not necessarily true in all cases. It is ampliative because, in Vickers’s words, “induction can amplify and generalize our experience, broaden and deepen our empirical knowledge” (“The contemporary notion of induction,” paragraph 3). In another essay in the SEP, inductive logic is defined as “a system of evidential support that extends deductive logic to less-than-certain inferences” (Hawthorne, 2008, “Inductive Logic,” paragraph 1). Hawthorne admits that there is a degree of fuzziness in induction: In an inductive inference, “the premises should provide some degree of support for the conclusion” (“Inductive Logic,” para. 1). The degree of support for an inductive inference can thus be viewed as a fuzzy restriction of possible inferences, in the sense of Zadeh (1975a). Vickers (2009) explains that the problem of induction is twofold: The epistemic problem is to define a method for distinguishing appropriate from inappropriate inductive inference. The metaphysical problem is to explain in what substance the difference between reliable and unreliable induction actually exists. Epistemologically, the question of induction is to find a suitable method to infer propositions under uncertainty. State-of-the-art methods rely on empirical probabilities or likelihoods. There are many interpretations of probability (Hájek, 2009).
For the context of this thesis, one may agree that a mathematical probability, P(A), numerically represents how probable it is that a specific proposition A is true, P(A) ≡ τ(“A is probable”), and that the disjunction of all possible propositions, the probability space Ω, is certain, i.e., P(Ω) = 1. In practice, probabilities can be estimated by relative frequencies, or sampled empirical probabilities p in a sample of n observations, defined by the ratio between the number of observations i in which the proposition A is true and the total number of observations (Formula 29).

Formula 29: $P(A) \approx p(A) := \frac{\sum_{i=1}^{n} \tau(A_i)}{n}$

A conditional probability (Weisstein, 2010a) is the probability of an outcome x, given that y is the case, as formalized in Formula 30.

Formula 30: $P(x \mid y) = \frac{P(x \wedge y)}{P(y)}$

Empirical sampled conditional probabilities can be applied to compute likelihoods. According to James Joyce, “in an unfortunate, but now unavoidable, choice of terminology, statisticians refer to the inverse probability PH(E) as the ‘likelihood’ of H on E” (Joyce, 2003, “Conditional Probabilities and Bayes' Theorem,” paragraph 5). The likelihood of the hypothesis H is an estimate of how probable the evidence or known data E is, given that the hypothesis is true. Such a probability is called a “posterior probability” (Hawthorne, 2008, “Inductive Logic,” paragraph 5), that is, a probability after measurement, as shown by Formula 31.

Formula 31: $L(H \mid E) := p(E \mid H)$

In the sense of Hawthorne (2008), the general law of likelihood states that, for a pair of incompatible hypotheses H_1 and H_2, the evidence E supports H_1 over H_2 if and only if p(E|H_1) > p(E|H_2). The likelihood ratio (LR) measures the strength of evidence for H_1 over H_2 (Formula 32). Thus, the “likelihoodist” (sic; Hawthorne, 2008, “Likelihood Ratios, Likelihoodism, and the Law of Likelihood,” paragraph 5) solution to the epistemological problem of induction is the likelihood ratio as a measure of support for inductive inference.

Formula 32: $LR(H_1 > H_2 \mid E) := \frac{L(H_1 \mid E)}{L(H_2 \mid E)}$

According to Hawthorne, the prior probability of a hypothesis, p_0(H), that is, an estimated probability prior to the measurement of evidence E, plays an important role in inductive reasoning. Accordingly, Bayes’ theorem can be interpreted and rewritten using the measured posterior likelihood and the prior probability in order to apply it to the evaluation of scientific hypotheses. According to Hawthorne (2008), the posterior probability of hypothesis H conditional on evidence E is equal to the product of the posterior likelihood of H given E and the prior probability of H, divided by the (measured) probability of E (Formula 33).

Formula 33: $p(H \mid E) = \frac{L(H \mid E) \cdot p_0(H)}{p(E)}$

What if there is fuzziness in the data, in the features of observations, or in the theories? How is likelihood measured when the hypothesis or the evidence is fuzzy? If this fuzziness is ordinal, that is, if the extent of membership in the fuzzy terms can be ordered, a membership function can be defined, and an empirical probability of fuzzy events can be calculated. Analogous to Dubois and Prade (1980), a fuzzy event A in a universe of discourse U is a fuzzy set on U with a membership function μ_A : U ⟶ [0,1]. For categorical elements of U, the estimated probability after n observations is defined as the average degree of membership of the observations i in A, as formalized in Formula 34.
Formula 34: $P(A) \approx p(A) = \frac{\sum_{i=1}^{n} \mu_A(i)}{n}$

By application of Formula 34 to Formula 31, the likelihood of an ordinal fuzzy hypothesis H, given ordinal fuzzy evidence E, can be defined as a conditional probability of fuzzy events, as shown in Formula 35.

Formula 35: $L(H \mid E) = p(E \mid H) = \frac{\sum_{i=1}^{n} \mu_{E \cap H}(i)}{\sum_{i=1}^{n} \mu_H(i)}$

The question of the metaphysical problem of induction is: What is the substance of induction? In what kind of material does the difference between reliable and unreliable inductive inference exist? The importance of this question cannot be overestimated, since reliable induction enables prediction. A possible answer could be that the substance of an induction is the amount of information contained in the inference. This answer presupposes that information is a realist category, as suggested by Chmielecki (1998). According to Shannon’s information theory (Shannon, 1948), the information contained in evidence x about hypothesis y is equal to the difference between the uncertainty (entropy) H(y) about the hypothesis y and the remaining uncertainty H_x(y) after observation of the evidence x:

$I(x, y) = H(y) - H_x(y) = \sum_x \sum_y p(x \wedge y) \log \frac{p(x \wedge y)}{p(x)\, p(y)}$

Shannon’s quantity of information is defined in terms of joint probabilities. However, by application of Shannon’s theory, the metaphysical problem of induction is transferred to a metaphysical problem of probabilities because, according to Shannon, the basic substance of information is the probability of two signals occurring simultaneously compared to the probability of their occurring individually. (One could link this solution to the concept of quantum physical particle probability waves [Greene, 2011], but this would go beyond the scope of this thesis and would be highly speculative; therefore, this link is not explored here. Suffice it to state that probability apparently is a fundamental construct of matter and waves as well as of information and induction.)

2.3.2 Inductive Classification

Inductive classification is the process of assigning individuals to a set based on a classification predicate derived by an inductive inference. Inductive classification can be automated as a form of supervised machine learning (Witten & Frank, 2005): a class of processes (algorithms or heuristics) that learn from examples to decide whether an individual i belongs to a given class y, based on its attributes. Generally, supervised machine learning processes induce a model from a dataset, which generalizes associations in the data in order to provide support for inductive inference. This model can be used for predicting the class membership of new data elements. Induced classification models, called classifiers, are first trained using a training set with known class membership. Then, they are applied to a test or prediction set in order to derive class membership predictions. Examples of classification learning algorithms that result in classifications are decision trees, classification rules, and association rules. In those cases, the model consists of logical formulae of attribute values, which predict a crisp class value. Data are signs (signals) that represent knowledge, such as numbers, characters, or bits. The basis for automated data analysis is a systematic collection of data on individuals. The most frequently used data structure for analytics is the matrix, in which every individual i (a customer, a transaction, a website, etc.) is represented by a row, and every attribute, X_k, is represented by a column.
Every characteristic X_k(i) of individual i for attribute X_k is represented by one scalar value within the matrix. A training dataset d is an m × (n + 1) matrix with m rows, n columns for X_1, …, X_n, and a column Y indicating the actual class membership. The columns X_k, 1 ≤ k ≤ n, are called analytic variables, and Y is called the target variable, which indicates membership in a target class y. In the case of a binary classification, for each row index i, the label Y(i) is equal to 1 if and only if individual i is in class y (Formula 36).

Formula 36: $Y(i) := \begin{cases} 1 & \text{if } i \in y \\ 0 & \text{else} \end{cases}$

A machine learning process for inductive sharp classification generates a model M_y(i), mapping from the Cartesian product of the analytic variable ranges into the set {0,1}, indicating inductive support for the hypothesis that i ∈ y. As discussed in the section on induction, the model should provide support for inductive inferences about an individual’s class membership: Given M_y(i) = 1, the likelihood of i ∈ y should be greater than the likelihood of i ∉ y. The inductive model M_y can be applied for prediction to a new dataset with an unknown class indicator, which is either a test set for performance evaluation or a prediction set, where the model is applied to forecast the class membership of new data. The test set or prediction set d′ has the same structure as the training set d, except that the class membership is unknown, and thus the target variable Y is empty. The classifier M_y, derived from the training set, can be used for predicting the class memberships of representations of individuals i ∈ d′. The model output M_y(i) yields an inductive classification defined by {i | M_y(i) = 1}.

In order to evaluate the quality of prediction of a crisp classifier model, several measures can be computed. In this section, the likelihood ratio and the Pearson correlation are mentioned. The greater the ratio between the likelihood of target class membership and the likelihood of target class non-membership, given a positive prediction, the better the inductive support of the model. Thus, the predictive model can be evaluated by the likelihood ratio of target class membership given the model output (Formula 37).

Formula 37: $LR(Y(i)=1 \mid M_y(i)=1) := \frac{p(M_y(i)=1 \mid Y(i)=1)}{p(M_y(i)=1 \mid Y(i)=0)}$

Working with binary or Boolean target indicators and model indicators allows the evaluation of predictive quality by a measure of correlation of the two variables M_y and Y (Formula 38). The correlation between two numerical variables can be measured by the Pearson correlation coefficient as the ratio between the covariance of the two variables and the square root of the product of the individual variances (Weisstein, 2010b).

Formula 38: $\mathrm{corr}(M_y, Y) = \frac{E[(M_y - \mathrm{avg}(M_y))(Y - \mathrm{avg}(Y))]}{\mathrm{stddev}(M_y) \cdot \mathrm{stddev}(Y)}$

The advantage of the correlation coefficient is its availability in database systems. Every standard SQL (Structured Query Language) database has an implementation of correlation as an aggregate function. Thus, using the correlation coefficient, evaluating the predictive performance of a model in a database is fast and simple. However, it is important to stress that evaluating predictions with a measure of correlation is only meaningful if the target variable as well as the predictive variable are Boolean, Zadehan, or numeric in nature. It will not work for ordinal or categorical target classes, unless they are transformed into a set of Boolean variables.
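To make the two evaluation measures concrete, the following minimal Python sketch (illustrative only; the data and function names are assumptions) computes the likelihood ratio of Formula 37 and the Pearson correlation of Formula 38 for a binary prediction against a binary target.

```python
from statistics import mean, pstdev

def likelihood_ratio(m, y):
    """LR of Formula 37: p(M=1 | Y=1) / p(M=1 | Y=0) for binary lists m and y."""
    pos = [mi for mi, yi in zip(m, y) if yi == 1]
    neg = [mi for mi, yi in zip(m, y) if yi == 0]
    return (sum(pos) / len(pos)) / (sum(neg) / len(neg))

def pearson_corr(m, y):
    """Pearson correlation of Formula 38 (population form)."""
    mm, my = mean(m), mean(y)
    cov = mean([(mi - mm) * (yi - my) for mi, yi in zip(m, y)])
    return cov / (pstdev(m) * pstdev(y))

# Toy example: model prediction M and actual class Y for eight individuals
M = [1, 1, 1, 0, 0, 0, 1, 0]
Y = [1, 1, 0, 0, 0, 1, 1, 0]
print(likelihood_ratio(M, Y), pearson_corr(M, Y))  # 3.0 and 0.5
```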
For example, in database marketing, the process of target group selection uses classifiers to select customers who are likely to buy a certain product. In order to do this, a classifier model can be computed in the following way: For a given set of customers C, it is known whether they have bought product A or not. Let c be an individual customer, and let C_A be the set of customers who bought product A. Then, the value Y(c) of the target variable Y for customer c is defined as in Formula 39.

Formula 39: $Y(c) := \begin{cases} 1 & \text{if } c \in C_A \\ 0 & \text{else} \end{cases}$

The analytic variables for customers are selected from every known customer attribute, such as age, location, transaction behavior, and recency, frequency, and monetary value of purchases. The aim of the classifier induction process is to learn a model, M_A, that provides a degree of support for the inductive inference that a customer is interested in the target product A. This prediction, M_A(c) ∈ {0,1}, should provide a better likelihood of identifying potential buyers of product A, and it should optimally correlate with the actual product usage of existing and future customers.

2.4 Inductive Fuzzy Classification

In the proposed research approach, IFC is understood as an inductive gradation of the degree of membership of individuals in classes. In many interpretations, the induction step consists of learning fuzzy rules (e.g., Dianhui, Dillon, & Chang, 2001; Hu, Chen, & Tzeng, 2003; Roubos, Setnes, & Abonyi, 2003; Wang & Mendel, 1992). In this thesis, IFC is understood more generally as inducing membership functions to fuzzy classes and assigning individuals to those classes. In general, a membership function can be any function mapping into the interval between 0 and 1. Consequently, IFC is defined as the process of assigning individuals to fuzzy sets for which membership functions are generated from data so that the membership degrees are based on an inductive inference. An inductive fuzzy class, y′, is defined by a predictive scoring model, M_y : U → [0,1], for membership in a class y. This model represents an inductive membership function for y′, which maps from the universe of discourse U into the interval between 0 and 1 (Formula 40).

Formula 40: $\mu_{y'} : U \to [0,1] := M_y$

Formula 41: $\tau(\text{“} i \text{ is likely a member of } y\text{”}) := M_y(i)$

Consider the following fuzzy classification predicate: P(i, y) := “i is likely a member of y.” This is a fuzzy proposition (Zadeh, 1975a) as a function of i and y, which indicates that there is inductive support for the conclusion that individual i belongs to class y. The truth function, τ, of this fuzzy propositional function can be defined by the membership function of an inductive fuzzy class, y′. Thus, P(i, y) is a fuzzy restriction on U defined by μ_y′ (Formula 41). In practice, any function that assigns values between 0 and 1 to data records can be used as a fuzzy restriction. The aim of IFC is to calculate a membership function to a fuzzy set of likely members of the target class. Hence, any type of classifier with a normalized numeric output can be viewed as an inductive membership function to the target class, or as a truth function for the fuzzy proposition P(i, y). State-of-the-art methods for IFC in that sense include linear regression, logistic regression, naïve Bayesian classification, neural networks, fuzzy classification trees, and fuzzy rules.
These are classification methods yielding numerical predictions that can be normalized in order to serve as a membership function to the inductive fuzzy class y′ (Formula 42).

Formula 42: $y' := \;\sim\!\{\, i \in U \mid i \text{ is likely a member of } y \,\}$

2.4.1 Univariate Membership Function Induction

This section describes methods for deriving membership functions for one variable based on inductive methods. First, unsupervised methods are described, which do not require learning from a target class indicator. Second, supervised methods for predictive membership functions are proposed.

Numerical attributes can be fuzzified in an unsupervised way, that is, without a target variable, by calculating a membership function to a fuzzy class “x is a large number,” denoted by the symbol ↑: the fuzzy set of attribute values that are large relative to the available data. This membership function, μ↑ : dom(C) ⟶ [0,1], maps from the domain of the numerical attribute into the set of Zadehan truth-values. This unsupervised fuzzification serves two purposes. First, it can be used to automatically derive linguistic interpretations of numerical data, such as “large” or “small.” Second, it can be used to transform numerical attributes into Zadehan target variables in order to calculate likelihoods of fuzzy events. Two approaches are proposed here to compute a membership function to this class: percentile ranks and linear normalization based on the minimum and maximum.

For a numeric or ordinal variable X with a value x ∈ dom(X), the percentile rank (PR) is equal to the sampled probability that the value of the variable X is smaller than x. This sampled probability is calculated as the percentage of observed values of X that are smaller than x, and it can be transformed into a degree of membership in a fuzzy set. Inductively, the sampled probability is taken as an indicator of the support for the inductive inference that a certain value x is large in comparison to the distribution of the other attribute values. The membership degree of x in the fuzzy class “relatively large number,” symbolized by ↑, is then defined as specified in Formula 43.

Formula 43: $\mu_{\uparrow}(x) := p(X < x)$

For example, customers can be classified by their profitability. The percentile rank of profitability can be viewed as a membership function of customers in the fuzzy set ↑ of customers with a high profitability. Figure 9 shows an example of an IFC-PR of customer profitability for a financial service provider.

Figure 9: Unsupervised IFC by percentile rank (membership in the classes high and low as a function of customer profitability). Adapted from “An Inductive Approach to Fuzzy Marketing Analytics,” by M. Kaufmann, 2009, In M. H. Hamza (Ed.), Proceedings of the 13th IASTED International Conference on Artificial Intelligence and Soft Computing. Copyright 2009 by Publisher.

A simpler variant of unsupervised fuzzification for generating a membership function for a relatively large number (↑) is linear normalization (IFC-LN). For a numerical attribute C, it is defined as the relative distance to the minimal attribute value, as specified in Formula 44.

Formula 44: $\mu_{\uparrow}(C(i)) := \frac{C(i) - \min(C)}{\max(C) - \min(C)}$
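The two unsupervised fuzzification variants can be sketched in a few lines of Python (an illustrative sketch under the definitions of Formulas 43 and 44, not the IFCL implementation; the profitability figures are invented).

```python
def mu_percentile_rank(values, x):
    """IFC-PR (Formula 43): share of observed values strictly smaller than x."""
    return sum(1 for v in values if v < x) / len(values)

def mu_linear_norm(values, x):
    """IFC-LN (Formula 44): relative distance of x to the minimum."""
    lo, hi = min(values), max(values)
    return (x - lo) / (hi - lo)

# Toy profitability figures for five customers
profit = [-800, -100, 50, 300, 900]
print([round(mu_percentile_rank(profit, v), 2) for v in profit])  # [0.0, 0.2, 0.4, 0.6, 0.8]
print([round(mu_linear_norm(profit, v), 2) for v in profit])      # [0.0, 0.41, 0.5, 0.65, 1.0]
```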
For the membership function induction (MFI) methods in the following sections, the target variable for supervised induction must be a Zadehan variable, Y : U ⟶ [0,1], mapping from the universe of discourse (the set of possible individuals) into the interval of Zadehan truth-values between 0 and 1. Thus, Y(i) indicates the degree of membership of individual i in the target class y. In the special case of a Boolean target class, Y(i) is equal to 1 if i ∈ y, and it is equal to 0 if i ∉ y. In the analytic training data, a target class indicator Y can be deduced from data attributes in the following way:

o If an attribute, A, is Zadehan with a range between 0 and 1, it can be defined directly as the target variable. In fact, if the variable is Boolean, this implies that it is also Zadehan, because it is a special case (Formula 45).

Formula 45: $\text{Zadehan}(A) \Rightarrow \mu_y(i) := A(i)$

o If an attribute, B, is categorical with a range of n categories, it can be transformed into n Boolean variables μ_{y_k} (k = 1, 2, …, n), where μ_{y_k}(i) indicates whether record i belongs to class k, as specified by Formula 46.

Formula 46: $\text{categorical}(B) \Rightarrow \mu_{y_k}(i) := \begin{cases} 1 & \text{if } B(i) = k \\ 0 & \text{else} \end{cases}$

o If an attribute, C, is numeric, this thesis proposes the application of an unsupervised fuzzification, as previously specified, in order to derive a Zadehan target variable, as formalized in Formula 47. This is called an inductive target fuzzification (ITF).

Formula 47: $\text{numerical}(C) \Rightarrow \mu_y(i) := \mu_{\uparrow}(C(i))$

The second approach for univariate membership function induction is supervised induction based on a target variable. In order to derive membership functions to inductive fuzzy classes for one variable based on the distribution of a second variable, it is proposed to use normalized comparisons (ratios and differences) of likelihoods for membership function induction. For example, a normalized likelihood ratio can represent a membership degree to an inductive fuzzy class. The basic idea of inductive fuzzy classification based on normalized likelihood ratios (IFC-NLR) is to transform inductive support for target class membership into a membership function with the following property: the higher the likelihood of i ∈ y in relation to i ∉ y, the greater the degree of membership of i in y′. For an attribute X, the NLR function calculates a membership degree of a value x ∈ dom(X) in the predictive class y′, based on the likelihood of target class membership. The resulting membership function is defined as a relation between all values in the domain of the attribute X and their NLRs. As discussed in Section 2.3.1, following the principle of likelihood (Hawthorne, 2008), the ratio between the two likelihoods is an indicator of the degree of support for the inductive conclusion that i ∈ y, given the evidence that X(i) = x. In order to transform the likelihood ratio into a fuzzy set membership function, it can be normalized into the interval between 0 and 1. Conveniently, for every ratio R = A/B, there exists a normalization N = A/(A+B) with the following properties:

o N is close to 0 if R is close to 0.
o N is close to 1 if R is a large number.
o N is equal to 0.5 if and only if R is equal to 1.

This kind of normalization is applied to the aforementioned likelihood ratio in order to derive the NLR function. Accordingly, the membership μ of an attribute value x in the target class prediction y′ is defined by the corresponding NLR, as formalized in Formula 48.

Formula 48: $\mu_{y'}(x) := NLR(y \mid x) = \frac{L(y \mid x)}{L(y \mid x) + L(\neg y \mid x)}$
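The following minimal Python sketch (illustrative; the attribute values and names are hypothetical, and conditional relative frequencies are used as likelihood estimates) computes IFC-NLR membership degrees for the values of one categorical attribute against a binary target, as in Formula 48.

```python
def nlr_membership(xs, ys):
    """IFC-NLR (Formula 48) for a categorical attribute.

    xs: attribute values X(i); ys: binary targets Y(i).
    Returns a dict mapping each attribute value x to mu_y'(x).
    """
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    mu = {}
    for x in set(xs):
        l_y = pos.count(x) / len(pos)      # L(y|x)  = p(x|y)
        l_not_y = neg.count(x) / len(neg)  # L(not y|x) = p(x|not y)
        mu[x] = l_y / (l_y + l_not_y)
    return mu

# Toy data: checking-account status vs. good credit rating
status = ["none", "low", "none", "high", "low", "none", "high", "low"]
good   = [1,      0,     1,      1,      0,     0,      1,      1]
print(nlr_membership(status, good))
```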
In fact, one can demonstrate that the NLR function is equal to the posterior probability of y, conditional on x, if both hypotheses y and ¬y are assumed to have equal prior probability (Formula 49), by application of the second form of Bayes’ theorem (Joyce, 2003), as presented in Formula 50. The trick is to express the probability of the evidence, p(x), in terms of a sum of products of prior probabilities, p_0, and measured likelihoods, L, of the hypothesis and its alternative, by application of Formula 33.

Theorem (Formula 49): $NLR(y \mid x) = p(y \mid x) \Leftrightarrow p_0(y) = p_0(\neg y)$

Proof (Formula 50):
$p(y \mid x) = \frac{p_0(y)\, L(y \mid x)}{p(x)}$ (cf. Formula 33)
$= \frac{p_0(y)\, L(y \mid x)}{p_0(y)\, L(y \mid x) + p_0(\neg y)\, L(\neg y \mid x)}$ [if $p(x) = p(y)p(x \mid y) + p(\neg y)p(x \mid \neg y)$, $p(y) := p_0(y)$, and $p(x \mid y) := L(y \mid x)$]
$= \frac{L(y \mid x)}{L(y \mid x) + L(\neg y \mid x)}$ [if $p_0(y) = p_0(\neg y)$]
$=: NLR(y \mid x)$, q.e.d.

Alternatively, two likelihoods can be compared by a normalized difference, as shown in Formula 51. In that case, the membership function is defined by a normalized likelihood difference (NLD), and its application to classification is called inductive fuzzy classification by normalized likelihood difference (IFC-NLD). In general, IFC methods based on normalized likelihood comparison can be subsumed under the abbreviation IFC-NLC.

Formula 51: $\mu_{y'}(x) := NLD(y \mid x) = \frac{L(y \mid x) - L(\neg y \mid x) + 1}{2}$

If a target attribute is continuous, it can be mapped into the Zadehan domain of numeric truth-values between 0 and 1, and membership degrees can be computed by a normalized ratio of likelihoods of fuzzy events. If the target class is fuzzy, for example because the target variable is gradual, the likelihoods are calculated by fuzzy conditional relative frequencies based on fuzzy set cardinality (Dubois & Prade, 1980). Therefore, the formula for calculating the likelihoods is generalized in order to be suitable for both sharp and fuzzy characteristics. Thus, in the general case of variables with fuzzy truth-values, the likelihoods are calculated as defined in Formula 52.

Formula 52: $L(y \mid x) := \frac{\sum_{i=1}^{n} \mu_x(i)\, \mu_y(i)}{\sum_{i=1}^{n} \mu_y(i)} \qquad L(\neg y \mid x) := \frac{\sum_{i=1}^{n} \mu_x(i)\,(1 - \mu_y(i))}{\sum_{i=1}^{n} (1 - \mu_y(i))}$

Accordingly, the calculation of membership degrees using the NLR function (Formula 52) works for both categorical and fuzzy target classes and for categorical and fuzzy analytic variables. For numerical attributes, the attribute values can be discretized using quantiles, and a piecewise linear function can be approximated from the average values in the quantiles and the corresponding NLRs. A membership function for individuals based on their attribute values can then be derived by aggregation, as explained in Section 2.4.2.
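To illustrate Formula 52, the sketch below (again illustrative Python, not the IFCL prototype; the membership degrees are invented) computes the two fuzzy-event likelihoods and the resulting NLR when both the evidence membership μ_x(i) and the target membership μ_y(i) are gradual.

```python
def fuzzy_likelihoods(mu_x, mu_y):
    """Fuzzy-event likelihoods of Formula 52.

    mu_x: membership degrees of each record in the fuzzy evidence x.
    mu_y: membership degrees of each record in the fuzzy target y.
    """
    l_y = sum(mx * my for mx, my in zip(mu_x, mu_y)) / sum(mu_y)
    l_not_y = (sum(mx * (1 - my) for mx, my in zip(mu_x, mu_y))
               / sum(1 - my for my in mu_y))
    return l_y, l_not_y

def nlr_fuzzy(mu_x, mu_y):
    """NLR of Formula 48 computed from the fuzzy-event likelihoods."""
    l_y, l_not_y = fuzzy_likelihoods(mu_x, mu_y)
    return l_y / (l_y + l_not_y)

# Gradual evidence (e.g., "income is high") and gradual target (e.g., "profitability is high")
mu_x = [0.9, 0.2, 0.7, 0.1, 0.5]
mu_y = [0.8, 0.1, 0.9, 0.3, 0.4]
print(round(nlr_fuzzy(mu_x, mu_y), 3))  # 0.667
```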
Following the different comparison methods for conditional probabilities described by Joyce (2003), different methods for the induction of membership degrees using conditional probabilities are proposed in Table 1. They have been chosen in order to analytically test different Bayesian approaches listed by Joyce (2003) for their predictive capabilities. Additionally, three experimental measures were considered: logical equivalence, normalized correlation, and a measure based on minimum and maximum. In those formulae, x and y are assumed to be Zadehan with a domain of [0,1], or Boolean as a special case. These formulae are evaluated as parameters in the meta-induction experiment described in Section 4.2.

Table 1: Proposed Formulae for Induction of Membership Degrees (method: formula)

Likelihood of y given x (L): $L(y \mid x) = p(x \mid y)$
Normalized likelihood ratio (NLR): $NLR(y \mid x) = \frac{p(x \mid y)}{p(x \mid y) + p(x \mid \neg y)}$
Normalized likelihood ratio unconditional (NLRU): $NLRU(y \mid x) = \frac{p(x \mid y)}{p(x \mid y) + p(x)}$
Normalized likelihood difference (NLD): $NLD(y \mid x) = \frac{p(x \mid y) - p(x \mid \neg y) + 1}{2}$
Normalized likelihood difference unconditional (NLDU): $NLDU(y \mid x) = \frac{p(x \mid y) - p(x) + 1}{2}$
Conditional probability of y given x (CP): $p(y \mid x)$
Normalized probability ratio (NPR): $NPR(y \mid x) = \frac{p(y \mid x)}{p(y \mid x) + p(y \mid \neg x)}$
Normalized probability ratio unconditional (NPRU): $NPRU(y \mid x) = \frac{p(y \mid x)}{p(y \mid x) + p(y)}$
Normalized probability difference (NPD): $NPD(y \mid x) = \frac{p(y \mid x) - p(y \mid \neg x) + 1}{2}$
Normalized probability difference unconditional (NPDU): $NPDU(y \mid x) = \frac{p(y \mid x) - p(y) + 1}{2}$
Equivalence, if and only if (IFF): $\mathrm{avg}(1 - x \cdot (1 - y),\; 1 - y \cdot (1 - x))$
Minimum–maximum (MM): a measure combining $p(y \mid x)$ with $\min_{z \in dom(X)} p(y \mid z)$ and $\max_{z \in dom(X)} p(y \mid z)$
Normalized correlation (NC): $\frac{\mathrm{corr}(x, y) + 1}{2}$

A method for the discretization of a numerical range is the calculation of quantiles, or n-tiles, for the range of the analytic variable. A quantile discretization using n-tiles partitions the variable range into n intervals containing the same number of individuals. The quantile Q_n^Z(i) for an attribute value Z(i) of a numeric attribute Z and an individual i is calculated using Formula 53, where n is the number of quantiles, m is the total number of individuals or data records, rank_Z(i) is the position of the individual in the list of individuals sorted by their values of attribute Z, and trunc(r) is the closest integer that is smaller than the real value r.

Formula 53: $Q_n^Z(i) := \mathrm{trunc}\!\left(\frac{n\,(\mathrm{rank}_Z(i) - 1)}{m}\right)$

The rank of individual i relative to attribute Z, rank_Z(i), in a dataset S is the number of other individuals h that have higher values in attribute Z, calculated using Formula 54.

Formula 54: $\mathrm{rank}_Z(i) := |\{\, h \in S \mid Z(h) > Z(i) \,\}|$

In order to approximate a linear function, the method of two-dimensional piecewise linear function approximation (PLFA) is proposed. For a list of points in ℝ², ordered by the first coordinate, (x_1, y_1), (x_2, y_2), …, (x_n, y_n), for every point (x_i, y_i) except the last one (i = 1, 2, …, n − 1), a linear function, f_i(x) = a_i x + b_i, can be interpolated to its neighbor point, where a_i is the slope (Formula 55) and b_i is the intercept (Formula 56) of the straight line.

Formula 55: $a_i := \frac{y_{i+1} - y_i}{x_{i+1} - x_i}$

Formula 56: $b_i := y_i - a_i x_i$

For the calculation of membership degrees for quantiles, the input is a list of points with one point for every quantile k. The first coordinate is the average of the attribute values in k. The second coordinate is the inductive degree of membership μ_y′ in target class y, given that Z(i) is in quantile k, for example derived using the NLR function (Formula 57).

Formula 57: $y_k := \mu_{y'}(k), \qquad x_k := \mathrm{avg}\{\, Z(i) \mid Q_n^Z(i) = k \,\}$

Finally, a continuous, piecewise affine membership function can be calculated, truncated below 0 and above 1, and composed of straight lines for every quantile k = 1, …, n − 1 (n ≥ 2) of the numeric variable Z (Formula 58).

Formula 58: $\mu_{y'}(x) := \begin{cases} 0 & \text{if } a_1 x + b_1 \le 0 \;\vee\; a_{n-1} x + b_{n-1} \le 0 \\ a_1 x + b_1 & \text{if } x \le x_2 \\ a_k x + b_k & \text{if } x_k < x \le x_{k+1} \\ a_{n-1} x + b_{n-1} & \text{if } x > x_{n-1} \\ 1 & \text{if } a_1 x + b_1 \ge 1 \;\vee\; a_{n-1} x + b_{n-1} \ge 1 \end{cases}$
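The quantile-based construction of Formulas 53 to 58 can be sketched as follows (an illustrative Python sketch with simplified quantile handling, not the IFCL implementation): the numeric attribute is split into n-tiles, a membership degree is computed per quantile (here the average target membership stands in for the NLR-based degree of Formula 57), and the resulting points are interpolated piecewise linearly and truncated to [0, 1].

```python
def quantile_points(z, mu_target, n):
    """One (x_k, y_k) point per n-tile: x_k = average of Z in the tile,
    y_k = average target membership in the tile."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    size = len(z) // n
    points = []
    for k in range(n):
        idx = order[k * size:(k + 1) * size] if k < n - 1 else order[k * size:]
        points.append((sum(z[i] for i in idx) / len(idx),
                       sum(mu_target[i] for i in idx) / len(idx)))
    return points

def plfa(points):
    """Piecewise linear membership function through the points,
    truncated to [0, 1] (Formulas 55, 56, and 58)."""
    def mu(x):
        # find the segment whose x-range contains x; extrapolate at the ends
        for (x1, y1), (x2, y2) in zip(points, points[1:]):
            if x <= x2 or (x2, y2) == points[-1]:
                a = (y2 - y1) / (x2 - x1)
                b = y1 - a * x1
                return max(0.0, min(1.0, a * x + b))
    return mu

# Toy data: credit duration in months vs. membership in "good rating"
duration = [6, 9, 12, 15, 18, 24, 30, 36, 48, 60]
good     = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
mu = plfa(quantile_points(duration, good, n=5))
print(round(mu(10), 2), round(mu(40), 2))  # high membership for short durations
```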
The number of quantiles can be optimized so that the correlation of the membership function with the target variable is maximal, as illustrated in Figure 10.

Figure 10: Computation of membership functions for numerical variables.

2.4.2 Multivariate Membership Function Induction

As shown in Figure 11, the proposed process for inducing a multivariate inductive fuzzy class consists of preparing the data, inducing univariate membership functions for the attributes, transforming the attribute values into univariate target membership degrees, classifying individuals by aggregating the fuzzified attributes into a multivariate fuzzy classification, and evaluating the predictive performance of the resulting model.

Figure 11: Proposed method for multivariate IFC.

The idea of the process is to develop a fuzzy classification that ranks the inductive membership of individuals i in the target class y gradually. This fuzzy classification assigns individuals an inductive membership degree to the predictive inductive fuzzy class y′ using the multivariate model μ_y′. The higher the inductive degree of membership μ_y′(i) of an individual in y′, the greater the degree of inductive support for membership in the target class y. In order to accomplish this, a training set is prepared from source data, and the relevant attributes are selected using an interestingness measure. Then, for every attribute X_k, a membership function, μ_y′ : dom(X_k) ⟶ [0,1], is defined. Each such function is induced from the data so that the degree of membership of an attribute value X_k(i) in the inductive fuzzy class y′ is proportional to the degree of support for the inference that i ∈ y. After that, in the univariate classification step, each variable X_k is fuzzified using its induced membership function. The multivariate fuzzy classification step consists of aggregating the fuzzified attributes into one multivariate model, μ_y′, of data elements that represents the membership function of individual i in y′. This inductive fuzzy class corresponds to an IFC that can be used for predictive ranking of data elements. The last step of the process is model evaluation through analysis of the prediction performance of the ranking. This is done by comparing the forecasts with the real class memberships in a test set. In the following paragraphs, every step of the IFC process is described in detail.

In order to analyze the data, a training set and a test set are composed by combining data from various sources into a single coherent matrix. All possibly relevant attributes are merged into one table structure. The class label Y for the target variable has to be defined, calculated, and added to the dataset. The class label is restricted to the Zadehan domain, as defined in the previous section. For multiclass predictions, the proposed process can be applied iteratively. Intuitively, the aim is to assign to every individual a membership degree in the inductive fuzzy class y′. As explained in Section 2.3.1, this degree indicates support for the inference that an individual is a member of the target class y. The membership function for y′ is derived as an aggregation of inductively fuzzified attributes. In order to accomplish this, for each attribute, a univariate membership function in the target class is computed, as described in the previous section.
Once the membership functions have been induced, the attributes can be fuzzified by applying the membership functions to the actual attribute values. In order to do so, each variable X_k is transformed into an inductive degree of membership in the target class. This process of mapping analytic variables into the interval [0,1] is an attribute fuzzification, and the resulting values can be considered membership degrees to a fuzzy set. If this membership function indicates a degree of support for an inductive inference, the transformation is called an inductive attribute fuzzification (IAF), denoted by the symbol ⇝ in Formula 59.

Formula 59: $X_k(i) \rightsquigarrow \mu_{y'}(X_k(i))$

The most relevant attributes are selected before the IFC core process takes place. The proposed method for attribute selection is a ranking of the Pearson correlation coefficients (Formula 38) between the inductively fuzzified analytic variables and the (Zadehan) target class indicator Y. Thus, for every attribute X_k, the relevance regarding target y is defined as the correlation of its inductive fuzzification with the target variable (see Section 3.1.1).

In order to obtain a multivariate membership function for individuals i derived from their fuzzified attribute values μ_y′(X_k(i)), these attribute value membership degrees are aggregated. This corresponds to a multivariate fuzzy classification of individuals. Consequently, the individual’s multivariate membership function μ_y′ : U ⟶ [0,1] to the inductive fuzzy target class y′ is defined as an aggregation, aggr, of the membership degrees of the n attributes X_k, k = 1, 2, …, n (Formula 60).

Formula 60: $\mu_{y'}(i) := \mathrm{aggr}(\mu_{y'}(X_1(i)), \ldots, \mu_{y'}(X_n(i)))$

By combining the inductively fuzzified attributes into a multivariate fuzzy class of individuals, a multivariate predictive model, μ_y′, is obtained from the training set. This corresponds to a classification of individuals by the fuzzy proposition “i is likely a member of y,” for which the truth-value is defined by an aggregation of the truth-values of fuzzy propositions about the individual’s attributes, “X_k(i) is y′.” This model can be used for IFC of unlabeled data for predictive ranking. Applying an alpha cutoff, {i | μ_y′(i) ≥ α}, for an α ∈ [0,1], leads to a binary classifier. There are different possibilities for calculating the aggregation aggr. Simpler methods use an average of the attribute membership degrees, logical conjunction (minimum, algebraic product), or logical disjunction (maximum, algebraic sum). More sophisticated methods involve the supervised calculation of a multivariate model. In this thesis, normalized or cutoff linear regression, logistic regression, and regression trees are considered. These different aggregation methods were tested as a parameter in the meta-induction experiment described in Section 4.2 in order to find an optimal configuration. Finally, in order to evaluate predictive performance, the classifier is applied to a hold-out test set, and the predictions μ_y′(i) are compared with the actual target variable Y(i). The correlation between the prediction and the target, corr(μ_y′, Y), can be used to compare the performance of different IFC models.
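The whole multivariate process can be condensed into a short sketch (illustrative Python; it uses the categorical NLR fuzzification of Formula 48, a simple average as the aggregation aggr, and Pearson correlation for evaluation; a supervised aggregation such as logistic regression would replace the averaging step).

```python
from statistics import mean, pstdev

def nlr(xs, ys):
    """Per-value NLR membership degrees (Formula 48) for a categorical attribute."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return {v: (pos.count(v) / len(pos)) /
               ((pos.count(v) / len(pos)) + (neg.count(v) / len(neg)))
            for v in set(xs)}

def corr(a, b):
    """Pearson correlation (Formula 38, population form)."""
    ma, mb = mean(a), mean(b)
    return mean([(x - ma) * (y - mb) for x, y in zip(a, b)]) / (pstdev(a) * pstdev(b))

# Training data: two categorical attributes and a binary target (all values invented)
X1 = ["a", "a", "b", "b", "a", "b"]
X2 = ["u", "v", "u", "v", "v", "u"]
Y  = [1,   1,   0,   0,   1,   1]

# 1. Induce univariate membership functions and fuzzify the attributes (IAF, Formula 59)
mf1, mf2 = nlr(X1, Y), nlr(X2, Y)
f1 = [mf1[x] for x in X1]
f2 = [mf2[x] for x in X2]

# 2. Aggregate into a multivariate membership degree (Formula 60, here: simple average)
mu = [(a + b) / 2 for a, b in zip(f1, f2)]

# 3. Evaluate the ranking against the target
print([round(m, 2) for m in mu], round(corr(mu, Y), 2))
```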
3 Analytics & Marketing

3.1 Analytics

Analytics is “the method of logical data analysis” (merriamwebster.com, 2012a). According to Zimmermann (1997), data analysis is the “search for structure in data.” The more data is available, the more complex it becomes to find relevant information. Consequently, organizations and individuals analyze their data in order to gain useful insights. Business analytics is defined as “a broad category of applications and techniques for gathering, storing, analyzing and providing access to data to help enterprise users make better business and strategic decisions” (Turban, Aronson, Liang, & Sharda, 2007, p. 256). The ability of enterprises to analyze the potentially infinite space of available data—their capacity for business analytics—is a major competitive advantage. Companies that use analytics as a key strategy are called “analytics competitors” by Davenport (2006, p. 3). They can differentiate themselves through a better customer understanding in a time when products and technologies are becoming more and more comparable. Analytics competitors apply predictive modeling to a wide range of fields, such as customer relationship management, supply chain management, pricing, and marketing. Their top management understands and advocates that most business functions can benefit from quantitative optimization. In fact, business analytics can be applied to almost any area that concerns an enterprise:

o Customer relationship management (CRM): Prediction of the most appropriate customer relationship activity.
o Web analytics: Optimization of the website according to click stream analysis.
o Compliance: Prediction of illegal behavior such as fraud or money laundering.
o Risk management: Prediction of credit worthiness.
o Strategic management: Visualization of customer profiles for product or market strategies.
o Marketing: Prediction of customer product affinity.

Kohavi, Rothleder, and Simoudis (2002) identified comprehensibility and integration as the driving forces of emerging trends in business analytics. Today, good business analytics is either comprehensible or integrated: Human decision makers need interpretable, visual, or textual models in order to understand and apply analytic insights in their daily business. In contrast, information systems demand machine-readable, integrated, automated interfaces from analytic applications to operational systems in order to apply scorings or classifications in automated processes. In order to enhance the data basis for analytics, large data pools, such as data warehouses, have been built. Today, the problem is no longer the lack of data, but the abundance of it.

Data mining, in the broader sense, is “the process of discovering patterns in data.” In a narrower sense, those “structural patterns” help not only to explain the data but also to make “nontrivial predictions on new data” (Witten & Frank, 2005, p. 5). The common aspect of the two definitions is the discovery of patterns in data. In contrast, the second definition emphasizes the inductive aspect of data mining, namely prediction. Data mining, in its broader sense, means any form of discovery of knowledge in data, whereas in the narrower sense, it means model induction for prediction using statistics and machine learning. One aim of data mining is the induction of models for datasets. If applied correctly, models not only describe the past, but also help to forecast the future. Successful data mining is a practical demonstration of inductive logic. There are two categories of machine learning: unsupervised and supervised learning. The former extracts models such as clusters or association rules from data without labels. The latter induces a model for labeled data, representing the relationship between the data and the label.
Continuous labels lead to regression models, and categorical ones lead to classification models, or classifiers.

Descriptive data analysis presents data as is; that is, it describes the data without generalization or conclusion. It is completely deductive, because the numbers are true without need for inductive support. In a business context, this is often called reporting. A report is an understandable tabular or graphical representation of relevant data. Usually, methods of classification and aggregation are applied to consolidate data into meaningful information. Frequently applied techniques for descriptive data analysis are deductive classification, aggregation, and grouping. For deductive classification, data records are classified according to an a priori predicate; therefore, the class membership is known in advance. Aggregation is the calculation of a scalar value from a vector or set, such as sums, averages, or extrema. Often, these aggregations are grouped by one or more variables. In such cases, the aggregated values are calculated for each combination of values in the grouping variables. For example, the average cost and benefit per customer segment could be an interesting data description for many companies. An important part of descriptive data analysis is the presentation of the results. The raw output of an analysis is usually a matrix or table, but human decision makers are often managers who want to decide quickly and intuitively based on insights that can be understood. Therefore, visualization of data descriptions can help increase the understanding of analytic insights.

Data descriptions might indicate certain conclusions or inferences that cannot be directly deduced from the data. For example, if twice as many women as men have bought a certain product, an inference could be that women are more likely to buy this product in the future. The question, therefore, is what kind of inductive inferences can be made from the given data descriptions, and how high the degree of support in the data is for this kind of inductive inference. There are two important inductive techniques in data analysis: attribute selection and prediction. In times of data abundance, not only the number of available records is incredibly large, but also the number of available attributes. Automated attribute selection applies statistical and machine learning techniques to rank the association between attributes and a prediction target, and thus filters out the most relevant attributes for inductive inferences regarding a target class. Prediction means inference of an unknown feature (a target variable) from known features (the analytic variables). This can be accomplished to a certain degree of likelihood using a model induced by methods from statistics or machine learning. There are two kinds of predictions: inductive classification and regression. Inductive classification means predicting a categorical target variable, and regression means predicting a numerical target variable. Fuzzy classification is a special case in which the prediction consists of a numeric value indicating the gradual membership in a category.

Insights from analytics are traditionally presented to human decision makers as tables and graphics. Today, more and more analytic results are loaded automatically from analytic systems into operational systems, a concept that is called integrated analytics. Those systems can automate certain decision processes or display analytic insights to users of operational systems.
Marketing decisions can be automated, such as choosing an individualized advertisement message in the online channel (Kaufmann & Meier, 2009). The process of integrated analytics can be described in five steps. First, analytic data is collected from different sources. Second, a predictive model is induced from the data, either in a supervised form using a target variable or in an unsupervised form of clustering or association analysis. Third, a prediction, classification, or score is assigned to the original data based on the induced model. Fourth, these predictions are transmitted to the operational systems, where they are applied to decision support. Finally, the outcomes of decision support—for example, sales decisions and actions—are fed back into the analytic data pool for meta-analysis and iterative optimization.

The application of fuzzy logic techniques to analytics is called fuzzy data analysis. This permits gradual relationships between data and predictions. The application of fuzzy logic to analytics is appropriate when there is fuzziness in the data, in the prediction target, or in the association between the two (Zimmermann, 1997). The advantage of fuzzy classification methods is that the class boundary is not sharp: Patterns detected by fuzzy data mining provide soft partitions of the original data. Furthermore, an object can belong to several classes with different degrees of membership (Zimmermann, 2000). Two main advantages of fuzzy logic techniques pointed out by Hüllermeier (2005) are the elimination of certain threshold effects because of gradual soft data partitioning, and the comprehensibility and interpretability of the resulting models containing linguistic terms. Hüllermeier discusses possible fields of application of fuzzy set theory to data mining: Fuzzy cluster analysis partitions a set of objects into several fuzzy sets with similar properties in an unsupervised manner. Learning of fuzzy rules computes decision rules with fuzzy propositions. Fuzzy decision trees are a special case of fuzzy rules in which every node of the tree partitions the data into fuzzy subsets with an optimal distinction criterion. Fuzzy association analysis computes association rules between fuzzy restrictions on variables. Fuzzy classification partitions sharp data into fuzzy sets according to a classification predicate (Meier, Schindler, & Werro, 2008). If this predicate is inferred by induction, the process is called inductive fuzzy classification, or IFC (Kaufmann & Meier, 2009).

In business analytics, scoring methods are used for ranking data records according to desirable features. Multivariate methods based on multiple attributes, for instance linear or logistic regression, can predict targets such as customer profitability, product affinity, or credit worthiness. The scores are used in daily decision making. Record scoring based on data in an information system is an inference about unknown target variables, which is not deductive. Rather, it is a form of inductive inference. This introduces fuzziness into decision support because there is only some degree of support for the hypothesis that a given record belongs to a certain class. In fact, statistical models can increase the likelihood of correct inductive classification. Accordingly, for every scored record, there is fuzziness in its membership in the target class. Consequently, fuzzy logic is the appropriate tool for reasoning about those fuzzy target classes.
The solution is to compute a continuous membership function mapping from the data into the fuzzy target class, to which every record has a degree of membership. Ranking or scoring models yield continuous predictions instead of crisp classifications. Those models can serve as target membership functions. Thus, scoring methods correspond to an IFC. The proposed IFC method provides a means of computing inductive membership functions for a target class. These membership functions can be visualized. Furthermore, they can be used for inductive fuzzification of attributes, which can be applied for attribute selection and for improving predictive models. Thus, the proposed IFC methods can be applied in the following areas of data analysis:

o Selection: Attributes can be scored by the correlation of automatically generated inductive membership degrees with a Boolean or Zadehan target class in order to select relevant attributes.
o Visualization: Induced membership functions can serve as human-readable diagrams of the association between analytic and target variables, using a plot of the automatically generated membership functions.
o Prediction: Datasets used for prediction can be inductively fuzzified by transforming the original data into inductive target membership degrees, which can improve the correlation of predictions with the actual class membership in existing statistical methods.

3.1.1 Selection

In practice, there is an abundance of data available. The problem is to find relevant attributes for a given data mining task. Often, thousands of variables or more are available. Most machine learning algorithms are not suited to such a great number of inputs. Also, human decision makers need to know which of those customer attributes are relevant for their decisions. Therefore, the most relevant attributes need to be selected before they can be used for visualization or prediction. In order to find relevant attributes for predicting a target class y, an attribute selection method can be derived from membership functions induced by IFC, using the method proposed in Section 2.4. All possible analytic variables can be ranked by the correlation coefficient of their NLR with the target variable. For every attribute X_k, the membership function in the predictive target class y′, denoted by μ_y′(X_k), is computed. Membership in the target y is indicated by a Zadehan variable Y, which is a variable with a domain of gradual truth-values in the interval between 0 and 1. Then, the Pearson sample correlation coefficients between the NLR-based membership degrees and the Zadehan values of Y are calculated. Thus, the relevance of attribute X_k regarding target y is defined as the correlation of its inductive fuzzification with the target variable (Formula 61). The most relevant variables are those with the highest correlations.

Formula 61: $\mathrm{relevance}(X_k) := \mathrm{corr}(\mu_{y'}(X_k), Y)$

The fuzzification of analytic variables prior to attribute relevance ranking has the advantage that all types of analytic variables—linguistic, categorical, Boolean, Zadehan, and numeric variables—can be ranked using the same measure. Choosing correlation as a measure of association with the target has the advantage that it is a standard aggregate in SQL, and thus readily available as a well-performing precompiled function in major database systems.
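A compact sketch of this relevance ranking (illustrative Python that reuses the ideas of the earlier NLR and correlation sketches; the attribute names and values are hypothetical) could look as follows.

```python
from statistics import mean, pstdev

def nlr_fuzzify(xs, ys):
    """Replace each categorical value by its NLR-based membership degree (Formula 48)."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    mu = {v: (pos.count(v) / len(pos)) /
             ((pos.count(v) / len(pos)) + (neg.count(v) / len(neg)))
          for v in set(xs)}
    return [mu[x] for x in xs]

def corr(a, b):
    ma, mb = mean(a), mean(b)
    return mean([(x - ma) * (y - mb) for x, y in zip(a, b)]) / (pstdev(a) * pstdev(b))

def rank_attributes(attributes, target):
    """Relevance ranking of Formula 61: correlation of the inductive fuzzification with Y."""
    scores = {name: corr(nlr_fuzzify(values, target), target)
              for name, values in attributes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical mini-dataset in the spirit of the German credit data
data = {
    "checking_status": ["neg", "none", "none", "neg", "0..200", "none"],
    "housing":         ["rent", "own", "own", "rent", "own", "rent"],
}
good_rating = [0, 1, 1, 0, 1, 1]
print(rank_attributes(data, good_rating))
```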
To illustrate the proposed method, the attributes of the German credit data are ranked regarding the target class of customers with a good credit rating (Table 2). Those attributes have been inductively fuzzified with NLRs. The Pearson coefficient of correlation between the resulting membership degrees and the Boolean target variable good credit rating has been calculated. One can see that, for a credit rating, checking status, credit history, and duration are quite correlated attributes, whereas, for example, the number of existing credits is less relevant. This kind of attribute selection can be used as an input for visualization and prediction. When a target class is defined, the relevant correlated variables can be identified. A visualization of the five to ten most relevant variables gives good insights about associations in the data.

Table 2: Example of an Attribute Ranking regarding NLR/Target Correlation¹ (attribute: correlation of NLR with target)

Checking status: 0.379304635
Credit history: 0.253024544
Duration: 0.235924759
Purpose: 0.190281416
Savings status: 0.189813415
Credit amount: 0.184522282
Housing: 0.173987212
Property magnitude: 0.157937117
Age: 0.136176815
Employment: 0.128605942
Personal status: 0.093525855
Installment commitment: 0.091375548
Other payment plans: 0.079139824
Foreign worker: 0.077735317
Job: 0.061875096
Other parties: 0.060046535
Own telephone: 0.051481771
Existing credits: 0.042703728
Residence since: 0.034584104
Num. dependents: 0.007769441

¹ http://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)

3.1.2 Visualization

The IFC-NLR method can be applied to create visualizations of associations between an analytic variable (a factor) and a class (a target). A visualization of variable association is a human-readable presentation that allows the reader to see the distribution of target likelihood in tabular form or as a graph. Using IFC for visualization, the table consists of relations between factor values and normalized target likelihood ratios, and the graph is a plot of the inductive membership function. For categorical factors, the graph is a bar chart. For numerical factors, the membership function is plotted as a continuous line. The advantage of this method is that the notion of membership of a factor X in a target Y is semantically clear and intuitively understandable by readers. Furthermore, the semantics of the NLR function is mathematically clearly defined. As an example, for two variables from the German credit dataset, checking status and duration, the inductive fuzzy membership function in the target class of customers with a good credit rating can be visualized as shown in Figure 12. This graphic can be interpreted as follows: Customers without checking accounts or with a balance of more than $200 are more likely to have good credit ratings than not. Customers with a negative balance are more likely to have bad credit ratings. For the duration of the credit (in months), the shorter the duration, the higher the likelihood of a good credit rating. Credits with durations of more than 19 months have higher likelihoods of being associated with bad credit ratings, and accordingly, the NLR is less than 0.5.

Figure 12: Visualization of relevant attributes and their association with the target as an inductive membership function. Adapted from “Fuzzy Target Groups in Analytic Customer Relationship Management,” by M. Kaufmann and C. Graf, 2012, In A. Meier and L. Donzé (Eds.), Fuzzy Methods for Customer Relationship Management and Marketing, p. 173. Copyright 2012 by Publisher.
3.1.3 Prediction
A method for the application of likelihood-based IFC to prediction has been introduced by Kaufmann and Meier (2009). The basic idea is to create a multivariate inductive model for target class membership from a combination of inductively fuzzified attributes derived by membership function induction (see Section 2.4). The proposed approach consists of a univariate inductive fuzzification of analytic variables prior to a multivariate aggregation. This has the advantage that non-linear associations between analytic variables and target membership can be modeled by an appropriate membership function. As presented in Figure 13, the following steps are applied in order to derive an inductive membership degree of individual i in the prediction y' for class y based on the individual's attributes:
o A: The raw data consists of sharp attribute values.
o B: An inductive definition of the membership function for the attribute values in the predictive fuzzy class y' is calculated using the previously described IFC-NLR methods.
o C: The attribute values are fuzzified using the membership functions derived in step B. This step is called inductive attribute fuzzification (IAF), defined as supervised univariate fuzzy classification of attribute values.
o D: After that, the dataset consists of fuzzified attribute values in the interval [0,1], indicating the inductive support for class membership.
o E: The fuzzified analytic variables are aggregated into a membership degree of individuals in the predictive class. This can be a simple conjunction or disjunction, a fuzzy rule set, or a statistical model derived by supervised machine learning algorithms such as logistic or linear regression.
o F: This results in a multivariate membership function that outputs an inductive membership degree for individual i in class y, which represents the resulting prediction.
It is proposed to preprocess analytic data with IFC methods in order to improve prediction accuracy. More specifically, attributes used for data mining can be transformed into inductive membership degrees in the target class, which can improve the performance of existing data mining methods.

Figure 13: Proposed schema for multivariate IFC for prediction. Adapted from “Fuzzy Target Groups in Analytic Customer Relationship Management,” by M. Kaufmann and C. Graf, 2012, in A. Meier and L. Donzé (Eds.), Fuzzy Methods for Customer Relationship Management and Marketing, p. 172. Copyright 2012 by Publisher.

Empirical tests (Section 4.2) have shown that the proposed method of IAF significantly improves the average correlation between prediction and target. For binary target variables, a combination of IAF with an NLR and the subsequent application of a logistic regression turned out to be optimal. This configuration is illustrated in Figure 14. For numerical targets, a linear fuzzification (LF) of the target, IAF using NLD, and the calculation of a regression tree turned out to be optimal. This configuration is illustrated in Figure 15. However, this improvement by IAF can be shown only in the average prediction correlation. There are instances of data in which the application of IAF lowers the predictive performance.
Therefore, an IAF is worth a try, but it has to be tested whether it really improves the prediction in the given data domain. IAF provides a tool for fine-tuning predictive modeling, but the question of the best classification algorithm in a specific context is answered within the data mining process, in which the algorithm with the best results is selected (Küsters, 2001).

Figure 14: IFC prediction with binary target classes based on categorical attributes.
Figure 15: IFC prediction with Zadehan target classes based on numerical attributes.

3.2 Marketing Analytics
“Analytics competitors understand that most business functions—even those, like marketing, that have historically depended on art rather than science—can be improved with sophisticated quantitative techniques.” (Davenport, 2006, p. 4)
The application of data analysis methods to marketing is called marketing analytics. It can optimize the marketing mix and individual marketing on multiple channels. An important challenge in marketing is its financial accountability: making the costs and benefits of marketing decisions explicit. Spais and Veloutsou (2005) proposed that marketing analytics could increase the accountability of marketing decisions. They point out the need to incorporate marketing analytics into daily marketing decision making, which implies a conceptual shift for marketers toward working with mathematical tools. Furthermore, they suggest that the problem of fuzziness in consumer information should be addressed by fuzzy logic techniques. Individual marketing makes it possible to account for the costs and benefits of a marketing campaign in daily CRM decisions. Because all customers, advertisements, and sales are recorded in information systems, the benefit of marketing activities can be measured. For example, in an individual marketing campaign, the sales ratio of targeted customers can be compared to average or random customers, and the sales increase can be directly linked to the corresponding marketing campaigns.
The application of IFC methods to data analysis can be used in marketing analytics. Specifically, customer analytics, product analytics, target selection, and integrated analytics are proposed as possible applications for the methods that have been developed in this thesis. Using IFC methods for selection, visualization, and prediction can support these four fields of marketing analytics. The benefits of IFC include the automated generation of graphics with a linguistic interpretation, clear model semantics for visualization by membership function induction, and possibly better target selection because of optimized predictive models using IAF. As shown in Figure 16, the general application of IFC to analytics for selection, visualization, and prediction can be applied to marketing analytics specifically. Visualizations of induced membership functions of attributes in marketing targets provide fuzzy customer and product profiles. Furthermore, MFI allows fuzzy definitions of target groups based on customer characteristics. Finally, scoring methods can represent membership functions to fuzzy target groups, defined as fuzzy sets of customers to which every individual customer has a degree of membership. These can be integrated into automated analytics processes for individual marketing.

Figure 16: Proposed areas of application of IFC to marketing analytics.
3.2.1 Customer Analytics
The aim of customer analytics is to make the associations between customer characteristics and target classes of customers with desirable features understandable to marketing decision makers. The question is: which customer features are associated with the target? A customer report visualizes the likelihood of target class membership given different values of an attribute in order to show relevant features that distinguish the target class from the rest of the customers. In the context of CRM, several target customer classes are interesting for profiling:
o Product affinity: Customers who have bought a given product.
o Recency: Customers who are active buyers because they have recently bought a product.
o Frequency: Customers who buy frequently.
o Loyalty: Customers who have used products or services for a long period of time.
o Profitability: Customers who are profitable to the enterprise.
o Monetary value: Customers who generate a high turnover.
A customer profile for the mentioned target customer classes should answer the question: what distinguishes customers in this class from other customers? Thus, the association between class membership and customer attributes is analyzed. The customer attributes that can be evaluated for target likelihood include all of the aforementioned characteristics and, additionally, socio-demographic data, geographic data, contract data, and transaction behavior recorded in operational computer systems, such as CRM interactions and contacts in direct marketing channels.
If one looks more closely at the target customer classes, one can see that these classes are often fuzzy. For example, to generate a “high turnover” is a fuzzy proposition, and the corresponding customer class is not sharply defined. Of course, one could discretize the class using a sharp boundary, but this does not reflect reality in an appropriate manner. Therefore, it is proposed to visualize fuzzy customer classes as fuzzy sets with membership functions. In order to generate customer profiles based on a target class and the relevant customer attributes, the method of MFI by NLRs presented in Section 2.4.1 can be applied. It is proposed to select the relevant customer attributes first, using the method previously described. After that, the relevant attributes are called profile variables. For a profile variable X, the values x_k ∈ dom(X) are called profile characteristics. They are assigned an inductive membership degree for the predictive target class y' defined by µ_y'(x_k) := NLR(y | x_k). A plot of the corresponding membership function represents a visual fuzzy customer profile for variable X, as illustrated by Figure 17. Two instances of a fuzzy customer profile using NLRs are shown in Figure 17, in which the customers of investment fund products of a financial service provider are profiled by customer segment and customer age. Interpretation of the graphs suggests that customers above the age of 50 and customers in the segments Gold and Premium are more likely to buy investment funds than the average customer. A fuzzy customer profile derived with the IFC-NLR method creates a visual image of the customer class to be analyzed by showing degrees of membership of customer features in the target characteristic. Marketing managers can easily interpret the resulting reports.

Figure 17: Schema of a fuzzy customer profile based on IFC-NLR, together with two examples of a fuzzy customer profile.
Adapted from “Fuzzy Target Groups in Analytic Customer Relationship Management,” by M. Kaufmann and C. Graf, 2012, in A. Meier and L. Donzé (Eds.), Fuzzy Methods for Customer Relationship Management and Marketing, p. 173. Copyright 2012 by Publisher.

3.2.2 Product Analytics
Marketing target groups can be defined by typical characteristics of existing product users. This method is based on the analysis of existing customer data in the information systems of a company, which is an instance of secondary market research. The data of customers who are product users (the test group) is compared to the data of customers who do not use the product (the control group). The attributes that separate the test and control groups most significantly are used for defining the target group of potential customers. These attributes are the most selective ones. Accordingly, the target group is derived inductively by similarity to the set of existing product users. As an example, a customer profile could show that customers between 30 and 40 years of age in urban areas are twice as likely to be product users as other customers. In such a case, the target group for this particular product could be defined accordingly.
Analytic target group definitions are based on a set of customer characteristics. However, these characteristics differ in relevance. Thus, the set of relevant customer characteristics for a product target group is a fuzzy set because the boundary between relevant and non-relevant characteristics is gradual. The corresponding membership degree can be precisiated by a relevance or selectivity metric. It is proposed that an NLR be used as the measure of selectivity. The likelihood of product usage, U_P (the notation is for product P), given that a customer record i in the existing customer database d has characteristic c, can be computed accurately as a sampled conditional probability, p(c | U_P)_d, defined as the number of product users that have this feature divided by the total number of product users (Formula 62). In this formula, c(i) and U_P(i) are Boolean truth values that indicate the presence or absence of a customer characteristic and the usage of the product.

p(c | U_P)_d := |{i ∈ d | c(i) ∧ U_P(i)}| / |{i ∈ d | U_P(i)}|    Formula 62

This product usage likelihood can be compared to the likelihood, p(c | ¬U_P)_d, under the opposite hypothesis that a customer is not a product user, calculated as the number of non-users of the product that have characteristic c divided by the total number of non-users (Formula 63).

p(c | ¬U_P)_d := |{i ∈ d | c(i) ∧ ¬U_P(i)}| / |{i ∈ d | ¬U_P(i)}|    Formula 63

Thus, the selectivity of a customer characteristic c can be expressed by the ratio of these two likelihoods (Formula 64). This ratio can be normalized as formalized in Formula 65.

LR(U_P | c)_d := p(c | U_P)_d / p(c | ¬U_P)_d    Formula 64

NLR(U_P | c)_d := p(c | U_P)_d / (p(c | U_P)_d + p(c | ¬U_P)_d)    Formula 65

Finally, the selectivity of characteristic c, expressed as an NLR, represents an inductive degree of membership of this characteristic in the fuzzy target group definition for product P. An inductive fuzzy target group definition, t, based on the analysis of database d is a fuzzy set of customer features, and its membership function is defined by the corresponding NLR (Formula 66).

µ_t(c)_d := NLR(U_P | c)_d    Formula 66

A customer characteristic may or may not be defined by a granular attribute value.
Characteristics can also be computed by the functional combination of several basic attribute values. Furthermore, the customer characteristic indicator c(i) can indicate a fuzzy truth value in the interval [0,1] if the corresponding characteristic c is a fuzzy proposition such as “high turnover.” In that case, the definition of the NLR for fuzzy truth values from Section 2.4.1 can be applied. However, the aim of analytic target group definition is to find or construct optimal defining target customer characteristic indicators with the highest possible degree of membership in the target group definition. These indicators can be constructed by logical connections or functions of granular customer characteristics. This kind of analytic target group definition is a conceptual one, suited for presentation to human decision makers, because it is intuitively understandable. It results in a ranking of customer characteristics that define typical product customers. However, for integrated analytics in analytic CRM, a scoring approach is more promising because it yields better response rates, although the resulting models may be less comprehensible.

3.2.3 Fuzzy Target Groups
Contemporary information technology provides the means to individualize marketing campaigns. Each customer is targeted not only directly but also individually with an advertisement message, decision, or activity. The behavior of an organization toward an individual customer is analytically customized according to the customer's classification, and customers with different characteristics are targeted with different behaviors. Individual customers are assigned a score for different CRM targets based on a predictive multivariate model. IFC methods can be applied to customer data for individual marketing in order to calculate a predictive scoring model for product usage. This model can be applied to the data to score customers for their product affinity, which corresponds to an IFC of customers in which the predictive model is a multivariate membership function.
An aim of customer analytics is the application of prediction to the selection of a target group with a high likelihood of belonging to a given class of customers. For example, potential buyers or credit-worthy customers can be selected by applying data analysis to the customer database. This is done either by segmentation or by scoring (Sousa, Kaymak, & Madeira, 2002). With a segmentation approach, sharp sets of customers with similar characteristics are calculated. Those segments have a given size. For individual marketing, a scoring approach is more promising, in which every customer is assigned a score predicting the likelihood of response. The score is calculated by the application of a predictive multivariate model with numeric output, such as neural networks, linear regression, logistic regression, or regression trees. Once a numeric score is normalized, it represents a membership function to a fuzzy set of customers with a high response likelihood—a fuzzy target group—to which every customer has a degree of membership. This scoring process can be applied for every product. Thus, for every individual customer, the degree of membership to all possible cross-selling target groups is known, and in direct customer contact, the customer can be assigned the advertisement message with the highest score. There are different methods for fuzzy customer scoring.
For example, fuzzy clustering for product affinity scoring was presented by Setnes, Kaymak, and van Nauta Lemke (1998), in which the reason for using fuzzy systems instead of neural networks is declared to be the comprehensibility of the model. Kaufmann and Meier (2009) evaluated a supervised fuzzy classification approach for prediction using a combination of NLRs with an algebraic disjunction. In this thesis, a synthesis of probabilistic modeling and approximate reasoning applying fuzzy set theory is proposed for prediction (as previously explained), using univariate inductive membership functions for improving the target correlations of logistic regressions or regression trees. However, all inductive scoring methods that yield a numeric value representing response likelihood can be normalized in order to represent an inductive membership function to a fuzzy set of customers and can, therefore, be categorized as IFC.
A classical sharp CRM target group T ⊂ C is a subset of all customers C defined by a target group definition t: T := {i ∈ C | t(i)}. In the case of scoring methods for target selection, the output of the model application is numeric and can be normalized to the interval [0,1] to represent a membership function. In that case, the target group is a fuzzy set T' and the (normalized) customer score is a membership function, µ_T': C → [0,1]. This score is defined by a predictive model, M, as a multivariate combination of n customer attributes, X_1, ..., X_n, as shown in Formula 67.

µ_T'(i) := M(X_1(i), ..., X_n(i))    Formula 67

Because T' is a fuzzy set, it has to be defuzzified when a campaign requires a binary decision. For example, the decision about contacting a customer by mail has to be sharp. An alpha cutoff of the fuzzy set allows the definition of individual marketing target groups of optimal size regarding budget and response rate, and it leads to a sharp target group T_α, as formalized in Formula 68 (a query sketch illustrating this cutoff follows below).

T_α := {i ∈ C | µ_T'(i) ≥ α}    Formula 68

3.2.4 Integrated Analytics
For individual marketing, the fuzzy target group membership degrees are predictive scores that are processed via the CRM system, which dispatches them to the distribution channel systems for mapping customers to individualized advertisement messages. In all computer-supported channels with direct customer contact, inbound or outbound, as soon as the customer is identified, a mapping can be calculated to the product for which the customer has the highest score. Based on that mapping, the advertisement message is chosen. In the online channel, logged-in customers are displayed individual advertisement banners. In the mailing channel, customers are sent individual letters with product advertisements. In the call center and in personal sales, if the customer can be identified, the agent can try cross-selling for the next best product.
In analytic customer relationship management (aCRM), customer data is analyzed in order to improve customer interactions in multiple channels (Turban et al., 2007). This can be applied to individual marketing. Target groups for aCRM campaigns are derived analytically and individually by the application of classification and regression models to individual customer data records. The aim is to increase campaign response rates with statistical methods. Cross-selling campaigns analyze the product affinity of customers given their features. Churn (change and turn) campaigns target customers with a low loyalty prediction in order to re-gain their trust.
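To make the defuzzification of Formula 68 concrete, the following sketch selects a sharp target group from a table of scored customers. The table scored_customers and its columns are hypothetical; in practice, the threshold alpha is chosen so that the resulting group size matches the campaign budget.

-- alpha cutoff: all customers whose inductive membership degree reaches the threshold
select customer_id
from scored_customers
where membership_degree >= 0.8;  -- alpha = 0.8, an assumed value chosen for the available budget

-- alternative: fix the group size instead of the threshold (the 5,000 highest scores)
select customer_id
from (select customer_id, membership_degree
      from scored_customers
      order by membership_degree desc)
where rownum <= 5000;  -- Oracle-style row limit; use FETCH FIRST or LIMIT in other systems

The second variant corresponds to the case study in Section 3.2.5, where the size of the fuzzy target group, rather than the threshold, was fixed.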
Figure 18 illustrates a schematic information system that can enable aCRM processes. In contemporary enterprises, due to the intensive application of computing machinery and electronic databases in business processes, large amounts of possibly useful customer data exist in various software applications and can serve as data sources for aCRM. The heterogeneity of data storage implies a necessity for data integration before the data can be analyzed. A system that automates this task is called a customer data warehouse (Inmon, 2005). Based on this integrated data pool, analytics provides inductions of predictive models for desirable customer classes, such as product purchase, profitability, or loyalty. If these models output a gradual degree of membership to the target classes, their application to the customer database can be called an IFC. These membership degrees are defined by a prediction of class membership for individual customers, sometimes called lead detection. The corresponding fuzzy sets of customers can be called fuzzy target groups. These analytic results, called leads by aCRM managers, are transferred into the operational CRM application, where their presentation to human decision makers provides decision support for direct customer contact. The utilization of the analytic data output (leads) takes place in individual marketing channels with direct customer contact—for instance, web presence, mailing, call center, or personal sales. The results of aCRM campaigns, such as sales, are electronically collected and fed back into the customer data warehouse in order to improve future aCRM campaigns by meta-analytics. This mechanism is called a closed loop. Generally, aCRM is an instance of a business intelligence process (Gluchowski, Gabriel, & Dittmar, 2008), which involves data sourcing, integration, analytics, presentation, and utilization. Furthermore, it is also an instance of integrated analytics, in which analytics is integrated into operational systems with feedback mechanisms.
In comparison to mass marketing, in individual marketing every customer is provided with the advertisement that fits best. Especially when customer databases, such as a data warehouse, are present, individual scoring is possible based on the available data: The customer's target membership score is used to assign an individual advertisement message to each customer. These messages are delivered electronically via the customer relationship management application to the individual marketing channels. Campaign target groups are individualized. All customers in the target groups are known individually and are contacted directly.

Figure 18: Inductive fuzzy classification for individual marketing. Adapted from “Fuzzy Target Groups in Analytic Customer Relationship Management,” by M. Kaufmann and C. Graf, 2012, in A. Meier and L. Donzé (Eds.), Fuzzy Methods for Customer Relationship Management and Marketing, p. 170. Copyright 2012 by Publisher.

3.2.5 Case Study: IFC in Online Marketing at PostFinance
The following case study shows the application of the proposed methods in practice, from the point of view of information systems and methodology. Membership function induction is applied to a real-world online marketing campaign, and a comparison with crisp classification and random selection is made. PostFinance Inc. (http://www.postfinance.ch) is a profitable business unit of Swiss Post. Its activities contribute significantly to the financial services market in Switzerland.
PostFinance is an analytic enterprise feeding business processes with information gained from predictive analytics. The online channel of PostFinance supports individual advertisement banners for logged-in customers. Based on the customer score, a customer is assigned the advertisement message for the product with the highest response likelihood computed for that particular customer. By clicking on the banner, the customer has the opportunity to order the product directly. The marketing process of PostFinance uses analytical target groups and predictive scores of product affinity for individual marketing. Target groups for online marketing campaigns are selected using inductive classification on the customer data warehouse. Figure 19 shows the process from customer data to individualized online marketing. Target groups are defined in the data warehouse. This is done using a predictive model (e.g., logistic regression) or a crisp classification (e.g., customers above the age of 50 with a balance higher than CHF 10,000). A dataset is generated that assigns an individual advertisement message to every single customer based on his or her target group. This dataset is loaded into the online platform, where it controls the online advertisements. In the online channel, for every customer who logs in, the individual advertisement message is mapped according to the previous target group selection.
A case study was conducted with PostFinance in 2008 in order to test IFC in a real online marketing campaign. Inductive fuzzy classification was applied in a PostFinance online marketing campaign promoting investment funds. The aim of the prediction was to forecast customers with an enhanced likelihood of buying investment funds. The resulting fuzzy classification yielded a fuzzy target group for an individual marketing campaign in which every customer had a gradual degree of membership.

Figure 19: Analytics applied to individual marketing in the online channel at PostFinance. Adapted from “An Inductive Fuzzy Classification Approach Applied to Individual Marketing,” by M. Kaufmann and A. Meier, 2009, in Proceedings of the 28th North American Fuzzy Information Processing Society Annual Conference, p. 4. Copyright 2009 by Publisher.

The same advertisement message was shown to three groups of customers: a test group of 5,000 customers with the highest degree of membership to a fuzzy target group, a control group of 5,000 customers selected by a classical crisp target group definition, and a second control group of randomly selected customers. The hypothesis was that a gradual scoring approach (an inductive fuzzy customer classification) would provide better response rates than a crisp Boolean target selection (a segmentation approach), and that a fuzzy target group selection using typical characteristics of product users would classify customers more softly and eliminate certain threshold effects of Boolean classification because attributes can be compensated for in a multivariate gradual approach. It was tested whether customers in a fuzzy target group selected with a scoring approach had a higher response rate than customers in a crisp target group selected with a classical segmentation approach. Furthermore, the two groups were compared with a random selection of customers. In the following, it is shown how the data mining methodology presented in Section 2.4 has been applied to a real-world direct marketing campaign.
In order to prepare a test set for model induction, a sample customer dataset was selected from the customer data warehouse. Because of the excellent quality of the cleaned and integrated data in the data warehouse, the data preparation step was accomplished by a simple database query. The dataset contained anonymous customer numbers associated with customer-specific attributes. As the target variable, the class label was set to 1 if the customer had investment funds in his or her product portfolio and to 0 otherwise. Mutual information of the analytic variables with the target variable was chosen as the ranking method. The following attributes were selected as relevant:
o Customer segment (CS; 0.047 bit),
o Number of products (NP; 0.036 bit),
o Overall balance (OB; 0.035 bit),
o Loyalty (L; 0.021 bit),
o Customer group (CG; 0.016 bit),
o Balance on private account (BP; 0.014 bit), and
o Age (A; 0.013 bit).

Table 3: Conditional Probabilities and NLRs for a Categorical Attribute
X1: customer segment | Y=1 | Y=0 | p(X1 | Y=1) | p(X1 | Y=0) | NLR_Y=1(X1)
Basis | 11'455 | 308'406 | 0.38 | 0.82 | 0.32
SMB | 249 | 2'917 | 0.01 | 0.01 | 0.52
Gold | 12'666 | 54'173 | 0.43 | 0.14 | 0.75
Premium | 5'432 | 10'843 | 0.18 | 0.03 | 0.86
Total | 29'802 | 376'339 | | |

Using the method presented in Section 2.4, for each of the relevant attributes, the fuzzy restriction corresponding to the likelihood of having investment funds was induced. In the following, the induction processes for a categorical and a continuous attribute are described in detail. As the first example, the domain of the categorical attribute customer segment (X1) has four values: Basis, SMB, Gold, and Premium. For customers who have not yet bought investment funds, the aim is to define a degree of membership in a fuzzy restriction on the customer segment domain in order to classify them by their likelihood of buying that product in the future. The frequencies and conditional probabilities are presented in Table 3. The first column indicates the customer segment. Column Y=1 contains the number of customers of each segment who have bought investment funds. Column Y=0 contains the number of customers of each segment who have not bought that product. According to Formula 69, the membership degree of segment Basis in the fuzzy restriction y_1 was induced as follows:

µ_y1("Basis") = p(X1 = "Basis" | Y = 1) / (p(X1 = "Basis" | Y = 1) + p(X1 = "Basis" | Y = 0)) = 0.38 / (0.38 + 0.82) = 0.32 = NLR_Y=1("Basis")    Formula 69

The other membership degrees were induced analogously. The resulting values are shown in the NLR column of Table 3. The corresponding membership function is illustrated in Figure 20.

Figure 20: Membership function induction for a categorical attribute (bar chart of the NLR per customer segment; x-axis: customer segment, y-axis: membership degree to product). Adapted from “An Inductive Fuzzy Classification Approach Applied to Individual Marketing,” by M. Kaufmann and A. Meier, 2009, in Proceedings of the 28th North American Fuzzy Information Processing Society Annual Conference, p. 4. Copyright 2009 by Publisher.

As a second example, for the continuous attribute overall balance, the membership function was induced in the following way: First, the NLR for deciles of the attribute's domain was calculated analogously to the previous example, represented by grey squares in Figure 21. Then, a function of the form A / (1 + exp(B − C·ln(x + 1))) + D was fitted by optimizing the parameters A to D using the method of John (1998).
The resulting membership function for the customer overall balance is shown as a line in Figure 21.

Figure 21: Membership function induction for a continuous attribute (NLR per decile, shown as points, and the fitted function F(x) = 0.71 / (1 + EXP(7.1 − 0.80·LN(x + 1))) + 0.09; x-axis: overall balance, y-axis: membership degree to product). Adapted from “An Inductive Fuzzy Classification Approach Applied to Individual Marketing,” by M. Kaufmann and A. Meier, 2009, in Proceedings of the 28th North American Fuzzy Information Processing Society Annual Conference, p. 5. Copyright 2009 by Publisher.

Every relevant attribute of the original dataset was transformed into a fuzzy membership degree using SQL directly in the database. Categorical variables were transformed using a case differentiation statement. For example, the attribute customer segment was transformed using the following SQL command:

select case
         when customer_segment = 'Basis'   then 0.32
         when customer_segment = 'SMB'     then 0.52
         when customer_segment = 'Gold'    then 0.75
         when customer_segment = 'Premium' then 0.86
       end as fuzzy_customer_segment
from ads_funds

Continuous variables were transformed using a function expression. For example, the attribute overall balance was fuzzified using the following SQL expression:

select 0.71 / (1 + exp(7.1 - 0.8 * ln(overall_balance + 1))) + 0.09 as fuzzy_overall_balance
from ads_funds

The individual fuzzy attribute domain restrictions were then aggregated into a multivariate fuzzy class of customers. This was done using an SQL statement implementing the gamma operator (Zimmermann & Zysno, 1980) defined in Formula 70.

Γ_γ(µ_y1(e_j), ..., µ_yn(e_j)) := (1 − ∏_i (1 − µ_yi(x_ij)))^γ · (∏_i µ_yi(x_ij))^(1−γ)    Formula 70

Table 4: Optimization of the Gamma Parameter for Multivariate Fuzzy Aggregation
 | I(Y;Y') | I(Y;Y'=1) | LR(Y'=1)
Gamma = 0 | 0.018486 | 0.4357629 | 6.7459743
Gamma = 0.5 | 0.0185358 | 0.4359879 | 6.7488694
Gamma = 1 | 0.0209042 | 0.4787567 | 7.2359172

In order to define the gamma parameter, different performance measures were calculated. As shown in Table 4, a gamma of 1, corresponding to an algebraic disjunction or full compensation, was most successful. Thus, the multivariate fuzzy classification was performed using the following SQL statement:

select 1 - ( (1 - fuzzy_number_of_products)
           * (1 - fuzzy_customer_segment)
           * (1 - fuzzy_overall_balance)
           * (1 - fuzzy_loyalty)
           * (1 - fuzzy_age)
           * (1 - fuzzy_balance_on_private_account)
           * (1 - fuzzy_customer_group)
           ) as multivariate_fuzzy_classification
from f_ads_funds

As a result, a fuzzy class of customers was calculated whose degree of membership indicates the product affinity for investment funds and represents a product affinity score. This can be used in individual marketing for defining target groups for different products. In order to test the resulting fuzzy classifier, a pilot marketing campaign was performed using the resulting fuzzy target group. First, a target group of the 5,000 customers with the highest membership degrees was selected from the fuzzy class using an alpha cutoff (Test Group 1).
Second, as a comparison, 5,000 other customers were selected using a crisp classification (Test Group 2), based on the following crisp constraints:

select case
         when customer_segment in ('Gold', 'Premium', 'SMB')
          and customer_group = '50 Plus'
          and loyalty > 14
          and number_of_products > 1
          and age between 35 and 75
          and balance_on_private_account > 3000
          and overall_balance > 20000
         then 1 else 0
       end as multivariate_crisp_classification
from ads_funds

This conventional target group selection used the same seven customer attributes as the fuzzy classification, but the classification was done using a crisp constraint predicate. Third, 5,000 customers were selected randomly (Test Group 3) in order to compare inductive classification to random selection. To each of those 15,000 customers, an online advertisement for investment funds was shown. After the marketing campaign, the product selling ratio during three months was measured. The results are shown in Table 5. The column Y=1 indicates the number of customers who bought the product. The product selling ratio for the target group selected by IFC was the highest. This individual marketing campaign using inductive fuzzy target group selection showed that fuzzy classification can lead to a higher product selling ratio than crisp classification or random selection. In the case study, IFC predictions of product affinity were more accurate than those of the crisp classification rules on exactly the same attributes. The conclusion is that, in comparison to crisp classification, fuzzy classification has an advantageous feature that leads to better response rates because it eliminates certain threshold effects by compensation between attributes.

Table 5: Resulting Product Selling Ratios per Target Group
Test group | Y=1 | Y=0 | Sales rate
1: Fuzzy classification | 31 | 4939 | 0.63%
2: Crisp classification | 15 | 5037 | 0.30%
3: Random selection | 10 | 5016 | 0.20%

4 Prototyping & Evaluation
Three software prototypes were programmed as proofs of concept of the technological aspects of the proposed methods for automated MFI in fuzzy data analysis. Master's students developed two of them, iFCQL and the IFC-Filter for Weka. The author developed the inductive fuzzy classification language IFCL. This implementation also allows experiments on the implemented methods for experimental evaluation. Using a meta-induction approach, the designed method was applied for prediction in several real-world datasets in order to analyze the characteristics and optimal parameters of the constructed methodology and to compare it to conventional predictive approaches. Classical inductive statistical methods were applied to gain insights about induction by IFC.

4.1 Software Prototypes
4.1.1 iFCQL
The first attempt at prototyping software for IFC was developed in a master's thesis by Mayer (2010). The basic idea was an extension of the existing fuzzy classification and query language FCQL, for which a prototype interpreter was built by Werro (2008). In order to derive fuzzy membership functions directly from the underlying data, the FCQL language was extended with commands to induce membership functions and to classify data based on them. For example, it was intended to describe MFI using the syntax “induce fuzzy membership of <dependent variable> in <target variable> from <relation>.” For more detail on the iFCQL syntax, see the report by Mayer (2010). As shown in Figure 22, the architecture of iFCQL reflects the language-oriented design approach.
The user can enter commands in iFCQL, which are translated into SQL, or he or she can enter SQL in the command shell, which is transmitted directly to the database by a database connector. Using a language instead of a graphical user interface makes it possible to save and reload scripts for data mining processes—this was the intention. Nevertheless, the implementation of the iFCQL language interpreter, consisting of a lexer, a parser, an evaluator, and an SQL generator, turned out to be a complex task that is not directly associated with IFC. The original idea was to develop a tool to support the proposed data mining methodology of this thesis (Section 2.4). Thus, there was one statement class for each of the six steps: data preparation (audit), MFI, fuzzification of attribute values, attribute selection, fuzzy classification of data records, and model evaluation. Eventually, only three syntax elements for the main tasks were implemented, namely MFI, univariate classification of attributes, and multivariate classification of records.

Figure 22: Architecture of the iFCQL interpreter. Adapted from “Implementation of an Inductive Fuzzy Classification Interpreter” (master's thesis), by P. Mayer, 2010, p. 59. Copyright 2010 by Mayer, University of Fribourg, Switzerland.

4.1.2 IFCL
This section presents the implementation of a tool that enables the computation of inductive membership functions based on the methodology proposed in Section 2.4. The inductive fuzzy classification language (IFCL) is a research prototype. Its goal is to show the feasibility of implementing the data mining methodology based on IFC. It is a markup language designed for simplicity of description and parsing; therefore, the development of the language interpreter could be kept simple. In addition to supporting the basic steps of the proposed method, its scope is to support automated experiments for benchmarking the predictive performance of the proposed methodology. This meta-induction experiment and its results are presented in Section 4.2.

Figure 23: Inductive fuzzy classification language (IFCL) system architecture.

As shown in Figure 23, the system architecture of IFCL consists of an IFCL file, a command line shell, the IFCL program itself, Weka regression algorithms, a relational database, and a data file. In the IFCL file, the data mining process steps are described in the IFCL language. In a command line shell, the IFCL program is invoked, and the corresponding IFCL file is passed to the program. There is no graphical user interface because the research concentrated on evaluating automated inductive inferences. The IFCL actions representing steps of the data mining process are interpreted by the IFCL program, which translates them into SQL queries for the relational database. For the supervised multivariate aggregation functions linear regression, logistic regression, and M5P regression trees, open-source implementations of mining algorithms from the Weka machine learning workbench (Hall et al., 2009) are accessed by the IFCL program. The IFC process steps that have been translated to SQL are executed on a database server. Analytic data is loaded into the database via a data file in comma-separated values (CSV) format. The IFCL language supports a simple load functionality that creates the necessary database tables based on a data description and loads records from a file into database tables.
The data file contains the data to be analyzed, and the IFCL program provides the means to compute predictive models based on the data in the form of membership functions. The functionality of the software encompasses all steps of the proposed data mining process for IFC as presented in Section 2.4.2. This includes data preparation, univariate MFI, fuzzification of attribute values, attribute selection, multivariate aggregation, data classification, prediction, and evaluation of predictions. This prototype provides the possibility
o to prepare data using arbitrary SQL statements,
o to induce membership functions with all thirteen methods proposed in Table 1 in Section 2.4.1,
o to fuzzify attribute values using the induced membership functions directly in the database,
o to rank attribute relevance by sorting the correlations between membership degrees and target class indicators,
o to aggregate many inductively fuzzified attributes into a fuzzy class membership for data records using the eight different aggregation methods shown in Table 6,
o to assign membership degrees to data records in the database by fuzzy classification, and
o to evaluate predictive models with correlation analysis.
All of these steps can be programmed in IFCL syntax. IFCL is fully automatable using scripting files. All models are displayed and stored in SQL syntax. Therefore, it provides repeatability of experiments and comprehensibility of models with respect to their database implementation as queries. Details on the syntax and application of IFCL can be found in the appendix (Section A.4). To summarize, the functionality of IFCL encompasses the following points:
o Connecting to the database,
o Executing IFCL script files in batch mode,
o Executing an SQL script,
o Dropping a database table,
o Loading data into the database,
o Data preparation,
o Inducing a membership function,
o Classification of data,
o Aggregating multiple variables,
o Evaluating predictions, and
o Attribute selection.
Steps of the data mining process in Section 2.4.2 are invoked in so-called IFCL actions. A sequence of actions is listed in IFCL files. IFC processes can be described in a tree-like structure of IFCL files, with nodes and leaves. An IFCL file is either a node, that is, a list of calls to other IFCL files, or a leaf containing actual execution sequences. IFCL leaves consist of a database connection followed by a sequence of IFCL actions. An IFCL file can call the execution of several other predefined IFCL files in batch mode using the action execifcl. In order to do so, one needs to indicate the correct operating system file path of the corresponding IFCL file, either as an absolute path or as a path relative to the location of the process that invoked the IFCL program. IFCL can connect to a database with the action connect. This is necessary for operation and has to be done at the beginning of a leaf IFCL file. One needs to know the hostname of the database server, the service identifier, the port number, a valid username, and a password. The user must be granted the rights to read tables. For data classifications, the user needs grants to create tables. The action drop allows the deletion of an existing database table. The IFCL program tests whether the indicated table exists and executes a drop command on the database if and only if the table exists. This is useful in preventing an exception caused by an attempt to drop a nonexistent table.
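The existence check behind the drop action is not specified in detail here; the following is an assumed equivalent in SQL against an Oracle data dictionary, with a hypothetical table name.

-- check whether the table exists before dropping it (Oracle data dictionary view)
select count(*)
from user_tables
where table_name = 'ADS_FUNDS';

-- only if the count equals 1 does the program issue the drop
drop table ads_funds;

Dropping only after a successful check avoids the error that an unconditional drop of a missing table would otherwise raise.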
IFCL allows executing SQL statements in the database with the action execsql. A single SQL command can be indicated directly within the IFCL syntax. Scripts (statement sequences) can be called from a separate SQL script file. The file system path has to be indicated in order to call the execution of an SQL file. In order to load analytic data into the database server, there are two possibilities. First, the SQL*Loader can be called, which is highly parameterized. Second, a simpler load mechanism can be invoked involving less code. The SQL*Loader has the advantage that it is extremely configurable by a control file. The drawback is that a large amount of code needs to be written even for simple load tasks. Therefore, for simpler loads that do not involve transformation, the IFCL load utility is a faster way to load data into the database. The call to the SQL*Loader can be made from an IFCL file using the action sqlldr. In order to load data, one needs a data file containing the analytic data and an SQL*Loader control file for the configuration of the load process. Furthermore, one can indicate two SQL*Loader parameters, namely skip and errors. (For more details, consult the SQL*Loader documentation.) For simpler data loads, the IFCL program provides a data loading utility with the action load. The advantage is that the load action automatically creates or recreates a table for the data load based on an enumeration of attributes and their types, where each type is either n (numerical) or c (categorical). In the enumerations, the positions of an attribute and of its corresponding type must match: Type 3 corresponds to Attribute 3. The attribute names will correspond to table columns, so they are syntactically restricted to valid column names. The load action must also be provided with a path to a file containing the load data and with a delimiter that separates data cells in the text file. White spaces and semicolons cannot be used as delimiters because they are syntactical elements of the IFCL language, and an escape mechanism has not been implemented.
In order to induce a univariate membership function for a single attribute to the target class, the IFCL action inducemf can be used. A database table is indicated on which the MFI will be based. One specific table column is defined as the target variable. As analytic variables, a specific table column can be defined, a list of columns can be enumerated, or all columns can be chosen for MFI. One of several different methods for MFI can be chosen (see Section 2.4.1). The output containing the induced membership function in SQL syntax is written to the indicated output file. This SQL classification template will be used later for the univariate fuzzification of input variables; a hypothetical sketch of such a template is given below. When more than one column is chosen as analytic variables, some of those variables can be skipped. The IFCL action will skip columns if they are enumerated in the corresponding action parameter. This means that those columns are not considered for MFI and are also not copied into the fuzzified table. Alternatively, some columns can be left out by indicating, in the corresponding action parameter, that the IFCL action should leave those columns the way they are. This means that, for those columns, no membership function is induced, but the columns are copied into the fuzzified table in their original state. This can be useful for incorporating original key columns in a fuzzified table.
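The exact format of the generated templates is not reproduced here; the following sketch only illustrates the idea for a categorical attribute. The placeholder names &input_table and &output_table stand for the parameterized tables, and the membership degrees shown are illustrative values, not measured ones.

-- hypothetical classification template generated by inducemf for the attribute checking_status
create table &output_table as
select t.*,
       case t.checking_status
         when 'no checking account' then 0.72  -- illustrative NLR-based membership degrees
         when 'balance >= 200'      then 0.61
         when 'balance 0 to 200'    then 0.48
         else 0.26
       end as mf_checking_status
from &input_table t;

When the classify action is applied, the two placeholders are replaced by the names of the classified table and the output table, and the query is executed on the database server.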
In order to fuzzify the input variables using the membership functions obtained, the action classify can be invoked using the previously generated SQL classification template file. The classification action can also apply multivariate fuzzy classification templates obtained by the IFCL action aggregatemv, which aggregates multiple variables into a multivariate fuzzy class. Template files generated by the IFCL actions inducemf and aggregatemv contain an SQL query in which the input and output tables are parameterized. A data classification applies these two parameters (classified table and output table) to the query template stored in the template file. Data is read from an existing table, the classified table, transformed using the induced membership functions, and the resulting transformation is written into the output table.
In order to aggregate multiple variables into a multivariate fuzzy classification, the corresponding aggregation function can be computed using the IFCL action aggregatemv. A base table is indicated on which the aggregation takes place. This is usually a transformed table containing inductively fuzzified attributes derived with the actions inducemf and classify, but for the Weka regression algorithms (linreg, logreg, regtree), the aggregatemv action can also be applied to the original data. Again, the analytic variables and the target variable are indicated, which usually correspond to those used for MFI. As the aggregation operator, either an unsupervised aggregation, such as the minimum (min), maximum (max), algebraic product (ap), algebraic sum (as), or average (avg), can be chosen, or a supervised aggregation, such as a linear regression (linreg), a logistic regression (logreg), or a regression tree (regtree), can be calculated. The column that contains the aggregated membership value for each row must be given a column alias. The resulting aggregation function is written as a query template to an output file whose path is defined in the action definition. This query template represents the multivariate model as a membership function in the target class. It can be used for data classification and prediction with the classify action.
In order to evaluate correlations between columns, the IFCL action evaluate can be applied. This is useful for evaluating predictive performance, but also for attribute ranking and selection. Using IFCL, an attribute selection can be accomplished by ranking the inductively fuzzified analytic variables by their correlations with the target variable. In order to do this, for all columns of the table, a membership function to the target is induced in a training set with the inducemf action, and the columns are transformed into membership degrees with the classify action. Then, the correlations of all fuzzified variables with the target variable can be evaluated with the evaluate action.

4.1.3 IFC-Filter for Weka
The IFC-NLR data mining methodology introduced by Kaufmann and Meier (2009) has been implemented by Graf (2010) as a supervised attribute filter in the Weka machine learning workbench (Hall et al., 2009). The aim of this implementation is the application of IFC-NLR in a typical data mining process. This IFC-Filter allows the evaluation of the algorithm. Weka can be used for data mining in order to create predictive models based on customer data.
The IFC-Filter can facilitate the visualization of the association between customer characteristics and the target class, and it can improve the predictive performance of customer scoring by transforming attribute values into inductive fuzzy membership degrees, as previously explained. Thus, it can perform MFI and IAF. As shown in Figure 24, the Weka software provides data access to relational databases (RDB), comma-separated values (CSV), and the attribute-relation file format (ARFF); classification and regression; and predictions. The IFC-Filter implements the functionality for MFI, IAF, and visualizations based on the calculated membership functions.

Figure 24: Software architecture of the IFC-Filter. Adapted from “Fuzzy Target Groups in Analytic Customer Relationship Management,” by M. Kaufmann and C. Graf, 2012, in A. Meier and L. Donzé (Eds.), Fuzzy Methods for Customer Relationship Management and Marketing, p. 184. Copyright 2012 by Publisher.

The IFC-Filter transforms sharp input data into membership degrees that indicate the inductive support for the conclusion that the data record belongs to the target class. In order to do so, first, membership functions are induced from the data and optionally displayed on the screen. Then, these functions are applied to fuzzify the original attributes. Visualization and prediction based on the concepts of MFI and IAF are the two main uses of the IFC-Filter software in the data mining process. In order to visualize associations between variables, inductive membership functions can be plotted with the IFC-NLR method described earlier. Thus, for every analytic variable, a function mapping from the variable's domain into a degree of membership in the inductive fuzzy target class is displayed graphically. This plot gives intuitive insights about associations between attribute values and the target class. For prediction, the transformation of crisp attribute values from a data source into inductive membership degrees can enhance the performance of existing classification algorithms. The IFC-Filter transforms the original attribute values into inductive membership degrees indicating target class likelihood based on the original value. After that, a classical prediction algorithm, such as logistic regression, can be applied to the transformed data and to the original data in order to compare the performance with and without IAF. It is possible and likely that IAF improves prediction. When that is the case, the IAF data transformation can be applied to huge data volumes in relational databases using the SQL code generated by the IFC-Filter. The software is available for download (http://diuf.unifr.ch/is/ifc, accessed 11.2010) in source and binary format for researchers and practitioners for experimenting with and applying IFC.

Figure 25: Knowledge flow with IFC-Filters. Adapted from “Fuzzy Target Groups in Analytic Customer Relationship Management,” by M. Kaufmann and C. Graf, 2012, in A. Meier and L. Donzé (Eds.), Fuzzy Methods for Customer Relationship Management and Marketing, p. 186. Copyright 2012 by Publisher.

As proposed in the section on the application of IFC to analytics (Section 3.1), the IFC-Filter can visualize membership functions that indicate target membership likelihoods. Figure 25 shows two screenshots of the membership function plots for the variables Duration and Checking status from the German credit dataset. The IFC-Filter has the possibility to activate a frame containing a graphical illustration of the resulting membership functions.
Each analytical variable is represented by a tab in this frame. The presentation of numerical analytical variables differs slightly from that of categorical analytical variables because it presents a continuous membership function. As shown in Figure 26, the illustration of a numerical analytical variable consists of four fields. The first field is a table containing the NLRs with the corresponding quantiles and average quantile values (AQVs). The second field shows a histogram containing the NLRs and their corresponding AQVs. The third field shows the membership function of the analytical variable. The fourth field shows the membership function in SQL syntax for this particular analytical variable, which can be used directly in a relational database for fuzzy classification of variables. The illustration of a categorical analytical variable can be reduced to three fields. The first field is a table containing the NLRs for the corresponding categorical values. The second field shows a histogram containing the NLRs corresponding to the categorical values. The third field shows the membership function in SQL syntax. An additional tab, the SQL panel, contains the membership functions of all analytic variables in SQL. It displays a concatenation of the membership functions in SQL syntax for all analytical variables that have been input for MFI by the IFC-Filter. This database script can be applied in a database in order to transform large database tables into inductive degrees of membership in a target class. This can improve the predictive quality of multivariate models.

Figure 26: Visualization of membership functions with the IFC-Filter. Adapted from “Fuzzy Target Groups in Analytic Customer Relationship Management,” by M. Kaufmann and C. Graf, 2012, in A. Meier and L. Donzé (Eds.), Fuzzy Methods for Customer Relationship Management and Marketing, p. 188. Copyright 2012 by Publisher.

4.2 Empirical Tests
This section describes the experiments that have been conducted using the IFCL prototype (Section 4.1.2). Sixty real-world datasets (see the appendix, Section A.1) with categorical and numerical target attributes have been used to build predictive models with different parameters. The results of these experiments allow answering the following research questions: Which MFI method is suitable for the application of fuzzy classification to prediction? Which aggregation operators are suitable for the multivariate combination of fuzzy classes? Can it be statistically supported that this method of IFC leads to higher predictive performance? In order to find the answers to these research questions, the aim is to inductively infer conclusions about inductive methods, which is called a meta-induction scheme. Thus, quantitative statistical methods are applied to the data on the performance of IFC with different parameters, gained by running the automated prediction experiments.

4.2.1 Experiment Design
The aim of these experiments was to determine the optimal parameters for maximal predictive performance in order to gain insights about IFC. Using the scripting functionality of the IFCL prototype, a systematic benchmarking of the predictive performance of different IFC algorithms was conducted. From the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/, accessed 03.2010), the Weka numeric datasets archive (http://sourceforge.net/projects/WEKA/files/datasets/datasets-numeric.jar/download, accessed 08.2010), and the Weka UCI datasets archive (http://sourceforge.net/projects/WEKA/files/datasets/datasets-UCI.jar/download, accessed 08.2010), 60 statistical datasets were used for systematic automated prediction experiments using different parameters.
These datasets are listed in Table 9 and Table 10 in Section A.1. They contain real-world data from different processes that can be used for testing machine learning algorithms. They were chosen for their tabular structure and for the type of the prediction target; apart from that, they were chosen randomly. For categorical target classes, a Boolean target variable was extracted with a simple predicate (see Table 9, column Target). For numerical target classes, a Zadehan target variable was generated using inductive target fuzzification with both linear and percentile fuzzification (see Table 10).

The experimental parameters included 13 methods for IAF (Table 1), two types of base target classes (numerical or categorical), two types of inductive target fuzzification for numerical target variables (ITF; Formula 43 and Formula 44), and eight methods for multivariate aggregation (Table 6). As benchmarks, three state-of-the-art prediction methods (regression trees, linear regression, and logistic regression) were applied for prediction without prior IAF.

Table 6
Different Aggregation Methods Used for Prediction Experiments
Abbreviation | Description
logreg | Logistic regression
regtree | Regression trees
linreg | Linear regression
avg | Average membership degree
min | Minimum membership degree
ap | Algebraic product
as | Algebraic sum
max | Maximum membership degree

In order to test predictive performance, for every combination of dataset, IAF method, ITF method, and aggregation method, the correlation of the prediction with the actual class value was computed. This has been done using a repeated hold-out cross-validation method: For every parameter combination, the correlation result was calculated 10 times with different random splits of 66.7% training data versus 33.3% test data, and the results were averaged. Prediction quality of model instances was measured with the Pearson correlation coefficient (Weisstein, 2010b) between the (Boolean or Zadehan) predictive and target variables in order to measure the proximity of predictions and targets in two-dimensional space. The significance of prediction correlation for experiment parameters θ1 and θ2 was estimated with a non-parametric statistical test, the one-sided Wilcoxon Signed-Rank (WSR) test (Weisstein, 2012a), where the null hypothesis, H0, supposes that the average prediction correlation, γ(θ1) := avg(corr(Y’,Y) | θ1), of experiments with parameter θ1 is smaller than or equal to γ(θ2), where Y is the target variable, Y’ is the predictive model output, and both Y and Y’ are Boolean or Zadehan variables with a range of truth values within [0,1]. The statistical test hypothesis is formalized by H0: γ(θ1) ≤ γ(θ2). If the corresponding p-value returned by the WSR test is significantly small, it is likely that θ1 performs better than θ2. In addition to their statistical meaningfulness, the reason for using these measures, Pearson correlation and WSR, is that both are precompiled functions in certain database systems, and thus computed very efficiently for large numbers of experiments.

4.2.2 Results

This subsection describes the results of the meta-inductive prediction experiments. It describes the findings about the best-performing methods for MFI and multivariate class aggregation.
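Throughout this subsection, predictive quality refers to the evaluation procedure of Section 4.2.1, which can be expressed directly in the database. The following is a minimal sketch in the style of the queries in Appendix A.2; the scored table, the prediction column, and the per-dataset summary columns are hypothetical placeholders, whereas CORR and STATS_WSR_TEST are the precompiled functions that the appendix queries use.

-- Pearson correlation between model output and target on the 33.3% test split of one
-- hold-out repetition (assuming is_training = 0 marks the test records).
select corr(y_pred, y) as correlation
from creditg_scored
where is_training = 0;

-- One-sided WSR test comparing two parameter settings over all datasets, as in Query 3:
-- a small p-value indicates that setting 1 performs better than setting 2 on average.
select stats_wsr_test(corr_setting1, corr_setting2, 'ONE_SIDED_SIG') as p_value
from per_dataset_correlations;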
Additionally, it evaluates which dataset parameters correlate with the performance improvement gained by IAF.

In order to evaluate the best-performing aggregation method, the average correlation between the prediction and the actual value of the target variable has been calculated for every aggregation method, for categorical as well as numerical targets, over all datasets and MFI methods. In Figure 27, the result of this evaluation for categorical variables is shown. Clearly, logistic regression is the best aggregation method for binary targets, with an average correlation of 0.71 (see Query 2 in Chapter A).

Figure 27: Average prediction correlation for different aggregation methods for binary targets (logreg 0.71, regtree 0.69, linreg 0.69, avg 0.64, min 0.53, ap 0.52, as 0.45, max 0.45).

A Wilcoxon Signed-Rank (WSR) test (see Query 3 and data in Table 11 in Chapter A) returned a p-value of 0.0115 for the null hypothesis that the average correlation between prediction and target for logistic regression is smaller than or equal to that of regression trees. This means that, with a high probability, the predictive performance of logistic regression is higher than that of regression trees. The p-value for the difference in performance between logistic and linear regression is even smaller: 0.00105. Therefore, logistic regression performs significantly better as an aggregation method for prediction than the other methods.

As illustrated by Figure 28, the best-performing method of multivariate class aggregation for numerical targets is regression trees, with an average correlation of 0.66 (see Query 7 in Chapter A). The WSR significance test (see Query 8 and data in Table 12 in Chapter A) rejected, with a p-value of 1.30152E-06, the null hypothesis that the average correlation of regression trees is smaller than or equal to that of linear regression. Again, this result is, statistically, very significant.

Figure 28: Average prediction correlation for different aggregation methods for fuzzified numerical targets (regtree 0.66, linreg 0.58, avg 0.52, logreg 0.52, ap 0.45, max 0.42, min 0.41, as 0.39).

In order to determine the best-performing combination of membership function induction method for IAF and multivariate aggregation method, the average correlation between prediction and actual value of the target variable has been calculated for every combination. As shown in Figure 29, the best combination for binary target variables is logistic regression with an NLR as MFI method for IAF, with an average prediction correlation of 0.71 (see Query 4 in Chapter A). The best benchmark method without IAF is regression trees on rank 24 of 107 with an average prediction correlation of 0.699. The top 23 combinations all involved a form of IAF. The best-performing method was to combine an IAF using NLR with a logistic regression. The WSR significance test returned a p-value of 0.1800 (see Query 5 and data in Table 14 in Chapter A) for the null hypothesis that the average correlation between prediction and target for logistic regression with NLR is smaller than or equal to that of logistic regression with IAF using NLD. This means that, for logistic regression with IAF, NLR does not perform significantly better than NLD. However, the WSR test rejected, with a p-value of 0.04888, the null hypothesis that a logistic regression with IAF using NLR performs equally well or worse than regression trees without IAF.
Consequently, a significant performance improvement of logistic regression models can be achieved, on average, with IAF.

Figure 29: Average prediction correlation for combinations of membership function induction and aggregation methods for binary targets (logreg(nlr), rank 1: 0.712; logreg(nld), rank 2: 0.710; logreg(cp), rank 3: 0.710; logreg(npd), rank 4: 0.709; logreg(npdu), rank 5: 0.708; regtree(no IAF), rank 24: 0.699; logreg(no IAF), rank 30: 0.688; linreg(no IAF), rank 44: 0.669).

Figure 30: Average prediction correlation of predictions with linear versus percentile ITF, given regression trees as the aggregation method (linear 0.69, percentile 0.64).

In order to apply the MFI methods from Section 2.4, numerical target variables need to be fuzzified. For this purpose, two approaches for inductive target fuzzification (ITF) have been proposed (linear and percentile fuzzification; Formula 43 and Formula 44). Both were tested and compared in the experiments. In Figure 30, one can see that the average correlation between prediction and target was much higher for linear ITF (see Query 9 in Chapter A). The WSR significance test (see Query 10 and data in Table 13 in Chapter A) returned a p-value of 0.0000289. Thus, linear target fuzzification performs significantly better for predicting fuzzy classes than percentile target fuzzification.

Figure 31: Average prediction correlation for combinations of membership function induction and aggregation methods for fuzzified numerical targets (regtree(nld), rank 1: 0.697; regtree(nldu), rank 2: 0.695; regtree(nc), rank 3: 0.690; regtree(npdu), rank 4: 0.690; regtree(npr), rank 5: 0.689; regtree(no IAF), rank 14: 0.621; linreg(no IAF), rank 25: 0.591; logreg(no IAF), rank 38: 0.569).

In Figure 31, the results for experiments with combinations of different IAF and aggregation methods for data with numerical targets (see Query 11 in Chapter A) are shown, given linear ITF of the numerical target variables. In this case, IAF with NLD, together with a supervised aggregation using Weka M5P regression trees (regtree), is the best method, with an average correlation of 0.697. The difference between this combination and the benchmark prediction using regression trees without IAF is significant, with a WSR p-value of 0.0026 (see Query 12 and data in Table 15 in Chapter A).

There are several parameters that distinguish datasets. These include the number of variables, the number and percentage of categorical or numerical variables, the number of rows, the ratio between the number of rows and the number of columns, and the linearity of the prediction target. This linearity was measured by the average correlation of the predictions of a linear regression benchmark model, YLR, with the target variable Y: Linearity := avg(corr(YLR, Y)). The question is whether any of these parameters correlate significantly with the improvement of the predictive performance of regression models with IAF.
In order to answer this question for binary target predictions, the correlation and the level of significance of the correlation between the parameters and the relative improvement of the predictive performance of logistic regression with NLR-fuzzified analytic variables, versus logistic regression without IAF, were calculated using a non-parametric correlation test (see Query 14 in Chapter A). As the statistical test, Spearman’s rho (Weisstein, 2012b) with a z-score significance test was applied, which exists as a precompiled function in the database (see http://docs.oracle.com/cd/B28359_01/server.111/b28286/functions029.htm, accessed 6/2012).

Table 7
Correlation and p-Value of Dataset Parameters and Improvement of Logistic Regression with IAF for Binary Target Variables
Parameter | Correlation with improvement by IAF | p-value
Number of categorical variables | 0.335516011 | 0.069899788
Number of numeric variables | -0.097871137 | 0.60688689
Percentage of categorical variables | 0.206624854 | 0.273290146
Percentage of numeric variables | -0.206624854 | 0.273290146
Number of columns | 0.228400654 | 0.224758897
Average correlation of the prediction of linear regression | -0.50567297 | 0.004362976
Number of rows (n) | 0.089665147 | 0.637498617
Ratio number of rows / number of columns | -0.001557286 | 0.993483625

Figure 32: Relationship between target linearity and IAF benefit for binary target variables (scatter plot of the improvement of logistic regression by IAF NLR against the average correlation of the predictions of linear regression).

As shown in Table 7, the correlation between target linearity and improvement by NLR fuzzification is negative and significant (p < 0.005). This means that the less linear the connection between the analytic variables and the target variable is, the more a Boolean prediction using logistic regression can be improved with an IAF using NLRs. In Figure 32, this association is visualized in a scatter plot (see Query 15 and data in Table 16 in Chapter A).

For fuzzified numerical prediction targets, Table 8 shows similar results (see Query 14 in Chapter A). The improvement of regression trees with IAF NLD correlates negatively and significantly with the linearity of the association between analytic and target variables. The lower the performance of linear regression, the more a predictive regression tree model can benefit from IAF using NLD. This association is illustrated in Figure 33 (see Query 15 and data in Table 17 in Chapter A).

Table 8
Correlation and p-Value of Dataset Parameters and Improvement of Regression Trees with IAF for Zadehan Target Variables
Parameter | Correlation with improvement by IAF | p-value
Number of categorical variables | 0.121461217 | 0.522578264
Number of numeric variables | 0.03610839 | 0.84975457
Percentage of categorical variables | 0.036497846 | 0.848152597
Percentage of numeric variables | -0.036497846 | 0.848152597
Number of columns | 0.195910119 | 0.299481536
Average correlation of the prediction of linear regression | -0.583537264 | 0.000712151
Number of rows (n) | 0.081432863 | 0.668807278
Ratio number of rows / number of columns | -0.043381535 | 0.81994093

Figure 33: Relationship between target linearity and IAF benefit for fuzzified numerical target variables (scatter plot of the improvement of regression trees by IAF NLD against the average correlation of the predictions of linear regression).
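This linearity analysis can be reproduced directly against the experiment database. The following is a minimal sketch in the style of the appendix queries (cf. Query 13 and Query 14), assuming the view data_params with the columns avg_corr_linreg and improvement_by_iaf defined there; the choice of the two-sided significance option is illustrative.

-- Spearman's rho between target linearity (average correlation of the linear regression
-- benchmark) and the relative improvement gained by IAF, with its significance level.
select corr_s(avg_corr_linreg, improvement_by_iaf) as spearman_rho,
       corr_s(avg_corr_linreg, improvement_by_iaf, 'TWO_SIDED_SIG') as p_value
from data_params;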
“Every species is vague, every term goes cloudy at its edges, and so in my way of thinking, relentless logic is only another name for stupidity—for a sort of intellectual pigheadedness. If you push a philosophical or metaphysical enquiry through a series of valid syllogisms—never committing any generally recognized fallacy—you nevertheless leave behind you at each step a certain rubbing and marginal loss of objective truth and you get deflections that are difficult to trace, at each phase in the process. Every species waggles about in its definition, every tool is a little loose in its handle, every scale has its individual.” (H. G. Wells, 1908)

5 Precisiating Fuzziness by Induction

5.1 Summary of Contribution

A methodology for IFC has been introduced. This method allows the automated derivation of membership functions from data. The idea is to translate sampled probabilities into fuzzy restrictions. Thus, a conceptual switch from the probabilistic to the possibilistic view is applied in order to reason about the data in terms of fuzzy logic. Business applications of MFI have been proposed in the area of marketing analytics in the fields of customer and product analytics, fuzzy target groups, and integrated analytics for individual marketing. A prototype inductive fuzzy classification language (IFCL) has been implemented and applied in experiments with 60 datasets with categorical and numerical target variables in order to evaluate performance improvements by application of the proposed methodology.

Seven questions (Section 1.1) have guided the research conducted for this thesis. The following list summarizes the answers that this thesis proposes:

o What is the theoretical basis of inductive fuzzy classification (IFC) and what is its relation to inductive logic?

Inductive logic is defined as “a system of evidential support that extends deductive logic to less-than-certain inferences” (Hawthorne, 2008, “Inductive Logic,” paragraph 1). Fuzzy classes are fuzzy sets (Zadeh, 1965) defined by propositional functions (Russell, 1919) with a gradual truth-value (i.e., mapping to fuzzy propositions; Zadeh, 1975). Their membership functions are thus inductive if their defining propositional functions are based on less-than-certain inferences (Section 2.4). The inductive fuzzy class y’ = { i ∈ U | i is likely a member of y } is defined by a predictive model, µ, for membership in class y, where “likely” is a fuzzy set of hypotheses. The truth function of the fuzzy propositional function L(i,y) := “i is likely a member of y” is a fuzzy restriction (Zadeh, 1975a) on U defined by µy‘.

o How can membership functions be derived inductively from data?

Membership functions can be derived using ratios or differences of sampled conditional probabilities. Different formulas have been proposed in Section 2.4.1 and tested in Section 4.2. Normalized differences and ratios of likelihoods have been shown to provide optimal membership degrees. For numeric variables, a continuous membership function can be defined by piecewise linear function approximation: the numeric variable is discretized using quantiles, and a linear function is fitted to the average values and membership degrees of every adjacent pair of quantiles. For numerical target variables, normalization leads to a linear fuzzification with a membership degree to a fuzzy set that can be used to calculate likelihoods of fuzzy hypotheses. A brief SQL sketch of such a likelihood-based membership computation is given below.
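The sketch illustrates this for a single categorical attribute x and a Boolean target y, assuming that the normalized likelihood ratio is computed as NLR(x) = p(x|y=1) / (p(x|y=1) + p(x|y=0)); the table and column names are illustrative and not part of the IFCL prototype.

-- Minimal sketch: membership degrees as normalized likelihood ratios, sampled from a
-- training table with a categorical attribute x and a Boolean target y in {0, 1}.
select x,
       (pos / n_pos) / ((pos / n_pos) + (neg / n_neg)) as nlr_membership
from (
  select x,
         sum(case when y = 1 then 1 else 0 end) as pos,
         sum(case when y = 0 then 1 else 0 end) as neg,
         sum(sum(case when y = 1 then 1 else 0 end)) over () as n_pos,
         sum(sum(case when y = 0 then 1 else 0 end)) over () as n_neg
  from training_data
  group by x
);

For a numeric attribute, the same degrees would be sampled per quantile and connected by the piecewise linear interpolation described above.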
o How can a business enterprise apply IFC in practice?

Inductive fuzzy classes can support marketing analytics by providing selection, visualization, and prediction in the fields of customer analytics, product analytics, fuzzy target selection, and integrated analytics for individual marketing. These techniques improve decision support for marketing by generating visual, intuitive profiles and improved predictive models, which can enhance sales (Section 3.2). A case study shows a real-world application with a financial service provider (Section 3.2.5).

o How can the proposed methods be implemented in a computer program?

The IFCL prototype uses a database server for data storage, access, and computation (Section 4.1.2). The analytic process is configured in a text file using a synthetic language called IFCL. This language allows the definition of all computations necessary for MFI, IAF, target fuzzification, attribute selection, supervised multivariate aggregation, fuzzy classification of data using predictive models, and model evaluation. Additionally, the IFCL prototype allows experiments on the proposed methods for MFI in batch mode. Furthermore, student projects have demonstrated implementations of IFC with graphical user interfaces.

o How is IFC optimally applied for prediction?

First, data is prepared and relevant attributes are selected. Then, each attribute is inductively fuzzified, so that the membership degree indicates the degree of membership in the target class. Finally, those multiple membership degrees are aggregated into a single membership degree with a multivariate regression, which represents the inductive class prediction. Optimally, for binary targets, attributes are fuzzified with NLRs and aggregated using logistic regression. For numerical targets, the target attribute is fuzzified using a linear normalization, and the dependent attributes are fuzzified using NLDs and then aggregated using regression trees (Section 3.1.3).

o Which aggregation methods are optimal for the multivariate combination of fuzzy classes for prediction?

In the experimental phase, the algebraic product, algebraic sum, minimum, maximum, average, logistic regression, linear regression, and regression trees have been tested for their predictive performance using the prototype software IFCL. For the aggregation of fuzzified attributes into a single prediction value for data records, the experiments showed that, on average, a supervised aggregation using multivariate regression models (logistic regression for Boolean targets and regression trees for Zadehan targets) is optimal (Section 4.2.2).

o Can it be statistically supported that the proposed method of IFC improves predictive performance?

Inductive attribute fuzzification using NLRs significantly improved the average prediction results of classical regression models (Section 4.2.2). In fact, this improvement correlated significantly and negatively with the linearity of the association between the data and the target class. Thus, the less linear this association is, demonstrated by a smaller correlation of the predictions of a linear regression model with the actual values, the more a multivariate model can be improved by transforming attributes into an inductive membership degree using the methods proposed in this thesis.

5.2 Discussion of Results

Human perception and cognition are blurred by uncertainty and fuzziness. Concepts in our minds cluster perceptions into categories that cannot always be sharply distinguished.
In some cases, binary classifications are feasible, such as detecting the presence or absence of light. Nevertheless, in many cases, our classification relies on fuzzy terms such as bright. Some concepts are gradual and can be ordered by their degree. In our previous example, brightness is a perception that is ordinal because some visual impressions are brighter than others. By mapping an ordinal scale to Zadehan truth values, a membership function can be defined in order to clearly precisiate the semantics of fuzzy words, concepts, or terms, in the sense of Zadeh (2008). In fact, fuzzy set theory has not yet been recognized enough for its most valuable contribution: It provides a tool for assigning precise definitions to fuzzy statements! In Section 2.2.1, the sorites paradox, an age-old philosophical puzzle, was easily resolved by application of fuzzy set theory; such is the power of Zadehan logic.

Conditioned by the fuzziness of cognition, the human mind infers and reasons based on fuzzy ideas, and often, successfully so. Induction is an uncertain type of reasoning, and thus classifies inferences into a fuzzy set: the fuzzy class of inductively supported conclusions. On the one hand, induction is always uncertain, and even the most unlikely event can actually happen, even if induction indicates the contrary, such as the observation of a black swan (Taleb, 2007). On the other hand, how can we learn if not from past experience? Good inductive inferences are reliable more often than not. Furthermore, gaining insights by evaluating likelihoods of hypotheses using objective data is a cornerstone of empiricism. Thus, induction and its support measure, the likelihood ratio, can amplify past experience by leading to better and more reliable inductive inferences. While many concepts are fuzzy, induction provides a tool for precisiation. The transformation of likelihood ratios into membership functions to fuzzy sets by normalization is a tool for reasoning about uncertainty by amplification of past experience, where the fuzziness of class membership is precisiated by likelihood in the data. Accordingly, MFI with NLRs applies the law of likelihood and fuzzy set theory to make inductive inferences and to represent them as membership functions to fuzzy classes.

There are three main advantages of applying likelihood-based MFI in analytics. First, a membership function makes hidden meaningful associations between attributes in the data explicit. It allows visualizing these associations in order to provide insight into automated inductive inferences. Second, a two-dimensional plot of these membership functions is easily interpretable by humans. Those visualizations are precisiations of inductive associations. Third, these membership functions are models that allow for a better quality of predictions and thus more accurate decisions.

Likelihood-based precisiation of fuzziness can be applied to marketing analytics. Data analysis on customers, products, and transactions can be applied for customer relationship management (CRM) and individual marketing. This kind of analytic CRM provides decision support to company representatives in marketing channels with direct customer contact for offers, underwriting, sales, communications, gifts, and benefits. For the analytics process, it is proposed to classify customers inductively and fuzzily in order to target individual customers with the right relationship activities.
First, the customers are classified by assigning them to different targets (classes), such as product offers or promotional gifts. Second, this classification is done inductively by analyzing present data as evidence for or against target membership based on customer characteristics. Third, the assignment of individual customers to targets is fuzzy if the resulting classes have a gradual membership function. Finally, those inductive membership degrees to fuzzy targets provide decision support by indicating the likelihood of the desired response in customer activities based on customer characteristics. These membership functions can be used in analytics to select the most important characteristics regarding a target, visualize the relationship between characteristics and a target, and predict target membership for individual customers.

The proposed method for MFI is based on calculating likelihoods of target membership in existing data. These likelihoods are turned into a fuzzy set membership function by comparison and normalization using an NLR or an NLD. This algorithm has been implemented in the Weka machine learning workbench as a supervised attribute filter, which inductively fuzzifies sharp data and visualizes the induced membership functions. This software can be downloaded for experimental purposes.

Sharp binary classification and segmentation lead to threshold anomalies, and fuzzy classification provides a solution with gradual distinctions that can be applied to CRM, as pointed out by Werro (2008). Gradual customer partitioning has the advantage that the size of target groups can be varied by choosing a threshold according to conditions such as budget or profitability. Of course, scoring methods for CRM are state of the art; additionally, the present approach suggests that fuzzy logic is the appropriate tool for reasoning about CRM targets because they are essentially fuzzy concepts. Scoring provides a means for precisiation of this fuzziness by yielding numerical membership degrees of customers in fuzzy target groups. The advantages of MFI for fuzzy classification of customers are efficiency and precision. First, membership functions to CRM targets can be derived automatically, which is an efficient way of defining them. And second, those inferred membership functions are precise because they indicate an objective, measurable degree of likelihood for target membership. Thus, the semantics of the membership function is clearly grounded.

5.3 Outlook and Further Research

The first point of further research is the continued development of the IFCL software. This prototype has fulfilled its purpose in answering the research questions of this thesis, but it lacks many features to make it suitable for productive use. Further development of this software could address the following points:
o refactoring of the source code with a clear object-oriented design methodology,
o adding a graphical user interface,
o redesign of certain syntactical elements that have turned out to be inconvenient,
o adding a syntax checker for the IFCL language,
o adding support for all databases with a JDBC interface,
o handling null values in the data, and
o allowing multiclass predictions.

As a second research direction, the field of application of IFC will be diversified. So far, applications have been proposed and evaluated in the area of marketing. Providing applications for web semantics analytics and biomedical analytics is envisioned.
Third, the logical framework of fuzzy classification by induction is intended to be generalized. “Inductive approximate reasoning” can be defined as a methodology for the application of logic with gradual truth-values using less-than-certain inferences. Also, the epistemological problem of induction will be tackled, because the likelihood ratio as a measure of support is distorted for large values of p(¬H ∩ ¬E)/p(¬H), which is exemplified by the paradox of the raven (Vickers, 2009): Is the conclusion that a raven (E) is black (H) truly supported by countless observations of white clouds (¬H ∩ ¬E)? At least, it increases the likelihood ratio by decreasing p(E|¬H); perhaps, there might be other useful measures of inductive support.

The most important suggestion for further research is the investigation of multidimensional MFI. This thesis concentrates on the induction of membership functions for single variables and their multivariate combination. As such, not all possible combinations of all variables are evaluated (see Section 2.2.2). In a multivariate model, a given attribute value can be assigned the same weight for every combination with values of other variables, for example, in a linear regression. In a multidimensional model, the same attribute value has different weights in different contexts, because every vector of a possible attribute value combination is assigned an individual weight or membership degree. Therefore, it might be interesting for future research to investigate the properties of multidimensional MFI. Instead of a multivariate combination of univariate membership functions, a multidimensional membership function assigns a membership degree to vectors of attribute value combinations, and the influence of an attribute value is not only dependent on itself, but also on the context of the other attributes’ values. Thus, predictive performance could be improved because the interdependency of attributes can be modeled. This could resolve the “watermelon paradox” stated by Elkan (1993, p. 703), because the inductive membership function to the concept of watermelon is measured in two dimensions, with “red inside” as a first coordinate and “green outside” as a second. The following research questions can be of interest: How are multidimensional membership functions induced from data? How are multidimensional membership degrees computed for vectors of categorical values, for vectors of numerical values, and for vectors of numerical and categorical values? How are continuous functions derived from those discrete degrees? Can multidimensional membership functions improve prediction? In what cases is it advisable to work with multidimensional membership functions? A possibility is to use n-linear interpolation using n-simplexes (Danielsson & Malmquist, 1992), which are hyper-triangles or generalizations of triangles in multidimensional space. Using simplexes, for every n-tuple of possible dimension value combinations, a corresponding likelihood measure (such as an NLR) could be sampled. Then, n different membership degree samples would represent the edges of a simplex. This hyper-triangle would represent a piece of the continuous multidimensional membership function, which could be approximated by combining simplexes into piecewise linear hypersurfaces.

6 Literature References

Backus, J. W., Bauer, F. L., Green, J., Katz, C., McCarthy, J., Perlis, A. J., et al. (1960). Report on the algorithmic language ALGOL 60. (P. Naur, Ed.)
Communications of the Association for Computing Machinery , 3 (5), pp. 299-314. Bellmann, R. E., & Zadeh, L. A. (1977). Local and Fuzzy Logics. In J. M. Dunn, & G. Epstein (Eds.), Modern Uses of Multiple-Valued Logic (pp. 283-335). Kluwer Academic Publishers. Boole, G. (1847). The Matematical Analysis of Logic, being an Essay towards a Calculus of Deductive Reasoning. London: George Bell. Carrol, L. (1916). Alice's Adventures in Wonderland. New York: Sam'l Gabriel Sons & Company. Chmielecki, A. (1998). What is Information? Twentieth World Congress of Philosophy. Boston, Massachusetts USA. Danielsson, R., & Malmquist, G. (1992). Multi-Dimensional Simplex Interpolation - An Approach to Local Models for Prediction. Chemometrics and Intelligent Laboratory Systems , 14, pp. 115128. Davenport, T. H. (2006). Competing on Analytics. Harvard Business Review . Del Amo, A., Montero, J., & Cutello, V. (1999). On the Principles of Fuzzy Classification. Proc. 18th North American Fuzzy Information Processing Society Annual Conf, (pp. 675 – 679). Dianhui, W., Dillon, T., & Chang, E. J. (2001). A Data Mining Approach for Fuzzy Classification Rule Generation. IFSA World Congress and 20th NAFIPS International Conference, (pp. 2960-2964). Dubois, D. J., & Prade, H. (1980). Fuzzy Sets and Systems. Theory and Applications. New York: Academic Press. Elkan, C. (1993). The Paradoxical Success of Fuzzy Logic. Proceedings of the Eleventh National Conference on Artificial Intelligence, (pp. 698–703). Washington D.C. 123 Glubrecht, J.-M., Oberschelp, A., & Todt, G. (1983). Klassenlogik. Mannheim: Wissenschaftsverlag. Gluchowski, P., Gabriel, R., & Dittmar, C. (2008). Management Support Systeme und Business Intelligence. Berlin: Springer Verlag. Graf, C. (2010). Erweiterung des Data-Mining-Softwarepakets WEKA um induktive unscharfe Klassifikation (Master's Thesis). Department of Informatics, University of Fribourg, Switzerland. Greene, B. (2011). The Hidden Reality. Parallel Universes And The Deep Laws Of The Cosmos. New York: Random House. Hüllermeier, E. (2005). Fuzzy Methods in Machine Learning and Data Mining: Status and Prospects. Fuzzy Sets and Systems , pp. 387406. Hájek, A. (2009). Interpretations of Probability. In E. N. Zalta (Ed.), Stanford Encyclopedia of Philosophy. Stanford: The Metaphysics Research Lab, Stanford University. Hajek, P. (2006). Fuzzy Logic. In E. N. Zalta (Ed.), Stanford Encyclopedia of Philosophy. Stanford: The Metaphysics Research Lab, Stanford University. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations , 11 (1). Hawthorne, J. (2008). Inductive Logic. In E. N. Zalta (Ed.), Stanford Encyclopedia of Philosophy. Stanford: The Metaphysics Research Lab, Stanford University. Hu, Y., Chen, R., & Tzeng, G. (2003). Finding Fuzzy Classification Rules using Data Mining Techniques. Pattern recognition letters , 24 (1-3), pp. 509-514. Hyde, D. (2008). Sorites Paradox. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy. Stanford : The Metaphysics Research Lab, Stanford University. Inmon, W. H. (2005). Building the Data Warehouse (fourth ed.). New York: Wiley Publishing. 124 John, E. G. (1998). Simplified Curve Fitting Using Speadsheet Add-ins. International Journal of Engineering Education , 14 (5), pp. 375380. Joyce, J. (2003). Bayes' Theorem. In E. N. Zalta (Ed.), Stanford Encyclopedia of Philosophy. Stanford: The Metaphysics Research Lab, Stanford University. Küsters, U. (2001). 
Data Mining Methoden: Einordung und Überblick. In H. Hippner, U. Küsters, M. Meyer, & K. Wilde (Eds.), Handbuch Data Mining im Marketing - Knowledge Discovery in Marketing Databases (pp. 95-130). Wiesbaden: vieweg GABLER. Kaniza, G. (1976). Subjective contours. Scientific American (234), pp. 48-52. Kaufmann, M. A. (2009). An Inductive Approach to Fuzzy Marketing Analytics. In M. H. Hamza (Ed.), Proceedings of the 13th IASTED international conference on Artificial Intelligence and Soft Computing. Palma, Spain. Kaufmann, M. A. (2008). Inductive Fuzzy Classification. Thesis Proposal . Internal Working Paper, University of Fribourg, Switzerland. Kaufmann, M. A., & Graf, C. (2012). Fuzzy Target Groups in Analytic Customer Relationship Management. In A. Meier, & L. Donzé (Eds.), Fuzzy Methods for Customer Relationship Management and Marketing - Applications and Classification. London: IGI Global. Kaufmann, M. A., & Meier, A. (2009). An Inductive Fuzzy Classification Approach Applied To Individual Marketing. Proc. 28th North American Fuzzy Information Processing Society Annual Conf. Kohavi, R., Rothleder, N. J., & Simoudis, E. (2002). Emerging Trends in Business Analytics. Communications of the ACM , 45 (8), pp. 4548. Mayer, P. (2010). Implementation of an Inductive Fuzzy Classification Interpreter. Master's Thesis. University of Fribourg. Meier, A., Schindler, G., & Werro, N. (2008). Fuzzy classification on relational databases. In M. Galindo (Ed.), Handbook of Research 125 on Fuzzy Information Processing in Databases (Vol. II, pp. 586614). London: Information Science Reference. merriam-webster.com. (2012a). Analytics. Retrieved from Merriam Webster Online Dictionnary: http://www.merriamwebster.com/dictionary/analytics merriam-webster.com. (2012b). Heap. Retrieved from Merriam Webster Online Dictionnary: http://www.merriamwebster.com/dictionary/heap Mill, J. S. (1843). A System of Logic. London: John W. Parker, West Strand. Oberschelp, A. (1994). Allgemeine Mengenlehre. Mannheim: Wissenschaftsverlag. Oesterle, H., Becker, J., Frank, U., Hess, T., Karagiannis, D., Krcmar, H., et al. (2010). Memorandum zur gestaltungsorientierten Wirtschaftsinformatik. Multikonferenz Wirtschaftsinformatik (MKWI). Göttingen. Precht, R. D. (2007). Wer bin ich - und wenn ja wie viele? Eine philosophische Reise. München: Goldmann. Roubos, J. A., Setnes, M., & Abonyi, J. (2003). Learning Fuzzy Classification Rules from Labeled Data. Information Sciences— Informatics and Computer Science: An International Journal , Volume 150 (Issue 1-2). Russell, B. (1919). Introduction to Mathematical Philosophy. London: George Allen & Unwin, Ltd. Setnes, M., Kaymak, H., & van Nauta Lemke, H. R. (1998). Fuzzy Target Selection in Direct Marketing. Proc. of the IEEE/IAFE/INFORMS Conf. On Computational Intelligence for Financial Engineering (CIFEr). Shannon, C. (1948). A Mathematical Theory of Communication. The Bell Systems Technical Journal (Vol. 27), pp. 379–423, 623–656. Sorensen, R. (2008). Vagueness. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy. Stanford: The Metaphysics Research Lab, Stanford University. 126 Sousa, J. M., Kaymak, U., & Madeira, S. (2002). A Comparative Study of Fuzzy Target Selection Methods in Direct Marketing. Proceedings of the 2002 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE '02). Spais, G., & Veloutsou, C. (2005). Marketing Analytics: Managing Incomplete Information in Consumer Markets and the Contribution of Mathematics to the Accountability of Marketing Decisions. 
Southeuropean Review of Business Finance & Accounting (3), pp. 127-150. Taleb, N. N. (2007). The Black Swan. The Impact of the Highly Improbable. New York: Random House. Turban, E., Aronson, J. E., Liang, T.-P., & Sharda, R. (2007). Decision Support and Business Intelligence Systems. New Jersey: Pearson Education. Vickers, J. (2009). The Problem of Induction. In E. N. Zalta (Ed.), Stanford Encyclopedia of Philosophy. Stanford: The Metaphysics Research Lab, Stanford University. Wang, L., & Mendel, J. (1992). Generating Fuzzy Rules by Learning from Examples. IEEE Transactions on Systems, Man and Cybernetics (Issue 6), pp. 1414 – 1427. Weisstein, E. W. (2010a). Conditional Probability. Retrieved from MathWorld--A Wolfram Web Resource: http://mathworld.wolfram.com/ConditionalProbability.html Weisstein, E. W. (2010b). Correlation Coefficient. Retrieved from MathWorld--A Wolfram Web Resource: http://mathworld.wolfram.com/CorrelationCoefficient.html Weisstein, E. W. (2012b). Spearman Rank Correlation Coefficient. From MathWorld--A Wolfram Web Resource: http://mathworld.wolfram.com/SpearmanRankCorrelationCoeffic ient.html Weisstein, E. W. (2012a). Wilcoxon Signed Rank Test. From MathWorld--A Wolfram Web Resource.: http://mathworld.wolfram.com/WilcoxonSignedRankTest.html Wells, H. G. (1908). First and Last Things; A Confession Of Faith And Rule Of Life. New York: G. P. Putnam's Sons. 127 Werro, N. (2008). Fuzzy Classification of Online Customers. PhD. Dissertation, University of Fribourg, Switzerland. Witten, I. H., & Frank, E. (2005). Data Mining, Practical Machine Learning Tools and Techniques (Second ed.). San Francisco: Morgan Kaufmann Publishers. Zadeh, L. A. (1965). Fuzzy Sets. Information and Control (8), pp. 338– 353. Zadeh, L. A. (1975a). Calculus of Fuzzy Restrictions. In L. A. Zadeh, K.S. Fu, K. Tanaka, & M. Shimura (Eds.), Fuzzy sets and Their Applications to Cognitive and Decision Processes. New York: Academic Press. Zadeh, L. A. (1975b). The Concept of a Linguistic Variable and its Application to Approximate Reasoning. Information Sciences , I (8), pp. 199–251. Zadeh, L. A. (2008). Is There a Need for Fuzzy Logic. Information Sciences , 178 (13). Zimmermann, H. J. (1997). Fuzzy Data Analysis. In O. Kaynak, L. A. Zadeh, B. Turksen, & I. J. Rudas (Eds.), Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications. Berlin: Springer-Verlag. Zimmermann, H. J. (2000). Practical Applications of Fuzzy Technologies. Berlin: Springer-Verlag. Zimmermann, H. J., & Zisno, P. (1980). Latent Connectives in Human Decision Making. Fuzzy Sets and Systems , 37-51. 128 A. Appendix A.1. 
Datasets used in the Experiments Table 9 Data Sets with Categorical Target Variable, Transformed into a Binary Target Name Target Filter abalone rings <= 9 No adult class = ' >50K' No anneal class <> '3' yes balance class = 'L' No bands band_type = 'band' Yes breastw car class = 'benign' acceptable <> 'unacc' cmc contraceptive_met hod <> '1' No credit a16 = '+' No creditg class = 'good' No No No 22 Records Link (accessed 08.2010) http://archive.ics.uci.edu/ml/d 2743 atasets/Abalone http://archive.ics.uci.edu/ml/d 21623 atasets/Adult http://archive.ics.uci.edu/ml/d 597 atasets/Annealing http://archive.ics.uci.edu/ml/d 404 atasets/Balance+Scale http://archive.ics.uci.edu/ml/d 237 atasets/Cylinder+Bands http://archive.ics.uci.edu/ml/d atasets/Breast+Cancer+Wisc 437 onsin+(Diagnostic) http://archive.ics.uci.edu/ml/d 1145 atasets/Car+Evaluation http://archive.ics.uci.edu/ml/d atasets/Contraceptive+Metho 1002 d+Choice http://archive.ics.uci.edu/ml/d 431 atasets/Credit+Approval http://archive.ics.uci.edu/ml/d atasets/Statlog+(German+Cr 679 edit+Data) http://archive.ics.uci.edu/ml/d atasets/Pima+Indians+Diabe 489 tes http://archive.ics.uci.edu/ml/d 87 atasets/Acute+Inflammation glass class <> 'tested_positive' inflammation = 'yes' type in ( '''build wind float''', '''build wind nonfloat''', '''vehic wind float''', '''vehic wind nonfloat''') heart y= '2' No hepatitis class = 'LIVE' No http://archive.ics.uci.edu/ml/d 157 atasets/Glass+Identification http://archive.ics.uci.edu/ml/d 188 atasets/Statlog+(Heart) http://archive.ics.uci.edu/ml/d 59 atasets/Hepatitis horse outcome = '1' Yes 124 http://archive.ics.uci.edu/ml/d diabetes diagnosis 22 No No No See Section A.1.1 and A.1.2 129 atasets/Horse+Colic hypothyroid class <> 'negative' Yes internetads y=1 No ionosphere class = 'g' No iris class = 'Iris-setosa' No letter No lymph class = 'F' class = 'metastases' segment class = 'sky' No sonar class = 'Mine' No spectf y=1 Yes vehicle class = 'saab' No waveform class <> 1 No wine class = 2 No wisconsin diagnosis = 'B' Yes yeast cls = 'CYT' Yes 130 No http://archive.ics.uci.edu/ml/d 1800 atasets/Thyroid+Disease http://archive.ics.uci.edu/ml/d atasets/Internet+Advertisem 2153 ents http://archive.ics.uci.edu/ml/d 230 atasets/Ionosphere http://archive.ics.uci.edu/ml/d 99 atasets/Iris http://archive.ics.uci.edu/ml/d 13221 atasets/Letter+Recognition http://archive.ics.uci.edu/ml/d 99 atasets/Lymphography http://archive.ics.uci.edu/ml/d atasets/Statlog+(Image+Seg 1490 mentation) http://archive.ics.uci.edu/ml/d atasets/Connectionist+Bench 143 +(Sonar,+Mines+vs.+Rocks) http://archive.ics.uci.edu/ml/d 177 atasets/SPECTF+Heart http://archive.ics.uci.edu/ml/d atasets/Statlog+(Vehicle+Sil 552 houettes) http://archive.ics.uci.edu/ml/d atasets/Waveform+Database 3342 +Generator+(Version+2) http://archive.ics.uci.edu/ml/d 110 atasets/Wine http://archive.ics.uci.edu/ml/d atasets/Breast+Cancer+Wisc 391 onsin+(Diagnostic) http://archive.ics.uci.edu/ml/d 973 atasets/Yeast Table 10 Data Sets with Numerical Target Variable, Transformed into a Fuzzy Proposition µ↑ (see Formula 43 and Formula 44) Name Target Filter auto93 µ (price) Yes autohorse No baskball µ (horsepower) µ (points_per_minut e) bodyfat µ (bodyfat) No bolts µ (t20bolt) No breasttumor µ (tumor_size) No cholestrol µ (chol) No cloud µ-(te) No µ (violentcrimesperp op) Yes ↑ ↑ ↑− ↑ ↑ ↑ ↑ No ↑− communities No elusage µ (prp) µ (average_electric ity_usage) fishcatch µ (wheight) No forestfires µ (area) No fruitfly µ (longevity) No housing µ (price) No lowbwt µ (bwt) No cpu 
23 ↑ ↑ ↑ ↑ ↑ ↑ ↑ No 23 Records Link (accessed 08.2010) http://www.amstat.org/public ations/jse/v1n1/datasets.lock. 138 html http://archive.ics.uci.edu/ml/d 65 atasets/Automobile http://lib.stat.cmu.edu/datase 62 ts/smoothmeth http://lib.stat.cmu.edu/datase 163 ts/bodyfat http://lib.stat.cmu.edu/datase 30 ts/bolts http://archive.ics.uci.edu/ml/ machine-learning181 databases/breast-cancer/ http://archive.ics.uci.edu/ml/d 203 atasets/Heart+Disease http://lib.stat.cmu.edu/datase 68 ts/cloud http://archive.ics.uci.edu/ml/d atasets/Communities+and+C 1363 rime http://archive.ics.uci.edu/ml/ machine-learning135 databases/cpu-performance/ http://lib.stat.cmu.edu/datase 35 ts/smoothmeth http://www.amstat.org/public ations/jse/datasets/fishcatch. dat http://www.amstat.org/public ations/jse/datasets/fishcatch.t 97 xt http://archive.ics.uci.edu/ml/d 346 atasets/Forest+Fires http://www.amstat.org/public ations/jse/datasets/fruitfly.da t http://www.amstat.org/public 79 ations/jse/datasets/fruitfly.txt http://archive.ics.uci.edu/ml/d 348 atasets/Housing http://www.umass.edu/statda ta/statdata/data/lowbwt.txt 124 http://www.umass.edu/statda See Section A.1.1 and A.1.2 131 ta/statdata/data/lowbwt.dat meta µ (error_rate) No parkinsons µ (motor_UPDRS) Yes pbc µ (survival_time) Yes pharynx µ (days_survived) Yes pollution µ (mort) No quake µ (richter) No sensory No servo µ (score) µ (servo_rise_time ) sleep µ (total_sleep) No slump µ (slump_cm) Yes strike µ (volume) No veteran µ(survival) No wineqred µ (quality) No wineqwhite µ (quality) No 132 ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ No http://archive.ics.uci.edu/ml/d 348 atasets/Meta-data http://archive.ics.uci.edu/ml/d 1172 atasets/Parkinsons http://lib.stat.cmu.edu/datase 114 ts/pbc http://www.umass.edu/statda ta/statdata/data/pharynx.txt http://www.umass.edu/statda 117 ta/statdata/data/pharynx.dat http://lib.stat.cmu.edu/datase 39 ts/pollution http://lib.stat.cmu.edu/datase 1463 ts/smoothmeth http://lib.stat.cmu.edu/datase 391 ts/sensory http://archive.ics.uci.edu/ml/d 108 atasets/Servo http://lib.stat.cmu.edu/datase 41 ts/sleep http://archive.ics.uci.edu/ml/d atasets/Concrete+Slump+Tes 75 t http://lib.stat.cmu.edu/datase 416 ts/strikes http://lib.stat.cmu.edu/datase 88 ts/veteran http://archive.ics.uci.edu/ml/d 1058 atasets/Wine+Quality http://archive.ics.uci.edu/ml/d 3247 atasets/Wine+Quality A.1.1. 
Attribute Selection Filters Data set Selected attributes in prediction experiments anneal surface_quality, family, ferro, thick, formability, chrom, condition, steel, non_ageing, phos, surface_finish, bf, enamelability, width, blue_bright_varn_clean, strength, carbon, shape, cbond, bw_me, exptl, lustre, bl Type, City_MPG, Highway_MPG, Air_Bags_standard, Drive_train_type, Number_of_cylinders, Engine_size, RPM, Engine_revolutions_per_mile, Manual_transmission_available, Fuel_tank_capacity, Passenger_capacity, Length, Wheelbase, Width, U_turn_space, Rear_seat_room, Luggage_capacity, Weight, Domestic grain_screened, ink_color, proof_on_ctd_ink, blade_mfg, cylinder_division , paper_type, ink_type, direct_steam, solvent_type, type_on_cylinder , press_type, press , unit_number , cylinder_size, paper_mill_location, plating_tank, proof_cut, viscosity, caliper, ink_temperature, humifity, roughness, blade_pressure, varnish_pct, press_speed, ink_pct, solvent_pct, ESA_Voltage, ESA_Amperage , wax, hardener, roller_durometer, current_density_num , anode_space_ratio, chrome_content pctilleg, pctkids2par, pctfam2par, racepctwhite, numilleg, pctyoungkids2par, pctteen2par, racepctblack, pctwinvinc, pctwpubasst, numunderpov, pctpopunderpov, femalepctdiv, pctpersdensehous, totalpctdiv, pctpersownoccup, pctunemployed, malepctdivorce, pcthousnophone, pctvacantboarded, pctnothsgrad, medfaminc, pcthousownocc, housvacant, pcthousless3br, pctless9thgrade, medincome, blackpercap, pctlarghousefam, numinshelters, numstreet, percapinc, agepct16t24, population, numburban, pctwofullplumb, pcthousoccup, lemaspctofficdrugun, pctemploy, mednumbr, pctoccupmgmtprof, pctimmigrec10, numimmig, pctbsormore, medrentpcthousinc, pctoccupmanu, pctimmigrec8, pctwwage, agepct12t29, pctlarghouseoccup, hisppercap, popdens, pctimmigrecent, pctimmigrec5 Age, surgery, mucous_membranes, to_number(pulse) as pulse, capillary_refill_time, to_number(packed_cell_volume) as packed_cell_volume, peristalsis, pain, abdominal_distension, temperature_of_extremities, to_number(rectal_temperature) on_thyroxine, query_on_thyroxine, on_antithyroid_medication, sick, pregnant, thyroid_surgery, I131_treatment, query_hypothyroid, query_hyperthyroid, lithium, goitre, tumor, hypopituitary, psych, to_number(replace(TSH, '?', null)) as TSH, to_number(replace(T3, '?', null)) as T3, to_number(replace(TT4, '?', null)) as TT4, to_number(replace(T4U, '?', null)) as T4U, to_number(replace(FTI, '?', null)) as FTI, referral_source auto93 bands communities horse hypothyroid parkinsons subject_no, age, sex, test_time, Jitter_prc, Jitter_abs, Jitter_rap, Jitter_PPQ5, Jitter_DDP, Shimmer, Shimmer_dB, Shimmer_APQ3, Shimmer_APQ5, Shimmer_APQ11, Shimmer_DDA, NHR, HNR, RPDE, DFA, PPE 133 pbc pharynx Z1, Z2, Z3, Z4, Z5, Z6, Z7, Z8, Z9, Z10, Z11, Z12, Z13, Z14, Z15, Z16, Z17 Inst, sex, Treatment, Grade, Age, Condition, Site, T, N, Status slump Cement, Slag, Fly_ash, Water, SP, Coarse_Aggr, Fine_Agg spectf f12s, f22r, f10s, f9s, f16s, f3s, f5s, f8s, f5r, f19s, f2s, f13s, f15s, f20s wisconsin radius, texture, perimeter, area, smoothness, compactness, concavity, concave_points, symmetry, fractal_dimension yeast mcg, gvh, alm, mit, erl, pox, vac, nuc A.1.2. Record Selection Filters Data set Selection criteria horse surgery <> '?' and mucous_membranes <> '?' and pulse <> '?' and capillary_refill_time <> '?' and packed_cell_volume <> '?' and peristalsis <> '?' and pain <> '?' and abdominal_distension <> '?' and temperature_of_extremities <> '?' and rectal_temperature <> '?' 
hypothyroid TSH <> '?' and T3 <> '?' and TT4 <> '?' and T4U <> '?' and FTI <> '?' 134 A.2. Database Queries A.2.1. Database Queries for Experiments with Sharp Target Variables Query 1: View for evaluation of experiment base data create table evaluation_basis3 as select dataset, method_tmp as aggregation, ifc, holdout_no, correlation, auc,mae,rmse, c_s, c_s_sig, c_k, c_k_sig from ( select ifcl_file,dataset,case when ifc is null then 'noifc' else ifc end as ifc, correlation, percent_rank() over(partition by dataset order by correlation) as r, --correlation as r, auc,mae,rmse, replace( replace(regexp_replace(replace(ifcl_file, './metainduction/'||dataset||'/'||ifc||'/', ''), '[[:digit:]]+_', ''),'.ifcl') , './metainduction/'||dataset||'/', '') as method_tmp, holdout_no, c_s, c_s_sig, c_k, c_k_sig from ( select ifcl_file,dataset,correlation,auc,mae,rmse, replace(replace(ifc_method_tmp, './metainduction/'||dataset||'/', ''), '/', '') as ifc, holdout_no, c_s, c_s_sig, c_k, c_k_sig from 135 ( SELECT ifcl_file,correlation,auc,mae,rmse, replace(replace(REGEXP_SUBSTR(ifcl_file, './metainduction/[[:alnum:]]+/'), './metainduction/', ''), '/', '') as dataset, REGEXP_SUBSTR(ifcl_file, './metainduction/([[:alnum:]]+)/([[:alpha:]])+/') as ifc_method_tmp, holdout_no, c_s, c_s_sig, c_k, c_k_sig from evaluations_repeated ) ) ) where method_tmp not like '%numeric%' and method_tmp not like '%ct%' create index ix1 on evaluation_basis3(dataset) create index ix2 on evaluation_basis3(ifc) Query 2: Average correlation per aggregation method select aggregation, avg(correlation) from evaluation_basis3 where ifc <> 'noifc' group by aggregation order by avg(correlation) desc Query 3: Test of significance in difference between logistic regression, linear regression and regression tree as aggregation methods select stats_wsr_test(corr_logreg, corr_regtree, 'ONE_SIDED_SIG'), stats_wsr_test(corr_logreg, corr_linreg, 'ONE_SIDED_SIG') from ( select dataset, 136 sum(case when aggregation = 'logreg' then correlation else 0 end) / sum(case when aggregation = 'logreg' then 1 else 0 end) as corr_logreg, sum(case when aggregation = 'regtree' then correlation else 0 end) / sum(case when aggregation = 'regtree' then 1 else 0 end) as corr_regtree, sum(case when aggregation = 'linreg' then correlation else 0 end) / sum(case when aggregation = 'linreg' then 1 else 0 end) as corr_linreg from evaluation_basis3 where ifc <> 'noifc' group by dataset ) Query 4: Average correlation per induction and aggregation method select ifc, aggregation, avg(correlation) from evaluation_basis3 group by ifc, aggregation order by avg(correlation) desc Query 5: Test of significance of difference between logistic regression using NLR-IAF, NLD-IAF and regression trees without IAF select stats_wsr_test(corr_logreg_nlr, corr_regtree_no_iaf, 'ONE_SIDED_SIG'), stats_wsr_test(corr_logreg_nlr, corr_logreg_nld, 'ONE_SIDED_SIG') from ( select dataset, sum(case when ifc||'.'||aggregation = 'nlr.logreg' then correlation 137 else 0 end) / sum(case when ifc||'.'||aggregation = 'nlr.logreg' then 1 else 0 end) as corr_logreg_nlr, sum(case when ifc||'.'||aggregation = 'nld.logreg' then correlation else 0 end) / sum(case when ifc||'.'||aggregation = 'nld.logreg' then 1 else 0 end) as corr_logreg_nld, sum(case when ifc||'.'||aggregation = 'noifc.regtree' then correlation else 0 end) / sum(case when ifc||'.'||aggregation = 'noifc.regtree' then 1 else 0 end) as corr_regtree_no_iaf from evaluation_basis3 where dataset is not null group by dataset ) A.2.2. 
Database Queries for Experiments with Numerical Target Variables Query 6: View for evaluation of experiment base data create table evaluation_basis_num as select dataset, method_tmp as aggregation, ifc, fuzzification, holdout_no, correlation, auc,mae,rmse, c_s, c_s_sig, c_k, c_k_sig from ( select ifcl_file,dataset, nvl(replace(substr(ifc, 1, regexp_instr(ifc, '._l|_p')),'_', ''), 'noifc') as ifc, 138 replace(substr(ifc, regexp_instr(ifc, '._l|_p')+1, length(ifc)), '_', '') as fuzzification, correlation, percent_rank() over(partition by dataset order by correlation) as r, --correlation as r, auc,mae,rmse, replace( replace(regexp_replace(replace(ifcl_file, './metainduction_num/'||dataset||'/'||ifc||'/', ''), '[[:digit:]]+_', ''),'.ifcl') , './metainduction_num/'||dataset||'/', '') as method_tmp, holdout_no, c_s, c_s_sig, c_k, c_k_sig from ( select ifcl_file,dataset,correlation,auc,mae,rmse, replace(replace(ifc_method_tmp, './metainduction_num/'||dataset||'/', ''), '/', '') as ifc, holdout_no, c_s, c_s_sig, c_k, c_k_sig from ( SELECT ifcl_file,correlation,auc,mae,rmse, replace(replace(REGEXP_SUBSTR(ifcl_file, './metainduction_num/[[:alnum:]]+/'), './metainduction_num/', ''), '/', '') as dataset, REGEXP_SUBSTR(ifcl_file, './metainduction_num/([[:alnum:]]+)/([[:print:]])+/') as ifc_method_tmp, holdout_no, c_s, c_s_sig, c_k, c_k_sig from evaluations_repeated_num ) ) ) create index ix3 on evaluation_basis_num(dataset) 139 create index ix4 on evaluation_basis_num(ifc) create index ix5 on evaluation_basis_num(fuzzification) Query 7: Average correlation per aggregation method, given linear ITF select aggregation, avg(correlation) from evaluation_basis_num where ifc <> 'noifc' group by aggregation order by avg(correlation) desc Query 8: Test of significance in difference between regression tree, linear regression and average select stats_wsr_test(corr_regtree, corr_avg, 'ONE_SIDED_SIG'), stats_wsr_test(corr_regtree, corr_linreg, 'ONE_SIDED_SIG') from ( select dataset, sum(case when aggregation = 'avg' then correlation else 0 end) / sum(case when aggregation = 'avg' then 1 else 0 end) as corr_avg, sum(case when aggregation = 'regtree' then correlation else 0 end) / sum(case when aggregation = 'regtree' then 1 else 0 end) as corr_regtree, sum(case when aggregation = 'linreg' then correlation else 0 end) / sum(case when aggregation = 'linreg' then 1 else 0 end) as corr_linreg from evaluation_basis_num where ifc <> 'noifc' group by dataset ) 140 Query 9: Average correlation per target fuzzification method select fuzzification, avg(correlation) from evaluation_basis_num where aggregation = 'regtree' group by fuzzification order by avg(correlation) desc Query 10: Test of significance between target fuzzification methods select stats_wsr_test(corr_l, corr_p, 'ONE_SIDED_SIG') from ( select dataset, sum(case when fuzzification = 'l' then correlation else 0 end) / sum(case when fuzzification = 'l' then 1 else 0 end) as corr_l, sum(case when fuzzification = 'p' then correlation else 0 end) / sum(case when fuzzification = 'p' then 1 else 0 end) as corr_p from evaluation_basis_num where aggregation = 'regtree' and ifc <> 'noifc' group by dataset ) Query 11: Average correlation per induction and aggregation method (given linear target fuzzification) select ifc, aggregation, avg(correlation) from evaluation_basis_num where (fuzzification = 'l' or fuzzification is null) and aggregation <> 'logreg_p' group by ifc, aggregation 141 order by avg(correlation) desc Query 12: Test of significance of difference between 
Query 12: Test of significance of difference between regression trees using NLD-IAF, NLDU-IAF and regression trees without IAF

select stats_wsr_test(corr_regtree_nld, corr_regtree_no_iaf, 'ONE_SIDED_SIG'),
       stats_wsr_test(corr_regtree_nld, corr_regtree_nldu, 'ONE_SIDED_SIG'),
       stats_wsr_test(corr_regtree_nld, corr_regtree_nlr, 'ONE_SIDED_SIG')
from (
  select dataset,
         sum(case when ifc||'.'||aggregation = 'nld.regtree' then correlation else 0 end) /
         sum(case when ifc||'.'||aggregation = 'nld.regtree' then 1 else 0 end) as corr_regtree_nld,
         sum(case when ifc||'.'||aggregation = 'nldu.regtree' then correlation else 0 end) /
         sum(case when ifc||'.'||aggregation = 'nldu.regtree' then 1 else 0 end) as corr_regtree_nldu,
         sum(case when ifc||'.'||aggregation = 'npr.regtree' then correlation else 0 end) /
         sum(case when ifc||'.'||aggregation = 'npr.regtree' then 1 else 0 end) as corr_regtree_nlr,
         sum(case when ifc||'.'||aggregation = 'noifc.regtree' then correlation else 0 end) /
         sum(case when ifc||'.'||aggregation = 'noifc.regtree' then 1 else 0 end) as corr_regtree_no_iaf
  from evaluation_basis_num
  where dataset is not null
  group by dataset
)

A.2.3. Database Queries for Both Types of Experiments

Query 13: Table with data parameters for each dataset

-- Execute 00_load_all.bat first
create or replace view data_params as
select * from (
  -- improvement
  select dataset,
         (corr_logreg_nlr - corr_logreg_no_iaf) / corr_logreg_no_iaf as improvement_by_iaf
  from (
    select dataset,
           sum(case when ifc||'.'||aggregation = 'nlr.logreg' then correlation else 0 end) /
           sum(case when ifc||'.'||aggregation = 'nlr.logreg' then 1 else 0 end) as corr_logreg_nlr,
           sum(case when ifc||'.'||aggregation = 'noifc.regtree' then correlation else 0 end) /
           sum(case when ifc||'.'||aggregation = 'noifc.regtree' then 1 else 0 end) as corr_regtree_no_iaf,
           sum(case when ifc||'.'||aggregation = 'noifc.logreg' then correlation else 0 end) /
           sum(case when ifc||'.'||aggregation = 'noifc.logreg' then 1 else 0 end) as corr_logreg_no_iaf
    from evaluation_basis3
    where dataset is not null
    group by dataset
  )
  union
  select dataset,
         (corr_regtree_nld - corr_regtree_no_iaf) / corr_regtree_no_iaf as improvement_by_iaf
  from (
    select dataset,
           sum(case when ifc||'.'||aggregation = 'nld.regtree' then correlation else 0 end) /
           sum(case when ifc||'.'||aggregation = 'nld.regtree' then 1 else 0 end) as corr_regtree_nld,
           sum(case when ifc||'.'||aggregation = 'noifc.regtree' then correlation else 0 end) /
           sum(case when ifc||'.'||aggregation = 'noifc.regtree' then 1 else 0 end) as corr_regtree_no_iaf
    from evaluation_basis_num
    where dataset is not null
    group by dataset
  )
)
join (
  -- column count
  select replace(lower(table_name), '_tr', '') as dataset,
         sum(case when data_type in ('CHAR', 'VARCHAR2') then 1 else 0 end) as ncatcols,
         sum(case when data_type in ('NUMBER', 'FLOAT') then 1 else 0 end) as nnumcols,
         sum(case when data_type in ('CHAR', 'VARCHAR2') then 1 else 0 end)/count(*) as pcatcols,
         sum(case when data_type in ('NUMBER', 'FLOAT') then 1 else 0 end)/count(*) as pnumcols,
         count(*) as ncols
  from sys.all_tab_columns
  where table_name in ('ABALONE_TR', 'ADULT_TR', 'ANNEAL_TR', 'BALANCE_TR', 'BANDS_TR',
    'BREASTW_TR', 'CAR_TR', 'CMC_TR', 'CREDIT_TR', 'CREDITG_TR', 'DIABETES_TR', 'DIAGNOSIS_TR',
    'GLASS_TR', 'HEART_TR', 'HEPATITIS_TR', 'HORSE_TR', 'HYPOTHYROID_TR', 'INTERNETADS_TR',
    'IONOSPHERE_TR', 'IRIS_TR', 'LETTER_TR', 'LYMPH_TR', 'SEGMENT_TR', 'SONAR_TR', 'SPECTF_TR',
    'VEHICLE_TR', 'WAVEFORM_TR', 'WINE_TR', 'WISCONSIN_TR', 'YEAST_TR',
    'AUTO93_TR', 'AUTOHORSE_TR', 'BASKBALL_TR', 'BODYFAT_TR', 'BOLTS_TR',
    'BREASTTUMOR_TR', 'CHOLESTROL_TR', 'CLOUD_TR', 'COMMUNITIES_TR', 'CPU_TR', 'ELUSAGE_TR',
    'FISHCATCH_TR', 'FORESTFIRES_TR', 'FRUITFLY_TR', 'HOUSING_TR', 'LOWBWT_TR', 'META_TR',
    'PARKINSONS_TR', 'PBC_TR', 'PHARYNX_TR', 'POLLUTION_TR', 'QUAKE_TR', 'SENSORY_TR', 'SERVO_TR',
    'SLEEP_TR', 'SLUMP_TR', 'STRIKE_TR', 'VETERAN_TR', 'WINEQRED_TR', 'WINEQWHITE_TR')
    and column_name <> 'IS_TRAINING'
    and column_name <> 'Y'
  group by table_name
) using (dataset)
join (
  -- correlation of linear regression
  select dataset, avg(correlation) as avg_corr_linreg
  from evaluation_basis3
  where ifc = 'noifc' and aggregation = 'linreg'
  group by dataset
  union
  select dataset, avg(correlation)
  from evaluation_basis_num
  where ifc = 'noifc' and aggregation = 'linreg'
  group by dataset
) using (dataset)
join (
  -- row count
  select 'abalone' as dataset, 'c' as target_type, count(*) as nrows from abalone_tr
  union select 'adult', 'c', count(*) from adult_tr
  union select 'anneal', 'c', count(*) from anneal_tr
  union select 'balance', 'c', count(*) from balance_tr
  union select 'bands', 'c', count(*) from bands_tr
  union select 'breastw', 'c', count(*) from breastw_tr
  union select 'car', 'c', count(*) from car_tr
  union select 'cmc', 'c', count(*) from cmc_tr
  union select 'credit', 'c', count(*) from credit_tr
  union select 'creditg', 'c', count(*) from creditg_tr
  union select 'diabetes', 'c', count(*) from diabetes_tr
  union select 'diagnosis', 'c', count(*) from diagnosis_tr
  union select 'glass', 'c', count(*) from glass_tr
  union select 'heart', 'c', count(*) from heart_tr
  union select 'hepatitis', 'c', count(*) from hepatitis_tr
  union select 'horse', 'c', count(*) from horse_tr
  union select 'hypothyroid', 'c', count(*) from hypothyroid_tr
  union select 'internetads', 'c', count(*) from internetads_tr
  union select 'ionosphere', 'c', count(*) from ionosphere_tr
  union select 'iris', 'c', count(*) from iris_tr
  union select 'letter', 'c', count(*) from letter_tr
  union select 'lymph', 'c', count(*) from lymph_tr
  union select 'segment', 'c', count(*) from segment_tr
  union select 'sonar', 'c', count(*) from sonar_tr
  union select 'spectf', 'c', count(*) from spectf_tr
  union select 'vehicle', 'c', count(*) from vehicle_tr
  union select 'waveform', 'c', count(*) from waveform_tr
  union select 'wine', 'c', count(*) from wine_tr
  union select 'wisconsin', 'c', count(*) from wisconsin_tr
  union select 'yeast', 'c', count(*) from yeast_tr
  union select 'auto93', 'n', count(*) from auto93_tr
  union select 'autohorse', 'n', count(*) from autohorse_tr
  union select 'baskball', 'n', count(*) from baskball_tr
  union select 'bodyfat', 'n', count(*) from bodyfat_tr
  union select 'bolts', 'n', count(*) from bolts_tr
  union select 'breasttumor', 'n', count(*) from breasttumor_tr
  union select 'cholestrol', 'n', count(*) from cholestrol_tr
  union select 'cloud', 'n', count(*) from cloud_tr
  union select 'communities', 'n', count(*) from communities_tr
  union select 'cpu', 'n', count(*) from cpu_tr
  union select 'elusage', 'n', count(*) from elusage_tr
  union select 'fishcatch', 'n', count(*) from fishcatch_tr
  union select 'forestfires', 'n', count(*) from forestfires_tr
  union select 'fruitfly', 'n', count(*) from fruitfly_tr
  union select 'housing', 'n', count(*) from housing_tr
  union select 'lowbwt', 'n', count(*) from lowbwt_tr
  union select 'meta', 'n', count(*) from meta_tr
  union select 'parkinsons', 'n', count(*) from parkinsons_tr
  union select 'pbc', 'n', count(*) from pbc_tr
  union select 'pharynx', 'n', count(*) from pharynx_tr
  union select 'pollution', 'n', count(*) from pollution_tr
  union select 'quake', 'n', count(*) from quake_tr
  union select 'sensory', 'n', count(*) from sensory_tr
  union select 'servo', 'n', count(*) from servo_tr
  union select 'sleep', 'n', count(*) from sleep_tr
  union select 'slump', 'n', count(*) from slump_tr
  union select 'strike', 'n', count(*) from strike_tr
  union select 'veteran', 'n', count(*) from veteran_tr
  union select 'wineqred', 'n', count(*) from wineqred_tr
  union select 'wineqwhite', 'n', count(*) from wineqwhite_tr
) using (dataset)
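The measure improvement_by_iaf defined in Query 13 is the relative change in average prediction correlation attributable to inductive attribute fuzzification: for binary targets, logistic regression with NLR-IAF is compared against plain logistic regression, and for numerical targets, regression trees with NLD-IAF are compared against plain regression trees. As a worked example with illustrative numbers, a dataset on which the fuzzified model reaches an average correlation of 0.66 while the unfuzzified baseline reaches 0.60 obtains improvement_by_iaf = (0.66 - 0.60) / 0.60 = 0.10, i.e., a relative improvement of 10 percent; negative values indicate that fuzzification lowered the prediction correlation.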
Query 14: Correlation of parameters with the performance improvement by IFC

select target_type,
       CORR_S(improvement_by_iaf, ncatcols) c_ncatcols,
       CORR_S(improvement_by_iaf, ncatcols, 'TWO_SIDED_SIG') p_ncatcols,
       CORR_S(improvement_by_iaf, nnumcols) c_nnumcols,
       CORR_S(improvement_by_iaf, nnumcols, 'TWO_SIDED_SIG') p_nnumcols,
       CORR_S(improvement_by_iaf, pcatcols) c_pcatcols,
       CORR_S(improvement_by_iaf, pcatcols, 'TWO_SIDED_SIG') p_pcatcols,
       CORR_S(improvement_by_iaf, pnumcols) c_pnumcols,
       CORR_S(improvement_by_iaf, pnumcols, 'TWO_SIDED_SIG') p_pnumcols,
       CORR_S(improvement_by_iaf, ncols) c_ncols,
       CORR_S(improvement_by_iaf, ncols, 'TWO_SIDED_SIG') p_ncols,
       CORR_S(improvement_by_iaf, avg_corr_linreg) c_avg_corr_linreg,
       CORR_S(improvement_by_iaf, avg_corr_linreg, 'TWO_SIDED_SIG') p_avg_corr_linreg,
       CORR_S(improvement_by_iaf, nrows) c_nrows,
       CORR_S(improvement_by_iaf, nrows, 'TWO_SIDED_SIG') p_nrows,
       CORR_S(improvement_by_iaf, nrows/ncols) c_nrowscols,
       CORR_S(improvement_by_iaf, nrows/ncols, 'TWO_SIDED_SIG') p_nrowscols
from data_params
group by target_type

Query 15: Table for scatter plot visualization of association between linearity of data and improvement by IAF

select dataset, target_type, avg_corr_linreg, improvement_by_iaf
from data_params
order by target_type, dataset

A.3. Experimental Results Tables

Table 11 Average Prediction Correlations per Dataset/Aggregation Method for Binary Targets

Dataset | Avg. Corr. Log. Reg. | Avg. Corr. Reg. Tree | Avg. Corr. Lin. Reg.
abalone 0.638687626 0.651733317 0.636539418
adult 0.659169773 0.651350849 0.636197273
anneal 0.823367865 0.846689248 0.797382729
balance 0.920732292 0.791055945 0.840604813
bands 0.454571882 0.351967704 0.426261177
breastw 0.939085094 0.927238012 0.931453636
car 0.914349826 0.933901714 0.811581886
cmc 0.425228906 0.421732164 0.418979581
credit 0.76498662 0.757839955 0.767073216
creditg 0.457163162 0.426897568 0.437699783
diabetes 0.54510302 0.533295481 0.546568895
diagnosis 1 0.868887353 0.965967697
glass 0.794456151 0.774937317 0.802028743
heart 0.695053211 0.643374806 0.684389592
hepatitis 0.291646106 0.383819768 0.388031234
horse 0.572088107 0.529928212 0.555148535
hypothyroid 0.792125757 0.892843929 0.79338345
internetads 0.805987731 0.753679747 0.78294501
ionosphere 0.819557055 0.799538551 0.798060482
iris 1 0.999193202 0.991252264
letter 0.626368894 0.798441855 0.480644051
lymph 0.695768921 0.672304979 0.726334152
segment 0.99999997 0.997855955 0.992554443
sonar 0.558775153 0.554680801 0.578090696
spectf 0.501761759 0.447468223 0.463867849
vehicle 0.475892051 0.460322532 0.437534698
waveform 0.82107583 0.785686835 0.74716555
wine 0.918745813 0.861574492 0.914019581
wisconsin 0.908323537 0.872676408 0.88191778
yeast 0.421658925 0.424627552 0.423187415
Table 12 Average Prediction Correlations per Dataset and Aggregation Method for Numerical Target Variables

Dataset | Avg. Corr. Avg. | Avg. Corr. Reg. Tree | Avg. Corr. Lin. Reg.
auto93 0.691201303 0.857445115 0.738746817
autohorse 0.892653343 0.926796658 0.930059659
baskball 0.536398611 0.638183226 0.539943795
bodyfat 0.744776961 0.980938537 0.977226072
bolts 0.79763048 0.894265558 0.905782285
breasttumor 0.231971819 0.380396489 0.258283305
cholestrol 0.176404133 0.345649097 0.163348437
cloud 0.8020314 0.891982414 0.865600118
communities 0.731280609 0.79667882 0.780258252
cpu 0.788972056 0.814152614 0.794450568
elusage 0.865618561 0.897333607 0.872277672
fishcatch 0.845272306 0.918434123 0.894956413
forestfires 0.018454772 0.167736369 0.014733488
fruitfly -0.131668697 0.165425495 -0.067547186
housing 0.687542408 0.908550517 0.86615543
lowbwt 0.741678073 0.810166446 0.788121932
meta 0.27658433 0.448502931 0.353513754
parkinsons 0.326735958 0.881670195 0.605425938
pbc 0.475841958 0.67048775 0.487261393
pharynx 0.606093213 0.775843245 0.683939249
pollution 0.656915329 0.811606345 0.655832652
quake 0.057471089 0.081629626 0.058700741
sensory 0.290356213 0.400912946 0.337693928
servo 0.703894008 0.828999856 0.780728183
sleep 0.713524339 0.753362622 0.601491837
slump 0.422059811 0.668746482 0.500413959
strike 0.382845386 0.485543157 0.418259534
veteran 0.358292594 0.483294002 0.413181002
wineqred 0.544798097 0.615254318 0.595187199
wineqwhite 0.507972143 0.577186397 0.543279329

Table 13 Average Prediction Correlations for IAF on Data with Numerical Target Variables using Linear versus Percentile ITF

Dataset | Linear ITF | Percentile ITF
auto93 0.88105593 0.833834299
autohorse 0.946780795 0.90681252
baskball 0.635879956 0.640486496
bodyfat 0.984034462 0.977842612
bolts 0.921790457 0.866740659
breasttumor 0.385000153 0.375792824
cholestrol 0.365144244 0.32615395
cloud 0.926867976 0.857096852
communities 0.82334782 0.770009819
cpu 0.937630747 0.690674481
elusage 0.905249904 0.88941731
fishcatch 0.94873926 0.888128986
forestfires 0.210580299 0.124892439
fruitfly 0.157932749 0.172918241
housing 0.940547453 0.87655358
lowbwt 0.814805887 0.805527006
meta 0.577117175 0.319888686
parkinsons 0.880604806 0.882735584
pbc 0.666064677 0.674910823
pharynx 0.785763664 0.765922825
pollution 0.807044092 0.816168598
quake 0.078567569 0.084691683
sensory 0.399599899 0.402225994
servo 0.872101166 0.785898546
sleep 0.783933708 0.722791535
slump 0.708012949 0.629480015
strike 0.522569102 0.448517211
veteran 0.515124253 0.45146375
wineqred 0.619064343 0.611444293
wineqwhite 0.58210665 0.572266143

Table 14 Average Prediction Correlations per Dataset and Three Different Combinations of Supervised Aggregation and IAF for Binary Target Variables

Dataset | Avg. Corr. Log. Reg. + IAF NLR | Avg. Corr. Log. Reg. + IAF NLD | Avg. Corr. Reg. Tr. w/o IAF
abalone 0.633651826 0.636990551 0.64867844
adult 0.662989858 0.656921233 0.608645742
anneal 0.826555309 0.823932942 0.853821973
balance 0.923393427 0.924777928 0.79058825
bands 0.461862401 0.450021541 0.343296921
breastw 0.939701709 0.939464426 0.930009743
car 0.91660187 0.913644582 0.948927087
cmc 0.42385694 0.423400481 0.455736376
credit 0.768277547 0.763557368 0.751852807
creditg 0.464104961 0.469843758 0.393045324
diabetes 0.556749014 0.545362638 0.50644607
diagnosis 1 1 0.944533837
glass 0.802605598 0.804760434 0.824622632
heart 0.694785041 0.70493683 0.657426735
hepatitis 0.374975313 0.311577565 0.339608818
horse 0.569485166 0.55213211 0.541218353
hypothyroid 0.80477987 0.800429276 0.898883448
internetads 0.807436554 0.808116466 0.798458728
ionosphere 0.809199099 0.819279121 0.764848148
iris 1 1 0.996991022
letter 0.6349395 0.62902598 0.727029396
lymph 0.68823764 0.690663197 0.727999682
segment 0.999999995 0.999999953 0.997593306
sonar 0.559774797 0.557996241 0.496422658
spectf 0.505452116 0.510651243 0.473619771
vehicle 0.46898602 0.496373681 0.559857093
waveform 0.826959479 0.821165897 0.783760649
wine 0.912450784 0.923247357 0.90726124
wisconsin 0.912735058 0.907890135 0.873368639
yeast 0.420860794 0.420696658 0.412990274

Table 15 Average Prediction Correlations per Dataset and Three Different Combinations of Supervised Aggregation and IAF for Numerical Target Variables

Dataset | Avg. Corr. Reg. Tr. + IAF NLD | Avg. Corr. Reg. Tr. + IAF NLDU | Avg. Corr. Reg. Tr. w/o IAF
auto93 0.851949884 0.854702482 0.783694291
autohorse 0.934671172 0.932528555 0.939116035
baskball 0.651189815 0.65121036 0.623578896
bodyfat 0.989455545 0.990353116 0.991221665
bolts 0.899445255 0.90182103 0.921987087
breasttumor 0.379942005 0.37994241 0.252316629
cholestrol 0.354768905 0.346985441 0.193892889
cloud 0.901964456 0.89860454 0.913436117
communities 0.798599026 0.796617813 0.791015749
cpu 0.843304898 0.842528113 0.97677025
elusage 0.905475647 0.905995831 0.881926102
fishcatch 0.932313534 0.933532791 0.985895292
forestfires 0.175728178 0.169928128 0.014991533
fruitfly 0.212102028 0.212108602 -0.064702557
housing 0.918270044 0.917557553 0.900012141
lowbwt 0.813006747 0.812914092 0.783555356
meta 0.462534319 0.454644049 0.368499477
parkinsons 0.887066413 0.881144444 0.941203949
pbc 0.66081537 0.660799138 0.534079402
pharynx 0.773127351 0.773127987 0.708101866
pollution 0.829522663 0.831729118 0.60923147
quake 0.082057392 0.08047578 0.073515546
sensory 0.404010864 0.404010206 0.386991775
servo 0.82787911 0.827815554 0.912721346
sleep 0.77696476 0.764053467 0.730795112
slump 0.690106281 0.689333045 0.493530846
strike 0.496745411 0.496578249 0.415429314
veteran 0.471382971 0.471232266 0.437471553
wineqred 0.611102154 0.615205884 0.580172859
wineqwhite 0.572707345 0.573805738 0.563459737

Table 16 Relationship between Target Linearity and Improvement of Logistic Regression by IAF NLR for Binary Target Variables

Dataset | Target linearity | Improvement by IAF NLR
abalone 0.632486147 -0.015038624
adult 0.601697632 0.003616993
anneal 0.847487701 0.265333889
balance 0.839927779 0.010051891
bands 0.397808335 0.101084917
breastw 0.92253246 -0.001225938
car 0.814076541 -0.001610122
cmc 0.351769834 0.17744701
credit 0.733383933 0.063102998
creditg 0.425527893 0.040954514
diabetes 0.527451425 0.044424251
diagnosis 0.972462485 -5.2E-16
glass 0.817993073 -0.003667589
heart 0.6866888 0.030462835
hepatitis 0.381693585 0.098467765
horse 0.552106327 0.111811784
hypothyroid 0.584747606 0.080805492
internetads 0.784842676 0.00291351
ionosphere 0.681485184 0.15769256
iris 0.97214843 2.97338E-06
letter 0.45185967 0.10947387
lymph 0.723280478 0.033367981
segment 0.956800746 -4.89814E-09
sonar 0.512394902 0.043510527
spectf 0.452480291 -0.002580917
vehicle 0.57214217 -0.206318239
waveform 0.724956824 0.075158502
wine 0.910709707 -0.029160901

Table 17 Relationship between Target Linearity and Improvement of Regression Trees by IAF NLD for Numerical Target Variables

Dataset | Target linearity | Improvement by IAF NLD
autohorse 0.946606415 -0.004733029
auto93 0.736992093 0.087094666
baskball 0.623578896 0.044278148
bodyfat 0.990358409 -0.001781761
bolts 0.85950705 -0.024449184
breasttumor 0.236969021 0.505814364
cholestrol 0.170467119 0.82971591
cloud 0.912861146 -0.0125588
communities 0.797109195 0.009586759
cpu 0.930542139 -0.136639452
elusage 0.845600615 0.026702402
fishcatch 0.963623075 -0.054348325
forestfires 0.073146275 10.72182832
fruitfly -0.064702557 -4.278108926
housing 0.854820529 0.020286285
lowbwt 0.778262746 0.037586867
meta 0.383398892 0.25518311
parkinsons 0.459022397 -0.057519453
pbc 0.530953313 0.237297988
pharynx 0.699754036 0.091830693
pollution 0.679583809 0.361588664
quake 0.066085663 0.116191016
sensory 0.336532333 0.043977907
servo 0.836899785 -0.092955244
sleep 0.709322872 0.063177281
slump 0.436362228 0.398304251
strike 0.42286715 0.195739913
veteran 0.397186311 0.077516853
wineqred 0.583219599 0.053310483
wineqwhite 0.52352684 0.01641219

A.4. IFCL Syntax and Application

A.4.1. Grammar Notation

The syntax is described with the aid of meta-linguistic formulae in the Backus-Naur Form (BNF) as proposed by Backus et al. (1960):

<…>          Meta-linguistic variables: nodes of the grammar tree
::=          Definition of syntax element placeholders
… | …        Alternative choice of syntax elements
<character>  A symbol
<string>     Any sequence of symbols
<integer>    A sequence of digits (0-9)
<decimal>    A sequence of digits (0-9) containing one decimal point
<empty>      An empty set of symbols: no symbol at all

All symbols that are not enclosed in less than / greater than signs ( <…> ), except the symbols ::= and |, are used literally in the IFCL language. Different meta-linguistic variable declarations are separated by an empty line. Comments can be made in IFCL files as a sequence of symbols enclosed by the character #. Comments are filtered out before parsing, so they are not part of the language itself.

A.4.2. IFCL File Structure

<IFCL file> ::= <IFCL file node> <IFCL file leaf>

<IFCL file node> ::= <IFCL file execution call> <IFCL file node> | <empty>

<IFCL file leaf> ::= <connect to database> <IFCL leaf action sequence>

<IFCL leaf action sequence> ::= <IFCL leaf action> <IFCL leaf action sequence> | <empty>

<IFCL leaf action> ::= <drop database table>
  | <execute sql>
  | <load database>
  | <induce membership function>
  | <aggregate multiple variables>
  | <data classification>
  | <evaluate correlations>

A.4.3. Executing IFCL Files in Batch Mode

o Syntax

<IFCL file execution call> ::= @execifcl: file == <ifcl file path>;

<ifcl file path> ::= <string>

o Example

@execifcl: file == ./metainduction/abalone/load.ifcl;
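For illustration, a complete IFCL file in batch mode combines one or more execution calls with the database connection required by the file structure defined in Section A.4.2. The second script name below is a hypothetical placeholder; the connection parameters are those used in the examples of the following sections:

# Hypothetical batch file: run the data preparation script and one experiment script. #
@execifcl: file == ./metainduction/abalone/load.ifcl;
@execifcl: file == ./metainduction/abalone/nlr_logreg.ifcl;
@connect:
HostName == localhost;
SID == XE;
Port == 1521;
UserName == ME;
Password == passwort;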
A.4.4. Connecting to the Database

o Syntax

<connect to database> ::= @connect:
  hostname == <database host name> ;
  SID == <database service identifier> ;
  port == <database server port number> ;
  username == <user name> ;
  password == <password> ;

<database host name> ::= <string>
<database service identifier> ::= <string>
<database server port number> ::= <integer>
<user name> ::= <string>
<password> ::= <string>

o Example

@connect:
HostName == localhost;
SID == XE;
Port == 1521;
UserName == ME;
Password == passwort;

A.4.5. Dropping a Database Table

o Syntax

<drop database table> ::= @drop: table == <table name>;

<table name> ::= <string>

o Example

@drop: table == abalone_te_nlr_regtree

A.4.6. Executing an SQL Script

o Syntax

<execute sql> ::= @execsql: <sql>

<sql> ::= file == <sql script file path> ;
  | command == <sql statement> ;

<sql script file path> ::= <string>
<sql statement> ::= <string>

o Example

@execsql: file == metainduction/abalone/01_sql/table_split.sql;

A.4.7. Loading Data into the Database

o Syntax

<load database> ::= <SQL*Loader> | <IFCL loader>

<SQL*Loader> ::= @sqlldr:
  data == <path to data file> ;
  control == <path to sqlldr control file> ;
  skip == <number of rows to skip>;
  errors == <number of errors to accept>;

<path to data file> ::= <string>
<path to sqlldr control file> ::= <string>
<number of rows to skip> ::= <integer>
<number of errors to accept> ::= <integer>

o Example

@sqlldr:
data == metainduction/abalone/00_data/data.csv;
control == metainduction/abalone/00_data/sqlldr.ctl;
skip == 0;
errors == 10000;

o Syntax

<IFCL loader> ::= @load:
  file == <path to data file> ;
  table == <table to be created and loaded> ;
  delimiter == <data delimiting character> ;
  attributes == <attribute list> ;
  types == <type list>;

<path to data file> ::= <string>
<data delimiting character> ::= <character>
<attribute list> ::= <attribute> <attribute list> | <empty>
<attribute> ::= <string>
<type list> ::= <type> <type list> | <empty>
<type> ::= n | c

o Example

@load:
file == ./metainduction/balance/00_data/00_data.csv;
table == balance_load;
delimiter == ,;
attributes == left_weight, left_distance, right_weight, right_distance, class;
types == n, n, n, n, c;

A.4.8. Inducing a Membership Function

o Syntax

<induce membership function> ::= @inducemf:
  table == <table for membership function induction> ;
  target variable == <table column> ;
  analytic variables == <column> | <column list> | <all columns> ;
  induction == <induction type> ;
  output file == <path to output file for sql classification template> ;
  <optional skipped columns>
  <optional left out columns>

<table for membership function induction> ::= <string>
<column> ::= <string>
<all columns> ::= *
<column list> ::= <column> <column list or empty>
<column list or empty> ::= <column> <column list or empty> | <empty>
<induction type> ::= l | nlr | nlru | nld | nldu | cp | npr | npru | npd | npdu | iff | nc | mm
<path to output file for sql classification template> ::= <string>
<optional skipped columns> ::= skip columns == <column list> ;
<optional left out columns> ::= let columns == <column list> ;

o Example

@inducemf:
table == abalone_tr;
target variable == y;
analytic variables == *;
induction == cp;
output file == ./metainduction/abalone/02_output/cp.sql.template;
A.4.9. Classification of Data

o Syntax

<data classification> ::= @classify:
  classified table == <table to be classified> ;
  template file == <path to sql classification template file> ;
  output table == <table for storing resulting classification> ;

<table to be classified> ::= <string>
<path to sql classification template file> ::= <string>
<table for storing resulting classification> ::= <string>

o Example

@classify:
classified table == abalone_te;
template file == ./metainduction/regtree.sql.template;
output table == abalone_te_rt;

A.4.10. Aggregating Multiple Variables

o Syntax

<aggregate multiple variables> ::= @aggregatemv:
  table == <table to be aggregated> ;
  analytic variables == <column> | <column list> | <all columns> ;
  aggregation operator == <aggregation operator>;
  target variable == <table column>;
  output file == <path to output file>;
  column alias == <name of column containing aggregated membership value>;

<table to be aggregated> ::= <string>
<aggregation operator> ::= linreg | logreg | regtree | min | max | ap | as | avg
<name of column containing aggregated membership value> ::= <string>

o Example

@aggregatemv:
table == abalone_tr_s;
analytic variables == *;
aggregation operator == linreg;
target variable == y;
output file == ./metainduction/abalone/02_output/linreg.sql.template;
column alias == mfc;

A.4.11. Evaluating Predictions

o Syntax

<evaluate correlations> ::= @evaluate:
  table == <table containing resulting prediction> ;
  analytic variables == <column> | <column list> | <all columns> ;
  target variable == <table column>;
  <optional result table>
  <optional output file>

<optional result table> ::= result table == <table for storing evaluation results> | <empty> ;
<table for storing evaluation results> ::= <string>
<optional output file> ::= output file == <file for storing evaluation results> | <empty> ;
<file for storing evaluation results> ::= <string>

o Example

@evaluate:
table == abalone_te_rt;
target variable == y;
analytic variable == mfc;
result table == evaluations;
output file == metainduction/anneal/attreval.csv;

A.4.12. Data Preparation

The following example IFCL code shows how to prepare datasets for training and evaluation using the IFCL load action. It also shows how to split the data randomly into training and test sets.

o Example:

@connect:
HostName == localhost;
SID == XE;
Port == 1521;
UserName == ME;
Password == passwort;

@load:
file == ./metainduction_num/sensory/00_data/00_data.csv;
table == sensory_load;
delimiter == ,;
attributes == Occasion, Judges, Interval, Sittings, Position, Squares, Rows_, Columns, Halfplot, Trellis, Method, Score;
types == c,c,c,c,c,c,c,c,c,c,c,n;
null == ?;

@drop: table == sensory_split;

@execsql: command ==
create table sensory_split as
select Occasion, Judges, Interval, Sittings, Position, Squares, Rows_, Columns, Halfplot, Trellis, Method,
Score as y,
# percent rank fuzzification #
percent_rank() over (order by score) as y_p,
# linear fuzzification #
(score - min(score) over()) / (max(score) over() - min(score) over()) as y_l,
case when dbms_random.value <= 0.667 then 1 else 0 end as is_training
# random split of data for training and test sets #
from sensory_load;

@drop: table == sensory_tr;

# create table with training set #
@execsql: command ==
create table sensory_tr as select * from sensory_split where is_training = 1;

@drop: table == sensory_te;

# create table with test set #
@execsql: command ==
create table sensory_te as select * from sensory_split where is_training = 0;
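Once such a preparation script has been executed, the random two-to-one split produced by the dbms_random condition can be verified directly on the database, for example with the following query (a simple sanity check, not part of the IFCL scripts themselves):

select is_training, count(*)
from sensory_split
group by is_training

The two row counts should be roughly in the proportion of two thirds to one third; because the split is random, the exact numbers vary between runs.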
A.4.13. Attribute Selection

The following example IFCL code shows how to select relevant attributes from a relation with regard to a target attribute.

o Example:

@connect:
HostName == localhost;
SID == XE;
Port == 1521;
UserName == ME;
Password == passwort;

@inducemf: # membership function induction #
table == sensory_tr;
target variable == y;
analytic variables == *;
induction == nlr;
output file == ./metainduction_num/sensory/02_output/nlr.sql.template;

@classify: # inductive attribute fuzzification #
classified table == sensory_tr;
template file == ./metainduction_num/sensory/02_output/nlr.sql.template;
output table == sensory_tr_nlr;

@drop: table == attribute_selection;

@evaluate: # ranking of correlation of fuzzified attribute values and target #
table == sensory_tr_nlr;
target variable == y;
analytic variables == *;
output file == metainduction_num/sensory/attreval.csv;
result table == attribute_selection;

A.5. Curriculum Vitae

Michael Alexander Kaufmann, from Recherswil SO, born on September 12th, 1978:

1993 – 1998: College of Economics, Bern-Neufeld
1998 – 1999: International Studies in Los Angeles, California
1999 – 2003: Bachelor of Science in Computer Science, Fribourg
2003 – 2005: Master of Science in Computer Science, Fribourg
2005 – 2008: Data Warehouse Analyst at PostFinance, Bern
2008 – 2011: Data Architect at Swiss Mobiliar Insurance, Bern
2011 – 2012: Consultant at Nationale Suisse Insurance, Basel
2008 – 2012: PhD in Computer Science, Fribourg

A.6. Key Terms and Definitions

IFC  Inductive fuzzy classification: Assigning individuals to fuzzy sets whose membership functions are generated from data so that the membership degrees provide support for an inductive inference.
ITF  Inductive target fuzzification: Transformation of a numerical target variable into a membership degree to a fuzzy class.
MFI  Membership function induction: Derivation of an inductive mapping from attribute values to membership degrees based on available data.
IFCL  Inductive fuzzy classification language: Software prototype implementing IFC.
NLR  Normalized likelihood ratio: A proposed formula for MFI.
NLD  Normalized likelihood difference: Another proposed formula for MFI.
IAF  Inductive attribute fuzzification: Transformation of attribute values into membership degrees using induced membership functions.
Zadehan Variable  A variable with a range of [0,1] representing truth values, analogous to Boolean variables with a range of {0,1}.
Zadehan Logic  A framework for reasoning with gradual truth values in the interval between 0 and 1.
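To illustrate the idea behind the likelihood-based membership functions abbreviated above, the following stand-alone SQL sketch estimates, for every value of a single categorical attribute x, the class-conditional likelihoods with respect to a binary target y and normalizes their ratio to the interval [0,1]. The table name example_tr and the columns x and y are hypothetical placeholders, and the query is a simplified illustration of the likelihood-ratio idea rather than a restatement of the induction methods implemented in IFCL:

-- Illustrative sketch only: likelihood-based membership degrees for the values of a
-- hypothetical categorical attribute x with respect to a binary target y in {0, 1}.
select x,
       p1 / (p1 + p0) as mu_nlr  -- normalized ratio of the two likelihoods
from (
  select x,
         sum(case when y = 1 then 1 else 0 end)
           / sum(sum(case when y = 1 then 1 else 0 end)) over () as p1,  -- estimate of p(x | y = 1)
         sum(case when y = 0 then 1 else 0 end)
           / sum(sum(case when y = 0 then 1 else 0 end)) over () as p0   -- estimate of p(x | y = 0)
  from example_tr
  group by x
)

Membership degrees close to 1 mark attribute values that occur almost exclusively in the target class, degrees close to 0 mark values that occur almost exclusively outside it, and 0.5 indicates no association; the difference-based NLD variant normalizes the difference of the two likelihoods instead of their ratio.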