Fahd Nasir A. Alwesabi † and Ahmed Sultan Al-Hegami ††
† University of Science and Technology, Computer Science & Information Technology Department, Sana'a, Yemen
†† Assistant Professor of Artificial Intelligence and Intelligent Information Systems, Sana'a University, Yemen
Abstract
Data mining deals with the problem of discovering novel and interesting knowledge from large amounts of data. This problem is often approached heuristically when patterns are difficult to extract using standard query mechanisms or classical statistical methods. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules and ultimately improve the overall efficiency of the knowledge discovery in databases (KDD) process. In this study, we push the novelty measure into a genetic algorithm as a constraint, so that the algorithm discovers only novel, and hence interesting, patterns. The proposed approach uses a flexible chromosome encoding technique based on Bayes' theorem, in which each chromosome corresponds to a classification rule. It is a hybrid approach that combines objective and subjective measures to quantify the novelty of rules during the discovery process in terms of their deviation from known rules. We evaluate the proposed framework on some public datasets and test it on real-life applications. The experimental results are quite promising.
Keywords:
Data mining, KDD, Classification, Genetic algorithm, Interestingness, Rule Discovery, Novelty Measures.