Friday, 25 January 2019

A new rule-based knowledge extraction approach for imbalanced datasets

Abstract

Classification consists of extracting a classifier from large datasets. A dataset is imbalanced if one class contains more instances than the others; the instances of the larger class are called majority instances and those of the smaller class minority instances. Classical learning algorithms are biased toward majority instances. Classification applied to imbalanced datasets is called partial classification, and its approaches are generally based on sampling methods or algorithmic methods. In this paper, we propose a new hybrid approach that uses a three-phase rule-based extraction process. First, an initial classifier is extracted; it contains classification rules that represent only majority instances. Then, we delete the majority instances that are correctly classified by these rules in order to produce a balanced dataset. The deleted majority instances are replaced by the extracted classification rules, which prevents any information loss. Subsequently, our algorithm is applied to the resulting balanced dataset to produce a second classifier, whose rules represent both majority and minority instances. Finally, we add the rules of the first classifier to the second classifier to obtain the final classifier, which is then post-processed. Our approach has been tested on several imbalanced binary datasets. The results obtained show its efficiency compared with other results.
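The process described in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm: the abstract does not specify the rule learner, so single-condition threshold rules and a one-split stump are used here as stand-ins, and every name (`Rule`, `pure_majority_rules`, `remove_covered`, `best_stump`, `fit`, `predict`) and the toy data are hypothetical. Post-processing of the final rule set is left out.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    feature: int      # index of the feature tested
    op: str           # "<=" or ">"
    threshold: float
    label: int        # class predicted when the rule fires

    def covers(self, x):
        v = x[self.feature]
        return v <= self.threshold if self.op == "<=" else v > self.threshold

def pure_majority_rules(X, y, majority):
    """Stand-in for the first classifier: rules that cover only majority instances."""
    rules = []
    for f in range(len(X[0])):
        minority_vals = [x[f] for x, c in zip(X, y) if c != majority]
        lo, hi = min(minority_vals), max(minority_vals)
        below = [x[f] for x, c in zip(X, y) if c == majority and x[f] < lo]
        above = [x[f] for x, c in zip(X, y) if c == majority and x[f] > hi]
        if below:  # majority region strictly below every minority value
            rules.append(Rule(f, "<=", max(below), majority))
        if above:  # majority region strictly above every minority value
            rules.append(Rule(f, ">", hi, majority))
    return rules

def remove_covered(X, y, rules, majority):
    """Drop majority instances already explained by the first rules."""
    kept = [(x, c) for x, c in zip(X, y)
            if not (c == majority and any(r.covers(x) for r in rules))]
    return [x for x, _ in kept], [c for _, c in kept]

def best_stump(X, y):
    """Stand-in for the second classifier: the single split with the fewest errors."""
    labels = sorted(set(y))
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for left, right in ((labels[0], labels[-1]), (labels[-1], labels[0])):
                errors = sum((left if x[f] <= t else right) != c
                             for x, c in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, [Rule(f, "<=", t, left), Rule(f, ">", t, right)])
    return best[1]

def fit(X, y):
    majority = max(set(y), key=y.count)
    first = pure_majority_rules(X, y, majority)       # rules for majority only
    Xb, yb = remove_covered(X, y, first, majority)    # reduced, more balanced data
    second = best_stump(Xb, yb)                       # rules for both classes
    return first + second                             # first rules are kept, not lost

def predict(rules, x):
    for r in rules:            # the first matching rule decides
        if r.covers(x):
            return r.label
    return None

# Toy binary dataset: nine majority (0) and three minority (1) instances.
X = [(1, 1), (2, 2), (3, 1), (4, 2), (5, 3), (6, 2), (7, 5), (8, 4), (9, 5),
     (7, 6), (8, 7), (9, 4)]
y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
rules = fit(X, y)
```

On this toy data the first rules remove six of the nine majority instances, so the second classifier is learned from three majority and three minority instances. Because the first rules stay in the final list, a point such as `(2, 9)` is still assigned to the majority class even though the instances that supported that decision were deleted before the second classifier was learned.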



http://bit.ly/2S56fpo
