Year of publication: 1391 (Solar Hijri calendar)

Published in: 20th Iranian Conference on Electrical Engineering

Number of pages: 6

Author(s):

Adel Ghazikhani – PhD student, Ferdowsi University of Mashhad and Lecturer at Imam Reza University Mashhad
Reza Monsefi – Assistant professor, Computer Engineering Department, Ferdowsi University of Mashhad
Hadi Sadoghi Yazdi – Associate professor, Computer Engineering Department, Ferdowsi University of Mashhad

Abstract:

We propose a novel algorithm for handling class imbalance in the k-NN classifier. Class imbalance occurs in valuable data from domains such as medical diagnosis, fraud detection, and oil spills. The problem affects all supervised classification algorithms, and a large amount of research is therefore devoted to it. We tackle the problem by preprocessing the data with oversampling. We propose a two-phase algorithm based on Support Vector Data Description (SVDD), a tool for data description. In our approach, we first describe the minority class, i.e. the class with fewer samples, using SVDD. We then oversample the support vectors, which is suitable for k-NN. We evaluate our method on real-world datasets with different imbalance ratios and compare it with four other oversampling methods, namely SMOTE, Borderline SMOTE, random oversampling, and cluster-based sampling. The results show that the proposed algorithm is a suitable preprocessing method for the k-NN classifier.
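The two phases described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation, and it rests on several assumptions. The SVDD step is a hard-margin SVDD with an RBF kernel, solved approximately by Frank-Wolfe iterations. "Oversampling of the support vectors" is interpreted as plain replication. The function names, `gamma`, `n_copies`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def svdd_support_vectors(points, gamma=0.5, iters=500, tol=1e-3):
    """Approximate hard-margin SVDD; return indices of the support vectors.

    With an RBF kernel K_ii = 1, so the SVDD dual reduces to
        minimise a^T K a   subject to   a_i >= 0, sum(a) = 1.
    Frank-Wolfe is an assumption of this sketch, not the paper's solver.
    """
    n = len(points)
    K = [[math.exp(-gamma * sq_dist(p, q)) for q in points] for p in points]
    alpha = [1.0 / n] * n
    for t in range(iters):
        # The gradient is 2*K*a; the factor 2 does not change the argmin.
        k_alpha = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        # Smallest (K a)_i = point farthest from the current centre.
        farthest = min(range(n), key=k_alpha.__getitem__)
        step = 2.0 / (t + 2.0)
        alpha = [(1.0 - step) * a for a in alpha]
        alpha[farthest] += step
    # Points that keep a non-negligible dual weight lie on the boundary.
    return [i for i, a in enumerate(alpha) if a > tol]


def oversample_support_vectors(X, y, minority_label, n_copies=2, gamma=0.5):
    """Return new (X, y) with each minority support vector added n_copies times.

    Plain replication is an assumed oversampling rule (hypothetical).
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    sv = [minority[i] for i in svdd_support_vectors(minority, gamma=gamma)]
    X_new = list(X) + [p for p in sv for _ in range(n_copies)]
    y_new = list(y) + [minority_label] * (len(sv) * n_copies)
    return X_new, y_new


def knn_predict(X, y, query, k=3):
    """Majority vote among the k nearest training points."""
    nearest = sorted(range(len(X)), key=lambda i: sq_dist(X[i], query))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


# Toy data (hypothetical): minority class 1 is a square with one interior point.
minority = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
majority = [(3.0, 3.0), (3.2, 3.0), (3.0, 3.2), (3.4, 3.4),
            (3.2, 3.4), (3.4, 3.2), (3.6, 3.6)]
X, y = minority + majority, [1] * 5 + [0] * 7
query = (2.4, 2.4)  # closest to the minority corner (2, 2)

print(knn_predict(X, y, query))            # 0: outvoted by two majority points
X_os, y_os = oversample_support_vectors(X, y, minority_label=1)
print(knn_predict(X_os, y_os, query))      # 1: replicated corner wins the vote
```

In this toy case the four corners of the square become the support vectors and the interior point does not. Only boundary samples are replicated, and those are the samples that decide k-NN votes near the class border.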