Year of publication: 1387 (Solar Hijri calendar)

Published in: Second International Congress on Nanoscience and Nanotechnology

Number of pages: 1

Author(s):

Eghbal Mansoori – Department of Computer Science and Engineering, School of Engineering, Shiraz University, Shiraz, Iran

Abstract:

In this paper, we propose a fuzzy rule-based classifier for assigning amino acid sequences to protein superfamilies. While the most popular methods for protein classification rely on sequence alignment, our approach is alignment-free. It is based on the distribution of contiguous patterns of n amino acids (n-grams) in the sequences. The proposed approach first extracts features from a set of training sequences and then selects only the 60 best of them using feature ranking methods. The extracted features take into account the occurrence probabilities of n-grams in the sequences. Using these features, a novel steady-state genetic algorithm then extracts a compact set of interpretable fuzzy classification rules from the training data. The generated rules are simple and human-readable, so biologists can use them for classification or apply their own expertise to interpret or modify them. To evaluate the performance of our fuzzy rule-based classifier, we compared it with the conventional non-fuzzy, decision-tree-based C4.5 algorithm. This comparative study classifies protein sequences from five superfamily classes downloaded from a public domain database. The results show that the generated fuzzy rules are more interpretable, with an acceptable improvement in classification accuracy.
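
The n-gram feature idea described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the default n = 2, and the ranking criterion (mean occurrence probability across the training sequences) are assumptions made for illustration only, since the abstract does not name the feature ranking methods actually used.

```python
from collections import Counter


def ngram_probabilities(sequence, n=2):
    """Relative frequency of each contiguous n-gram in one sequence."""
    total = len(sequence) - n + 1
    if total <= 0:
        return {}
    counts = Counter(sequence[i:i + n] for i in range(total))
    return {gram: count / total for gram, count in counts.items()}


def top_k_ngrams(sequences, n=2, k=60):
    """Keep the k highest-ranked n-grams.

    Assumption: n-grams are ranked by their summed occurrence probability
    over all training sequences. This is a stand-in criterion, not the
    ranking method used in the paper.
    """
    totals = Counter()
    for seq in sequences:
        for gram, prob in ngram_probabilities(seq, n).items():
            totals[gram] += prob
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(totals, key=lambda g: (-totals[g], g))
    return ranked[:k]


def feature_vector(sequence, selected, n=2):
    """Occurrence probabilities of the selected n-grams, in order."""
    probs = ngram_probabilities(sequence, n)
    return [probs.get(gram, 0.0) for gram in selected]
```

Under these assumptions, calling `top_k_ngrams(training_sequences, n=2, k=60)` on a list of amino acid strings gives the selected n-grams, and `feature_vector` then maps any sequence to a fixed-length numeric vector that a rule-based classifier could take as input.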