國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,蛋白質折疊辨識之特徵選取演算法,An Effective Feature Selection for Protein Fold Recognition

論文名稱 Title	蛋白質折疊辨識之特徵選取演算法 An Effective Feature Selection for Protein Fold Recognition
系所名稱 Department	資訊工程學系 Department of Computer Science and Engineering
畢業學年期 Year, semester	96 學年度第 1 學期 The fall semester of Academic Year 96 (ROC calendar; fall 2007)	語文別 Language	英文 English
學位類別 Degree	碩士 Master	頁數 Number of pages	57
研究生 Author	林俊雄 Jyun-syong Lin
指導教授 Advisor	楊昌彪 Chang-biau Yang
召集委員 Convenor	楊佳寧 Chia-ning Yang
口試委員 Advisory Committee	蕭學宏 Shyue-horng Shiau
口試日期 Date of Exam	2007-10-05	繳交日期 Date of Submission	2007-10-11
關鍵字 Keywords	支援向量機、折疊、特徵、蛋白質 support vector machine, fold, feature, protein
統計 Statistics	本論文已被瀏覽 5679 次，被下載 1589 次 The thesis/dissertation has been browsed 5679 times and downloaded 1589 times.

中文摘要
在生物物理學中，蛋白質折疊辨識問題是一項重要的課題，蛋白質一級結構有助於描繪出它的三維空間結構。假設有一條已知序列的蛋白質，蛋白質折疊辨識問題就是決定此蛋白質在蛋白質資料庫中是屬於哪一種折疊群，由於蛋白質資料庫中有超過兩種以上的折疊種類，所以此問題是一個多類分類的問題。最近，許多科學家使用類神經網路和支援向量機這兩種機器學習工具去解決此問題。在本論文中，我們是使用支援向量機這個分類工具。此論文目的在於找出有效的特徵來引導我們有效解決這個分類問題，我們先建立特徵優先選擇表(the feature preference table)幫助我們快速的找出有效的特徵組合，我們取用了SCOP蛋白質資料庫中的27種有名的折疊群當作我們的資料集，實驗結果顯示我們提出的方法達到61.4%的準確率，這是優於先前的方法(56%)，在相同的特徵組合下比較，我們的準確率也是高於先前的方法，這些實驗結果皆顯示我們提出的方法的確有效的解決蛋白質折疊辨識問題。
Abstract
Protein fold recognition is an important problem in biophysics. The primary structure of a protein is believed to help determine its three-dimensional (3D) structure. Given a target protein (a sequence of amino acids), the protein fold recognition problem is to decide which fold group in a protein structure database the target belongs to. Because there are more than two candidate fold groups, this is a multi-class classification problem. Many researchers have recently tackled it with popular machine learning tools such as neural networks (NN) and support vector machines (SVM). In this thesis, we use the SVM. Our strategy is to identify effective features that can guide the classification. We build a feature preference table that helps us find effective feature combinations quickly. Our data set consists of 27 well-known fold groups from SCOP (Structural Classification of Proteins). In our experiments, our method achieves an overall prediction accuracy of 61.4%, which is better than that of the previous method (56.5%). With the same feature combinations, our prediction accuracy is also higher than the previous results. These results show that our method is indeed effective for the fold recognition problem.
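
The abstract does not say how the feature preference table is built or used, so the following is only a rough sketch of one way a search over feature combinations could be organized, not the thesis's algorithm. It ranks each feature group by its standalone accuracy (a hypothetical stand-in for a preference order) and then adds groups greedily in that order while the accuracy improves. The names `rank_groups`, `greedy_select`, `overall_accuracy`, and `toy_evaluate`, the group labels "A", "B", "C", and all scores are assumptions made up for illustration. In practice the evaluator would train and test an SVM (for example with LIBSVM) on the chosen features. Overall accuracy is assumed here to be the fraction of correctly predicted proteins.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# An evaluator maps a set of feature-group names to an accuracy in [0, 1].
# Assumption: in real use it would wrap SVM training and testing.
Evaluator = Callable[[FrozenSet[str]], float]


def overall_accuracy(true_labels: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of samples whose predicted fold equals the true fold."""
    if not true_labels or len(true_labels) != len(predicted):
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(true_labels, predicted) if t == p)
    return correct / len(true_labels)


def rank_groups(groups: List[str], evaluate: Evaluator) -> List[Tuple[str, float]]:
    """Score each feature group on its own and sort best first.

    This ordering is a hypothetical stand-in for a preference table.
    """
    scored = [(g, evaluate(frozenset([g]))) for g in groups]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def greedy_select(groups: List[str], evaluate: Evaluator) -> Tuple[FrozenSet[str], float]:
    """Add groups in ranked order, keeping each one only if accuracy improves."""
    chosen: FrozenSet[str] = frozenset()
    best = 0.0
    for name, _ in rank_groups(groups, evaluate):
        candidate = chosen | {name}
        acc = evaluate(candidate)
        if acc > best:
            chosen, best = candidate, acc
    return chosen, best


# Made-up accuracies for illustration only; these are NOT results from the thesis.
TOY_SCORES = {
    frozenset({"A"}): 0.40,
    frozenset({"B"}): 0.30,
    frozenset({"C"}): 0.20,
    frozenset({"A", "B"}): 0.50,
    frozenset({"A", "B", "C"}): 0.48,
}


def toy_evaluate(feature_groups: FrozenSet[str]) -> float:
    """Look up a made-up accuracy; unknown combinations score 0.0."""
    return TOY_SCORES.get(feature_groups, 0.0)


if __name__ == "__main__":
    selected, accuracy = greedy_select(["A", "B", "C"], toy_evaluate)
    print(sorted(selected), accuracy)
```

With the toy scores, the search keeps groups A and B and rejects C, because adding C lowers the made-up accuracy. A greedy pass like this evaluates far fewer combinations than an exhaustive search over all subsets, at the cost of possibly missing the best combination.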

目次 Table of Contents
Abstract
Chapter 1. Introduction
Chapter 2. Previous Works
  2.1 Amino Acid Sequences
  2.2 The Support Vector Machine
    2.2.1 Soft Margin
    2.2.2 Kernel Functions
  2.3 Multi-class Classification
  2.4 Structural Classification of Proteins
Chapter 3. Our Method
  3.1 Motivation
  3.2 The Feature Preference Table
  3.3 The Overview of Our Algorithm
  3.4 Normalization
  3.5 Accuracy Measurement
Chapter 4. Experimental Results
Chapter 5. Conclusions
Bibliography

電子全文 Fulltext
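This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.
論文使用權限 Thesis access permission：校內立即公開，校外一年後公開 (available on campus immediately; withheld off campus for one year)
開放時間 Available：校內 Campus：已公開 available；校外 Off-campus：已公開 available
etd-1011107-054209.pdf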
紙本論文 Printed copies
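Public-access information for printed copies is relatively complete from academic year 102 onward. For printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available：已公開 available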
