博碩士論文 etd-1011107-054209 詳細資訊
Title page for etd-1011107-054209
論文名稱
Title
蛋白質折疊辨識之特徵選取演算法
An Effective Feature Selection for Protein Fold Recognition
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
57
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2007-10-05
繳交日期
Date of Submission
2007-10-11
關鍵字
Keywords
支援向量機、折疊、特徵、蛋白質
support vector machine, fold, feature, protein
統計
Statistics
本論文已被瀏覽 5679 次,被下載 1589
The thesis/dissertation has been browsed 5679 times, has been downloaded 1589 times.
中文摘要
在生物物理學中,蛋白質折疊辨識問題是一項重要的課題,蛋白質一級結構有助於描繪出它的三維空間結構。假設有一條已知序列的蛋白質,蛋白質折疊辨識問題就是決定此蛋白質在蛋白質資料庫中是屬於哪一種折疊群,由於蛋白質資料庫中有超過兩種以上的折疊種類,所以此問題是一個多類分類的問題。
最近,許多科學家使用類神經網路和支援向量機這兩種機器學習工具去解決此問題。在本論文中,我們是使用支援向量機這個分類工具。此論文目的在於找出有效的特徵來引導我們有效解決這個分類問題,我們先建立特徵優先選擇表(the feature preference table)幫助我們快速的找出有效的特徵組合,我們取用了SCOP蛋白質資料庫中的27種有名的折疊群當作我們的資料集,實驗結果顯示我們提出的方法達到61.4%的準確率,這是優於先前的方法(56%),在相同的特徵組合下比較,我們的準確率也是高於先前的方法,這些實驗結果皆顯示我們提出的方法的確有效的解決蛋白質折疊辨識問題。
Abstract
Protein fold recognition is an important problem in biophysics.
The primary structure of a protein is believed to help determine its
three-dimensional (3D) structure.
Given a target protein (a sequence of amino acids), the
protein fold recognition problem is to decide which fold group
in a protein structure database the target protein belongs to.
Because there are more than two candidate fold groups, this
is a multi-class classification problem.
Recently, many researchers have approached this problem with
popular machine learning tools such as neural networks (NN) and support
vector machines (SVM). In this thesis, we use an SVM to solve this
problem. Our strategy is to identify effective features that
can guide the classification efficiently.
We build a feature preference table to
help us find effective feature combinations quickly.
We take 27 well-known fold groups
in SCOP (Structural Classification of Proteins) as our data set. Our
experimental results show that our method achieves an overall prediction
accuracy of 61.4%, which is better than that of the previous method (56.5%).
With the same feature combinations, our prediction accuracy is also
higher than the previous results. These results show that our method
is effective for the fold recognition problem.
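The abstract does not say how the feature preference table is built or used, so the following is only a generic greedy forward-selection sketch over feature groups, not the thesis's algorithm. The function name `greedy_forward_selection`, the `score` callable, the group names, and the accuracy numbers are all assumptions made for illustration. In practice `score` would train and evaluate an SVM on the chosen groups; here a toy lookup table stands in for that step.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def greedy_forward_selection(
    groups: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Add one feature group at a time, keeping the addition that raises the score most."""
    remaining = sorted(groups)  # sorted so that ties are broken deterministically
    chosen: List[str] = []
    best = float("-inf")
    while remaining:
        # Score every one-group extension of the current combination.
        trials = [(score(frozenset(chosen + [g])), g) for g in remaining]
        top_score, top_group = max(trials, key=lambda t: t[0])
        if top_score <= best:
            break  # no candidate improves the score, so stop
        chosen.append(top_group)
        remaining.remove(top_group)
        best = top_score
    return chosen, best


# Toy stand-in for "train an SVM on these groups and return its accuracy".
# Group names and numbers are invented; they are not results from the thesis.
TOY_ACCURACY = {
    frozenset({"A"}): 0.40,
    frozenset({"B"}): 0.35,
    frozenset({"C"}): 0.30,
    frozenset({"A", "B"}): 0.50,
    frozenset({"A", "C"}): 0.45,
    frozenset({"A", "B", "C"}): 0.48,
}


def toy_score(combo: FrozenSet[str]) -> float:
    return TOY_ACCURACY.get(combo, 0.0)


selected, accuracy = greedy_forward_selection(["A", "B", "C"], toy_score)
print(selected, accuracy)
```

With n feature groups, a greedy search of this kind evaluates at most n(n+1)/2 combinations instead of all 2^n - 1 non-empty subsets, which is one plausible reading of finding combinations "quickly". It can miss the best subset, since it never revisits an earlier choice.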
目次 Table of Contents
ABSTRACT ... 0
Chapter 1. Introduction ... 1
Chapter 2. Previous Works ... 5
2.1 Amino Acid Sequences ... 5
2.2 The Support Vector Machine ... 9
2.2.1 Soft Margin ... 11
2.2.2 Kernel Functions ... 12
2.3 Multi-class Classification ... 15
2.4 Structural Classification of Proteins ... 16
Chapter 3. Our Method ... 19
3.1 Motivation ... 19
3.2 The Feature Preference Table ... 20
3.3 The Overview of Our Algorithm ... 23
3.4 Normalization ... 28
3.5 Accuracy Measurement ... 30
Chapter 4. Experimental Results ... 32
Chapter 5. Conclusions ... 42
BIBLIOGRAPHY ... 44
電子全文 Fulltext
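This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China: do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: available on campus immediately; withheld off campus for one year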
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
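Public-access information for printed theses is relatively complete from academic year 102 onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 已公開 available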
