Thesis access permission: available on campus immediately; available off campus after one year (off campus withheld)
Available:
Campus: available
Off-campus: available
Title |
Disulfide Bonding State Prediction with SVM Based on Protein Types |
||
Department |
|||
Year, semester |
Language |
||
Degree |
Number of pages |
58 |
|
Author |
|||
Advisor |
|||
Convenor |
|||
Advisory Committee |
|||
Date of Exam |
2010-07-07 |
Date of Submission |
2010-08-18 |
Keywords |
prediction, disulfide bond, protein |
||
Statistics |
The thesis/dissertation has been browsed 5664 times and downloaded 1216 times. |
Abstract |
Disulfide bonds play a crucial role in predicting the three-dimensional structure and function of a protein. This thesis develops two algorithms to predict the disulfide bonding state (oxidized or reduced) of each cysteine in a protein sequence. Both methods are built on a multi-stage framework that uses multiple support vector machine (SVM) classifiers. The first algorithm achieves 94.0% accuracy in cysteine state prediction on dataset PDB4136, but on some other datasets its results fall short of expectations. The second algorithm is therefore designed to improve prediction accuracy for proteins that contain both oxidized and reduced cysteines. In addition, a new training strategy is developed to increase prediction accuracy: it appends the probabilities produced by the SVM to the existing features and then starts a new round of training, repeating this procedure to obtain better performance. The experiments are performed on datasets derived from well-known databases such as the Protein Data Bank and SWISS-PROT. The method achieves 94.3% accuracy in predicting the disulfide bonding state on dataset PDB4136, an improvement of 3.6 percentage points over the previous best result of 90.7%. |
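The probability-feedback training strategy described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: a tiny logistic-regression model stands in for the SVM with probability outputs so that the example needs no external library, and the function names (`train_logistic`, `predict_proba`, `probability_feedback`), the toy data, and the number of rounds are all assumptions made for illustration. The abstract does not say how the fed-back probabilities are produced (for example, whether cross-validation is used), so the sketch simply reuses the training predictions.

```python
import math
import random


def predict_proba(w, b, x):
    """Return the predicted probability of class 1 for feature vector x."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=200, lr=0.5):
    """Fit logistic regression by batch gradient descent.

    Stand-in (assumption) for an SVM that outputs probabilities.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def probability_feedback(X, y, rounds=3):
    """Train, append the predicted probability as a new feature, retrain."""
    features = [list(row) for row in X]
    models = []
    for _ in range(rounds):
        w, b = train_logistic(features, y)
        models.append((w, b))
        probs = [predict_proba(w, b, row) for row in features]
        # Feed the probabilities back: one extra feature column per round.
        features = [row + [p] for row, p in zip(features, probs)]
    return models, features


# Hypothetical toy data: two features per sample, binary labels
# (1 = oxidized, 0 = reduced is an illustrative encoding only).
random.seed(0)
X = [[random.gauss(1.0 if i % 2 else -1.0, 1.0), random.random()]
     for i in range(40)]
y = [i % 2 for i in range(40)]

models, augmented = probability_feedback(X, y, rounds=3)
```

Each round trains on a feature vector that is one column wider than the previous round, which is the feedback loop the abstract describes; in practice a stopping rule based on validation accuracy would likely replace the fixed round count.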
Table of Contents |
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
Chapter 1. Introduction
Chapter 2. Preliminaries
2.1 Position Specific Scoring Matrix
2.2 Support Vector Machine
2.3 Previous Works
2.3.1 Hidden Neural Networks
2.3.2 Cysteine State Sequence
2.3.3 APTK and DISULFIND
2.4 The Two-stage System
2.5 State Classification
Chapter 3. Our Method
3.1 Algorithm One
3.2 Algorithm Two
3.3 Probability Feedback
3.4 Feature Extraction
3.4.1 Type Classification
3.4.2 Mix and State Classification
3.4.3 Normalization of Features
Chapter 4. Experimental Results
4.1 Datasets
4.2 Performance Evaluation
4.3 Results
Chapter 5. Conclusion
BIBLIOGRAPHY |
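Fulltext |
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law. |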
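Printed copies |
Public information on printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up public information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Available: available |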