Title page for etd-0818110-174218
Title
基於蛋白質種類狀態預測之雙硫鍵鍵結狀態之預測
Disulfide Bonding State Prediction with SVM Based on Protein Types
Department
Year, semester
Language
Degree
Number of pages
58
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2010-07-07
Date of Submission
2010-08-18
Keywords
prediction, disulfide bond, protein
Statistics
This thesis has been viewed 5664 times and downloaded 1216 times.
Abstract
Disulfide bonds play a crucial role in predicting the three-dimensional structure and function of a protein. This thesis develops two algorithms that predict the disulfide bonding state of every cysteine in a protein sequence. Both are built on a multi-stage framework that uses multiple support vector machine (SVM) classifiers. The first algorithm achieves 94.0% accuracy in cysteine state prediction on dataset PDB4136, but its results on some other datasets fall short of expectations. The second algorithm is therefore designed to improve prediction for proteins that contain both oxidized and reduced cysteines. In addition, a new training strategy is developed to increase prediction accuracy: it appends the probabilities produced by the SVM to the existing features and then starts a new round of training, repeating this process to obtain better performance. The experiments are performed on datasets derived from well-known databases such as the Protein Data Bank and SWISS-PROT. The method achieves 94.3% accuracy in disulfide bonding state prediction on dataset PDB4136, an improvement of 3.6 percentage points over the previous best result of 90.7%.
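The probability feedback idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: a small logistic regression stands in for the SVM so that the example needs no external library, the toy feature values are made up, and the choice to keep only the latest probability column in each round is an assumption, since the abstract does not say whether probabilities accumulate across rounds. All function names are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_classifier(X, y, epochs=200, lr=0.5):
    """Stand-in probabilistic classifier (logistic regression, NOT an SVM)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        # Batch gradient descent step on the averaged gradient.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) for row in X]


def probability_feedback(X, y, rounds=3):
    """Train, append the predicted probability to the original features, retrain."""
    features = [list(row) for row in X]
    models = []
    for _ in range(rounds):
        model = train_classifier(features, y)
        probs = predict_proba(model, features)
        models.append(model)
        # Assumption: each round keeps only the latest probability column.
        features = [list(row) + [p] for row, p in zip(X, probs)]
    return models, features


# Toy data: one hypothetical feature per cysteine, label 1 = oxidized (bonded).
X = [[0.0], [0.2], [0.8], [1.0]]
y = [0, 0, 1, 1]
models, augmented = probability_feedback(X, y, rounds=3)
```

Here the probabilities are computed on the same samples used for training, which keeps the sketch short. A real experiment would more likely obtain them through cross-validation so that the appended feature does not leak the training labels.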
Table of Contents
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
Chapter 1. Introduction
Chapter 2. Preliminaries
2.1 Position Specific Scoring Matrix
2.2 Support Vector Machine
2.3 Previous Works
2.3.1 Hidden Neural Networks
2.3.2 Cysteine State Sequence
2.3.3 APTK and DISULFIND
2.4 The Two-stage System
2.5 State Classification
Chapter 3. Our Method
3.1 Algorithm One
3.2 Algorithm Two
3.3 Probability Feedback
3.4 Feature Extraction
3.4.1 Type Classification
3.4.2 Mix and State Classification
3.4.3 Normalization of Features
Chapter 4. Experimental Results
4.1 Datasets
4.2 Performance Evaluation
4.3 Results
Chapter 5. Conclusion
BIBLIOGRAPHY
References
[1] V. I. Abkevich and E. I. Shakhnovich, “What can disulfide bonds tell us about protein energetics, function and folding: Simulations and bioinformatics analysis,” Journal of Molecular Biology, Vol. 300, pp. 975–985, 2000.
[2] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” Journal of Molecular Biology, Vol. 215, No. 3, pp. 403–410, 1990.
[3] S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman, “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Research, Vol. 25, No. 17, pp. 3389–3402, 1997.
[4] M. K. Campbell and S. O. Farrell, Biochemistry. Thomson-Brooks/Cole, fourth ed., 2003.
[5] A. Ceroni, P. Frasconi, A. Passerini, and A. Vullo, “Predicting the disulfide bonding state of cysteines with combinations of kernel machines,” Journal of VLSI Signal Processing, Vol. 35, pp. 287–295, 2003.
[6] A. Ceroni, A. Passerini, A. Vullo, and P. Frasconi, “DISULFIND: a disulfide bonding state and cysteine connectivity prediction server,” Nucleic Acids Research, Vol. 34, pp. 177–181, 2006.
[7] C.-C. Chang and C.-J. Lin, “LIBSVM: a library for support vector machines,” 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[8] Y.-C. Chen, Y.-S. Lin, C.-J. Lin, and J.-K. Hwang, “Prediction of the bonding states of cysteines using the support vector machines based on multiple feature vectors and cysteine state sequences,” Proteins: Structure, Function, and Bioinformatics, Vol. 55, pp. 1036–1042, 2004.
[9] J. Cheng, H. Saigo, and P. Baldi, “Large-scale prediction of disulphide bridges using kernel methods, two-dimensional recursive neural networks, and weighted graph matching,” Proteins: Structure, Function, and Bioinformatics, Vol. 62, pp. 617–629, 2006.
[10] C.-C. Chuang, C.-Y. Chen, J.-M. Yang, P.-C. Lyu, and J.-K. Hwang, “Relationship between protein structures and disulfide-bonding patterns,” Proteins: Structure, Function, and Bioinformatics, Vol. 53, pp. 1–5, 2003.
[11] W.-C. Chung, “A multi-phase approach for disulfide bond prediction,” Master’s Thesis, Department of Computer Science and Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan, 2009.
[12] W.-C. Chung, C.-B. Yang, and C.-Y. Hor, “An effective tuning method for cysteine state classification,” Proc. of National Computer Symposium, Workshop on Algorithms and Bioinformatics, Taipei, Taiwan, Nov. 27-28, 2009.
[13] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and other kernel-based learning methods. Cambridge University Press, 2000.
[14] K. Gurney and K. N. Gurney, An introduction to neural networks. MIT Press, 1995.
[15] T. Joachims, Making large-scale support vector machine learning practical. MIT Press, 1999.
[16] J. Kyte and R. F. Doolittle, “A simple method for displaying the hydropathic character of a protein,” Journal of Molecular Biology, Vol. 157, pp. 105–132, 1982.
[17] P. L. Martelli, P. Fariselli, L. Malaguti, and R. Casadio, “Prediction of the disulfide-bonding state of cysteines in proteins at 88% accuracy,” Protein Science, Vol. 11, pp. 2735–2739, 2002.
[18] J. Meiler, M. Muller, A. Zeidler, and F. Schmaschke, “Generation and evaluation of dimension-reduced amino acid parameter representations by artificial neural networks,” Journal of Molecular Modeling, Vol. 7, No. 9, pp. 360–369, 2001.
[19] T. Noguchi, H. Matsuda, and Y. Akiyama, “PDB-REPRDB: a database of representative protein chains from the protein data bank (PDB),” Nucleic Acids Research, Vol. 29, No. 1, pp. 219–220, 2001.
[20] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, Vol. 77, No. 2, pp. 257–286, 1989.
[21] R. Rubinstein and A. Fiser, “Predicting disulfide bond connectivity in proteins by correlated mutations analysis,” Bioinformatics, Vol. 24, No. 4, pp. 498–504, 2008.
[22] M. Vincent, A. Passerini, M. Labbe, and P. Frasconi, “A simplified approach to disulfide connectivity prediction from protein sequences,” BMC Bioinformatics, Vol. 9, No. 1, p. 20, 2008.
[23] A. Vullo and P. Frasconi, “Disulfide connectivity prediction using recursive neural networks and evolutionary information,” Bioinformatics, Vol. 20, No. 5, pp. 653–659, 2004.
[24] J. Zurada, Introduction to artificial neural systems. St. Paul, MN, USA: West Publishing Co., 1992.
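Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.
Thesis access permission: available on campus immediately; off campus withheld for one year
Available:
Campus: available
Off-campus: available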


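Printed copies
Public availability information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up this information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available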
