Title page for thesis/dissertation etd-0912112-141458
Title
利用支持向量機與行為知識空間之雙硫鍵連結預測方法
The Disulfide Connectivity Prediction with Support Vector Machine and Behavior Knowledge Space
Department
Year, semester
Language
Degree
Number of pages
64
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2012-09-07
Date of Submission
2012-09-12
Keywords
behavior knowledge space, support vector machine, connectivity pattern, disulfide bond, cysteine
Statistics
This thesis/dissertation has been browsed 5669 times and downloaded 725 times.
Abstract
A disulfide bond in a protein is a single covalent bond formed by the oxidation of two cysteines. It plays an important role in protein folding and structural stability, and it may regulate protein function. Disulfide connectivity prediction is difficult because the number of possible connectivity patterns grows rapidly with the number of cysteines. Across several methods, we found conditional rules that identify the connectivity patterns predicted with high accuracy. We implement multiple methods based on the support vector machine (SVM) and use the behavior knowledge space (BKS) to fuse these classifiers. For comparison with previous work, we evaluate the hybrid method on the SP39 dataset with 4-fold cross-validation. The hybrid method raises the overall accuracy to 71.5%, a significant improvement over the best previously reported accuracy of 65.9%.
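The growth in the number of possible patterns can be made concrete with a short calculation. The sketch below is not taken from the thesis; it assumes that every cysteine under consideration takes part in exactly one disulfide bond, so a connectivity pattern is a perfect matching of 2n cysteines and the count is (2n)! / (2^n n!). The function name `count_connectivity_patterns` is hypothetical.

```python
from math import factorial


def count_connectivity_patterns(num_bonds: int) -> int:
    """Count the ways to pair 2 * num_bonds cysteines into num_bonds bonds.

    Assumption (not from the thesis): every cysteine is bonded exactly once,
    so each pattern is a perfect matching on 2 * num_bonds cysteines.
    """
    if num_bonds < 0:
        raise ValueError("num_bonds must be non-negative")
    n = num_bonds
    # (2n)! / (2^n * n!) is the double factorial (2n - 1)!!
    return factorial(2 * n) // (2 ** n * factorial(n))


if __name__ == "__main__":
    # Print the pattern count for proteins with 2 to 5 disulfide bonds.
    for bonds in range(2, 6):
        print(bonds, count_connectivity_patterns(bonds))
```

Under this assumption, two bonds admit 3 patterns, three bonds 15, four bonds 105, and five bonds 945, which is why exhaustive enumeration quickly becomes impractical as the number of cysteines grows.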
Table of Contents
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
Chapter 1. Introduction
Chapter 2. Preliminaries
2.1 Amino Acids and Proteins
2.2 Position-Specific Scoring Matrix
2.3 Support Vector Machine
2.4 Behavior Knowledge Space
2.5 Previous Works
2.5.1 Zhao’s Method by Using Cysteine Separations Profiles
2.5.2 Chen’s Method by Using a Two-Level Model
2.5.3 Lu’s Method by Using SVM with Feature Selection by GA
2.5.4 Chen’s Method with Sequence Alignment Method and Machine Learning Method
2.5.5 Wang’s Method with Hybrid Models
Chapter 3. Our Methods
3.1 Observation
3.2 Statistics and Analysis
3.2.1 The Ratio of Disulfide Proteins in PDB
3.2.2 The Number of Real Patterns
3.3 The Sequence Alignment Method
3.4 The SVM Method
3.4.1 CP1 Representation with 512 Features
3.4.2 CP1 Representation with 623 Features
3.4.3 CP2 Representation with 1046 Features
3.5 The Behavior Knowledge Space Method
3.6 Our Hybrid Method
Chapter 4. Experimental Results
4.1 Datasets
4.2 Performance Evaluation
4.3 The Sequence Alignment Method
4.4 The SVM Method
4.5 The Behavior Knowledge Space Method
4.6 Our Hybrid Method
Chapter 5. Conclusions
BIBLIOGRAPHY
References
[1] S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman, “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Research, Vol. 25, No. 17, pp. 3389–3402, 1997.
[2] P. Baldi, J. Cheng, and A. Vullo, “Large-scale prediction of disulphide bond connectivity,” Advances in Neural Information Processing Systems 17, Cambridge, MA, pp. 97–104, MIT Press, 2005.
[3] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[4] B.-J. Chen, C.-H. Tsai, C.-H. Chan, and C.-Y. Kao, “Disulfide connectivity prediction with 70% accuracy using two-level models,” PROTEINS: Structure, Function, and Genetics, Vol. 64, pp. 246–252, 2006.
[5] G. Chen, H. Deng, Y. Gui, Y. Pan, and X. Wang, “Cysteine separations profiles on protein secondary structure infer disulfide connectivity,” 2006 IEEE International Conference on Granular Computing, pp. 663–665, May 2006.
[6] Y.-C. Chen, Prediction of Disulfide Connectivity from Protein Sequences. Ph.D. dissertation, National Chiao Tung University, Hsinchu, Taiwan, 2007.
[7] Y.-C. Chen and J.-K. Hwang, “Prediction of disulfide connectivity from protein sequences,” PROTEINS: Structure, Function, and Genetics, Vol. 61, pp. 507–512, 2005.
[8] Y.-C. Chen, Y.-S. Lin, C.-J. Lin, and J.-K. Hwang, “Prediction of the bonding states of cysteines using the support vector machines based on multiple feature vectors and cysteine state sequences,” PROTEINS: Structure, Function, and Genetics, Vol. 55, pp. 1036–1042, 2004.
[9] J. Cheng, H. Saigo, and P. Baldi, “Large-scale prediction of disulphide bridges using kernel methods, two-dimensional recursive neural networks, and weighted graph matching,” PROTEINS: Structure, Function, and Genetics, Vol. 62, pp. 617–629, 2006.
[10] C.-C. Chuang, C.-Y. Chen, J.-M. Yang, P.-C. Lyu, and J.-K. Hwang, “Relationship between protein structures and disulfide-bonding patterns,” PROTEINS: Structure, Function, and Genetics, Vol. 53, pp. 1–5, 2003.
[11] W.-C. Chung, C.-B. Yang, and C.-Y. Hor, “An effective tuning method for cysteine state classification,” Proc. of National Computer Symposium, Workshop on Algorithms and Bioinformatics, Taipei, Taiwan, Nov. 27-28, 2009.
[12] M. O. Dayhoff, R. M. Schwartz, and B. C. Orcutt, “A model of evolutionary change in proteins,” Atlas of Protein Sequence and Structure (M. O. Dayhoff, ed.), pp. 345–352, Nat. Biomed. Research Foundation, 1978.
[13] P. Fariselli, P. Riccobelli, and R. Casadio, “Role of evolutionary information in predicting the disulfide-bonding state of cysteine in proteins,” PROTEINS: Structure, Function, and Genetics, Vol. 36, pp. 340–346, 1999.
[14] F. Ferre and P. Clote, “Disulfide connectivity prediction using secondary structure information and diresidue frequencies,” Bioinformatics, Vol. 21, No. 10, pp. 2336–2346, 2005.
[15] P. Frasconi, A. Passerini, and A. Vullo, “A two-stage SVM architecture for predicting the disulfide bonding state of cysteines,” Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, pp. 25–34, 2002.
[16] G. H. Gonnet, M. A. Cohen, and S. A. Benner, “Exhaustive matching of the entire protein sequence database,” Science, Vol. 256, pp. 1443–1445, 1992.
[17] P. M. Harrison and M. J. E. Sternberg, “Analysis and classification of disulphide connectivity in proteins: The entropic effect of cross-linkage,” Journal of Molecular Biology, Vol. 244, No. 4, pp. 448–463, 1994.
[18] S. Henikoff and J. G. Henikoff, “Amino acid substitution matrices from protein blocks,” Proceedings of the National Academy of Sciences of the United States of America, Vol. 89, No. 22, pp. 10915–10919, 1992.
[19] D. T. Jones, “Protein secondary structure prediction based on position-specific scoring matrices,” Journal of Molecular Biology, Vol. 292, No. 2, pp. 195–202, 1999.
[20] W. Kabsch and C. Sander, “Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features,” Biopolymers, Vol. 22, pp. 2577–2637, 1983.
[21] J. R. G. L., A. P. Shilton, M. M. Parker, and M. Palaniswami, “Prediction of cystine connectivity using SVM,” Bioinformation, Vol. 1, No. 2, pp. 69–74, 2005.
[22] H.-L. Liu and S.-C. Chen, “Prediction of disulfide connectivity in proteins with support vector machine,” Journal of the Chinese Institute of Chemical Engineers, Vol. 38, No. 1, pp. 63–70, 2007.
[23] C.-H. Lu, Y.-C. Chen, C.-S. Yu, and J.-K. Hwang, “Predicting disulfide connectivity patterns,” PROTEINS: Structure, Function, and Genetics, Vol. 67, pp. 262–270, 2007.
[24] P. L. Martelli, P. Fariselli, L. Malaguti, and R. Casadio, “Prediction of the disulfide-bonding state of cysteines in proteins at 88% accuracy,” Protein Science, Vol. 11, pp. 2735–2739, 2002.
[25] L. A. Mirny and E. I. Shakhnovich, “How to derive a protein folding potential? A new approach to an old problem,” Journal of Molecular Biology, Vol. 264, No. 5, pp. 1164–1179, 1996.
[26] A. Moustafa, “JAligner: Open source Java implementation of Smith-Waterman,” 2005. Software available at http://jaligner.sourceforge.net.
[27] S. Raudys and F. Roli, “The behavior knowledge space fusion method: Analysis of generalization error and strategies for performance improvement,” Proc. Int. Workshop on Multiple Classifier Systems (LNCS 2709), pp. 55–64, Springer, 2003.
[28] R. Rubinstein and A. Fiser, “Predicting disulfide bond connectivity in proteins by correlated mutations analysis,” Bioinformatics, Vol. 24, No. 4, pp. 498–504, 2008.
[29] J. Song, Z. Yuan, H. Tan, T. Huber, and K. Burrage, “Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure,” Bioinformatics, Vol. 23, No. 23, pp. 3147–3154, 2007.
[30] C.-H. Tsai, B.-J. Chen, C.-H. Chan, H.-L. Liu, and C.-Y. Kao, “Improving disulfide connectivity prediction with sequential distance between oxidized cysteines,” Bioinformatics, Vol. 21, No. 24, pp. 4416–4419, 2005.
[31] V. N. Vapnik, The Nature of Statistical Learning Theory. Springer, 1999.
[32] M. Vincent, A. Passerini, M. Labbe, and P. Frasconi, “A simplified approach to disulfide connectivity prediction from protein sequences,” BMC Bioinformatics, Vol. 9, No. 1, p. 20, 2008.
[33] A. Vullo and P. Frasconi, “Disulfide connectivity prediction using recursive neural networks and evolutionary information,” Bioinformatics, Vol. 20, No. 5, pp. 653–659, 2004.
[34] C.-J. Wang, C.-B. Yang, C.-Y. Hor, and K.-T. Tseng, “Disulfide bond prediction with hybrid models,” Proc. of the 2012 International Conference on Computing and Security (ICCS12), July 2012.
[35] E. Zhao, H.-L. Liu, C.-H. Tsai, H.-K. Tsai, C.-H. Chan, and C.-Y. Kao, “Cysteine separations profiles on protein sequences infer disulfide connectivity,” Bioinformatics, Vol. 21, No. 8, pp. 1415–1420, 2005.
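Fulltext
This electronic full text is licensed to users only for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.
Thesis access permission: user-defined availability date
Available:
Campus: available
Off-campus: available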


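Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available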
