National Sun Yat-sen University, thesis/dissertation, The Disulfide Connectivity Prediction with Support Vector Machine and Behavior Knowledge Space

Title	The Disulfide Connectivity Prediction with Support Vector Machine and Behavior Knowledge Space (利用支持向量機與行為知識空間之雙硫鍵連結預測方法)
Department	Department of Computer Science and Engineering
Year, semester	Fall semester of Academic Year 101	Language	English
Degree	Master	Number of pages	64
Author	Hong-Yu Chen (陳泓宇)
Advisor	Chang-Biau Yang (楊昌彪)
Convenor	Chia-Ning Yang (楊佳寧)
Advisory Committee	Kuo-Tsung Tseng (曾國尊); Yow-Ling Shiue (薛佑玲); Shih-Chung Chen (陳世中)
Date of Exam	2012-09-07	Date of Submission	2012-09-12
Keywords	behavior knowledge space, support vector machine, connectivity pattern, disulfide bond, cysteine
Statistics	The thesis/dissertation has been browsed 5669 times and downloaded 725 times.

Abstract
A disulfide bond in a protein is a single covalent bond formed by the oxidation of two cysteines. It plays an important role in protein folding and structural stability, and may regulate protein function. Disulfide connectivity prediction is difficult because the number of possible connectivity patterns grows rapidly with the number of cysteines. Across several methods, we find conditional rules that identify the connectivity patterns predicted with high accuracy. We implement multiple support vector machine (SVM) methods and use the behavior knowledge space (BKS) to fuse these classifiers. For comparison with previous work, we evaluate the hybrid method on the SP39 dataset with 4-fold cross-validation. Our method raises the overall accuracy to 71.5%, a significant improvement over the best previous result of 65.9%.
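
The rapid growth mentioned in the abstract can be made concrete. Assuming all 2B cysteines of a protein are bonded, the number of possible connectivity patterns equals the number of perfect matchings, (2B)! / (B! · 2^B). The sketch below computes that count and also shows a minimal, generic behavior knowledge space lookup table. It is an illustration under these assumptions, not the thesis's implementation; the names `count_patterns` and `BKSFusion` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict
from math import factorial


def count_patterns(bonds: int) -> int:
    """Number of ways to pair 2*bonds cysteines into `bonds` disulfide bonds.

    Assumption: every cysteine is bonded, so each pattern is a perfect
    matching and the count is (2B)! / (B! * 2**B).
    """
    return factorial(2 * bonds) // (factorial(bonds) * 2 ** bonds)


class BKSFusion:
    """Minimal behavior knowledge space (hypothetical helper class).

    The table is keyed by the tuple of base-classifier decisions and stores
    how often each true label was observed for that combination.
    """

    def __init__(self):
        self.table = defaultdict(Counter)

    def fit(self, decisions, labels):
        # decisions: one tuple of base-classifier outputs per training sample
        for decision, label in zip(decisions, labels):
            self.table[tuple(decision)][label] += 1
        return self

    def predict(self, decision, fallback=None):
        # Return the most frequent true label seen for this combination,
        # or `fallback` when the combination never occurred in training.
        cell = self.table.get(tuple(decision))
        if not cell:
            return fallback
        return cell.most_common(1)[0][0]


if __name__ == "__main__":
    for b in range(2, 6):
        print(b, count_patterns(b))
```

For B = 2, 3, 4, and 5 bonds the count is 3, 15, 105, and 945, which is why exhaustive scoring of patterns becomes costly as the number of cysteines increases. The `fallback` argument stands in for whatever rule a real system would apply to unseen decision combinations.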

Table of Contents
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
Chapter 1. Introduction
Chapter 2. Preliminaries
2.1 Amino Acids and Proteins
2.2 Position-Specific Scoring Matrix
2.3 Support Vector Machine
2.4 Behavior Knowledge Space
2.5 Previous Works
2.5.1 Zhao’s Method by Using Cysteine Separations Profiles
2.5.2 Chen’s Method by Using a Two-Level Model
2.5.3 Lu’s Method by Using SVM with Feature Selection by GA
2.5.4 Chen’s Method with Sequence Alignment Method and Machine Learning Method
2.5.5 Wang’s Method with Hybrid Models
Chapter 3. Our Methods
3.1 Observation
3.2 Statistics and Analysis
3.2.1 The Ratio of Disulfide Proteins in PDB
3.2.2 The Number of Real Patterns
3.3 The Sequence Alignment Method
3.4 The SVM Method
3.4.1 CP1 Representation with 512 Features
3.4.2 CP1 Representation with 623 Features
3.4.3 CP2 Representation with 1046 Features
3.5 The Behavior Knowledge Space Method
3.6 Our Hybrid Method
Chapter 4. Experimental Results
4.1 Datasets
4.2 Performance Evaluation
4.3 The Sequence Alignment Method
4.4 The SVM Method
4.5 The Behavior Knowledge Space Method
4.6 Our Hybrid Method
Chapter 5. Conclusions
BIBLIOGRAPHY

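Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: user-defined availability date. Available: Campus: available; Off-campus: available. etd-0912112-141458.pdf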
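Printed copies
Public information on printed theses is relatively complete from Academic Year 102 onward. To look up public information on printed theses from Academic Year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience. Available: available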

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS