論文使用權限 Thesis access permission: 校內立即公開,校外一年後公開 Available on campus immediately; available off campus after one year
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available
論文名稱 Title |
雙硫鍵預測之多階段方法 A Multi-phase Approach for Disulfide Bond Prediction |
||
系所名稱 Department |
|||
畢業學年期 Year, semester |
語文別 Language |
||
學位類別 Degree |
頁數 Number of pages |
67 |
|
研究生 Author |
|||
指導教授 Advisor |
|||
召集委員 Convenor |
|||
口試委員 Advisory Committee |
|||
口試日期 Date of Exam |
2009-07-07 |
繳交日期 Date of Submission |
2009-07-25 |
關鍵字 Keywords |
多階段、預測、雙硫鍵、半胱氨酸 disulfide bond, prediction, multi-phase, cysteine |
||
統計 Statistics |
本論文已被瀏覽 5674 次,被下載 1443 次 This thesis has been viewed 5674 times and downloaded 1443 times. |
中文摘要 |
雙硫鍵的資訊可以用來幫助蛋白質二級結構、三級結構以及全原子座標的預測。前人的研究大多只著重在半胱氨酸的狀態分類或是雙硫鍵的鍵結預測,其中鍵結預測的部份更藉由加入些許的限制讓整個過程能在現實中達成。在這篇論文中,我們提出了一個多階段的方法來解決這些問題。在半胱氨酸狀態分類的部份,我們的方法除了可以輸出鍵結的數量外,更達到了 90.7 % 的準確率。我們利用這些預測的狀態資訊,在鍵結預測的階段選取適當的配對組合。而為了解決資料分布的不均衡,我們也提出了縮減採樣率(down-sample)的方法以達到降低處理時間的要求。最後,我們使用最大權重配對問題(weighted graph matching)的演算法來完成鍵結的配對方式,達到了63.5% 的準確率。在針對整體系統的效能方面,更獲得了近 48% 的準確率。我們使用了來自 SWISS-PROT 與 PDB 等著名資料庫的資料集作為驗證,並證明效能都優於前人的方法。 |
Abstract |
Disulfide bond information can aid the prediction of protein secondary structure, tertiary structure, and all-atom coordinates. Most previous work has focused on either cysteine state classification or disulfide connectivity prediction, and connectivity prediction has typically relied on added constraints to make the problem tractable in practice. In this thesis, we propose a multi-phase approach that addresses both problems. In the state classification phase, our method outputs the number of bonds and achieves 90.7% accuracy. In the connectivity prediction phase, we use the predicted number of bonds to select candidate bond pairs. To cope with the imbalance between sample classes, we propose a down-sampling method, which also reduces processing time. Finally, we apply a maximum-weight graph matching algorithm to obtain the bonding pattern, achieving 63.5% accuracy. For the overall system, we achieve nearly 48% accuracy. We validate our method on datasets derived from SWISS-PROT and PDB, and the results are better than those of previous methods. |
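The final matching step described in the abstract can be illustrated with a small sketch. It assumes that an earlier phase has already produced a score for every candidate cysteine pair, and it picks the pairing with the largest total score. This is not the thesis's implementation: it uses exhaustive search instead of an efficient matching algorithm, and the function names, sequence positions, and scores are all hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Pair = Tuple[int, int]


def perfect_matchings(nodes: List[int]) -> Iterator[List[Pair]]:
    """Yield every way to split `nodes` into unordered pairs."""
    if not nodes:
        yield []
        return
    first, rest = nodes[0], nodes[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in perfect_matchings(remaining):
            yield [(first, partner)] + tail


def best_bonding_pattern(cysteines: List[int], score: Dict[Pair, float]) -> List[Pair]:
    """Return the pairing with the largest total score (exhaustive search)."""
    if len(cysteines) % 2 != 0:
        raise ValueError("need an even number of bonded cysteines")
    # Sorting guarantees each pair is emitted as (smaller, larger),
    # which matches the key order used in `score`.
    return max(
        perfect_matchings(sorted(cysteines)),
        key=lambda pattern: sum(score[p] for p in pattern),
    )


# Hypothetical sequence positions and pairwise scores, for illustration only.
positions = [3, 15, 28, 40]
scores = {
    (3, 15): 0.2, (3, 28): 0.9, (3, 40): 0.1,
    (15, 28): 0.3, (15, 40): 0.8, (28, 40): 0.4,
}
pattern = best_bonding_pattern(positions, scores)
print(pattern)
```

Exhaustive search is only practical for a handful of bonds, since the number of candidate pairings grows very quickly with the number of cysteines; a polynomial-time maximum-weight matching algorithm would be the natural choice for larger proteins.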
目次 Table of Contents |
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
Chapter 1. Introduction
Chapter 2. Preliminaries
  2.1 Amino Acids
  2.2 The Support Vector Machine
  2.3 Position-Specific Score Matrix
  2.4 Previous Works
    2.4.1 Martelli's Method for State Classification
    2.4.2 Chen's Method for State Classification
    2.4.3 Tsai's Method for Connectivity Prediction
Chapter 3. Algorithms for Disulfide Bond Prediction
  3.1 Motivation
  3.2 System Overview
    3.2.1 Feature Extraction
    3.2.2 State Classification
    3.2.3 Connectivity Prediction
Chapter 4. Experimental Results
  4.1 Datasets
  4.2 Performance Evaluation
  4.3 State Classification
  4.4 Connectivity Prediction
  4.5 Overall System
Chapter 5. Conclusions
BIBLIOGRAPHY |
電子全文 Fulltext |
This electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing. Please comply with the Copyright Act of the Republic of China: do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law. |
紙本論文 Printed copies |
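Availability information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up availability information for printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. 開放時間 Available: 已公開 available |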