National Sun Yat-sen University, Thesis/Dissertation: All-atom Backbone Prediction with Improved Tool Preference Classification

Title	All-atom Backbone Prediction with Improved Tool Preference Classification (利用工具偏好分類之全原子骨幹結構預測)
Department	Department of Computer Science and Engineering
Year, semester	Fall semester of Academic Year 100	Language	English
Degree	Master	Number of pages	62
Author	Kai-Yu Chen (陳愷瑜)
Advisor	Chang-Biau Yang (楊昌彪)
Convenor	Chia-Ning Yang (楊佳寧)
Advisory Committee	Jen-Sen Lin (林振盛); Kuo-Tsung Tseng (曾國尊)
Date of Exam	2011-08-31	Date of Submission	2011-09-07
Keywords	protein, 3D coordinate, prediction, backbone, SVM
Statistics	This thesis has been browsed 5688 times and downloaded 989 times.

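Chinese Abstract (English translation)
The protein backbone reconstruction problem (PBRP) is, given a protein sequence and the 3D coordinates of its α-carbons, to reconstruct the coordinates of all atoms on the backbone (including the C, N, and O atoms). Many studies have addressed this problem, including Adcock's method, SABBAC, BBQ, and the methods of Chang and Yen from our laboratory. Yen found experimentally that neither Chang's method nor SABBAC consistently outperformed the other, so a tool preference classification was proposed to select which prediction tool is better suited to a given protein. In this thesis, BBQ and Chang's method serve as the candidate tools. In addition, the tool preference is determined separately for the C, N, and O atoms on the protein backbone; each SVM-based classifier that makes such a preference decision is called an atom classifier. Following the preference of each atom classifier, the appropriate tool (BBQ or Chang's method) is used to predict the structure of that atom, and combining the results of all atom classifiers reconstructs the all-atom backbone structure of the target protein. The experimental datasets come from CASP7, CASP8, and CASP9, with 29, 24, and 55 proteins, respectively, and only proteins consisting of standard amino acids were taken from these datasets. Compared with Yen's method, our method reduces the average RMSD from 0.4019 to 0.3682 on CASP7, from 0.4345 to 0.4202 on CASP8, and from 0.4155 to 0.3601 on CASP9.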
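The per-atom tool selection described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation and rests on several assumptions: each atom classifier is taken to be an already-trained linear SVM whose decision value is w·x + b on a hypothetical per-protein feature vector, a positive value is assumed to mean "prefer BBQ", and one tool is chosen per atom type for the whole protein. The names `linear_svm_decision`, `combine_predictions`, and `ATOM_TYPES`, the weights, and the label convention are all hypothetical; the actual features, kernel, and training procedure are described only in the full thesis.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
# One protein's prediction: atom type -> one coordinate per residue.
Backbone = Dict[str, List[Coord]]

ATOM_TYPES = ("N", "C", "O")


def linear_svm_decision(weights: Sequence[float], bias: float,
                        features: Sequence[float]) -> float:
    """Decision value w.x + b of a (hypothetical, pre-trained) linear SVM."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def combine_predictions(features: Sequence[float],
                        classifiers: Dict[str, Tuple[Sequence[float], float]],
                        bbq: Backbone,
                        chang: Backbone) -> Tuple[Backbone, Dict[str, str]]:
    """Choose BBQ's or Chang's coordinates separately for each atom type.

    Assumed label convention: a positive decision value means "prefer BBQ";
    otherwise Chang's method is used for that atom type.
    """
    combined: Backbone = {}
    choice: Dict[str, str] = {}
    for atom in ATOM_TYPES:
        weights, bias = classifiers[atom]
        if linear_svm_decision(weights, bias, features) > 0:
            combined[atom], choice[atom] = bbq[atom], "BBQ"
        else:
            combined[atom], choice[atom] = chang[atom], "Chang"
    return combined, choice


if __name__ == "__main__":
    features = [0.2, 0.7]  # hypothetical sequence-derived features
    classifiers = {"N": ([1.0, -1.0], 0.0),
                   "C": ([1.0, 1.0], -0.5),
                   "O": ([-1.0, 0.0], 0.1)}
    bbq = {a: [(1.0, 0.0, 0.0)] for a in ATOM_TYPES}
    chang = {a: [(0.0, 1.0, 0.0)] for a in ATOM_TYPES}
    _, choice = combine_predictions(features, classifiers, bbq, chang)
    print(choice)  # {'N': 'Chang', 'C': 'BBQ', 'O': 'Chang'}
```

Keeping the three atom types independent mirrors the idea that a tool can be better for one atom type and worse for another within the same protein; the Cα coordinates are inputs and are not re-predicted.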
Abstract
The all-atom protein backbone reconstruction problem (PBRP) is to reconstruct the 3D coordinates of all backbone atoms, including the N, C, and O atoms, of a protein whose primary sequence and α-carbon coordinates are given. A variety of methods for solving PBRP have been proposed, such as Adcock's method, SABBAC, BBQ, Chang's method, and Yen's method. In a recent work, Yen et al. found that Chang's method does not always outperform SABBAC, so they applied a tool preference classification to determine which tool is more suitable for predicting the structure of a given protein. In this thesis, we adopt BBQ (Backbone Building from Quadrilaterals) and Chang's method as the candidate prediction tools. In addition, the tool preferences for the different backbone atoms (N, C, O) are determined separately. Each such preference classifier, built with a support vector machine (SVM), is called an atom classifier. According to the decision of each atom classifier, the more suitable prediction tool, either BBQ or Chang's method, is used to construct that atom of the target protein, and combining the results for all atoms yields the reconstructed backbone structure. The datasets of our experiments are extracted from CASP7, CASP8, and CASP9 and consist of 30, 24, and 55 proteins, respectively; the proteins in these datasets contain only standard amino acids. Compared with Yen's method, our method reduces the average RMSD from 0.4019 to 0.3682 on CASP7, from 0.4543 to 0.4202 on CASP8, and from 0.4155 to 0.3601 on CASP9.
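The results above are reported as average RMSD, defined for n paired atoms as RMSD = sqrt((1/n) Σ ||p_i − q_i||²), where p_i is a predicted position and q_i the corresponding native one. The sketch below computes it under stated assumptions: predicted and native atoms are compared in the same coordinate frame without superposition (plausible because the native α-carbon trace is the input, but still an assumption), only the N, C, and O atoms are scored, and a dataset's average RMSD is the mean of per-protein values. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(predicted: Sequence[Coord], native: Sequence[Coord]) -> float:
    """Root mean square deviation between paired atom coordinates.

    No superposition is performed; both lists are assumed to be in one frame
    and to list the same atoms in the same order.
    """
    if len(predicted) != len(native) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
        for p, q in zip(predicted, native)
    )
    return math.sqrt(total / len(predicted))


def backbone_rmsd(predicted: Dict[str, List[Coord]],
                  native: Dict[str, List[Coord]],
                  atom_types: Sequence[str] = ("N", "C", "O")) -> float:
    """RMSD pooled over all listed backbone atom types of one protein."""
    pred_all = [xyz for a in atom_types for xyz in predicted[a]]
    nat_all = [xyz for a in atom_types for xyz in native[a]]
    return rmsd(pred_all, nat_all)


def average_rmsd(per_protein: Sequence[float]) -> float:
    """Mean of per-protein RMSD values (assumed dataset-level summary)."""
    if not per_protein:
        raise ValueError("need at least one protein")
    return sum(per_protein) / len(per_protein)


if __name__ == "__main__":
    print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))  # 5.0
```

Pooling all N, C, and O atoms before taking the square root weights every atom equally; computing a separate RMSD per atom type would instead show which atom classifier contributes most to the improvement.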

Table of Contents
ABSTRACT
Chapter 1. Introduction
Chapter 2. Preliminaries
  2.1 Proteins
    2.1.1 Properties of Proteins
    2.1.2 Amino Acids and Peptides
  2.2 Root Mean Square Deviation (RMSD)
  2.3 Support Vector Machine (SVM)
  2.4 Previous Work
Chapter 3. Our Method
  3.1 Atom Classification
    3.1.1 Feature Extraction
    3.1.2 Feature Set Selection and Feature Set Reorganization
    3.1.3 Training Set Filter and Weighting
  3.2 The Classification of SVM
Chapter 4. Experimental Results
  4.1 Self-Test
  4.2 Independent Test
Chapter 5. Conclusion
BIBLIOGRAPHY

References
[1] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne, “The protein data bank,” Nucleic Acids Research, Vol. 28, pp. 235–242, 2000.
[2] H. Y. Chang, C. B. Yang, and H. Y. Ann, “Refinement on O atom positions for protein backbone prediction,” Proceedings of the 2nd WSEAS International Conference on Biomedical Electronics and Biomedical Informatics, 2009.
[3] D.-Y. Chiu and P.-J. Chen, “Dynamically exploring internal mechanism of stock market by fuzzy-based support vector machines with high dimension input space and genetic algorithm,” Expert Systems with Applications, Vol. 36, pp. 1240–1248, 2009.
[4] I. Dubchak, I. Muchnik, S. R. Holbrook, and S. H. Kim, “Prediction of protein folding class using global description of amino acid sequence,” Proceedings of the National Academy of Sciences of the United States of America, Vol. 92, pp. 8700–8704, 1995.
[5] D. Gront, S. Kmiecik, and A. Kolinski, “Backbone building from quadrilaterals: A fast and accurate algorithm for protein backbone reconstruction from alpha carbon coordinates,” Journal of Computational Chemistry, Vol. 28, pp. 1593–1597, 2007.
[6] L. Holm and C. Sander, “Database algorithm for generating protein backbone and side-chain coordinates from a Cα trace: application to model building and detection of coordinate errors,” Journal of Molecular Biology, Vol. 218, No. 1, pp. 183–194, 1991.
[7] C.-D. Huang, C.-T. Lin, and N. R. Pal, “Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification,” IEEE Transactions on NanoBioscience, Vol. 2, pp. 221–232, 2003.
[8] Y. Iwata, A. Kasuya, and S. Miyamoto, “An efficient method for reconstructing protein backbones from α-carbon coordinates,” Journal of Molecular Graphics and Modelling, Vol. 21, pp. 119–128, 2002.
[9] L. K. James, Nobel Laureates in Chemistry, 1901–1992. Chemical Heritage Foundation, 1993.
[10] R. Kazmierkiewicz, A. Liwo, and H. A. Scheraga, “Energy-based reconstruction of a protein backbone from its α-carbon trace by a Monte-Carlo method,” Journal of Computational Chemistry, Vol. 23, pp. 715–723, 2002.
[11] Z. R. Li, H. H. Lin, L. Y. Han, L. Jiang, X. Chen, and Y. Z. Chen, “PRO-FEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence,” Nucleic Acids Research, Vol. 34, pp. W32–W37, 2006.
[12] N. M. Luscombe and J. M. Thornton, “Protein-DNA interactions: Amino acid conservation and the effects of mutations on binding specificity,” Journal of Molecular Biology, Vol. 320, pp. 991–1009, 2002.
[13] J. Maupetit, R. Gautier, and P. Tuffery, “SABBAC: online structural alphabet-based protein backbone reconstruction from alpha-carbon trace,” Nucleic Acids Research, Vol. 34, pp. W147–W151, 2006.
[14] J. Maupetit, R. Gautier, and P. Tuffery, “SABBAC v1.2: Structural alphabet based protein backbone builder from alpha carbon trace,” http://bioserv.rpbs.jussieu.fr/cgi-bin/SABBAC.
[15] M. Milik, A. Kolinski, and J. Skolnick, “Algorithm for rapid reconstruction of protein backbone from alpha carbon coordinates,” Journal of Computational Chemistry, Vol. 18, pp. 80–85, 1997.
[16] J. Moult, K. Fidelis, A. Kryshtafovych, B. Rost, and A. Tramontano, “Critical assessment of methods of protein structure prediction - Round VIII,” Proteins, Vol. 77, pp. 1–4, 2009.
[17] D. P. Muni, N. R. Pal, and J. Das, “Genetic programming for simultaneous feature selection and classifier design,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 36, pp. 106–117, 2006.
[18] T. Noguchi and Y. Akiyama, “PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB) in 2003,” Nucleic Acids Research, Vol. 31, pp. 492–493, 2003.
[19] G. Ramachandran, C. Ramakrishnan, and V. Sasisekharan, “Stereochemistry of polypeptide chain configurations,” Journal of Molecular Biology, Vol. 7, pp. 95–99, 1963.
[20] P. Rotkiewicz and J. Skolnick, “Fast procedure for reconstruction of full-atom protein models from reduced representations,” Journal of Computational Chemistry, Vol. 29, pp. 1460–1465, 2008.
[21] R. Samudrala and J. Moult, “An all-atom distance-dependent conditional probability discriminatory function for protein structure prediction,” Journal of Molecular Biology, Vol. 275, pp. 895–916, 1998.
[22] S. Santini, G. Wei, N. Mousseau, and P. Derreumaux, “Exploring the folding pathways of proteins through energy landscape sampling: Application to Alzheimer's β-amyloid peptide,” Internet Electronic Journal of Molecular Design, Vol. 2, No. 9, pp. 564–577, 2003.
[23] T. Shibuya, “Searching protein 3-D structures in linear time,” Lecture Notes in Computer Science, Vol. 5541, pp. 1–15, 2009.
[24] K. T. Simons, C. Kooperberg, E. Huang, and D. Baker, “Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions,” Journal of Molecular Biology, Vol. 268, pp. 209–225, 1997.
[25] J. M. Sotoca and F. Pla, “Supervised feature selection by clustering using conditional mutual information-based distances,” Pattern Recognition, Vol. 43, pp. 2068–2081, 2010.
[26] R. Unger, “The genetic algorithm approach to protein structure prediction,” Structure and Bonding, Vol. 110, pp. 153–175, 2004.
[27] M. Vassura, L. Margara, P. D. Lena, F. Medri, P. Fariselli, and R. Casadio, “Reconstruction of 3D structures from protein contact maps,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 5, No. 3, pp. 357–367, 2008.
[28] J. H. Wang, C. B. Yang, and C. T. Tseng, “Reconstruction of protein backbone with the α-carbon coordinates,” Proceedings of 2007 National Computer Symposium, Taichung, Taiwan, pp. 136–143, 2007.
[29] H.-W. Yen, C.-B. Yang, and H.-Y. Ann, “An effective tool preference selection method for protein structure prediction with SVM,” Proceedings of the 27th Workshop on Combinatorial Mathematics and Computation Theory, Taichung, Taiwan, pp. 62–67, 2010.
[30] X. Yu, J. Cao, Y. Cai, T. Shi, and Y. Li, “Predicting rRNA-, RNA-, and DNA-binding proteins from primary structure with support vector machines,” Journal of Theoretical Biology, Vol. 240, pp. 175–184, 2006.
[31] Y. Zhang, “Progress and challenges in protein structure prediction,” Current Opinion in Structural Biology, Vol. 18, pp. 342–348, 2008.

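Full Text
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research. Please comply with the Copyright Act of the Republic of China (Taiwan) and do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: user-defined release date. Availability: on campus, available; off campus, available. File: etd-0907111-133928.pdf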
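Printed Copies
Availability: available. Public information on printed theses is more complete from Academic Year 102 onward; for printed theses from Academic Year 101 or earlier, contact the printed thesis service counter of the Office of Library and Information Services.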



Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS