博碩士論文 etd-0911112-132025 詳細資訊
Title page for etd-0911112-132025
論文名稱
Title
利用工具偏好分類於標準及非標準蛋白質之蛋白質骨幹重建
Protein Backbone Reconstruction with Tool Preference Classification for Standard and Nonstandard Proteins
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
60
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2012-09-07
繳交日期
Date of Submission
2012-09-11
關鍵字
Keywords
生物資訊、骨幹、特徵集、標準胺基酸、支持向量機
backbone, bioinformatics, standard protein, support vector machine, feature set
統計
Statistics
本論文已被瀏覽 5662 次,被下載 869 次。
The thesis/dissertation has been browsed 5662 times, has been downloaded 869 times.
中文摘要
全原子蛋白質骨幹重構問題(PBRP)為給定一條蛋白質序列以及在骨幹上的中心碳座標,重新建構在此蛋白質骨幹上的氮、碳、氧原子的三維座標。在過去數十年間,許多方法都被提出來解決蛋白質骨幹重構問題,例如:從頭開始法、同源模擬法、SABBAC、實驗室王仁暉學長和張小燕學姐的方法、BBQ及實驗室陳愷瑜學長的方法。陳愷瑜學長發現假如可以針對想知道的原子來選擇正確的預測工具,可以降低RMSD值。本篇論文是根據陳愷瑜學長的方法做改進,我們利用SVM選擇在每個殘基上的氮、碳、氧原子個別要使用張小燕學姐還是BBQ的方法來進行預測該原子的座標。將所有殘基的原子分類器產生的結果結合就可以重建出目標蛋白質的骨幹結構。本論文使用的實驗資料集為CASP7、CASP8和CASP9,分別有65、52、63條蛋白質。在這些資料集中包含了標準及非標準胺基酸組成的蛋白質。我們的結果之平均RMSD值於CASP7為0.3496、CASP8為0.3084、CASP9為0.3286。
Abstract
Given a protein sequence and the Cα coordinates on its backbone, the all-atom protein backbone reconstruction problem (PBRP) is to reconstruct the 3D coordinates of the N, C and O atoms on the backbone. In the past few decades, many methods have been proposed for solving PBRP, such as ab initio modeling, homology modeling, SABBAC, Wang’s method, Chang’s method, BBQ (Backbone Building from Quadrilaterals) and Chen’s method. Chen found that the RMSD may be reduced if the correct prediction tool is chosen to build the 3D coordinates of each desired atom. In this thesis, we propose a method for solving PBRP that improves on Chen’s method. We apply tool preference classification to each N, C and O atom of every residue: a classification model generated by an SVM (support vector machine) decides whether Chang’s method or BBQ is used to predict that atom’s coordinates. We rebuild the backbone by combining the prediction results of all atoms in all residues. The data sets used in our experiments are CASP7, CASP8 and CASP9, which have 65, 52 and 63 proteins, respectively. These data sets contain nonstandard amino acids as well as standard ones. Our method improves on the average RMSDs of Chen’s results in some cases. The average RMSDs of our method are 0.3496 on CASP7, 0.3084 on CASP8 and 0.3286 on CASP9.
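The per-atom selection and the RMSD evaluation described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the function names, the data layout (one dictionary of N, C and O coordinates per residue) and the `choose_tool` callback, which stands in for the trained SVM classifier, are all hypothetical. The RMSD here is computed without superposition, assuming both structures share the frame of the given Cα coordinates.

```python
import math
from typing import Callable, Dict, List, Tuple

Coord = Tuple[float, float, float]
Residue = Dict[str, Coord]          # hypothetical layout: {"N": ..., "C": ..., "O": ...}
ATOMS = ("N", "C", "O")


def rebuild_backbone(
    chang_pred: List[Residue],
    bbq_pred: List[Residue],
    choose_tool: Callable[[int, str], str],
) -> List[Residue]:
    """Combine two tools' predictions atom by atom.

    choose_tool(residue_index, atom_name) returns "chang" or "bbq".
    It is a placeholder for a trained classifier (assumption).
    """
    backbone = []
    for i, (chang_res, bbq_res) in enumerate(zip(chang_pred, bbq_pred)):
        residue = {}
        for atom in ATOMS:
            source = chang_res if choose_tool(i, atom) == "chang" else bbq_res
            residue[atom] = source[atom]
        backbone.append(residue)
    return backbone


def rmsd(predicted: List[Residue], native: List[Residue]) -> float:
    """Root mean square deviation over all N, C and O atoms (no superposition)."""
    squared_sum = 0.0
    count = 0
    for pred_res, nat_res in zip(predicted, native):
        for atom in ATOMS:
            squared_sum += sum((p - q) ** 2 for p, q in zip(pred_res[atom], nat_res[atom]))
            count += 1
    return math.sqrt(squared_sum / count)
```

With a selector that always picks the better tool for each atom, the combined backbone can have a lower RMSD than either tool's output alone, which is the motivation the abstract attributes to Chen's observation.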
目次 Table of Contents
ABSTRACT
Chapter 1. Introduction
Chapter 2. Preliminaries
2.1 Proteins
2.1.1 Properties of Proteins
2.1.2 Amino Acids and Peptides
2.2 Root Mean Square Deviation (RMSD)
2.3 Position Specific Scoring Matrix (PSSM)
2.4 Support Vector Machine (SVM)
2.5 Critical Assessment of Protein Structure Prediction (CASP)
2.6 Previous Work
2.6.1 SABBAC
2.6.2 Wang’s Method
2.6.3 Chang’s Method
2.6.4 BBQ
2.6.5 Chen’s Method
Chapter 3. Our Method
3.1 Motivation
3.2 Our Method for Preference Tool Selection
3.3 Generation of FASTA Files
3.4 Feature Extraction
Chapter 4. Experimental Results
Chapter 5. Conclusion
BIBLIOGRAPHY
參考文獻 References
[1] S. A. Adcock, “Peptide backbone reconstruction using dead-end elimination and a knowledge-based forcefield,” Journal of Computational Chemistry, Vol. 25, pp. 16–27, 2004.
[2] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” Journal of Molecular Biology, Vol. 215, No. 3, pp. 403–410, 1990.
[3] S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman, “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Research, Vol. 25, No. 17, pp. 3389–3402, 1997.
[4] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne, “The protein data bank,” Nucleic Acids Research, Vol. 28, pp. 235–242, 2000.
[5] D. Brock and O. Mayo, Biochemical genetics of man. Academic Press, New York, 1972.
[6] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, Vol. 2, No. 3, pp. 27:1–27:27, 2011.
[7] H.-Y. Chang, C.-B. Yang, and H.-Y. Ann, “Refinement on O atom positions for protein backbone prediction,” Proceedings of the 2nd WSEAS International Conference on Biomedical Electronics and Biomedical Informatics (BEBI ’09), Moscow, Russia, pp. 99–104, 2009.
[8] M. Charton and B. I. Charton, “The structural dependence of amino acid hydrophobicity parameters,” Journal of Theoretical Biology, Vol. 99, pp. 629–644, 1982.
[9] K.-Y. Chen, C.-B. Yang, and K.-S. Huang, “Prediction of protein backbone structure by preference classification with SVM,” Proceedings of the 9th International Conference on Information Systems and Technology Management, Sao Paulo, Brazil, pp. 1193–1206, 2012.
[10] C.-C. Chuang, C.-Y. Chen, J.-M. Yang, P.-C. Lyu, and J.-K. Hwang, “Relationship between protein structures and disulfide-bonding patterns,” Proteins: Structure, Function, and Bioinformatics, Vol. 53, pp. 1–5, 2003.
[11] W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz Jr., D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell, and P. A. Kollman, “A second generation force field for the simulation of proteins, nucleic acids, and organic molecules,” Journal of the American Chemical Society, Vol. 117, pp. 5179–5197, 1995.
[12] D. Cozzetto, A. Kryshtafovych, and A. Tramontano, “Evaluation of CASP8 model quality predictions,” Proteins: Structure, Function, and Bioinformatics, Vol. 77, pp. 157–166, 2009.
[13] I. Dubchak, I. Muchnik, S. R. Holbrook, and S. H. Kim, “Prediction of protein folding class using global description of amino acid sequence,” Proceedings of the National Academy of Sciences of the United States of America, Vol. 92, pp. 8700–8704, 1995.
[14] G. D. Fasman, Handbook of Biochemistry and Molecular Biology, 3rd edition: Proteins. CRC Press, 1976.
[15] J.-L. Fauchere, M. Charton, L. B. Kier, A. Verloop, and V. Pliska, “Amino acid side chain parameters for correlation studies in biology and pharmacology,” International Journal of Peptide and Protein Research, Vol. 32, pp. 269–278, 1988.
[16] Z. Feng, L. Chen, H. Maddula, O. Akcan, R. Oughtred, H. M. Berman, and J. Westbrook, “Ligand depot: a data warehouse for ligands bound to macromolecules,” Bioinformatics, Vol. 20, pp. 2153–2155, 2004.
[17] R. Grantham, “Amino acid difference formula to help explain protein evolution,” Science, Vol. 185, pp. 862–864, 1974.
[18] D. Gront, S. Kmiecik, and A. Kolinski, “Backbone building from quadrilaterals: A fast and accurate algorithm for protein backbone reconstruction from alpha carbon coordinates,” Journal of Computational Chemistry, Vol. 28, pp. 1593–1597, 2007.
[19] L. Holm and C. Sander, “Database algorithm for generating protein backbone and side-chain coordinates from a C alpha trace: application to model building and detection of coordinate errors,” Journal of Molecular Biology, Vol. 218, No. 1, pp. 183–194, 1991.
[20] J. Janin, S. Wodak, M. Levitt, and B. Maigret, “Conformation of amino acid side-chains in proteins,” Journal of Molecular Biology, Vol. 125, pp. 357–386, 1978.
[21] R. Kazmierkiewicz, A. Liwo, and H. A. Scheraga, “Energy-based reconstruction of a protein backbone from its α-carbon trace by a Monte-Carlo method,” Journal of Computational Chemistry, Vol. 23, pp. 715–723, 2002.
[22] P. Klein, M. Kanehisa, and C. DeLisi, “Prediction of protein function from sequence properties: Discriminant analysis of a data base,” Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology, Vol. 787, pp. 221–226, 1984.
[23] N. Krasnogor, W. E. Hart, J. Smith, and D. A. Pelta, “Protein structure prediction with evolutionary algorithms,” Proceedings of the Genetic and Evolutionary Computation Conference, Orlando, USA, pp. 1596–1601, 1999.
[24] A. Kumar, “Modified residue list,” 2011. http://www.mail-archive.com/ccp4bb@jiscmail.ac.uk/msg21953.html.
[25] J. Kyte and R. F. Doolittle, “A simple method for displaying the hydropathic character of a protein,” Journal of Molecular Biology, Vol. 157, pp. 105–132, 1982.
[26] C.-Y. Lin, C.-B. Yang, C.-Y. Hor, and K.-S. Huang, “Disulfide bonding state prediction with SVM based on protein types,” IEEE Fifth International Conference on Bio-Inspired Computing: Theories and Applications, Changsha, China, pp. 1436–1442, 2010.
[27] J. Maupetit, R. Gautier, and P. Tuffery, “SABBAC: online structural alphabet-based protein backbone reconstruction from alpha-carbon trace,” Nucleic Acids Research, Vol. 34, pp. W147–W151, 2006.
[28] A. D. McLachlan, “Rapid comparison of protein structures,” Acta Crystallographica Section A, Vol. 38, pp. 871–873, 1982.
[29] M. Milik, A. Kolinski, and J. Skolnick, “Algorithm for rapid reconstruction of protein backbone from alpha carbon coordinates,” Journal of Computational Chemistry, Vol. 18, pp. 80–85, 1997.
[30] J. Moult, K. Fidelis, A. Kryshtafovych, B. Rost, and A. Tramontano, “Critical assessment of methods of protein structure prediction (CASP) - Round IX,” Proteins, Vol. 79, pp. 1–5, 2011.
[31] P. Rotkiewicz and J. Skolnick, “Fast procedure for reconstruction of full-atom protein models from reduced representations,” Journal of Computational Chemistry, Vol. 29, pp. 1460–1465, 2008.
[32] I. Ruczinski, C. Kooperberg, R. Bonneau, and D. Baker, “Distribution of beta sheets in proteins with application to structure prediction,” Proteins: Structure, Function, and Genetics, Vol. 48, pp. 85–97, 2002.
[33] S. Santini, G. Wei, N. Mousseau, and P. Derreumaux, “Exploring the folding pathways of proteins through energy landscape sampling: Application to Alzheimer’s β-amyloid peptide,” Internet Electronic Journal of Molecular Design, Vol. 2, No. 9, pp. 564–577, 2003.
[34] J.-H. Wang, C.-B. Yang, and C.-T. Tseng, “Reconstruction of protein backbone with the α-carbon coordinates,” Journal of Information Science and Engineering, Vol. 26, No. 3, pp. 1107–1119, 2010.
[35] J. Zimmerman, N. Eliezer, and R. Simha, “The characterization of amino acid sequences in proteins by statistical methods,” Journal of Theoretical Biology, Vol. 21, pp. 170–201, 1968.
電子全文 Fulltext
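This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.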
論文使用權限 Thesis access permission: 自定論文開放時間 user-defined
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
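Public availability information for printed theses is relatively complete from academic year 102 onward. To look up this information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.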
開放時間 available 已公開 available
