博碩士論文 etd-0713115-160459 詳細資訊
Title page for etd-0713115-160459
論文名稱
Title
利用支援向量迴歸於蛋白質骨幹原子座標之修正方法
Coordinate Refinement on All Atoms of the Protein Backbone with Support Vector Regression
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
67
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2015-08-20
繳交日期
Date of Submission
2015-09-02
關鍵字
Keywords
生物資訊、蛋白質骨幹、三維座標、預測、支援向量迴歸
support vector regression, prediction, three-dimensional coordinates, protein backbone, bioinformatics
統計
Statistics
本論文已被瀏覽 5708 次,被下載 405
This thesis has been viewed 5708 times and downloaded 405 times.
中文摘要
蛋白質結構的預測在生物資訊領域上已經發展了數十年。蛋白質骨幹重建問題為給定一條目標蛋白質序列和中心碳的座標,重建出其骨幹上所有原子的三維座標。為了使預測更準確,我們利用支援向量迴歸的方法來修正骨幹上原子的三維座標。我們使用在蛋白質骨幹預測表現比較好的兩個方法PD2和BBQ所預測出來的座標當作我們的候選特徵,接著我們定義了超過100個可能的特徵。在經過相關性的計算,我們找到多個與預測目標相關的特徵。我們進行了leave-one-protein-out以及5-fold 交叉驗證的實驗,實驗的資料集包含了CASP7到CASP11。實驗的結果顯示我們方法的平均RMSD值比PD2提升8%,因此在這個問題上我們的方法是最準確的預測工具。
Abstract
Protein structure prediction has been studied in bioinformatics for decades. The protein backbone reconstruction problem (PBRP) is to reconstruct the 3D coordinates of all atoms on the protein backbone, given a target protein sequence and its Cα coordinates. To improve prediction accuracy, we refine the 3D coordinates of all backbone atoms with support vector regression (SVR). We use the coordinates predicted by two well-performing methods, PD2 and BBQ, as feature candidates, and from them we define more than 100 possible features. After calculating their correlations, we identify several features that are strongly related to the prediction target. We then run leave-one-protein-out and 5-fold cross-validation experiments on datasets covering CASP7 through CASP11. The experimental results show that our method improves the average RMSD by about 8% over PD2, which was previously the most accurate predictor for this problem.
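The abstract relies on two quantities, RMSD as the accuracy measure and correlation as the feature-selection criterion. The following is a minimal sketch of both, not the thesis implementation. It assumes the predicted and native coordinates already share a frame (no Kabsch superposition is applied), and the function names, the dictionary layout of the features, and the 0.5 correlation threshold are hypothetical choices made for illustration only.

```python
import math


def rmsd(predicted, native):
    """Root mean square deviation between two equal-length lists of (x, y, z) points.

    Assumption: both structures are already in the same coordinate frame.
    """
    if len(predicted) != len(native) or not predicted:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (px, py, pz), (nx, ny, nz) in zip(predicted, native):
        total += (px - nx) ** 2 + (py - ny) ** 2 + (pz - nz) ** 2
    return math.sqrt(total / len(predicted))


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.5):
    """Return the names of feature columns whose |r| with the target meets the threshold.

    `columns` maps a (hypothetical) feature name to its list of values;
    the threshold value is an illustrative assumption.
    """
    return sorted(
        name for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    )
```

For example, `select_features({"a": [1, 2, 3], "b": [1, 1, 1]}, [2, 4, 6])` keeps only `"a"`, because the constant column `"b"` has zero correlation with the target.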
目次 Table of Contents
Thesis Approval Form (Chinese)
Thesis Approval Form (English)
Acknowledgments
Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
Chapter 1. Introduction
Chapter 2. Preliminaries
2.1 Proteins and Amino Acids
2.2 Root Mean Square Deviation
2.3 Pearson's Correlation Coefficient
2.4 Support Vector Regression
2.5 Previous Work
2.5.1 SABBAC
2.5.2 Wang's Method
2.5.3 Chang's Method
2.5.4 BBQ
2.5.5 Yen's Method
2.5.6 Chen's Method
2.5.7 Wu's Method
2.5.8 PD2
Chapter 3. The Proposed Method
3.1 Motivation
3.2 Feature Generation and Feature Selection
3.3 The Difference Prediction Method
Chapter 4. Experimental Results
Chapter 5. Conclusion
Bibliography
電子全文 Fulltext
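This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.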
論文使用權限 Thesis access permission: 自定論文開放時間 user-defined
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
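Public information on printed copies is relatively complete from academic year 102 onward. To look up public information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.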
開放時間 available 已公開 available
