National Sun Yat-sen University (國立中山大學), thesis/dissertation: Protein Folding Prediction with Genetic Algorithms

Title	Protein Folding Prediction with Genetic Algorithms (蛋白質摺疊預測之基因演算法)
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of Academic Year 92	Language	English
Degree	Master	Number of pages	35
Author	Yi-Yao Huang (黃奕堯)
Advisor	Chang-Biau Yang (楊昌彪)
Convenor	Yu-Li Wang (王有禮)
Advisory Committee	Yaw-Ling Lin (林耀鈴); Chia-Ning Yang (楊佳寧); Yow-Ling Shiue (薛佑玲)
Date of Exam	2004-07-09	Date of Submission	2004-07-28
Keywords	prediction, folding, secondary structure, genetic algorithm, protein structure
Statistics	This thesis has been browsed 5667 times and downloaded 2318 times.

Abstract
It is well known that the biological function of a protein depends on its 3D structure, so determining protein structures is one of the most important tasks in protein research. Protein structure prediction remains very challenging, however, because there is still no clear consensus on how a protein folds into its 3D structure. In this thesis, we propose a genetic algorithm (GA) based on the lattice model to predict the 3D structure of a target protein whose primary sequence and secondary structure elements (SSEs) are assumed to be known. The hydrophobic-hydrophilic model (HP model) is one of the simplest and most popular protein folding models. It considers only the hydrophobic-hydrophobic interactions between the amino acids of a protein structure, and the structures predicted under such models are still not accurate enough. We therefore suggest that other features should be considered as well, such as SSEs, charges, and disulfide bonds. That is, the fitness function of our GA considers not only the number of hydrophobic-hydrophobic pairs but also which kind of SSE each amino acid belongs to. Since we initially have no idea how the target protein folds, the lattice model serves to give us a rough initial folding conformation. Comparing the RMSD between our predicted structures and the real structures shows that these additional features do improve prediction accuracy.
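The two quantities the abstract relies on, the number of hydrophobic-hydrophobic pairs and the RMSD against the real structure, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the thesis's method: it uses a 2D square lattice with absolute moves (the thesis may use a different lattice and encoding), the names `decode`, `hh_contacts`, and `rmsd` are hypothetical, the SSE, charge, and disulfide-bond terms of the thesis's fitness function are not modelled, and `rmsd` assumes the two structures are already superimposed.

```python
import math

# Unit moves on a 2D square lattice (an assumption; not taken from the thesis).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def decode(moves):
    """Turn a move string into lattice coordinates, starting at the origin."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def hh_contacts(hp_sequence, moves):
    """Count H-H pairs that are lattice neighbours but not chain neighbours.

    Returns None when the walk revisits a site (it is not self-avoiding).
    """
    coords = decode(moves)
    if len(coords) != len(hp_sequence):
        raise ValueError("need exactly len(hp_sequence) - 1 moves")
    if len(set(coords)) != len(coords):
        return None
    index_at = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if hp_sequence[i] != "H":
            continue
        # Looking only right and up visits each neighbouring pair once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and hp_sequence[j] == "H":
                contacts += 1
    return contacts


def rmsd(a, b):
    """RMSD between two equal-length coordinate lists, assumed superimposed."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))
```

For example, `hh_contacts("HPPH", "RUL")` folds four residues around a unit square, which brings the two H residues at the chain ends next to each other. A GA in this setting would typically treat the move string as the chromosome and reward a higher contact count; comparing real structures by RMSD would normally also require an optimal superposition step before calling `rmsd`.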

Table of Contents
List of Figures
List of Tables
Abstract
Chapter 1. Introduction
Chapter 2. Preliminaries
    2.1 Amino Acids in Proteins
    2.2 Levels of Protein Structures
    2.3 The Hydrophobic-hydrophilic Model
    2.4 Genetic Algorithm
Chapter 3. Protein Structure Prediction Methods
    3.1 Strategies of Protein Structure Prediction
    3.2 Representations of Protein Sequences on the Lattice Model
    3.3 Previous PSP Methods for the HP Model
Chapter 4. A New Method Based on the Lattice Model
    4.1 Motivation
    4.2 Overall Steps of the Algorithm
    4.3 The Fitness Function
    4.4 Example
Chapter 5. Experimental Results
Chapter 6. Conclusions
Bibliography

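Fulltext
The electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: available on campus immediately; available off campus after one year
Available: Campus: available; Off-campus: available
etd-0728104-125530.pdf
Printed copies
Public-access information for printed theses is relatively complete from Academic Year 102 onward. To look up public-access information for printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available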

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS