National Sun Yat-sen University, thesis/dissertation, Protein Structure Prediction Based on Secondary Structure Alignment

Title	Protein Structure Prediction Based on Secondary Structure Alignment
Department	Department of Computer Science and Engineering
Year, semester	Academic Year 91, spring (second) semester	Language	English
Degree	Master	Number of pages	45
Author	Rei-Sing Cheng (鄭瑞興)
Advisor	Chang-Biau Yang (楊昌彪)
Convenor	Yue-Li Wang (王有禮)
Advisory Committee	Yow-Lin Shine (薛佑玲); Zin-Huang Liu (劉景煌); Chin-Lung Lu (盧錦龍)
Date of Exam	2003-06-20	Date of Submission	2003-08-21
Keywords	prediction, structure, alignment, protein, secondary structure
Statistics	This thesis has been browsed 5712 times and downloaded 1726 times.

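Chinese Abstract (English translation)
Sequence alignment is a fundamental method in computational biology. Its basic function is to estimate how far apart sequences are in the course of evolution from the degree of difference between the biological sequences, and then to apply that result. Traditionally, protein sequence alignment uses only information from the primary structure of proteins and does not consider the importance of secondary structure. The focus of this thesis is to make sequence alignment more meaningful by adding secondary structure information. Furthermore, in homology modeling the decisive step is to find a similar known structure and to use it as the basis for predicting the unknown protein structure. This thesis aims to improve on traditional sequence alignment methods so as to find proteins whose primary structure similarity is low but whose secondary structure similarity is high, and to use them as templates for structure prediction.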
Abstract
Sequence alignment is a basic but powerful technique in molecular biology. Macromolecular sequences (DNA, RNA, and protein sequences) can be aligned according to chosen criteria. The goal of sequence alignment is to identify the similarities and differences among the input sequences, and different algorithms serve different purposes. In this thesis, we present a new algorithm that aligns sequences while taking secondary structure into account. Traditionally, a sequence alignment algorithm considers only the primary structure, that is, the amino acid chain. By using information about protein secondary structure, such as alpha helices and beta sheets, the sensitivity of pairwise alignment can be improved.
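The idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: it is a standard Needleman-Wunsch global alignment in which each aligned pair receives a residue score plus a weighted secondary structure score. The function name `align_score`, the one-letter structure labels (H for helix, E for strand, C for coil), the match/mismatch values, the linear gap penalty, and the `ss_weight` parameter are all assumptions made for illustration. The secondary structure score matrix from the thesis's Appendix A is not reproduced in this record, so simple placeholder values stand in for it.

```python
def align_score(seq_a, ss_a, seq_b, ss_b,
                match=2, mismatch=-1,
                ss_match=1, ss_mismatch=-1,
                gap=-2, ss_weight=1.0):
    """Global alignment score with an added secondary-structure term.

    seq_a, seq_b: amino acid strings.
    ss_a, ss_b: per-residue structure labels (hypothetical H/E/C alphabet).
    All scoring values are illustrative placeholders, not the thesis's.
    """
    if len(seq_a) != len(ss_a) or len(seq_b) != len(ss_b):
        raise ValueError("need one structure label per residue")

    n, m = len(seq_a), len(seq_b)
    # dp[i][j] holds the best score for aligning seq_a[:i] with seq_b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap          # prefix of seq_a aligned to gaps only
    for j in range(1, m + 1):
        dp[0][j] = j * gap          # prefix of seq_b aligned to gaps only

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            residue = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            structure = ss_match if ss_a[i - 1] == ss_b[j - 1] else ss_mismatch
            pair = residue + ss_weight * structure
            dp[i][j] = max(dp[i - 1][j - 1] + pair,  # align the two residues
                           dp[i - 1][j] + gap,       # gap in seq_b
                           dp[i][j - 1] + gap)       # gap in seq_a
    return dp[n][m]
```

Under these placeholder scores, two sequences that share no residues still score higher when their structure labels agree than when they differ, which is the kind of low-sequence-similarity, high-structure-similarity pair the abstract has in mind. Setting `ss_weight=0` reduces the sketch to a primary-structure-only alignment.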

Table of Contents
ABSTRACT	0
Chapter 1. Introduction	1
    1.1 DNA	1
    1.2 RNA	2
    1.3 Protein	3
    1.4 Sequence Alignment	7
Chapter 2. Preliminaries	9
    2.1 Dynamic Programming	9
    2.2 Scoring Functions	10
    2.3 Scoring Matrices	11
    2.4 DNA Sequence Alignment	13
    2.5 Protein Sequence Alignment	17
Chapter 3. Protein Structure Prediction	21
    3.1 Determination of Structures of Proteins	21
    3.2 Prediction of Protein Structures	23
Chapter 4. Our Method	26
    4.1 Secondary Structure	26
    4.2 A Novel Sequence Alignment Algorithm	29
Chapter 5. Conclusions	39
Appendixes
    A. Score Matrix of Secondary Structure	40
BIBLIOGRAPHY	42


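Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: available on campus immediately; available off campus after one year.
Availability: Campus: available; Off-campus: available.
File: etd-0821103-204917.pdf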
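Printed copies
Availability information for printed theses is relatively complete from Academic Year 102 onward. To look up availability information for printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Availability: available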

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS