National Sun Yat-sen University, thesis/dissertation, RNA Secondary Structure Alignment

Title	RNA Secondary Structure Alignment (RNA二級結構的對齊方法)
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of academic year 91 (ROC calendar)	Language	English
Degree	Master	Number of pages	117
Author	Meng-Yi Wu (吳孟宜)
Advisor	Chang-Biau Yang (楊昌彪)
Convenor	Shi-Jinn Horng (洪西進)
Advisory Committee	Ngai-Ching Wong (黃毅青); Yow-Ling Shiue (薛佑玲); Bang-Ye Wu (吳邦一)
Date of Exam	2003-07-11	Date of Submission	2003-08-12
Keywords	computational biology, secondary structure, RNA, dynamic programming, alignment
Statistics	The thesis/dissertation has been browsed 5662 times and downloaded 1906 times.

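Chinese Abstract (English translation)
Methods for comparing RNAs or proteins are important basic tools in molecular biology research. To date, most comparison methods apply only to the primary structures of RNAs or proteins, namely the familiar sequence comparison and sequence alignment methods. From a biological point of view, structure and function are very closely related. Structures are currently determined mainly by nuclear magnetic resonance, X-ray diffraction, and prediction by computer programs, so there are many molecules whose structures are known but whose functions have yet to be discovered. The RNA secondary structure alignment problem asks how two RNA secondary structures align as a whole and how similar they are. RNA secondary structure alignment can also help predict function and classify molecules. In this thesis, we study the alignment of two RNA secondary structures; the secondary structures handled by our method do not contain pseudoknots. We design a dynamic programming algorithm to solve the alignment problem for two RNA secondary structures. Its time complexity is O(N^4), where N is the larger of the block counts of the two RNA structures. We also apply the algorithm to human mitochondrial tRNAs to evaluate its practicality. The experimental results show that our method can effectively measure the structural similarity of two RNAs, whether or not their sequences are similar.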
Abstract
Methods for comparing RNA or protein molecules are important basic tools in molecular biology. So far, most comparison methods apply only to the primary structures of biomolecules, as in sequence comparison and sequence alignment. The functions of biomolecules are closely related to their structures. Current methods for determining the structures of biomolecules are NMR spectroscopy, X-ray crystallography, and prediction by computational simulation. As a result, there are many biomolecules whose structures are known but whose functions are not. The RNA secondary structure alignment problem is to align two RNA molecules whose secondary structures are given and to measure their structural similarity. Such an alignment also helps to predict the functions of biomolecules and to classify them. In this thesis, we design a dynamic programming method for aligning two RNA secondary structures that do not contain any pseudoknots. The time complexity of our algorithm is O(N^4), where N is the larger of the numbers of blocks in the two given RNA structures. We also apply our algorithm to real biomolecules, the tRNAs of the Homo sapiens mitochondrion, to evaluate the practicability of our method. We take three tRNA genes, TRNG, TRNA and TRNV, to test the performance of our algorithm. By visual inspection, the structure of TRNG is more similar to that of TRNA, and our algorithm gives the same result. Hence, our algorithm provides an effective method for measuring the similarity of two RNA secondary structures.
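For orientation, the kind of primary-structure comparison the abstract contrasts with can be sketched as a classic global sequence alignment by dynamic programming. This is a minimal illustration only, not the thesis's O(N^4) secondary structure algorithm; the function name `global_alignment_score` and the scoring values (match 1, mismatch -1, gap -2) are assumptions chosen for the example.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Return the best global alignment score of sequences a and b.

    Illustrative sketch with assumed scoring values; it compares
    sequences only and knows nothing about base pairs or stems.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] aligned to a gap
                score[i][j - 1] + gap,       # b[j-1] aligned to a gap
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("GAUC", "GAC"))
```

With these assumed scores, aligning `GAUC` with `GAC` yields 1 (three matches and one gap). The table has (len(a)+1) x (len(b)+1) cells, so the sketch runs in time proportional to the product of the two sequence lengths.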

Table of Contents
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
Chapter 1. Introduction
Chapter 2. Preliminaries
2.1 RNA and cDNA
2.2 The Stem
Chapter 3. Prediction Methods for the RNA Secondary Structure
Chapter 4. The Sequence Alignment
4.1 The Longest Common Subsequence Problem
4.2 The Sequence Alignment Problem
4.3 The Local Alignment Problem
4.4 The Affine Gap Penalty
Chapter 5. Sequence Alignment for RNA Secondary Structure
5.1 Terminologies
5.2 An Alignment Algorithm for RNA Secondary Structures
5.3 Improvement on Space
5.4 The Stem Alignment Problem
5.5 The RNA Secondary Structure Alignment Algorithm
5.6 Analysis of Time Complexity
Chapter 6. Experimental Results
Chapter 7. Conclusion
BIBLIOGRAPHY


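Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: available on campus immediately; available off campus after one year
Available: Campus: available; Off-campus: available
etd-0812103-131217.pdf
Printed copies
Public availability information for printed theses is relatively complete from academic year 102 onward. To ask about printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available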


Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS