Title page for etd-0812103-131217
Title
RNA二級結構的對齊方法
RNA Secondary Structure Alignment
Department
Year, semester
Language
Degree
Number of pages
117
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2003-07-11
Date of Submission
2003-08-12
Keywords
computational biology, secondary structure, RNA, dynamic programming, alignment
Statistics
This thesis has been viewed 5662 times and downloaded 1906 times.
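Abstract (translated from Chinese)
Methods for comparing RNA or protein molecules are important basic tools in molecular biology research. To date, most comparison methods apply only to the primary structure of RNA or proteins, that is, the familiar sequence comparison and sequence alignment methods. From a biological point of view, structure and function are very closely related. Current methods for determining structure are mainly nuclear magnetic resonance, X-ray diffraction, and prediction by computer programs, so there are many molecules whose structures are known but whose functions have not yet been discovered. The RNA secondary structure alignment problem asks how two RNA secondary structures look once they are aligned as a whole, and how similar they are. In addition, RNA secondary structure alignment can help predict the functions of molecules and classify them. In this thesis, we study the problem of aligning two RNA secondary structures; the secondary structures handled by our method do not contain pseudoknots. We design a dynamic programming algorithm to solve this problem, with time complexity O(N^4), where N is the maximum number of blocks in the two RNA structures. We also apply the algorithm to human mitochondrial tRNAs to evaluate its practicality. The experimental results show that our method can effectively assess the structural similarity of two RNAs, regardless of whether their sequences are similar.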
Abstract
Methods for comparing RNA or protein molecules are important basic tools in molecular biology. So far, most comparison methods apply only to the primary structures of biomolecules, as in sequence comparison and sequence alignment. The functions of biomolecules are closely related to their structures. Current methods for determining the structures of biomolecules include NMR spectroscopy, X-ray crystallography, and computational prediction. As a result, there are many biomolecules whose structures are known but whose functions are not. The RNA secondary structure alignment problem is to align two RNA molecules whose secondary structures are given and to measure their structural similarity. Such an alignment also helps to predict the functions of biomolecules and to classify them. In this thesis, we design a dynamic programming method for aligning two RNA secondary structures that do not contain pseudoknots. The time complexity of our algorithm is O(N^4), where N is the maximum number of blocks in the two given RNA structures. We also apply our algorithm to real biomolecules, the tRNAs of the Homo sapiens mitochondrion, to evaluate the practicability of our method. We take three tRNA genes, TRNG, TRNA and TRNV, to test the performance of our algorithm. By visual inspection, the structure of TRNG is more similar to that of TRNA, and our algorithm reaches the same result. Hence, our algorithm provides an effective method for measuring the similarity of two RNA secondary structures.
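The primary-structure comparison that the abstract contrasts with structural alignment can be illustrated with the classical global sequence alignment recurrence. The sketch below is not the thesis's O(N^4) structure alignment algorithm, which this record does not describe in detail. The function name `global_alignment_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen only for illustration.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Score of the best global alignment of sequences a and b.

    Classical dynamic programming over prefixes (Needleman-Wunsch style).
    Scoring values are illustrative assumptions, not taken from the thesis.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] aligned with a gap
                score[i][j - 1] + gap,       # b[j-1] aligned with a gap
            )
    return score[-1][-1]
```

This table has one cell per pair of prefixes, so it runs in time proportional to the product of the two sequence lengths. It scores only the nucleotide sequences and ignores base pairing, which is the limitation that motivates aligning secondary structures instead.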
Table of Contents
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
Chapter 1. Introduction
Chapter 2. Preliminaries
2.1 RNA and cDNA
2.2 The Stem
Chapter 3. Prediction Methods for the RNA Secondary Structure
Chapter 4. The Sequence Alignment
4.1 The Longest Common Subsequence Problem
4.2 The Sequence Alignment Problem
4.3 The Local Alignment Problem
4.4 The Affine Gap Penalty
Chapter 5. Sequence Alignment for RNA Secondary Structure
5.1 Terminologies
5.2 An Alignment Algorithm for RNA Secondary Structures
5.3 Improvement on Space
5.4 The Stem Alignment Problem
5.5 The RNA Secondary Structure Alignment Algorithm
5.6 Analysis of Time Complexity
Chapter 6. Experimental Results
Chapter 7. Conclusion
BIBLIOGRAPHY
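Fulltext
The electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: available on campus immediately; available off campus after one year
Available:
Campus: available
Off-campus: available

Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available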
