Responsive image
博碩士論文 etd-0723104-181009 詳細資訊
Title page for etd-0723104-181009
論文名稱
Title
巢狀弧標記之RNA二級結構比對
The Comparison of RNA Secondary Structures with Nested Arc Annotation
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
61
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2004-07-09
繳交日期
Date of Submission
2004-07-23
關鍵字
Keywords
計算生物、動態規劃、核醣核酸、二級結構、比對
Computational Biology, RNA, Secondary Structures, Comparison, Dynamic Programming
統計
Statistics
本論文已被瀏覽 5688 次,被下載 2310
The thesis/dissertation has been browsed 5688 times, has been downloaded 2310 times.
中文摘要
近年來,RNA的結構比對已經變成生物資訊研究領域的一個重要問題。一般而言,用巢狀弧標記(arc annotation)的方式來表示RNA的結構是個普遍被接受的表示方式。而要比較兩個RNA的結構可以有很多種比較法,比如tree edit distance, longest arc-preserving common subsequence以及stem-based alignment等等。然而,這些比對方法因為時間複雜度較高,所以只能適用於較短的序列。在本論文中,我們提出了一個較精簡的方法來比較兩個RNA的結構,此方法的時間複雜度為O(mn),m及n分別為這兩個RNA結構的序列長度。我們的方法是將RNA的結構先轉換成所謂的「物件序列」,再將這些物件序列進行比對進而得到原來兩個RNA的共同子結構。我們從RNase P Database中下載了118個RNA的結構來針對我們的比對方法進行測試。對於任兩個結構,我們試著用我們的方法進行結構比對,同時也用LCS的方式來進行序列比對,試圖辨識某兩個RNA是否屬於同一個家族。從實驗的結果發現,我們的結構比對法在家族辨識上比傳統的序列比對法擁有更高的準確率以及更快的速率。因此,我們的方法更有生物上的意義,而且在時間複雜度方面也更有效率。
Abstract
In recent years, RNA structural comparison becomes a crucial problem in bioinformatic research. Generally, it is a popular approach for representing the RNA secondary
structures with arc-annotation sets. Several methods can be used to compare two RNA structures, such as tree edit distance, longest arc-preserving common subsequence
(LAPCS) and stem-based alignment. However, these methods may be helpful only for small RNA structures because of their high time complexity. In this thesis, we propose
a simplified method to compare two RNA structures in O(mn)time, where m and n are the lengths of the two RNA sequences, respectively. Our method transforms the RNA
structures into specific sequences called object sequences, then compare these object sequences to find their common substructures. We test our comparison method with 118 RNA structures obtained from RNase P Database. For any two structures, we try to identify whether they are in the same family by both structure comparison and sequence comparison. In our experiment, we find that our method for comparing RNA structures can yield better hit rates and is faster than the traditional method to compare the RNA sequences. Therefore, our approach for comparing RNA secondary structures is more sensitive in biology and more efficient in time complexity.
目次 Table of Contents
ABSTRACT . . . . . . . . . . . . . . . . . . . . 0

Chapter 1. Introduction . . . . . . . . . . . . 1
1.1 DNA . . . . . . . . . . . . . . . . . . . . 1
1.2 RNA . . . . . . . . . . . . . . . . . . . . 2
1.3 RNA Structure Representation . . . . . . . . 2
1.4 Pseudoknots in RNA Structures . . . . . . . 5

Chapter 2. Preliminaries . . . . . . . .. . . . 7
2.1 Related Research on RNA Structure . . . . . 7
2.2 Dynamic Programming . . . . . . . . . . . . 8
2.3 The Longest Common Subsequence Problem . . . 10

Chapter 3. Related Algorithms . . . . . . . . . 14
3.1 Hamming Distance . . . . . . . . . . . . . . 14
3.2 Tree Edit Distance . . . . . . . . . . . . . 16
3.3 Longest Arc-Preserving Common Subsequence .. 18
3.4 Stem-Based Alignment .. . . . . . . . . . . 20

Chapter 4. Our Method . . . . . . . . . . . . . 24
4.1 Merging Arcs into an Object . . . . . .. . . 24
4.2 The Object Sequence . . . . . . . . . . . . 25
4.3 LCS in Object Sequences . . . . . . . . . . 27
4.4 Our Match Function . . . . . . . . . . . . . 27
4.5 OLCS with Different Depths . . . . . . . . . 30
4.6 The Additional Sequence Alignment Score . . 30
4.7 Time Complexity and Space Complexity . . . . 31

Chapter 5. Experimental Results . . . . . .. . . 35
5.1 Definition of Similarity . . . . . . . . . . 35
5.2 Family Relationship . . . . . . .. . . . . . 36
5.3 Distribution of Similarity . . . . . . . . . 36
5.4 Family identification . . . . . . . . . . . 37

Chapter 6. Conclusions . . . . . . . . . . . . . . . . . . 57

BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . 59
參考文獻 References
[1] “Paul Gardner’s Homepage.” http://www.massey.ac.nz/ ppgardne/.
[2] “PseudoBase.” http://wwwbio.leidenuniv.nl/ Batenburg/PKB.html.
[3] “RNase P Database.” http://www.mbio.ncsu.edu/RNaseP/.
[4] J. Alber, J. Gramm, J. Guo, and R. Niedermeier, “Computing the similarity of two
sequences with nested arc annotations,” Theoretical Computer Science, pp. 337–
358, 2004.
[5] M. Andronescu, A. P. Fejes, F. Hutter, H. H. Hoos, and A. Condon, “A new algorithm
for RNA secondary structure design,” Journal of Molecular Biology, pp. 607–624,
2004.
[6] K. Charter, J. Schaeffer, and D. Szafron, “Sequence alignment using FastLSA,”
Mathematics and Engineering Techniques in Medicine and Biological Sciences,
pp. 239–245, 2000.
[7] Y. Ding and C. E. Lawrence, “A statistical sampling algorithm for RNA secondary
structure prediction,” Nucleic Acids Research, pp. 7280–7301, 2003.
[8] S. Dulucq and L. Tichit, “RNA secondary structure comparison: exact analysis of
the zhang-shasha tree edit algorithm,” Theoretical Computer Science, pp. 471–484,
2003.
[9] S. R. Eddy, “A memory-efficient dynamic programming algorithm for optimal alignment
of a sequence to an RNA secondary structure,” BMC Bioinformatics, pp. 3–18,
2002.
[10] P. Evans, “Algorithms and complexity for annotated sequence analysis.” citeseer.
ist.psu.edu/evans99algorithms.html, Department of Computer Science, University
of Victoria, 1999.
59
[11] W. Gruner, R. Giegerich, D. Storthmann, C. Reidys, J. Weber, I. L. Hofacker, P. F.
Stadler, and P. Schuster, “Analysis of RNA sequence structure maps by exhaustive
enumeration,” Santa Fe Institute Working Papers, 1995.
[12] I. L. Hofacker, M. Fekete, and P. F. Stadler, “Secondary structure prediction for
aligned RNA sequences,” Santa Fe Institute Working Papers, 2001.
[13] I. L. Hofacker, W. Fontana, P. F. Stadler, L. S. Bonhoeffer, M. Tacker, and P. Schuster,
“Fast folding and comparison of RNA secondary structures,” Santa Fe Institute
Working Papers, 1993.
[14] J. Jansson and A. Lingas, “A fast algorithm for optimal alignment between similar
ordered trees,” Proceedings of the 12th Annual Symposium on Combinatorial
Pattern Matching, pp. 232–240, 2001.
[15] T. Jiang, G.-H. Lin, B. Ma, and K. Zhang, “A general edit distance between RNA
structures,” Journal of Computational Biology, pp. 371–388, 2002.
[16] V. Juan and C. Wilson, “RNA secondary structure prediction based on free energy
and phylogenetic analysis,” Journal of Molecular Biology, pp. 935–947, 1999.
[17] R. C. T. Lee, S. S. Tseng, R. C. Chang, and Y. T. Tsai, Introduction to the Design
and Analysis of Algorithms. Unalis Corporation, first ed., 1999.
[18] G.-H. Lin, Z.-Z. Chen, T. Jiang, , and J. Wen, “The longest common subsequence
problem for sequences with nested arc annotations,” International Colloquium on
Automata, Languages and Programming, pp. 444–455, 2001.
[19] G.-H. Lin, B. Ma, and K. Zhang, “Edit distance between two RNA structures,” Research
in Computational Molecular Biology, pp. 211–220, 2001.
[20] T. Liu and B. Schmidt, “Parallel RNA sequence-structure alignment,” High Performance
Computational Biology, 2004.
[21] Y. Sakakibara, M. Brown, R. Hughey, I. S. Mian, K. Sjolander, R. C. Underwood,
and D. Haussler, “Stochastic context-free grammars for tRNA modeling,” Nucleic
Acids Research, pp. 5112–5120, 1994.
[22] J. T. Wang, B. A. Shapiro, D. Shasha, K. Zhang, and K. M. Currey, “An algorithm
for finding the largest approximately common substructures of two trees,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, 1998.
60
[23] M.-Y.Wu, C.-B. Yang, and K.-S. Huang, “RNA secondary structure alignment based
on stem representation,” Proceedings of the 21st Workshop on CombinationalMathematics
and Computation Theory, Taichung, Taiwan, pp. 60–69, 2004.
[24] K. Zhang and D. Shasha, “Simple fast algorithms for the editing distance between
trees and related problems,” SIAM Journal on Computing, pp. 1245–1262, 1989.
[25] K. Zhang, L. Wang, and B. Ma, “Computing similarity between RNA structures,”
Combinatorial Pattern Matching, pp. 281–293, 1999.
電子全文 Fulltext
本電子全文僅授權使用者為學術研究之目的,進行個人非營利性質之檢索、閱讀、列印。請遵守中華民國著作權法之相關規定,切勿任意重製、散佈、改作、轉貼、播送,以免觸法。
論文使用權限 Thesis access permission:校內立即公開,校外一年後公開 off campus withheld
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
紙本論文的公開資訊在102學年度以後相對較為完整。如果需要查詢101學年度以前的紙本論文公開資訊,請聯繫圖資處紙本論文服務櫃台。如有不便之處敬請見諒。
開放時間 available 已公開 available

QR Code