National Sun Yat-sen University, thesis/dissertation, Some Common Subsequence Problems of Multiple Sequences and Their Applications

Title	Some Common Subsequence Problems of Multiple Sequences and Their Applications (多序列之共同子序列問題及其應用)
Department	Department of Computer Science and Engineering
Year, semester	Spring semester, Academic Year 95	Language	English
Degree	Ph.D.	Number of pages	128
Author	Kuo-Si Huang (黃國璽)
Advisor	Chang-Biau Yang (楊昌彪)
Convenor	Shi-Jinn Horng (洪西進)
Advisory Committee	Yaw-Ling Lin (林耀鈴); Rong-Jaye Chen (陳榮傑); Jung-Hsien Chiang (蔣榮先); Chuan Yi Tang (唐傳義); Chungnan Lee (李宗南); Kun-Mao Chao (趙坤茂)
Date of Exam	2007-06-28	Date of Submission	2007-07-14
Keywords	algorithm, multiple sequences, merged sequence, mosaic sequence, longest common subsequence
Statistics	This thesis/dissertation has been browsed 5847 times and downloaded 1648 times.

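Chinese Abstract (English translation)
The longest common subsequence (LCS) problem is a classical problem in computer science and molecular biology. A common subsequence explicitly identifies the similar parts of multiple sequences and thereby reveals the relationships among them. This dissertation focuses on the LCS problem for $k$ sequences (the $k$-LCS problem), the merged LCS problem, and the mosaic LCS problem. The goals of these three problems are, respectively, to find a common subsequence of multiple sequences; to find the interleaving relationship between a target sequence and a merged sequence formed from two sequences; and to find the mosaic relationship between a target sequence and the sequences in a given set.

Given $k$ sequences, the $k$-LCS problem is to find the longest common subsequence of these $k$ sequences. For this problem, we propose two approximation algorithms which guarantee that the optimal solution is at most $\sigma$ times as long as the solution found, with time complexities $O(\sigma k n)$ and $O(\sigma^{2} k n + \sigma^{3} n)$, where $\sigma$ is the alphabet size and $n$ is the sequence length. Because these algorithms have low time and space complexity, they can serve as filters for selecting candidate sequences in database searching, and they can be applied to multiple sequence alignment and phylogenetic tree reconstruction.

Given a target sequence $T$ and two merging sequences $A$ and $B$, the merged LCS problem is to find the LCS of $T$ and the optimal merged sequence obtained by interleaving $A$ and $B$. Its goal is to determine how $A$ and $B$ are merged, and thereby to understand the interleaving relationship among $T$, $A$, and $B$. We first propose an algorithm with time complexity $O(n^{3})$ for this problem, where $n$ is the sequence length. We then take the block information of biological sequences into account and define the corresponding blocked merged LCS problem. To solve this new problem, we propose an algorithm with time complexity $O(n^{2} m_{b})$, where $m_{b}$ is the number of blocks in the sequences. Using the S-table data structure and technique, we then propose an algorithm with time complexity $O(n^{2} + n m_{b}^{2})$ that improves on the original one.

In addition, to understand the mosaic relationship between a target sequence and a set of sequences, we propose the mosaic LCS problem. Given a target sequence $T$ and a set $S$ of sequences, the problem is to select $k$ sequences from $S$, with repetition allowed, to compose an optimal mosaic sequence $C$ such that the common subsequence of $T$ and $C$ is as long as possible. By considering the possible break points in $T$, we propose an algorithm with time complexity $O(n^{2} m |S| + n^{3} \log k)$, where $n$ is the length of $T$ and $m$ is the length of the longest sequence in $S$. Using preprocessing and the S-table data structure, we also propose an algorithm with time complexity $O(n(m+k)|S|)$ for this problem.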
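To make the merged LCS problem above concrete, here is a minimal Python sketch of a straightforward three-dimensional dynamic program. It assumes that a merged sequence is any interleaving of $A$ and $B$ that preserves the order of symbols within each of them. The function name `merged_lcs_length` is hypothetical, and the recurrence is a textbook-style formulation that matches the stated $O(n^{3})$ bound; it is not necessarily the exact algorithm of the dissertation.

```python
def merged_lcs_length(t, a, b):
    """Length of the LCS of t and the best order-preserving merge of a and b.

    Sketch only: cubic time, two 2-D layers of memory.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # prev[j][k]: best value for the prefix of t processed so far,
    # using the prefixes a[:j] and b[:k].
    prev = [[0] * cols for _ in range(rows)]
    for ch in t:
        curr = [[0] * cols for _ in range(rows)]
        for j in range(rows):
            for k in range(cols):
                best = prev[j][k]  # leave ch unmatched
                if j > 0:
                    best = max(best, curr[j - 1][k])  # leave a[j-1] unmatched
                    if a[j - 1] == ch:
                        best = max(best, prev[j - 1][k] + 1)  # match ch with a[j-1]
                if k > 0:
                    best = max(best, curr[j][k - 1])  # leave b[k-1] unmatched
                    if b[k - 1] == ch:
                        best = max(best, prev[j][k - 1] + 1)  # match ch with b[k-1]
                curr[j][k] = best
        prev = curr
    return prev[len(a)][len(b)]


if __name__ == "__main__":
    # "ac" and "bd" can be interleaved into "abcd".
    print(merged_lcs_length("abcd", "ac", "bd"))
```

The sketch returns only the length; recovering the merge itself would require keeping the full three-dimensional table or back pointers, which is omitted here for brevity.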
Abstract
The longest common subsequence (LCS) problem is a classical problem in computer science and molecular biology. A common subsequence of multiple sequences reveals the identical and similar parts of these sequences. This dissertation focuses on approximation algorithms for finding the LCS of $k$ input sequences (the $k$-LCS problem), the merged LCS problem, and the mosaic LCS problem. These three problems seek, respectively, the identical relationships among the $k$ sequences, the interleaving relationship between a target sequence and a merged sequence of a pair of sequences, and the mosaic relationship between a target sequence and a set of sequences.

Given $k$ input sequences, the $k$-LCS problem is to find the longest subsequence common to all of them. We first propose two $\sigma$-approximation algorithms for the $k$-LCS problem with time complexities $O(\sigma k n)$ and $O(\sigma^{2} k n + \sigma^{3} n)$, where $\sigma$ is the alphabet size and $n$ is the sequence length. Experimental results show that our algorithms for 2-LCS could be a good filter for selecting candidate sequences in database searching.

Given a target sequence $T$ and a pair of merging sequences $A$ and $B$, the merged LCS problem is to find the LCS of $T$ and the optimal merged sequence obtained by interleaving $A$ and $B$. Its goal is to find a way of merging that explains the interleaving relationship of the sequences. We first propose an algorithm with $O(n^{3})$ time for solving the problem, where $n$ is the sequence length. We further incorporate the block information of the input sequences, which yields the blocked merged LCS problem. To solve the latter problem, we propose an algorithm with time complexity $O(n^{2} m_{b})$, where $m_{b}$ is the number of blocks. Based on the S-table technique, we design an improved algorithm with $O(n^{2} + n m_{b}^{2})$ time.

Additionally, we wish to obtain the relationship between one sequence and a set of sequences. Given a target sequence $T$ and a set $S$ of source sequences, the mosaic LCS problem is to find the LCS of $T$ and a mosaic sequence $C$ composed of $k$ sequences chosen from $S$, with repetition allowed. Based on the concept of break points in $T$, a divide-and-conquer algorithm is proposed with $O(n^{2} m |S| + n^{3} \log k)$ time, where $n$ is the length of $T$ and $m$ is the maximal length of the sequences in $S$. Again based on the S-table technique, an improved algorithm with $O(n(m+k)|S|)$ time is obtained by applying an efficient preprocessing step.
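As an illustration of the mosaic LCS problem stated above, the following Python sketch assumes that a mosaic sequence is the concatenation of $k$ sequences picked from $S$, with repeats allowed; this is one reading of the abstract, not a definition taken from the dissertation. Under that assumption, the optimum splits $T$ at break points into at most $k$ consecutive pieces and scores each piece by its best LCS against a single source. The function names are hypothetical, and the plain $k$-round loop below runs in $O(n^{2} m |S| + k n^{2})$ time; it is neither the divide-and-conquer algorithm nor the S-table algorithm of the dissertation.

```python
def lcs_lengths_from_start(t_suffix, s):
    """res[l] = LCS length of t_suffix[:l] and s, for l = 0..len(t_suffix)."""
    row = [0] * (len(s) + 1)
    res = [0]
    for ch in t_suffix:
        new = [0] * (len(s) + 1)
        for j in range(1, len(s) + 1):
            if s[j - 1] == ch:
                new[j] = row[j - 1] + 1
            else:
                new[j] = max(row[j], new[j - 1])
        row = new
        res.append(row[len(s)])
    return res


def mosaic_lcs_length(t, sources, k):
    """LCS length of t and the best concatenation of k sequences from sources.

    Assumption (hypothetical reading): sources may be reused, and each chosen
    source is matched against one consecutive piece of t.
    """
    n = len(t)
    # w[i][j]: best LCS of the piece t[i:j] against any single source.
    w = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for s in sources:
            for length, val in enumerate(lcs_lengths_from_start(t[i:], s)):
                if val > w[i][i + length]:
                    w[i][i + length] = val
    # best[j]: best score for t[:j] with the pieces chosen so far.
    best = [0] * (n + 1)
    for _ in range(k):
        # Choose the last break point i; the new piece is t[i:j] (possibly empty).
        best = [max(best[i] + w[i][j] for i in range(j + 1))
                for j in range(n + 1)]
    return best[n]


if __name__ == "__main__":
    print(mosaic_lcs_length("abcd", ["ab", "cd"], 2))
```

Precomputing the table `w` once and reusing it in every round keeps the break-point search separate from the pairwise LCS computations, which is the part of the cost that depends on $m$ and $|S|$.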

Table of Contents
1 Introduction
2 Preliminaries
  2.1 Notations
  2.2 The Longest Common Subsequence Problem
    2.2.1 Dynamic Programming Algorithm for 2-LCS
    2.2.2 Linear Space Algorithms for 2-LCS
  2.3 The Sequence Alignment Problem
  2.4 Phylogeny Reconstruction
    2.4.1 Maximum Parsimony Tree
    2.4.2 Maximum Likelihood Tree
  2.5 Longest Common Subsequence Problems with Constraints
    2.5.1 The Constrained Longest Subsequence Problem
    2.5.2 The Longest Increasing Subsequence Problem
  2.6 S-table and the Linear Time Merging Algorithm of LCS
3 Finding a Common Subsequence of Multiple Sequences
  3.1 Motivation
  3.2 Related Works
    3.2.1 The Long Run Algorithm
    3.2.2 The Expansion Algorithm
    3.2.3 The Best Next Algorithm
  3.3 Our Algorithms for k-LCS
  3.4 Experimental Results
  3.5 Multiple Sequence Alignment Based on k-LCS
  3.6 Phylogeny Reconstruction Based on k-LCS
  3.7 Summary
4 The Merged Longest Common Subsequence Problem
  4.1 Background
  4.2 The Merged LCS Problem
  4.3 The Blocked Merged LCS Problem
  4.4 Result and Discussion
  4.5 Summary
5 The Mosaic Longest Common Subsequence Problem
  5.1 Background
  5.2 Algorithms for the Chimeric and Mosaic Alignment Problems
    5.2.1 The 2-chimeric Alignment Problem
    5.2.2 The 2-mosaic Alignment Problem
    5.2.3 The k-mosaic Alignment Problem
  5.3 The Mosaic LCS Problem
  5.4 Algorithms for the Mosaic LCS Problem
  5.5 Summary
6 Conclusion
Bibliography
Index


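Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as not to violate the law.
Thesis access permission: available on campus immediately; available off campus after one year
Available: Campus: available; Off-campus: available
etd-0714107-160907.pdf
Printed copies
Public availability information for printed theses is relatively complete from Academic Year 102 onward. To inquire about printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available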


Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS