Thesis/dissertation etd-0106109-080018: detailed information
Title page for etd-0106109-080018
Title
多重序列的最長共同子序列之基因演算法
A Genetic Algorithm for the Longest Common Subsequence of Multiple Sequences
Department
Year, semester
Language
Degree
Number of pages
57
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2008-07-14
Date of Submission
2009-01-06
Keywords
longest common subsequence, multiple sequences, genetic algorithm
Statistics
This thesis has been browsed 5702 times and downloaded 2036 times.
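Chinese Abstract
Many methods have been proposed for the longest common subsequence (LCS) problem, and their worst-case time complexity is $O(n^2)$, where $n$ is the length of the input sequences. However, these algorithms become infeasible when $n$ is very large. Recently, the problem of finding the longest common subsequence of $k$ sequences ($k$-LCS, $k \geq 2$) has attracted growing attention. Some algorithms have been proposed for this problem, but their execution time is still too long to be practical.
In this thesis, we propose a genetic algorithm for the $k$-LCS problem with time complexity $O(Gpk(n + |P_j|))$, where $G$ is the number of generations, $p$ is the number of template sequences, $k$ is the number of input sequences, and $n$ and $|P_j|$ are the lengths of the input sequences and of the template sequences, respectively. Our experimental results show that, with 20 input sequences of length 1000, the performance ratio ($|CS|/|LCS|$) of our algorithm is greater than 0.8, where $|CS|$ is the length of the solution we find and $|LCS|$ is the length of the true LCS. We compare performance ratios with the Expansion algorithm and the BNMAS algorithm; when the number of input sequences ranges from 2 to 20 and the input length ranges from 100 to 2000, the performance ratio we obtain is very good.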
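The quadratic worst-case bound mentioned above corresponds to the textbook dynamic program for two sequences. The following is a minimal sketch of that standard recurrence, not code taken from the thesis; the function name `lcs_length` is chosen here purely for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Standard dynamic program: O(len(a) * len(b)) time, O(len(b)) space.
    Illustrative only; the thesis may present the recurrence differently.
    """
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best solution that ends just before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one character from either a or b.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

Extending the same table to $k$ sequences gives one dimension per sequence, which is why exact methods of this kind stop being practical as $k$ and $n$ grow.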
Abstract
Various approaches have been proposed for finding the longest common subsequence (LCS) of two sequences. The time complexity of these algorithms is usually $O(n^2)$ in the worst case, where $n$ is the length of the input sequences. However, these algorithms become infeasible when the input length $n$ is very large. Recently, the $k$-LCS problem ($k \geq 2$) has attracted increasing attention. Some algorithms have been proposed for solving it, but the execution time they require is still too long to be practical. In this thesis, we propose a genetic algorithm for solving the $k$-LCS problem with time complexity $O(Gpk(n + |P_j|))$, where $G$ is the number of generations, $p$ is the number of template patterns, $k$ is the number of input sequences, and $n$ and $|P_j|$ are the lengths of the input sequences and of the template patterns, respectively. Our experimental results show that, when $k$ is 20 and $n$ is 1000, the performance ratio ($|CS|/|LCS|$) of our algorithm is greater than 0.8, where $|CS|$ denotes the length of the solution we find and $|LCS|$ denotes the length of the real (optimal) LCS. In terms of performance ratio, our algorithm is much better than the Expansion Algorithm and the BNMAS Algorithm when the number of input sequences varies from 2 to 20 and the length of the input sequences varies from 100 to 2000.
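To make the quantities in the abstract concrete, the sketch below shows one way a candidate pattern could be checked against all $k$ input sequences and how the performance ratio $|CS|/|LCS|$ is computed. It is an assumption-laden illustration: the names `is_subsequence`, `is_common_subsequence`, `fitness`, and `performance_ratio` are hypothetical, and the all-or-nothing fitness shown here is not claimed to be the fitness function used in the thesis.

```python
from typing import Sequence


def is_subsequence(pattern: str, seq: str) -> bool:
    """True if pattern can be obtained from seq by deleting characters."""
    it = iter(seq)
    # Each membership test resumes scanning where the previous one stopped,
    # so seq is traversed at most once.
    return all(ch in it for ch in pattern)


def is_common_subsequence(pattern: str, seqs: Sequence[str]) -> bool:
    """True if pattern is a subsequence of every input sequence."""
    return all(is_subsequence(pattern, s) for s in seqs)


def fitness(pattern: str, seqs: Sequence[str]) -> int:
    """Hypothetical fitness: pattern length if feasible, otherwise 0."""
    return len(pattern) if is_common_subsequence(pattern, seqs) else 0


def performance_ratio(cs_len: int, lcs_len: int) -> float:
    """Ratio |CS| / |LCS| between the found and the optimal length."""
    if lcs_len <= 0:
        raise ValueError("lcs_len must be positive")
    return cs_len / lcs_len
```

Under this sketch, checking one pattern against $k$ sequences of length $n$ takes time linear in $k$ times the combined pattern and sequence length, which is consistent in shape with the per-pattern factor in the stated bound, although the thesis's actual evaluation step may differ.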
Table of Contents
LIST OF FIGURES
LIST OF TABLES
Chinese Abstract
ABSTRACT
1 Introduction
2 Preliminaries
2.1 Notations
2.2 The Longest Common Subsequence Problem
2.3 Dynamic Programming Algorithm for 2-LCS
2.4 The Multiple Sequence Alignment Problem
2.5 The Genetic Algorithm
3 Previous Work of the k-LCS Problem
3.1 Motivation
3.2 Related Works
3.2.1 The Expansion Algorithm
3.2.2 The Best Next for Maximal Available Symbols
3.2.3 Starting from Scratch: Growing LCS with Evolution
3.2.4 ACO for k-LCS
4 Our Genetic Algorithm for k-LCS
5 Experimental Results and Discussion
5.1 Experimental Results
5.2 Discussions
6 Conclusion
BIBLIOGRAPHY
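Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: available on campus immediately; available off campus after one year
Available:
Campus: available
Off-campus: available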


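Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available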
