Responsive image
博碩士論文 etd-0023117-160730 詳細資訊
Title page for etd-0023117-160730
論文名稱
Title
小字符集的限制最長共同子序列之快速演算法
A Fast Algorithm for the Constrained Longest Common Subsequence Problem with Small Alphabet
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
89
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2017-01-16
繳交日期
Date of Submission
2017-02-07
關鍵字
Keywords
小字符集、動態規劃、相似度、限制最長共同子序列、最長共同子序列
Small alphabet, Dynamic Programming, Similar, CLCS, LCS
統計
Statistics
本論文已被瀏覽 5686 次,被下載 47
The thesis/dissertation has been browsed 5686 times, has been downloaded 47 times.
中文摘要
給定A與B兩條序列以及一條限制序列C,限制最長共同子序列問題即是找出A與B的最長共同子序列,而且此序列必須包含限制序列C。Chin等學者提出一個動態規劃的演算法來解決限制最長共同子序列問題,但是此方法需要計算完整的三維矩陣。我們觀察CLCS的三維矩陣發現,在小字符數的情況下,大多數的對應點在相鄰兩層的數值都相同,也就是說這些數值相同的點應該可以直接使用不需要重新計算。因此,在本論文中,我們定義需要計算以及不需要計算的點。在大多數的情況之下,我們只需要計算特定邊界點的數值,不需要計算完整的三維矩陣。但是,我們的演算法在最糟的情況之下仍然需要計算完整的三維矩陣,所以時間複雜度與空間複雜度分別需要O(mnr)以及O(mn)。根據Deorowicz and Obstój在2010年論文的實驗結果,在字符數小於等於20的時候,Chin的方法會有不錯的執行效率。而且,根據我們的實驗結果,在字符數小於等於20的時候,我們演算法的執行效率比Chin的方法更好。所以,我們的演算法在小字符數時會比現有的限制最長共同子序列演算法更有效率。
Abstract
Given three sequences A, B and C with lengths of m, n and r, respectively,
the constrained longest common subsequence (CLCS) problem is to fi nd the LCS of
A and B which contains C as the subsequence of the answer. The dynamic program-
ming (DP) for solving CLCS, proposed by Chin et al., calculates two-dimensional
lattices layer by layer. We find that the values of most corresponding CLCS lattice
cells are identical in two consecutive layers when |∑| is small, where |∑| denotes the
alphabet size. In this thesis, we clarify which lattice cells need to be calculated, and
which lattices need not be calculated, since the calculation is redundant if two cells
are equal in two consecutive layers. Accordingly, our algorithm calculates only some
special boundary cells, instead of the whole 3-dimensional lattice in most cases. Our
algorithm still requires O(mnr) time and O(mn) space to solve the CLCS problem
in the worst case. In 2010, Deorowicz and Obstoj showed that the algorithm of Chin
et al. has good performance when |∑| ≤ 20. As our experimental results show, our
algorithm is faster than Chin's algorithm when |∑| ≤ 20. So our algorithm is better
than most of the previous CLCS algorithms when |∑| is small.
目次 Table of Contents
學位論文審定書. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
THESIS VERIFIVATION FORM . . . . . . . . . . . . . . . . . . . . . . ii
論文公開授權書. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
謝辭. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
摘要. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Chapter 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter 2. Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1 The Longest Common Subsequence Problem . . . . . . . . . . . . . . 6
2.2 The Edit Distance of Two Sequences . . . . . . . . . . . . . . . . . . 8
2.3 The Constrained Longest Common Subsequence Problem . . . . . . . 10
2.3.1 The Constrained Longest Common Subsequence Algorithm of Tsai . . . . . . . . . . 11
2.3.2 The Constrained Longest Common Subsequence Algorithm of Chin et al. . . . . . 12
2.3.3 The Constrained Longest Common Subsequence Algorithm of Deorowicz and Obstoj . . 16
Chapter 3. Our Algorithm for the Constrained Longest Common Subsequence Problem . . . . . 19
3.1 The Constrained Longest Common Subsequence Algorithm . . . . . . 19
3.2 Our Edit Distance Algorithm for the Constrained LCS Problem . . . 35
Chapter 4. Experimental Results . . . . . . . . . . . . . . . . . . . . . . 36
Chapter 5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Appendixes
A. Miscellaneous Experimental Results . . . . . . . . . . . . . . . . . . 52
參考文獻 References
H.-Y. Ann, C.-B. Yang, C.-T. Tseng, and C.-Y. Hor, "Fast algorithms for computing
the constrained lcs of run-length encoded strings," Theoretical Computer
Science, Vol. 432, pp. 1-9, 2012.

O. Arbell, G. M. Landau, and J. S. Mitchell, "Edit distance of run-length
encoded strings," Information Processing Letters, Vol. 83, pp. 307-314, 2002.

A. N. Arslan and O. Omer Egecioglu, "Algorithms for the constrained longest
common subsequence problems," International Journal of Foundations of Com-
puter Science, Vol. 16, No. 6, pp. 1099-1109, 2005.

D. Becerra, W. Soto, L. Nino, and Y. Pinzon, "An algorithm for constrained
LCS," ACS/IEEE International Conference on Computer Systems and Applica-
tions, AICCSA 2010, Hammamet, Tunisia, May 16-19, 2010, Revised Selected
Papers, pp. 237-246, 2010.

F. Y. L. Chin, A. D. Santis, A. L. Ferrara, N. L. Ho, and S. K. Kim, "A simple
algorithm for the constrained sequence problems," Information Processing
Letters, Vol. 90(4), pp. 175-179, 2004.

S. Deorowicz, "Fast algorithm for constrained longest common subsequence
problem.," Theoretical and Applied Informatics, Vol. 19, No. 2, pp. 91-102,
2007.

S. Deorowicz, "Bit-parallel algorithm for the constrained longest common subsequence
problem," Fundamenta Informaticae, Vol. 99, pp. 409-433, 2010.

S. Deorowicz and A. Danek, "Bit-parallel algorithms for the merged longest
common subsequence problem," International Journal of Foundations of Com-
puter Science, Vol. 24, pp. 1281-1298, 2013.

S. Deorowicz and J. Obstoj, "Constrained longest common subsequence computing
algorithms in practice," Computing and Informatics, Vol. 29, pp. 427-
445, 2010.

D. He and A. N. Arslan, "A space-ecient algorithm for the constrained
pairwise sequence alignment problem," Genome Informatics, Vol. 16, No. 2,
pp. 237-246, 2005.

D. S. Hirschberg, "A linear space algorithm for computing maximal common
subsequences," Communications of the ACM, Vol. 18, No. 6, pp. 341-343, 1975.

K.-S. Huang, C.-B. Yang, K.-T. Tseng, H.-Y. Ann, and Y.-H. Peng, "Efficient
algorithms for finding interleaving relationship between sequences," Informa-
tion Processing Letters, Vol. 105, pp. 188-193, 2008.

J. W. Hunt and T. G. Szymanski, "A fast algorithm for computing longest
common subsequences," Communications of the ACM, Vol. 20, No. 5, pp. 350-
353, 1977.

C. S. Iliopoulos and M. S. Rahman, "New efficient algorithms for the LCS and
constrained LCS problems," Information Processing Letters, Vol. 106, No. 1,
pp. 13-18, 2008.

J. B. KRUSKAL, "An overview of sequence comparison: the warps, string edits,
and macromolecules," SIAM Review, Vol. 25, No. 2, pp. 201-237, 1977.

J. Liu, G. Huang, Y.Wang, and R. Lee, "Edit distance for a run-length-encoded
string and an uncompressed string," Information Processing Letters, Vol. 105,
pp. 12-16, 2007.

W. J. Masek, "A faster algorithm computing string edit distance," Journal of
Computer and System Sciences, Vol. 20, pp. 18-31, 1980.

N. Nakatsu, Y. Kambayashi, and S. Yajima, "A longest common subsequence
algorithm suitable for similar text strings," Acta Informatica, Vol. 18, pp. 171-
179, 1982.

D. B. Needleman and C. D. Wunsch, "A general method applicable to the
search for similarities in the amino acid sequence of two proteins," Journal of
Molecular Biology, Vol. 48, pp. 443-453, 1970.

C.-L. Peng, "An approach for solving the constrained longest common subsequence
problem." http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/search,
Master's Thesis, Department of Computer Science and Engineering, National
Sun Yat-Sen University, Kaohsiung, Taiwan, 2003.

Y.-H. Peng, C.-B. Yang, K.-S. Huang, C.-T. Tseng, and C.-Y. Hor, "E-
fficient sparse dynamic programming for the merged LCS problem with block
constraints," International Journal of Innovative Computing, Information and
Control, Vol. 6, pp. 1935-1947, 2010.

Y.-H. Peng, C.-B. Yang, K.-S. Huang, and K.-T. Tseng, "An algorithm and
applications to sequence alignment with weighted constraints," International
Journal of Foundations of Computer Science, Vol. 21, pp. 51-59, 2010.

Z. S. Peng and H. F. Ting, "Time and space efficient algorithms for constrained
sequence alignment," Implementation and Application of Automata: 9th Inter-
national Conference, CIAA 2004, Kingston, Canada, July 22-24, 2004, Revised
Selected Papers, pp. 237-246, 2005.

A. M. Rahman and M. S. Rahman, "Effictive sparse dynamic programming
algorithms for merged and block merged LCS problems," Journal of Computers,
Vol. 9, No. 8, pp. 1743-1754, 2014.

Y. T. Tsai, "The constrained longest common subsequence problem," Informa-
tion Processing Letters, Vol. 88, pp. 173-176, 2003.

R. Wagner and M. Fischer, "The string-to-string correction problem," Journal
of the ACM, Vol. 21, No. 1, pp. 168-173, 1974.

W. L. Wang, "Longest common subsequence with constraint," Master's Thesis,
Department of Computer Science and Information Engineering, National Chi-
Nan University, Nantou, Taiwan, 2006.

S.Wu, U. Manber, and G. Myers, "An O(NP) sequence comparison algorithm,"
Information Processing Letters, Vol. 35, pp. 317-323, 1990.
電子全文 Fulltext
本電子全文僅授權使用者為學術研究之目的,進行個人非營利性質之檢索、閱讀、列印。請遵守中華民國著作權法之相關規定,切勿任意重製、散佈、改作、轉貼、播送,以免觸法。
論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
紙本論文的公開資訊在102學年度以後相對較為完整。如果需要查詢101學年度以前的紙本論文公開資訊,請聯繫圖資處紙本論文服務櫃台。如有不便之處敬請見諒。
開放時間 available 已公開 available

QR Code