Thesis/dissertation etd-0811117-153330: title page and detailed information
Title
使用線性空間S-table解決最大共同子序列問題之演算法
The Algorithms for the Linear Space S-table on the Longest Common Subsequence Problem
Department
Year, semester
Language
Degree
Number of pages
60
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2017-09-12
Date of Submission
2017-09-12
Keywords
longest common subsequence, consecutive suffix alignment problem, linear space, S-table, range query, segment tree
Statistics
The thesis/dissertation has been browsed 5620 times and downloaded 53 times.
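Chinese Abstract
Given two sequences A and B of lengths m and n, respectively, the consecutive suffix alignment (CSA) problem is to compute the longest common subsequence (LCS) between A and all substrings of B, and the two-dimensional matrix that stores the solution of the CSA problem is called the S-table. Given the S-table of A(r) and B together with the result of computing the LCS of A(r-1) and B, we can solve the LCS of the concatenation of the two strings A(r-1) and A(r) with another string B in O(L) time, where L denotes the LCS length.
In this thesis, we study the linear space S-table. Alves et al. first proposed the idea of the linear space S-table, but without further in-depth study or practical applications. We propose an O(n)-time algorithm for the LCS problem of a concatenated string and another string. We also propose an O(n log n)-time algorithm for merging two linear space S-tables, improving on the O(nL) time previously required to merge two S-tables. Finally, given the linear space S-table of A and B, we propose an O(n)-time algorithm to compute the linear space S-table of Aα and B, where α is a new character appended to the end of A.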
Abstract
Given two sequences A and B of lengths m and n, respectively, the consecutive suffix alignment problem is to compute the longest common subsequence (LCS) between A and each suffix of B. The data structure for the consecutive suffix alignment problem is called the S-table, which is a two-dimensional matrix. Given the alignment result of A(r-1) and B, the S-table of A(r) and B can be used to compute the LCS of B and the concatenation of the two substrings A(r-1) and A(r) in O(L) time, where L denotes the LCS length.
In this thesis, we focus on the linear space S-table. The linear space S-table was first proposed by Alves et al., but without further discussion or practical applications. We propose an algorithm that uses the linear space S-table to compute the LCS of two concatenated strings and another string in O(n) time. We also propose an algorithm for merging two linear space S-tables in O(n log n) time, whereas merging two S-tables previously required quadratic time. Finally, given the linear space S-table of A and B, we propose an algorithm to compute the linear space S-table of Aα and B in O(n) time, where α denotes a new character appended to the tail of A.
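As a point of reference for the problem stated in the abstract, the following is a minimal sketch of a straightforward O(mn)-time baseline that returns the LCS length between A and every suffix of B. It is not the S-table algorithm of the thesis: it simply runs the classic LCS dynamic program on the reversed strings, since a suffix of B is a prefix of the reversed B. The function name `suffix_lcs_lengths` is hypothetical and chosen only for this illustration.

```python
def suffix_lcs_lengths(a: str, b: str) -> list[int]:
    """Return r where r[j] is the LCS length of a and the suffix b[j:].

    Baseline only (an assumption for illustration, not the thesis's method):
    the standard row-by-row LCS dynamic program on the reversed strings.
    """
    ra, rb = a[::-1], b[::-1]
    # prev[k] holds the LCS length of the processed part of ra and rb[:k].
    prev = [0] * (len(rb) + 1)
    for x in ra:
        curr = [0]
        for k, y in enumerate(rb, start=1):
            if x == y:
                curr.append(prev[k - 1] + 1)
            else:
                curr.append(max(prev[k], curr[k - 1]))
        prev = curr
    # prev[k] is the LCS of a and the length-k suffix of b; reverse it so that
    # index j corresponds to the suffix starting at position j.
    return prev[::-1]


print(suffix_lcs_lengths("ab", "ba"))  # [1, 1, 0]
```

Only two rows of the table are kept at a time, so the sketch uses O(n) working space, but it recomputes everything from scratch whenever A changes, which is the cost the incremental and merging algorithms described in the abstract aim to avoid.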
Table of Contents
THESIS VERIFICATION FORM
THESIS AUTHORIZATION FORM
ACKNOWLEDGMENTS
CHINESE ABSTRACT
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
LIST OF SYMBOLS
Chapter 1. Introduction
Chapter 2. Preliminaries
2.1 The Longest Common Subsequence Problem
2.2 The Consecutive Suffix Alignment Problem
2.3 Applications of the S-table
2.4 The Merging of Two S-tables
2.5 The Segment Tree for Range Query and Range Update
Chapter 3. Our Algorithms
3.1 The Alignment with the Linear Space S-table
3.1.1 An O(n log n)-time Sequential Algorithm
3.1.2 An Algorithm with O(n) time
3.2 Merging two Linear Space S-tables
3.3 An Algorithm for Computing the Linear Space S-table Incrementally
Chapter 4. Conclusion
BIBLIOGRAPHY
References
A. Aggarwal, M. Klawe, S. Moran, P. Shor, and R. Wilber, "Geometric applications
of a matrix searching algorithm," Proceedings of the Second Annual
Symposium on Computational Geometry, New York, USA, pp. 285-292, ACM,
1986.
L. Allison and T. I. Dix, "A bit-string longest-common-subsequence algorithm,"
Information Processing Letters, Vol. 23, No. 5, pp. 305-310, 1986.
C. E. R. Alves, E. N. Caceres, and S. W. Song, "An all-substrings common
subsequence algorithm," Electronic Notes in Discrete Mathematics, Vol. 19,
pp. 133-139, 2005.
H.-Y. Ann, C.-B. Yang, and C.-T. Tseng, "Efficient polynomial-time algorithms
for the constrained LCS problem with strings exclusion," Journal of
Combinatorial Optimization, Vol. 28, No. 4, pp. 800-813, Nov. 2014.
H.-Y. Ann, C.-B. Yang, C.-T. Tseng, and C.-Y. Hor, "Fast algorithms for
computing the constrained LCS of run-length encoded strings," Theoretical Computer
Science, Vol. 432, pp. 1-9, May 2012.
A. Apostolico, "String editing and longest common subsequences," Handbook
of Formal Languages, pp. 361-398, Springer, 1997.
J. L. Bentley, "Algorithms for Klee's rectangle problems," Technical Report,
Computer Science Department, Carnegie Mellon University, 1977.
L. Bergroth, H. Hakonen, and T. Raita, "A survey of longest common subsequence
algorithms," Proceedings of Seventh International Symposium on String
Processing and Information Retrieval, A Coruña, Spain, pp. 39-48, IEEE, 2000.
F. Y. Chin, A. De Santis, A. L. Ferrara, N. Ho, and S. Kim, "A simple algorithm
for the constrained sequence problems," Information Processing Letters, Vol. 90,
No. 4, pp. 175-179, 2004.
M. Crochemore, C. S. Iliopoulos, Y. J. Pinzon, and J. F. Reid, "A fast and
practical bit-vector algorithm for the longest common subsequence problem,"
Information Processing Letters, Vol. 80, No. 6, pp. 279-285, 2001.
H. N. Gabow and R. E. Tarjan, "A linear-time algorithm for a special case of
disjoint set union," Proceedings of the Fifteenth Annual ACM Symposium on
Theory of Computing, New York, USA, pp. 246-251, ACM, 1983.
D. S. Hirschberg, "A linear space algorithm for computing maximal common
subsequences," Communications of the ACM, Vol. 18, No. 6, pp. 341-343, 1975.
D. S. Hirschberg, "Algorithms for the longest common subsequence problem,"
Journal of the ACM, Vol. 24, No. 4, pp. 664-675, 1977.
K.-S. Huang, C.-B. Yang, K.-T. Tseng, H.-Y. Ann, and Y.-H. Peng, "Algorithms
for the merged-LCS problem and its variant with block constraint,"
Proc. of the 23rd Workshop on Combinatorial Mathematics and Computation
Theory, Chang-Hua, Taiwan, pp. 232-239, 2006.
J. W. Hunt and T. G. Szymanski, "A fast algorithm for computing longest
common subsequences," Communications of the ACM, Vol. 20, No. 5,
pp. 350-353, 1977.
G. M. Landau, E. Myers, and M. Ziv-Ukelson, "Two algorithms for LCS consecutive
suffix alignment," Annual Symposium on Combinatorial Pattern Matching,
Istanbul, Turkey, pp. 173-193, Springer, 2004.
G. M. Landau, E. W. Myers, and J. P. Schmidt, "Incremental string comparison,"
SIAM Journal on Computing, Vol. 27, No. 2, pp. 557-582, 1998.
G. M. Landau, B. Schieber, and M. Ziv-Ukelson, "Sparse LCS common substring
alignment," Annual Symposium on Combinatorial Pattern Matching, Michoacán,
Mexico, pp. 225-236, Springer, 2003.
G. M. Landau and M. Ziv-Ukelson, "On the common substring alignment problem,"
Journal of Algorithms, Vol. 41, No. 2, pp. 338-359, 2001.
M. Lu and H. Lin, "Parallel algorithms for the longest common subsequence
problem," IEEE Transactions on Parallel and Distributed Systems, Vol. 5,
No. 8, pp. 835-848, 1994.
M. Maes, "On a cyclic string-to-string correction problem," Information
Processing Letters, Vol. 35, No. 2, pp. 73-78, 1990.
S. B. Needleman and C. D. Wunsch, "A general method applicable to the
search for similarities in the amino acid sequence of two proteins," Journal of
Molecular Biology, Vol. 48, No. 3, pp. 443-453, 1970.
Y.-H. Peng, C.-B. Yang, K.-S. Huang, C.-T. Tseng, and C.-Y. Hor, "Efficient
sparse dynamic programming for the merged LCS problem with block
constraints," International Journal of Innovative Computing, Information and
Control, Vol. 6, No. 4, pp. 1935-1947, 2010.
Y.-H. Peng, C.-B. Yang, K.-T. Tseng, and K.-S. Huang, "An algorithm and
applications to sequence alignment with weighted constraints," International
Journal of Foundations of Computer Science, Vol. 21, No. 1, pp. 51-59, Feb.
2010.
J. P. Schmidt, "All highest scoring paths in weighted grid graphs and their
application to finding all approximate repeats in strings," SIAM Journal on
Computing, Vol. 27, No. 4, pp. 972-992, 1998.
R. E. Tarjan, "Efficiency of a good but not linear set union algorithm," Journal
of the ACM, Vol. 22, No. 2, pp. 215-225, 1975.
Y.-T. Tsai, "The constrained longest common subsequence problem," Information
Processing Letters, Vol. 88, No. 4, pp. 173-176, 2003.
C.-T. Tseng, C.-B. Yang, and H.-Y. Ann, "Efficient algorithms for the longest
common subsequence problem with sequential substring constraints," Journal
of Complexity, Vol. 29, No. 1, pp. 44-52, Feb. 2013.
K.-T. Tseng, D.-S. Chan, and C.-B. Yang, "An efficient merged longest common
subsequence algorithm for similar sequences," Proceedings of the 20th World
Multi-Conference on Systemics, Cybernetics and Informatics, Vol. I, Orlando,
Florida, USA, pp. 93-98, July 2016.
R. A. Wagner and M. J. Fischer, "The string-to-string correction problem,"
Journal of the ACM, Vol. 21, No. 1, pp. 168-173, 1974.
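Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.
Thesis access permission: user-defined release time
Available:
Campus: available
Off-campus: available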


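Printed copies
Public-availability information for printed theses is relatively complete for academic year 102 and later. To look up public-availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available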
