Thesis/dissertation details: title page for etd-0020114-165412
Title
彈性最長共同子序列問題之演算法
Efficient Algorithms for the Flexible Longest Common Subsequence Problem
Department
Year, semester
Language
Degree
Number of pages
57
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2014-01-08
Date of Submission
2014-01-20
Keywords
Dominant Strategy, Flexible Longest Common Subsequence, Longest Common Subsequence, Dynamic Programming, Sequence Alignment
Statistics
This thesis/dissertation has been browsed 5726 times and downloaded 607 times.
Abstract
Given two sequences, the traditional longest common subsequence (LCS) problem is to find the common subsequence with the maximum number of matches, without considering whether the matched characters are contiguous. In many applications, however, an alignment with higher continuity is more meaningful than a sparse, fragmented one, even if it has slightly fewer matched characters. Accordingly, we define a new variant of the LCS problem, called the flexible longest common subsequence (FLCS) problem. In this thesis, we design a scoring function that measures the continuity of an alignment between two strings, and we develop efficient algorithms for solving the FLCS problem. We first propose a straightforward dynamic programming method, which requires O(n^3) time, where n denotes the length of the longer input sequence. Then, we apply the concept of dominant lists to reduce the redundant computation in each lattice cell. Finally, we propose an efficient algorithm that solves the FLCS problem in O(mn) time, where m and n denote the lengths of the two input sequences.
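For readers unfamiliar with the baseline that the abstract contrasts against, the following is a minimal sketch of the classic dynamic programming solution to the traditional LCS problem. It is not the thesis's FLCS algorithm, and it does not use the thesis's scoring function, which this record does not state. The function name `lcs_length` and the example strings are illustrative assumptions.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b.

    Classic textbook recurrence (not the FLCS method of the thesis):
    O(m * n) time, O(n) extra space, with m = len(a) and n = len(b).
    """
    n = len(b)
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (n + 1)
    for ch_a in a:
        curr = [0] * (n + 1)
        for j in range(1, n + 1):
            if ch_a == b[j - 1]:
                # A match extends the best alignment of both shorter prefixes.
                curr[j] = prev[j - 1] + 1
            else:
                # Otherwise drop the last character of one of the prefixes.
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[n]


if __name__ == "__main__":
    print(lcs_length("ABCBDAB", "BDCABA"))
```

Note that this recurrence only counts matches. It assigns the same value to a contiguous block of matches as to the same number of scattered ones, which is the limitation that motivates the FLCS variant described in the abstract.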
Table of Contents
LIST OF FIGURES
ABSTRACT
Chapter 1. Introduction
Chapter 2. Preliminaries
2.1 Notations
2.2 The Longest Common Subsequence Problem
2.3 Affine Gap Penalty
2.4 Dynamic Time Warping
Chapter 3. The Scoring Function
Chapter 4. The Algorithms for the FLCS
4.1 The Algorithm with Straightforward Dynamic Programming
4.2 The Algorithm with the Dominant Strategy
4.3 The Efficient Dominating Method
4.4 The O(mn)-time Algorithm
Chapter 5. Conclusion
BIBLIOGRAPHY
Appendices
A. The Intersection Point When γ = 4
B. The Intersection Point When γ = 5
References
[1] A. Abdulla-Al-Maruf, H.-H. Huang, and K. Kawagoe, "Time series classification method based on longest common subsequence and textual approximation," Proceeding of 2012 Seventh International Conference on Digital Information Management (ICDIM), pp. 130-137, 2012.
[2] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions: with Formulas, Graphs, and Mathematical Tables. Dover Publications, Inc., New York, 1970.
[3] S. Altschul and B. W. Erickson, "Optimal sequence alignment using affine gap costs," Journal of Molecular Biology, Vol. 48, pp. 603-616, 1986.
[4] H.-Y. Ann, C.-B. Yang, Y.-H. Peng, and B.-C. Liaw, "Efficient algorithms for the block edit problems," Information and Computation, Vol. 208(3), pp. 221-229, 2010.
[5] H.-Y. Ann, C.-B. Yang, C.-T. Tseng, and C.-Y. Hor, "A fast and simple algorithm for computing the longest common subsequence of run-length encoded strings," Information Processing Letters, Vol. 108, pp. 360-364, 2008.
[6] D. J. Berndt and J. Clifford, "Using dynamic time warping to find patterns in time series," KDD workshop, Vol. 10, pp. 359-370, Seattle, WA, 1994.
[7] K.-Y. Cheng, K.-S. Huang, and C.-B. Yang, "The longest common subsequence problem with the gapped constraint," Proc. of the 30th Workshop on Combinatorial Mathematics and Computation Theory, pp. 80-85, 2013.
[8] D. Clifford, G. Stone, I. Montoliu, S. Rezzi, F.-P. Martin, P. Guy, S. Bruce, and S. Kochhar, "Alignment using variable penalty dynamic time warping," Analytical Chemistry, Vol. 81, No. 3, pp. 1000-1007, 2009.
[9] A. Flores-Mendez and M. Bernal-Urbina, "Dynamic signature verification through the longest common subsequence problem and genetic algorithms," Evolutionary Computation (CEC), 2010 IEEE Congress on, pp. 1-6, 2010.
[10] D. S. Hirschberg, "A linear space algorithm for computing maximal common subsequences," Communications of the ACM, Vol. 18, pp. 341-343, 1975.
[11] K.-S. Huang, C.-B. Yang, and K.-T. Tseng, "Fast algorithms for finding the common subsequence of multiple sequences," Proceedings of International Computer Symposium, Taipei, Taiwan, p. 90 (abstract; full text on CD), 2004.
[12] K.-S. Huang, C.-B. Yang, K.-T. Tseng, H.-Y. Ann, and Y.-H. Peng, "Efficient algorithms for finding interleaving relationship between sequences," Information Processing Letters, Vol. 105(5), pp. 188-193, 2008.
[13] K.-S. Huang, C.-B. Yang, K.-T. Tseng, Y.-H. Peng, and H.-Y. Ann, "Dynamic programming algorithms for the mosaic longest common subsequence problem," Information Processing Letters, Vol. 102, pp. 99-103, 2007.
[14] J. W. Hunt and T. G. Szymanski, "A fast algorithm for computing longest common subsequences," Communications of the ACM, Vol. 20(5), pp. 350-353, 1977.
[15] C. S. Iliopoulos and M. S. Rahman, "Algorithms for computing variants of the longest common subsequence problem," Theoretical Computer Science, Vol. 395, pp. 255-267, 2008.
[16] C. S. Iliopoulos and M. S. Rahman, "New efficient algorithms for the LCS and constrained LCS problems," Information Processing Letters, Vol. 106(1), pp. 13-18, 2008.
[17] Y.-S. Jeong, M. K. Jeong, and O. A. Omitaomu, "Weighted dynamic time warping for time series classification," Pattern Recognition, Vol. 44, No. 9, pp. 2231-2240, 2011.
[18] D. R. Kincaid and E. W. Cheney, Numerical Analysis: Mathematics of Scientific Computing. American Mathematical Soc., third ed., 2002.
[19] Y. Namiki, T. Ishida, and Y. Akiyama, "Acceleration of sequence clustering using longest common subsequence filtering," BMC Bioinformatics, Vol. 14, No. Suppl 8, p. S7, 2013.
[20] M. Pawlik and N. Augsten, "RTED: a robust algorithm for the tree edit distance," Proceedings of the VLDB Endowment, Vol. 5, No. 4, pp. 334-345, 2011.
[21] Y.-H. Peng, C.-B. Yang, K.-S. Huang, C.-T. Tseng, and C.-Y. Hor, "Efficient sparse dynamic programming for the merged LCS problem with block constraints," International Journal of Innovative Computing, Information and Control, Vol. 6, pp. 1935-1947, 2010.
[22] Y.-H. Peng, C.-B. Yang, K.-S. Huang, and K.-T. Tseng, "An algorithm and applications to sequence alignment with weighted constraints," International Journal of Foundations of Computer Science, Vol. 21, pp. 51-59, 2010.
[23] I. Stewart, Galois theory. Chapman Hall/CRC Mathematics, third ed., 2003.
[24] H. Wang, "All common subsequences," International Joint Conference on Artificial Intelligence (IJCAI), pp. 635-640, 2007.
[25] W. Zhang, T. Yoshida, and X. Tang, "A comparative study of TF*IDF, LSI and multi-words for text classification," Expert Systems with Applications, Vol. 38, No. 3, pp. 2758-2765, 2011.
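Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast the text without authorization, so as to avoid violating the law.
Thesis access permission: user-defined availability date
Available:
Campus: available
Off-campus: available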


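Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To inquire about public-access information for printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available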
