Thesis/dissertation etd-0031116-004017: detailed information
Title page for etd-0031116-004017
Title
二維最大共同子結構問題之定義與計算
The Definitions and Computation of the Two Dimensional Largest Common Substructure Problems
Department
Year, semester
Language
Degree
Number of pages
98
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2016-01-26
Date of Submission
2016-01-31
Keywords
Longest Common Subsequence, Similarity, NP-hard, Matrices, Integer Linear Programming, Heuristic Algorithm
Statistics
This thesis/dissertation has been browsed 5858 times and downloaded 399 times.
Abstract
The traditional longest common subsequence (LCS) problem is to find the maximum number of order-preserving matches between two sequences. The similarity of two one-dimensional sequences can be measured with LCS algorithms, which have been studied extensively. Computing the similarity of two-dimensional data with an LCS-like approach, however, remains worth investigating. In this thesis, we draw on the traditional LCS concept to give a more general definition of the two-dimensional largest common substructure (TLCS) problem. Using different matching rules, we define four versions of the TLCS problem. We also show, by a different reduction, that two of these TLCS problems are NP-hard. We then develop methods for solving the TLCS problem: we first provide two integer linear programming formulations, and then devise two heuristic algorithms that efficiently find suboptimal solutions.
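For readers unfamiliar with the one-dimensional problem that the abstract starts from, the following is a minimal sketch of the textbook dynamic-programming computation of the LCS length. It is not taken from the thesis and does not implement any of its TLCS methods; the function name `lcs_length` and the rolling-row layout are illustrative assumptions.

```python
def lcs_length(a, b):
    """Return the length of a longest common subsequence of sequences a and b.

    Textbook dynamic programming in O(len(a) * len(b)) time, keeping only
    one previous row of the table. Illustrative sketch, not the thesis's code.
    """
    # prev[j] = LCS length of the already-processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]  # LCS against the empty prefix of b is always 0
        for j, y in enumerate(b, start=1):
            if x == y:
                # Matching symbols extend the LCS of both shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop the last symbol of one sequence or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

For example, `lcs_length("ABCBDAB", "BDCABA")` evaluates to 4 (one such subsequence is "BCBA"). The two-dimensional problems defined in the thesis generalize this notion of order-preserving matches from sequences to matrices, where, according to the abstract, two of the four variants become NP-hard.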
Table of Contents
VERIFICATION FORM
THESIS AUTHORIZATION FORM
ACKNOWLEDGMENTS
ABSTRACT
ENGLISH ABSTRACT
LIST OF FIGURES
LIST OF TABLES
1 Introduction
2 Preliminary
2.1 Notations
2.2 The Longest Common Subsequence Problem
2.3 The k-Clique Problem
2.4 Matrix Representation
2.5 Integer Linear Programming
2.6 The Picture Retrieval Problem
2.7 The Two-Dimensional Largest Common Substructure Problem
3 Problem Definitions and Proofs of NP-hardness
3.1 Operator Definitions
3.2 Problem Definitions
3.3 Proofs of NP-hardness for P(ENE) and P(ENL)
4 Algorithms for the TLCS Problems
4.1 Algorithm for Optimal Solutions in P(LNA)
4.2 Integer Linear Programming for TLCS Problem
4.2.1 ILP with a Straightforward Method
4.2.2 ILP with the Matching Set Method
4.3 Heuristic Algorithms
4.3.1 The First Heuristic Algorithm for P(ENE)
4.3.2 The Second Heuristic Algorithm for P(ENE)
4.4 Experimental Results
5 Conclusion
BIBLIOGRAPHY
A The pyomo code of the ILP formulas for P(ENL)
A.1 The Straightforward Method
A.2 The Matching Set Method
B The pseudo code of ILP formulas for P(ENE)
C The pyomo code of ILP formulas for P(ENE)
C.1 The Straightforward Method
C.2 The Matching Set Method
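Fulltext
This electronic full text is licensed to users solely for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.
Thesis access permission: user-defined release time
Available:
Campus: available
Off-campus: available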


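Printed copies
Public information on printed theses is relatively complete from academic year 102 onward. To look up public information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available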
