博碩士論文 etd-0108115-142017 詳細資訊
Title page for etd-0108115-142017
論文名稱
Title
共同多重集區間的高效率演算法
Efficient Algorithms for the Common Multiset Interval Problem
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
47
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2015-01-26
繳交日期
Date of Submission
2015-02-08
關鍵字
Keywords
行為知識空間、共同多重集區間、大學能力程式檢定、最長共同子序列、共同區間、組合語言碼
Assembly Code, CI, LCS, CPE, BKS, CMI
統計
Statistics
本論文已被瀏覽 5696 次,被下載 395
The thesis/dissertation has been browsed 5696 times, has been downloaded 395 times.
中文摘要
對於兩序列 A = a_1a_2a_3 … a_m 以及 B = b_1b_2b_3 … b_n,一個多重集區間為 ∆(A, i, j) = [a_x | i ≤ x ≤ j]。共同多重集區間 (common multiset interval,CMI) 是同時出現在兩序列的多重集,即存在 i_A, j_A, i_B, j_B 使得 ∆(A, i_A, j_A) = ∆(B, i_B, j_B),其中 1 ≤ i_A ≤ j_A ≤ m 且 1 ≤ i_B ≤ j_B ≤ n。
先前,研究者推出了用以找到兩個排列(permutation)以及序列(sequence)的共同區間演算法。 在這篇碩士論文中,我們推出兩個用來在兩序列中找到共同多重集區間的演算法。第一個演算法是 occurrence counting 演算法,它計算出現元素在兩輸入序列中所有區間的次數並且計算元素出現次數的差值。它的時間複雜度是 O(n^3),n 代表輸入序列的長度。第二個演算法是 hash key 演算法,使用質數乘積以及模運算來建立哈希表以便加快搜尋。第二個演算法的時間複雜度是 O(n^2 + Gn + qn) 或 O(n^2|Σ| + G|Σ| + q|Σ|),G 代表答案的數量而 q 代表錯誤碰撞的數量。在我們的實驗中,我們使用 CPE (Collegiate Programming Examination of Taiwan) 中的 C/C++ 程式碼作為我們分類用的資料集。實驗結果顯示 BKS (behavior knowledge space) 混合 LCS (longest common subsequence) 和 CMI 可以得到比兩個方法單獨使用還要高的準確度。
Abstract
For two sequences A = a_1a_2a_3 … a_m and B = b_1b_2b_3 … b_n, the multiset interval of A from position i to position j is ∆(A, i, j) = [a_x | i ≤ x ≤ j]. A common multiset interval (CMI) is a multiset that appears as an interval in both sequences, that is, ∆(A, i_A, j_A) = ∆(B, i_B, j_B) for some i_A, j_A, i_B, j_B with 1 ≤ i_A ≤ j_A ≤ m and 1 ≤ i_B ≤ j_B ≤ n.
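As a minimal illustration of the definition above, the following Python sketch represents a multiset interval as a `collections.Counter` and tests whether two intervals are equal as multisets. The function names and the example strings are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def multiset_interval(seq, i, j):
    """Return the multiset interval of seq for 1-based inclusive positions i..j."""
    return Counter(seq[i - 1:j])


def is_common_multiset_interval(a, ia, ja, b, ib, jb):
    """True if the two intervals hold the same symbols with the same multiplicities."""
    return multiset_interval(a, ia, ja) == multiset_interval(b, ib, jb)


# Hypothetical example sequences (not from the thesis).
A = "abcab"
B = "bacba"
# Compares positions 1..3 of A ("abc") with positions 2..4 of B ("acb").
print(is_common_multiset_interval(A, 1, 3, B, 2, 4))
```

Symbol order inside an interval is ignored, but multiplicities are not, which is what distinguishes a multiset interval from a set interval.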
Previously, researchers have proposed algorithms for finding the common set intervals of permutations and of sequences. In this thesis, we propose two algorithms for finding the common multiset intervals of two sequences. The first is the occurrence counting algorithm, which counts the occurrences of each character in all intervals of the two input sequences and calculates the differences between the character occurrences. Its time complexity is O(n^3), where n denotes the length of the input sequences. The second is the hash key algorithm, which uses products of prime numbers and the modulo operation to build a hash table for quick search. The time complexity of the second algorithm is O(n^2 + Gn + qn) or O(n^2|Σ| + G|Σ| + q|Σ|), where G denotes the number of answers and q denotes the number of erroneous collisions. In our experiments, we use C/C++ source code from the CPE (Collegiate Programming Examination of Taiwan) as the data set for classification. The experimental results show that the BKS (behavior knowledge space) method, combining the LCS (longest common subsequence) and CMI classifiers, achieves better accuracy than either classifier alone.
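The two ideas in the abstract can be sketched in Python as follows. This is an illustration based only on the abstract's description, not the thesis's actual implementation: the function names, the default modulus, the assignment of primes to symbols, and the use of `Counter` to verify candidate matches are all assumptions. In particular, the verification step here is simpler than whatever the thesis does, so the sketch is not claimed to meet the stated time bounds.

```python
from collections import Counter, defaultdict


def cmi_by_occurrence_difference(a, b):
    """Enumerate common multiset intervals by tracking per-symbol count differences."""
    matches = []
    for ia in range(len(a)):
        for ib in range(len(b)):
            diff = defaultdict(int)  # count in A's interval minus count in B's interval
            nonzero = 0              # number of symbols whose difference is not zero
            k = 0
            while ia + k < len(a) and ib + k < len(b):
                for symbol, delta in ((a[ia + k], 1), (b[ib + k], -1)):
                    if diff[symbol] == 0:
                        nonzero += 1
                    diff[symbol] += delta
                    if diff[symbol] == 0:
                        nonzero -= 1
                if nonzero == 0:  # equal multisets; report 1-based positions
                    matches.append((ia + 1, ia + k + 1, ib + 1, ib + k + 1))
                k += 1
    return matches


def first_primes(count):
    """Return the first `count` primes by trial division (fine for small alphabets)."""
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def interval_keys(seq, prime_of, modulus):
    """Yield (key, i, j) for every interval; key is the product of symbol primes mod modulus."""
    for i in range(len(seq)):
        key = 1
        for j in range(i, len(seq)):
            key = (key * prime_of[seq[j]]) % modulus
            yield key, i + 1, j + 1


def cmi_by_hash_key(a, b, modulus=1_000_003):
    """Return (matches, false_collisions); the modulus value is an arbitrary assumption."""
    alphabet = sorted(set(a) | set(b))
    prime_of = dict(zip(alphabet, first_primes(len(alphabet))))
    table = defaultdict(list)
    for key, i, j in interval_keys(a, prime_of, modulus):
        table[key].append((i, j))
    matches, false_collisions = [], 0
    for key, ib, jb in interval_keys(b, prime_of, modulus):
        for ia, ja in table.get(key, ()):
            # Equal keys only suggest equal multisets, so each candidate is verified.
            if Counter(a[ia - 1:ja]) == Counter(b[ib - 1:jb]):
                matches.append((ia, ja, ib, jb))
            else:
                false_collisions += 1
    return matches, false_collisions


# Hypothetical example sequences (not from the thesis).
matches, false_collisions = cmi_by_hash_key("abcab", "bacba")
print(len(matches), false_collisions)
```

Because equal multisets always yield equal prime products, the hash key never misses a match; a small modulus only raises the number of false collisions that must be filtered out, which corresponds to the q term in the stated complexity.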
目次 Table of Contents
Thesis Approval Form (Chinese) i
Thesis Approval Form (English) ii
Acknowledgments iii
Chinese Abstract iv
English Abstract v
TABLE OF CONTENTS vii
LIST OF FIGURES viii
LIST OF TABLES ix
Chapter 1. Introduction 1
1.1 Definitions 1
Chapter 2. Previous Works 3
2.1 The Common Set Intervals of Two Permutations 3
2.1.1 Algorithm 1 of Uno and Yagiura 4
2.1.2 Algorithm 2 of Uno and Yagiura 4
2.1.3 Algorithm 3 of Uno and Yagiura 6
2.1.4 Algorithm 4 of Uno and Yagiura 7
2.2 The Common Set Intervals of k Permutations 7
2.3 The Common Set Intervals of Two Sequences 8
2.4 The Common Set Intervals of k Sequences 10
2.5 The Longest Common Subsequence Problem 11
2.6 The Behavior Knowledge Space Method 11
Chapter 3. Algorithms for the Common Multiset Interval Problem 13
3.1 The Occurrence Counting Algorithm 13
3.2 The Hash Key Algorithm 14
Chapter 4. Experimental Results 26
4.1 Assembly Process 26
4.2 BKS with LCS and CMI 28
Chapter 5. Conclusions 31
BIBLIOGRAPHY 32
參考文獻 References
[1] H.-Y. Ann, C.-B. Yang, C.-T. Tseng, and C.-Y. Hor, “A fast and simple algorithm for computing
the longest common subsequence of run-length encoded strings,” Information Processing
Letters, Vol. 108, pp. 360–364, 2008.
[2] M.-P. Béal, A. Bergeron, S. Corteel, and M. Raffinot, “An algorithmic view of gene teams,”
Theoretical Computer Science, Vol. 320, No. 2-3, pp. 395–418, 2004.
[3] K.-Y. Cheng, K.-S. Huang, and C.-B. Yang, “The longest common subsequence problem
with the gapped constraint,” Proc. of the 30th Workshop on Combinatorial Mathematics and
Computation Theory, pp. 80–85, 2013.
[4] M. Clauss, M. Bernt, and M. Middendorf, “A common interval guided ACO algorithm for
permutation problems,” 2013 IEEE Symposium on Swarm Intelligence (SIS), pp. 64–71, 2013.
[5] G. Didier, “Common intervals of two sequences,” Algorithms in Bioinformatics, Vol. 2812,
pp. 17–24, 2003.
[6] M. Y. Galperin and E. V. Koonin, “Who’s your neighbor? New computational approaches for
functional genomics,” Nature Biotechnology, Vol. 18, No. 6, pp. 609–613, 2000.
[7] S. Heber and J. Stoye, “Finding all common intervals of k permutations,” In Combinatorial
Pattern Matching, 12th Annual Symposium, CPM 2001, pp. 207–218, Springer Verlag, 2001.
[8] D. S. Hirschberg, “A linear space algorithm for computing maximal common subsequences,”
Communications of the ACM, Vol. 18, pp. 341–343, 1975.
[9] K.-S. Huang, C.-B. Yang, and K.-T. Tseng, “Fast algorithms for finding the common subsequence
of multiple sequences,” Proceedings of International Computer Symposium, Taipei,
Taiwan, p. 90 (abstract; full text on CD), 2004.
[10] K.-S. Huang, C.-B. Yang, K.-T. Tseng, H.-Y. Ann, and Y.-H. Peng, “Efficient algorithms
for finding interleaving relationship between sequences,” Information Processing Letters,
Vol. 105(5), pp. 188–193, 2008.
[11] K.-S. Huang, C.-B. Yang, K.-T. Tseng, Y.-H. Peng, and H.-Y. Ann, “Dynamic programming
algorithms for the mosaic longest common subsequence problem,” Information Processing
Letters, Vol. 102, pp. 99–103, 2007.
[12] Y. Huang and C. Suen, “The behavior-knowledge space method for combination of multiple
classifiers,” Proceedings of the 1993 IEEE Computer Society Conference on Computer Vision
and Pattern Recognition (CVPR ’93), pp. 347–352, Jun. 1993.
[13] J. W. Hunt and T. G. Szymanski, “A fast algorithm for computing longest common subsequences,”
Communications of the ACM, Vol. 20(5), pp. 350–353, 1977.
[14] C. S. Iliopoulos and M. S. Rahman, “Algorithms for computing variants of the longest common
subsequence problem,” Theoretical Computer Science, Vol. 395, pp. 255–267, 2008.
[15] C. S. Iliopoulos and M. S. Rahman, “New efficient algorithms for the LCS and constrained
LCS problems,” Information Processing Letters, Vol. 106(1), pp. 13–18, 2008.
[16] W. C. Lathe, B. Snel, and P. Bork, “Gene context conservation of a higher order than operons,”
Trends in Biochemical Sciences, Vol. 25, No. 10, pp. 474–479, 2000.
[17] N. Luc, J.-L. Risler, A. Bergeron, and M. Raffinot, “Gene teams: a new formalization of gene
clusters for comparative genomics,” Computational Biology and Chemistry, Vol. 27, No. 1,
pp. 59–67, 2003.
[18] R. Overbeek, M. Fonstein, M. D’Souza, G. D. Pusch, and N. Maltsev, “The use of gene
clusters to infer functional coupling,” Proceedings of the National Academy of Sciences of the
United States of America, Vol. 96, No. 6, pp. 2896–2901, 1999.
[19] R. J. Parikh, “On context-free languages,” J. ACM, Vol. 13, No. 4, pp. 570–581, Oct. 1966.
[20] Y.-H. Peng, C.-B. Yang, K.-S. Huang, C.-T. Tseng, and C.-Y. Hor, “Efficient sparse dynamic
programming for the merged LCS problem with block constraints,” International Journal of
Innovative Computing, Information and Control, Vol. 6, pp. 1935–1947, 2010.
[21] Y.-H. Peng, C.-B. Yang, K.-S. Huang, and K.-T. Tseng, “An algorithm and applications
to sequence alignment with weighted constraints,” International Journal of Foundations of
Computer Science, Vol. 21, pp. 51–59, 2010.
[22] I. Rusu, “Extending common intervals searching from permutations to sequences,” The Computing
Research Repository, Vol. abs/1310.4290, 2013.
[23] T. Schmidt and J. Stoye, “Quadratic time algorithms for finding common intervals in two and
more sequences,” In Proceedings of the 15th Annual Symposium on Combinatorial Pattern
Matching, CPM 2004, pp. 347–358, Springer, 2004.
[24] I. Stewart, Galois theory. Chapman Hall/CRC Mathematics, third ed., 2003.
[25] J. Tamames, “Evolution of gene order conservation in prokaryotes,” Genome Biology, Vol. 2,
No. 6, pp. research0020.1–research0020.11, 2001.
[26] J. Tamames, G. Casari, C. Ouzounis, and A. Valencia, “Conserved clusters of functionally
related genes in two bacterial genomes,” Journal of Molecular Evolution, Vol. 44, No. 1,
pp. 66–73, Jan. 1997.
[27] T. Uno and M. Yagiura, “Fast algorithms to enumerate all common intervals of two permutations,”
Algorithmica, Vol. 26, No. 2, pp. 290–309, 2000.
電子全文 Fulltext
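This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.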
論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
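Public-access information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.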
開放時間 available 已公開 available
