Thesis/Dissertation etd-0527113-121634: Details
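Title
The Transaction-ID-Time Algorithm for Mining Time-Interval Sequential Patterns in Sequence Databases
Chinese title (translated): A Method for Mining Time-Interval Sequential Patterns in Sequence Databases Using Transaction IDs and Time Values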
Department
Year, semester
Language
Degree
Number of pages
94
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2013-06-19
Date of Submission
2013-06-27
Keywords
Data mining, Time-interval sequential patterns, Sequential patterns, Terminal edge, Depth-first search
Statistics
This thesis has been viewed 5705 times and downloaded 152 times.
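Chinese Abstract (translated)
Using data mining techniques, we can extract rich information implicit in large databases, including previously unknown and potentially useful information. The information and knowledge discovered in large databases are very useful and can be applied widely, including in market analysis, decision support, flaw detection, and business management. Many researchers have proposed methods for mining information from databases, and sequential pattern mining is one of the most important. A typical sequential pattern: a customer buys a computer and, shortly afterward, returns to buy a scanner and a microphone. Although this type of sequential pattern records the order in which items were bought, it gives no information about the time intervals between them. An extension was therefore proposed, called the time-interval sequential pattern, which records not only the order of the items but also the time interval between each pair of successive items. A well-known algorithm for mining time-interval sequential patterns is OGST. It builds an oriented graph from all time-interval sequential patterns of length 2, then searches this graph depth-first to find all time-interval sequential patterns. However, OGST wastes space storing the position of each item in a sequence and also searches many unnecessary paths, so its performance can be improved. Moreover, we found that OGST produces some incorrect results and misses some sequential patterns. To avoid these problems, we propose the more efficient TIDT algorithm for finding time-interval sequential patterns. We propose a revised oriented graph that records additional useful information (Sid, Stime, Etime). With this information, the search can avoid unnecessary paths. Based on the revised oriented graph, the proposed TIDT algorithm finds time-interval sequential patterns more efficiently. We also generate terminal edges that record information about paths already searched. Dynamically modifying the original graph through terminal edges avoids searching some paths repeatedly. Simulation results show that the proposed TIDT algorithm is more efficient than OGST.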
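The following minimal Python sketch shows what a large 2-time-interval sequence is. It is not the thesis's TIDT or OGST code. The interval labels `I0` to `I3`, their bounds (in arbitrary time units), and the names `interval_of` and `large_2_time_interval_sequences` are hypothetical. The sketch assumes each data sequence is a time-ordered list of `(item, time)` pairs keyed by sequence id. It counts each `(a, interval, b)` pattern at most once per sequence and calls a pattern "large" when at least `min_support` sequences contain it.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical interval ranges: I0 is a gap of exactly 0; each other label
# covers gaps in (previous bound, bound], in arbitrary time units.
BOUNDS = [("I1", 3), ("I2", 7), ("I3", float("inf"))]


def interval_of(gap):
    """Return the label of the interval range containing a time gap."""
    if gap < 0:
        raise ValueError(f"negative time gap: {gap}")
    if gap == 0:
        return "I0"
    for label, high in BOUNDS:
        if gap <= high:
            return label


def large_2_time_interval_sequences(db, min_support):
    """Return {(a, interval, b): support} for patterns found in at least
    min_support data sequences.

    db maps a sequence id (Sid) to a time-ordered list of (item, time) pairs.
    Each pattern is counted at most once per sequence.
    """
    counts = defaultdict(int)
    for seq in db.values():
        found = set()
        # combinations keeps list order, so a always precedes b in the sequence
        for (a, ta), (b, tb) in combinations(seq, 2):
            found.add((a, interval_of(tb - ta), b))
        for pattern in found:
            counts[pattern] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


db = {
    1: [("computer", 0), ("scanner", 2), ("microphone", 2)],
    2: [("computer", 0), ("scanner", 3)],
    3: [("computer", 5), ("microphone", 20)],
}
print(large_2_time_interval_sequences(db, min_support=2))
# -> {('computer', 'I1', 'scanner'): 2}
```

In this toy data, "computer, then scanner within 1 to 3 time units" is the only 2-time-interval sequence supported by two customers. The other pairs occur in one sequence each, or fall into a different interval range.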
Abstract
Data mining extracts implicit, previously unknown, and potentially useful information from databases. The discovered information and knowledge are useful for various applications, including market analysis, decision support, flaw detection, and business management. Many approaches have been proposed to extract such information, and sequential pattern mining is one of the most important. A typical example of a sequential pattern is a customer who buys a computer and later returns to buy a scanner and a microphone. Although this kind of sequential pattern captures the order of the items, the time between the items is unknown. An extension of sequential patterns, called time-interval sequential patterns, was therefore proposed. These patterns reveal not only the order of the items but also the time intervals between successive items. The OGST algorithm was designed specifically to find time-interval sequential patterns. It uses the large 2-time-interval sequences to construct an oriented graph, then traverses the graph depth-first to find all large time-interval sequences. However, the OGST algorithm wastes space storing the position of item i in each sequence and also traverses many unnecessary paths, so its performance can be improved. Moreover, we have found several incorrect results and missing sequences produced by the OGST algorithm. To avoid these problems, this thesis proposes the TIDT algorithm to find time-interval sequential patterns efficiently. We propose a revised version of the oriented graph that records additional information (Sid, Stime, Etime). Based on this revised oriented graph, the proposed TIDT algorithm efficiently finds the time-interval sequential patterns. We also use terminal edges to reduce the search cost: a terminal edge records a path that has already been traversed, so the same path need not be traversed again. Our performance study on synthetic data shows that the proposed TIDT algorithm is more efficient than the OGST algorithm.
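The following sketch illustrates depth-first search over an oriented graph whose edges carry `(Sid, Stime, Etime)` information. It is a hedged illustration of the general idea, not the thesis's TIDT algorithm. The edge layout is an assumption: one list of `(sid, stime, etime)` occurrences per large 2-time-interval sequence, with `stime` and `etime` the times of the edge's first and second item. The name `mine_patterns` is also hypothetical, and terminal edges are not modeled. A path is extended only where an occurrence of the next edge continues the same sequence at the time the path ends. Paths without enough supporting sequences are therefore pruned rather than explored.

```python
from collections import defaultdict


def mine_patterns(edges, min_support):
    """Depth-first search over an oriented graph of large 2-time-interval
    sequences (a hypothetical layout, not the thesis's SIRG).

    edges maps (a, interval, b) to a list of (sid, stime, etime) occurrences,
    where stime and etime are the times of a and b in sequence sid.
    Assumes occurrences come from position pairs i < j and that an item occurs
    at most once per timestamp within a sequence, so each extension moves to a
    later position and the search terminates.
    Returns {pattern: support}; support counts distinct sids.
    """
    # Adjacency list: item -> outgoing (interval, next_item, occurrences).
    out = defaultdict(list)
    for (a, interval, b), occ in edges.items():
        out[a].append((interval, b, occ))

    results = {}

    def dfs(pattern, frontier):
        # frontier: set of (sid, time) at which the pattern's last item occurs
        for interval, nxt, occ in out[pattern[-1]]:
            # Keep only occurrences that start where some match of the
            # current pattern ends (same sequence, same time).
            new_frontier = {(sid, etime) for sid, stime, etime in occ
                            if (sid, stime) in frontier}
            support = len({sid for sid, _ in new_frontier})
            if support >= min_support:  # otherwise this path is pruned
                longer = pattern + (interval, nxt)
                results[longer] = support
                dfs(longer, new_frontier)

    for (a, interval, b), occ in edges.items():
        support = len({sid for sid, _, _ in occ})
        if support >= min_support:
            results[(a, interval, b)] = support
            dfs((a, interval, b), {(sid, etime) for sid, _, etime in occ})
    return results


edges = {
    ("computer", "I1", "scanner"): [(1, 0, 2), (2, 0, 3)],
    ("scanner", "I2", "microphone"): [(1, 2, 6), (2, 3, 8), (3, 1, 5)],
}
print(mine_patterns(edges, min_support=2))
# -> {('computer', 'I1', 'scanner'): 2, ('computer', 'I1', 'scanner', 'I2', 'microphone'): 2, ('scanner', 'I2', 'microphone'): 3}
```

Because each edge keeps its Sid and times, support for a longer path comes from joining occurrence sets rather than rescanning the database. Sequence 3 supports the scanner-to-microphone edge but not the three-item path, since its scanner purchase is not preceded by a computer.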
Table of Contents
LIST OF FIGURES
LIST OF TABLES
1. Introduction
1.1 Mining Association Rules
1.2 The Repeating Patterns
1.2.1 Non-Trivial Repeating Patterns
1.2.2 Polyphonic Repeating Patterns
1.2.3 Maximum-Length Repeating Patterns
1.3 Mining Sequential Patterns
1.3.1 Sequential Patterns without Time-Interval
1.3.2 Sequential Patterns with Time-Interval
1.3.3 Practical Applications
1.4 Motivation
1.5 Organization of the Thesis
2. A Survey of Algorithms for Mining Sequential Patterns
2.1 The Apriori Algorithm
2.2 The M2P Algorithm
2.3 The I-Apriori Algorithm
2.4 The OGST Algorithm
2.4.1 Definitions
2.4.2 The OGST Algorithm
3. The Transaction-ID-Time Algorithm
3.1 Definitions
3.2 The Proposed Algorithm
3.2.1 Step 1: Scan the Database to Produce Large 1-Sequences (L1)
3.2.2 Step 2: Produce Large 2-Time-Interval Sequences (LT2) from IT Table
3.2.3 Step 3: Construct a SIRG (Sid Item Relation Graph)
3.2.4 Step 4: Search the SIRG to Produce the Large Sequences and Sequential Patterns
3.3 A Comparison
4. Performance
4.1 Generation of Synthetic Data
4.2 Simulation Results
4.2.1 Time Scalability
4.2.2 Sensitivity to Parameters
5. Conclusion
5.1 Summary
5.2 Future Work
BIBLIOGRAPHY
References
[1] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” Proc. of the 20th Int. Conf. on Very Large Data Bases, pp. 487–499, 1994.
[2] R. Agrawal and R. Srikant, “Mining Sequential Patterns,” Proc. of the 11th Int. Conf. on Data Engineering, pp. 3–14, 1995.
[3] F. Ao, Y. Yan, J. Huang, and K. Huang, “Mining Maximal Frequent Itemsets in Data Streams Based on FP Tree,” Proc. of the 5th Int. Conf. on Machine Learning and Data Mining in Pattern Recognition, pp. 479–489, 2007.
[4] C. I. Chang, H. E. Chueh, and N. P. Lin, “Sequential Patterns Mining with Fuzzy Time-Intervals,” Proc. of the 6th Int. Conf. on Fuzzy Systems and Knowledge Discovery, pp. 165–169, 2009.
[5] C. I. Chang, H. E. Chueh, and Y. C. Luo, “An Integrated Sequential Patterns Mining with Fuzzy Time-Intervals,” Proc. of Int. Conf. on Systems and Informatics, pp. 2294–2298, 2012.
[6] Y. I. Chang, C. E. Li, and T. H. Chen, “A Position-Join Method for Mining Maximum-Length Repeating Patterns in Music Databases,” Proc. of National Computer Symposium, pp. 1–12, 2011.
[7] J. Chen, “An Updown Directed Acyclic Graph Approach for Sequential Pattern Mining,” IEEE Trans. on Knowledge and Data Engineering, Vol. 22, No. 7, pp. 913–928, 2010.
[8] Y. L. Chen, M. C. Chiang, and M. T. Ko, “Discovering Time-Interval Sequential Patterns in Sequence Databases,” Expert Systems with Applications, Vol. 25, No. 3, pp. 343–354, 2003.
[9] Y. L. Chen and T. C.-K. Huang, “Discovering Fuzzy Time-Interval Sequential Patterns in Sequence Databases,” IEEE Trans. on Systems, Man, and Cybernetics, Vol. 35, No. 5, pp. 959–972, 2005.
[10] S. Chiu, M. Shan, J. Huang, and H. Li, “Mining Polyphonic Repeating Patterns from Music Data Using Bit-String Based Approaches,” Proc. of IEEE Int. Conf. on Multimedia and Expo, pp. 1170–1173, 2009.
[11] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” ACM SIGMOD Record, Vol. 29, No. 2, pp. 1–12, 2000.
[12] J. L. Hsu, C. C. Liu, and A. L. P. Chen, “Efficient Repeating Pattern Finding in Music Databases,” Proc. of the 7th Int. Conf. on Information and Knowledge Management, pp. 281–288, 1998.
[13] J. L. Hsu, C. C. Liu, and A. L. P. Chen, “Discovering Non-trivial Repeating Patterns in Music Data,” IEEE Trans. on Multimedia, Vol. 3, No. 3, pp. 311–325, 2001.
[14] Y. H. Hu, T. C.-K. Huang, H. R. Yang, and Y. L. Chen, “On Mining Multi-Time-Interval Sequential Patterns,” Data and Knowledge Engineering, Vol. 68, No. 10, pp. 1112–1127, 2009.
[15] Y. H. Hu, F. Wu, and C. I. Yang, “Mining Multi-Level Time-Interval Sequential Patterns in Sequence Databases,” Proc. of the 2nd Int. Conf. on Software Engineering and Data Mining, pp. 416–421, 2010.
[16] C. R. Ji and Z. H. Deng, “Mining Frequent Ordered Patterns without Candidate Generation,” Proc. of the 4th Int. Conf. on Fuzzy Systems and Knowledge Discovery, pp. 402–406, 2007.
[17] S. Joshi and R. Jain, “A Dynamic Approach for Frequent Pattern Mining Using Transposition of Database,” Proc. of the 2nd Int. Conf. on Communication Software and Networks, pp. 498–501, 2010.
[18] I. Karydis, A. Nanopoulos, and Y. Manolopoulos, “Finding Maximum-Length Repeating Patterns in Music Databases,” Multimedia Tools and Applications, Vol. 32, No. 1, pp. 49–71, 2006.
[19] H. Li and N. Zhang, “Mining Maximal Frequent Itemsets Over a Stream Sliding Window,” Proc. of Information Computing and Telecommunications, pp. 110–113, 2010.
[20] K. C. Lin, I. E. Liao, and Z. S. Chen, “An Improved Frequent Pattern Growth Method for Mining Association Rules,” Expert Systems with Applications, Vol. 38, No. 5, pp. 5154–5161, 2011.
[21] J. Liu, S. Yan, and J. Ren, “The Design of Storage Structure for Sequence in Incremental Sequential Patterns Mining,” Proc. of the 6th Int. Conf. on Networked Computing and Advanced Information Management, pp. 330–334, 2010.
[22] J. Liu, S. Yan, and J. Ren, “The Design of Frequent Sequence Tree in Incremental Mining of Sequential Patterns,” Proc. of IEEE Int. Conf. on Software Engineering and Service Science, pp. 679–682, 2011.
[23] E. H. C. Lu, V. S. Tseng, and P. S. Yu, “Mining Cluster-Based Temporal Mobile Sequential Patterns in Location-Based Service Environments,” IEEE Trans. on Knowledge and Data Engineering, Vol. 23, No. 6, pp. 914–927, 2011.
[24] G. Mao, X. Wu, X. Zhu, G. Chen, and C. Liu, “Mining Maximal Frequent Itemsets from Data Streams,” Information Sciences, Vol. 33, No. 3, pp. 251–262, 2007.
[25] D. Perera, J. Kay, I. Koprinska, K. Yacef, and O. R. Zaiane, “Clustering and Sequential Pattern Mining of Online Collaborative Learning Data,” IEEE Trans. on Knowledge and Data Engineering, Vol. 21, No. 6, pp. 759–772, 2009.
[26] J. M. Ren and J. R. Jang, “Discovering Time-Constrained Sequential Patterns for Music Genre Classification,” IEEE Trans. on Audio, Speech, and Language Processing, Vol. 20, No. 4, pp. 1134–1144, 2012.
[27] S. Tanbeer, C. Ahmed, B. Jeong, and Y. Lee, “Efficient Single-Pass Frequent Pattern Mining Using a Prefix-Tree,” Information Sciences, Vol. 179, No. 5, pp. 559–583, 2009.
[28] L. Wang and J. Liu, “Using Oriented Graph to Discover Time-Interval Sequential Patterns,” Proc. of Int. Conf. on Computer Science and Software Engineering, pp. 634–637, 2008.
[29] S. J. Yen, Y. S. Lee, C. K. Wang, and J. W. Wu, “An Efficient Approach for Mining Frequent Patterns Based on Traversing a Frequent Pattern Tree,” Proc. of Int. Conf. on Computer Science and Software Engineering, pp. 354–357, 2008.
[30] W. Zhang, H. Liao, and N. Zhao, “Research on the FP Growth Algorithm about Association Rule Mining,” Proc. of the 10th Int. Conf. on Business and Information Management, pp. 315–318, 2008.
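Fulltext
This electronic full text is licensed to users only for personal, non-commercial searching, reading, and printing for academic research. Please observe the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as not to violate the law.
Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available


Printed copies
Public-access information for printed theses is more complete from academic year 102 (ROC calendar) onward. For public-access information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Availability: available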
