論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available
論文名稱 Title |
ㄧ個以交易序號及時間值來探勘序列資料庫中之時間間隔序列樣式的方法 The Transation-ID-Time Algorithm for Mining Time-Interval Sequential Patterns in Sequence Databases |
||
系所名稱 Department |
|||
畢業學年期 Year, semester |
語文別 Language |
||
學位類別 Degree |
頁數 Number of pages |
94 |
|
研究生 Author |
|||
指導教授 Advisor |
|||
召集委員 Convenor |
|||
口試委員 Advisory Committee |
|||
口試日期 Date of Exam |
2013-06-19 |
繳交日期 Date of Submission |
2013-06-27 |
關鍵字 Keywords |
深度優先搜尋、資料探勘、序列樣式、終端邊、時間間隔序列樣式 Data mining, Time-interval sequential patterns, Sequential patterns, Terminal edge, Depth-first search |
||
統計 Statistics |
本論文已被瀏覽 5705 次,被下載 152 次 The thesis/dissertation has been browsed 5705 times, has been downloaded 152 times. |
中文摘要 |
我們可以從大型資料庫中利用資料探勘技術取出一些資料庫內含的豐富資訊,包括一些先前未知的訊息,可能潛在有用的訊息。從大型資料庫中所發現的訊息和知識是非常有用的,它可管範的利用在各種應用,包括市場分析,決策支持,缺點發掘和企業管理等。在這之前許多學者提出了眾多方法來挖掘資料庫中的訊息,而序列樣式的挖掘是其中重要的方法之一。舉一個典型序列樣式的例子,當有一個顧客買了一台電腦後,過不久又返回購買一個掃描器和麥克風。儘管這種類型的序列樣式記錄了每個項目的購買順序,但卻無法提供我們每個項目間的時間間隔資訊。因此後來,一個延伸的新序列樣式被提出,叫時間間隔序列樣式,時間間隔序列樣式不但記錄了每個項目的順序,而且也提供了每個連續項目間的時間間隔資訊。挖掘時間間隔序列樣式有一個非常著名的OGST 演算法,OGST 演算法主要是利用所有長度為2 的時間間隔序列樣式來建構一個有方向性的graph。接著,OGST 演算法利用深度優先搜尋的方法去搜尋此graph,並找出所有時間間隔序列樣式。儘管如此,在一個序列中OGST 演算法不但浪費空間去儲存每個項目的位置,而且還搜尋了許多不需要的路徑。因此OGST 演算法的效能可被改善。而且,我們發現在OGST 演算法中會出現一些不正確的結果和一些遺失的序列樣式。為了避免OGST 演算法這些問題,我們提出了更有效率的TIDT演算法找出時間間隔序列樣式。我們提出針對有方向性的graph 修正版,將有方向性的graph 再多加一些有用的資訊(Sid, Stime, Etime)。有這些新加入資訊後,我們可避免搜尋到不需要的路徑。基於修正有方向性的graph 後,我們利用提出的TIDT 演算法便能更有效率的找出時間間隔序列樣式。我們也利用產生終端邊並且記錄曾經搜尋過的路徑資訊。動態地透過終端邊來修改原有graph,可以避免在搜尋graph 時重複搜尋某些路徑。根據模擬的結果,我們顯示出我們提出的TIDT 演算法比OGST 演算法更有效率。 |
Abstract |
Data mining extracts implicit, previously unknown and potentially useful informationfrom databases. The discovered information and knowledge are useful forvarious applications, including market analysis, decision support, flaw detection andbusiness management. Many approaches have been proposed to extract information,and the mining of sequential patterns is one of the most important methods. A typicalexample of a sequential pattern is like that a customer who has bought a computer,returns to buy a scanner and a microphone. Although this kind of sequential patternincludes the order of the items, the time between the items is unknown. Therefore, anextension of sequential patterns, called time-interval sequential patterns is proposed,which not only reveals the order of items but also the time intervals between successiveitems. The OGST algorithm finds out the time-interval sequential patternsspecially. The OGST algorithm uses the 2-time-interval sequence to construct an orientedgraph. Then, it uses the depth-first search method to traverse the graph, andfind out the all large-time-interval sequences. However, the OGST algorithm not onlywastes space to store the position of item i in a sequence, but also traverses manyunnecessary paths. The performance could be improved. Moreover, we find severalincorrect results and missing sequences based on the OGST algorithm. To avoid theseproblems, in this thesis, we propose the TIDT algorithm to find out the time-intervalsequential patterns efficiently. We propose a revised version of the oriented graph byrecording some more information (Sid, Stime, Etime). Based on the revised versionof the oriented graph, we use our proposed TIDT algorithm to efficiently find thetime-interval sequential patterns. We also use a terminal edge to reduce the searchcost which avoids the case of traversing the same vertex next time. We need not totraverse the same path again. From our performance study based on the syntheticdata, we show that our proposed TIDT algorithm is more efficient than the OGSTalgorithm. |
目次 Table of Contents |
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Mining Association Rules . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 The Repeating Patterns . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2.1 Non-Trivial Repeating Patterns . . . . . . . . . . . . . . . . . 2 1.2.2 Polyphonic Repeating Patterns . . . . . . . . . . . . . . . . . 3 1.2.3 Maximum-Length Repeating Patterns . . . . . . . . . . . . . . 4 1.3 Mining Sequential Patterns . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3.1 Sequential Patterns without Time-Interval . . . . . . . . . . . 5 1.3.2 Sequential Patterns with Time-Interval . . . . . . . . . . . . . 6 1.3.3 Practical Applications . . . . . . . . . . . . . . . . . . . . . . 7 1.4 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.5 Organization of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . 10 2. A Survey of Algorithms for Mining Sequential Patterns . . . . . . 12 2.1 The Apriori Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.2 The M2P Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.3 The I-Apriori Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.4 The OGST Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.4.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.4.2 The OGST Algorithm . . . . . . . . . . . . . . . . . . . . . . 23 3. The Transation-ID-Time Algorithm . . . . . . . . . . . . . . . . . . 27 3.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.2 The Proposed Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.2.1 Step 1: Scan the Database to Produce Large 1-Sequences (L1) 30 3.2.2 Step 2: Produce Large 2-Time Interval Sequences (LT2) from IT Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.2.3 Step 3: Construct a SIRG(Sid Item Relation Graph) . . . . . 39 3.2.4 Step 4: Search the SIRG to Produce the Large Sequences and Sequential Patterns . . . . . . . . . . . . . . . . . . . . . . . . 40 3.3 A Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4. Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 4.1 Generation of Synthetic Data . . . . . . . . . . . . . . . . . . . . . . 63 4.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4.2.1 Time Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.2.2 Sensitivity to Parameters . . . . . . . . . . . . . . . . . . . . . 70 5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 5.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 |
參考文獻 References |
[1] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” Very Large Data Bases, pp. 487–499, 1994. [2] R. Agrawal and R. Srikant, “Mining Sequential Patterns,” Proc. of the 11th Int. Conf. on Data Engineering, pp. 3–14, 1995. [3] F. Ao, Y. Yan, J. Huang, and K. Huang, “Mining Maximal Frequent Itemsets in Data Streams Based on FP Tree,” Proc. of the 5th Int. Conf. on Machine Learning and Data Mining in Pattern Recognition, pp. 479–489, 2007. [4] C. I. Chang, H. E. Chueh, and N. P. Lin, “Sequential Patterns Mining with Fuzzy Time-Intervals,” Proc. of the 6th Int. Conf. on Fuzzy Systems and Knowledge Discovery, pp. 165–169, 2009. [5] C. I. Chang, H. E. Chueh, and Y. C. Luo, “An Integrated Sequential Patterns Mining with Fuzzy Time-Intervals,” Proc. of Int. Conf. on Systems and Informatics, pp. 2294–2298, 2012. [6] Y. I. Chang, C. E. Li, and T. H. Chen, “A Position-Join Method for Mining Maximum-Length Repeating Patterns in Music Databases,” Proc. of National Computer Symposium, pp. 1–12, 2011. [7] J. Chen, “An Updown Directed Acyclic Graph Approach for Sequential Pattern Mining,” IEEE Trans. on Knowledge and Data Engineering, Vol. 22, No. 7, pp. 913–928, 2010. [8] Y. L. Chen, M. C. Chiang, and M. T. Ko, “Discovering Time-Interval Sequential Patterns in Sequence Databases,” Expert Systems with Applications, Vol. 25, No. 3, pp. 343–354, 2003. [9] Y. L. Chen and T. C.-K. Huang, “Discovering Fuzzy Time-Interval Sequential Patterns in Sequence Databases,” IEEE Trans. on Systems, Man, and Cybernetics, Vol. 35, No. 5, pp. 959–972, 2005. [10] S. Chiu, M. Shan, J. Huang, and H. Li, “Mining Polyphonic Repeating Patterns from Music Data Using Bit-String Based Approaches,” Proc. of IEEE Int. Conf. on Multimedia and Expo, pp. 1170–1173, 2009. [11] J. Han, J. Peid, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” ACM SIGMOD Record, Vol. 29, No. 2, pp. 1–12, 2000. [12] J. L. Hsu, C. C. Liu, and A. L. P. Chen, “Efficient Repeating Pattern Finding in Music Databases,” Proc. of the 7th Int. Conf. on Information and Knowledge Management, pp. 281–288, 1998. [13] J. L. Hsu, C. C. Liu, and A. L. P. Chen, “Discovering Non-trivial Repeating Patterns in Music Data,” IEEE Trans. on Multimedia, Vol. 3, No. 3, pp. 311–325, 2001. [14] Y. H. Hu, T. C.-K. Huang, H. R. Yang, and Y. L. Chen, “On Mining Multi-Time-Interval Sequential Patterns,” Data and Knowledge Engineering, Vol. 68, No. 10, pp. 1112–1127, 2009. [15] Y. H. Hu, F. Wu, and C. I. Yang, “Mining Multi-Level Time-Interval Sequential Patterns in Sequence Databases,” Proc. of the 2th Int. Conf. on Software Engineering and Data Mining, pp. 416–421, 2010. [16] C. R. Ji and Z. H. Deng, “Mining Frequent Ordered Patterns without Candidate Generation,” Proc. of the 4th Int. Conf. on Fuzzy Systems and Knowledge Discovery, pp. 402–406, 2007. [17] S. Joshi and R. Jain, “A Dynamic Approach for Frequent Pattern Mining Using Transposition of Database,” Proc. of the 2th Int. Conf. on Communication Software and Networks, pp. 498–501, 2010. [18] I. Karydis, A. Nanopoulos, and Y. Manolopoulos, “Finding Maximum-Length Repeating Patterns in Music Databases,” Multimedia Tools and Applications, Vol. 32, No. 1, pp. 49–71, 2006. [19] H. Li and N. Zhang, “Mining Maximal Frequent Itemsets Over a Stream Sliding Window,” Proc. of Information Computing and Telecommunications, pp. 110–113, 2010. [20] K. C. Lin, I. E. Liao, and Z. S. Chen, “An Improved Frequent Pattern Growth Method for Mining Association Rules,” Expert Systems with Applications, Vol. 38, No. 5, pp. 5154–5161, 2011. [21] J. Liu, S. Yan, and J. Ren, “The Design of Storage Structure for Sequence in Incremental Sequential Patterns Mining,” Proc. of the 6th Int. Conf. on Networked Computing and Advanced Information Management, pp. 330–334, 2010. [22] J. Liu, S. Yan, and J. Ren, “The Design of Frequent Sequence Tree in Incremental Mining of Sequential Patterns,” Proc. of IEEE Int. Conf. on Software Engineering and Service Science, pp. 679–682, 2011. [23] E. H. C. Lu, V. S. Tseng, and P. S. Yu, “Mining Cluster-Based Temporal Mobile Sequential Patterns in Location-Based Service Environments,” IEEE Trans. On Knowledge and Data Engineering, Vol. 23, No. 6, pp. 914–927, 2011. [24] G. Mao, X. Wu, X. Zhu, G. Chen, and C. Liu, “Mining Maximal Frequent Itemsets from Data Streams,” Information Sciences, Vol. 33, No. 3, pp. 251–262, 2007. [25] D. Perera, J. Kay, I. Koprinska, K. Yacef, and O. R. Zaiane, “Clustering and Sequential Pattern Mining of Online Collaborative Learning Data,” IEEE Trans. On Knowledge and Data Engineering, Vol. 21, No. 6, pp. 759–772, 2009. [26] J. M. Ren and J. R. Jang, “Discovering Time-Constrained Sequential Patterns for Music Genre Classification,” IEEE Trans. on Audio, Speech, and Language Processing, Vol. 20, No. 4, pp. 1134–1144, 2012. [27] S. Tanbeer, C. Ahmed, B. Jeong, and Y. Lee, “Efficient Single-Pass Frequent Pattern Mining Using a Prefix-Tree,” Information Sciences, Vol. 179, No. 5, pp. 559–583, 2009. [28] L.Wang and J. Liu, “Using Oriented Graph to Discover Time-Interval Sequential Patterns,” Proc. of Int. Conf. on Computer Science and Software Engineering, pp. 634–637, 2008. [29] S. J. Yen, Y. S. Lee, C. K.Wang, and J. W.Wu, “An Efficient Approach for Mining Frequent Patterns Based on Traversing a Frequent Pattern Tree,” Proc. of Int. Conf. on Computer Science and Software Engineering, pp. 354–357, 2008. [30] W. Zhang, H. Liao, and N. Zhao, “Research on the FP Growth Algorithm about Association Rule Mining,” Proc. of the 10th Int. Conf. on Business and Information Management, pp. 315–318, 2008. |
電子全文 Fulltext |
本電子全文僅授權使用者為學術研究之目的,進行個人非營利性質之檢索、閱讀、列印。請遵守中華民國著作權法之相關規定,切勿任意重製、散佈、改作、轉貼、播送,以免觸法。 論文使用權限 Thesis access permission:自定論文開放時間 user define 開放時間 Available: 校內 Campus: 已公開 available 校外 Off-campus: 已公開 available |
紙本論文 Printed copies |
紙本論文的公開資訊在102學年度以後相對較為完整。如果需要查詢101學年度以前的紙本論文公開資訊,請聯繫圖資處紙本論文服務櫃台。如有不便之處敬請見諒。 開放時間 available 已公開 available |
QR Code |