National Sun Yat-sen University (國立中山大學), thesis/dissertation: A Study of Partial Periodic Utility Mining (部分片段效益挖掘之研究)

Title	A Study of Partial Periodic Utility Mining (部分片段效益挖掘之研究)
Department	Department of Computer Science and Engineering
Year, semester	Academic Year 105 (ROC calendar), spring (second) semester	Language	English
Degree	Master	Number of pages	88
Author	Jen-Hao Hsu (許仁豪)
Advisor	Tzung-Pei Hong (洪宗貝)
Convenor	Wen-Yang Lin (林文揚)
Advisory Committee	Ming-Chao Chiang (江明朝); Ja-Hwung Su (蘇家輝); Shu-Min Li (李淑敏)
Date of Exam	2017-07-27	Date of Submission	2017-09-15
Keywords	data mining, high utility, partial periodic pattern, projection, utility upper bound

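Abstract (translated from the Chinese)
Most existing studies on partial periodic pattern mining judge the importance of a pattern solely by how often it appears in the periodic segment data, and assume that every event has the same utility. As a result, traditional partial periodic pattern mining methods tend to miss events that have high utility but occur infrequently. In this thesis, we extend the original problem to high-utility partial periodic pattern mining, which considers not only the temporal order of events and the period length but also their quantities and profits. We design a periodic utility function and, based on it, propose three algorithms for mining high-utility partial periodic patterns. The first is based on a two-phase periodic utility upper-bound model, which avoids information loss during mining and serves as the baseline for experimental comparison. The second improves efficiency further by gradually tightening the utility upper bounds. The third uses a projection technique to avoid unnecessary checks and reduce execution time. Finally, the three algorithms are compared experimentally under various parameter settings, and the results show that the projection method performs best of the three.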
Abstract
Existing studies on partial periodic pattern mining consider only the frequency of patterns in periodic segment data to determine their significance, and assume the same utility for all events. As a result, events with high utility but low frequency may not be found by traditional partial periodic pattern mining techniques. In this thesis, we extend the original problem to high-utility partial periodic pattern (HUPPP) mining, which considers not only the temporal order and period length of events but also their quantities and individual profits. We design a periodic utility function and, based on it, propose three algorithms for finding high-utility partial periodic patterns. The first is a basic algorithm that uses the two-phase periodic utility upper-bound (PUUB) model to avoid information loss in the mining process; it serves as the reference baseline for experimental comparison. The second improves efficiency by using a gradual pruning technique that shrinks the utility upper bounds. The third adopts a projection technique to avoid unnecessary checking and reduce execution time. Finally, experiments compare the performance of the three proposed algorithms under various parameter settings. The results show that the projection approach performs best among them.
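As an illustration only, the two-phase idea described in the abstract can be sketched in Python. This record does not give the thesis's periodic utility function or its PUUB definition, so everything below is an assumption: each time slot holds one (event, quantity) pair, a pattern fixes events at some offsets of the period and leaves the rest as wildcards (*), the utility of a pattern is the sum of quantity times profit at its fixed offsets over the segments it matches, and the upper bound is the total utility of those matching segments, in the style of the transaction-weighted utility bound from two-phase high-utility itemset mining. The profit table, the sample sequence, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical profit per unit of each event (not taken from the thesis).
PROFIT = {"A": 3, "B": 1, "C": 5}


def split_segments(seq, period):
    """Cut the sequence into consecutive full segments of length `period`."""
    return [seq[i:i + period] for i in range(0, len(seq) - period + 1, period)]


def matches(pattern, seg):
    """A pattern is a tuple of (offset, event); unlisted offsets are wildcards."""
    return all(seg[pos][0] == ev for pos, ev in pattern)


def show(pattern, period):
    """Render a pattern as a string such as 'A*C'."""
    slots = ["*"] * period
    for pos, ev in pattern:
        slots[pos] = ev
    return "".join(slots)


def mine(seq, period, profit, min_util):
    segs = split_segments(seq, period)
    seg_total = [sum(q * profit[e] for e, q in s) for s in segs]

    def upper_bound(pattern):
        # Assumed bound: total utility of every segment the pattern matches.
        # A longer pattern matches a subset of segments, so the bound can
        # only shrink as a pattern grows (downward closure).
        return sum(t for s, t in zip(segs, seg_total) if matches(pattern, s))

    def utility(pattern):
        # Assumed utility: quantity * profit at the fixed offsets only.
        return sum(s[pos][1] * profit[ev]
                   for s in segs if matches(pattern, s)
                   for pos, ev in pattern)

    # Phase 1: level-wise candidate generation, pruned by the upper bound.
    singles = sorted({(pos, s[pos][0]) for s in segs for pos in range(period)})
    level = [(item,) for item in singles if upper_bound((item,)) >= min_util]
    candidates = []
    while level:
        candidates.extend(level)
        grown = set()
        for a, b in combinations(level, 2):
            merged = tuple(sorted(set(a) | set(b)))
            offsets = {pos for pos, _ in merged}
            # Keep merges that add exactly one event at a new offset.
            if len(merged) == len(a) + 1 and len(offsets) == len(merged):
                if upper_bound(merged) >= min_util:
                    grown.add(merged)
        level = sorted(grown)

    # Phase 2: compute the actual utility of each surviving candidate.
    return {show(p, period): utility(p)
            for p in candidates if utility(p) >= min_util}


# Toy event sequence with period 3: three segments of (event, quantity).
sequence = [("A", 1), ("B", 2), ("C", 1),
            ("A", 2), ("B", 1), ("B", 3),
            ("A", 1), ("C", 1), ("C", 2)]
result = mine(sequence, 3, PROFIT, 15)
print(result)
```

On this toy input the single-event pattern A** has utility 12, below the threshold of 15, while its extension A*C reaches 21. Utility is therefore not anti-monotone, which is why phase 1 in this sketch prunes on the upper bound rather than on the utility itself.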

Table of Contents
Thesis Approval Form
Acknowledgments
Abstract (Chinese)
Abstract
Contents
List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Background
  1.2 Contribution
  1.3 Thesis Organization
Chapter 2 Related Works
  2.1 Sequential pattern mining
  2.2 Utility pattern mining
  2.3 Periodic pattern mining
Chapter 3 The Proposed Algorithm
  3.1 Definition
  3.2 The High Utility Periodic Pattern Mining Algorithm (HUPPP)
    3.2.1 HUPPP
    3.2.2 An Example of HUPPP
  3.3 The High Utility Periodic Pattern Mining with Gradually Pruning Algorithm (GPA)
    3.3.1 GPA
    3.3.2 An Example of the GPA
  3.4 The High Utility Periodic Pattern Mining with Projected Database Algorithm
    3.4.1 Projected Database Algorithm
    3.4.2 An Example of the Projected Database Algorithm
Chapter 4 Experiments
  4.1 Experimental Environment
  4.2 Experimental Evaluation
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
References


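Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined release date. Availability: on campus, available; off campus, available. File: etd-0814117-230426.pdf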
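Printed copies
Public information on printed copies is relatively complete from Academic Year 102 onward. For printed theses from Academic Year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. Availability: available.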
