National Sun Yat-sen University, thesis/dissertation: The MCountP-Tree for Mining Maximal Spatial Co-Location Patterns from Spatial Data Sets

Title	The MCountP-Tree for Mining Maximal Spatial Co-Location Patterns from Spatial Data Sets
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of Academic Year 101 (ROC calendar; spring 2013)	Language	English
Degree	Master	Number of pages	92
Author	Cheng-Hung Wang (王政鴻)
Advisor	Ye-In Chang (張玉盈)
Convenor	Gen-huey Chen (陳健輝)
Advisory Committee	Tei-Wei Kuo (郭大維); Shian-hua Lin (林宣華)
Date of Exam	2013-06-19	Date of Submission	2013-06-27
Keywords	Spatial database, Spatial data mining, Spatial co-location rules, Spatial co-location patterns, Co-location rules

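Chinese Abstract (English translation)
In recent years, geographic information systems (GIS) have developed rapidly and play an important role in many applications. Given this large volume of geographic information, efficiently mining the maximal co-location patterns from spatial data has become an important issue in spatial data mining. Example applications include mobile services, disease and public health management, and crime prevention. Most existing approaches (the full-join, the partial-join, and the join-less approaches) are join-based. In other words, they mine the maximal co-location patterns in an Apriori-like manner. However, the Apriori-like manner incurs a high computation cost, because it must first mine the size-(k-1) co-location patterns before it can mine the size-k co-location patterns. To reduce this cost, Lizhen Wang et al. proposed an order-clique approach for mining the maximal co-location patterns. Unlike the earlier join-based approaches, it first finds the candidates of the maximal co-location patterns. It then uses four tree data structures to mine the maximal co-location patterns, improving on the table-based mining used previously. As a result, the order-clique approach outperforms the join-based approaches. However, as the threshold increases, its performance may degrade, because it has no pruning strategy. Therefore, in this thesis, we propose a new approach with a pruning strategy for mining the maximal co-location patterns. Our approach finds the candidates of the maximal co-location patterns more accurately than the order-clique approach, mainly because we add a pruning strategy to the size-2 candidates. In our approach, we propose four tree data structures: the CountP-tree, the MCountP-tree, the NeighborI-tree, and the CoLI-tree. The advantage of the CountP-tree is that it prunes the size-2 candidates first, using a pruning method that differs from those of the join-based approaches. The MCountP-tree finds the candidates of the maximal co-location patterns, and the set of candidates we find is always smaller than that found by the order-clique approach. The NeighborI-tree records the neighbor relationships of all points in the space. The CoLI-tree is built from the results of the MCountP-tree and the NeighborI-tree, and it determines the final result. Our experimental results show that the proposed approach is more efficient than the order-clique approach, whether the spatial database being mined is dense or sparse.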
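The abstracts attribute the cost of the join-based approaches to Apriori-like level-wise search. The minimal Python sketch below shows that level-wise step: the candidate generation of Agrawal and Srikant [1], applied to feature types. It assumes each co-location is represented as a sorted tuple of feature names. A size-k candidate is formed only by joining two prevalent size-(k-1) patterns, and it survives only if all of its size-(k-1) subsets are prevalent. `apriori_gen` is a hypothetical name, not code from the thesis.

```python
from itertools import combinations

def apriori_gen(prevalent_prev):
    """Build size-k candidate co-locations from size-(k-1) prevalent ones.

    prevalent_prev: iterable of sorted tuples of feature types, all of size k-1.
    Returns the size-k candidates whose every (k-1)-subset is prevalent.
    """
    prev = set(prevalent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            # Join step: merge two patterns that agree on all but their last feature.
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                cand = a + (b[-1],)
                # Prune step (Apriori property): every (k-1)-subset must be prevalent.
                if all(sub in prev for sub in combinations(cand, len(cand) - 1)):
                    candidates.add(cand)
    return candidates

# Hypothetical prevalent size-2 co-locations over features A-D.
size2 = {("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")}
print(sorted(apriori_gen(size2)))  # [('A', 'B', 'C')]
```

Level k cannot start until level k-1 is complete. In the join-based approaches, each surviving candidate must also have its table instances joined before its prevalence is known. This repeated work is what the order-clique approach and the tree-based approach of this thesis try to avoid.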
Abstract
In recent years, geographic information system (GIS) databases have developed rapidly and play a significant role in many applications. How to efficiently mine the maximal co-location patterns from the explosively growing volume of spatial data is an important issue in spatial data mining. Applications of spatial mining include mobile service requests, public health, and public safety. Most existing approaches, namely the join-based full-join, partial-join, and join-less approaches, adopt an Apriori-like strategy to mine the maximal co-location patterns. However, the Apriori-like strategy has a very expensive computation cost, because it can generate the size-k prevalent co-locations only after the size-(k-1) prevalent co-locations have been generated. To reduce the computation cost of these join-based approaches, Lizhen Wang et al. proposed an order-clique approach for mining the maximal co-location patterns. This approach differs from the join-based approaches in that it first finds the candidates of the maximal co-locations. It also uses tree data structures, instead of the table instances used in the join-based approaches, to mine the maximal co-location patterns. As a result, the order-clique approach performs better than the join-based approaches. However, when the threshold increases, the performance of the order-clique approach may degrade, because it uses no pruning strategy. Therefore, in this thesis, we propose a new approach with a pruning strategy to mine the maximal co-location patterns. Because we apply a pruning strategy to the size-2 candidates, our approach finds the candidates of the maximal co-location patterns more accurately than the order-clique approach. In our approach, we propose four tree data structures: the CountP-tree, the MCountP-tree, the NeighborI-tree, and the CoLI-tree.
The advantage of the CountP-tree is that it prunes the size-2 candidates of the maximal co-location patterns, which differs from the instance pruning used in the join-based approaches. The MCountP-tree yields the candidates of the maximal co-location patterns. The number of candidates found by our approach is smaller than the number found by the order-clique approach. The NeighborI-tree records the neighbor relationships of every instance. The CoLI-tree is built from the result of the MCountP-tree by referring to the NeighborI-tree, and it determines the final result. Our simulation results show that the proposed approach is more efficient than the order-clique approach, whether the data set is sparse or dense.
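The abstract does not spell out the CountP-tree's pruning criterion or how the MCountP-tree enumerates candidates. The Python sketch below illustrates the two ideas under explicit assumptions:

- Two instances are neighbors if they lie within Euclidean distance `d` of each other.
- Size-2 candidates are pruned with the participation index of Huang et al. [7]. This is the minimum, over the two features, of the fraction of each feature's instances that take part in the pattern.
- Candidate maximal co-locations are taken to be the maximal cliques of the graph of surviving feature pairs. These are found with the Bron-Kerbosch algorithm.

The brute-force pair scan stands in for the NeighborI-tree. The names `prevalent_pairs`, `candidate_maximal_colocations` and the sample data are hypothetical. This is an illustration, not the thesis's tree-based implementation.

```python
from collections import defaultdict
from itertools import combinations
from math import dist

def prevalent_pairs(instances, d, min_prev):
    """Size-2 co-locations whose participation index is at least min_prev.

    instances: dict mapping instance id -> (feature, x, y).
    Two instances are neighbors if their Euclidean distance is <= d (assumed).
    """
    count = defaultdict(int)  # number of instances of each feature
    for f, _, _ in instances.values():
        count[f] += 1
    # parts[(f1, f2)][f] = ids of f-instances with a neighbor of the other feature
    parts = defaultdict(lambda: defaultdict(set))
    for (i, (fi, xi, yi)), (j, (fj, xj, yj)) in combinations(instances.items(), 2):
        if fi != fj and dist((xi, yi), (xj, yj)) <= d:
            key = tuple(sorted((fi, fj)))
            parts[key][fi].add(i)
            parts[key][fj].add(j)
    result = {}
    for (f1, f2), p in parts.items():
        # Participation index = minimum participation ratio of the two features.
        pi = min(len(p[f1]) / count[f1], len(p[f2]) / count[f2])
        if pi >= min_prev:
            result[(f1, f2)] = pi
    return result

def candidate_maximal_colocations(pairs):
    """Maximal cliques of the feature graph whose edges are the prevalent pairs."""
    adj = defaultdict(set)
    for f1, f2 in pairs:
        adj[f1].add(f2)
        adj[f2].add(f1)
    if not adj:
        return []
    cliques = []

    def expand(r, p, x):  # Bron-Kerbosch with pivoting
        if not p and not x:
            cliques.append(tuple(sorted(r)))  # r cannot be extended: it is maximal
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)

# Hypothetical instances: id -> (feature, x, y).
instances = {
    1: ("A", 0.0, 0.0), 2: ("A", 10.0, 10.0),
    3: ("B", 0.5, 0.0), 6: ("B", 10.3, 10.0),
    4: ("C", 0.0, 0.6), 5: ("C", 20.0, 20.0),
}
pairs = prevalent_pairs(instances, d=1.0, min_prev=0.6)
print(pairs)                                 # {('A', 'B'): 1.0}
print(candidate_maximal_colocations(pairs))  # [('A', 'B')]
```

Raising `min_prev` from 0.5 to 0.6 removes the pairs (A, C) and (B, C), each of which has participation index 0.5. The only candidate therefore shrinks from (A, B, C) to (A, B). As in the CoLI-tree step described above, a candidate produced this way would still have to be verified against the instance-level neighbor relations. Features that are pairwise prevalent need not have enough instances that are mutually neighbors.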

Table of Contents
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
1. Introduction
   1.1 Spatial Association Rules
   1.2 Spatial Co-Location Patterns
   1.3 Applications
   1.4 Related Works
   1.5 Motivation
   1.6 Organization of the Thesis
2. A Survey of Approaches for Mining Spatial Co-Location Patterns
   2.1 The Full Join Approach to Mining Co-Location Patterns
   2.2 The Joinless Approach to Mining Co-Location Patterns
   2.3 An Order-Clique-Based Approach to Mining Maximal Co-Location Patterns
      2.3.1 Generating Candidate Maximal Co-Locations
      2.3.2 Identifying Co-Location Table Instances
3. The Spatial Co-location Patterns Approach
   3.1 Data Structure
   3.2 The Input of the Spatial Database
   3.3 The Processing of the Proposal Approach
   3.4 A Comparison
4. Performance
   4.1 The Performance Model
   4.2 Experiment Results
5. Conclusion
   5.1 Summary
   5.2 Future Work
BIBLIOGRAPHY

References
[1] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” Proc. of the 20th Int. Conf. on Very Large Data Bases, pp. 487–499, 1994.
[2] M. Celik, J. M. Kang, and S. Shekhar, “Zonal Co-Location Pattern Discovery with Dynamic Parameters,” Proc. of the 7th IEEE Int. Conf. on Data Mining, pp. 433–438, 2007.
[3] B. R. Dai and M. Y. Lin, “Efficiently Mining Dynamic Zonal Co-Location Patterns Based on Maximal Co-Locations,” Proc. of the IEEE 11th Int. Conf. on Data Mining Workshops, pp. 861–868, 2011.
[4] W. Ding, C. Eick, J. Wang, and X. Yuan, “A Framework for Regional Association Rule Mining in Spatial Datasets,” Proc. of the 6th Int. Conf. on Data Mining, pp. 851–856, 2006.
[5] G. Fang, J. Xiong, X. L. Du, and X. B. Tang, “Frequent Neighboring Class Set Mining,” Proc. of the 7th Int. Conf. on Fuzzy Systems and Knowledge Discovery, pp. 1442–1445, 2010.
[6] T. Hu, S. Y. Sung, H. Xiong, and Q. Fu, “Discovery of Maximum Length Frequent Itemsets,” Information Sciences, Vol. 178, No. 1, pp. 69–87, Jan. 2008.
[7] Y. Huang, S. Shekhar, and H. Xiong, “Discovering Colocation Patterns from Spatial Data Sets: A General Approach,” IEEE Trans. on Knowledge and Data Engineering, Vol. 16, No. 12, pp. 1472–1485, Dec. 2004.
[8] Y. Huang and P. Zhang, “On the Relationships Between Clustering and Spatial Co-Location Pattern Mining,” Proc. of the 18th IEEE Int. Conf. on Tools with Artificial Intelligence, pp. 513–522, 2006.
[9] K. S. Kim, Y. Kim, and U. Kim, “Maximal Cliques Generating Algorithm for Spatial Co-Location Pattern Mining,” Secure and Trust Computing, Data Management and Applications, pp. 241–250, 2011.
[10] Y. Morimoto, “Mining Frequent Neighboring Class Sets in Spatial Databases,” Proc. of the 7th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 353–358, 2001.
[11] S. Shekhar and Y. Huang, “Discovering Spatial Co-Location Patterns: A Summary of Results,” Proc. of the 7th Int. Symposium on Advances in Spatial and Temporal Databases, pp. 236–256, 2001.
[12] S. Shekhar, P. Zhang, Y. Huang, and R. R. Vatsavai, “Trends in Spatial Data Mining,” Data Mining: Next Generation Challenges and Future Directions, AAAI/MIT Press, pp. 357–380, 2004.
[13] F. Verhein and G. Al-Naymat, “Fast Mining of Complex Spatial Co-Location Patterns Using GLIMIT,” Proc. of the 7th IEEE Int. Conf. on Data Mining Workshops, pp. 679–684, 2007.
[14] Y. Wan, J. Zhou, and F. Bian, “CODEM: A Novel Spatial Co-Location and De-location Patterns Mining Algorithm,” Proc. of the 5th Int. Conf. on Fuzzy Systems and Knowledge Discovery, pp. 576–580, 2008.
[15] L. Wang, Y. Bao, J. Lu, and J. Yip, “A New Join-less Approach for Co-Location Pattern Mining,” Proc. of the 8th IEEE Int. Conf. on Computer and Information Technology (CIT), pp. 197–202, 2008.
[16] L. Wang, K. Xie, T. Chen, and X. Ma, “Efficient Discovery of Multilevel Spatial Association Rules Using Partitions,” Information and Software Technology, Vol. 47, No. 13, pp. 829–840, Oct. 2005.
[17] L. Wang, L. Zhou, J. Lu, and J. Yip, “An Order-Clique-Based Approach for Mining Maximal Co-Locations,” Information Sciences, Vol. 179, No. 19, pp. 3370–3382, Sept. 2009.
[18] J. S. Yoo and M. Bow, “Mining Top-k Closed Co-Location Patterns,” Proc. of the IEEE Int. Conf. on Spatial Data Mining and Geographical Knowledge Services, pp. 100–105, 2011.
[19] J. S. Yoo and M. Bow, “Mining Maximal Co-Located Event Sets,” Proc. of the 15th Pacific-Asia Conf. on Advances in Knowledge Discovery and Data Mining, pp. 351–362, 2011.
[20] J. S. Yoo and M. Bow, “Mining Spatial Colocation Patterns: A Different Framework,” Data Min. Knowledge Discovery, Vol. 24, No. 1, pp. 159–194, Jan. 2012.
[21] J. S. Yoo and J. Hwang, “A Framework for Discovering Spatio-Temporal Cohesive Networks,” Proc. of the 12th Pacific-Asia Conf. on Advances in Knowledge Discovery and Data Mining, pp. 1056–1061, 2008.
[22] J. S. Yoo and S. Shekhar, “A Joinless Approach for Mining Spatial Colocation Patterns,” IEEE Trans. on Knowledge and Data Engineering, Vol. 18, No. 10, pp. 1323–1337, Oct. 2006.
[23] J. S. Yoo, S. Shekhar, J. Smith, and J. P. Kumquat, “A Partial Join Approach for Mining Co-Location Patterns,” Proc. of the 12th Annual ACM Int. Workshop on Geographic Information Systems, pp. 241–249, 2004.

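Full Text
The electronic full text (etd-0527113-133645.pdf) is available both on and off campus. The author set the release date. It is licensed only for personal, non-profit searching, reading, and printing for academic research. Under the Copyright Act of the Republic of China (Taiwan), it must not be reproduced, distributed, adapted, reposted, or broadcast without authorization.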
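Printed Copies
Availability: available. Public records for printed theses are more complete from Academic Year 102 onward. For printed theses from Academic Year 101 or earlier, contact the printed-thesis service desk of the Office of Library and Information Services.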
