國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,一個空間資料庫中前k個確切關鍵字搜索的九方區域樹索引方法,A KSNA-Tree Algorithm for the Top-k Exact Keyword Search in Spatial Databases

論文名稱 Title	一個空間資料庫中前k個確切關鍵字搜索的九方區域樹索引方法 A KSNA-Tree Algorithm for the Top-k Exact Keyword Search in Spatial Databases
系所名稱 Department	資訊工程學系 Department of Computer Science and Engineering
畢業學年期 Year, semester	103 學年度第 2 學期 The spring semester of Academic Year 103	語文別 Language	英文 English
學位類別 Degree	碩士 Master	頁數 Number of pages	80
研究生 Author	楊凱甯 Kai-ning Yang
指導教授 Advisor	張玉盈 Ye-In Chang
召集委員 Convenor	陳健輝 Gen-huey Chen
口試委員 Advisory Committee	李建億, 郭大維 Chien-i Lee; Tei-Wei Kuo
口試日期 Date of Exam	2015-06-05	繳交日期 Date of Submission	2015-06-23
關鍵字 Keywords	空間索引結構、空間資料庫、前k個、反向索引、關鍵字比對 Top-k, Spatial Index Structure, Spatial Database, Keyword Matching, Inverted Index
統計 Statistics	本論文已被瀏覽 5721 次，被下載 52 次 The thesis/dissertation has been browsed 5721 times, has been downloaded 52 times.

中文摘要
在最近幾年，地理資訊系統發展迅速並且在很多應用及網站中扮演重要的角色。其中許多的應用程式以及網站可以讓使用者找尋符合關鍵字搜尋並且靠近特定區域的物件。舉例來說，一個使用者想尋找靠近高雄的‘Snoopy hotel’。則其中‘Snoopy hotel’包含了兩個關鍵字，且設定高雄為特定的區域。在這個例子中，我們必須使用top-k空間關鍵字搜尋的演算法。Tao學者等人提出了一個稱作SI-index的資料結構，其整合了inverted index以及R-tree。他們使用了資料壓縮的方法，包含gap-keeping以及Z-value，來降低SI-index所佔用的空間。此外，他們提出了兩個解決top-k空間關鍵字搜尋的演算法，包含一個透過搜索R-tree來找尋答案的方法SI-b，以及一個透過合併inverted index來找尋答案的方法SI-m。然而，在他們的方法中，必須建立大量的R-tree來儲存物件的資料。對於n個關鍵字而言，會有n個R-tree被建立。這將會花費較長的時間來處理資料以及額外的時間將資料解壓縮。物件的資料只能透過搜索那些查詢關鍵字所相對應的R-tree來更新、刪除及查詢。他們必須交叉地查詢k個R-tree才能更新、刪除以及查詢物件的資料，其中k代表的是查詢關鍵字的數量，而且1≤k≤n。因此，在本論文中，我們提出了一個KSNA-tree的方法。KSNA-tree整合了一個由Chang學者等人所提出的NA-tree資料結構，以及inverted index。NA-tree是一個基於物件的位置以及利用spatial number來組織的樹狀結構。而我們方法的貢獻如下：首先，我們只建立一個KSNA-tree來儲存資料，而不是對於資料庫中的n個關鍵字建立n個R-tree。再來，我們利用了spatial number來組織物件的資料，透過直接存取物件的spatial number，可以避免在查詢程序時的隨機存取。最後，我們透過在每一個KSNA-tree的節點中儲存inverted index來加強KSNA-tree。一但我們發現其中一個關鍵字絕對不會出現於某節點時，我們可以省略搜尋該節點以及所有該節點的子節點。從我們的模擬結果顯示，我們所提出的KSNA-tree方法會比SI-index的方法來的有效率。 (關鍵詞：反向索引，關鍵字比對，空間資料庫，空間索引結構，前k個)
Abstract
In recent years, geographic information systems (GIS) have developed rapidly and now play a significant role in many applications and websites. Many of these allow users to find objects that match all of the query keywords and are close to a specified location. For instance, a user may want to find the 'Snoopy hotel' near Kaohsiung: 'Snoopy hotel' contributes two keywords to the query keyword set, and Kaohsiung is the specified location. Answering such a request requires an algorithm for the top-k spatial keyword query. Tao et al. propose a data access structure called the SI-index, which integrates the inverted index with the R-tree. They use two data compression approaches, gap-keeping and Z-values, to reduce the size of the SI-index. They also provide two algorithms for the top-k spatial keyword query: an R-tree browsing algorithm, SI-b, and an index merging algorithm, SI-m. However, their data structure builds a large number of R-trees to store object data: for n keywords, n R-trees must be constructed. Processing the object data therefore takes a long time, and decompressing it takes extra time. Object data can be updated, deleted, or queried only by traversing the R-trees of the query keywords; k R-trees must be traversed in an interleaved fashion, where k is the size of the query keyword set and 1≤k≤n. Therefore, in this thesis, we propose a KSNA-tree algorithm. The KSNA-tree integrates a spatial index, the NA-tree proposed by Chang et al., with the inverted index. The NA-tree is a tree structure based on the locations of data objects and organized by spatial numbers. The contributions of our approach are as follows. First, our approach constructs only one KSNA-tree, instead of building n R-trees for the n keywords in the database. Second, we organize object data according to spatial numbers, which avoids random access during query processing by directly accessing the spatial number of a node. Third, we enhance each node in the KSNA-tree with an inverted index, so that a node and all of its child nodes can be pruned as soon as we know that one of the query keywords is definitely not in the node. Our simulation results show that the proposed approach is more efficient than the SI-index. (Keywords: Inverted Index, Keyword Matching, Spatial Database, Spatial Index Structure, Top-k)
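The pruning idea in the third contribution can be illustrated with a minimal sketch. It is not the KSNA-tree itself: the nine-region NA-tree layout and its spatial numbers are not described in this record, so the sketch assumes a generic bounding-box hierarchy, and every name in it (`SpatialObject`, `Node`, `make_leaf`, `make_inner`, `top_k`) is hypothetical. Each node stores the union of the keywords beneath it, standing in for a per-node inverted index, and a best-first search skips any subtree that lacks one of the query keywords.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field


@dataclass
class SpatialObject:
    name: str
    x: float
    y: float
    keywords: frozenset


@dataclass
class Node:
    box: tuple                                     # (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)   # child Nodes
    objects: list = field(default_factory=list)    # objects stored in a leaf
    keywords: frozenset = frozenset()              # union of keywords below


def make_leaf(objects):
    """Leaf node: bounding box and keyword union of its (non-empty) objects."""
    xs = [o.x for o in objects]
    ys = [o.y for o in objects]
    words = frozenset().union(*(o.keywords for o in objects))
    return Node((min(xs), min(ys), max(xs), max(ys)),
                objects=list(objects), keywords=words)


def make_inner(children):
    """Inner node: bounding box and keyword union of its (non-empty) children."""
    box = (min(c.box[0] for c in children), min(c.box[1] for c in children),
           max(c.box[2] for c in children), max(c.box[3] for c in children))
    words = frozenset().union(*(c.keywords for c in children))
    return Node(box, children=list(children), keywords=words)


def min_dist(box, qx, qy):
    """Smallest possible distance from the query point to anything in box."""
    dx = max(box[0] - qx, 0.0, qx - box[2])
    dy = max(box[1] - qy, 0.0, qy - box[3])
    return math.hypot(dx, dy)


def top_k(root, qx, qy, query_keywords, k):
    """Return up to k (name, distance) pairs, nearest first, for objects
    that contain every query keyword (exact match)."""
    query = frozenset(query_keywords)
    tie = itertools.count()            # tie-breaker so heap never compares nodes
    heap = [(0.0, next(tie), root)]
    results = []
    while heap and len(results) < k:
        dist, _, item = heapq.heappop(heap)
        if isinstance(item, SpatialObject):
            results.append((item.name, dist))
            continue
        if not query <= item.keywords:
            continue                   # prune: a query keyword is absent here
        for child in item.children:
            if query <= child.keywords:
                heapq.heappush(
                    heap, (min_dist(child.box, qx, qy), next(tie), child))
        for obj in item.objects:
            if query <= obj.keywords:
                heapq.heappush(
                    heap, (math.hypot(obj.x - qx, obj.y - qy), next(tie), obj))
    return results


# Made-up sample data for illustration only.
leaf1 = make_leaf([
    SpatialObject("Grand Hotel", 0.0, 0.0, frozenset({"grand", "hotel"})),
    SpatialObject("Snoopy Hotel A", 1.0, 1.0, frozenset({"snoopy", "hotel"})),
])
leaf2 = make_leaf([
    SpatialObject("Snoopy Cafe", 2.0, 2.0, frozenset({"snoopy", "cafe"})),
    SpatialObject("Snoopy Hotel B", 10.0, 10.0, frozenset({"snoopy", "hotel"})),
])
leaf3 = make_leaf([
    SpatialObject("City Museum", 20.0, 20.0, frozenset({"museum"})),
    SpatialObject("Harbor Cafe", 21.0, 19.0, frozenset({"cafe", "harbor"})),
])
root = make_inner([leaf1, leaf2, leaf3])

nearest = top_k(root, 0.0, 0.0, {"snoopy", "hotel"}, 2)
```

In this sample, `leaf3` is never opened for the query {snoopy, hotel} because its keyword set contains neither word. A single tree serves every keyword combination, which is the contrast the abstract draws with building one R-tree per keyword.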

目次 Table of Contents
1. Introduction
    1.1 Spatial Keyword Query
    1.2 Keyword Matching Approaches
    1.3 Spatial Index Structure
    1.4 Related Work
    1.5 Motivation
    1.6 Organization of the Thesis
2. A Survey of Algorithms for Nearest Neighbor Search with Keywords in Spatial Databases
    2.1 The IR2-Tree
    2.2 The Inverted Linear Quadtree
    2.3 The Spatial Inverted List
        2.3.1 The Compression Scheme and Blocked SI-index
        2.3.2 Query Method
3. A KSNA-Tree Approach
    3.1 Data Structure
        3.1.1 The Partition Numbering Scheme
        3.1.2 The NA-Tree
        3.1.3 The KSNA Index Structure
    3.2 Top-k Spatial Keyword Query Processing
    3.3 Comparison
4. Performance
    4.1 The Performance Model
    4.2 Experiments Results
        4.2.1 Uniform Dataset
        4.2.2 Skew Dataset
5. Conclusion
    5.1 Summary
    5.2 Future Work
BIBLIOGRAPHY

參考文獻 References
[1] S. Alsubaiee, A. Behm, and C. Li, “Supporting Location-Based Approximate-Keyword Queries,” Proc. of the 18th SIGSPATIAL Int. Conf. on Advances in Geographic Information Systems, pp. 61-70, 2010.
[2] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, “The R*-tree: An Efficient and Robust Access Method for Points and Rectangles,” Proc. of 1990 ACM SIGMOD Int. Conf. on Management of Data, pp. 322-331, 1990.
[3] M. Bern, D. Eppstein, and S.-H. Teng, “Parallel Construction of Quadtrees and Quality Triangulations,” Int. Journal of Computational Geometry and Applications, Vol. 9, No. 6, pp. 517-532, Dec. 1999.
[4] X. Cao, G. Cong, C. S. Jensen, and B. C. Ooi, “Collective Spatial Keyword Querying,” Proc. of 2011 ACM SIGMOD Int. Conf. on Management of Data, pp. 373-384, 2011.
[5] Y. I. Chang, Z. S. Chen, and Y. G. Liou, “NAAK-Tree: An Index for Querying Spatial Approximate Keywords,” Proc. of 2013 National Computer Symposium, pp. 1-6, 2013.
[6] Y. I. Chang, C. H. Liao, and H. L. Chen, “NA-Trees: A Dynamic Index for Spatial Data,” Journal of Information Science and Engineering, Vol. 19, No. 1, pp. 103-139, Jan. 2003.
[7] B. Chazelle, J. Kilian, R. Rubinfeld, and A. Tal, “The Bloomier Filter: An Efficient Data Structure for Static Support Lookup Tables,” Proc. of the 15th annual ACM-SIAM Symposium on Discrete Algorithms, pp. 30-39, 2004.
[8] G. Cong, C. S. Jensen, and D. Wu, “Efficient Retrieval of the Top-k Most Relevant Spatial Web Objects,” VLDB Endowment, Vol. 2, No. 1, pp. 337-348, Aug. 2009.
[9] I. De Felipe, V. Hristidis, and N. Rishe, “Keyword Search on Spatial Databases,” Proc. of the 24th Int. Conf. on Data Engineering, pp. 656-665, 2008.
[10] C. Faloutsos and S. Christodoulakis, “Signature Files: An Access Method for Documents and Its Analytical Performance Evaluation,” ACM Trans. on Information Systems, Vol. 2, No. 4, pp. 267-288, Oct. 1984.
[11] R. A. Finkel and J. L. Bentley, “Quad Trees: A Data Structure for Retrieval on Composite Keys,” Acta informatica, Vol. 4, No. 1, pp. 1-9, March 1974.
[12] A. Guttman, “R-trees: A Dynamic Index Structure for Spatial Searching,” Proc. of 1984 ACM SIGMOD Int. Conf. on Management of Data, pp. 47-57, 1984.
[13] G. R. Hjaltason and H. Samet, “Distance Browsing in Spatial Databases,” ACM Trans. on Database Systems, Vol. 24, No. 2, pp. 265-318, June 1999.
[14] I. Kamel and C. Faloutsos, “Hilbert R-tree: an Improved R-tree Using Fractals,” Proc. of 21th Int. Conf. on VLDB, No. 10, pp. 500-509, 1994.
[15] A. Kumar, “G-Tree: A New Data Structure for Organizing Multidimensional Data,” IEEE Trans. on Knowledge and Data Engineering, Vol. 6, No. 2, pp. 341-347, April 1994.
[16] A. Moffat and J. Zobel, “Self-indexing Inverted Files for Fast Text Retrieval,” ACM Trans. on Information System, Vol. 14, No. 4, pp. 349-379, Oct. 1996.
[17] B. Moon, H. Jagadish, C. Faloutsos, and J. Saltz, “Analysis of the Clustering Properties of the Hilbert Space-Filling Curve,” IEEE Trans. on Knowledge and Data Engineering, Vol. 13, No. 1, pp. 124-141, Jan. 2001.
[18] J. A. Orenstein, “Spatial Query Processing in an Object-oriented Database System,” Proc. of 1986 ACM SIGMOD Int. Conf. on Management of Data, pp. 326-336, 1986.
[19] J. A. Orenstein and T. H. Merrett, “A Class of Data Structures for Associative Searching,” Proc. of the 3rd ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, pp. 181-190, 1984.
[20] T. K. Sellis, N. Roussopoulos, and C. Faloutsos, “The R+-tree: A Dynamic Index for Multi-Dimensional Objects,” Proc. of the 13th VLDB Conf., pp. 507-518, 1987.
[21] S. Stiassny, “Mathematical Analysis of Various Superimposed Coding Methods,” American Documentation, Vol. 11, No. 2, pp. 155-169, April 1960.
[22] Y. Tao and C. Sheng, “Fast Nearest Neighbor Search with Keywords,” IEEE Trans. on Knowledge and Data Engineering, Vol. 26, No. 4, pp. 878-888, April 2014.
[23] C. Zhang, Y. Zhang, W. Zhang, and X. Lin, “Inverted Linear Quadtree: Efficient Top K Spatial Keyword Search,” Proc. of the 29th Int. Conf. on Data Engineering, pp. 901-912, 2013.
[24] D. Zhang, Y. M. Chee, A. Mondal, A. Tung, and M. Kitsuregawa, “Keyword Search in Spatial Databases: Towards Searching by Document,” Proc. of the 25th Int. Conf. on Data Engineering, pp. 688-699, 2009.
[25] L. Zhang, X. Sun, and H. Zhuge, “Density Based Collective Spatial Keyword Query,” Proc. of 2012 IEEE 8th Int. Conf. on Semantics, Knowledge and Grids, pp. 213-216, 2012.
[26] L. Zhang, X. Sun, and H. Zhuge, “Density-based Spatial Keyword Querying,” Future Generation Computer Systems, Vol. 32, No. 1, pp. 211-221, March 2014.
[27] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Y. Ma, “Hybrid Index Structures for Location-based Web Search,” Proc. of the 14th ACM Int. Conf. on Information and Knowledge Management, pp. 155-162, 2005.
[28] J. Zobel, A. Moffat, and K. Ramamohanarao, “Inverted Files Versus Signature Files for Text Indexing,” ACM Trans. on Database System, Vol. 23, No. 4, pp. 453-490, Dec. 1998.

電子全文 Fulltext
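This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
論文使用權限 Thesis access permission：自定論文開放時間 user define
開放時間 Available：校內 Campus：已公開 available；校外 Off-campus：已公開 available
etd-0523115-195758.pdf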
紙本論文 Printed copies
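Availability information for printed theses is relatively complete from Academic Year 102 onward. For printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services.
開放時間 available：已公開 available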

