Responsive image
博碩士論文 etd-0627118-203130 詳細資訊
Title page for etd-0627118-203130
論文名稱
Title
ICQ-Tree:一個用於增量資料庫中前k個空間關鍵字查詢的反向編碼四元樹
ICQ-Tree: An Inverted Code Quadtree for the Top-K Spatial Keyword Query over the Incremental Database
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
74
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2018-07-27
繳交日期
Date of Submission
2018-07-29
關鍵字
Keywords
前k個、空間資料庫、四元樹、關鍵字搜尋、反向索引
Spatial Database, Top-k, Quadtree, Keyword Search, Inverted Index
統計
Statistics
本論文已被瀏覽 5669 次,被下載 0
The thesis/dissertation has been browsed 5669 times, has been downloaded 0 times.
中文摘要
隨著地理空間資料的快速發展,空間資料庫的應用也愈來愈廣泛。像是我們在Google Map上輸入關鍵字`restaurant',系統便會顯示位於你附近的多間餐廳。或是在Instagram上傳自動定位的照片,並且可以使用hashtag來為照片做分類。這些地理資訊的應用,為我們生活帶來許多方便及樂趣。由於空間物件及關鍵字物件的數量增加,資料的儲存及搜尋的效率也就顯得更加重要。一個空間關鍵字查詢是由空間資訊和關鍵字資訊組成。空間可能是地圖上的一個點、線或形狀;而關鍵字是指一段描述或使用者的註解。在一個空間查詢中回傳前k個符合關鍵字條件的結果,我們稱之為前k個空間關鍵字搜尋。Zhang學者提出的IL-Quadtree結構,結合inverted index和linear quadtree來有效減少儲存空間。為進一步提高前k個空間關鍵字查詢演算法的效率,他們提出signature的過濾技術。在查詢的過程中,此方法檢查signature來判斷該節點是否為候選節點。如此便可減少走訪節點的次數。然而,在他們的方法中,必須建立大量的quadtree來儲存物件資料。對於n個關鍵字而言,會有n個quadtree被建立。在進行signature檢查時,必須交叉檢查每個IL-Quadtree。這將會花費較多的時間來處理資料。因此,在本論文中,我們提出了一個ICQ-tree的方法。ICQ-tree結合quadtree以及我們從inverted index改進而成的inverted code。而我們的方法貢獻如下:首先我們只建立一個ICQ-tree來儲存資料,而不是對於資料庫中的n個關鍵字建立n個quadtree。再則,我們每個節點記錄不同的位置,且每個物件只記錄一次。最後,我們透過在每一個ICQ-tree節點中儲存inverted code來加強ICQ-tree。當我們發現其中一個關鍵字絕對不會出現在某節點時,我們可以馬上省略搜尋該節點以及所有該節點的子節點。從我們的實驗數據顯示,我們所提出的ICQ-tree方法會比IL-Quadtree方法來的更有效率。
Abstract
With the rapid development of geo-spatial data, the use of spatial databases has become more and more widespread. For example, if we enter the keyword ‘restaurant’ on Google Map, the system will display a number of restaurants near you, or upload photos with the location information in Instagram and uses hashtags to classify photos. The application of the geographic information has brought a lot of convenience and interests to our life. As the number of spatial objects and keyword objects increases, the data storage and the search efficiency become more important. A spatial keyword query consists of the spatial information and the textual information. Spatial information can be expressed as a point, a line, or the shape on a map; a keyword is the description or the user’s comment. In the space, for querying the top-k objects that match the keyword conditions, it is called the top-k spatial keyword query. Zhang et al. propose a data structure called the IL-Quadtree which combines the inverted index and the linear quadtree for reducing the storage size effectively. In order to further improve the efficiency of the top-k spatial keyword query algorithms, they propose a signature filtering technique. During the query process, it checks the signature to verify whether the node is a candidate node or not. This will reduce the times of traces around nodes. However, in their approach, a large number of quadtrees must be built to store data of objects. For n keywords, it creates n trees. When performing a signature check, each IL-Quadtree must be cross-checked. This will take long time to process the data. Therefore, in this thesis, we propose the ICQ-tree algorithm. Our ICQ-tree combines the quadtree and the inverted code that we have improved from the inverted index. The contribution of our approach is as follows. First, we only construct an ICQ-tree to store data, instead of building n quadtrees for the n keywords in the database. Second, each node of the ICQ-tree records the non-repetitive location, and each object is recorded only once. Finally, we enhance each node in the ICQ-tree with the inverted code. We can prune a node and all of its child nodes immediately, once we know that one of the query keywords is definitely not in the node. From our simulation results, we show that our proposed approach is more efficient than the IL-Quadtree algorithm.
目次 Table of Contents
THESIS VALIDATION LETTER i
ACKNOWLEDGEMENTS ii
ABSTRACT (CHINESE) iii
ABSTRACT iv
LIST OF FIGURES vii
LIST OF TABLES x
1. Introduction 1
1.1 The Top-k Spatial Keyword Query 1
1.2 Keyword Matching Approaches 3
1.3 Spatial Index Structure 7
1.4 Related Works 9
1.5 Motivation 11
1.6 Organization of the Thesis 12
2. A Survey of Algorithms for Top-k Spatial Keyword Query 13
2.1 Density Based Spatial Keyword Query 13
2.1.1 Density Based Search 14
2.1.2 The IR-Tree 15
2.2 NNS with Keywords Base on SI-Index 17
2.2.1 The IR2-Tree 17
2.2.2 The Spatial Inverted Index (SI-Index) 19
2.3 The Inverted Linear Quadtree 21
3. An ICQ-Tree Based Algorithm 25
3.1 Data Structure 25
3.1.1 The Quadtree 25
3.1.2 The ICQ-Tree Structure 26
3.1.3 The Inverted Code 30
3.2 Top-k Spatial Keyword Query Processing 31
3.3 Incremental Database 35
3.4 Comparison 37
4. Performance 45
4.1 The Performance Model 45
4.2 Experiments Results 47
4.2.1 Synthetic Dataset 48
4.2.2 Real Datasets 53
5. Conclusion 58
5.1 Summary 58
5.2 Future Work 59
BIBLIOGRAPHY 60
參考文獻 References
[1] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, “The R*-tree: An
Efficient and Robust Access Method for Points and Rectangles,” Proc. of the
1990 ACM SIGMOD Int. Conf. on Management of Data, Vol. 19, No. 2, pp. 322–
331, June 1990.
[2] X. Cao, G. Cong, C. S. Jensen, and B. C. Ooi, “Collective Spatial Keyword
Querying,” Proc. of the 2011 Int. Conf. on Management of Data, pp. 373–384,
2011.
[3] Y. I. Chang, C. H. Liao, and H. L. Chen, “NA-Trees: A Dynamic Index for
Spatial Data,” Journal of Information Science and Engineering, Vol. 19, No. 1,
pp. 103–139, Jan. 2003.
[4] Y.-Y. Chen, T. Suel, and A. Markowetz, “Efficient Query Processing in Geographic
Web Search Engines,” Proc. of the 2006 ACM SIGMOD Int. Conf. on
Management of data, pp. 277–288, 2006.
[5] D. W. Choi and C. W. Chung, “Nearest Neighborhood Search in Spatial
Databases,” Proc. of the 2015 IEEE 31st Int. Conf. on Data Engineering,
pp. 699–710, 2015.
[6] I. De Felipe, V. Hristidis, and N. Rishe, “Keyword Search on Spatial Databases,”
Proc. of the 2008 IEEE 24th Int. Conf. on Data Engineering, pp. 656–665, 2008.
[7] A. Dimitriou, D. Theodoratos, and T. Sellis, “Top-k-Size Keyword Search on
Tree Structured Data,” Information Systems, Vol. 47, No. 1, pp. 178–193, Jan.
2015.
[8] C. Faloutsos and S. Christodoulakis, “Signature files: An Access Method for
Documents and Its Analytical Performance Evaluation,” ACM Trans. on Information
Systems, Vol. 2, No. 4, pp. 267–288, Oct. 1984.
[9] H. Fang, , P. Zhao, , V. S. Sheng, , J. Wu, , J. Xu, , A. Liu, , and Z. Cui, “Effective
Spatial Keyword Query Processing on Road Networks,” Proc. of the 26th
Australasian Database Conf. on Databases Theory and Applications, Vol. 9093,
pp. 194–206, 2015.
[10] R. A. Finkel and J. L. Bentley, “Quad Trees A Data Structure for Retrieval on
Composite Keys,” Acta Informatica, Vol. 4, No. 1, pp. 1–9, March 1974.
[11] Y. Gao, J. Zhao, B. Zheng, and G. Chen, “Efficient Collective Spatial Keyword
Query Processing on Road Networks,” IEEE Trans. on Intelligent Transportation
Systems, Vol. 17, No. 2, pp. 469–480, Feb. 2016.
[12] A. P. Gopinath and A. Salim, “An Approach for Faster Processing of Top-k Spatial
Keyword Queries,” Proc. of the 2015 Int. Conf. on Control Communication
and Computing India, pp. 622–627, 2016.
[13] A. Guttman, “R-Trees: A Dynamic Index Structure for Spatial Searching,” Proc.
of the 1984 ACM SIGMOD Int. Conf. on Management of Data, pp. 47–57, 1984.
[14] Z. Li, K. C. K. Lee, B. Zheng, W. C. Lee, D. Lee, and X. Wang, “IR-Tree: An
Efficient Index for Geographic Document Search,” IEEE Trans. on Knowledge
and Data Engineering, Vol. 23, No. 4, pp. 585–599, April 2011.
[15] G. Morton, “A Computer Oriented Geodetic Data Base; and a New Technique
in File Sequencing,” IBM, Ottawa, Canada, 1966.
[16] Y. Qiu, T. Ohmori, T. Shintani, and H. Fujita, “A New Algorithm for m-Closest
Keywords Query over Spatial Web with Grid Partitioning,” Proc. of the 2015
IEEE/ACIS 16th Int. Conf. on SNPD, pp. 1–8, 2015.
[17] T. K. Sellis, N. Roussopoulos, and C. Faloutsos, “The R+-tree: A Dynamic
Index for Multi-dimensional Objects,” Proc. of the 13th VLDB Conf., pp. 507–
518, 1987.
[18] S. B. Shinde and M. K. Chavan, “Fast Nearest Neighbor Search with Keyword
using Compressed Inverted Index,” Proc. of the 2016 Int. Conf. on Advanced
Communication Control and Computing Technologies, pp. 546–550, 2016.
[19] Y. Tao and C. Sheng, “Fast Nearest Neighbor Search with Keywords,” IEEE
Trans. on Knowledge and Data Engineering, Vol. 26, No. 4, pp. 878–888, April
2014.
[20] X. Wang, Y. Zhang, W. Zhang, X. Lin, and W. Wang, “AP-Tree: Efficiently
Support Continuous Spatial-Keyword Queries over Stream,” Proc. of the 2015
IEEE 31st Int. Conf. on Data Engineering, pp. 1107–1118, 2015.
[21] D. Wu, G. Cong, and C. S. Jensen, “A Framework for Efficient Spatial Web
Object Retrieval,” The VLDB Journal, Vol. 21, No. 6, pp. 797–822, Dec. 2012.
[22] C. Zhang, Y. Zhang, W. Zhang, and X. Lin, “Inverted Linear Quadtree: Efficient
Top K Spatial Keyword Search,” IEEE Trans. on Knowledge and Data
Engineering, Vol. 28, No. 7, pp. 1706–1721, July 2016.
[23] L. Zhang, X. Sun, and H. Zhuge, “Density-Based Spatial Keyword Querying,”
Future Generation Computer Systems, Vol. 32, No. 1, pp. 211–221, March 2014.
[24] R. Zhong, G. Li, K.-L. Tan, and L. Zhou, “G-Tree: An Efficient Index for KNN
Search on Road Networks,” IEEE Trans. on Knowledge and Data Engineering,
Vol. 27, No. 8, pp. 2175–2189, Aug. 2015.
[25] J. Zobel, A. Moffat, and K. Ramamohanarao, “Inverted Files Versus Signature
Files for Text Indexing,” ACM Trans. on Database Systems, Vol. 23, No. 4,
pp. 453–490, Dec. 1998.
電子全文 Fulltext
本電子全文僅授權使用者為學術研究之目的,進行個人非營利性質之檢索、閱讀、列印。請遵守中華民國著作權法之相關規定,切勿任意重製、散佈、改作、轉貼、播送,以免觸法。
論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
紙本論文的公開資訊在102學年度以後相對較為完整。如果需要查詢101學年度以前的紙本論文公開資訊,請聯繫圖資處紙本論文服務櫃台。如有不便之處敬請見諒。
開放時間 available 已公開 available

QR Code