博碩士論文 etd-0710102-104442 詳細資訊
Title page for etd-0710102-104442
論文名稱
Title
尋找最近鄰居方法之設計與分析
Design and Analysis of Nearest Neighbor Search Strategies
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
105
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2002-06-21
繳交日期
Date of Submission
2002-07-10
關鍵字
Keywords
四方區域樹、空間曲線、空間查詢、九方區域樹、最近鄰居
nearest neighbor, NA-tree, spatial query, quadtree, space-filling curve
統計
Statistics
本論文已被瀏覽 5693 次,被下載 1659
This thesis has been viewed 5693 times and downloaded 1659 times.
中文摘要
隨著無線通訊及科技的快速進步,面對空間資料庫 (Spatial Database) 資料龐大且複雜的特性,必須要有一套有效率的演算法來回應查詢的問題。空間資料 (spatial data) ,包括二維以上的資料,可視為三部分:(1)圖形資料,(2)座標資料,(3)空間關係。在此篇論文中,我們僅考慮圖形資料中的點資料。在空間資料庫中,“尋找最近鄰居”的查詢即是目前最重要且熱門的空間查詢,例如:“開車路途中,離自己最近的加油站在哪裡?”而且在最近幾年,許多的研究都在於為“找尋最近鄰居的問題”謀得一個有效率的方法,以節省空間查詢的時間。找尋最近鄰居的問題即是在龐大的資料集中,搜尋離查詢點最近距離的一個點,其相關的應用在於網際地理資訊系統 (GIS) ,例如:電子地圖的應用。在一個二維平面上,區域 B 可以說是區域 A 的鄰居的條件是鄰近 A 且與 A 有相同的性質,例如相同的大小。Jozef Voros 在其研究中提出,在以 Quadtree 結構表示的圖形資料中,尋找某一查詢點位於區域 A 的四個相同大小鄰居區域(包括東、西、南、北的方向)的方法。然而以Voros 的方法為基礎,在對角線上的鄰居區域(包括東南、東北、西南、西北的方向)卻被忽略了,如此一來,離查詢點最近的點的正確性就會有些出入,因為最近點可能出現在對角線上的鄰居區域內。另外,當我們想將高維的空間資料轉換成一維資料時,最主要的困難在於無法用一個規則去規劃空間資料彼此的順序及距離。為了處理維度轉換的順序問題,相關的研究即是 Space-Filling Curves,這些曲線經過在空間中的每一筆資料,並以一對一的對應關係將高維的資料轉換成一維的號碼順序。Orenstein 提出的 Peano曲線,將二維的空間資料以穿插X與Y座標相對應二進位位元的方式轉換成一維的點資料。我們發現這個穿插位元的性質可以簡單地應用在找尋最近鄰居的問題上,因為可以由轉換後的一維資料去分析資料在空間中的位置。然而如果資料是以 RBG 曲線或 Hilbert 曲線串聯起來的,則因為曲線有經過旋轉或反射的特性,使得在這些曲線找尋鄰居的過程中,變得很複雜。RBG 曲線在範圍查詢中,能減少隨機存取次數。而 Hilbert 曲線在範圍查詢中,能達到存取的叢集數 (clusters) 最少。所以在此篇論文研究中,我們會先舉出 Voros的方法中忽略的情況,並設計方法來解決。接著,我們會提出如何以Peano曲線的方法來協助尋找最近鄰居的演算法。當資料以其他曲線表示時,我們歸納出在Peano 曲線及 RBG 曲線,在Peano 曲線及 Hilbert 曲線之間轉換的規則,以協助我們正確且快速地找到離查詢點最近的點資料。最後我們改進NA-Trees,它是一個龐大且動態的索引,可以處理直接對應查詢 (exact match query) 及範圍查詢 (range query)。所謂龐大,指的是絕大部分的索引資料都存在輔助記憶體中。所謂動態,指的是索引會隨著查詢而對樹結構作新增或刪除的動作,即索引不是事先建立好的。直接對應查詢是在空間資料庫中搜尋一筆符合查詢條件的資料,而範圍查詢則是在一個特定的範圍內,搜尋所有符合查詢條件的資料。

Abstract
With the proliferation of wireless communications and rapid advances in technology, algorithms that efficiently answer queries over large amounts of spatial data are needed. Spatial data consists of spatial objects, including higher-dimensional data. Neighbor finding is one of the most important spatial operations in the field of spatial data structures. In recent years, many researchers have focused on finding efficient solutions to the nearest neighbor (NN) problem, which involves determining the point in a data set that is nearest to a given query point. It is frequently used in Geographical Information Systems (GIS). A block B is said to be a neighbor of another block A if B has the same property as A and covers an equal-sized neighbor of A. Jozef Voros has proposed a neighbor finding strategy for images represented by quadtrees, in which the four equal-sized neighbors of block A (in the east, west, north, and south directions) can be found. However, Voros's strategy ignores the case in which the nearest neighbor lies in a diagonal direction (northeast, northwest, southeast, or southwest). Moreover, there is no total ordering that preserves proximity when mapping spatial data from a higher-dimensional space to a 1D space. One way of effecting such a mapping is to use space-filling curves. Space-filling curves pass through every point in a space and give a one-to-one correspondence between the coordinates of a point and its 1D sequence number. The Peano curve, proposed by Orenstein, computes the 1D coordinate of a point by simply interleaving the bits of its X and Y coordinates in 2D space, and can easily be used in neighbor finding. With data ordered by the RBG curve or the Hilbert curve, however, neighbor finding becomes complex. The RBG curve saves random disk accesses for range queries, and the Hilbert curve achieves the best clustering for range queries. Therefore, in this thesis, we first show the case missing from Voros's strategy and show how to find it. Next, we show that the Peano curve is the best mapping function for nearest neighbor finding. We also give the transformation rules between the Peano curve and the other curves, so that we can efficiently find the nearest neighbor when the data is linearly ordered by one of the other curves. Our simulation shows that our two proposed strategies work correctly and are faster than the conventional strategies for nearest neighbor finding. Finally, we present a revised version of NA-Trees, which supports exact match queries and range queries on a large, dynamic index, where an exact match query finds a specific data object in a spatial database and a range query reports all data objects located in a specific range. By large, we mean that most of the index must be stored in secondary memory. By dynamic, we mean that insertions and deletions are intermixed with queries, so that the index cannot be built beforehand.
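
The bit-interleaving idea described in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis's implementation: the function names (`interleave`, `deinterleave`, `neighbor_keys`), the choice of placing each X bit ahead of the corresponding Y bit, and the fixed grid resolution `bits` are all hypothetical choices made for the example. It shows how a Peano-style key is formed and how the eight equal-sized neighbors of a cell, including the four diagonal ones, could be derived from that key.

```python
from typing import List, Tuple


def interleave(x: int, y: int, bits: int = 16) -> int:
    """Build a 1D key by interleaving the bits of x and y.

    Assumption: bit i of x goes to position 2*i + 1 and bit i of y
    goes to position 2*i. The opposite order would work equally well.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i + 1)
        key |= ((y >> i) & 1) << (2 * i)
    return key


def deinterleave(key: int, bits: int = 16) -> Tuple[int, int]:
    """Recover (x, y) from a key produced by interleave()."""
    x = y = 0
    for i in range(bits):
        x |= ((key >> (2 * i + 1)) & 1) << i
        y |= ((key >> (2 * i)) & 1) << i
    return x, y


def neighbor_keys(key: int, bits: int) -> List[int]:
    """Return the keys of the equal-sized neighbor cells of `key`.

    All eight directions are considered (E, W, N, S and the four
    diagonals); cells that would fall outside the grid are skipped.
    """
    x, y = deinterleave(key, bits)
    size = 1 << bits  # the grid is size x size cells
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue  # skip the cell itself
            nx, ny = x + dx, y + dy
            if 0 <= nx < size and 0 <= ny < size:
                result.append(interleave(nx, ny, bits))
    return sorted(result)


if __name__ == "__main__":
    k = interleave(3, 5, bits=3)
    print(k, deinterleave(k, bits=3))
    print(neighbor_keys(interleave(1, 1, bits=2), bits=2))
```

Under these assumptions, the cell (3, 5) on an 8 x 8 grid receives key 27 (binary 011011), and an interior cell has eight equal-sized neighbors, four of which are diagonal. A search that inspects only the east, west, north, and south cells would skip those four.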

目次 Table of Contents
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
1. Introduction
1.1 Spatial Databases
1.2 Query Types
1.3 The Nearest Neighbor
1.4 Motivations
2. Survey
2.1 Based on Quadtrees
2.1.1 Definition
2.1.2 Equal Sized Neighbors
2.2 Based on Space Filling Curves
2.2.1 Properties
2.2.2 Measures
2.3 Based on R-trees
3. A Note on Nearest Neighbor Finding in Images Represented by Quadtrees
3.1 A Missing Case
3.2 The Improved Version of Neighbor Finding
3.3 Examples
4. Neighbor Finding Based on Space Filling Curves
4.1 Space Filling Curves
4.1.1 Region Code vs. Bit Shuffling
4.2 Nearest Neighbor Finding Based on the Peano Curve
4.3 Transformation Rules
4.3.1 Transformation Between the Peano Curve and the RBG Curve
4.3.2 Transformation Between the Peano Curve and the Hilbert Curve
5. The Revised Version of the NA-tree
5.1 Problems of NA-Trees
5.2 Data Structure
5.3 The Insertion Algorithm
5.4 The Deletion Algorithm
5.5 Exact Match and Range Queries
6. Performance Study
6.1 Simulation Study of the Nearest Neighbor Finding Strategies
6.2 Simulation Study of the NA-Trees
7. Conclusion
7.1 Summary
7.2 Future Work
BIBLIOGRAPHY
A. The Old Version of NA-Trees
A.1 The Bucket Numbering Schemes
A.2 Data Structure
A.3 Algorithms
B. Source Code for the Space-Filling Curves
參考文獻 References
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger,
"The R*-tree: An Efficient and Robust Access Method for Points and Rectangles," Proc. of ACM SIGMOD Int. Conf. on Management of Data, pp. 322-331, 1990.

Thomas Brinkhoff, Hans-Peter Kriegel, and Bernhard Seeger,
"Multi-Step Processing of Spatial Joins,"
Proc. of ACM SIGMOD Int. Conf. on Management of Data, pp. 197-208, 1994.

Stefan Berchtold, Daniel A. Keim, and Hans-Peter Kriegel,
"The X-Tree: An Index Structure for High-Dimensional Data,"
Proc. of the 22nd VLDB Conf., pp. 28-39, 1996.


Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel, and Thomas Seidl,
"Indexing the Solution Space: A New Technique for Nearest Neighbor Search in High-Dimensional Space,"
IEEE Trans. on Knowledge and Data Engineering, Vol. 12, No. 1, pp. 45-57, Jan./Feb. 2000.

Rimantas Benetis, Christian S. Jensen, Gytis Karciauskas, and Simonas Saltenis,
"Nearest Neighbor and Reverse Nearest Neighbor Queries for Moving Objects,"
http://www.cs.auc.dk/tbp/Teaching/DAT5E01/benetis.pdf, pp. 1-18,
2001.

Alberto Belussi, Elisa Bertino, and Barbara Catania,
"Using Spatial Data Access Structures for Filtering Nearest Neighbor Queries," Data and Knowledge Engineering, Vol. 40, No. 1, pp. 1-31, 2002.

Kuo-Liang Chung, Jung-Gen Wu, and Jer-Kuang Lan,
"Efficient Search Algorithm on Compact S-trees,"
Pattern Recognition Letters, Vol. 18, No. 14, pp. 1427-1434, Dec. 1997.

King Lum Cheung and Ada Wai-chee Fu,
"Enhanced Nearest Neighbor Search on the R-tree,"
ACM SIGMOD Record, Vol. 27, No. 3, pp. 16-21, Sept. 1998.

Jeang-Kuo Chen and Yeh-Hao Chin,
"An Efficient Algorithm for Searching Nearest Objects in Spatial Database," Proc. of National Computer Sys., pp. 38-44, 1999.

Ye-In Chang and Cheng-Huang Liao,
"NA-Trees: A Nine Areas Tree for Efficient Data Access in Spatial Database Systems," Proc. of National Computer Symposium, pp. 108-115, 1999.

Antonio Corral, Yannis Manolopoulos, Yannis Theodoridis, and Michael Vassilakopoulos, "Closest Pair Queries in Spatial Databases,"
Proc. of ACM SIGMOD Int. Conf. on Management of Data, pp. 189-200, 2000.

Ye-In Chang, Cheng-Huang Liao, and Hue-ling Chen,
"NA-Trees: A Nine Areas Tree for Efficient Data Access in Spatial Database Systems,"
Journal of Information Science and Eng.,
Vol. 18, No. 3, pp. 108-115, 2002.

Christos Faloutsos,
"Gray Codes for Partial Match and Range Queries,"
IEEE Trans. on Software Eng., Vol. 14, No. 10, pp. 1381-1393, Aug. 1988.

Christos Faloutsos and Shari Roseman,
"Fractals for Secondary Key Retrieval,"
ACM SIGACT-SIGMOD-SIGART Symposium on PODS, pp. 247-252, 1989.

Irene Gargantini,
"An Effective Way to Represent Quadtrees," Comm. of ACM,
Vol. 25, No. 12, pp. 905-910, Dec. 1982.

Antonin Guttman,
"R-trees: A Dynamic Index Structure for Spatial Searching,"
Proc. of ACM SIGMOD Int. Conf. on Management of Data,
pp. 47-57, 1984.

Diane Greene,
"An Implementation and Performance Analysis of Spatial Data Access,"
Proc. of IEEE Data Engineering, pp. 606-615, 1989.

Ralf Hartmut Guting,
"An Introduction to Spatial Database Systems,"
Special Issue on Spatial Database Systems of VLDB Journal,
Vol. 3, No. 4, pp. 1-32, Oct. 1994.

Volker Gaede and Oliver Gunther,
"Multidimensional Access Methods,"
ACM Computing Surveys, Vol. 30, No. 2, pp. 123-169, 1998.

Andreas Hutflesz, Hans-Werner Six, and Peter Widmayer,
"The R-File: An Efficient Access Structure for Proximity Queries,"
Proc. of IEEE Int. Conf. on Data Eng., pp. 372-379, 1990.


H. V. Jagadish,
"Linear Clustering of Objects with Multiple Attributes,"
Proc. of ACM SIGMOD Int. Conf. on Management of Data,
pp. 332-342, 1990.

Ibrahim Kamel and Christos Faloutsos,
"Parallel R-Trees,"
Proc. of ACM SIGMOD Int. Conf. on Management of Data, pp. 195-204, 1992.

Ibrahim Kamel and Christos Faloutsos,
"Hilbert R-tree: an Improved R-tree Using Fractals,"
Proc. of 20th Int. Conf. on VLDB, pp. 500-509, 1994.

Akhil Kumar,
"G-Tree: A New Data Structure for Organizing Multidimensional Data,"
IEEE Trans. on Knowledge and Data Eng., Vol. 6, No. 2, pp. 341-347, April 1994.

George Kollios, Dimitrios Gunopulos, and Vassilis J. Tsotras,
"Nearest Neighbor Queries in a Mobile Environment,"
Proc. of Int. Workshop on Spatio-Temporal Database Management, pp. 119-134, 1999.

Ki-Joune Li and Robert Laurini,
"The Spatial Locality and a Spatial Indexing Method by Dynamic Clustering in Hypermap Systems,"
Proc. of IEEE Int. Conf. on Data Eng., pp. 207-223, 1992.

Scott T. Leutenegger and Mario A. Lopez,
"The Effect of Buffering on the Performance of R-Trees,"
Proc. of the 14th Int. Conf. on Data Engineering, pp. 164-171, 1998.

Dong-Ho Lee and Hyoung-Joo Kim,
"SPY-TEC: An Efficient Indexing Method for Similarity Search in High Dimensional Data Spaces,"
Data and Knowledge Engineering, Vol. 34, No. 1, pp. 77-97,
2000.

Jonathan K. Lawder and Peter J. H. King,
"Querying Multi-dimensional Data Indexed Using the Hilbert Space-Filling Curve," ACM SIGMOD Record, Vol. 30, No. 1, pp. 19-24, Mar. 2001.

BongKi Moon, H. V. Jagadish, Christos Faloutsos, and Joel H. Saltz,
"Analysis of the Clustering Properties of the Hilbert Space-Filling Curve,"
IEEE Trans. on Knowledge and Data Eng., Vol. 13, No. 1, pp. 124-141, Jan. 2001.

Yasuaki Nakamura, Shigeru Abe, Yutaka Ohsawa, and Masao Sakauchi,
"A Balanced Hierarchical Data Structure for Multidimensional Data
with Highly Efficient Dynamic Characteristics,"
IEEE Trans. on Knowledge and Data Eng., Vol. 5, No. 4, pp. 682-694, Aug.
1993.

Jack A. Orenstein and T. H. Merrett,
"A Class of Data Structures for Associative Searching,"
Proc. Symp. on PODS, pp. 181-190, 1984.

Jack A. Orenstein,
"Spatial Query Processing in an Object-Oriented Database System,"
Proc. ACM SIGMOD Int. Conf. on Management of Data, pp. 326-336, 1986.

Yutaka Ohsawa and Masao Sakauchi,
"A New Tree Type Data Structure with Homogeneous Nodes Suitable for a Very Large Spatial Database,"
Proc. of IEEE Int. Conf. on Data Eng., pp. 296-303, 1990.

Dimitris Papadias, Yannis Theodoridis, Timos K. Sellis, and Max J. Egenhofer,
"Topological Relations in the World of Minimum Bounding Rectangles: a Study with R-trees,"
Proc. of ACM SIGMOD Int. Conf. on Management of Data, pp. 92-103, 1995.

E. M. Reingold, J. Nievergelt, and N. Deo,
"Combinatorial Algorithms: Theory and Practice,"
Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1977.

Raghu Ramakrishnan and Johannes Gehrke,
"Database Management Systems," McGraw-Hill, Aug. 1999.

Nick Roussopoulos, Stephen Kelley, and Frederic Vincent,
"Nearest Neighbor Queries,"
Proc. of ACM SIGMOD Int. Conf. on Management of Data, pp. 71-79, 1995.

Nick Roussopoulos and Daniel Leifker,
"Direct Spatial Search on Pictorial Databases Using Packed R-trees,"
Proc. of ACM SIGMOD Int. Conf. on Management of Data, pp. 17-31, 1985.

Hanan Samet,
"Neighbor Finding Techniques for Images Represented by Quadtrees,"
Comput. Graphics Image Process., Vol. 18, No. 1, pp. 37-57, Jan. 1982.

Hanan Samet,
"Design and Analysis of Spatial Data Structures,"
Addison-Wesley, Reading Mass., 1990.

Hanan Samet,
"Spatial Data Structure,"
Modern Database Systems: The Object Model, Interoperability, and Beyond, W. Kim, ed., Addison Wesley/ACM Press, Reading, MA, pp. 361-385, 1994.

Gunther Schrack,
"Finding Neighbors of Equal Size in Linear Quadtrees and Octrees
in Constant Time,"
CVGIP: Image Understanding, Vol. 35, No. 3, pp. 221-230, 1992.

Bernhard Seeger and Hans-Peter Kriegel,
"The Buddy-tree: An Efficient and Robust Access Method for
Spatial Data Base Systems,"
Proc. of the 16th VLDB Conf. Brisbane, pp. 590-601, 1990.

Timos K. Sellis, Nick Roussopoulos and Christos Faloutsos,
"The R+-Tree: A Dynamic Index for Multi-Dimensional Objects,"
Proc. of the 13th VLDB Conf., pp. 507-518, 1987.

Shashi Shekhar, Sanjay Chawla, Siva Ravada, Andrew Fetterer, Xuan Liu, and Chang-Tien Lu,
"Spatial Databases - Accomplishments and Research Needs,"
IEEE Trans. on Knowledge and Data Engineering, Vol. 11, No. 1, pp. 45-55,
Jan. 1999.

Ioana Stanoi, Divyakant Agrawal, and Amr El Abbadi,
"Reverse Nearest Neighbor Queries for Dynamic Databases,"
Proc. of ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 45-53, 2000.

C. D. Tung, W. C. Hou, and J. H. Chu,
"Multi-Priority Tree: An Index Structure for Spatial Data,"
Proc. of Int. Computer Symposium, pp. 1285-1290, 1994.

Yannis Theodoridis and Timos K. Sellis,
"A Model for the Prediction of R-tree Performance,"
Proc. of the 15th ACM Sympos. on Principles of Database Systems, pp. 161-171, 1996.

Kian-Lee Tan, Beng Chin Ooi, and Lay Foo Thiang, "Indexing Shapes in Image Databases Using the Centroid-Radii Model,"
Data and Knowledge Engineering, Vol. 32, No. 3, pp. 271-289, 2000.

Yannis Theodoridis, Emmanuel Stefanakis and Timos K. Sellis,
"Efficient Cost Model for Spatial Queries Using R-Trees,"
IEEE Trans. on Knowledge and Data Engineering, Vol. 12, No. 1,
pp. 19-32, 2000.

Jozef Voros,
"A Strategy for Repetitive Neighbor Finding in Images Represented by Quadtrees,"
Pattern Recognition Letters, Vol. 18, No. 10, pp. 955-962, Oct. 1997.

電子全文 Fulltext
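本電子全文僅授權使用者為學術研究之目的,進行個人非營利性質之檢索、閱讀、列印。請遵守中華民國著作權法之相關規定,切勿任意重製、散佈、改作、轉貼、播送,以免觸法。
This electronic full text is licensed to users only for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.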
論文使用權限 Thesis access permission: 校內外都一年後公開 withheld for one year, then available both on and off campus
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
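紙本論文的公開資訊在102學年度以後相對較為完整。如果需要查詢101學年度以前的紙本論文公開資訊,請聯繫圖資處紙本論文服務櫃台。如有不便之處敬請見諒。
Availability information for printed theses is relatively complete from academic year 102 onward. To check the availability of printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.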
開放時間 Available: 已公開 available
