博碩士論文 etd-0706108-230108 詳細資訊
Title page for etd-0706108-230108
論文名稱
Title
點對點系統中有效率地支援最近鄰居搜尋之可調式多維二元樹
AKDB-Tree: An Adjustable KDB-tree for Efficiently Supporting Nearest Neighbor Queries in P2P Systems
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
95
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2008-05-30
繳交日期
Date of Submission
2008-07-06
關鍵字
Keywords
空間資料、樹狀架構、P2P系統、雜湊函式、最近鄰居搜尋
Spatial Data, Hashing Function, Nearest Neighbor Query, P2P Systems, Tree Structure
統計
Statistics
本論文已被瀏覽 5703 次,被下載 1355
The thesis/dissertation has been browsed 5703 times, has been downloaded 1355 times.
中文摘要
未來,會有更多資料密集的應用,譬如P2P(點對點系統, peer to peer)拍賣網路、P2P工作搜尋網路、P2P複數玩家的遊戲,將要求能力反應更加複雜的搜尋(譬如最近鄰居搜尋涉及許多包括空間成分的資料類型)。為了能在P2P的環境裡解答空間資料最近鄰居搜尋的問題,以樹狀架構的四分樹(quadtree)為基底的結構也許是一個好選擇。但是,四分樹只將資料存放在樹葉節點。這在P2P系統中會造成每個peer增加負載不平衡和搜尋昂貴的代價。而MX-CIF四分樹可以解決此問題,它有三種特性:(1)有效地控制樹的高度,(2)資料可儲存在所有節點,(3)利用半徑原理減少最近鄰居搜尋的範圍。雖然P2P MX-CIF(MX-CIF四分樹和Chord的結合)四分樹可以有效率地做最近鄰居搜尋,但它還有一些問題,如下:(1)最近鄰居搜尋準確率低,(2)建構樹時的代價昂貴,(3)最近鄰居搜尋的搜尋代價太大,(4)容易負載不平衡。事實上,我們可以利用處理點資料(point data)的系統一樣來處理區域資料(region data),因為點資料是區域資料的一種退化的型態。因此,我們可以利用多維二元樹(KDB-tree,有名的處理點資料的演算法)來減輕負載不平衡的問題,但它和四分樹有一樣的問題(資料只存放在樹葉節點)。在這篇論文,於P2P系統中我們提出一個可調式多維二元樹(Adjustable KDB-tree)來改善這個情形。可調式多維二元樹有五種特性:(1)減輕負載不平衡的問題,(2)建構樹的代價不高,(3)資料可存放在所有節點,(4)最近鄰居搜尋準確率極高,(5)最近鄰居搜尋的搜尋代價少。另一方面,Chord是個眾所皆知的結構化(structured)P2P系統,是以雜湊函式來對資料做搜尋,來取代在大部分非架構化(unstructured)P2P系統中flooding的方式。因為,Chord是利用雜湊函式,所以系統可以很容易地處理peer進出的問題。此外,為了結合可調式多維二元樹與Chord,我們把每個在樹架構中的節點都設有ID以方便與Chord結合。這些ID可以用來區別邊節點(在二維平面上是表示邊的節點)是垂直的邊或水平的邊,以及兩個節點在二維平面上的相對位置。而且,在二維平面上,我們可以根據一個區域的ID來算出與他有關的邊的ID。因此,我們可以利用ID的特性大幅的減少最近鄰居搜尋的搜尋代價。在我們模擬的研究中,根據P2P MX-CIF四分樹的四種狀況來跟我們的方法在五種成果測試(最近鄰居搜尋的準確度和搜尋代價,建構P2P系統的搜尋代價、樹的利用度,以及在集中式系統的架構時間)下比較。從我們的模擬的結果顯示:在最近鄰居搜尋上,我們的方法比P2P MX-CIF四分樹準確、走訪的peer數更少;在系統負載上,我們的方法比較平衡;不論在集中式系統或P2P系統的建構上,我們的方法花費的時間皆比較少。
Abstract
In the future, more data-intensive applications, such as P2P auction networks, P2P job-search networks, and P2P multi-player games, will require the capability to respond to more complex queries, such as nearest neighbor (NN) queries involving numerous data types. For answering NN queries over spatial region data in a P2P environment, a quadtree-based structure is probably a good choice. However, the quadtree stores data only in the leaf nodes, which results in load imbalance and a high cost for any query. The MX-CIF quadtree can solve this problem. It has three properties: it efficiently controls the height of the tree, it reduces load imbalance, and it reduces the scope of the NN query by controlling the value of the radius. Although the P2P MX-CIF quadtree can perform the NN query efficiently, it still has the following problems: low accuracy of the NN query, expensive tree construction, high search cost of the NN query, and load imbalance. In fact, the index structures for region data can also work for point data, which can be considered a degenerate case of region data. Therefore, the KDB-tree, a well-known structure for point data, can be used to reduce load imbalance, but it has the same problem as the quadtree: data is stored only in the leaf nodes. In this thesis, we propose an Adjustable KDB-tree (AKDB-tree) to improve this situation for P2P systems. The AKDB-tree has five properties: reduced load imbalance, low tree construction cost, storage of data in both internal nodes and leaf nodes, high accuracy of the NN query, and low search cost of the NN query. The Chord system is a well-known structured P2P system in which data search is performed by a hash function, instead of the flooding used in most unstructured P2P systems. Since the Chord system is a hash-based approach, it can easily deal with peers joining and leaving.
In addition, to combine the AKDB-tree with the Chord system, we design IDs for the nodes in the AKDB-tree. Each node is hashed to the Chord system by its ID. The IDs can be used to determine whether an edge node in the AKDB-tree represents a vertical edge or a horizontal edge, and to determine the relative position of two nodes in the 2D space. We can also calculate the edges related to a region in the 2D space from the ID of the region. We make use of these properties of the IDs to greatly reduce the search cost of the NN query. In our simulation study, we compare our method with the P2P MX-CIF quadtree using five performance measures under four different situations of the P2P MX-CIF quadtree. The simulation results show that, for the NN query, the AKDB-tree provides higher accuracy and lower search cost than the P2P MX-CIF quadtree. In terms of load, the AKDB-tree is more balanced than the P2P MX-CIF quadtree. In terms of tree construction, the AKDB-tree needs less time than the P2P MX-CIF quadtree.
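As a rough illustration of the general idea of hashing tree-node IDs onto a Chord ring, the following minimal sketch maps string IDs to ring identifiers with SHA-1 and assigns each one to its successor peer. It is not the thesis's ID scheme, which additionally encodes edge orientation and relative position. The function names (`chord_id`, `successor`, `assign_nodes`), the path-like node IDs, the peer names, and the 16-bit identifier space are all assumptions made for this example.

```python
import hashlib
from bisect import bisect_left

M = 16  # identifier bits; an assumption chosen to keep the example small


def chord_id(key, m=M):
    """Map a string key onto the 2^m identifier ring using SHA-1."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(ring, ident):
    """Return the first peer identifier >= ident, wrapping around the ring.

    `ring` must be a sorted, non-empty list of peer identifiers.
    """
    i = bisect_left(ring, ident)
    return ring[i % len(ring)]


def assign_nodes(peers, tree_node_ids, m=M):
    """Assign each (hypothetical) tree-node ID to the peer responsible for it."""
    ring = sorted({chord_id(p, m) for p in peers})
    return {nid: successor(ring, chord_id(nid, m)) for nid in tree_node_ids}


if __name__ == "__main__":
    # Hypothetical peers and path-like tree-node IDs.
    peers = ["peer-a", "peer-b", "peer-c"]
    nodes = ["root", "root/0", "root/1", "root/0/1"]
    for nid, peer in assign_nodes(peers, nodes).items():
        print(nid, "->", peer)
```

Because placement depends only on the hash of the ID, a peer joining or leaving changes responsibility only for the identifiers between it and its neighbor on the ring, which is the property the abstract refers to when it says Chord handles peer churn easily. A real Chord deployment would locate the successor with finger tables rather than a global sorted list.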
目次 Table of Contents
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
1. Introduction
1.1 Client–Server Systems
1.2 Peer–to–Peer Systems
1.3 Unstructured P2P Systems
1.4 Structured P2P Systems
1.5 Spatial Data
1.6 Tree Structures for Indexing Spatial Data
1.7 The P2P Systems with Tree Structures
1.8 Motivation
1.9 Organization of Thesis
2. A Survey of Spatial Index Structures
2.1 KDB–tree
2.2 P2P R–tree
2.3 P2P MX–CIF Quadtree
3. An AKDB–Tree Approach
3.1 Data Structure
3.2 Tree Construction
3.3 The Nearest Neighbor Query
3.4 Hashing Functions Between the AKDB–tree and the Chord System
4. Performance
4.1 Simulation Study
4.2 Simulation Results of the Tree Construction
4.3 Simulation Result of the Nearest Neighbor Query Strategies
5. Conclusion
5.1 Summary
5.2 Future Works
BIBLIOGRAPHY
參考文獻 References
[1] K. Aberer and M. Hauswirth, “An Overview on Peer-to-Peer Information System,” Proc. of Int. Workshop on Distributed Data and Structures (WDAS), pp. 171–188, 2002.
[2] R. Bayer and E. McCreight, “Organization and Maintenance of Large Ordered Indexes,” Acta Informatica, pp. 173–189, 1972.
[3] N. Beckmann, H. P. Kriegel, R. Schneider, and B. Seeger, “The R*-tree: An Efficient and Robust Access Method for Points and Rectangles,” Proc. ACM SIGMOD, pp. 322–331, 1990.
[4] J. L. Bentley, “Multidimensional Binary Search Trees Used for Associative Searching,” Communications of the ACM, Vol. 18, No. 9, pp. 509–517, Sept. 1975.
[5] S. Brakatsoulas, D. Pfoser, and Y. Theodoridis, “Revisiting R-tree Construction Principles,” http://citeseer.nj.nec.com/586207.html.
[6] S. Fanning, “Napster,” http://www.napster.com/, 1999.
[7] A. Guttman, “R-Trees: A Dynamic Index Structure for Spatial Searching,” Proc. of the ACM SIGMOD Int. Conf. On Management of Data, pp. 47–57, 1984.
[8] M. Handley, P. Francis, R. Karp, S. Shenker, and S. Ratnasamy, “A Scalable Content-Addressable Network,” Proc. of the ACM SIGCOMM, pp. 161–172, Aug. 2001.
[9] G. Kedem, “The Quad-CIF Tree: A Data Structure for Hierarchical On-Line Algorithms,” Proc. of the 19th Conf. on Design Automation, pp. 352–357, 1982.
[10] T. Klingberg and R. Manfredi, “The Gnutella 0.6 Protocol Draft,” http://rfcgnutella.sourceforge.net/, 2002.
[11] A. Luther, R. Buyya, R. Ranjan, and S. Venugopal, High Performance Computing: Paradigm and Infrastructure, ch. Resource Discovery in Peer-to-Peer Infrastructure, pp. 1–28. Wiley, Jan. 2005.
[12] P. Maymounkov and D. Mazieres, “Kademlia: A Peer-to-Peer Information System Based on the XOR Metric,” EDBT Workshops, pp. 53–65, 2005.
[13] A. Mondal, Y. Lifu, and M. Kitsuregawa, “P2PR-tree: An R-tree-based Spatial Index for Peer-to-Peer Environments,” Proc. of the Int. Workshop on Peer-to-Peer Computing and Databases (held in conjunction with EDBT), pp. 516–525, March 2004.
[14] J. W. M. O. Kwon and K. J. Li, “DisTIN – A Distributed Spatial Index for P2P Environment,” Proc. of Data Engineering Workshop, pp. 11–17, 2006.
[15] R. Ramakrishnan and J. Gehrke, “Database Management Systems,” McGraw-Hill, Aug. 1999.
[16] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, “A Scalable Content-Addressable Network,” Proc. of Int. Conf. on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 161–172, 2001.
[17] J. Risson and T. Moors, “Survey of Research Towards Robust Peer-to-Peer Networks: Search Methods,” Technical Report, University of New South Wales, pp. UNSW–EE–P2P–1–1, 2004.
[18] J. T. Robinson, “The KDB-tree: A Search Structure for Large Multidimensional Dynamic Indexes,” Proc. ACM SIGMOD Int. Conf. on Management of Data, pp. 10–18, 1981.
[19] M. Roussopoulos, M. Baker, D. Rosenthal, T. Giuli, P. Maniatis, and J. Mogul, “2 P2P or Not 2 P2P,” Proc. of the 3rd Int. Workshop on Peer-to-Peer Systems, pp. 5–15, 2004.
[20] A. Rowstron and P. Druschel, “Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems,” Proc. of IFIP/ACM Int. Conf. on Distributed Systems Platforms, pp. 329–350, 2001.
[21] H. Samet, “Neighbor Finding Techniques for Images Represented by Quadtrees,” Modern Database Systems: The Object Model, Interoperability, and Beyond, W. Kim, ed., Addison Wesley/ACM Press, Reading, MA, pp. 361–385, Jan. 1982.
[22] H. Samet and A. Rosenfeld, “Quad Tree Structures for Region Processing,” Proc. Image Understanding Workshop, Nov. 1979.
[23] C. Schmidt and M. Parashar, “Enabling Flexible Queries with Guarantees in P2p Systems,” IEEE Internet Computing, Vol. 8, No. 3, pp. 19–26, May–June 2004.
[24] T. K. Sellis, N. Roussopoulos, and C. Faloutsos, “The R+-tree: A Dynamic Index for Multi-Dimensional Objects,” Proc. of the 13th Int. Conf. on Very Large Data Bases, pp. 507–518, 1987.
[25] I. Stoica, D. Adkins, S. Zhuang, S. Shenker, and S. Surana, “Internet Indirection Infrastructure,” Proc. of the 2002 Conf. on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 73–86, 2002.
[26] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan, “Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications,” Proc. of the 9th Int. Conf. on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 149–160, 2001.
[27] J. Stribling, I. G. Councill, J. Li, M. F. Kaashoek, D. R. Karger, R. Morris, and S. Shenker, “Overcite: A Cooperative Digital Research Library,” Proc. of the 4th Int. Workshop on Peer-To-Peer Systems, pp. 5–15, 2005.
[28] E. Tanin, A. Harwood, and H. Samet, “Using A Distributed Quadtree Index in Peer-to-Peer Networks,” The Int. Journal on Very Large Data Bases, Vol. 16, No. 2, pp. 165–178, April 2007.
[29] B. Yang and H. Garcia-Molina, “Improving Search in Peer-to-Peer Networks,” Proc. of the 22nd IEEE Int. Conf. on Distributed Computing Systems, pp. 5–15, 2002.
[30] B. Y. Zhao, L. Huang, J. Stribling, S. C. Rhea, A. D. Joseph, and J. Kubiatowicz, “Tapestry: A Global-Scale Overlay for Rapid Service Deployment,” IEEE Journal on Selected Areas in Communications, Vol. 22, No. 1, pp. 41–53, Jan. 2004.
電子全文 Fulltext
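This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.
論文使用權限 Thesis access permission: available on campus immediately; available off campus after one year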
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
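Availability information for printed copies is relatively complete from academic year 102 onward. To inquire about the availability of printed copies from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 已公開 available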
