Loading...
|
Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/56889
|
Title: | 在高度分散式環境下對高維度資料建立索引 Indexing high-dimensional data in highly distributed environments |
Authors: | 黃齡葦 Huang, Ling Wei |
Contributors: | 陳良弼 Chen , Arbee L.P. 黃齡葦 Huang, Ling Wei |
Keywords: | 索引 KNN查詢 高度分散式環境 參考點 Index P2P Distributed Enviroments Reference Point |
Date: | 2012 |
Issue Date: | 2013-02-01 16:53:38 (UTC+8) |
Abstract: | 目前,隨著資料急速地增加,大規模可擴充性的高度分散式資料庫服務已逐漸成為一種趨勢。在資料如此分散的環境下,如何讓資料的查詢更有效率,建立一個好的索引扮演著相當重要的角色,加上越來越多的資料庫程式應用像是生物、圖像、音樂和視訊等等,皆是處理高維度的資料,而在這些應用程式中,經常需要做相似資料的查詢,但是在高維度的資料且分散式的資料做相似資料的查詢,需耗費大量的時間與運算成本。
基於在高度分散式的環境下,針對高維度的資料有效地做KNN的查詢。我們提出一個利用reference point[2,13]的作法RP-CAN( Reference Point-Content Addressable Network )來改善查詢的效率。RP-CAN 主要是結合CAN [14] 的路由協定和使用reference point建立索引的方式來幫助在高度分散式環境下有效率的對高維的資料做查詢處理。
最後會實作出我們所提出的RP-CAN索引並與RT-CAN[1]做比較。我們發現我們所提出的RP-CAN索引在高維度資料作KNN的查詢時比RT-CAN索引來的有效率。 There has been an increasing interest in deploying a storage system in a highly distributed environment because of the rapid increasing data. And many database applications such as time series, biological and multimedia database, handle high-dimensional data. In these systems, k nearest-neighbors query is one of the most frequent queries but costly operation that is to find objects in the high-dimensional database that are similar to a given query object. As in conventional DBMS, indexes can indeed improve query performance but cannot deploy directly in highly distributed systems because the environment has become more complex. To efficiently support k nearest-neighbors query, a high-dimensional indexing strategy, is developed for the highly distributed environment.
In this paper, we propose an efficient indexing strategy, RP-CAN( Reference Point-Content Addressable Network ), to improve the performance of the k nearest-neighbors query in a highly distributed environment. In the end of this paper, we designed an experiment to demonstrate that the performance of RP-CAN is better than RT-CAN in high dimensional space. Thus, our RP-CAN index could efficiently handle the high dimensional data. |
Reference: | [1] Jinbao Wang, Sai Wu, Hong Gao, Jianzhong Li, Beng Chin Ooi: Indexing multi-dimensional data in a cloud system. SIGMOD Conference 2010: 591-602
[2] H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang: iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans. Database Syst. 30(2): 364-397 (2005)
[3] Cui Yu, Beng Chin Ooi, Kian-Lee Tan, H. V. Jagadish: Indexing the Distance: An Efficient Method to KNN Processing. VLDB 2001: 421-430
[4] Nikolaos Kouiroukidis, Georgios Evangelidis: The Effects of Dimensionality Curse in High Dimensional kNN Search. Panhellenic Conference on Informatics 2011: 41-45
[5] Hoang Tam Vo, Chun Chen, Beng Chin Ooi: Towards Elastic Transactional Cloud Storage with Range Query Support. PVLDB 3(1): 506-517 (2010)
[6] Sai Wu, Dawei Jiang, Beng Chin Ooi, Kun-Lung Wu: Efficient B-tree Based Indexing for Cloud Data Processing. PVLDB 3(1): 1207-1218 (2010)
[7] H. V. Jagadish, B. C. Ooi, and Q. H. Vu. Baton: A balanced tree structure for peer-to-peer networks. In VLDB 2005.
[8] A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In International Conference on Distributed Systems Platforms 2001.
[9] I. Stoica, R. Morris, D. R. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In SIGCOMM 2001.
[10] Ben Y. Zhao, Ling Huang, Jeremy Stribling, Sean C. Rhea, Anthony D. Joseph, John Kubiatowicz: Tapestry: a resilient global-scale overlay for service deployment. IEEE Journal on Selected Areas in Communications 2004: 41-53
[11] http://www.napster.com/.
[12] http://www.darkridge.com/~jpr5/doc/gnutella.html.
[13] https://freenetproject.org/.
[14] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard M. Karp, Scott Shenker: A scalable content-addressable network. SIGCOMM 2001: 161-172
[15] Wenyuan Cai, Shuigeng Zhou, Linhao Xu, Weining Qian, Aoying Zhou: C2: A New Overlay Network Based on CAN and Chord. GCC (1) 2003: 42-50
[16] H. V. Jagadish, Beng Chin Ooi, Quang Hieu Vu, Rong Zhang, Aoying Zhou: VBI-Tree: A Peer-to-Peer Framework for Supporting Multi-Dimensional Indexing Schemes. ICDE 2006: 34
[17] Kaushik Chakrabarti, Sharad Mehrotra: Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces. VLDB 2000: 89-100
[18] Paolo Ciaccia, A. Nanni, Marco Patella: A Query-sensitive Cost Model for Similarity Queries with M-tree. Australasian Database Conference 1999: 65-76
[19] Beyer K, Goldstein J, Ramakrishnan R, Shaft U. When is nearest neighbors meaningful? In: Beeri C, Buneman P, eds. Proc. of the 7th Int`l Conf. on Database Theory. LNCS 1540, Jerusalem: Springer-Verlag, 1999. 217-235.
[20] Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel: The X-tree : An Index Structure for High-Dimensional Data VLDB 1996: 28-39
[21] Kaushik Chakrabarti, Sharad Mehrotra: The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces. ICDE 1999: 440-447.
[22] Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57
[23] J. B. MacQueen: Some Methods for classification and Analysis of Multivariate Observations. Proc. of the fifth Berkeley Symposium on Mathematical Statistics and Probability.1967: 281-297.
[24] Li M, Lee W-C, Sivasubramaniam A. Semantic small world: An overlay network for peer-to-peer search. In: Proceedings of International Conference on Network Protocols (ICNP). Washington D.C.: IEEE Computer Society, 2004, 228–238
[25] Li M, Lee W-C, Sivasubramaniam A, et al. Ssw: a small world based overlay for peer-to-peer search. IEEE Transaction on Parallel and Distributed Systems, 2008, 19(2): 735–749
[26] David Novak , Pavel Zezula, M-Chord: a scalable distributed similarity search structure, Proceedings of the 1st international conference on Scalable information systems, 2006, p.19-es
[27] Li M, Lee W-C, Sivasubramaniam A, J. Zhou. Supporting K nearest neighbors query on high-dimensional data in P2P systems. Frontiers of Computer Science in China, 2008, 2(3):p. 234-247 .
[28] Tang, Y. Xu, J. Zhou, S. Lee, W. Deng, D. Wang, Y. A Lightweight Multi-Dimensional Index for Complex Queries Over DHTs. IEEE Transactions on Parallel and Distributed Systems,2011, 22: p. 2046-2054 |
Description: | 碩士 國立政治大學 資訊科學學系 99753036 101 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0099753036 |
Data Type: | thesis |
Appears in Collections: | [資訊科學系] 學位論文
|
Files in This Item:
File |
Size | Format | |
index.html | 0Kb | HTML2 | 334 | View/Open |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|