National Chengchi University Institutional Repository (NCCUR): Item 140.119/56330

Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/56330

Title:	Similar Top-k documents retrieval in highly distributed environments (在高度分散式環境下進行Top-k相似文件檢索)
Author:	Wang, Chun Hung (王俊閎)
Contributors:	Chen, Arbee L.P. (陳良弼); Wang, Chun Hung (王俊閎)
Keywords:	distributed environments; similar top-k documents retrieval; peer-to-peer network
Date:	2012
Upload time:	2012-12-03 11:27:23 (UTC+8)
Abstract:	In query processing over large document databases, a similar top-k documents query helps users retrieve, from a very large collection, the documents most relevant to a query document. The documents in the database are ranked by their similarity to the query document, and the k documents with the highest similarity are returned to the user. Centralized approaches such as centralized search engines, however, suffer from limited coverage and scalability, which makes rank-based document queries costly in both time and computation. Using peer-to-peer (P2P) architectures to address these problems has recently become a trend, but supporting rank-based similarity queries in a highly distributed environment is difficult because global knowledge and a suitable central coordinator are lacking. In this study, we propose a framework to address these problems. We first cluster the local document collection on each peer as a pre-processing step, then apply a zone-creation approach [1] that partitions the P2P network into several sub-zones, and finally build a feature index table. During query processing, the index table speeds up the selection of the top-k similar clusters and ensures that an appropriate number of results is returned. In the experiments, we compare our method with a traditional centralized search engine and with the SON-based approach [1]; in a highly distributed environment, our method performs better than both on similar top-k documents queries.
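The query step described in the abstract (selecting the top-k similar clusters through an index of cluster summaries) can be illustrated with a minimal sketch. This is not the thesis's implementation: the term-frequency vectors, the summed-vector cluster summaries, the cosine ranking, the flat dictionary index, and all names below (`tf_vector`, `build_feature_index`, `top_k_clusters`, the toy `zones` data) are assumptions made for illustration, and the local clusters are taken as already computed.

```python
import heapq
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector for a document (assumed feature representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_feature_index(zones):
    """Summarize every local cluster by the sum of its document vectors.

    zones maps zone id -> peer id -> cluster id -> list of document texts.
    Returns {(zone_id, peer_id, cluster_id): summed term-frequency vector}.
    """
    index = {}
    for zone_id, peers in zones.items():
        for peer_id, clusters in peers.items():
            for cluster_id, docs in clusters.items():
                summary = Counter()
                for doc in docs:
                    summary.update(tf_vector(doc))
                index[(zone_id, peer_id, cluster_id)] = summary
    return index


def top_k_clusters(index, query_text, k):
    """Return the k (score, cluster key) pairs most similar to the query."""
    query = tf_vector(query_text)
    scored = ((cosine(query, summary), key) for key, summary in index.items())
    return heapq.nlargest(k, scored)


# Hypothetical toy data: two zones, two peers, three pre-computed clusters.
zones = {
    "zone-A": {
        "peer-1": {
            "c0": ["peer to peer network routing", "overlay network peer lookup"],
            "c1": ["protein gene sequence", "gene expression protein"],
        }
    },
    "zone-B": {
        "peer-2": {"c0": ["clinical trial patient outcome", "patient drug trial"]}
    },
}
index = build_feature_index(zones)
results = top_k_clusters(index, "peer network lookup", k=2)
```

In this sketch a query is scored only against one summary per cluster rather than against every document, which is the kind of saving an index of cluster features is meant to provide; how the thesis actually builds and distributes its feature index table across sub-zones is not specified in the abstract.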
References:
[1] Christos Doulkeridis, Kjetil Nørvåg, Michalis Vazirgiannis. 2008. Peer-to-peer similarity search over widely distributed document collections. LSDS-IR 35-42.
[2] Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H. 2001. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the ACM SIGCOMM 149-160.
[3] Ratnasamy, S., Francis, P., Handley, M., Karp, R., Schenker, S. 2001. A scalable content-addressable network. In Proceedings of the ACM SIGCOMM 161-172.
[4] Rowstron, A., Druschel, P. 2001. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Proceedings of the Middleware.
[5] Chunqiang Tang, Zhichen Xu, Sandhya Dwarkadas. 2003. Peer-to-peer information retrieval using self-organizing semantic overlay networks. In Proceedings of the ACM SIGCOMM 175-186.
[6] BitTorrent. <http://bittorrent.com/>.
[7] eMula. <http://www.emula-project.net/>.
[8] Beverly Yang, Hector Garcia-Molina. 2003. Designing a Super-Peer Network. ICDE 49-60.
[9] The Gnutella protocol specification v0.6. <http://rfcgnutella.sourceforge.net>.
[10] KaZaA. <http://www.kazaa.com>.
[11] Salton, G., Wong, A., Yang, C.S. 1975. A vector space model for automatic indexing. Communications of the ACM 18(11) 613-620.
[12] Bernard J. Jansen, Soumen Chakrabarti. 2006. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan-Kaufmann Publishers, 352 pp., ISBN: 1-55860-754-4. Inf. Process. Manage. (IPM) 42(1) 317-318.
[13] Christos Doulkeridis, Kjetil Nørvåg, Michalis Vazirgiannis. 2007. DESENT: Decentralized and distributed semantic overlay generation in P2P networks. IEEE Journal on Selected Areas in Communications (J-SAC) 25(1) 25-34.
[14] Hersh, W.R., Buckley, C., Leone, T.J., Hickam, D.H. 1994. Ohsumed: An interactive retrieval evaluation and new large test collection for research. In Proceedings of the ACM SIGIR 192-201.
[15] GT-ITM: Georgia Tech Internetwork Topology Models. <http://www.cc.gatech.edu/projects/gtitm/>.
[16] Wolf-Tilo Balke, Wolfgang Nejdl, Wolf Siberski, Uwe Thaden. 2005. Progressive Distributed Top k Retrieval in Peer-to-Peer Networks. ICDE 174-185.
[17] Wolf-Tilo Balke. 2005. Supporting Information Retrieval in Peer-to-Peer Systems. Peer-to-Peer Systems and Applications 337-352.
[18] C. Gkantsidis, M. Mihail, A. Saberi. 2005. Hybrid search schemes for unstructured peer-to-peer networks. In Proceedings of INFOCOM.
[19] Inderjit S. Dhillon, Dharmendra S. Modha. 2001. Concept Decompositions for Large Sparse Text Data Using Clustering. Machine Learning 42(1/2) 143-175.
[20] Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørvåg, Michalis Vazirgiannis. 2008. On efficient top-k query processing in highly distributed environments. SIGMOD 753-764.
[21] Shiwei Zhu, Junjie Wu, Hui Xiong, Guoping Xia. 2011. Scaling up top-K cosine similarity search. Data Knowl. Eng. (DKE) 70(1) 60-83.
[22] Aoying Zhou, Rong Zhang, Weining Qian, Quang Hieu Vu, Tianming Hu. 2008. Adaptive indexing for content-based search in P2P systems. Data Knowl. Eng. (DKE) 67(3) 381-398.
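Description:	Master's; National Chengchi University; Department of Computer Science; 99753034; 101
Source:	http://thesis.lib.nccu.edu.tw/record/#G0099753034
Data type:	thesis
Appears in collections:	[Department of Computer Science] Theses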

Files in this item:

File	Size	Format	Views
303401.pdf	1496Kb	Adobe PDF2	866	View/Open

All items in NCCUR are protected by the original copyright.

Copyright Announcement

1. The digital content of this website is part of the National Chengchi University Institutional Repository. It is provided free of charge for academic research, public education, and other non-commercial use. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

2. Every effort has been made in building the NCCU Institutional Repository to protect the interests of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will promptly remove the work from the repository and investigate your claim.
