政大機構典藏-National Chengchi University Institutional Repository(NCCUR):Item 140.119/56330
English  |  正體中文  |  简体中文  |  Post-Print筆數 : 27 |  全文笔数/总笔数 : 113648/144635 (79%)
造访人次 : 51667247      在线人数 : 524
RC Version 6.0 © Powered By DSPACE, MIT. Enhanced by NTU Library IR team.
搜寻范围 查询小技巧:
  • 您可在西文检索词汇前后加上"双引号",以获取较精准的检索结果
  • 若欲以作者姓名搜寻,建议至进阶搜寻限定作者字段,可获得较完整数据
  • 进阶搜寻
    政大機構典藏 > 資訊學院 > 資訊科學系 > 學位論文 >  Item 140.119/56330


    请使用永久网址来引用或连结此文件: https://nccur.lib.nccu.edu.tw/handle/140.119/56330


    题名: 在高度分散式環境下進行Top-k相似文件檢索
    Similar Top-k documents retrieval in highly distributed environments
    作者: 王俊閎
    Wang, Chun Hung
    贡献者: 陳良弼
    Chen, Arbee L.P.
    王俊閎
    Wang, Chun Hung
    关键词: 分散式環境
    Tok-k 相似文件檢索
    端對端網路
    distributed environments
    similar top-k documents retrieval
    peer-to-peer network
    日期: 2012
    上传时间: 2012-12-03 11:27:23 (UTC+8)
    摘要: 在文件資料庫的查詢處理上,Top-k相似文件查詢主要是協助使用者可以從龐大的文件集合中,檢索出和查詢文件具有高度相關性的文件集合。將資料庫內的文件依據和查詢文件之相似度程度,選擇出相似度最高的前k篇文件回傳給使用者。然而過去集中式資料庫,因其覆蓋性和可擴充性的不足,使得這種排名傾向的文件查詢處理,需耗費大量時間及運算成本。近年來,使用端對端(Peer-to-peer, P2P)架構解決相關的文件檢索問題已成為一種趨勢,但在高度分散式環境下,支援排名傾向的相似文件查詢是困難的,因為缺乏全域資訊和適當的系統協調者。
    在本研究中,我們先針對各節點資料庫作分群前處理,並提出一個利用區域切割的作法[1],將P2P環境劃分成數個子區塊後,建立特徵索引表。因此在查詢處理時,可透過索引表加快挑選出Top-k相似群集的速度,並且確保有適當數量的回傳結果。最後在實驗中,我們提出的方法會與傳統集中式搜尋引擎以及SON-based [1] 做比較,在高度分散式環境下,我們的方法在執行Top-k相似文件查詢時,會比上述兩種作法有較為優異的表現。
    On query processing in a large database, similar top-k documents query is an important mechanism to retrieve the highly correlated document collection with query for users. It ranks documents with a similarity ranking function and reports the k documents with highest similarity. However, the former approach in web searching, i.e., centralized search engines, rises some issues such as lack of coverage and scalability, impact provides rank-based query become a costly operation. Recently, using Peer-to-peer (P2P) architectures to tackle above issues has emerged as a trend of solution, but due to the shortage of global knowledge and some appropriate central coordinators, support rank-based query in highly distributed environment has been difficulty.
    In this paper, we proposed a framework to solve these problems. First, we performed the local cluster pre-processing on each peer, followed by the zone creation process, forming sub-zones over P2P network, and then constructing the feature index table to improve the performance of selecting similar top-k cluster results. The experiments show that our approach performs similar top-k documents query outperforms than SON-based approach in highly distributed environment.
    參考文獻: [1] Christos Doulkeridis, Kjetil Nørvåg, Michalis Vazirgiannis. 2008. Peer-to-peer similarity search over widely distributed document collections. LSDS-IR 35-42.
    [2] Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H. 2001. Chord : A scalable peer-to-peer lookup service for internet applications. In Proceedings of the ACM SIGCOMM 149-160.
    [3] Ratnasamy, S., Francis, P., Handley, M., Karp, R., Schenker, S. 2001. A scalable contentaddressable network. In Proceedings of the ACM SIGCOMM 161-172.
    [4] Rowstron, A., Druschel, P. 2001. Pastry : Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Proceedings of the Middleware
    [5] Chunqiang Tang, Zhichen Xu, Sandhya Dwarkadas. 2003. Peer-to-peer information retrieval using self-organizing semantic overlay networks. In Proceedings of the ACM SIGCOMM 175-186.
    [6] BitTorrent. <http://bittorrent.com/>.
    [7] eMula. <http://www.emula-project.net/>.
    [8] Beverly Yang, Hector Garcia-Molina. 2003. Designing a Super-Peer Network. ICDE 49-60.
    [9] The Gnutella protocol specification v0.6. <http://rfcgnutella.sourceforge.net>.
    [10] KaZaA. <http://www.kazaa.com>.
    [11] Salton, G., Wong, A., Yang, C.S. 1975. A vector space model for automatic indexing. Communications of the ACM Volume 18 Issue 11 613-620.
    [12] Bernard J. Jansen, Soumen Chakrabarti. 2006. Mining the Web : Discovering Knowledge from Hypertext Data. Morgan-Kaufmann Publishers, 352 pp., ISBN: 1-55860-754-4. Inf. Process. Manage. (IPM) 42(1) 317-318.
    [13] Christos. Doulkeridis, Kjetil Nørvåg, and Michalis Vazirgiannis. 2007. DESENT: Decentralized and distributed semantic overlay generation in P2P networks. IEEE Journal on Selected Areas in Communications (J-SAC) 25(1) 25–34.
    [14] Hersh, W.R., Buckley, C., J.Leone, T., Hickam, D.H. 1994. Ohsumed: An interactive retrieval evaluation and new large test collection for research. In Proceedings of the ACM SIGIR. 192–201
    [15] GT-ITM : Georgia Tech Internetwork Topology Models. <http://www.cc.gatech.edu/projects/gtitm/>.
    [16] Wolf-Tilo Balke, Wolfgang Nejdl, Wolf Siberski, Uwe Thaden. 2005. Progressive Distributed Top k Retrieval in Peer-to-Peer Networks. ICDE 174-185.
    [17] Wolf-Tilo Balke. 2005. Supporting Information Retrieval in Peer-to-Peer Systems. Peer-to-Peer Systems and Applications 337-352.
    [18] C. Gkantsidis, M. Mihail, and A. Saberi. 2005. Hybrid search schemes for unstructured peer-to-peer networks. In Proceedings of INFOCOM.
    [19] Inderjit S. Dhillon, Dharmendra S. Modha. 2001. Concept Decompositions for Large Sparse Text Data Using Clustering. Machine Learning 42(1/2): 143-175.
    [20] Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørvåg, Michalis Vazirgiannis. 2008. On efficient top-k query processing in highly distributed environments. SIGMOD 753-764.
    [21] Shiwei Zhu, Junjie Wu, Hui Xiong, Guoping Xia. 2011. Scaling up top-K cosine similarity search. Data Knowl. Eng. (DKE) 70(1) 60-83.
    [22] Aoying Zhou, Rong Zhang, Weining Qian, Quang Hieu Vu, Tianming Hu. 2008. Adaptive indexing for content-based search in P2P systems. Data Knowl. Eng. (DKE) 67(3) 381-398.
    描述: 碩士
    國立政治大學
    資訊科學學系
    99753034
    101
    資料來源: http://thesis.lib.nccu.edu.tw/record/#G0099753034
    数据类型: thesis
    显示于类别:[資訊科學系] 學位論文

    文件中的档案:

    档案 大小格式浏览次数
    303401.pdf1496KbAdobe PDF2805检视/开启


    在政大典藏中所有的数据项都受到原著作权保护.


    社群 sharing

    著作權政策宣告 Copyright Announcement
    1.本網站之數位內容為國立政治大學所收錄之機構典藏,無償提供學術研究與公眾教育等公益性使用,惟仍請適度,合理使用本網站之內容,以尊重著作權人之權益。商業上之利用,則請先取得著作權人之授權。
    The digital content of this website is part of National Chengchi University Institutional Repository. It provides free access to academic research and public education for non-commercial use. Please utilize it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2.本網站之製作,已盡力防止侵害著作權人之權益,如仍發現本網站之數位內容有侵害著作權人權益情事者,請權利人通知本網站維護人員(nccur@nccu.edu.tw),維護人員將立即採取移除該數位著作等補救措施。
    NCCU Institutional Repository is made to protect the interests of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff(nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.
    DSpace Software Copyright © 2002-2004  MIT &  Hewlett-Packard  /   Enhanced by   NTU Library IR team Copyright ©   - 回馈