National Chengchi University Institutional Repository (NCCUR): Item 140.119/99417
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/99417


    Title: 設計與實作一個針對遊戲論壇的中文文章整合系統
    Design and Implementation of a Chinese Document Integration System for Game Forums
    Authors: 黃重鈞
    Huang, Chung Chun
    Contributors: 徐國偉
    Hsu, Kuo Wei
    黃重鈞
    Huang, Chung Chun
    Keywords: 中文遊戲論壇文件摘要
    關鍵字擷取
    K-Means分群
    Chinese game forum summary
    keyword selection
    K-means clustering
    Date: 2016
    Issue Date: 2016-07-21 10:02:19 (UTC+8)
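    Abstract: With today's well-developed and convenient Internet, people exchange information in more diverse ways, and news is no longer the only way to obtain it: through forums, anyone can share information quickly and with few barriers. This same property has caused the volume of information to surge, so even with a search engine, users still spend considerable effort collecting, filtering, and processing material on a specific topic. Taking the League of Legends discussion board on the Bahamut gaming information site (巴哈姆特電玩資訊站) as an example, this study aims to give users a comprehensive yet concise description of a game character, so that they gain at least a general understanding of that character.

    Drawing on web forum mining and news document summarization systems, this study designs a summarization system suited to multiple forum posts. We first need to understand and analyze the characteristics of forums, experiment with how to mine latent information from them, and identify the difficulties that forum mining encounters. Based on this analysis, the system architecture is designed in roughly three stages: 1. Data preprocessing: unlike news articles, forum posts make it hard to use nouns and verbs directly as keywords, so TF-IDF is used to select representative terms from the forum posts, which serve as the dimensions of the sentence vector space. 2. Clustering: K-means clustering identifies which sentences are similar and places similar sentences in the same cluster. 3. Sentence selection: based on the clustering results, the sentences that best represent the document set are selected according to each sentence's keyword content and TF-IDF.

    During the experimental analysis we found some useful related information, and at the end of the thesis we propose possible improvements, in the hope that better ways of classifying forum posts can be developed in the future.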
    With the establishment of network infrastructure, forum users can provide information quickly and easily. Although users can retrieve information through search engines, they still have difficulty handling the volume of articles, which is usually beyond what a person can process. In this study, we design a tool to automate the retrieval of information on each topic in a Chinese game forum.

    We analyze the characteristics of the game forum and draw on English news summarization systems. Our method is divided into three phases. The first phase discovers keywords in the documents using TF-IDF rather than part of speech, and builds a vector space model. The second phase represents sentences in the vector space model built in the first phase and uses the K-means clustering algorithm to gather sentences with similar meaning into the same cluster. In the third phase, we weight sentences by two features, the keywords a sentence contains and TF-IDF, and order the sentences by their weights.
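
    The three phases can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the treatment of each sentence as a TF-IDF "document", the farthest-point seeding of K-means, the additive sentence score (keyword count plus TF-IDF mass), and the sample tokens are all assumptions made for illustration. Input sentences are assumed to be already segmented into tokens, since Chinese word segmentation is outside the scope of the sketch.

```python
import math
from collections import Counter


def tfidf(sentences):
    """Return one {term: weight} dict per tokenized sentence (sentence = document)."""
    n = len(sentences)
    df = Counter(term for s in sentences for term in set(s))
    return [
        {t: (c / len(s)) * math.log(n / df[t]) for t, c in Counter(s).items()}
        for s in sentences
    ]


def top_keywords(weights, n_keywords):
    """Phase 1: keep the terms with the highest total TF-IDF weight."""
    total = Counter()
    for w in weights:
        total.update(w)  # adds the float weights term by term
    return [t for t, _ in total.most_common(n_keywords)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Phase 2: plain K-means with deterministic farthest-point seeding (assumed)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each sentence joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def summarize(sentences, k=2, n_keywords=8):
    """Phase 3: return the index of the best-scoring sentence in each cluster."""
    weights = tfidf(sentences)
    keywords = top_keywords(weights, n_keywords)
    vectors = [[w.get(t, 0.0) for t in keywords] for w in weights]
    labels = kmeans(vectors, k)

    def score(i):
        # Assumed score: number of keywords present plus their TF-IDF mass.
        return sum(1 for t in keywords if t in weights[i]) + sum(vectors[i])

    picked = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            picked.append(max(members, key=score))
    return sorted(picked)


if __name__ == "__main__":
    # Hypothetical pre-tokenized forum sentences about two game characters.
    example = [
        ["ahri", "mid", "burst"],
        ["ahri", "charm", "burst"],
        ["garen", "top", "tank"],
        ["garen", "spin", "tank"],
    ]
    print(summarize(example, k=2))
```

    Picking one sentence per cluster keeps the summary from repeating near-duplicate posts; a different score or seeding strategy could be substituted without changing the overall structure.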

    We conduct an experiment on data collected from the game forum and find useful information in the process. We believe the developed techniques and the results of the analysis can be used to design a better system in the future.
    Description: Master's thesis
    National Chengchi University
    Department of Computer Science
    101753024
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0101753024
    Data Type: thesis
    Appears in Collections: [Department of Computer Science] Theses

    Files in This Item:

    File          Size      Format
    302401.pdf    3011Kb    Adobe PDF


    All items in 政大典藏 are protected by copyright, with all rights reserved.

