National Chengchi University Institutional Repository (NCCUR): Item 140.119/99338
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/99338


    Title: 應用文字探勘於影評文章自動摘要之研究
    A Study on Application of Text Mining for Automatic Text Summarization of Film Review
    Authors: 鄧亦安
    Teng, I An
    Contributors: 楊建民
    鄧亦安
    Teng, I An
    Keywords: Text mining
    Film review summary
    Automatic text summarization
    Date: 2016
    Issue Date: 2016-07-20 17:15:39 (UTC+8)
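    Abstract: With the rise of the internet, people facing a difficult choice rely not only on word of mouth but also on keyword searches for the information they need. Amid the flood of data, however, integrating information quickly is a major challenge. Summaries of film reviews can help people learn about a film before going to the cinema and confirm that it is one they are interested in.
    This study uses reviews of five films as its data source: Avengers: Age of Ultron (66 reviews, 4,616 sentences), Batman v Superman: Dawn of Justice (60 reviews, 9,345 sentences), Zootopia (60 reviews, 5,545 sentences), Interstellar (50 reviews, 4,616 sentences), and The Intern (62 reviews, 5,622 sentences). It generates review summaries by combining clustering with sentence extraction. The K-Means algorithm first clusters the feature terms and sentences of the reviews of the five films. TFIDF then rates the importance of the sentences in each cluster to select high-weight sentences, and the WWA method picks sentences that cover different aspects within each cluster. Finally, the similarity between the best template and the content of each cluster determines the order of the clusters, producing a multi-document review summary whose paragraphs and paragraph order resemble those of the template.
    The results show that the original reviews of the five films have a similarity of 15.87% to the best template, whereas the summaries produced by this method have a similarity of 21.19% to the single-document summary of the best template. In addition, because the clusters are ordered by their similarity to the best template, the summary as a whole follows a paragraph order similar to that of the template. Its content covers the points most widely mentioned across the reviews, and the distinct paragraphs give the summary broader coverage. With this method, automatically aggregated and extracted summaries can help people quickly understand information about a film and support their decisions.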
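The clustering and sentence-selection steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes the review sentences have already been segmented into word tokens, it scores each sentence by the sum of its TF-IDF weights (the abstract does not give the exact weighting formula), and it seeds K-Means deterministically with a farthest-first rule so that the result is reproducible. The function names `tfidf_vectors`, `kmeans`, and `top_sentence_per_cluster` are hypothetical, and the WWA step is not covered.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Turn tokenized sentences into dense TF-IDF vectors over a shared vocabulary."""
    n = len(sentences)
    vocab = sorted({term for s in sentences for term in s})
    # Document frequency: in how many sentences each term occurs.
    df = Counter(term for s in sentences for term in set(s))
    vectors = []
    for s in sentences:
        tf = Counter(s)
        vectors.append([(tf[t] / len(s)) * math.log(n / df[t]) if tf[t] else 0.0
                        for t in vocab])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=50):
    """Plain K-Means; farthest-first seeding is an assumption made for determinism."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments no longer change, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def top_sentence_per_cluster(vectors, labels, k):
    """For each non-empty cluster, return the index of the sentence with the largest TF-IDF sum."""
    picks = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            picks.append(max(members, key=lambda i: sum(vectors[i])))
    return picks


# Toy, made-up sentences: two about the plot and two about the cast.
sentences = [
    ["plot", "story", "twist"],
    ["plot", "story", "ending"],
    ["actor", "performance", "cast"],
    ["actor", "performance", "role"],
]
vectors = tfidf_vectors(sentences)
labels = kmeans(vectors, k=2)
picks = top_sentence_per_cluster(vectors, labels, k=2)
```

On this toy input the two plot sentences fall into one cluster and the two cast sentences into the other, and `picks` holds one representative sentence index per cluster. Real review data would need Chinese word segmentation and a choice of `k` before this step.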
    Abstract
    In the era of big data, there is too much information on the web for readers to digest. Extracting and summarizing the essential information quickly is a challenge. People who want to see a movie face the same situation: before choosing one, they search for related information, but film reviews are scattered across many websites. Automatic text summarization can efficiently extract important information for readers and condense the ideas of the reviews. With this method, readers can easily grasp the main ideas of all the reviews and save time.
    This research presents a multi-concept, extractive film review summary for readers. It generates summaries from film reviews on PIXNET, the most popular blog platform, using an extraction-based method combined with clustering. For a specific film, the K-Means algorithm clusters the review sentences by their features. Statistical weighting and the WWA method then measure the weight of each sentence in order to choose representative sentences. In the last step, the clusters are compared with templates to decide the order of the clustered sentences, and the representative sentences from each cluster are assembled into the summary.
    For the five movies, the proposed method raises the average similarity between the film review summaries and the template summaries to 21.19%. This shows that automatic film review summarization can extract the important sentences from the reviews. In addition, ordering the clusters by comparison with the template lists the sentence clusters in sequence to generate a movie review that saves readers' time and is easy to comprehend.
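The template comparison used for ordering and evaluation can be sketched as follows. This is one possible reading of the abstract, not the author's code. It assumes cosine similarity over raw term counts (the abstract does not name the similarity measure) and assumes each cluster is placed at the position of the template paragraph it resembles most. The names `cosine_similarity` and `order_clusters_by_template` are hypothetical, and the token lists are made-up examples.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw term counts."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    # An empty token list has zero norm; treat its similarity as 0.
    return dot / norm if norm else 0.0


def order_clusters_by_template(clusters, template_paragraphs):
    """Return cluster indices sorted by the position of their best-matching template paragraph."""
    def best_position(cluster_tokens):
        sims = [cosine_similarity(cluster_tokens, p) for p in template_paragraphs]
        return max(range(len(sims)), key=sims.__getitem__)

    return sorted(range(len(clusters)), key=lambda i: best_position(clusters[i]))


# Template paragraphs in reading order: plot first, then cast, then music.
template = [["plot", "story"], ["actor", "cast"], ["music", "score"]]
# Token bags for three clusters, in arbitrary order.
clusters = [
    ["actor", "performance", "cast"],
    ["music", "soundtrack"],
    ["plot", "twist", "story"],
]
order = order_clusters_by_template(clusters, template)
```

Here `order` is `[2, 0, 1]`: the plot cluster comes first, then the cast cluster, then the music cluster, which mirrors the paragraph order of the template. The same `cosine_similarity` function could be applied to a whole summary and a whole template to obtain a percentage of the kind reported in the abstract, although the thesis may compute its figures differently.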
    Description: Master's thesis
    National Chengchi University
    Department of Management Information Systems
    103356032
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0103356032
    Data Type: thesis
    Appears in Collections:[Department of MIS] Theses

    Files in This Item:

    File         Size     Format
    603201.pdf   2336Kb   Adobe PDF


    All items in 政大典藏 are protected by copyright, with all rights reserved.

