Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/110798
Title: | 應用情感分析於媒體新聞傾向之研究-以中央社為例 Applying sentiment analysis to the tendency of media news: a case study of central news agency |
Authors: | 吳信維 Wu, Xin-Wei |
Contributors: | 姜國輝 Chiang, Kuo-Huie 吳信維 Wu, Xin-Wei |
Keywords: | Sentiment analysis; LDA topic model; n-gram; a-priori |
Date: | 2017 |
Issue Date: | 2017-07-11 11:29:39 (UTC+8) |
Abstract: | This research combines an association-rule-based new-word discovery algorithm to expand the lexicon and thereby improve the accuracy of word and sentence segmentation. Using an unsupervised sentiment analysis method, it collects news texts about the KMT and the DPP from the Central News Agency and builds topic models and sentiment-orientation labels. It then builds classification models with supervised learning methods and verifies the results.
The research uses an n-gram with a-priori algorithm to expand the segmentation lexicon. A total of 32007 candidate words were discovered, of which 28838 are words with real meaning, for a reported success rate of up to 88%.
The research compares two clustering methods for building the topic model: TFIDF-Kmeans and LDA. With TFIDF-Kmeans, the number of texts is far larger than the number of issue words, so the TFIDF matrix is too sparse and the clustering performs poorly. With LDA, because the model lets many documents share many topics, the accuracy of topic classification exceeds 80%. The research therefore suggests that LDA performs better for clustering issue words when the texts under analysis are multi-topic.
By combining different time intervals, the research presents the sentiment tendency of Central News Agency news in the three months before and after each of the last five presidential elections in Taiwan. It also explores how the sentiment tendency of each category in each topic model changes over those periods. In general, the sentiment index of the texts peaks on polling day, and the results for the last three presidential elections show that the sentiment values of party-related news level off after the election. The positive and negative sentiment statistics and the overall sentiment-tendency analysis show that, regardless of which party is in power, Central News Agency news on both the KMT and the DPP is positive and stable in tone and does not particularly favor either party. |
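The abstract names "n-gram with a-priori" as the new-word discovery step but gives no implementation details. The following is a minimal sketch of one common reading of that idea, not the thesis's actual code: character n-grams are counted level by level, and an n-gram is only counted if both of its (n-1)-length sub-grams were already frequent (the Apriori downward-closure property). The function name `frequent_ngrams`, the `min_support` threshold, and the `max_n` limit are all hypothetical choices made for illustration.

```python
from collections import Counter


def frequent_ngrams(sentences, min_support=2, max_n=4):
    """Return {ngram: count} for character n-grams (2 <= n <= max_n)
    that occur at least `min_support` times in `sentences`.

    Hypothetical helper: thresholds and names are assumptions,
    not values taken from the thesis.
    """
    # Level 1: single characters that meet the support threshold.
    counts = Counter(ch for s in sentences for ch in s)
    frequent = {g for g, c in counts.items() if c >= min_support}

    result = {}
    for n in range(2, max_n + 1):
        counts = Counter()
        for s in sentences:
            for i in range(len(s) - n + 1):
                gram = s[i:i + n]
                # Apriori pruning: skip the candidate unless both of its
                # (n-1)-length sub-grams were frequent at the previous level.
                if gram[:-1] in frequent and gram[1:] in frequent:
                    counts[gram] += 1
        frequent = {g for g, c in counts.items() if c >= min_support}
        if not frequent:
            break  # no longer n-gram can be frequent
        result.update({g: counts[g] for g in frequent})
    return result
```

Candidates returned by such a routine would still need filtering (for example, manual review or a statistical cohesion measure) before being added to a segmentation lexicon; the abstract's count of 28838 meaningful words out of 32007 suggests a review step of that kind, though the record does not describe it.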
Description: | Master's thesis, National Chengchi University, Department of Management Information Systems, 104356023 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0104356023 |
Data Type: | thesis |
Appears in Collections: | [Department of Management Information Systems] Theses |
Files in This Item:
File | Size | Format | |
602301.pdf | 3139Kb | Adobe PDF | 82 |
All items in 政大典藏 are protected by copyright, with all rights reserved.