Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/68235
|
Title: | 對使用者評論之情感分析研究-以Google Play市集為例 Research into App user opinions with Sentimental Analysis on the Google Play market |
Authors: | 林育龍 Lin, Yu Long |
Contributors: | 姜國輝 林育龍 Lin, Yu Long |
Keywords: | Sentiment Analysis (情感分析); Text Classification (文字分類); Support Vector Machine (支援向量機); Social Network Analysis (社會網路分析); Correspondence Analysis (對應分析) |
Date: | 2013 |
Issue Date: | 2014-08-06 11:41:16 (UTC+8) |
Abstract: | Global smartphone shipments continue to grow, and cumulative app downloads from the popular app markets have already passed 50 billion. In the Apple App Store and Google Play, ratings and reviews strongly influence how an app is ranked in the market. For app developers, reviews reveal users' needs and make it possible to respond quickly before complaints turn into a crisis. However, with hundreds of reviews arriving every day, reading them one by one is time-consuming and does not give an integrated view of users' needs and problems.
Sentiment analysis research uses supervised or unsupervised methods to analyze review text. Supervised methods have been shown to reach high accuracy even with simple document-quantification methods, but they cannot anticipate unknown trends, and they require labor-intensive manual labeling of document classes.
This study analyzes app reviews along two dimensions, sentiment orientation and popular topics, and proposes a Chinese sentiment analysis method that combines unsupervised and supervised learning. We first use unsupervised methods to label each review and present the results visually, and then use supervised methods to build classification models and verify their performance.
In the experiments, a sentiment lexicon built from the Chinese WordNet can be used to determine whether a review is positive or negative, although it performs poorly on negative reviews and needs improvement. For topic extraction, we tried two clustering methods; using NPMI to measure the strength of the relationship between words, combined with the CONCOR method from social network analysis, gave good results. In the supervised learning phase, classification accuracy reached 87% for sentiment orientation and 96% for popular topics.
This research uses the Chinese WordNet and social network analysis to develop an unsupervised method for determining the classes of Chinese reviews, and provides a worked example of Chinese sentiment analysis. It also builds a comprehensive visual report of users' positive and negative feedback and uses the classification models to categorize new reviews, giving app developers competitive intelligence in the market. |
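The abstract says NPMI is used to measure how strongly two words are related before the CONCOR step. The sketch below is an illustration only, not the thesis's implementation. It computes document-level NPMI, defined as ln(p(x,y) / (p(x) p(y))) / (-ln p(x,y)). It assumes that reviews are already segmented into tokens and that probabilities are estimated from how many reviews contain each word or word pair. The function name `npmi_scores` and the sample tokens are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_scores(docs):
    """Return {(w1, w2): NPMI} for word pairs that co-occur in at least one document.

    docs is a list of token lists (assumed to be pre-segmented reviews).
    Pair keys are alphabetically ordered tuples.
    """
    n = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words of a pair
    for tokens in docs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))

    scores = {}
    for (a, b), count in pair_df.items():
        p_ab = count / n
        if p_ab == 1.0:
            # Both words occur in every document, so -ln p(a,b) is zero.
            # Assumed convention: treat this as complete co-occurrence.
            scores[(a, b)] = 1.0
            continue
        pmi = math.log(p_ab / ((word_df[a] / n) * (word_df[b] / n)))
        scores[(a, b)] = pmi / -math.log(p_ab)
    return scores


if __name__ == "__main__":
    # Hypothetical, already-tokenized reviews.
    sample = [
        ["battery", "drain"],
        ["battery", "drain"],
        ["login", "crash"],
        ["login", "fail"],
    ]
    for pair, score in sorted(npmi_scores(sample).items()):
        print(pair, round(score, 3))
```

Scores range from -1 to 1, and pairs that never co-occur are omitted. A pairwise matrix built from these scores could then be passed to a clustering step such as CONCOR.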
Description: | Master's thesis; National Chengchi University (國立政治大學); Department of Management Information Systems (資訊管理研究所); 101356028; 102 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0101356028 |
Data Type: | thesis |
Appears in Collections: | [Department of Management Information Systems (資訊管理學系)] Theses (學位論文)
|
Files in This Item:
File | Description | Size | Format |
602801.pdf | | 2683Kb | Adobe PDF |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|