Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/124712
|
Title: | 應用文字探勘於實用推薦文辨別之研究 -以愛評網美食評論為例 / A study of identifying useful review comments for food recommendation with text mining approach |
Authors: | 黃怡蓁 Huang, Yi-Chen |
Contributors: | 楊建民 (Yang, Chien-Min); 洪為璽 (Hung, Wei-Hsi); 黃怡蓁 (Huang, Yi-Chen) |
Keywords: | Chinese word segmentation; Data mining; Web crawling; Cluster analysis; User-generated content |
Date: | 2019 |
Issue Date: | 2019-08-07 16:07:15 (UTC+8) |
Abstract: | The internet is among the most useful tools for finding information. As e-commerce sites have grown rapidly, consumers routinely read online reviews of products and shops before purchasing and share their own experiences afterwards. As user-generated reviews become ever easier to find, separating information from noise grows more important. Product reviews influence both individual purchase behavior and companies' product development decisions. This study proposes an iterative supervised learning model that examines whether unstructured reviews are useful to potential consumers, with the goal of classifying each review as useful or non-useful. The dataset consists of food reviews from iPeen (愛評網) published between January 2008 and December 2018, collected with a Python web crawler: 1,219 useful reviews and 478 non-useful reviews, selected through two layers of filtering at the user level and the review level. A topic model is used to build a feature lexicon, and Support Vector Machine, Naive Bayes, and Random Forest classifiers are then trained on those features; the classification results are used to build a prediction model. The lexicon is expanded periodically so that the model adaptively learns new useful reviews in each iteration and keeps up with changes in vocabulary over time. The best model reaches an accuracy of 80.20%, a precision of 0.924, a recall of 0.6886, and an F-score of 0.7891. Future work could extend the approach to review identification in other domains. |
Description: | Master's thesis; National Chengchi University; Department of Management Information Systems; 106356027 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0106356027 |
Data Type: | thesis |
DOI: | 10.6814/NCCU201900451 |
Appears in Collections: | [Department of Management Information Systems] Theses
|
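The evaluation figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the thesis: the function name `f_score` and the constant names are assumptions, and the F-score is assumed to be the standard F1 (harmonic mean of precision and recall). The majority-class share is computed over the full dataset of 1,219 useful and 478 non-useful reviews, since the record does not state how the data was split for evaluation.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures as reported in the abstract (names are hypothetical).
PRECISION = 0.924
RECALL = 0.6886
USEFUL, NON_USEFUL = 1219, 478

f1 = f_score(PRECISION, RECALL)

# Share of the larger class: the accuracy a classifier would get by
# always predicting "useful" on the full dataset (assumed baseline).
majority_share = max(USEFUL, NON_USEFUL) / (USEFUL + NON_USEFUL)

print(f"F1 = {f1:.4f}")
print(f"majority-class share = {majority_share:.4f}")
```

Under these assumptions the computed F1 agrees with the reported 0.7891, and the reported accuracy of 80.20% can be read against a majority-class share of roughly 72% on the full dataset.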
Files in This Item:
File | Size | Format |
602701.pdf | 996Kb | Adobe PDF |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|