National Chengchi University Institutional Repository (NCCUR): Item 140.119/124712


Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/124712

Title:	A study of identifying useful review comments for food recommendation with text mining approach (應用文字探勘於實用推薦文辨別之研究 -以愛評網美食評論為例)
Author:	Huang, Yi-Chen (黃怡蓁)
Contributors:	Yang, Chien-Min (楊建民); Hung, Wei-Hsi (洪為璽); Huang, Yi-Chen (黃怡蓁)
Keywords:	Chinese word segmentation; Data mining; Web crawling; Cluster analysis; User-generated content
Date:	2019
Upload time:	2019-08-07 16:07:15 (UTC+8)
Abstract:	The internet has become an increasingly important tool for looking up information, and e-commerce is growing rapidly worldwide. Consumers often read online reviews of products and shops before purchasing, and share their own experiences online afterwards. As a result of this interaction, user-generated reviews have become ever easier to find, and distinguishing information from noise has become increasingly important. Product reviews influence both individual purchase behavior and companies' product development decisions. This study proposes an iterative, supervised learning model that examines whether unstructured reviews are useful to potential consumers, with the aim of classifying each review as useful or non-useful. The data are food reviews published on iPeen (愛評網) between January 2008 and December 2018, collected with a web crawler written in Python: 1,219 useful reviews and 478 non-useful reviews, selected through two layers of filtering at the user level and the review level. A topic model is used to find implicit features in the dataset and build a lexicon of feature terms. The reviews are then classified with Support Vector Machine, Naive Bayes, and Random Forests classifiers, and a prediction model is built from the classification results. The lexicon is expanded periodically so that the model adaptively learns the features of new useful reviews and keeps up with changes in vocabulary over time. The best model achieves an accuracy of 80.20%, a precision of 0.924, a recall of 0.6886, and an F-score of 0.7891. Future research could extend the approach to identifying useful reviews in other domains.
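The reported F-score can be checked against the reported precision and recall. The sketch below assumes the F-score is the standard F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name `f_score` and the constant names are illustrative and do not come from the thesis. It also computes the share of useful reviews in the 1,697-review dataset, which is only a rough reference point because the abstract does not say how the data were split for evaluation.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures reported in the abstract.
PRECISION = 0.924
RECALL = 0.6886
USEFUL, NOT_USEFUL = 1219, 478

# Assumption: the thesis reports F1 (beta = 1).
f1 = f_score(PRECISION, RECALL)

# Share of the majority class (useful reviews) in the full dataset.
useful_share = USEFUL / (USEFUL + NOT_USEFUL)

print(f"F1 = {f1:.4f}")
print(f"share of useful reviews = {useful_share:.4f}")
```

Under the F1 assumption the computed value rounds to 0.7891, which agrees with the figure in the abstract.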
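Description:	Master's thesis; National Chengchi University; Department of Management Information Systems; 106356027
Source:	http://thesis.lib.nccu.edu.tw/record/#G0106356027
Type:	thesis
DOI:	10.6814/NCCU201900451
Appears in collections:	[Department of Management Information Systems] Theses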

Files in this item:

File	Size	Format	Views
602701.pdf	996Kb	Adobe PDF2	0

All items in NCCUR are protected by the original copyright.


Copyright Announcement

1. The digital content of this website is part of the National Chengchi University Institutional Repository. It is provided free of charge for non-commercial uses such as academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

2. Every effort has been made in building this website to avoid infringing the rights of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.
