National Chengchi University Institutional Repository (NCCUR): Item 140.119/146906

Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/146906

Title:	The Application of Symbolic Data Analysis to Movie Recommendation Systems (應用象徵性資料分析法於電影推薦系統之研究)
Author:	CHANG, SHUN-YI (張順益)
Contributors:	Wu, Han-Ming (吳漢銘); CHANG, SHUN-YI (張順益)
Keywords:	Recommendation System; Symbolic Data Analysis; Clustering Algorithm; Missing Value Imputation
Date:	2023
Uploaded:	2023-09-01 14:57:45 (UTC+8)
Abstract:	Recommendation systems are now widely used in business marketing, covering products such as movies, music, news, books, restaurants, 3C (computer, communication, and consumer electronics) products, and financial services. By giving users accurate personalized recommendations, they can increase merchants' profits. Collaborative filtering (Resnick et al., 1994), the most common type of recommendation algorithm, uses users' ratings of products to identify suitable items to recommend. Its theoretical basis is that users with similar consumption behavior are likely to prefer similar items. However, collaborative filtering faces challenges such as the new-user cold start problem (also known as the new item problem) and sparse rating matrices. In this study, we focus on a movie recommendation system and, based on the characteristics of user groups, transform their movie ratings by genre into multi-valued modal symbolic data. This transformation takes into account that each movie may belong to several genres, and aims to overcome the new-user cold start problem and to reduce the matrix sparsity caused by missing values. We conducted simulation experiments and analyzed real movie rating data to validate the proposed method. The results show that applying symbolic data analysis not only improves recommendation performance but also opens a new line of thinking and methods for the development of recommendation systems.
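The transformation described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy data, the group labels, and the final averaging rule for a new user are all assumptions made for illustration. The idea shown is that each (user group, genre) cell holds a distribution over rating levels rather than a single number, and that a movie with several genres contributes its rating to each of them.

```python
from collections import Counter, defaultdict


def to_modal_symbolic(ratings, user_group, movie_genres, levels=(1, 2, 3, 4, 5)):
    """Aggregate (user, movie, rating) triples into one rating distribution
    per (user group, genre) cell. Names and layout are hypothetical."""
    counts = defaultdict(Counter)
    for user, movie, rating in ratings:
        group = user_group[user]
        # A movie with several genres contributes its rating to each genre.
        for genre in movie_genres[movie]:
            counts[(group, genre)][rating] += 1

    table = {}
    for cell, counter in counts.items():
        total = sum(counter.values())
        # Relative frequency of each rating level within the cell.
        table[cell] = {level: counter[level] / total for level in levels}
    return table


def expected_rating(table, group, genres):
    """Assumed cold-start rule (not from the source): average the expected
    rating of the user's group over the genres of the movie."""
    means = [
        sum(level * p for level, p in table[(group, g)].items())
        for g in genres
        if (group, g) in table
    ]
    return sum(means) / len(means) if means else None


# Hypothetical toy data.
ratings = [
    ("u1", "m1", 5), ("u1", "m2", 3),
    ("u2", "m1", 4), ("u3", "m2", 2),
]
user_group = {"u1": "age<30", "u2": "age<30", "u3": "age>=30"}
movie_genres = {"m1": ["Action", "Comedy"], "m2": ["Comedy"]}

symbolic = to_modal_symbolic(ratings, user_group, movie_genres)
new_user_score = expected_rating(symbolic, "age<30", movie_genres["m1"])
```

Because the table is indexed by group and genre rather than by individual user and movie, a new user who belongs to a known group has a populated row from the start, and cells are filled whenever any member of the group has rated any movie of that genre. This is one plausible reading of how such a representation could ease the cold start and sparsity problems.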
References:
Abdollahi, B. and Nasraoui, O. (2016). Explainable matrix factorization for collaborative filtering. In Proceedings of the 25th International Conference Companion on World Wide Web, pages 5–6.
Agrawal, R., Srikant, R., et al. (1994). Fast algorithms for mining association rules. In Proc. 20th int. conf. very large data bases, VLDB, volume 1215, pages 487–499. Santiago, Chile.
Ahuja, R., Solanki, A., and Nayyar, A. (2019). Movie recommender system using k-means clustering and k-nearest neighbor. In 2019 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence), pages 263–268. IEEE.
Basu, C., Hirsh, H., Cohen, W., et al. (1998). Recommendation as classification: Using social and content-based information in recommendation. In AAAI/IAAI, pages 714–720.
Bi, X., Qu, A., and Shen, X. (2018). Multilayer tensor factorization with applications to recommender systems. The Annals of Statistics, 46(6B):3308–3333.
Bi, X., Qu, A., Wang, J., and Shen, X. (2017). A group-specific recommender system. Journal of the American Statistical Association, 112(519):1344–1353.
Billard, L. and Diday, E. (2002). Symbolic regression analysis. In Classification, clustering, and data analysis: recent advances and applications, pages 281–288. Springer.
Billard, L. and Diday, E. (2003). From the statistics of data to the statistics of knowledge: symbolic data analysis. Journal of the American Statistical Association, 98(462):470–487.
Billard, L. and Diday, E. (2006). Symbolic Data Analysis: Conceptual Statistics and Data Mining. Wiley Series in Computational Statistics. Wiley.
Brito, P. (2003). Hierarchical and pyramidal clustering for symbolic data. Journal of the Japanese Society of Computational Statistics, 15:231–244.
Cai, Q. and Tan, W. (2022). Box Office Forecast Model Based on Random Forest and BP Neural Network, pages 69–75. Association for Computing Machinery, New York, NY, USA.
de Carvalho, F. d. A. (2007). Fuzzy c-means clustering methods for symbolic interval data. Pattern Recognition Letters, 28(4):423–437.
Deng, F., Ren, P., Qin, Z., Huang, G., and Qin, Z. (2018). Leveraging image visual features in content-based recommender system. Scientific Programming, 2018.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
Domingues, M. A., de Souza, R. M., and Cysneiros, F. J. A. (2010). A robust method for linear regression of symbolic interval data. Pattern Recognition Letters, 31(13):1991–1996.
Dutta, S. and Dasgupta, K. (2021). A shallow approach to gradient boosting (xgboosts) for prediction of the box office revenue of a movie. In Mandal, J. K., Mukhopadhyay, S., Unal, A., and Sen, S. K., editors, Proceedings of International Conference on Innovations in Software Architecture and Computational Systems, pages 207–219, Singapore. Springer Singapore.
Feng, K. and Liu, X. (2020). Adaptive attention with consumer sentinel for movie box office prediction. Complexity, 2020:1–9.
Gandhi, U. D., Malarvizhi Kumar, P., Chandra Babu, G., and Karthick, G. (2021). Sentiment analysis on twitter data by using convolutional neural network (cnn) and long short term memory (lstm). Wireless Personal Communications, pages 1–10.
Guo, X., Lin, W., Li, Y., Liu, Z., Yang, L., Zhao, S., and Zhu, Z. (2020). Dken: Deep knowledge-enhanced network for recommender systems. Information Sciences, 540:263–277.
Gupta, B., Prakasam, P., and Velmurugan, T. (2022). Integrated bert embeddings, bilstm-bigru and 1-d cnn model for binary sentiment classification analysis of movie reviews. Multimedia Tools and Applications, 81(23):33067–33086.
Gupta, C., Chawla, G., Rawlley, K., Bisht, K., and Sharma, M. (2021). Senti_alstm: Sentiment analysis of movie reviews using attention-based-lstm. In Abraham, A., Castillo, O., and Virmani, D., editors, Proceedings of 3rd International Conference on Computing Informatics and Networks, pages 211–219, Singapore. Springer Singapore.
Hoyt, E., Ponto, K., and Roy, C. (2014). Visualizing and analyzing the hollywood screenplay with scripthreads. DHQ: Digital Humanities Quarterly, 8(4).
Irpino, A. and Verde, R. (2006). A new wasserstein based distance for the hierarchical clustering of histogram symbolic data. In Data science and classification, pages 185–192. Springer.
Irpino, A. and Verde, R. (2015). Basic statistics for distributional symbolic variables: a new metric-based approach. Advances in Data Analysis and Classification, 9:143–175.
Irpino, A., Verde, R., et al. (2013). Dimension reduction techniques for distributional symbolic data. In Advances in Latent Variables, pages 1–8. Vita e Pensiero.
Iwata, T., Yamada, T., and Ueda, N. (2008). Probabilistic latent semantic visualization: Topic model for visualizing documents. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’08, pages 363–371, New York, NY, USA. Association for Computing Machinery.
Johnstone, D. J., Barnard, G. A., and Lindley, D. V. (1986). Tests of significance in theory and practice. Journal of the Royal Statistical Society. Series D (The Statistician), 35(5):491–504.
Kandel, S., Parikh, R., Paepcke, A., Hellerstein, J. M., and Heer, J. (2012). Profiler: Integrated statistical analysis and visualization for data quality assessment. In Proceedings of the International Working Conference on Advanced Visual Interfaces, AVI ’12, pages 547–554, New York, NY, USA. Association for Computing Machinery.
Kang, D. (2021). Box-office forecasting in korea using search trend data: a modified generalized bass diffusion model. Electronic Commerce Research, 21(1):41–72.
Khan, F. H., Qamar, U., and Bashir, S. (2016a). Multi-objective model selection (moms)-based semi-supervised framework for sentiment analysis. Cognitive Computation, 8:614–628.
Khan, F. H., Qamar, U., and Bashir, S. (2016b). Swims: Semi-supervised subjective feature weighting and intelligent model selection for sentiment analysis. Knowledge-Based Systems, 100:97–111.
Khan, F. H., Qamar, U., and Bashir, S. (2017). A semi-supervised approach to sentiment analysis using revised sentiment strength based on sentiwordnet. Knowledge and Information Systems, 51:851–872.
Kim, J.-M., Xia, L., Kim, I., Lee, S., and Lee, K.-H. (2020). Finding nemo: Predicting movie performances by machine learning methods. Journal of Risk and Financial Management, 13(5).
Korovkinas, K., Danėnas, P., and Garšva, G. (2017). Svm and naïve bayes classification ensemble method for sentiment analysis. Baltic Journal of Modern Computing, 5(4):398–409.
Lauro, C. N. and Palumbo, F. (2000). Principal component analysis of interval data: a symbolic data analysis approach. Computational Statistics, 15:73–87.
Li, A., Yang, B., Huo, H., and Hussain, F. K. (2021). Leveraging implicit relations for recommender systems. Information Sciences, 579:55–71.
Li, F., Wang, S., Liu, S., and Zhang, M. (2014). Suit: A supervised user-item based topic model for sentiment analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 28(1).
MacQueen, J. et al. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, volume 1, pages 281–297. Oakland, CA, USA.
Mangolin, R. B., Pereira, R. M., Britto Jr, A. S., Silla Jr, C. N., Feltrim, V. D., Bertolini, D., and Costa, Y. M. (2022). A multimodal approach for multi-label movie genre classification. Multimedia Tools and Applications, 81(14):19071–19096.
Maulana, R., Rahayuningsih, P. A., Irmayani, W., Saputra, D., and Jayanti, W. E. (2020). Improved accuracy of sentiment analysis movie review using support vector machine based information gain. Journal of Physics: Conference Series, 1641(1):012060.
Mutinda, J., Mwangi, W., and Okeyo, G. (2023). Sentiment analysis of text reviews using lexicon-enhanced bert embedding (lebert) model with convolutional neural network. Applied Sciences, 13:1445.
Ni, Y., Dong, F., Zou, M., and Li, W. (2022). Movie box office prediction based on multi-model ensembles. Information, 13(6).
Nilashi, M., Ibrahim, O., and Bagherifard, K. (2018). A recommender system based on collaborative filtering using ontology and dimensionality reduction techniques. Expert Systems with Applications, 92:507–520.
Pouransari, H. and Ghili, S. (2014). Deep learning for sentiment analysis of movie reviews. CS224N Proj, pages 1–8.
Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., and Riedl, J. (1994). Grouplens: An open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, CSCW ’94, pages 175–186, New York, NY, USA. Association for Computing Machinery.
Samsir, S., Kusmanto, K., Dalimunthe, A. H., Aditiya, R., and Watrianthos, R. (2022). Implementation naïve bayes classification for sentiment analysis on internet movie database. Building of Informatics, Technology and Science (BITS), 4(1):1–6.
Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web, pages 285–295.
Tahmasebi, H., Ravanmehr, R., and Mohamadrezaei, R. (2021). Social movie recommender system based on deep autoencoder network using twitter data. Neural Computing and Applications, 33.
Valdiviezo-Díaz, P. and Bobadilla, J. (2019). A hybrid approach of recommendation via extended matrix based on collaborative filtering with demographics information. In Technology Trends: 4th International Conference, CITT 2018, Babahoyo, Ecuador, August 29–31, 2018, Revised Selected Papers 4, pages 384–398. Springer.
Vilakone, P., Park, D.-S., Xinchang, K., and Hao, F. (2018). An efficient movie recommendation algorithm based on improved k-clique. Human-centric Computing and Information Sciences, 8(1):1–15.
Vozalis, M. G. and Margaritis, K. G. (2006). Applying svd on generalized item-based filtering. Int. J. Comput. Sci. Appl., 3(3):27–51.
Wang, D. (2022). Research on sentiment analysis of movie reviews based on mlp model. World Scientific Research Journal, 8(10):81–85.
Wang, X., Wei, F., Liu, X., Zhou, M., and Zhang, M. (2011). Topic sentiment analysis in twitter: A graph-based hashtag sentiment classification approach. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM ’11, pages 1031–1040, New York, NY, USA. Association for Computing Machinery.
Widiyaningtyas, T., Hidayah, I., and Adji, T. B. (2021). User profile correlation-based similarity (upcsim) algorithm in movie recommendation system. Journal of Big Data, 8:1–21.
Xu, M., Wei, D., Zhu, T., and Zhang, Y. (2020). Box-office revenue predictions based on xgboost and sentiment analysis. World Scientific Research Journal, 6(11):46–56.
Yang, C., Chen, X., Liu, L., Liu, T., and Geng, S. (2018). A hybrid movie recommendation method based on social similarity and item attributes. In Advances in Swarm Intelligence: 9th International Conference, ICSI 2018, Shanghai, China, June 17–22, 2018, Proceedings, Part II 9, pages 275–285. Springer.
Zhang, N. (2021). Design of movie data visualization system based on web crawler. Journal of Physics: Conference Series, 1971(1):012029.
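Description:	Master's thesis, National Chengchi University, Department of Statistics, 110354026
Source:	http://thesis.lib.nccu.edu.tw/record/#G0110354026
Type:	thesis
Appears in collections:	[Department of Statistics] Theses and Dissertations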

Files in this item:

File	Size	Format	Views
402601.pdf	29225Kb	Adobe PDF2	0
