|
Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/146906
|
Title: | 應用象徵性資料分析法於電影推薦系統之研究 The application of symbolic data analysis to movie recommendation systems |
Authors: | 張順益 CHANG, SHUN-YI |
Contributors: | 吳漢銘 Wu, Han-Ming; 張順益 CHANG, SHUN-YI |
Keywords: | Recommendation System; Symbolic Data Analysis; Clustering Algorithm; Missing Value Imputation |
Date: | 2023 |
Issue Date: | 2023-09-01 14:57:45 (UTC+8) |
Abstract: | Recommendation systems are now widely used in business marketing, spanning domains such as movies, music, news, books, restaurants, 3C (computer, communication, and consumer electronics) products, and financial services. By giving users accurate, personalized recommendations, they can increase merchants' profits. Collaborative filtering (Resnick et al., 1994), the most common type of recommendation algorithm, uses users' ratings of products to identify suitable items to recommend. Its theoretical basis is that users with similar consumption behavior are likely to prefer similar items. However, collaborative filtering faces challenges such as the cold-start problem for new users (also referred to as the new-item problem) and sparse rating matrices. In this study, we focus on movie recommendation systems: based on the characteristics of user groups, we transform their movie ratings, organized by movie genre, into multi-valued modal symbolic data. This transformation takes into account that each movie may belong to multiple genres, and it aims to overcome the new-user cold-start problem and to reduce the matrix sparsity caused by missing values. We conducted simulation experiments and analyzed real movie rating data to validate the proposed method. The results show that applying symbolic data analysis not only improves recommendation performance but also opens a new line of thinking and methodology for the development of recommendation systems. |
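The abstract does not spell out the transformation itself, so the following is only a minimal sketch of one plausible reading: each (user group, genre) pair is summarized by the relative frequency of every rating level, and a movie with several genres contributes its rating to each of them. The function name `to_modal_symbolic`, the data layout, and the 1 to 5 rating scale are assumptions made for illustration, not details taken from the thesis.

```python
from collections import Counter, defaultdict


def to_modal_symbolic(ratings, movie_genres, user_group, levels=(1, 2, 3, 4, 5)):
    """Summarize ratings as one rating distribution per (user group, genre).

    ratings      : iterable of (user, movie, rating) triples
    movie_genres : dict mapping movie -> list of genres (a movie may have several)
    user_group   : dict mapping user -> group label (hypothetical grouping)
    levels       : assumed rating scale
    """
    counts = defaultdict(Counter)
    for user, movie, rating in ratings:
        group = user_group[user]
        # A multi-genre movie contributes its rating to every one of its genres.
        for genre in movie_genres[movie]:
            counts[(group, genre)][rating] += 1

    symbolic = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        # Multi-valued modal value: each rating level with its relative frequency.
        symbolic[key] = {level: counter[level] / total for level in levels}
    return symbolic


# Toy data, invented purely for illustration.
ratings = [("u1", "m1", 5), ("u2", "m1", 4), ("u1", "m2", 2), ("u3", "m2", 3)]
movie_genres = {"m1": ["Action", "Comedy"], "m2": ["Comedy"]}
user_group = {"u1": "young", "u2": "young", "u3": "senior"}

symbolic = to_modal_symbolic(ratings, movie_genres, user_group)
for key, distribution in sorted(symbolic.items()):
    print(key, distribution)
```

Under this reading, a new user with no ratings could be described by the distributions of the group they belong to, and the (group, genre) table is far denser than the original user-by-movie matrix; whether the thesis proceeds exactly this way is not stated in the record.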
Reference: | Abdollahi, B. and Nasraoui, O. (2016). Explainable matrix factorization for collaborative filtering. In Proceedings of the 25th International Conference Companion on World Wide Web, pages 5–6.
Agrawal, R., Srikant, R., et al. (1994). Fast algorithms for mining association rules. In Proc. 20th Int. Conf. Very Large Data Bases, VLDB, volume 1215, pages 487–499. Santiago, Chile.
Ahuja, R., Solanki, A., and Nayyar, A. (2019). Movie recommender system using k-means clustering and k-nearest neighbor. In 2019 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence), pages 263–268. IEEE.
Basu, C., Hirsh, H., Cohen, W., et al. (1998). Recommendation as classification: Using social and content-based information in recommendation. In AAAI/IAAI, pages 714–720.
Bi, X., Qu, A., and Shen, X. (2018). Multilayer tensor factorization with applications to recommender systems. The Annals of Statistics, 46(6B):3308–3333.
Bi, X., Qu, A., Wang, J., and Shen, X. (2017). A group-specific recommender system. Journal of the American Statistical Association, 112(519):1344–1353.
Billard, L. and Diday, E. (2002). Symbolic regression analysis. In Classification, Clustering, and Data Analysis: Recent Advances and Applications, pages 281–288. Springer.
Billard, L. and Diday, E. (2003). From the statistics of data to the statistics of knowledge: symbolic data analysis. Journal of the American Statistical Association, 98(462):470–487.
Billard, L. and Diday, E. (2006). Symbolic Data Analysis: Conceptual Statistics and Data Mining. Wiley Series in Computational Statistics. Wiley.
Brito, P. (2003). Hierarchical and pyramidal clustering for symbolic data. Journal of the Japanese Society of Computational Statistics, 15:231–244.
Cai, Q. and Tan, W. (2022). Box Office Forecast Model Based on Random Forest and BP Neural Network, pages 69–75. Association for Computing Machinery, New York, NY, USA.
de Carvalho, F. d. A. (2007). Fuzzy c-means clustering methods for symbolic interval data. Pattern Recognition Letters, 28(4):423–437.
Deng, F., Ren, P., Qin, Z., Huang, G., and Qin, Z. (2018). Leveraging image visual features in content-based recommender system. Scientific Programming, 2018.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
Domingues, M. A., de Souza, R. M., and Cysneiros, F. J. A. (2010). A robust method for linear regression of symbolic interval data. Pattern Recognition Letters, 31(13):1991–1996.
Dutta, S. and Dasgupta, K. (2021). A shallow approach to gradient boosting (XGBoosts) for prediction of the box office revenue of a movie. In Mandal, J. K., Mukhopadhyay, S., Unal, A., and Sen, S. K., editors, Proceedings of International Conference on Innovations in Software Architecture and Computational Systems, pages 207–219, Singapore. Springer Singapore.
Feng, K. and Liu, X. (2020). Adaptive attention with consumer sentinel for movie box office prediction. Complexity, 2020:1–9.
Gandhi, U. D., Malarvizhi Kumar, P., Chandra Babu, G., and Karthick, G. (2021). Sentiment analysis on Twitter data by using convolutional neural network (CNN) and long short term memory (LSTM). Wireless Personal Communications, pages 1–10.
Guo, X., Lin, W., Li, Y., Liu, Z., Yang, L., Zhao, S., and Zhu, Z. (2020). DKEN: Deep knowledge-enhanced network for recommender systems. Information Sciences, 540:263–277.
Gupta, B., Prakasam, P., and Velmurugan, T. (2022). Integrated BERT embeddings, BiLSTM-BiGRU and 1-D CNN model for binary sentiment classification analysis of movie reviews. Multimedia Tools and Applications, 81(23):33067–33086.
Gupta, C., Chawla, G., Rawlley, K., Bisht, K., and Sharma, M. (2021). Senti_ALSTM: Sentiment analysis of movie reviews using attention-based-LSTM. In Abraham, A., Castillo, O., and Virmani, D., editors, Proceedings of 3rd International Conference on Computing Informatics and Networks, pages 211–219, Singapore. Springer Singapore.
Hoyt, E., Ponto, K., and Roy, C. (2014). Visualizing and analyzing the Hollywood screenplay with ScripThreads. DHQ: Digital Humanities Quarterly, 8(4).
Irpino, A. and Verde, R. (2006). A new Wasserstein based distance for the hierarchical clustering of histogram symbolic data. In Data Science and Classification, pages 185–192. Springer.
Irpino, A. and Verde, R. (2015). Basic statistics for distributional symbolic variables: a new metric-based approach. Advances in Data Analysis and Classification, 9:143–175.
Irpino, A., Verde, R., et al. (2013). Dimension reduction techniques for distributional symbolic data. In Advances in Latent Variables, pages 1–8. Vita e Pensiero.
Iwata, T., Yamada, T., and Ueda, N. (2008). Probabilistic latent semantic visualization: Topic model for visualizing documents. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’08, pages 363–371, New York, NY, USA. Association for Computing Machinery.
Johnstone, D. J., Barnard, G. A., and Lindley, D. V. (1986). Tests of significance in theory and practice. Journal of the Royal Statistical Society. Series D (The Statistician), 35(5):491–504.
Kandel, S., Parikh, R., Paepcke, A., Hellerstein, J. M., and Heer, J. (2012). Profiler: Integrated statistical analysis and visualization for data quality assessment. In Proceedings of the International Working Conference on Advanced Visual Interfaces, AVI ’12, pages 547–554, New York, NY, USA. Association for Computing Machinery.
Kang, D. (2021). Box-office forecasting in Korea using search trend data: a modified generalized Bass diffusion model. Electronic Commerce Research, 21(1):41–72.
Khan, F. H., Qamar, U., and Bashir, S. (2016a). Multi-objective model selection (MOMS)-based semi-supervised framework for sentiment analysis. Cognitive Computation, 8:614–628.
Khan, F. H., Qamar, U., and Bashir, S. (2016b). SWIMS: Semi-supervised subjective feature weighting and intelligent model selection for sentiment analysis. Knowledge-Based Systems, 100:97–111.
Khan, F. H., Qamar, U., and Bashir, S. (2017). A semi-supervised approach to sentiment analysis using revised sentiment strength based on SentiWordNet. Knowledge and Information Systems, 51:851–872.
Kim, J.-M., Xia, L., Kim, I., Lee, S., and Lee, K.-H. (2020). Finding Nemo: Predicting movie performances by machine learning methods. Journal of Risk and Financial Management, 13(5).
Korovkinas, K., Danėnas, P., and Garšva, G. (2017). SVM and naïve Bayes classification ensemble method for sentiment analysis. Baltic Journal of Modern Computing, 5(4):398–409.
Lauro, C. N. and Palumbo, F. (2000). Principal component analysis of interval data: a symbolic data analysis approach. Computational Statistics, 15:73–87.
Li, A., Yang, B., Huo, H., and Hussain, F. K. (2021). Leveraging implicit relations for recommender systems. Information Sciences, 579:55–71.
Li, F., Wang, S., Liu, S., and Zhang, M. (2014). SUIT: A supervised user-item based topic model for sentiment analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 28(1).
MacQueen, J. et al. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281–297. Oakland, CA, USA.
Mangolin, R. B., Pereira, R. M., Britto Jr, A. S., Silla Jr, C. N., Feltrim, V. D., Bertolini, D., and Costa, Y. M. (2022). A multimodal approach for multi-label movie genre classification. Multimedia Tools and Applications, 81(14):19071–19096.
Maulana, R., Rahayuningsih, P. A., Irmayani, W., Saputra, D., and Jayanti, W. E. (2020). Improved accuracy of sentiment analysis movie review using support vector machine based information gain. Journal of Physics: Conference Series, 1641(1):012060.
Mutinda, J., Mwangi, W., and Okeyo, G. (2023). Sentiment analysis of text reviews using lexicon-enhanced BERT embedding (LeBERT) model with convolutional neural network. Applied Sciences, 13:1445.
Ni, Y., Dong, F., Zou, M., and Li, W. (2022). Movie box office prediction based on multi-model ensembles. Information, 13(6).
Nilashi, M., Ibrahim, O., and Bagherifard, K. (2018). A recommender system based on collaborative filtering using ontology and dimensionality reduction techniques. Expert Systems with Applications, 92:507–520.
Pouransari, H. and Ghili, S. (2014). Deep learning for sentiment analysis of movie reviews. CS224N Proj, pages 1–8.
Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., and Riedl, J. (1994). GroupLens: An open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, CSCW ’94, pages 175–186, New York, NY, USA. Association for Computing Machinery.
Samsir, S., Kusmanto, K., Dalimunthe, A. H., Aditiya, R., and Watrianthos, R. (2022). Implementation naïve Bayes classification for sentiment analysis on Internet Movie Database. Building of Informatics, Technology and Science (BITS), 4(1):1–6.
Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295.
Tahmasebi, H., Ravanmehr, R., and Mohamadrezaei, R. (2021). Social movie recommender system based on deep autoencoder network using Twitter data. Neural Computing and Applications, 33.
Valdiviezo-Díaz, P. and Bobadilla, J. (2019). A hybrid approach of recommendation via extended matrix based on collaborative filtering with demographics information. In Technology Trends: 4th International Conference, CITT 2018, Babahoyo, Ecuador, August 29–31, 2018, Revised Selected Papers 4, pages 384–398. Springer.
Vilakone, P., Park, D.-S., Xinchang, K., and Hao, F. (2018). An efficient movie recommendation algorithm based on improved k-clique. Human-centric Computing and Information Sciences, 8(1):1–15.
Vozalis, M. G. and Margaritis, K. G. (2006). Applying SVD on generalized item-based filtering. Int. J. Comput. Sci. Appl., 3(3):27–51.
Wang, D. (2022). Research on sentiment analysis of movie reviews based on MLP model. World Scientific Research Journal, 8(10):81–85.
Wang, X., Wei, F., Liu, X., Zhou, M., and Zhang, M. (2011). Topic sentiment analysis in Twitter: A graph-based hashtag sentiment classification approach. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM ’11, pages 1031–1040, New York, NY, USA. Association for Computing Machinery.
Widiyaningtyas, T., Hidayah, I., and Adji, T. B. (2021). User profile correlation-based similarity (UPCSim) algorithm in movie recommendation system. Journal of Big Data, 8:1–21.
Xu, M., Wei, D., Zhu, T., and Zhang, Y. (2020). Box-office revenue predictions based on XGBoost and sentiment analysis. World Scientific Research Journal, 6(11):46–56.
Yang, C., Chen, X., Liu, L., Liu, T., and Geng, S. (2018). A hybrid movie recommendation method based on social similarity and item attributes. In Advances in Swarm Intelligence: 9th International Conference, ICSI 2018, Shanghai, China, June 17–22, 2018, Proceedings, Part II 9, pages 275–285. Springer.
Zhang, N. (2021). Design of movie data visualization system based on web crawler. Journal of Physics: Conference Series, 1971(1):012029. |
Description: | Master's thesis; National Chengchi University; Department of Statistics; 110354026 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0110354026 |
Data Type: | thesis |
Appears in Collections: | [Department of Statistics] Theses
|
Files in This Item:
File | Size | Format |
402601.pdf | 29225Kb | Adobe PDF |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|