Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/136730
|
Title: | 應用詞向量及語意分析探討華語歌曲推薦之研究 The Study of Mandopop Music Recommendation By Word Vectors and Semantic Analysis |
Authors: | 鄧絜心 Teng, Chieh-Hsin |
Contributors: | 鄭宇庭 鄧絜心 Teng, Chieh-Hsin |
Keywords: | Natural language processing; Text analysis; Mandopop song lyrics; TF-IDF; Word2vec; BERT |
Date: | 2021 |
Issue Date: | 2021-08-04 16:37:50 (UTC+8) |
Abstract: | Music is an artistic expression of human emotion, and people often turn to the mood of a melody or a lyric to represent how they feel or the situation they are in. This study examines how similar Mandarin song lyrics are in content and mood, with the aim of improving the automatically recommended song lists offered by today's most popular music services. Lyrics are treated as text and converted to vectors, and the similarity between those vectors serves as the basis for recommending songs. The lyrics of 13,212 Mandopop songs were collected from KKBOX and Mojing Lyrics with a Python web crawler. Two natural language processing models, Word2vec and BERT, were used to convert the lyrics into vectors, and cosine similarity was then computed to measure how close the lyrics of two different songs are. The results were validated through focus group interviews and a questionnaire survey. In users' subjective judgment, recommendations produced with the BERT model were more accurate than those produced with the Word2vec model and closer to users' preferences. BERT's AUC was also higher than Word2vec's, indicating that BERT is also the more effective model. We hope these results can help companies in the music industry design playlist recommendation algorithms that meet users' needs more accurately. |
Reference: | Journal articles
[1] Sulartopo, S. (2020). The thesis topic similarity test with TF-IDF method. E-Bisnis: Jurnal Ilmiah Ekonomi Dan Bisnis, 13(1), 13-16.
[2] Salton, G. and Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Inf. Process. Manage., 24(5), 513-523.
[3] Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. International Conference on Learning Representations.
[4] van Zaanen, M. and Kanters, P. (2010). Automatic Mood Classification Using tf*idf Based on Lyrics. In J. Stephen Downie and Remco C. Veltkamp, editors, 11th International Society for Music Information Retrieval Conference, August 2010.
[5] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
[6] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008.
[7] Spärck Jones, K. (2004). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 60(5), 493-502.
[8] Lo, R. T.-W., He, B., and Ounis, I. (2005). Automatically building a stopword list for an information retrieval system. Proceedings of the 5th Dutch-Belgian Workshop on Information Retrieval (DIR), Utrecht, the Netherlands, pp. 3-8.
[9] 尹其言, 楊建民. (2010). 應用文件分群與文字探勘技術於機器學習領域趨勢分析以 SSCI 資料庫為例.
[10] 溫品竹, 蔡易霖, et al. (2015). 基於 Word2Vec 詞向量的網路情緒文和流行音樂媒合方法之研究. on Computational Linguistics and Speech Processing ROCLING XXVII (2015), 167.
Books
[1] 謝邦昌, 鄭宇庭, 謝邦彥, 硬是愛數據應用股份有限公司 (2019). 玩轉社群:文字大數據實作.
Internet
[1] https://pypi.org/project/pywordseg/
[2] https://www.kkbox.com/tw/tc/
[3] https://mojim.com/twznew.htm
[4] https://selenium-python.readthedocs.io |
Description: | Master's thesis; National Chengchi University; Graduate Institute of Business Administration (MBA Program); 108363073 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0108363073 |
Data Type: | thesis |
DOI: | 10.6814/NCCU202100892 |
Appears in Collections: | [Graduate Institute of Business Administration (MBA Program)] Theses
|
Files in This Item:
File | Description | Size | Format | |
307301.pdf | | 5393Kb | Adobe PDF2 | 0 | View/Open |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|