政大機構典藏-National Chengchi University Institutional Repository(NCCUR):Item 140.119/136730
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/136730


    Title: 應用詞向量及語意分析探討華語歌曲推薦之研究
The Study of Mandopop Music Recommendation by Word Vectors and Semantic Analysis
    Authors: 鄧絜心
    Teng, Chieh-Hsin
    Contributors: 鄭宇庭
    鄧絜心
    Teng, Chieh-Hsin
    Keywords: Natural language processing
    Text analysis
    Mandopop lyrics
    TF-IDF
    Word2vec
    BERT
    Date: 2021
    Issue Date: 2021-08-04 16:37:50 (UTC+8)
    Abstract: Music is an artistic expression of human emotion; we often let a melody or lyric with a particular mood stand for what we feel in the moment. This study examines how similar Mandopop lyrics are in content and mood, with the aim of improving the automatically recommended playlists of today's most popular music services. Lyrics are treated as texts and converted into vectors, and the similarity between vectors serves as the basis for recommendation. A total of 13,212 Mandopop lyrics were collected from KKBOX and the Mojim lyrics site with a Python web crawler. Two natural language processing models, Word2vec and BERT, convert each lyric into a vector, and cosine similarity then measures how close any two lyrics are. The results were validated through focus group interviews and questionnaire surveys. In users' subjective judgment, recommendations produced with the BERT model were more accurate than those produced with Word2vec and closer to users' preferences; BERT's AUC was also higher than Word2vec's, indicating greater overall benefit. We hope these experimental results help music-industry companies design playlist-recommendation algorithms that more accurately meet user needs.
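The recommendation step the abstract describes, ranking songs by the cosine similarity of their lyric vectors, can be sketched as follows. This is a minimal illustration, not the thesis code: the song titles and three-dimensional vectors are hypothetical toy values standing in for averaged Word2vec embeddings or BERT sentence vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def recommend(query_vec, library, top_k=3):
    """Return the top_k song titles whose lyric vectors are most
    similar to the query song's lyric vector.

    `library` maps song title -> lyric vector (hypothetical values here).
    """
    ranked = sorted(library.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    return [title for title, _ in ranked[:top_k]]

# Toy usage: song "A" points almost the same direction as the query,
# so it ranks first; orthogonal "C" is excluded.
song_vecs = {"A": [0.9, 0.1, 0.0],
             "B": [0.8, 0.2, 0.1],
             "C": [0.0, 1.0, 0.0]}
print(recommend([1.0, 0.0, 0.0], song_vecs, top_k=2))
```

In practice the library would hold one vector per collected lyric, and the same ranking would be computed once with Word2vec vectors and once with BERT vectors before comparing user judgments of the two lists.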
    Reference: Journal and conference papers
    [1] Sulartopo, S. (2020). The thesis topic similarity test with TF-IDF method. E-Bisnis: Jurnal Ilmiah Ekonomi Dan Bisnis, 13(1), 13-16.
    [2] Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5), 513-523.
    [3] Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. International Conference on Learning Representations (ICLR).
    [4] van Zaanen, M., & Kanters, P. (2010). Automatic mood classification using tf*idf based on lyrics. In J. S. Downie & R. C. Veltkamp (Eds.), 11th International Society for Music Information Retrieval Conference (ISMIR).
    [5] Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
    [6] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 5998-6008.
    [7] Spärck Jones, K. (2004). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 60(5), 493-502.
    [8] Lo, R. T.-W., He, B., & Ounis, I. (2005). Automatically building a stopword list for an information retrieval system. Proceedings of the 5th Dutch-Belgian Workshop on Information Retrieval (DIR), Utrecht, the Netherlands, 3-8.
    [9] 尹其言, 楊建民 (2010). 應用文件分群與文字探勘技術於機器學習領域趨勢分析：以SSCI資料庫為例 [Applying document clustering and text mining to trend analysis of the machine learning field, using the SSCI database as an example].
    [10] 溫品竹, 蔡易霖, et al. (2015). 基於Word2Vec詞向量的網路情緒文和流行音樂媒合方法之研究 [A Word2vec word-vector approach to matching online emotional posts with popular music]. Proceedings of the Conference on Computational Linguistics and Speech Processing (ROCLING XXVII), 167.

    Books
    [1] 謝邦昌, 鄭宇庭, 謝邦彥, 硬是愛數據應用股份有限公司 (2019). 玩轉社群：文字大數據實作 [Playing with social media: text big data in practice].

    Internet resources
    [1] https://pypi.org/project/pywordseg/
    [2] https://www.kkbox.com/tw/tc/
    [3] https://mojim.com/twznew.htm
    [4] https://selenium-python.readthedocs.io
    Description: Master's thesis
    National Chengchi University
    Graduate Institute of Business Administration (MBA Program)
    108363073
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0108363073
    Data Type: thesis
    DOI: 10.6814/NCCU202100892
    Appears in Collections: [MBA Program] Theses

    Files in This Item:

    File: 307301.pdf (5393 KB, Adobe PDF)


    All items in 政大典藏 are protected by copyright, with all rights reserved.

