National Chengchi University Institutional Repository (NCCUR): Item 140.119/136730
    Please use this persistent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/136730


    Title: 應用詞向量及語意分析探討華語歌曲推薦之研究
    The Study of Mandopop Music Recommendation by Word Vectors and Semantic Analysis
    Author: 鄧絜心 (Teng, Chieh-Hsin)
    Contributors: 鄭宇庭; 鄧絜心 (Teng, Chieh-Hsin)
    Keywords: Natural language processing
    Text analysis
    Mandopop song lyrics
    TF-IDF
    Word2vec
    BERT
    Date: 2021
    Uploaded: 2021-08-04 16:37:50 (UTC+8)
    Abstract: Music is an artistic expression of human emotion; we often let a melody or lyric that captures a particular mood stand in for what we feel in the moment. This study examines how similar Mandopop lyrics are in content and mood, with the goal of improving the automatically recommended playlists offered by today's most popular music services. Lyrics are treated as texts and converted into vectors, and the similarity between vectors serves as the basis for song recommendation. A total of 13,212 Mandopop lyrics were collected from KKBOX and the Mojim lyrics site with a Python web crawler. Two natural language processing models, Word2vec and BERT, were used to convert the lyrics into vectors, and the cosine similarity between vectors measures how close two lyrics are. The experimental results were then validated through focus group interviews and questionnaire surveys. In the users' subjective judgment, the recommendations produced with the BERT model were more accurate than those produced with Word2vec and matched user preferences more closely; BERT's AUC was also higher than Word2vec's, indicating that BERT is the more effective model. We hope these results can help companies in the music industry design playlist-recommendation algorithms that better meet users' needs.
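    The core pipeline in the abstract — segment the lyrics, turn each lyric into a vector with Word2vec or BERT, then rank candidate songs by cosine similarity — can be illustrated with a short Python sketch. This is not the author's code: the toy lyrics, hyperparameters, library choices (gensim for Word2vec, Hugging Face's bert-base-chinese for BERT), and the mean-pooling step are all assumptions made for illustration.

    ```python
    # Minimal sketch of the lyric-similarity pipeline described in the abstract.
    # Lyrics, hyperparameters, model choices (gensim Word2vec, Hugging Face
    # bert-base-chinese) and mean pooling are illustrative assumptions, not the
    # thesis's exact implementation.
    import numpy as np
    import torch
    from gensim.models import Word2Vec
    from transformers import BertModel, BertTokenizer

    # Toy corpus: each lyric is assumed to be pre-segmented into tokens
    # (the thesis lists pywordseg for segmentation; that step is omitted here).
    segmented_lyrics = [
        ["浪漫", "的", "星空", "下", "想", "你"],
        ["夜空", "中", "最亮", "的", "星", "想", "你"],
        ["離開", "以後", "的", "城市", "下", "著", "雨"],
    ]
    raw_lyrics = ["".join(tokens) for tokens in segmented_lyrics]

    def cosine(a, b):
        """Cosine similarity between two lyric vectors."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    # --- Word2vec: average a lyric's word vectors into one document vector.
    w2v = Word2Vec(segmented_lyrics, vector_size=100, window=5, min_count=1, seed=42)

    def w2v_vector(tokens):
        vecs = [w2v.wv[t] for t in tokens if t in w2v.wv.key_to_index]
        return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

    # --- BERT: mean-pool the last hidden states of a pretrained Chinese BERT.
    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    bert = BertModel.from_pretrained("bert-base-chinese")

    def bert_vector(text):
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        with torch.no_grad():
            hidden = bert(**inputs).last_hidden_state   # shape: (1, seq_len, 768)
        return hidden.mean(dim=1).squeeze(0).numpy()

    # Rank the other songs against song 0 under each representation.
    for name, vectors in [
        ("Word2vec", [w2v_vector(t) for t in segmented_lyrics]),
        ("BERT", [bert_vector(t) for t in raw_lyrics]),
    ]:
        scores = sorted(
            ((i, cosine(vectors[0], v)) for i, v in enumerate(vectors[1:], start=1)),
            key=lambda s: s[1], reverse=True,
        )
        print(name, scores)
    ```

    For the evaluation step, the similarity rankings would then be compared against users' relevance judgments from the focus groups and questionnaires, for example with sklearn.metrics.roc_auc_score, to obtain AUC figures of the kind reported in the abstract.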
    References: Journal articles
    [1] Sulartopo, S. (2020). The thesis topic similarity test with TF-IDF method. E-Bisnis: Jurnal Ilmiah Ekonomi Dan Bisnis, 13(1), 13-16.
    [2] Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5), 513-523.
    [3] Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. International Conference on Learning Representations.
    [4] van Zaanen, M., & Kanters, P. (2010). Automatic mood classification using TF*IDF based on lyrics. In Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR 2010).
    [5] Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
    [6] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).
    [7] Spärck Jones, K. (2004). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 60(5), 493-502.
    [8] Lo, R. T.-W., He, B., & Ounis, I. (2005). Automatically building a stopword list for an information retrieval system. In Proceedings of the 5th Dutch-Belgian Workshop on Information Retrieval (DIR 2005), Utrecht, the Netherlands, 3-8.
    [9] 尹其言, 楊建民 (2010). 應用文件分群與文字探勘技術於機器學習領域趨勢分析—以SSCI資料庫為例.
    [10] 溫品竹, 蔡易霖, et al. (2015). 基於Word2Vec詞向量的網路情緒文和流行音樂媒合方法之研究. In Proceedings of the 27th Conference on Computational Linguistics and Speech Processing (ROCLING 2015), 167.

    Books
    [1] 謝邦昌, 鄭宇庭, 謝邦彥, 硬是愛數據應用股份有限公司 (2019). 玩轉社群:文字大數據實作.

    Internet resources
    [1] https://pypi.org/project/pywordseg/
    [2] https://www.kkbox.com/tw/tc/
    [3] https://mojim.com/twznew.htm
    [4] https://selenium-python.readthedocs.io
    Description: Master's thesis
    National Chengchi University
    企業管理研究所(MBA學位學程)
    108363073
    Source: http://thesis.lib.nccu.edu.tw/record/#G0108363073
    Data type: thesis
    DOI: 10.6814/NCCU202100892
    Appears in collections: [企業管理研究所(MBA學位學程)] Theses

    Files in this item:
    307301.pdf (5393 KB, Adobe PDF)

