National Chengchi University Institutional Repository (NCCUR): Item 140.119/153378
    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/153378


    Title: Emotion Analysis Enhanced with Sentiment Lexicons Based on Deep Learning
    基於深度學習以情感辭典增強情緒分析
    Author: 張禎尹
    Contributors: 邱淑怡
    Chiu, Shu-I
    張禎尹
    Keywords: emotion analysis
    multi-label data imbalance
    synonym replacement
    NRC Emotion Lexicon (EmoLex)
    Bidirectional Long Short-Term Memory (BiLSTM)
    Masked Language Model (MLM)
    Date: 2024
    Upload time: 2024-09-04 14:59:43 (UTC+8)
    Abstract: This study combines a Bidirectional Long Short-Term Memory (BiLSTM) network with the NRC Emotion Lexicon (EmoLex) in a model named EmoBiLSTM, with the aim of improving the accuracy of emotion recognition in Taiwanese social media text. As COVID-19 spread worldwide, it significantly affected people's lives and mental health, so tracking shifts in public emotion promptly and accurately matters for public health policy: it helps governments understand the public's emotional state and respond with targeted measures. Existing emotion analysis techniques, however, still fall short in accuracy and adaptability, particularly when the data are multi-label and imbalanced. To address multi-label data imbalance, the study uses synonym replacement and a Masked Language Model (MLM) for data augmentation. Synonym replacement generates new training samples by substituting some of the words in a sentence, which increases the amount of data for minority classes. The MLM is trained to predict randomly masked words in a sentence, so it learns word context and sentence structure, which improves the quality of the generated and augmented text. The model combines a CNN, which extracts local features of the text, with a BiLSTM, which captures global contextual information. EmoLex is added to further strengthen emotion recognition; its emotion vocabulary helps the model identify and process emotional information in the text. Model parameters were tuned to optimize performance, and the model was trained on the training set and evaluated with accuracy, recall, and F1-score. The results show that synonym replacement combined with EmoLex and BiLSTM performs well on all of these metrics, particularly on imbalanced multi-label data, and that the approach is accurate and stable on emotion recognition for Taiwanese social media text. This indicates that combining deep learning with an emotion lexicon is effective for emotion analysis under multi-label data imbalance.
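The synonym-replacement augmentation mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not give the synonym source, replacement rate, or balancing target, so the toy `SYNONYMS` table, the function names, and the `target` and `rate` parameters below are all hypothetical. Sentences are assumed to be already segmented into tokens, and each sample carries one or more emotion labels.

```python
import random
from collections import Counter

# Hypothetical toy synonym table; a real system would need a proper synonym resource.
SYNONYMS = {
    "happy": ["glad", "cheerful"],
    "scared": ["afraid", "frightened"],
    "sad": ["unhappy"],
}


def replace_synonyms(tokens, synonyms, rng, rate):
    """Return a copy of tokens where each replaceable word is swapped with probability rate."""
    out = []
    for tok in tokens:
        options = synonyms.get(tok)
        if options and rng.random() < rate:
            out.append(rng.choice(options))
        else:
            out.append(tok)
    return out


def label_counts(dataset):
    """Count how many samples carry each label (a sample may carry several)."""
    counts = Counter()
    for _, labels in dataset:
        counts.update(labels)
    return counts


def augment_minority(dataset, synonyms, target, seed=0, rate=0.5, max_tries=1000):
    """Append synonym-replaced copies of samples that carry a label seen fewer
    than target times, until every label reaches target or max_tries is used up."""
    rng = random.Random(seed)
    augmented = list(dataset)
    counts = label_counts(augmented)
    for _ in range(max_tries):
        rare = {label for label, n in counts.items() if n < target}
        if not rare:
            break
        # Only original samples are used as sources, so errors do not compound.
        candidates = [(toks, labs) for toks, labs in dataset if rare & set(labs)]
        tokens, labels = rng.choice(candidates)
        new_tokens = replace_synonyms(tokens, synonyms, rng, rate)
        if new_tokens != tokens:  # keep only copies that actually changed
            augmented.append((new_tokens, labels))
            counts.update(labels)
    return augmented


# Toy multi-label data: "joy" is frequent, "fear" and "sadness" are rare.
data = [
    (["i", "feel", "happy", "today"], ("joy",)),
    (["so", "happy", "and", "glad"], ("joy",)),
    (["really", "happy"], ("joy",)),
    (["i", "am", "scared", "and", "sad"], ("fear", "sadness")),
]
balanced = augment_minority(data, SYNONYMS, target=3)
print(label_counts(balanced))
```

In this toy run every label ends up with three samples. In multi-label data, copying a sample to boost a rare label also boosts every other label on that sample, so a per-label target like this one is only a rough balancing heuristic.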
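    Description: Master's
    National Chengchi University
    Department of Computer Science
    111753136
    Source: http://thesis.lib.nccu.edu.tw/record/#G0111753136
    Type: thesis
    Appears in collections: [Department of Computer Science] Theses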

    Files in this item:

    File         Description   Size     Format      Views
    313601.pdf                 4321Kb   Adobe PDF   1

