National Chengchi University Institutional Repository (NCCUR): Item 140.119/152791
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/152791


    Title: Stock price prediction model combining numerical information and text information: taking stocks that consider ESG factors as an example
    (Original title: 結合數值資訊與文字資訊的股價預測模型: 以有考慮ESG因子的股票為例)
    Authors: 蔡青穎
    Contributors: 黃泓智
    蔡青穎
    Keywords: ESG investment
    Embedding model
    Discrete wavelet transform
    Ensemble learning
    Date: 2024
    Issue Date: 2024-08-05 14:03:16 (UTC+8)
    Abstract: This study combines numerical and textual information to predict the next-day returns of stocks that take ESG factors into account, and proposes a multi-faceted prediction model. ESG investing is increasingly valued not only for its social responsibility and environmental impact but also for its potential in long-term financial performance. However, existing ESG databases in Taiwan suffer from missing data and inconsistent standards, which prompted us to use other textual data sources for prediction. Data preprocessing consists of two parts. Text is vectorized with the M3-Embedding model and, departing from the usual sentiment-analysis framework, all embedding vectors are used directly as features; numerical data are processed with the discrete wavelet transform. For the learning stage, several machine learning models (extreme learning machine, random forest, multi-layer perceptron, support vector machine, and convolutional neural network) and an ensemble learning method are trained and compared. Empirical results show that models using only numerical information still have lower error values. Overall, however, models that combine numerical and textual information perform better in both return prediction and risk control: they outperform models using a single type of information on performance measures such as the Sharpe ratio, maximum drawdown, and return, and they make more effective use of real-time market information. In summary, this study demonstrates the feasibility and advantages of combining textual and numerical information in stock prediction and offers new directions and references for research on ESG investing.
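    The abstract names a discrete wavelet transform for the numerical data and evaluates strategies by Sharpe ratio and maximum drawdown. The following is a minimal sketch of those three pieces, not the thesis's implementation. It assumes a single-level Haar wavelet, smoothing by discarding detail coefficients, daily simple returns, 252 trading days per year, and a zero risk-free rate by default; the abstract specifies none of these, and all function names are hypothetical.

```python
import numpy as np


def haar_dwt(x):
    """Single-level Haar DWT; returns (approximation, detail) coefficients."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:  # assumption: pad odd-length input by repeating the last value
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt; rebuilds the (possibly padded) signal."""
    approx = np.asarray(approx, dtype=float)
    detail = np.asarray(detail, dtype=float)
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def haar_smooth(x):
    """Smooth a series by zeroing the detail coefficients (an assumed choice)."""
    n = len(x)
    approx, detail = haar_dwt(x)
    return haar_idwt(approx, np.zeros_like(detail))[:n]


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of per-period simple returns."""
    excess = np.asarray(returns, dtype=float) - risk_free
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the cumulative wealth curve, as a fraction."""
    growth = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    wealth = np.concatenate(([1.0], growth))  # start from an initial wealth of 1
    peak = np.maximum.accumulate(wealth)
    return float(np.max(1.0 - wealth / peak))


if __name__ == "__main__":
    prices = [100.0, 102.0, 101.0, 105.0, 104.0, 108.0]  # made-up illustration data
    print("smoothed prices:", haar_smooth(prices))
    daily = np.diff(prices) / np.array(prices[:-1])
    print("Sharpe ratio:", sharpe_ratio(daily))
    print("max drawdown:", max_drawdown(daily))
```

    In a pipeline like the one the abstract describes, the smoothed numerical series would be concatenated with the text embedding vectors to form the feature matrix before model training. A multi-level transform or a different wavelet family (for example via the PyWavelets library) could replace the Haar step without changing the rest of the sketch.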
    Description: Master's
    National Chengchi University
    Department of Risk Management and Insurance
    111358028
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0111358028
    Data Type: thesis
    Appears in Collections:[Department of Risk Management and Insurance] Theses

    Files in This Item:

    File          Size      Format
    802801.pdf    2667Kb    Adobe PDF


    All items in 政大典藏 are protected by copyright, with all rights reserved.

