|
Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/136324
|
Title: | 透過文字探勘預測台股報酬 Predicting Taiwan Stocks Returns with Text Data |
Authors: | 郭亭佑 Kuo, Ting-You |
Contributors: | 翁久幸 Weng, Chiu-Hsing; 林士貴 Lin, Shih-Kuei; 郭亭佑 Kuo, Ting-You |
Keywords: | 非結構化數據; 文字探勘; 股票新聞; 機器學習; 預測股票報酬; 情緒分析; 效率市場假說; 超額報酬; Unstructured Data; Text Mining; Stock News; Machine Learning; Predict Stock Returns; Sentiment Analysis; Efficient-Market Hypothesis; Abnormal Returns |
Date: | 2021 |
Issue Date: | 2021-08-04 14:43:11 (UTC+8) |
Abstract: | 近年來非結構化數據成長快速,因而引發多位學者針對新聞媒體對於股票報酬之影響此類議題進行研究分析。新聞為一般投資人進行交易行為時,最為普遍接觸之「公開資訊」。然而,新聞文章不若財報資訊中有明確數據資料供投資人研究分析後,作為其投資之參考依據。本研究欲透過文字探勘方法獲取台股新聞情緒信息,並利用新聞情緒分數預測台股報酬。本文依據 Ke, Kelly & Xiu (2019) 提出之文字探勘方法建構台股新聞情緒分數模型(Taiwan Stocks Sentiment Extraction via Screening and Topic Modeling, 台股SESTM),我們發現該方法特別適合用於分析新聞文章與股價走勢之間的變動關係,因此本研究欲將該文字探勘方法拓展至臺灣股票市場,並用於實證臺灣效率市場假說。我們發現使用台股SESTM所估算之新聞情緒分數,於臺灣股票市場建構投資組合交易策略同樣有巨大經濟效益,而該情緒分數對於個股報酬有顯著的預測能力及解釋力。若比較美國與台股SESTM交易策略績效表現,可發現台股SESTM對於新聞發佈前之股票報酬有較高的預測能力。同時也發現,儘管台股SESTM對於股票報酬之預測能力顯著有效,但我們透過評估績效發現,新聞對於臺灣投資人決策行為之影響與美國是顯著不同的,這些結果均符合我們對於臺灣股票市場的經濟直觀。我們期待此研究所建構之台股SESTM能夠幫助臺灣財務文字探勘領域建立研究基底。 In recent years, unstructured data has grown rapidly, prompting many scholars to study the impact of news media on stock returns. News is the most common and accessible form of "public information" that ordinary investors encounter when trading. However, unlike financial reports, news articles do not provide explicit numerical data that investors can analyze and use as a basis for investment decisions. This research uses text mining to extract sentiment information from Taiwan stock news and uses the resulting news sentiment scores to predict Taiwan stock returns. Following the text-mining methodology introduced by Ke, Kelly & Xiu (2019), we construct a Taiwan stock news sentiment model (Taiwan Stocks Sentiment Extraction via Screening and Topic Modeling, Taiwan SESTM). We find that this methodology is particularly well suited to analyzing the relationship between news articles and stock price movements. We therefore extend it to the Taiwan stock market and use it to test the efficient-market hypothesis in Taiwan empirically.
We find that a portfolio trading strategy built in the Taiwan stock market on the news sentiment scores estimated by Taiwan SESTM also yields large economic gains, and that the sentiment score has significant predictive and explanatory power for individual stock returns. Comparing the performance of the United States and Taiwan SESTM trading strategies, we find that Taiwan SESTM has higher predictive power for stock returns before the news is released. Although Taiwan SESTM predicts stock returns significantly, our performance evaluation also shows that the influence of news on the decisions of Taiwanese investors differs significantly from that in the United States. These results are consistent with our economic intuition about the Taiwan stock market. We hope that the Taiwan SESTM constructed in this research can help establish a research foundation for financial text mining in Taiwan. |
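The abstract names the "screening" stage of SESTM without describing it. As a rough illustration only, the sketch below follows the marginal screening idea from Ke, Kelly & Xiu (2019): for each word, compute the share of articles containing it that coincide with a positive return, and keep words whose share is far from one half. It is not the thesis's implementation. The function name `screen_sentiment_words`, the thresholds `alpha_pos`, `alpha_neg` and `kappa`, and the toy English tokens are all assumptions made for this example; Chinese news would first need word segmentation, which is assumed to have been done already.

```python
from collections import defaultdict


def screen_sentiment_words(articles, returns, alpha_pos=0.1, alpha_neg=0.1, kappa=2):
    """Pick sentiment-charged words by marginal screening (illustrative sketch).

    articles: list of token lists, one per news article (already tokenized).
    returns:  list of floats, the return of the tagged stock for each article.
    alpha_pos, alpha_neg, kappa: hypothetical tuning thresholds for this example.
    """
    count = defaultdict(int)     # number of articles that contain the word
    count_up = defaultdict(int)  # of those, how many came with a positive return
    for tokens, ret in zip(articles, returns):
        for word in set(tokens):  # count each word once per article
            count[word] += 1
            if ret > 0:
                count_up[word] += 1

    positive, negative = set(), set()
    for word, k in count.items():
        if k < kappa:             # skip words seen in too few articles
            continue
        share_up = count_up[word] / k
        if share_up >= 0.5 + alpha_pos:
            positive.add(word)
        elif share_up <= 0.5 - alpha_neg:
            negative.add(word)
    return positive, negative


# Toy data, invented purely for illustration.
articles = [
    ["profit", "surge"],
    ["profit", "record"],
    ["loss", "recall"],
    ["loss", "lawsuit"],
    ["profit", "loss"],
]
returns = [0.02, 0.01, -0.03, -0.01, 0.005]
pos, neg = screen_sentiment_words(articles, returns)
```

On this toy input, "profit" is kept as a positive word and "loss" as a negative word, while words that appear in only one article are dropped by the `kappa` filter. The later SESTM stages (topic-model estimation and scoring of new articles) are not sketched here.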
Reference: |
1. 李昱穎. (2019). 新聞輿情分析在台灣股票市場之應用: 文字轉向量與動能策略. 政治大學金融學系學位論文, 1-40.
2. 陳信宏, 陳昱志, & 鄭舜仁. (2006). 以時間數列模型檢定台灣股票市場弱式效率性之研究. 管理科學與統計決策, 3(4), 8-17.
3. 鍾任明, 李維平, & 吳澤民. (2005). 運用文字探勘於日內股價漲跌趨勢預測之研究 (Doctoral dissertation, 撰者).
4. Azar, P. D., & Lo, A. W. (2016). The wisdom of Twitter crowds: Predicting stock market reactions to FOMC meetings via Twitter feeds. The Journal of Portfolio Management, 42(5), 123-134.
5. Alvarez-Ramirez, J., Rodriguez, E., & Espinosa-Paredes, G. (2012). Is the US stock market becoming weakly efficient over time? Evidence from 80-year-long data. Physica A: Statistical Mechanics and its Applications, 391(22), 5643-5647.
6. Bernard, V. L., & Thomas, J. K. (1990). Evidence that stock prices do not fully reflect the implications of current earnings for future earnings. Journal of Accounting and Economics, 13(4), 305-340.
7. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12, 2493-2537.
8. Cowles 3rd, A. (1933). Can stock market forecasters forecast? Econometrica: Journal of the Econometric Society, 309-324.
9. Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q. V., & Salakhutdinov, R. (2019). Transformer-XL: Attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860.
10. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
11. Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25(2), 383-417.
12. Fan, J., & Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5), 849-911.
13. Gehring, J., Auli, M., Grangier, D., Yarats, D., & Dauphin, Y. N. (2017, July). Convolutional sequence to sequence learning. In International Conference on Machine Learning (pp. 1243-1252). PMLR.
14. Heston, S. L., & Sinha, N. R. (2017). News vs. sentiment: Predicting stock returns from news stories. Financial Analysts Journal, 73(3), 67-83.
15. Hutchins, R. M. (1954). Great books. Western World.
16. Jegadeesh, N., & Titman, S. (1993). Returns to buying winners and selling losers: Implications for stock market efficiency. The Journal of Finance, 48(1), 65-91.
17. Jegadeesh, N., & Wu, D. (2013). Word power: A new approach for content analysis. Journal of Financial Economics, 110(3), 712-729.
18. Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188.
19. Ke, Z. T., Kelly, B. T., & Xiu, D. (2019). Predicting returns with text data (No. w26186). National Bureau of Economic Research.
20. Lakonishok, J., & Vermaelen, T. (1990). Anomalous price behavior around repurchase tender offers. The Journal of Finance, 45(2), 455-477.
21. Le, Q., & Mikolov, T. (2014, June). Distributed representations of sentences and documents. In International Conference on Machine Learning (pp. 1188-1196). PMLR.
22. Loper, E., & Bird, S. (2002). NLTK: The natural language toolkit. arXiv preprint cs/0205028.
23. Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35-65.
24. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26, 3111-3119.
25. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
26. Ritter, J. R. (1991). The long-run performance of initial public offerings. The Journal of Finance, 46(1), 3-27.
27. Spiess, D. K., & Affleck-Graves, J. (1995). Underperformance in long-run stock returns following seasoned equity offerings. Journal of Financial Economics, 38(3), 243-267.
28. Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. arXiv preprint arXiv:1409.3215.
29. Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3), 1139-1168.
30. Tetlock, P. C. (2014). Information transmission in finance. Annual Review of Financial Economics, 6(1), 365-384.
31. Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433-460.
32. Wilson, D. S. (1975). A theory of group selection. Proceedings of the National Academy of Sciences, 72(1), 143-146.
33. Yang, B., Yih, W. T., He, X., Gao, J., & Deng, L. (2014). Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.
34. Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
35. Zhang, Y., & Wallace, B. (2015). A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification. arXiv preprint arXiv:1510.03820. |
Description: | Master's thesis; National Chengchi University; Department of Statistics; 108354023 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0108354023 |
Data Type: | thesis |
DOI: | 10.6814/NCCU202101087 |
Appears in Collections: | [Department of Statistics] Theses
|
Files in This Item:
File | Description | Size | Format |
402301.pdf | | 3083Kb | Adobe PDF |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|