|
Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/101083
|
Title: | 應用網路新聞文字探勘於預測台灣股價趨勢之研究 A study of forecasting Taiwan stock price trends by applying news text mining technique |
Authors: | 陳人華 Chen, Ren Hua |
Contributors: | 廖四郎 陳人華 Chen, Ren Hua |
Keywords: | text mining; SVM; news; stock market |
Date: | 2016 |
Issue Date: | 2016-09-01 23:47:06 (UTC+8) |
Abstract: | Stock market news is an important source of information for individual investors. Although the share of trading by individual investors on the Taiwan exchange market has declined in recent years, it still exceeds 50%. Past research has repeatedly shown that news media coverage does affect stock returns. If the information in news can be extracted and used to build a trading strategy, investors can benefit from it, whether the strategy is used alone or combined with others. This study uses the Support Vector Machine (SVM) algorithm to automatically classify news and predict Taiwan stock price trends after a news item is published. Applying the improved TF-IDF method proposed by Chang et al. (2006) is intended to make the selection of news feature terms more accurate. The study collects several thousand news articles from each of two sources, cnYES and Taiwan Economic Journal (TEJ), so that the results are more representative and stable. However, the empirical results show that the precision of the prediction model is still insufficient. The study therefore finds no evidence, through the model, that the information in news content is associated with Taiwan stock returns. |
Reference: |
1. Barber, B. M., & Odean, T. (2008). All that glitters: The effect of attention and news on the buying behavior of individual and institutional investors. Review of Financial Studies, 21(2), 785-818.
2. Chen, K. J., & Liu, S. H. (1992, August). Word identification for Mandarin Chinese sentences. In Proceedings of the 14th conference on Computational linguistics-Volume 1 (pp. 101-107). Association for Computational Linguistics.
3. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.
4. Forman, G. (2003). An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 3(Mar), 1289-1305.
5. Gidofalvi, G., & Elkan, C. (2001). Using news articles to predict stock price movements. Department of Computer Science and Engineering, University of California, San Diego.
6. Hsu, C. W., Chang, C. C., & Lin, C. J. (2003). A practical guide to support vector classification.
7. Lavrenko, V., Schmill, M., Lawrie, D., Ogilvie, P., Jensen, D., & Allan, J. (2000, November). Language models for financial news recommendation. In Proceedings of the ninth international conference on Information and knowledge management (pp. 389-396). ACM.
8. Merton, R. C. (1987). A simple model of capital market equilibrium with incomplete information. The Journal of Finance, 42(3), 483-510.
9. Mittermayer, M. A. (2004). Forecasting intraday stock price trends with text mining techniques. In System Sciences, 2004. Proceedings of the 37th Annual Hawaii International Conference on (pp. 10-pp). IEEE.
10. Nie, J. Y., Brisebois, M., & Ren, X. (1996). On Chinese text retrieval. In Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 225-233). ACM.
11. Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5), 513-523.
12. Salton, G., & McGill, M. J. (1986). Introduction to modern information retrieval.
13. Sproat, R. (1990). A statistical method for finding word boundaries in Chinese text.
14. Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3), 1139-1168.
15. Witten, I. H. (2005). Text mining. Practical handbook of Internet computing, 14-1.
16. Yang, Y. (1999). An evaluation of statistical approaches to text categorization. Information Retrieval, 1(1-2), 69-90.
17. 池祥萱, 林煜恩, 陳韋如, & 周賓凰. (2009). Does CEO Media Coverage Affect Firm Performance?. 交大管理學報, 1, 139-173.
18. 張玉芳, 彭時名, & 呂佳. (2006). 基於文本分類 TFIDF 方法的改進與應用. 電腦工程, 32(19), 76-78.
19. 鍾任明, 李維平, & 吳澤民. (2005). 運用文字探勘於日內股價 |
Description: | Master's thesis; National Chengchi University; Graduate Institute of Finance (金融研究所); 103352019 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0103352019 |
Data Type: | thesis |
Appears in Collections: | [Department of Money and Banking (金融學系)] Theses
|
Files in This Item:
File | Description | Size | Format | |
201901.pdf | | 1271Kb | Adobe PDF2 | 307 | View/Open |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|