政大機構典藏-National Chengchi University Institutional Repository(NCCUR):Item 140.119/59440

Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/59440

Title:	串流資料分析在台灣股市指數期貨之應用 An Application of Streaming Data Analysis on TAIEX Futures
Authors:	林宏哲 Lin, Hong Che
Contributors:	徐國偉 Hsu, Kuo Wei 林宏哲 Lin, Hong Che
Keywords:	data stream mining; concept drift; TAIEX Futures
Date:	2012
Issue Date:	2013-09-02 16:48:39 (UTC+8)
Abstract:	Data stream mining is an important research field because, in practice, much data is generated and collected in the form of a stream. Financial market data is one example: it is intrinsically dynamic and usually generated sequentially. In this thesis, we apply data stream mining techniques to predict whether Taiwan Stock Exchange Capitalization Weighted Stock Index Futures (TAIEX Futures) will rise or fall. This prediction is difficult, and the difficulty depends on the type, degree, and frequency of concept drift, that is, changes in the underlying data distribution that can cause prediction accuracy to drop sharply. We therefore focus on handling concept drift. Based on the results of an empirical study, we first infer that concept drift likely occurs frequently in the TAIEX Futures data. The results also indicate that a concept drift detection method can substantially improve prediction accuracy, even when paired with a data stream mining algorithm that otherwise performs poorly. Next, we survey methods designed for particular types of concept drift and use them to identify which types are present; the experimental results indicate that both sudden and reoccurring concept drift exist in the TAIEX Futures data. Finally, we propose an ensemble-based algorithm for reoccurring concept drift. Its most distinctive feature, compared with the algorithm it improves on, is that it determines the chunk size (the number of samples used by each sub-classifier) adaptively rather than requiring the user to set it; the chunk size is a key parameter for other concept drift handling algorithms.
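The abstract does not say which drift detector the thesis uses. As an illustration only, the following is a minimal sketch of error-rate-based drift detection in the style of the Drift Detection Method (DDM) of Gama et al. The class name `DriftDetector`, the helpers `synthetic_errors` and `first_drift`, and the synthetic error stream are all hypothetical and are not taken from the thesis; no TAIEX Futures data is involved.

```python
import math


class DriftDetector:
    """DDM-style detector over a stream of 0/1 prediction errors (illustrative)."""

    def __init__(self, min_samples=30, warn_level=2.0, drift_level=3.0):
        self.min_samples = min_samples  # samples required before testing for drift
        self.warn_level = warn_level    # warning threshold, in standard deviations
        self.drift_level = drift_level  # drift threshold, in standard deviations
        self.reset()

    def reset(self):
        self.n = 0              # number of outcomes seen since the last reset
        self.p = 0.0            # running error rate
        self.p_min = math.inf   # error rate recorded when p + s was lowest
        self.s_min = math.inf   # standard deviation recorded at that point

    def update(self, error):
        """Feed one outcome (1 = misclassified, 0 = correct); return a status."""
        self.n += 1
        self.p += (error - self.p) / self.n               # incremental mean
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)   # binomial std. dev.
        if self.n < self.min_samples:
            return "stable"
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()  # the caller would typically retrain its model here
            return "drift"
        if self.p + s > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"


def synthetic_errors():
    """Hypothetical error stream: 20% errors for 1000 steps, then 60%."""
    before = [1 if i % 5 == 0 else 0 for i in range(1000)]
    after = [1 if i % 5 < 3 else 0 for i in range(1000)]
    return before + after


def first_drift(errors):
    """Return the index at which drift is first signalled, or None."""
    detector = DriftDetector()
    for i, error in enumerate(errors):
        if detector.update(error) == "drift":
            return i
    return None


if __name__ == "__main__":
    print("drift signalled at index:", first_drift(synthetic_errors()))
```

In a streaming setup, each error value would come from comparing an online classifier's prediction (rise or fall) with the observed outcome, and a "drift" status would trigger retraining or replacement of the classifier.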
Description:	Master's thesis; National Chengchi University; Department of Computer Science; 100753020; 101
Source URI:	http://thesis.lib.nccu.edu.tw/record/#G0100753020
Data Type:	thesis
Appears in Collections:	[Department of Computer Science] Theses

Files in This Item:

File	Size	Format
302001.pdf	5809Kb	Adobe PDF

All items in 政大典藏 are protected by copyright, with all rights reserved.

Copyright Announcement

1. The digital content of this website is part of the National Chengchi University Institutional Repository. It is provided free of charge for non-commercial purposes such as academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

2. Every effort has been made in building the NCCU Institutional Repository to avoid infringing the rights of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.
