Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/112151
Title: | Research into sentimental analysis to predict exchange-traded fund trend |
Authors: | 黃泓銘 Huang, Hung-Ming |
Contributors: | 姜國輝 Chiang, Kuo-Huie; 黃泓銘 Huang, Hung-Ming |
Keywords: | Sentiment analysis; LDA topic model; support vector machine (SVM); ETF |
Date: | 2017 |
Issue Date: | 2017-08-28 11:24:26 (UTC+8) |
Abstract: | ETF assets have grown rapidly in recent years, and steady economic growth across Asia has become a major driver of the global ETF market. The Yuanta Taiwan Top 50 ETF is popular with investors because of its large market scale. Past research indicates that online text, such as news and tweets, influences public mood, which can in turn move stock prices. An investor who could quickly gauge public sentiment from the large volume of online financial text, and use it to anticipate price trends, could improve returns. However, hundreds of financial articles appear every day, and analyzing them by hand is too slow and labor-intensive. This study therefore applies text-mining techniques and proposes a price-trend prediction model based on sentiment analysis.

Earlier work on text sentiment analysis has shown that supervised learning achieves good classification accuracy with simple quantitative features, but it is limited in predicting unknown future trends. To address this, the study first applies unsupervised learning to identify the topics of financial articles published throughout 2016, computes a sentiment index for each article, and labels its sentiment polarity. It then applies supervised learning, combining the sentiment information with Taiwan stock market indicators, international indicators, macroeconomic indicators, and technical indicators, to build a classification model that predicts the price trend of the Yuanta Taiwan Top 50 ETF.

For topic labeling, the experiments show that because the number of documents far exceeds the number of topic terms, the TF-IDF matrix becomes too sparse, so TF-IDF combined with K-means classifies topics poorly. LDA, in which every topic is shared by all documents, clusters words into topics better than TF-IDF with K-means. For sentiment polarity labeling, the sentiment lexicon expanded in this study identifies word polarity more accurately than NTUSD (the National Taiwan University Sentiment Dictionary).

Comparing classifiers, the model that combines the sentiment index with technical indicators is 7% more accurate than a model using technical indicators alone, and a model that further adds indirect sentiment indicators reaches 71% accuracy. These results show that sentiment analysis of financial text can effectively improve price-trend prediction for the Yuanta Taiwan Top 50 ETF. |
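The abstract says each article receives a sentiment index and a polarity label before the index is fed to the classifier, but this record does not give the formula. The following is a minimal Python sketch under stated assumptions, not the thesis's method: it assumes articles are already word-segmented (Chinese text would first need a word segmenter), uses a hypothetical toy lexicon in place of the thesis's expanded NTUSD-based lexicon, scores each article as (positive − negative) / (positive + negative), and averages articles per day to form one feature per day. The names `sentiment_index`, `polarity`, and `daily_sentiment` are illustrative only.

```python
from collections.abc import Iterable, Mapping, Sequence
from statistics import mean

# Hypothetical toy lexicon (assumption). The thesis uses its own expanded
# lexicon built from NTUSD, which is not reproduced here.
POSITIVE = frozenset({"rise", "growth", "bullish", "rebound"})
NEGATIVE = frozenset({"fall", "recession", "bearish", "selloff"})


def sentiment_index(
    tokens: Iterable[str],
    positive: frozenset[str] = POSITIVE,
    negative: frozenset[str] = NEGATIVE,
) -> float:
    """Score one pre-segmented article as (pos - neg) / (pos + neg).

    The result lies in [-1, 1]; an article with no lexicon hits scores 0.0.
    This ratio is an assumed formula, not the one defined in the thesis.
    """
    pos = neg = 0
    for token in tokens:
        if token in positive:
            pos += 1
        elif token in negative:
            neg += 1
    total = pos + neg
    return (pos - neg) / total if total else 0.0


def polarity(index: float, threshold: float = 0.0) -> str:
    """Map an index to a polarity label; |index| <= threshold is neutral."""
    if index > threshold:
        return "positive"
    if index < -threshold:
        return "negative"
    return "neutral"


def daily_sentiment(
    articles_by_day: Mapping[str, Sequence[Sequence[str]]],
) -> dict[str, float]:
    """Average the per-article index for each day (assumed aggregation).

    Days with no articles are skipped. The per-day values could then be
    joined with market and technical indicators as classifier features.
    """
    return {
        day: mean(sentiment_index(doc) for doc in docs)
        for day, docs in articles_by_day.items()
        if docs
    }


if __name__ == "__main__":
    articles = {
        "2016-01-04": [["index", "rise", "growth"], ["bearish", "fall"]],
        "2016-01-05": [["rebound", "rise", "fall"]],
    }
    for day, score in daily_sentiment(articles).items():
        print(day, round(score, 3), polarity(score))
```

A ratio of word counts is only one common choice; weighting words by strength or normalizing by article length are plausible alternatives, and the thesis's own definition of the sentiment index should take precedence.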
Description: | Master's thesis, National Chengchi University, Department of Management Information Systems, 103356022 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0103356022 |
Data Type: | thesis |
Appears in Collections: | [Department of Management Information Systems] Theses
Files in This Item:
File | Size | Format |
602201.pdf | 1320Kb | Adobe PDF |
All items in the NCCU Institutional Repository (政大典藏) are protected by copyright, with all rights reserved.