Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/118732
|
Title: | 運用財經文本PAD情感模型於指數型證券投資信託基金趨勢研究-以台灣中型100基金為例 A Study on the Trend of Exchange Traded Funds by PAD Sentiment Pattern Model in Yuanta Taiwan Mid-Cap 100 ETF |
Authors: | 吳旻諺 |
Contributors: | 姜國輝 季延平 吳旻諺 |
Keywords: | Sentiment analysis; ETF; TensorFlow; Silhouette coefficient; PAD emotional state model |
Date: | 2018 |
Issue Date: | 2018-07-18 11:02:04 (UTC+8) |
Abstract: | ETF assets have grown rapidly in recent years and have become a focus for many investors. Beyond the Yuanta Taiwan Top 50 ETF, many analysts consider the Yuanta Taiwan Mid-Cap 100 ETF to have better growth prospects, and historical data show that its return exceeded that of the Taiwan Top 50 ETF in several years. Because research on the Mid-Cap 100 ETF is also scarce, this study aims to build a price prediction model based on text sentiment analysis that can serve as a reference tool for investors.
Previous text-analysis studies reported that LDA gave the best clustering results and argued that TF-IDF combined with K-means performed poorly because of the sparse document-term matrix. This study implements TF-IDF combined with K-means using the TensorFlow library and compares it with the LDA topic model by silhouette coefficient, finding that TF-IDF combined with K-means is superior in both clustering quality and cluster proportions.
Previous studies of sentiment analysis on financial text labeled sentiment mainly with NTUSD, HowNet, and self-expanded sentiment dictionaries. The choice of dictionary, and any change to it, alters the resulting sentiment scores, and the limited coverage of financial lexicons forces a large amount of manual labeling. This study therefore proposes combining the E-HowNet lexical sense dictionary with the PAD emotional state model to quantify sentiment and greatly reduce manual labeling.
The experimental results show that the sentiment index and the stock price index have similar trends and volatility. In particular, the sentiment index for the stock market information topic behaves as a leading indicator, which helps the price prediction model.
For supervised sentiment analysis, the study compares SVM and kNN. The best classification results come from the SVM model that combines the sentiment index with indirect indicators: the Taiwan Weighted Stock Index, the international crude oil price, and the US dollar exchange rate. The results support the conclusion that financial text analysis can effectively improve prediction of the price trend of the Yuanta Taiwan Mid-Cap 100 ETF. |
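The abstract compares clusterings by silhouette coefficient. As a rough illustration of that comparison only, the sketch below computes TF-IDF vectors for a few toy tokenized documents and scores two candidate cluster assignments. It is written in plain Python rather than TensorFlow, and the documents, labels, and function names (`tfidf`, `mean_silhouette`) are hypothetical and not taken from the thesis.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one TF-IDF vector (list of floats) per tokenized document."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([(tf[t] / len(d)) * math.log(n / df[t]) for t in vocab])
    return vectors


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(vectors, labels):
    """Mean silhouette coefficient over all points; needs >= 2 clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: s(i) is defined as 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(vectors[i], vectors[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(vectors[i], vectors[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy corpus: two fund-related and two oil-related documents.
docs = [
    ["etf", "fund", "dividend"],
    ["etf", "fund", "index"],
    ["oil", "price", "barrel"],
    ["oil", "price", "opec"],
]
vectors = tfidf(docs)

labels_a = [0, 0, 1, 1]  # groups documents by topic
labels_b = [0, 1, 0, 1]  # mixes the two topics

score_a = mean_silhouette(vectors, labels_a)
score_b = mean_silhouette(vectors, labels_b)
print(f"assignment A: {score_a:.3f}")
print(f"assignment B: {score_b:.3f}")
```

A higher mean silhouette indicates tighter, better separated clusters, so here the topic-aligned assignment scores above the mixed one. In practice the two assignments would come from the clustering methods being compared, such as K-means on TF-IDF vectors and the dominant LDA topic of each document.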
Description: | Master's thesis; National Chengchi University; Department of Management Information Systems; 105356032 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0105356032 |
Data Type: | thesis |
DOI: | 10.6814/THE.NCCU.MIS.005.2018.A05 |
Appears in Collections: | [Department of Management Information Systems] Theses
|
Files in This Item:
File | Size | Format |
603201.pdf | 2155Kb | Adobe PDF |
|
All items in 政大典藏 (NCCU Institutional Repository) are protected by copyright, with all rights reserved.
|