National Chengchi University Institutional Repository (NCCUR): Item 140.119/112151


    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/112151


    Title: 應用情感分析於指數型證券投資信託基金趨勢預測之研究
    Research into sentimental analysis to predict exchange-traded fund trend
    Author: 黃泓銘
    Huang, Hung-Ming
    Contributors: 姜國輝
    Chiang, Kuo-Huie
    黃泓銘
    Huang, Hung-Ming
    Keywords: Sentiment analysis
    LDA topic model
    Support vector machine (SVM)
    ETF
    Date: 2017
    Upload time: 2017-08-28 11:24:26 (UTC+8)
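    Abstract: In recent years the ETF market has grown rapidly, and steady economic growth in Asia has been a driving force behind the international ETF market. The Yuanta Taiwan 50 ETF is favored by investors because of its large scale. Prior research indicates that text on the Internet influences public sentiment and, through it, stock price fluctuations. For investors, being able to quickly analyze public investor sentiment from large volumes of online financial text, and to use it to predict stock price movements, should improve returns. However, hundreds of financial texts are produced every day, and manual analysis is time-consuming and labor-intensive. This study therefore uses text mining techniques to propose a price prediction model based on sentiment analysis.
    Prior work on text sentiment analysis has shown that supervised learning can achieve good classification results through simple quantification. To address the limitation that supervised learning cannot anticipate the unknown, this study first uses unsupervised learning to identify the topics of financial texts covering the whole of 2016, computes a sentiment index, and labels each text's sentiment orientation. It then uses supervised learning, combined with Taiwan stock market indicators, international indicators, macroeconomic indicators, and technical indicators, to build a classification model that predicts the price trend of the Yuanta Taiwan 50 ETF.
    In the experiments on topic labeling, the study found that because the number of texts far exceeded the number of topic terms, the TF-IDF matrix was too sparse, so the topic model combining TF-IDF with K-means classified poorly. The LDA topic model, in which all topics are shared by all documents, grouped terms better than TF-IDF with K-means. For sentiment orientation labeling, the study confirmed that its expanded sentiment lexicon judges word polarity better than NTUSD.
    Comparing the accuracy of a classification model that combines the sentiment index with technical indicators against a model that uses technical indicators alone, the study found that the former is 7% more accurate. A classification model that further incorporates indirect sentiment indicators reaches 71% accuracy, confirming that sentiment analysis of financial texts can effectively improve price trend prediction for the Yuanta Taiwan 50 ETF.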
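The abstract attributes the weak result of TF-IDF with K-means to an overly sparse TF-IDF matrix. As a rough illustration only, the following Python sketch builds a tiny TF-IDF matrix and measures the share of zero entries. The toy corpus, the vocabulary, the smoothed IDF formula, and the function names `tfidf_matrix` and `sparsity` are assumptions made for this example and are not taken from the thesis.

```python
import math
from collections import Counter


def tfidf_matrix(docs, vocabulary):
    """Return a dense TF-IDF matrix (one row per document) over `vocabulary`."""
    n_docs = len(docs)
    vocab_set = set(vocabulary)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        for term in set(tokens) & vocab_set:
            df[term] += 1
    rows = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens) or 1
        row = []
        for term in vocabulary:
            tf = counts[term] / total
            # Smoothed IDF (an assumption of this sketch, not the thesis formula).
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            row.append(tf * idf)
        rows.append(row)
    return rows


def sparsity(matrix):
    """Fraction of matrix entries that are exactly zero."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)


# Hypothetical toy corpus: four tokenized "articles" and six topic terms.
docs = [
    ["etf", "rally", "fund"],
    ["rate", "hike", "bank"],
    ["etf", "dividend"],
    ["export", "orders", "bank"],
]
vocabulary = ["etf", "bank", "dividend", "rate", "export", "rally"]

matrix = tfidf_matrix(docs, vocabulary)
print(f"sparsity = {sparsity(matrix):.2f}")  # prints: sparsity = 0.67
```

Each document here touches only two of the six terms, so two thirds of the entries are zero. With a full year of articles the share of zeros would typically be far higher, and distance-based clustering such as K-means has little overlap between rows to work with.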
    Rapid and stable economic growth in Asia has driven fast growth in global ETF assets in recent years. The Yuanta Taiwan Top 50 ETF is favored by investors because of its large market scale. Past research has shown that text documents on the Internet, e.g. news and tweets, strongly affect public emotion, and that public emotion can in turn affect stock prices. For investors, it is important to know how to analyze the latent emotion in text documents in order to predict the stock trend. However, manual analysis cannot keep up with the large volume of financial text documents on the Internet.
    In past sentiment analysis research, supervised methods have been shown to achieve high accuracy, but they are limited in predicting unknown future trends. This research combines supervised and unsupervised methods to handle this large volume of financial text. An unsupervised method first identifies the topic of each document, and a sentiment index is then calculated for each document to determine its sentiment polarity. Afterwards, a supervised method is used to build a prediction model with the sentiment index.
    According to the results, the LDA model performs better than the TF-IDF with K-means model. Moreover, the prediction model that includes the sentiment index has higher accuracy than the one that includes only the technical indicators.
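The abstract does not state how the sentiment index is computed. As a minimal sketch under stated assumptions, a lexicon-based index can be formed from the balance of positive and negative words in a tokenized document. The word lists, the formula (pos - neg) / (pos + neg), the threshold, and the names `sentiment_index` and `polarity` below are hypothetical and chosen only for illustration; the thesis itself uses an expanded lexicon compared against NTUSD.

```python
from collections import Counter

# Hypothetical mini-lexicon for illustration; not the lexicon used in the thesis.
POSITIVE = {"gain", "rally", "growth", "beat"}
NEGATIVE = {"loss", "slump", "risk", "miss"}


def sentiment_index(tokens):
    """Return (pos - neg) / (pos + neg), or 0.0 if no lexicon word occurs."""
    counts = Counter(tokens)
    pos = sum(counts[word] for word in POSITIVE)
    neg = sum(counts[word] for word in NEGATIVE)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


def polarity(index, threshold=0.0):
    """Map an index in [-1, 1] to a coarse polarity label."""
    if index > threshold:
        return "positive"
    if index < -threshold:
        return "negative"
    return "neutral"


article = ["etf", "rally", "gain", "risk"]
index = sentiment_index(article)
print(round(index, 3), polarity(index))  # prints: 0.333 positive
```

An index of this kind could then serve as one input feature, alongside technical indicators, to a supervised classifier such as the SVM named in the keywords.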
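    Description: Master's
    National Chengchi University
    Department of Management Information Systems
    103356022
    Source: http://thesis.lib.nccu.edu.tw/record/#G0103356022
    Data type: thesis
    Appears in collections: [Department of Management Information Systems] Theses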

    Files in this item:

    File          Size      Format       Views
    602201.pdf    1320Kb    Adobe PDF    272


    All items in NCCUR are protected by original copyright.

