National Chengchi University Institutional Repository (NCCUR): Item 140.119/67862


    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/67862


    Title: A Study of the Relevance between News Topics & Public Opinion and Stock Prices in Big Data (巨量資料環境下之新聞主題暨輿情與股價關係之研究)
    Author: Chang, Liang Chieh (張良杰)
    Contributors: 楊建民
    Chang, Liang Chieh (張良杰)
    Keywords: Big data
    Text mining
    News topic detection and tracking
    Link analysis
    Sentiment analysis
    Date: 2013
    Upload time: 2014-07-29 16:03:34 (UTC+8)
    Abstract: In recent years, advances in technology, networks, and storage media have caused the volume of generated data to grow explosively, heralding the era of big data. With big data, we no longer need to rely on traditional sampling to collect data, and analysis is no longer limited by data too sparse to represent the population. Once these traditional limits are removed, the essence of big data lies in finding the valuable information within it.
    Social networking sites (SNS), for example, hold a large amount of public opinion and interpersonal interaction data, and scholars have found that the sentiment expressed on them is positively correlated with stock prices. This thesis applies a similar approach to online news, which shares the characteristics of big data. A web crawler collected 30,879 economic news articles published by the Central News Agency between July 2013 and May 2014. Topic detection and tracking and sentiment analysis were then applied to these articles: based on the similarity between news events, the articles were linked into a network, and the relationship between news sentiment and the stock price index was analyzed.
    The results show that news events can be linked into specific news topics, that different news topics can be identified within a large network, and that linking news topics produces a news topic context. This provides a new way to grasp a huge volume of news content quickly and to trace back news topics and news events effectively.
    Regarding news sentiment and the stock price index, the results show that news sentiment affects fluctuations in the stock price index, with a correlation coefficient of 0.733562. A comparison of news sentiment with the psychological line and the trading willingness indicator shows that news sentiment can serve, to a certain degree, as a reference for judging stock prices.
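    The pipeline outlined in the abstract (link similar news articles into a network, read topics off that network, then correlate sentiment with a stock index) can be illustrated with a minimal sketch. This record does not state the thesis's actual weighting scheme, similarity measure, threshold, or correlation method, so everything below is an assumption made for illustration: TF-IDF vectors, cosine similarity, a hypothetical threshold of 0.2, topics taken as connected components, and Pearson correlation. The function names and the toy articles are invented and are not data from the thesis.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per tokenized article."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / doc_freq[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def link_topics(docs, threshold=0.2):
    """Link article pairs whose similarity reaches the (assumed) threshold
    and return the connected components of that network as topics."""
    vectors = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(docs)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Toy, already-tokenized articles (invented for illustration only).
    articles = [
        ["central", "bank", "rate", "cut"],
        ["bank", "rate", "cut", "market"],
        ["export", "orders", "rise"],
        ["export", "orders", "fall", "tech"],
    ]
    print(link_topics(articles))  # [[0, 1], [2, 3]]
```

    In this sketch, `pearson` would be applied to a daily news sentiment series and the matching daily index series. How the thesis scores sentiment and aligns it with trading days is not described in this record, so that step is left out.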
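    Description: Master's thesis
    National Chengchi University
    Graduate Institute of Management Information Systems
    101356002
    102
    Source: http://thesis.lib.nccu.edu.tw/record/#G0101356002
    Data type: thesis
    Appears in collections: [Department of Management Information Systems] Theses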

    Files in this item:

    File          Size      Format       Views
    600201.pdf    2527Kb    Adobe PDF    2167

