National Chengchi University Institutional Repository (NCCUR): Item 140.119/111879
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/111879


    Title: 建構GDELT數位新聞分析流程於Spark大數據平台:以新聞事件影響力探究美國S&P股市指數變化為例
    Establishing GDELT digital news analytics pipeline on the Spark platform: exploiting news events influences on S&P stock index variations as an example
    Authors: 黃書瑋
    Huang, Shu Wei
    Contributors: 胡毓忠
    Hu, Yuh Jong
    黃書瑋
    Huang, Shu Wei
    Keywords: GDELT project
    Rolling-window machine learning
    Big data analysis pipeline
    News events influences
    AWS (Amazon Web Services)
    Date: 2017
    Issue Date: 2017-08-10 10:18:59 (UTC+8)
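    Abstract: The GDELT project, officially released in 2013, claims to monitor digital news media worldwide in 65 publication languages. It applies mature machine learning algorithms, natural language processing, deep learning, and other advanced artificial intelligence techniques to extract valuable news content and transform it into structured records with 58 fields, which researchers in many domains can then study and apply. This study develops a big data analysis pipeline for the GDELT news event dataset. Using Spark ML Pipeline on the Amazon Web Services (AWS) cloud platform, it applies rolling-window machine learning algorithms to track the US S&P 500 stock index from GDELT data and to analyze the causal influence of one specific event, "Occupy Wall Street". The 45-day rolling-window random forest model achieved a root mean square error (RMSE) of only 43.35 (an error of 2.12%) when tracking and predicting the historical index, and its near-real-time rolling predictions on the cloud system, updated every 15 minutes, had an error below 1.5%. For the causal analysis, the study uses a Bayesian time series model to estimate the counterfactual index the stock market would have followed without "Occupy Wall Street". The results indicate that the event and its subsequent effects raised the S&P 500 index by 116.76 points over the observation period.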
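The rolling-window evaluation described in the abstract can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the author's Spark implementation. A single hypothetical daily feature (for example, an aggregated GDELT event count) stands in for the 58-field records, and ordinary least squares stands in for the random forest the study used. The percentage error is assumed here to be RMSE divided by the mean actual index level, which is one plausible definition that the abstract does not confirm. Each step trains on the previous 45 observations, predicts the next one, and then slides forward by one.

```python
"""Rolling-window forecasting and error metrics (illustrative sketch only).

Assumptions, not taken from the thesis:
- one hypothetical numeric feature per day replaces the GDELT feature set;
- least-squares regression replaces the random forest model;
- relative error = RMSE / mean(actual) * 100.
"""
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # constant feature in this window: predict the mean
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def rolling_forecast(features, targets, window=45):
    """Fit on the previous `window` points, predict the next one, slide by one."""
    preds = []
    for t in range(window, len(targets)):
        a, b = fit_line(features[t - window:t], targets[t - window:t])
        preds.append(a * features[t] + b)
    return preds


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return sqrt(mean((y - p) ** 2 for y, p in zip(actual, predicted)))


def relative_error(actual, predicted):
    """RMSE as a percentage of the mean actual level (assumed definition)."""
    return 100 * rmse(actual, predicted) / mean(actual)


if __name__ == "__main__":
    # Synthetic data: the target is an exact linear function of the feature,
    # so every rolling prediction should match the actual value.
    features = list(range(100))
    targets = [2000 + 3 * x for x in features]
    preds = rolling_forecast(features, targets, window=45)
    print(len(preds), rmse(targets[45:], preds))
```

Predictions start only after the first 45 observations, so they are compared against `targets[45:]`. A Spark deployment would presumably refit an ML pipeline inside each window instead of calling a hand-written regression; the sketch keeps only the windowing logic and the error arithmetic.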
    Description: Master's thesis
    National Chengchi University
    Executive Master Program of Computer Science
    104971002
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0104971002
    Data Type: thesis
    Appears in Collections: [Executive Master Program of Computer Science of NCCU] Theses

    Files in This Item:

    File         Size     Format
    100201.pdf   3606 Kb  Adobe PDF


    All items in the NCCU Institutional Repository are protected by copyright, with all rights reserved.