National Chengchi University Institutional Repository (NCCUR): Item 140.119/118242


    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/118242


    Title: Constructing a Hybrid Model of ARIMA and SVR Algorithm with GDELT Digital News Dataset to Predict U.S. Dollar Index
    Author: Shen, Po-Yu (沈柏宇)
    Contributors: Liao, Szu-Lang (廖四郎)
    Shen, Po-Yu (沈柏宇)
    Keywords: GDELT project
    Hybrid model
    U.S. dollar index
    Date: 2018
    Uploaded: 2018-07-03 17:27:00 (UTC+8)
    Abstract: News is an important source of information for fundamental analysis. This study examines how digital news data can assist or complement the price-forecasting ability of traditional econometric models. It draws on the GDELT Project, a large-scale, publicly available digital news dataset whose rich news sources are converted into structured data through rigorous text mining and natural language processing. After applying the data preprocessing method proposed in this study, the GDELT data serve as features for the big data analysis component of several hybrid models, each combining an ARIMA model with a big data analysis model. These models are used to predict the price behavior of the U.S. Dollar Index, and their performance is compared by mean squared error (MSE).
    Because the data are time series, the study applies a two-layer rolling window analysis. Three periods serve as test sets for evaluating model performance: before the European debt crisis (2009/06/02 to 2009/11/30, 130 daily observations), during the spread of the crisis (2009/12/01 to 2010/12/01, 260 daily observations), and after the crisis (2017/01/02 to 2017/06/30, 130 daily observations). In the periods before and after the crisis, the hybrid models with GDELT features outperform the pure ARIMA regression model; during the spread of the crisis, they do not. The study argues that while a financial crisis is spreading, market prices are more strongly linked to finance-related news, so the GDELT dataset, which lacks finance-related news content, naturally limits model performance in this setting or even makes it worse.
    Given the large data volume, data preprocessing and computation rely on parallel processing on a cluster architecture. The study therefore rents virtual machines on Google Cloud Platform and runs a Spark cluster on them through PySpark to carry out a near-real-time rolling window analysis pipeline.
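The abstract describes hybrids that pair a linear ARIMA component with a learner fed by GDELT-derived features, evaluated by rolling-window MSE. The Python sketch below illustrates that structure under loudly stated assumptions; it is not the thesis's implementation. Assumed: the ARIMA part is fixed to ARIMA(1,1,0) with drift fitted by OLS; the SVR uses a linear kernel trained by subgradient descent (the thesis's kernel and hyperparameters are not given here); the SVR models the ARIMA residuals in the spirit of Pai & Lin (2005) [13]; only one rolling layer is shown; and `feats` is a hypothetical, already-preprocessed daily GDELT feature (for example an average tone score). All function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def fit_ar1_diff(y: Sequence[float]) -> Tuple[float, float]:
    """OLS fit of ARIMA(1,1,0) with drift: d_t = c + phi * d_{t-1} + e_t, where d_t = y_t - y_{t-1}."""
    d = [y[i] - y[i - 1] for i in range(1, len(y))]
    x, z = d[:-1], d[1:]
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    phi = sxz / sxx if sxx > 0 else 0.0
    return mz - phi * mx, phi


def ar1_diff_predict(y: Sequence[float], t: int, c: float, phi: float) -> float:
    """One-step-ahead level forecast of y[t] from y[t-1] and y[t-2]."""
    return y[t - 1] + c + phi * (y[t - 1] - y[t - 2])


def fit_linear_svr(X: List[List[float]], r: List[float], C: float = 10.0,
                   eps: float = 0.01, lr: float = 0.01,
                   epochs: int = 100) -> Tuple[List[float], float]:
    """Linear epsilon-SVR (assumed linear kernel) trained by subgradient descent on
    0.5*||w||^2 + C * mean_i max(0, |r_i - (w.x_i + b)| - eps)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = list(w), 0.0  # gradient of 0.5*||w||^2; the bias b is not regularized
        for xi, ri in zip(X, r):
            err = ri - (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if abs(err) > eps:  # only points outside the epsilon-tube contribute
                s = -C / n if err > 0 else C / n
                gw = [g + s * xj for g, xj in zip(gw, xi)]
                gb += s
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def rolling_mse(y: Sequence[float], feats: Sequence[float], window: int = 60) -> dict:
    """Re-fit both models on the last `window` days and forecast day t, for every t >= window."""
    sq_arima, sq_hybrid = [], []
    for t in range(window, len(y)):
        train = y[t - window:t]
        c, phi = fit_ar1_diff(train)
        # resid[j] is the linear model's in-sample residual for global day t - window + j + 2.
        resid = [train[k] - ar1_diff_predict(train, k, c, phi) for k in range(2, window)]
        # SVR inputs: previous residual and the previous day's news feature (no look-ahead).
        X = [[resid[j - 1], feats[t - window + j + 1]] for j in range(1, len(resid))]
        w, b = fit_linear_svr(X, resid[1:])
        base = ar1_diff_predict(y, t, c, phi)
        x_new = [resid[-1], feats[t - 1]]
        hybrid = base + sum(wj * xj for wj, xj in zip(w, x_new)) + b
        sq_arima.append((y[t] - base) ** 2)
        sq_hybrid.append((y[t] - hybrid) ** 2)
    n = len(sq_arima)
    return {"arima": sum(sq_arima) / n, "hybrid": sum(sq_hybrid) / n, "n": n}


if __name__ == "__main__":
    random.seed(0)
    n_days = 160
    # Hypothetical stand-in for a preprocessed GDELT daily feature.
    feats = [random.gauss(0.0, 1.0) for _ in range(n_days)]
    # Synthetic index: a random walk whose daily change partly depends on yesterday's feature.
    y = [100.0]
    for t in range(1, n_days):
        y.append(y[-1] + 0.05 * feats[t - 1] + random.gauss(0.0, 0.05))
    print(rolling_mse(y, feats, window=60))
```

Scoring both forecasts on the same windows mirrors the MSE comparison described in the abstract. On real data the hybrid is not guaranteed to win, as the results for the crisis period show; a second, inner rolling layer (for example for choosing `C` or `eps`) would be needed to approximate the two-layer scheme.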
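    References: [1] 黃書瑋 (2017). Building a GDELT digital news analysis pipeline on the Spark big data platform: Exploring changes in the U.S. S&P stock index through the influence of news events (in Chinese). Master's thesis, In-service Master's Program, Department of Computer Science, National Chengchi University.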
    [2] Bergmeir, C., Hyndman, R. J., & Koo, B. (2018). A note on the validity of cross-validation for evaluating autoregressive time series prediction. Computational Statistics & Data Analysis, 120, 70-83.
    [3] Caporale, G. M., Spagnolo, F., & Spagnolo, N. (2017). Macro news and exchange rates in the BRICS. Finance Research Letters, 21, 140-143.
    [4] Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning (Springer Series in Statistics). New York: Springer.
    [5] Gidofalvi, G., & Elkan, C. (2001). Using news articles to predict stock price movements. Department of Computer Science and Engineering, University of California, San Diego.
    [6] James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning. New York: Springer.
    [7] Karau, H., Konwinski, A., Wendell, P., & Zaharia, M. (2015). Learning Spark: Lightning-fast big data analysis. O'Reilly Media, Inc.
    [8] Kohavi, R. (1995, August). A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI (Vol. 14, No. 2, pp. 1137-1145).
    [9] Lizardo, R. A., & Mollick, A. V. (2010). Oil price fluctuations and US dollar exchange rates. Energy Economics, 32(2), 399-408.
    [10] Loretan, M. (2005). Indexes of the foreign exchange value of the dollar. Federal Reserve Bulletin, 91, 1.
    [11] Mishra, S. (2017). Studying geo-conflict and cooperation over time using media reports: A case study using temporal geographical maps.
    [12] Mitchell, T. M. (1997). Machine learning. WCB.
    [13] Pai, P. F., & Lin, C. S. (2005). A hybrid ARIMA and support vector machines model in stock price forecasting. Omega, 33(6), 497-505.
    [14] Schrodt, P. (2012). Conflict and Mediation Event Observations event and actor codebook V. 1.1 b3.
    [15] Smola, A. J., & Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing, 14(3), 199-222.
    [16] Tanenbaum, A. S., & Van Steen, M. (2017). Distributed systems (3rd ed.). Pearson Education, Inc.
    [17] Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3), 1139-1168.
    [18] Wu, G. G. R., Hou, T. C. T., & Lin, J. L. (2018). Can economic news predict Taiwan stock market returns? Asia Pacific Management Review.
    [19] Yoshioka, M., Allan, M. J. J., & Kando, N. (2018). Visualizing Polarity-based Stances of News Websites.
    [20] Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50, 159-175.
    [21] Zhang, G., & Hu, M. Y. (1998). Neural network forecasting of the British pound/US dollar exchange rate. Omega, 26(4), 495-506.
    [22] Zhu, B., & Wei, Y. (2013). Carbon price forecasting with a novel hybrid ARIMA and least squares support vector machines methodology. Omega, 41(3), 517-524.
    Description: Master's thesis
    National Chengchi University
    Department of Money and Banking
    105352034
    Source: http://thesis.lib.nccu.edu.tw/record/#G0105352034
    Type: thesis
    DOI: 10.6814/THE.NCCU.MB.003.2018.F06
    Appears in collections: [Department of Money and Banking] Theses



    All items in the NCCU Institutional Repository are protected by their original copyright.

