National Chengchi University Institutional Repository (NCCUR): Item 140.119/128992


Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/128992


Title: 應用PPO深度強化學習演算法於投資組合之資產配置優化
Applying Deep Reinforcement Learning Algorithm PPO for Portfolio Optimization
Author: 林上人 (Lin, Shang-Jen)
Contributors: 胡毓忠 (Hu, Yuh-Jong)
林上人 (Lin, Shang-Jen)
Keywords: Deep Reinforcement Learning
Proximal Policy Optimization
Portfolio Management
Asset Allocation
Robo-Advisor
Date: 2020
Date uploaded: 2020-03-02 11:38:14 (UTC+8)
Abstract: This study combines deep reinforcement learning (DRL) with financial technology to examine how much DRL can contribute to asset allocation. The aim is a model that can both judge and learn how to optimize an allocation: reinforcement learning supplies the learning process, while deep-learning feature extraction strengthens the judgment. The PPO deep reinforcement learning algorithm is combined with a GRU recurrent neural network and applied to data from the Refinitiv database. The end goal is to combine data, judgment, and learning into a smart financial software agent that uses experience and historical data to decide whether to invest and how to allocate assets, and thereby to verify whether PPO can allocate assets effectively and increase the total asset value.
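As a rough illustration of the kind of architecture the abstract describes, the sketch below (Python/PyTorch, with all names and dimensions invented for this example rather than taken from the thesis) shows a GRU that reads a multi-day window of per-asset features and emits portfolio weights through a softmax; in the thesis such a network would serve as the feature extractor inside the PPO policy.

```python
# Minimal sketch, not the thesis code: a GRU feature extractor mapping a
# window of per-asset price features to portfolio weights. All sizes
# (n_assets, window length, hidden size) are illustrative assumptions.
import torch
import torch.nn as nn

class GruAllocationPolicy(nn.Module):
    def __init__(self, n_assets: int, n_features: int,
                 hidden_size: int = 64, num_layers: int = 1):
        super().__init__()
        # Each time step carries n_assets * n_features observations
        # (e.g. close price and volume per asset).
        self.gru = nn.GRU(input_size=n_assets * n_features,
                          hidden_size=hidden_size,
                          num_layers=num_layers,
                          batch_first=True)
        # One logit per asset plus one for cash; softmax turns the logits
        # into non-negative weights that sum to 1.
        self.head = nn.Linear(hidden_size, n_assets + 1)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs shape: (batch, window, n_assets * n_features)
        _, h = self.gru(obs)        # h: (num_layers, batch, hidden)
        logits = self.head(h[-1])   # final hidden state of the last layer
        return torch.softmax(logits, dim=-1)

if __name__ == "__main__":
    policy = GruAllocationPolicy(n_assets=10, n_features=4)
    window = torch.randn(2, 7, 10 * 4)   # batch of 2 seven-day windows
    weights = policy(window)
    print(weights.shape, weights.sum(dim=-1))  # (2, 11); each row sums to 1
```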

When daily trading was compared with trading every 30 days, daily trading incurred so much commission that its return fell far below that of the 30-day schedule, so trading was fixed at once every 30 days. The study then varied the number of GRU layers and the number of days composing each observation, training the model on stock data from 2006 to 2016 and testing it on data from 2017 to 2018. Under these experimental settings, the commission affected the reward too little for the agent to learn an investment strategy that deliberately reduces trading costs, and the initial capital usually left the agent with too little money allocated per asset to buy a full lot of a high-priced stock, so changes in holdings concentrated in low-priced stocks.

The most stable and best-performing configuration traded once every 30 days, used PPO alone, and composed each observation from 7 days of data; this agent achieved a 7.39% annualized return on the test period.
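The two headline numbers, the 30-day trading interval and the 7.39% annualized test return, can be related with a short calculation. The sketch below shows the geometric annualization formula and, using an assumed flat commission rate and turnover (values not taken from the thesis), why daily rebalancing loses far more to fees than rebalancing every 30 days.

```python
# Illustrative sketch with assumed numbers, not results from the thesis.

def annualized_return(total_return: float, years: float) -> float:
    """Geometric annualization: (1 + R_total)**(1/years) - 1."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0

def fee_drag(trades_per_year: int, turnover_per_trade: float,
             commission_rate: float, years: float) -> float:
    """Fraction of portfolio value lost to commissions, compounded per trade."""
    per_trade_cost = turnover_per_trade * commission_rate
    n_trades = int(trades_per_year * years)
    return 1.0 - (1.0 - per_trade_cost) ** n_trades

if __name__ == "__main__":
    # A 7.39% annualized return over the 2-year 2017-2018 test period
    # corresponds to roughly a 15.3% total return.
    total = (1.0 + 0.0739) ** 2 - 1.0
    print(f"total return over 2 years: {total:.3f}")
    print(f"re-annualized: {annualized_return(total, 2):.4f}")

    # With an assumed 0.1425% commission on traded value and 30% turnover
    # per rebalance, daily trading loses far more to fees than trading
    # every 30 days over the same 2-year horizon.
    print(f"daily  fee drag: {fee_drag(252, 0.30, 0.001425, 2):.3%}")
    print(f"30-day fee drag: {fee_drag(252 // 30, 0.30, 0.001425, 2):.3%}")
```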
Description: Master's thesis
National Chengchi University
資訊科學系碩士在職專班 (In-service Master's Program, Department of Computer Science)
106971001
Source: http://thesis.lib.nccu.edu.tw/record/#G0106971001
Type: thesis
DOI: 10.6814/NCCU202000267
Appears in Collections: [資訊科學系碩士在職專班] Theses

Files in This Item: 100101.pdf (2627 KB, Adobe PDF)


