National Chengchi University Institutional Repository (NCCUR): Item 140.119/137167
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/137167


Title: 應用深度強化學習演算法於資產配置優化之比較
(Comparison of Deep Reinforcement Learning Algorithms for Optimizing Portfolio Management)
Authors: Huang, Mu-Tien (黃牧天)
Contributors: Hu, Yuh-Jong (胡毓忠); Huang, Mu-Tien (黃牧天)
Keywords: Financial Engineering; Deep Learning; Reinforcement Learning; Deep Reinforcement Learning
    Date: 2021
    Issue Date: 2021-09-02 18:17:58 (UTC+8)
Abstract: This thesis addresses three propositions. First, does applying deep reinforcement learning (DRL) to asset allocation require background knowledge of financial time series and statistics? The results show that using features that satisfy the Markov property assumed by DRL models makes the models markedly more effective. Second, how do different DRL algorithms compare under different market conditions? The results show that different DRL models strike different bias-variance trade-offs, which in practice correspond to the balance asset managers must find between performance and model stability for their customers. Third, how does DRL perform against modern portfolio theory, and does it have practical value? A detailed comparison shows that DRL significantly outperforms the mean-variance optimization (MVO) model, and its value is further argued from the perspective of customer experience. Together, these three comparisons aim to deliver objective, fair, and meaningful results and thereby improve the effectiveness of DRL applied to asset allocation.
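To make the MVO baseline from the third proposition concrete, here is a minimal sketch of a long-only mean-variance (Markowitz) optimization of the kind DRL is compared against. The synthetic return data, the risk-aversion value, and the SLSQP solver are illustrative assumptions, not the thesis's configuration.

```python
# Minimal long-only mean-variance (Markowitz) baseline.
# Data and parameters are illustrative assumptions, not the thesis's setup.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
returns = rng.normal(0.0005, 0.01, size=(250, 4))  # hypothetical daily returns, 4 assets

mu = returns.mean(axis=0)              # estimated expected returns
sigma = np.cov(returns, rowvar=False)  # estimated covariance matrix
risk_aversion = 5.0                    # trade-off parameter (assumed)

def neg_utility(w):
    # Mean-variance utility mu'w - (lambda/2) w'Sigma w; minimize its negative.
    return -(mu @ w - 0.5 * risk_aversion * w @ sigma @ w)

n = len(mu)
result = minimize(
    neg_utility,
    x0=np.full(n, 1.0 / n),                                        # start at equal weights
    bounds=[(0.0, 1.0)] * n,                                       # long-only
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # fully invested
    method="SLSQP",
)
print("MVO weights:", np.round(result.x, 4))
```

The first proposition turns on whether the agent's state is Markov, that is, whether it summarizes everything needed for the next decision. The dependency-free toy environment below makes that assumption explicit by defining the state as a trailing window of returns; the window length, reward, and interface are hypothetical, not the environment used in the thesis.

```python
# Toy portfolio environment illustrating the Markov-state assumption.
# Illustrative only; not the thesis's environment or reward design.
import numpy as np

class ToyPortfolioEnv:
    def __init__(self, returns, window=20):
        self.returns = returns  # (T, n_assets) array of per-period simple returns
        self.window = window
        self.t = window

    def reset(self):
        self.t = self.window
        return self.returns[self.t - self.window : self.t]  # initial Markov state

    def step(self, weights):
        weights = np.clip(weights, 0.0, None)
        weights = weights / (weights.sum() + 1e-12)          # valid long-only allocation
        reward = float(np.log1p(self.returns[self.t] @ weights))  # portfolio log return
        self.t += 1
        done = self.t >= len(self.returns)
        state = None if done else self.returns[self.t - self.window : self.t]
        return state, reward, done
```

A DRL agent would learn a policy mapping this windowed state to portfolio weights; which algorithms the thesis actually compares, and on what data, is detailed in the full text at the Source URI below.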
Description: Master's thesis
National Chengchi University
Executive Master Program of Computer Science
108971007
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0108971007
    Data Type: thesis
    DOI: 10.6814/NCCU202101194
Appears in Collections: [Executive Master Program of Computer Science of NCCU] Theses

    Files in This Item:

File: 100701.pdf
Size: 2281 KB
Format: Adobe PDF


All items in NCCUR are protected by copyright, with all rights reserved.

