National Chengchi University Institutional Repository (NCCUR): Item 140.119/128994

Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/128994

Title:	應用TD3深度強化學習演算法進行資產優化管理配置 Applying DRL TD3 Algorithm for Portfolio Management Optimization
Authors:	吳宇翔 Wu, Yu-Hsiang
Contributors:	胡毓忠 Hu, Yuh-Jong; 吳宇翔 Wu, Yu-Hsiang
Keywords:	LSTM; DRL; TD3; Portfolio management
Date:	2020
Issue Date:	2020-03-02 11:38:39 (UTC+8)
Abstract:	Deep reinforcement learning (DRL), a branch of AI, learns by interacting continuously with its environment and learning from its mistakes so as to maximize the reward of each decision step. It is commonly used for decision optimization; AlphaGo, the best-known recent example, is the most representative instance of reinforcement learning. DRL is well suited to modeling sequential decision tasks, and to verify this property this study applies the concept to portfolio management optimization. The study focuses on the investment decision process in optimizing financial asset allocation. It implements the Twin Delayed DDPG (TD3) deep reinforcement learning algorithm and a variant (TD3+LSTM) to find the optimal allocation weights, with the aim of maximizing investment returns, and examines whether TD3 is suitable for optimizing dynamic portfolio management strategies. The investment universe is the constituent stocks of Taiwan's 0050 ETF. Across several experiments, the resulting performance is better than that of the Buy and Hold strategy and the Systematic Investment Plan (regular fixed-amount investing) strategy.
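The abstract names TD3 without describing its update, so the following is only a minimal sketch of the two target-side ingredients of TD3 as published by Fujimoto et al. (clipped double-Q targets and target policy smoothing), written with NumPy. It is not the thesis's implementation: the function names, the default hyperparameters, and in particular the softmax mapping from an action vector to portfolio weights are assumptions made for illustration. The third TD3 ingredient, updating the actor less often than the critics, is a training-loop detail and is not shown.

```python
import numpy as np


def smoothed_target_action(target_action, rng, noise_std=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0):
    """Target policy smoothing: add clipped Gaussian noise to the target
    actor's action, then clip the result to the action bounds."""
    target_action = np.asarray(target_action, dtype=float)
    noise = rng.normal(0.0, noise_std, size=target_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    return np.clip(target_action + noise, low, high)


def td3_target(reward, done, next_q1, next_q2, gamma=0.99):
    """Clipped double-Q target: y = r + gamma * (1 - done) * min(Q1', Q2').

    next_q1 and next_q2 are the two target critics' values at the next state
    and the smoothed target action.
    """
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    min_q = np.minimum(np.asarray(next_q1, dtype=float),
                       np.asarray(next_q2, dtype=float))
    return reward + gamma * (1.0 - done) * min_q


def to_portfolio_weights(action):
    """Hypothetical mapping (an assumption, not from the thesis): softmax the
    raw action into long-only weights that sum to 1."""
    action = np.asarray(action, dtype=float)
    z = np.exp(action - action.max())  # subtract the max for numerical stability
    return z / z.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    action = smoothed_target_action([0.3, -0.1, 0.8], rng)
    print("weights:", to_portfolio_weights(action))
    print("targets:", td3_target([1.0, 2.0], [0, 1], [10.0, 5.0], [8.0, 7.0], gamma=0.5))
```

Taking the minimum of the two critics counters the value overestimation that a single critic tends to accumulate, and the clipped noise keeps the target from exploiting sharp peaks in the critic.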
Description:	Master's thesis, National Chengchi University, In-service Master's Program, Department of Computer Science, 106971009
Source URI:	http://thesis.lib.nccu.edu.tw/record/#G0106971009
Data Type:	thesis
DOI:	10.6814/NCCU202000257
Appears in Collections:	[In-service Master's Program, Department of Computer Science] Theses

Files in This Item:

File	Size	Format
100901.pdf	5706Kb	Adobe PDF

All items in 政大典藏 are protected by copyright, with all rights reserved.

Copyright Announcement

1. The digital content of this website is part of the National Chengchi University Institutional Repository. It is provided free of charge for non-commercial uses such as academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

2. Every effort has been made in building this website to avoid infringing the rights of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.