Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/131629
Title: | Visualization Analysis for Deep Reinforcement Learning – A Case Study of Side-scrolling Video Game (深度強化學習的視覺化分析—以橫向卷軸遊戲為例)
Authors: | Cheng, Hsu-Chen (鄭緒辰)
Contributors: | Chi, Ming-Te (紀明德); Cheng, Hsu-Chen (鄭緒辰)
Keywords: | Visual Analytics; Deep Reinforcement Learning; Side-scrolling Game
Date: | 2020 |
Issue Date: | 2020-09-02 12:15:02 (UTC+8) |
Abstract: | In recent years, deep reinforcement learning has become an essential topic in artificial intelligence (AI): it is widely used to train agents to cope with different computer game environments. Most deep reinforcement learning research focuses on the Atari 2600 series and other simple, well-defined game environments, which make it convenient for researchers to observe and analyze AI behavior. This research focuses on side-scrolling game environments, in which the player can only see a limited part of the scene as the character moves, testing the AI's immediate reactions and accumulated experience. We use the simpler Flappy Bird and the more complicated Super Mario Bros as testbeds. We aim to address the following problems. First, identify the AI's tendencies and limitations. Second, analyze the AI's action selection and play strategy. Third, understand the AI's learning process by comparing models trained for different amounts of time. Fourth, verify whether the AI has learned the importance of direction and distance. To address these problems, we first apply the A3C deep reinforcement learning architecture and adjust the environment and reward mechanism to enhance the AI's flexibility and adaptability in the game. Next, we collect game history and training data. Finally, we design a visual analysis workflow and tools that improve researchers' interpretation of the model's performance and lower the barrier to improving deep learning models.
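The abstract mentions adjusting the environment and reward mechanism before A3C training but does not include the thesis's actual code. The following is a minimal, hypothetical sketch of what such reward shaping for a side-scrolling environment like Super Mario Bros might look like, assuming the gym-super-mario-bros and nes_py packages and the older 4-tuple Gym step API; the wrapper name, shaping constants, and terminal bonus values are illustrative assumptions, not the author's implementation.

```python
# Hypothetical sketch (not the thesis code): reward shaping for a side-scrolling
# environment, assuming gym-super-mario-bros, nes_py, and the 4-tuple Gym step API.
import gym
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from nes_py.wrappers import JoypadSpace


class ProgressReward(gym.Wrapper):
    """Rewards horizontal progress and adds a terminal bonus for clearing the level."""

    def __init__(self, env):
        super().__init__(env)
        self._last_x = 0

    def reset(self, **kwargs):
        self._last_x = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        state, reward, done, info = self.env.step(action)
        # Reward moving right; 'x_pos' is the horizontal position reported by the emulator.
        shaped = (info["x_pos"] - self._last_x) / 10.0
        self._last_x = info["x_pos"]
        if done:
            # Illustrative terminal bonus/penalty; the exact values are assumptions.
            shaped += 50.0 if info.get("flag_get", False) else -50.0
        return state, reward + shaped, done, info


if __name__ == "__main__":
    env = JoypadSpace(gym_super_mario_bros.make("SuperMarioBros-1-1-v0"), SIMPLE_MOVEMENT)
    env = ProgressReward(env)
    state = env.reset()
    done = False
    while not done:
        # Random policy placeholder; an A3C worker would instead sample from its policy network.
        state, reward, done, info = env.step(env.action_space.sample())
    env.close()
```

In an A3C setup, each asynchronous worker would wrap its own environment instance this way, and the per-step states, actions, and shaped rewards could also be logged to build the game-history data used for the visual analysis.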
Description: | Master's thesis | National Chengchi University | Department of Computer Science | 106753016
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0106753016 |
Data Type: | thesis |
DOI: | 10.6814/NCCU202001675 |
Appears in Collections: | [Department of Computer Science] Theses
Files in This Item: | 301601.pdf | 4039Kb | Adobe PDF