National Chengchi University Institutional Repository (NCCUR): Item 140.119/131629


    Please use this persistent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/131629


    Title: Visualization Analysis for Deep Reinforcement Learning – A Case Study of Side-scrolling Video Game
    Author: Cheng, Hsu-Chen (鄭緒辰)
    Contributors: Chi, Ming-Te (紀明德)
    Cheng, Hsu-Chen (鄭緒辰)
    Keywords: Visual Analytics
    Deep Reinforcement Learning
    Side-scrolling Game
    Date: 2020
    Uploaded: 2020-09-02 12:15:02 (UTC+8)
    Abstract: In recent years, deep reinforcement learning has become a widely used approach in artificial intelligence (AI) for training agents to respond to different computer game environments. Most deep reinforcement learning research uses the Atari 2600 series and other simple, well-defined game environments, which make it convenient for researchers to observe and analyze AI behavior. This research focuses on side-scrolling games, in which the player sees only a limited portion of the scene as the character moves, testing the AI's immediate reactions and experience. We apply deep reinforcement learning and visual analysis to the relatively simple Flappy Bird and the more complex Super Mario Bros, aiming to answer four questions. First, what are the tendencies and limitations of the AI? Second, how does the AI select actions, and what play strategies does it adopt? Third, how does the AI learn, as revealed by comparing models trained for different lengths of time? Fourth, has the AI learned the importance of direction and distance? To address these questions, we first adopt the A3C deep reinforcement learning architecture and adjust the environment and reward mechanism to make the AI's gameplay more flexible and adaptable. Next, we collect gameplay histories and training data. Finally, we design a visual analysis workflow and tools that help researchers interpret the model's performance and lower the barrier to improving deep learning models.
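    The abstract states that the work builds on the A3C (asynchronous advantage actor-critic) architecture but does not describe how its training loss is formed. The following is a minimal, hedged sketch of the standard per-rollout A3C quantities, not the author's code: n-step discounted returns bootstrapped from the critic, the advantage R_t - V(s_t), and the policy, value, and entropy terms. It assumes plain Python lists in place of network tensors; the function names, the 0.5 weight on the value loss, and the defaults gamma = 0.99 and entropy_coef = 0.01 are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative sketch only: names and default coefficients are assumptions,
# not taken from the thesis.


def n_step_returns(rewards: Sequence[float], bootstrap_value: float,
                   gamma: float = 0.99) -> List[float]:
    """Discounted returns R_t = r_t + gamma * R_{t+1}.

    bootstrap_value is the critic's estimate V(s_T) for the state after the
    last step of the rollout; pass 0.0 if the episode terminated there.
    """
    returns: List[float] = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def a3c_losses(log_probs_taken: Sequence[float],
               values: Sequence[float],
               policies: Sequence[Sequence[float]],
               rewards: Sequence[float],
               bootstrap_value: float,
               gamma: float = 0.99,
               entropy_coef: float = 0.01) -> Tuple[float, float, float, float]:
    """Scalar A3C loss terms for one worker rollout.

    log_probs_taken[t]: log pi(a_t | s_t) of the action actually taken
    values[t]:          critic estimate V(s_t)
    policies[t]:        full action distribution pi(. | s_t), for the entropy
    Returns (policy_loss, value_loss, entropy, total_loss).
    """
    returns = n_step_returns(rewards, bootstrap_value, gamma)
    policy_loss = value_loss = entropy = 0.0
    for logp, v, pi, ret in zip(log_probs_taken, values, policies, returns):
        advantage = ret - v  # held constant when differentiating the policy term
        policy_loss -= logp * advantage
        value_loss += 0.5 * advantage ** 2
        entropy -= sum(p * math.log(p) for p in pi if p > 0.0)
    # Subtracting the entropy bonus favors less deterministic policies (exploration).
    total_loss = policy_loss + value_loss - entropy_coef * entropy
    return policy_loss, value_loss, entropy, total_loss
```

    For example, n_step_returns([1.0, 0.0, 1.0], 0.0, gamma=0.5) returns [1.25, 0.5, 1.0]. In a real implementation the same expressions would be built from tensors so that gradients flow back into the actor-critic network, and each asynchronous worker would apply its gradients to the shared global parameters.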
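    Description: Master's thesis
    National Chengchi University
    Department of Computer Science
    106753016
    Source: http://thesis.lib.nccu.edu.tw/record/#G0106753016
    Type: thesis
    DOI: 10.6814/NCCU202001675
    Appears in Collections: [Department of Computer Science] Theses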

    Files in this item:

    301601.pdf, 4039 KB, Adobe PDF


    All items in the NCCU Institutional Repository are protected by the original copyright.

    Copyright Announcement
    1. The digital content of this website is part of the National Chengchi University Institutional Repository. It is provided free of charge for non-commercial uses such as academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2. The NCCU Institutional Repository has made every effort to avoid infringing the rights of copyright owners. If you believe that any material on this website infringes copyright, please contact our staff (nccur@nccu.edu.tw); we will promptly remove the work from the repository and investigate your claim.