政大機構典藏 - National Chengchi University Institutional Repository (NCCUR): Item 140.119/125637
Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/125637


    Title: 消除深度學習目標函數中局部極小值之研究
    A Survey on Eliminating Local Minima of the Objective Function in Deep Learning
    Authors: 季佳琪
    Chi, Chia-Chi
    Contributors: 蔡炎龍
    Tsai, Yen-Lung
    季佳琪
    Chi, Chia-Chi
    Keywords: Deep Learning
    Neural Network
    Objective Function
    Loss Function
    Local Minima
    Date: 2019
    Issue Date: 2019-09-05 16:13:48 (UTC+8)
    Abstract: In this paper, we survey methods and theorems for eliminating suboptimal local minima of the objective function in deep learning. More specifically, we find that, given an original neural network, we can construct a modified network by adding external layers to it. If the objective function of the modified network achieves a local minimum, then the objective function of the original network attains a global minimum. We first review related literature, give an overview of deep learning, and prove the convexity of common loss functions so that the assumptions of the theorems are satisfied. Next, we prove some details of these theorems, discuss the effects of the method, and investigate its limitations. Finally, we perform a series of experiments to show that the method can be used in practical work.
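    The construction summarized in the abstract can be sketched numerically. The snippet below is a minimal illustration in the spirit of the elimination result surveyed in the thesis (cf. Kawaguchi and Kaelbling [17]): the original network's output f(x; θ) is augmented with an auxiliary unit a·exp(b), and a regularization term (λ/2)a² is added to the loss, so that at a local minimum of the modified objective one has a = 0 and θ attains a global minimum of the original objective. The toy linear "network", the hyperparameters, and all names here are illustrative assumptions, not taken from the thesis itself.

    ```python
    import numpy as np

    def original_output(theta, X):
        # Toy one-layer "network": a linear model standing in for f(x; theta).
        return X @ theta

    def original_loss(theta, X, y):
        # Convex per-example loss (squared error), matching the convexity
        # assumptions the thesis verifies for common loss functions.
        return 0.5 * np.mean((original_output(theta, X) - y) ** 2)

    def modified_loss(theta, a, b, X, y, lam=1.0):
        # Modified network: add the auxiliary unit a * exp(b) to every output,
        # plus the regularizer (lam / 2) * a**2 on the auxiliary weight.
        preds = original_output(theta, X) + a * np.exp(b)
        return 0.5 * np.mean((preds - y) ** 2) + 0.5 * lam * a ** 2

    # Tiny realizable regression problem (illustrative data, not from the thesis).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    theta_true = np.array([1.0, -2.0, 0.5])
    y = X @ theta_true

    theta = np.zeros(3)
    a, b = 0.1, 0.0
    lam, lr, n = 1.0, 0.1, X.shape[0]
    for _ in range(2000):
        # Gradient descent on the modified objective with analytic gradients.
        r = X @ theta + a * np.exp(b) - y          # residuals of modified network
        g_theta = X.T @ r / n
        g_a = np.mean(r) * np.exp(b) + lam * a
        g_b = np.mean(r) * a * np.exp(b)
        theta -= lr * g_theta
        a -= lr * g_a
        b -= lr * g_b
    ```

    On this toy problem, gradient descent on the modified objective drives the auxiliary weight a toward 0 while θ approaches a global minimizer of the original squared-error objective, which is the qualitative behavior the surveyed theorems guarantee at local minima of the modified network.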
    Reference: [1] Alexandr Andoni, Rina Panigrahy, Gregory Valiant, and Li Zhang. Learning polynomials with neural networks. In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32, ICML’14, pages II–1908–II–1916. JMLR.org, 2014.
    [2] Avrim L. Blum and Ronald L. Rivest. Training a 3-node neural network is NP-complete. Neural Networks, 5(1):117–127, 1992.
    [3] Alon Brutzkus and Amir Globerson. Globally optimal gradient descent for a ConvNet with Gaussian inputs. February 2017.
    [4] Kyunghyun Cho, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder–decoder approaches. In Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pages 103–111, Doha, Qatar, October 2014. Association for Computational Linguistics.
    [5] Anna Choromanska, Mikael Henaff, Michael Mathieu, Gerard Ben Arous, and Yann LeCun. The loss surfaces of multilayer networks. Journal of Machine Learning Research, 38:192–204, 2015.
    [6] Simon S. Du and Jason D. Lee. On the power of over-parametrization in neural networks with quadratic activation. CoRR, abs/1803.01206, 2018.
    [7] Jeffrey L. Elman. Finding structure in time. Cognitive Science, 14(2):179–211, 1990.
    [8] Rong Ge, Jason D. Lee, and Tengyu Ma. Learning one-hidden-layer neural networks with landscape design. CoRR, abs/1711.00501, 2017.
    [9] Surbhi Goel and Adam R. Klivans. Learning depth-three neural networks in polynomial time. CoRR, abs/1709.06010, 2017.
    [10] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.
    [11] Alex Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.
    [12] Moritz Hardt and Tengyu Ma. Identity matters in deep learning. CoRR, abs/1611.04231, 2017.
    [13] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
    [14] Kenji Kawaguchi. Deep learning without poor local minima. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 586–594. Curran Associates, Inc., 2016.
    [15] Kenji Kawaguchi and Yoshua Bengio. Depth with nonlinearity creates no bad local minima in ResNets. arXiv preprint arXiv:1810.09038, 2018.
    [16] Kenji Kawaguchi, Jiaoyang Huang, and Leslie Pack Kaelbling. Effect of depth and width on local minima in deep learning. CoRR, abs/1811.08150, 2018.
    [17] Kenji Kawaguchi and Leslie Pack Kaelbling. Elimination of all bad local minima in deep learning. CoRR, abs/1901.00279, 2019.
    [18] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
    [19] Solomon Kullback and Richard A Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86, 1951.
    [20] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.
    [21] Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
    [22] Yuanzhi Li and Yang Yuan. Convergence analysis of two-layer neural networks with relu activation. CoRR, abs/1705.09886, 2017.
    [23] Shiyu Liang, Ruoyu Sun, Jason D. Lee, and Rayadurgam Srikant. Adding one neuron can eliminate all bad local minima. Advances in Neural Information Processing Systems, 2018-December:4350–4360, 1 2018.
    [24] Katta G. Murty and Santosh N. Kabadi. Some NP-complete problems in quadratic and nonlinear programming. Mathematical Programming, 39(2):117–129, June 1987.
    [25] Quynh Nguyen and Matthias Hein. Optimization landscape and expressivity of deep CNNs. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 3730–3739, Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018. PMLR.
    [26] Quynh N. Nguyen and Matthias Hein. The loss surface of deep and wide neural networks. CoRR, abs/1704.08045, 2017.
    [27] Sebastian Ruder. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747, 2016.
    [28] Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.
    [29] Hanie Sedghi and Anima Anandkumar. Provable methods for training neural networks with sparse connectivity. arXiv preprint arXiv:1412.2693, 2014.
    [30] Ohad Shamir. Are resnets provably better than linear predictors? CoRR, abs/1804.06739, 2018.
    [31] Claude Elwood Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.
    [32] Mahdi Soltanolkotabi. Learning relus via gradient descent. CoRR, abs/1705.04591, 2017.
    [33] Daniel Soudry and Elad Hoffer. Exponentially vanishing sub-optimal local minima in multilayer neural networks, 2018.
    [34] Paul J Werbos et al. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550–1560, 1990.
    [35] Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, and Inderjit S. Dhillon. Recovery guarantees for one-hidden-layer neural networks. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 4140–4149, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.
    Description: Master's thesis
    National Chengchi University
    Department of Applied Mathematics
    1057510163
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G1057510163
    Data Type: thesis
    DOI: 10.6814/NCCU201900936
    Appears in Collections:[Department of Mathematical Sciences] Theses

    Files in This Item:

    File: 016301.pdf | Size: 5062 KB | Format: Adobe PDF | 2340 | View/Open

