Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/125637
Title: | 消除深度學習目標函數中局部極小值之研究 A Survey on Eliminating Local Minima of the Objective Function in Deep Learning |
Authors: | 季佳琪 Chi, Chia-Chi |
Contributors: | 蔡炎龍 Tsai, Yen-Lung 季佳琪 Chi, Chia-Chi |
Keywords: | Deep Learning; Neural Network; Objective Function; Loss Function; Local Minima |
Date: | 2019 |
Issue Date: | 2019-09-05 16:13:48 (UTC+8) |
Abstract: | In this thesis, we survey methods and theorems for eliminating suboptimal local minima of the objective function in deep learning. More specifically, given an original neural network, we can construct a modified network by adding auxiliary layers to it. Under this construction, if the objective function of the modified network attains a local minimum, then the objective function of the original network attains a global minimum. We first review related literature, give an overview of deep learning, and prove the convexity of common loss functions so as to satisfy the assumptions of the theorems. Next, we prove some details of these theorems, discuss the effects of the method, and investigate its limitations. Finally, we perform a series of experiments to show that the method can be used in practice. |
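The kind of construction the abstract describes can be sketched roughly in code. The snippet below is a minimal illustration, not the thesis's actual construction: following the "adding one neuron" idea in this line of work (Liang et al., 2018), it augments a toy linear model with a single exponential unit a * exp(w . x + b) and a quadratic regularizer on a, so that the modified objective's local minima force a = 0 and recover the original objective's global minimum. All function names and the toy model are hypothetical.

```python
import numpy as np

def original_output(theta, x):
    # Toy "original network": a single linear unit.
    return theta @ x

def modified_output(theta, a, w, b, x):
    # Modified network: original output plus one exponential neuron.
    return original_output(theta, x) + a * np.exp(w @ x + b)

def modified_loss(params, X, y, lam=1.0):
    # Squared error of the modified network plus a regularizer on a;
    # at any local minimum of this objective, a is driven to zero.
    theta, a, w, b = params
    preds = np.array([modified_output(theta, a, w, b, x) for x in X])
    return 0.5 * np.mean((preds - y) ** 2) + lam * a ** 2
```

With a = 0 the modified objective reduces exactly to the original squared-error objective, which is the mechanism that lets statements about the modified network transfer back to the original one.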
Description: | Master's thesis, Department of Applied Mathematics, National Chengchi University. 1057510163 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G1057510163 |
Data Type: | thesis |
DOI: | 10.6814/NCCU201900936 |
Appears in Collections: | [Department of Applied Mathematics] Theses
Files in This Item:
File | Size | Format
016301.pdf | 5062Kb | Adobe PDF
All items in 政大典藏 are protected by copyright, with all rights reserved.