Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/157810
Title: | An Optimization Approach for Siamese Neural Networks Using Principal Component Analysis-Based Neuron Pruning
Authors: | Wang, Chi-Kang
Contributors: | Chou, Pei-Ting; Wang, Chi-Kang
Keywords: | Siamese Neural Network; Principal Component Analysis; Neuron Pruning; Unstructured Data; Model Simplification; Classification
Date: | 2025 |
Issue Date: | 2025-07-01 15:03:45 (UTC+8) |
Abstract: | Neural network models have demonstrated strong predictive capabilities across a wide range of applications, yet hyperparameter tuning remains a critical challenge for model performance, particularly the choice of the number of neurons. With too few neurons, a model often fails to capture the complex patterns in the data, reducing predictive accuracy; with too many, the parameter count and computational cost grow substantially and the model may overfit. To address this trade-off, this study proposes a neuron pruning strategy based on Principal Component Analysis (PCA) that analyzes the neuron weights of a pre-trained neural network and selects a subset of representative neurons. To evaluate the applicability and generalizability of the proposed method, a series of experiments was conducted using Siamese Neural Networks (SNN), which are well suited to low-data settings; models were trained and tested on both structured and unstructured datasets under various neuron configurations, and their predictive results were recorded and compared. The results show that the selection method not only reduces the number of model parameters effectively, but also enables the simplified models to surpass the predictive performance of the original pre-trained models, particularly when a high cumulative explained variance ratio is retained.
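The abstract describes applying PCA to the neuron weights of a pre-trained layer and keeping enough structure to reach a cumulative explained variance ratio threshold. The sketch below illustrates one plausible reading of that selection step; it is not taken from the thesis. The function name `select_neurons`, the 0.95 threshold, and the rule of keeping the neuron that loads most strongly on each retained component are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def select_neurons(weights: np.ndarray, evr_threshold: float = 0.95) -> np.ndarray:
    """Pick representative neurons from one dense layer.

    weights: (n_neurons, n_inputs) array; each row is a neuron's incoming
    weight vector. evr_threshold is the cumulative explained variance ratio
    to retain (an illustrative choice, not the thesis value).
    Returns the indices of the neurons to keep.
    """
    pca = PCA().fit(weights)
    # Smallest number of components whose cumulative EVR reaches the threshold.
    cum_evr = np.cumsum(pca.explained_variance_ratio_)
    k = int(np.searchsorted(cum_evr, evr_threshold)) + 1
    # Project each neuron's weight vector onto the retained components and,
    # for each component, keep the neuron that loads on it most strongly
    # (one plausible notion of a "representative" neuron).
    scores = np.abs(pca.transform(weights)[:, :k])   # shape (n_neurons, k)
    return np.unique(scores.argmax(axis=0))

# Hypothetical usage on a 128-neuron hidden layer with 64 inputs:
rng = np.random.default_rng(0)
W = rng.normal(size=(128, 64))
kept = select_neurons(W)
print(f"kept {kept.size} of {W.shape[0]} neurons")
```

With a random matrix as above the selection is of course meaningless; the snippet only demonstrates the mechanics. In the study, the pruned layer of a pre-trained Siamese network would then be re-evaluated against the original model under different threshold settings.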
Description: | Master's thesis, Department of Statistics, National Chengchi University (112354032)
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0112354032 |
Data Type: | thesis |
Appears in Collections: | [Department of Statistics] Theses
Files in This Item: | 403201.pdf (1926 KB, Adobe PDF)