National Chengchi University Institutional Repository (NCCUR): Item 140.119/145743


    Please use this persistent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/145743


    题名: 訓練樣本分布對聯盟式學習成效之影響評估
    Evaluating the Performance of Federated Learning Across Different Training Sample Distributions
    Author: Lin, Shu-Yu (林書羽)
    Contributors: Liao, Wen-Hung (廖文宏); Lin, Shu-Yu (林書羽)
    Keywords: 聯盟式學習 (Federated Learning); 圖像分類 (Image Classification); 深度學習 (Deep Learning); 資料平衡性 (Data Balance)
    Date: 2023
    Uploaded: 2023-07-06 16:22:37 (UTC+8)
    Abstract:
    Federated learning is an emerging machine learning technique that addresses the problems of data privacy and data dispersion: participating clients jointly train a shared model, and thereby share knowledge, without exposing their own data.
    This thesis explores the benefits of federated learning for model training in image classification tasks and compares it with traditional centralized training. By partitioning datasets into independent and identically distributed (IID) and non-IID configurations, and varying the number of collaborating clients, we observe how differences in training sample distribution affect the effectiveness of federated learning. Within the non-IID setting, we pay particular attention to partitions in which clients hold non-intersecting (disjoint) sets of classes.
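    The thesis's own partitioning code is not reproduced on this page; the following is a minimal NumPy sketch of the three partition schemes the abstract describes. The function names and the Dirichlet concentration parameter `alpha` used to create the imbalanced non-IID split are illustrative assumptions, not the author's implementation.

```python
import numpy as np

def iid_partition(labels, num_clients, seed=0):
    """IID: shuffle all sample indices and deal them out evenly."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    return np.array_split(idx, num_clients)

def imbalanced_noniid_partition(labels, num_clients, alpha=0.5, seed=0):
    """Imbalanced non-IID: skew each class across clients with a Dirichlet
    prior; a small alpha concentrates a class on a few clients."""
    rng = np.random.default_rng(seed)
    clients = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Split this class's samples according to Dirichlet proportions.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part)
    return [np.array(c) for c in clients]

def disjoint_class_partition(labels, num_clients):
    """Non-intersecting classes: each client owns an exclusive class subset."""
    classes = np.unique(labels)
    shards = np.array_split(classes, num_clients)
    return [np.where(np.isin(labels, shard))[0] for shard in shards]
```

    For integer class labels such as CIFAR-10 targets, `iid_partition(labels, 4)` deals the data evenly to four clients, while `disjoint_class_partition(labels, 4)` gives each client an exclusive subset of the classes, the most extreme non-IID case the abstract mentions.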
    Using deep learning methods with both pre-trained and trained-from-scratch models, this study examines how the number of collaborating clients and the distribution of their data affect joint training, and evaluates the results with Top-1 and Top-5 accuracy.
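    As a concrete reference for the evaluation metric: Top-k accuracy counts a prediction as correct when the true label appears among a model's k highest-scoring classes. A generic NumPy sketch, not taken from the thesis:

```python
import numpy as np

def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest scores.

    scores: (num_samples, num_classes) array of logits or probabilities
    labels: (num_samples,) array of integer class labels
    """
    # Indices of the k largest scores per row (order within the top k
    # does not matter for the metric).
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return hits.mean()

# Top-1 accuracy is the special case k=1:
# top1 = top_k_accuracy(scores, labels, k=1)
```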
    Experimental results show that the initialization of the joint-training weights has a critical impact: independently drawn random weights lead to unstable model performance, whereas starting all clients from a common set of initial weights yields stable training and higher accuracy. Model performance also varies with the data distribution: the federated model trained on IID samples performs best, followed by the imbalanced non-IID partition, while the disjoint-class partition is the least ideal.
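    Joint training of the kind evaluated here is typically implemented with FedAvg (McMahan et al. [1], the first reference below), whose server-side step averages client weights in proportion to their local sample counts. A minimal NumPy sketch of that aggregation loop, with every client starting from one common initialization as the abstract's results favor; `local_train` is a hypothetical placeholder for each client's local optimization, not the thesis's code:

```python
import numpy as np

def fedavg_round(global_weights, client_updates):
    """One FedAvg aggregation step: average client weight tensors,
    weighted by the number of local training samples (McMahan et al. [1])."""
    total = sum(n for _, n in client_updates)
    return [
        sum(n * w[i] for w, n in client_updates) / total
        for i in range(len(global_weights))
    ]

def run_federated_training(init_weights, clients, local_train, num_rounds):
    """Broadcast one common initialization, then alternate local training
    and weighted averaging. local_train(weights, client) must return
    (updated_weights, num_samples); it stands in for each client's SGD."""
    global_weights = [np.copy(w) for w in init_weights]
    for _ in range(num_rounds):
        updates = [local_train(global_weights, c) for c in clients]
        global_weights = fedavg_round(global_weights, updates)
    return global_weights
```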
    References: [1] McMahan, B., Moore, E., Ramage, D., Hampson, S., & Agüera y Arcas, B. (2017, April). Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics (pp. 1273-1282). PMLR.
    [2] Yang, Q., Liu, Y., Chen, T., & Tong, Y. (2019). Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 10(2), 1-19.
    [3] Wikipedia: Federated learning.
    https://en.wikipedia.org/wiki/Federated_learning
    [4] Liu, Y., Kang, Y., Xing, C., Chen, T., & Yang, Q. (2020). A secure federated transfer learning framework. IEEE Intelligent Systems, 35(4), 70-82.
    [5] Gentry, C. (2009, May). Fully homomorphic encryption using ideal lattices. In Proceedings of the forty-first annual ACM symposium on Theory of computing (pp. 169-178).
    [6] Ho, Q., Cipar, J., Cui, H., Lee, S., Kim, J. K., Gibbons, P. B., ... & Xing, E. P. (2013). More effective distributed ML via a stale synchronous parallel parameter server. Advances in neural information processing systems, 26.
    [7] Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., ... & Zhao, S. (2021). Advances and open problems in federated learning. Foundations and Trends® in Machine Learning, 14(1–2), 1-210.
    [8] Li, Q., Diao, Y., Chen, Q., & He, B. (2022, May). Federated learning on non-IID data silos: An experimental study. In 2022 IEEE 38th International Conference on Data Engineering (ICDE) (pp. 965-978). IEEE.
    [9] ILSVRC Top-5 error rates over the years.
    https://www.kaggle.com/getting-started/149448
    [10] CIFAR-10 / CIFAR-100 datasets.
    https://www.cs.toronto.edu/~kriz/cifar.html
    [11] Caltech-UCSD Birds-200-2011 dataset.
    https://paperswithcode.com/dataset/cub-200-2011
    [12] LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
    [13] MNIST dataset.
    http://yann.lecun.com/exdb/mnist/
    [14] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84-90.
    [15] ImageNet dataset.
    https://www.image-net.org/
    [16] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1-9).
    [17] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
    [18] Tiny-ImageNet dataset.
    https://www.kaggle.com/c/tiny-imagenet/overview
    [19] Beutel, D. J., Topal, T., Mathur, A., Qiu, X., Parcollet, T., de Gusmão, P. P., & Lane, N. D. (2020). Flower: A friendly federated learning research framework. arXiv preprint arXiv:2007.14390.
    [20] Flower Framework
    https://flower.dev/
    [21] Tan, M., & Le, Q. (2019, May). EfficientNet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning (pp. 6105-6114). PMLR.
    [22] Tan, M., & Le, Q. (2021, July). EfficientNetV2: Smaller models and faster training. In International conference on machine learning (pp. 10096-10106). PMLR.
    [23] OpenMMLab MMClassification GitHub repository.
    https://github.com/open-mmlab/mmclassification
    [24] Zhao, Y., Li, M., Lai, L., Suda, N., Civin, D., & Chandra, V. (2018). Federated learning with non-IID data. arXiv preprint arXiv:1806.00582.
    [25] Wikipedia: gRPC. https://en.wikipedia.org/wiki/GRPC
    Description: Master's thesis
    National Chengchi University
    資訊科學系碩士在職專班 (master's in-service program, Department of Computer Science)
    109971023
    Source: http://thesis.lib.nccu.edu.tw/record/#G0109971023
    Data type: thesis
    Appears in collections: [資訊科學系碩士在職專班] Theses

    Files in this item:
    102301.pdf (5614 KB, Adobe PDF)

