    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/152770


    Title: 應用生成式資料擴增提升魚眼鏡頭物件偵測模型效能
    Enhancing Fisheye Lens Object Detection Using Generative Data Augmentation
    Authors: 程品潔
    Cheng, Pin-Chieh
    Contributors: 廖文宏
    Liao, Wen-Hung
    程品潔
    Cheng, Pin-Chieh
    Keywords: Fisheye Camera
    Fisheye Correction
    Object Detection
    Diffusion Model
    Generative Data Augmentation
    Date: 2024
    Issue Date: 2024-08-05 13:56:22 (UTC+8)
    Abstract: Smart cities aim to leverage innovative technologies to enhance urban operational efficiency, safety, and quality of life. Advanced surveillance systems and object detection technologies are crucial components of smart cities, aiding in the management and optimization of public spaces. Overhead fisheye lenses, with their ultra-wide field of view, are well-suited for large-scale surveillance but present significant image distortion challenges. Furthermore, due to privacy protection requirements and the diversity of scenes, acquiring sufficient and diverse public images is extremely difficult, hindering the development of related research.

    To address these issues, this study focuses on libraries, a common and important type of public venue, and tackles two major challenges in object detection with overhead fisheye images: data scarcity and fisheye distortion. By augmenting the training data with text-to-image generative models and combining this augmentation with a distortion correction method based on preset camera intrinsic parameters, we improved object detection accuracy.
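    As an illustration of the correction step, below is a minimal sketch using OpenCV's fisheye camera model. The camera matrix `K`, distortion coefficients `D`, and file names are placeholder assumptions standing in for the preset intrinsics; this sketches the general technique, not the thesis's exact implementation.

```python
import cv2
import numpy as np

# Placeholder intrinsics (assumptions, not the study's actual presets).
K = np.array([[400.0,   0.0, 640.0],
              [  0.0, 400.0, 640.0],
              [  0.0,   0.0,   1.0]])      # camera matrix
D = np.array([0.1, -0.05, 0.01, -0.002])   # fisheye coefficients k1..k4

img = cv2.imread("fisheye_frame.jpg")      # hypothetical overhead frame
h, w = img.shape[:2]

# Estimate a rectified camera matrix, build the undistortion maps once,
# and remap the frame; the same maps can be reused for every frame.
new_K = cv2.fisheye.estimateNewCameraMatrixForUndistortRectify(
    K, D, (w, h), np.eye(3), balance=0.0)
map1, map2 = cv2.fisheye.initUndistortRectifyMap(
    K, D, np.eye(3), new_K, (w, h), cv2.CV_16SC2)
rectified = cv2.remap(img, map1, map2, cv2.INTER_LINEAR)
cv2.imwrite("rectified_frame.jpg", rectified)
```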

    Experimental results show that training with images produced by generative AI models, while gradually and strategically increasing the number of synthetic instances, significantly enhances detection performance; the gains are especially notable for small objects once the dataset is distortion-corrected. Compared with the YOLOv8 baseline, our model fine-tuned on fisheye-corrected data improves overall mAP(0.5) from 0.246 to 0.688 and mAP(0.5-0.95) from 0.122 to 0.518; for a specific small-object category (beverages), mAP(0.5) rises from 0.507 to 0.795 and mAP(0.5-0.95) from 0.268 to 0.586. Additionally, mixing an appropriate proportion of synthetic and real data during training not only makes training more robust but also further improves model performance. These findings confirm the potential of our approach for object detection with overhead fisheye cameras.
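    To make the two-stage pipeline concrete, here is a minimal sketch pairing a text-to-image diffusion model (via Hugging Face diffusers) with YOLOv8 fine-tuning (via Ultralytics). The checkpoint, prompt, dataset config `mixed.yaml`, and hyperparameters are illustrative assumptions rather than the settings used in the thesis, and the sketch omits the annotation step that synthetic images would need before training.

```python
import torch
from diffusers import StableDiffusionPipeline
from ultralytics import YOLO

# Stage 1: generate synthetic library scenes with a text-to-image model.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",      # assumed checkpoint
    torch_dtype=torch.float16).to("cuda")
for i in range(8):
    image = pipe("overhead view of a library reading room, people at "
                 "desks, drinks on the tables").images[0]
    image.save(f"synthetic/{i:04d}.png")   # annotate before mixing in

# Stage 2: fine-tune a pretrained YOLOv8 on the mixed real + synthetic set.
model = YOLO("yolov8n.pt")
model.train(data="mixed.yaml", epochs=100, imgsz=640)

# Validation reports the metrics quoted in the abstract.
metrics = model.val()
print(metrics.box.map50, metrics.box.map)  # mAP(0.5), mAP(0.5-0.95)
```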
    Description: Master's thesis
    National Chengchi University
    In-service Master's Program in Computer Science
    111971008
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0111971008
    Data Type: thesis
    Appears in Collections: [In-service Master's Program in Computer Science] Theses

    Files in This Item:

    File        Size     Format
    100801.pdf  8604 KB  Adobe PDF


    All items in 政大典藏 are protected by copyright, with all rights reserved.

