National Chengchi University Institutional Repository (NCCUR): Item 140.119/152770

Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/152770

Title:	Enhancing Fisheye Lens Object Detection Using Generative Data Augmentation (應用生成式資料擴增提升魚眼鏡頭物件偵測模型效能)
Authors:	Cheng, Pin-Chieh (程品潔)
Contributors:	Liao, Wen-Hung (廖文宏); Cheng, Pin-Chieh (程品潔)
Keywords:	Fisheye Camera; Fisheye Correction; Object Detection; Diffusion Model; Generative Data Augmentation
Date:	2024
Issue Date:	2024-08-05 13:56:22 (UTC+8)
Abstract:	Smart cities aim to use innovative technologies to improve urban operational efficiency, safety, and quality of life. Advanced surveillance systems and object detection technologies are key components of smart cities, helping to manage and optimize public spaces. Overhead fisheye lenses, with their ultra-wide field of view, are well suited to large-area surveillance but introduce severe image distortion. Furthermore, privacy requirements and the diversity of scenes make it extremely difficult to obtain sufficient and varied public images, which has hindered research in this area. To address these issues, this study takes the library, a common and important public space, as its setting and tackles two major challenges of object detection with overhead fisheye cameras: data scarcity and fisheye lens distortion. By augmenting the training data with text-to-image generative models and applying a distortion correction method based on preset camera intrinsic parameters, we improved object detection accuracy.
Experimental results show that training on images generated by generative AI models, while gradually and strategically increasing the number of synthetic instances, significantly improves detection performance; the improvement is especially pronounced for small objects once the dataset has been distortion-corrected. Compared with the YOLOv8 baseline, our proposed fisheye-corrected fine-tuned model raised overall mAP(0.5) from 0.246 to 0.688 and mAP(0.5-0.95) from 0.122 to 0.518; for a specific small-object class (beverages), mAP(0.5) rose from 0.507 to 0.795 and mAP(0.5-0.95) from 0.268 to 0.586. In addition, training on an appropriate mix of synthetic and real data not only makes the training process more robust but also helps further improve model performance. These findings confirm the potential of this approach for object detection with overhead fisheye cameras.
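The abstract says distortion correction relies on preset camera intrinsic parameters but does not state which lens model those parameters describe. As an illustration only, the sketch below assumes the polynomial fisheye model used by OpenCV's cv2.fisheye module, theta_d = theta(1 + k1 theta^2 + k2 theta^4 + k3 theta^6 + k4 theta^8), and maps a single pixel from the fisheye image to an ideal pinhole (rectilinear) view that reuses the same intrinsics. The function names and the parameters fx, fy, cx, cy and k are hypothetical placeholders, not values or code from the thesis.

```python
import math

# Assumption: OpenCV-style fisheye (Kannala-Brandt) polynomial model.
# The thesis record does not say which model its preset intrinsics use.


def distort_theta(theta, k):
    """Incidence angle theta -> distorted angle theta_d for k = (k1, k2, k3, k4)."""
    t2 = theta * theta
    return theta * (1 + k[0] * t2 + k[1] * t2**2 + k[2] * t2**3 + k[3] * t2**4)


def solve_theta(theta_d, k, iters=50, tol=1e-12):
    """Invert distort_theta with Newton's method, starting from theta = theta_d."""
    theta = theta_d
    for _ in range(iters):
        t2 = theta * theta
        f = distort_theta(theta, k) - theta_d
        df = 1 + 3 * k[0] * t2 + 5 * k[1] * t2**2 + 7 * k[2] * t2**3 + 9 * k[3] * t2**4
        step = f / df
        theta -= step
        if abs(step) < tol:
            break
    return theta


def undistort_point(u, v, fx, fy, cx, cy, k=(0.0, 0.0, 0.0, 0.0)):
    """Map a fisheye pixel (u, v) to a pinhole view that reuses fx, fy, cx, cy."""
    x_d = (u - cx) / fx  # normalised distorted coordinates
    y_d = (v - cy) / fy
    theta_d = math.hypot(x_d, y_d)  # normalised distorted radius equals theta_d
    if theta_d == 0.0:
        return u, v  # the principal point maps to itself
    theta = solve_theta(theta_d, k)
    if not 0.0 <= theta < math.pi / 2:
        raise ValueError("ray is 90 degrees or more off-axis; no pinhole projection")
    scale = math.tan(theta) / theta_d  # pinhole radius is tan(theta)
    return cx + fx * x_d * scale, cy + fy * y_d * scale


def distort_point(u, v, fx, fy, cx, cy, k=(0.0, 0.0, 0.0, 0.0)):
    """Forward model: pinhole pixel -> fisheye pixel (used to check the inverse)."""
    x = (u - cx) / fx
    y = (v - cy) / fy
    r = math.hypot(x, y)
    if r == 0.0:
        return u, v
    scale = distort_theta(math.atan(r), k) / r
    return cx + fx * x * scale, cy + fy * y * scale
```

For example, with fx = fy = 300, cx = cy = 320 and all k set to zero, a fisheye pixel lying 45 degrees off-axis at u = 320 + 300 * pi/4 moves outward to u = 620 in the pinhole view. Because rays at 90 degrees or more from the optical axis have no pinhole projection, a full overhead fisheye frame cannot be rectified into a single perspective view; the sketch raises an error for such points rather than returning a meaningless coordinate.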
Description:	Master's thesis, Executive Master Program, Department of Computer Science, National Chengchi University, 111971008
Source URI:	http://thesis.lib.nccu.edu.tw/record/#G0111971008
Data Type:	thesis
Appears in Collections:	[Executive Master Program of Computer Science of NCCU] Theses

Files in This Item:

File	Description	Size	Format
100801.pdf		8604Kb	Adobe PDF
