Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/155972
Title: | Enhancing Image Deraining via a High-resolution Paired Real Rainy Dataset and VLM-based Data Refinement
Authors: | Liang, Shih-Jui (梁師睿)
Contributors: | Peng, Yan-Tsung (彭彥璁); Chen, Jun-Cheng (陳駿丞); Liang, Shih-Jui (梁師睿)
Keywords: | Real-world rain dataset; Rain removal model; Vision-Language Model; Data annotation
Date: | 2025 |
Issue Date: | 2025-03-03 14:03:49 (UTC+8) |
Abstract: | Image deraining has recently garnered significant attention due to its critical role in applications such as autonomous driving and surveillance systems. Although many image deraining models have succeeded in removing rain and improving image clarity, they are predominantly trained on synthetic datasets. Because synthetic rain often fails to replicate real rain conditions accurately, models trained on it frequently show a performance gap when applied to real-world scenes. Some real-world datasets do exist, but their image pairs often have misaligned backgrounds or low quality, leading to suboptimal model performance. Acquiring high-quality pairs of real-world rainy images with aligned backgrounds remains a formidable challenge for effective model training.
To improve the generalization and robustness of image deraining models on real-world data, we present RealRain-AURA, a high-resolution, high-quality, and clearly categorized paired dataset of real rain images. We also introduce Automated Understanding and Refinement Agents (AURA), a system that employs Vision-Language Models (VLMs) to filter out unsuitable training data and to categorize images by rain-streak density (e.g., light, moderate, and heavy rain). This refinement and categorization framework improves dataset quality and thereby boosts the performance of rain removal models in real-world applications.
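The thesis itself is not reproduced on this page, but the abstract describes a two-step VLM-driven pipeline: screen out unsuitable image pairs, then label the survivors by rain density. The Python sketch below illustrates that general pattern only; query_vlm is a hypothetical placeholder for whichever VLM endpoint AURA actually uses, and the prompts, labels, and helper names are illustrative assumptions, not the thesis's implementation.

    # A minimal sketch of a VLM-based data-refinement pass in the spirit of
    # the AURA pipeline described in the abstract. query_vlm() is a
    # hypothetical placeholder, not the thesis's actual implementation.
    from pathlib import Path

    RAIN_LEVELS = ("light", "moderate", "heavy")

    def query_vlm(images: list[Path], prompt: str) -> str:
        # Placeholder: swap in a call to a real Vision-Language Model that
        # accepts one or more images plus a text prompt and returns a short
        # text answer.
        raise NotImplementedError("plug in a concrete VLM client here")

    def refine_pair(rainy: Path, clean: Path) -> str | None:
        # Step 1: filter. Show the VLM both images of a pair and ask
        # whether they are usable for training (sharp, well exposed,
        # backgrounds aligned between the rainy and rain-free shots).
        verdict = query_vlm(
            [rainy, clean],
            "These two photos should show the same scene with and without "
            "rain. Are they sharp, well exposed, and background-aligned? "
            "Answer yes or no.",
        )
        if not verdict.strip().lower().startswith("yes"):
            return None  # drop unsuitable pairs from the training set

        # Step 2: categorize. Label the surviving pair by rain density.
        label = query_vlm(
            [rainy],
            "Classify the rain density in this image as exactly one word: "
            "light, moderate, or heavy.",
        ).strip().lower()
        return label if label in RAIN_LEVELS else None

    def refine_dataset(pairs):
        # Bucket accepted (rainy, clean) pairs by density; discard rejects.
        buckets = {level: [] for level in RAIN_LEVELS}
        for rainy, clean in pairs:
            label = refine_pair(rainy, clean)
            if label is not None:
                buckets[label].append((rainy, clean))
        return buckets

Requiring an exact one-word answer in the classification prompt, and rejecting anything outside RAIN_LEVELS, keeps the VLM's free-form output from leaking malformed labels into the dataset; any production version would presumably add retries or self-consistency checks on top of this.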
Description: | Master's thesis, Department of Computer Science, National Chengchi University (student ID 111753216)
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0111753216 |
Data Type: | thesis |
Appears in Collections: | [Department of Computer Science] Theses
Files in This Item:
File | Size | Format
321601.pdf | 11728 KB | Adobe PDF
All items in 政大典藏 are protected by copyright, with all rights reserved.