    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/155972


Title: 基於高解析度真實雨天影像資料集與VLM數據優化提升影像除雨模型效能之方法
Enhancing Image Deraining via a High-resolution Paired Real Rainy Dataset and VLM-based Data Refinement
Authors: 梁師睿 (Liang, Shih-Jui)
Contributors: 彭彥璁 (Peng, Yan-Tsung); 陳駿丞 (Chen, Jun-Cheng); 梁師睿 (Liang, Shih-Jui)
Keywords: 真實雨資料集 (Real-world rain dataset); 除雨模型 (Rain removal model); 視覺大語言模型 (Vision-Language Model); 資料分類 (Data annotation)
Date: 2025
Issue Date: 2025-03-03 14:03:49 (UTC+8)
Abstract: Image deraining has recently garnered significant attention due to its critical role in applications such as autonomous driving and surveillance systems. Although many image deraining models have succeeded in removing rain and improving image clarity, they are predominantly trained on synthetic datasets. This reliance on synthetic data creates a performance gap when the models are applied to real-world scenes, as synthetic rain often fails to replicate actual rain conditions. Although some real-world datasets are available, they typically lack refinement and contain image pairs with misaligned backgrounds, resulting in suboptimal model performance. The main challenge is acquiring high-quality pairs of real-world rainy images with aligned backgrounds for effective model training.

We present RealRain-AURA, a high-resolution, high-quality dataset of paired real rain images, aimed at improving the generalization and robustness of image deraining models on real-world data. In addition, we introduce Automated Understanding and Refinement Agents (AURA), which employ Vision-Language Models (VLMs) to refine deraining datasets by eliminating unsuitable training data and categorizing images by rain density (e.g., light, moderate, and heavy rain). This refinement and categorization framework improves dataset quality, thereby boosting the performance of rain removal models in real-world applications.
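The AURA refinement loop described in the abstract — query a VLM per image pair, discard unsuitable pairs, and bucket the rest by rain density — can be sketched as below. This is a minimal illustration under assumptions, not the thesis's actual implementation: the VLM call is replaced by a stub (`stub_vlm`), and the names `refine_dataset` and `RainyPair` are hypothetical.

```python
from dataclasses import dataclass

RAIN_LEVELS = ("light", "moderate", "heavy")

@dataclass
class RainyPair:
    rainy_path: str
    clean_path: str

def refine_dataset(pairs, query_vlm):
    """Filter unsuitable pairs and bucket the rest by rain density.

    `query_vlm` stands in for a Vision-Language Model: given a pair, it
    returns a rain-density label ("light"/"moderate"/"heavy"), or None
    when the pair should be discarded (e.g., misaligned backgrounds).
    """
    buckets = {level: [] for level in RAIN_LEVELS}
    for pair in pairs:
        label = query_vlm(pair)
        if label in buckets:              # keep and categorize
            buckets[label].append(pair)
        # else: drop as unsuitable training data
    return buckets

# Stub in place of a real VLM: pretend the filename encodes the judgment.
def stub_vlm(pair):
    for level in RAIN_LEVELS:
        if level in pair.rainy_path:
            return level
    return None  # unsuitable pair

pairs = [
    RainyPair("scene1_light.png", "scene1_gt.png"),
    RainyPair("scene2_heavy.png", "scene2_gt.png"),
    RainyPair("scene3_blurry.png", "scene3_gt.png"),  # gets filtered out
]
buckets = refine_dataset(pairs, stub_vlm)
print({k: len(v) for k, v in buckets.items()})  # {'light': 1, 'moderate': 0, 'heavy': 1}
```

In a real pipeline, `query_vlm` would prompt a vision-language model with the rainy/clean pair and parse its answer; keeping that call behind a plain function makes the filtering and categorization logic independent of any particular VLM.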
Description: 碩士 (Master's thesis)
國立政治大學 (National Chengchi University)
資訊科學系 (Department of Computer Science)
111753216
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0111753216
    Data Type: thesis
Appears in Collections: [資訊科學系 Department of Computer Science] 學位論文 (Theses)

Files in This Item:

File: 321601.pdf (11,728 KB, Adobe PDF)

All items in 政大典藏 (NCCU Institutional Repository) are protected by copyright, with all rights reserved.



著作權政策宣告 Copyright Announcement
1. The digital content of this website is part of the National Chengchi University Institutional Repository. It provides free access for non-commercial uses such as academic research and public education. Please use the content in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

2. This website has been built with care to avoid infringing the rights of copyright owners. If you believe that any material on the website nonetheless infringes copyright, please notify the site maintainers (nccur@nccu.edu.tw); they will promptly remove the work and investigate the claim.