    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/159410


    Title: Interactive Editing of Aerial Image Generation Based on the Stable Diffusion Model
    (基於穩定擴散模型之空拍影像生成互動編輯)
    Author: Huang, Hung-Chi (黃泓棋)
    Advisor: Chi, Ming-Te (紀明德)
    Keywords: Drone imagery
    Cross-view generation
    Date: 2025
    Uploaded: 2025-09-01 16:56:29 (UTC+8)
    Abstract: We propose a complete pipeline that performs cross-view translation from satellite imagery to a drone viewpoint while supporting interactive local editing, targeting known weaknesses of existing diffusion-based generation: insufficient controllability, inconsistency between local edits and the overall image, and the smoothing away of high-frequency texture. Methodologically, we integrate DIIF super-resolution, SAM mask segmentation, ControlNet structural conditioning, and SDXL + LoRA generation under a "super-resolve first, then generate" strategy: detail and scale are first restored, text and multimodal conditions then guide the controlled generation, and a domain-adapted LoRA preserves aerial style and geometric consistency; the system supports multi-region specification and zero-shot guidance from a single exemplar image. Experiments on xView (WorldView-3, GSD 0.3 m) validate the overall pipeline: quantitative results show a clear drop in LPIPS at the cost of only a small sacrifice in PSNR and SSIM, subjective texture and realism improve, and the global layout remains stable under structural conditioning. Overall, this work achieves high realism, repeatable control, and consistency with the source image in both cross-view generation and region-controllable editing, providing a practical and extensible solution for image generation from remote-sensing to low-altitude viewpoints.
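    The "super-resolve first, then generate" pipeline described in the abstract can be illustrated with off-the-shelf components. The sketch below is a minimal, hypothetical example, not the author's implementation: it chains a SAM mask, a Canny-based ControlNet condition, and an SDXL inpainting pipeline with an aerial-style LoRA via Hugging Face diffusers. The DIIF super-resolution stage is assumed to have been applied beforehand, and all checkpoint names, file paths, prompts, and click coordinates are placeholders.

    # Sketch: mask-guided, ControlNet-conditioned SDXL editing with a LoRA adapter.
    # Checkpoints, paths, prompt, and coordinates are illustrative placeholders.
    import cv2
    import numpy as np
    import torch
    from PIL import Image
    from segment_anything import SamPredictor, sam_model_registry
    from diffusers import ControlNetModel, StableDiffusionXLControlNetInpaintPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # 1) Load the satellite tile after super-resolution (e.g. DIIF or another SR model).
    sr_image = Image.open("satellite_tile_sr.png").convert("RGB")

    # 2) Segment the region to edit with SAM, prompted by a single user click.
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth").to(device)
    predictor = SamPredictor(sam)
    predictor.set_image(np.array(sr_image))
    masks, _, _ = predictor.predict(point_coords=np.array([[512, 512]]),
                                    point_labels=np.array([1]))
    mask = Image.fromarray((masks[0] * 255).astype(np.uint8))

    # 3) Derive a structural condition (Canny edges) so the global layout is preserved.
    edges = cv2.Canny(np.array(sr_image), 100, 200)
    control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

    # 4) Regenerate only the masked region with SDXL + ControlNet, and apply an
    #    aerial-style LoRA so the result stays in the drone-view domain.
    controlnet = ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16)
    pipe = StableDiffusionXLControlNetInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet, torch_dtype=torch.float16).to(device)
    pipe.load_lora_weights("aerial_style_lora.safetensors")

    edited = pipe(prompt="drone-view residential area, sharp rooftops",
                  image=sr_image, mask_image=mask, control_image=control_image,
                  strength=0.8, guidance_scale=7.0).images[0]
    edited.save("edited_drone_view.png")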
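    The evaluation trade-off reported above (a clear LPIPS improvement for a small PSNR/SSIM cost) uses standard full-reference metrics; the sketch below computes all three with scikit-image and the lpips package. File names are placeholders, and same-size RGB image pairs are assumed.

    # Sketch: PSNR / SSIM (scikit-image) and LPIPS (lpips package) for one image pair.
    import numpy as np
    import torch
    import lpips
    from PIL import Image
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    def load_rgb(path):
        return np.array(Image.open(path).convert("RGB"))

    ref = load_rgb("reference.png")   # placeholder: source / ground-truth image
    gen = load_rgb("generated.png")   # placeholder: generated drone-view image

    psnr = peak_signal_noise_ratio(ref, gen, data_range=255)
    ssim = structural_similarity(ref, gen, channel_axis=-1, data_range=255)

    # LPIPS expects NCHW float tensors scaled to [-1, 1].
    def to_tensor(arr):
        return (torch.from_numpy(arr).permute(2, 0, 1).float() / 127.5 - 1.0).unsqueeze(0)

    loss_fn = lpips.LPIPS(net="alex")
    lp = loss_fn(to_tensor(ref), to_tensor(gen)).item()

    print(f"PSNR: {psnr:.2f} dB  SSIM: {ssim:.4f}  LPIPS: {lp:.4f}")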
    References: [1] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.

    [2] J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” arXiv preprint arXiv:2010.02502, 2020.

    [3] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2022, pp. 10684–10695.

    [4] A. Hertz, R. Mokady, J. Tenenbaum, K. Aberman, Y. Pritch, and D. Cohen-Or, “Prompt-to-prompt image editing with cross attention control,” arXiv preprint arXiv:2208.01626, 2022.

    [5] O. Avrahami, O. Fried, and D. Lischinski, “Blended latent diffusion,” ACM Trans. Graph. (TOG), vol. 42, no. 4, pp. 1–11, 2023.

    [6] L. Zhang, A. Rao, and M. Agrawala, “Adding conditional control to text-to-image diffusion models,” in Proc. IEEE/CVF Int. Conf. Computer Vision (ICCV), 2023, pp. 3836–3847.

    [7] C. Corneanu, R. Gadde, and A. M. Martinez, “LatentPaint: Image inpainting in latent space with diffusion models,” in Proc. IEEE/CVF Winter Conf. Applications of Computer Vision (WACV), Jan. 2024, pp. 4334–4343.

    [8] Y. Wang, T. Su, Y. Li, J. Cao, G. Wang, and X. Liu, “DDistill-SR: Reparameterized dynamic distillation network for lightweight image super-resolution,” IEEE Trans. Multimedia, vol. 25, pp. 7222–7234, 2022.

    [9] Z. He and Z. Jin, “Dynamic implicit image function for efficient arbitrary-scale image representation,” arXiv preprint arXiv:2306.12321, 2023.

    [10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 27, 2014.

    [11] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” in Proc. IEEE/CVF Int. Conf. Computer Vision (ICCV), 2023, pp. 4015–4026.

    [12] O. Avrahami, D. Lischinski, and O. Fried, “Blended diffusion for text-driven editing of natural images,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2022, pp. 18208–18218.

    [13] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in Proc. Int. Conf. Machine Learning (ICML). PMLR, 2021, pp. 8748–8763.

    [14] P. Esser, R. Rombach, and B. Ommer, “Taming transformers for high-resolution image synthesis,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2021, pp. 12873–12883.

    [15] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv preprint arXiv:1312.6114, 2013.

    [16] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” arXiv preprint arXiv:2106.09685, 2021.

    [17] D. Lam, R. Kuzma, K. McGee, S. Dooley, M. Laielli, M. Klaric, Y. Bulatov, and B. McCord, “xView: Objects in context in overhead imagery,” arXiv preprint arXiv:1802.07856, 2018. [Online]. Available: https://arxiv.org/abs/1802.07856

    [18] G. Li, “Riveravssd: River aerial view semantic segmentation dataset,” National Center for High Performance Computing Data Platform, Aug. 2023. [Online]. Available: https://scidm.nchc.org.tw/dataset/riveravssd, accessed: Jul. 22, 2025.

    [19] quadeer15sh, “Augmented forest segmentation,” Kaggle, Jun. 2025. [Online]. Available: https://www.kaggle.com/datasets/quadeer15sh/augmented-forest-segmentation, accessed: Jul. 22, 2025.

    [20] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4401–4410.

    [21] S. Brade, B. Wang, M. Sousa, S. Oore, and T. Grossman, “Promptify: Text-to-image generation through interactive prompt exploration with large language models,” in Proc. 36th Annu. ACM Symp. User Interface Software and Technology (UIST), 2023, pp. 1–14.

    [22] Y. Feng, X. Wang, K. K. Wong, S. Wang, Y. Lu, M. Zhu, B. Wang, and W. Chen, “PromptMagician: Interactive prompt engineering for text-to-image creation,” IEEE Trans. Visualization and Computer Graphics, 2023.

    [23] A. Sauer, K. Schwarz, and A. Geiger, “StyleGAN-XL: Scaling StyleGAN to large diverse datasets,” in ACM SIGGRAPH Conf. Proc., 2022, pp. 1–10.

    [24] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, “Analyzing and improving the image quality of StyleGAN,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2020, pp. 8110–8119.

    [25] A. Sauer, T. Karras, S. Laine, A. Geiger, and T. Aila, “StyleGAN-T: Unlocking the power of GANs for fast large-scale text-to-image synthesis,” in Proc. Int. Conf. Machine Learning (ICML). PMLR, 2023, pp. 30105–30118.

    [26] Y. Lyu, T. Lin, F. Li, D. He, J. Dong, and T. Tan, “DeltaEdit: Exploring text-free training for text-driven image manipulation,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2023, pp. 6894–6903.

    [27] R. Gal, O. Patashnik, H. Maron, A. H. Bermano, G. Chechik, and D. Cohen-Or, “StyleGAN-NADA: CLIP-guided domain adaptation of image generators,” ACM Trans. Graph. (TOG), vol. 41, no. 4, pp. 1–13, 2022.

    [28] Y. Nitzan, K. Aberman, Q. He, O. Liba, M. Yarom, Y. Gandelsman, I. Mosseri, Y. Pritch, and D. Cohen-Or, “MyStyle: A personalized generative prior,” ACM Trans. Graph. (TOG), vol. 41, no. 6, pp. 1–10, 2022.

    [29] K. Song, L. Han, B. Liu, D. Metaxas, and A. Elgammal, “Diffusion guided domain adaptation of image generators,” arXiv preprint arXiv:2212.04473, 2022.

    [30] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434, 2015.

    [31] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proc. Int. Conf. Machine Learning (ICML). PMLR, 2017, pp. 214–223.

    [32] M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv preprint arXiv:1411.1784, 2014.

    [33] E. Richardson, Y. Alaluf, O. Patashnik, Y. Nitzan, Y. Azar, S. Shapiro, and D. Cohen-Or, “Encoding in style: A StyleGAN encoder for image-to-image translation,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2021, pp. 2287–2296.

    [34] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1125–1134.

    [35] A. Bochkovskii, A. Delaunoy, H. Germain, M. Santos, Y. Zhou, S. R. Richter, and V. Koltun, “DepthPro: Sharp monocular metric depth in less than a second,” arXiv preprint arXiv:2410.02073, 2024.
    Description: Master's thesis
    National Chengchi University
    Department of Computer Science
    111753159
    Source: http://thesis.lib.nccu.edu.tw/record/#G0111753159
    Type: thesis
    Appears in Collections: [Department of Computer Science] Theses

    Files in This Item:
    315901.pdf (14067 KB, Adobe PDF)

