    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/159410


    Title: Interactive Editing of Aerial Image Generation Based on the Stable Diffusion Model
    (基於穩定擴散模型之空拍影像生成互動編輯)
    Author: Huang, Hung-Chi (黃泓棋)
    Advisor: Chi, Ming-Te (紀明德)
    Keywords: Drone imagery
    Cross-view generation
    Date: 2025
    Uploaded: 2025-09-01 16:56:29 (UTC+8)
    Abstract: We propose a complete pipeline that performs cross-view translation from satellite imagery to a drone viewpoint while supporting interactive local editing, targeting known weaknesses of existing diffusion-based generation: insufficient controllability, inconsistency between local edits and the overall image, and the smoothing away of high-frequency texture. Methodologically, we integrate DIIF super-resolution, SAM mask segmentation, ControlNet structural conditioning, and SDXL + LoRA generation under a "super-resolve first, then generate" strategy: detail and scale are first restored, text and multimodal conditions then guide the controlled generation, and a domain-adapted LoRA preserves aerial style and geometric consistency; the system supports multi-region specification and zero-shot guidance from a single exemplar image. Experiments on xView (WorldView-3, GSD 0.3 m) validate the overall pipeline: quantitative results show a clear drop in LPIPS at the cost of only a small sacrifice in PSNR and SSIM, subjective texture and realism improve, and the global layout remains stable under structural conditioning. Overall, this work achieves high realism, repeatable control, and consistency with the source image in both cross-view generation and region-controllable editing, providing a practical and extensible solution for image generation from remote-sensing to low-altitude viewpoints.
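    The "super-resolve first, then generate" pipeline described in the abstract can be illustrated with off-the-shelf components. The sketch below is a minimal, hypothetical example, not the author's implementation: it chains a SAM mask, a Canny-based ControlNet condition, and an SDXL inpainting pipeline with an aerial-style LoRA via Hugging Face diffusers. The DIIF super-resolution stage is assumed to have been applied beforehand, and all checkpoint names, file paths, prompts, and click coordinates are placeholders.

    # Sketch: mask-guided, ControlNet-conditioned SDXL editing with a LoRA adapter.
    # Checkpoints, paths, prompt, and coordinates are illustrative placeholders.
    import cv2
    import numpy as np
    import torch
    from PIL import Image
    from segment_anything import SamPredictor, sam_model_registry
    from diffusers import ControlNetModel, StableDiffusionXLControlNetInpaintPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # 1) Load the satellite tile after super-resolution (e.g. DIIF or another SR model).
    sr_image = Image.open("satellite_tile_sr.png").convert("RGB")

    # 2) Segment the region to edit with SAM, prompted by a single user click.
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth").to(device)
    predictor = SamPredictor(sam)
    predictor.set_image(np.array(sr_image))
    masks, _, _ = predictor.predict(point_coords=np.array([[512, 512]]),
                                    point_labels=np.array([1]))
    mask = Image.fromarray((masks[0] * 255).astype(np.uint8))

    # 3) Derive a structural condition (Canny edges) so the global layout is preserved.
    edges = cv2.Canny(np.array(sr_image), 100, 200)
    control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

    # 4) Regenerate only the masked region with SDXL + ControlNet, and apply an
    #    aerial-style LoRA so the result stays in the drone-view domain.
    controlnet = ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16)
    pipe = StableDiffusionXLControlNetInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet, torch_dtype=torch.float16).to(device)
    pipe.load_lora_weights("aerial_style_lora.safetensors")

    edited = pipe(prompt="drone-view residential area, sharp rooftops",
                  image=sr_image, mask_image=mask, control_image=control_image,
                  strength=0.8, guidance_scale=7.0).images[0]
    edited.save("edited_drone_view.png")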
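    The evaluation trade-off reported above (a clear LPIPS improvement for a small PSNR/SSIM cost) uses standard full-reference metrics; the sketch below computes all three with scikit-image and the lpips package. File names are placeholders, and same-size RGB image pairs are assumed.

    # Sketch: PSNR / SSIM (scikit-image) and LPIPS (lpips package) for one image pair.
    import numpy as np
    import torch
    import lpips
    from PIL import Image
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    def load_rgb(path):
        return np.array(Image.open(path).convert("RGB"))

    ref = load_rgb("reference.png")   # placeholder: source / ground-truth image
    gen = load_rgb("generated.png")   # placeholder: generated drone-view image

    psnr = peak_signal_noise_ratio(ref, gen, data_range=255)
    ssim = structural_similarity(ref, gen, channel_axis=-1, data_range=255)

    # LPIPS expects NCHW float tensors scaled to [-1, 1].
    def to_tensor(arr):
        return (torch.from_numpy(arr).permute(2, 0, 1).float() / 127.5 - 1.0).unsqueeze(0)

    loss_fn = lpips.LPIPS(net="alex")
    lp = loss_fn(to_tensor(ref), to_tensor(gen)).item()

    print(f"PSNR: {psnr:.2f} dB  SSIM: {ssim:.4f}  LPIPS: {lp:.4f}")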
    References: [1] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.

    [2] J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” arXiv preprint arXiv:2010.02502, 2020.

    [3] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2022, pp. 10684–10695.

    [4] A. Hertz, R. Mokady, J. Tenenbaum, K. Aberman, Y. Pritch, and D. Cohen-Or, “Prompt-to-prompt image editing with cross attention control,” arXiv preprint arXiv:2208.01626, 2022.

    [5] O. Avrahami, O. Fried, and D. Lischinski, “Blended latent diffusion,” ACM Trans. Graph. (TOG), vol. 42, no. 4, pp. 1–11, 2023.

    [6] L. Zhang, A. Rao, and M. Agrawala, “Adding conditional control to text-to-image diffusion models,” in Proc. IEEE/CVF Int. Conf. Computer Vision (ICCV), 2023, pp. 3836–3847.

    [7] C. Corneanu, R. Gadde, and A. M. Martinez, “LatentPaint: Image inpainting in latent space with diffusion models,” in Proc. IEEE/CVF Winter Conf. Applications of Computer Vision (WACV), Jan. 2024, pp. 4334–4343.

    [8] Y. Wang, T. Su, Y. Li, J. Cao, G. Wang, and X. Liu, “DDistill-SR: Reparameterized dynamic distillation network for lightweight image super-resolution,” IEEE Trans. Multimedia, vol. 25, pp. 7222–7234, 2022.

    [9] Z. He and Z. Jin, “Dynamic implicit image function for efficient arbitrary-scale image representation,” arXiv preprint arXiv:2306.12321, 2023.

    [10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 27, 2014.

    [11] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” in Proc. IEEE/CVF Int. Conf. Computer Vision (ICCV), 2023, pp. 4015–4026.

    [12] O. Avrahami, D. Lischinski, and O. Fried, “Blended diffusion for text-driven editing of natural images,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2022, pp. 18208–18218.

    [13] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in Proc. Int. Conf. Machine Learning (ICML). PMLR, 2021, pp. 8748–8763.

    [14] P. Esser, R. Rombach, and B. Ommer, “Taming transformers for high-resolution image synthesis,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2021, pp. 12873–12883.

    [15] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv preprint arXiv:1312.6114, 2013.

    [16] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” arXiv preprint arXiv:2106.09685, 2021.

    [17] D. Lam, R. Kuzma, K. McGee, S. Dooley, M. Laielli, M. Klaric, Y. Bulatov, and B. McCord, “xView: Objects in context in overhead imagery,” arXiv preprint arXiv:1802.07856, 2018. [Online]. Available: https://arxiv.org/abs/1802.07856

    [18] G. Li, “Riveravssd: River aerial view semantic segmentation dataset,” National Center for High Performance Computing Data Platform, Aug. 2023. [Online]. Available: https://scidm.nchc.org.tw/dataset/riveravssd, accessed: Jul. 22, 2025.

    [19] quadeer15sh, “Augmented forest segmentation,” Kaggle, Jun. 2025. [Online]. Available: https://www.kaggle.com/datasets/quadeer15sh/augmented-forest-segmentation, accessed: Jul. 22, 2025.

    [20] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4401–4410.

    [21] S. Brade, B. Wang, M. Sousa, S. Oore, and T. Grossman, “Promptify: Text-to-image generation through interactive prompt exploration with large language models,” in Proc. 36th Annu. ACM Symp. User Interface Software and Technology (UIST), 2023, pp. 1–14.

    [22] Y. Feng, X. Wang, K. K. Wong, S. Wang, Y. Lu, M. Zhu, B. Wang, and W. Chen, “PromptMagician: Interactive prompt engineering for text-to-image creation,” IEEE Trans. Visualization and Computer Graphics, 2023.

    [23] A. Sauer, K. Schwarz, and A. Geiger, “StyleGAN-XL: Scaling StyleGAN to large diverse datasets,” in ACM SIGGRAPH Conf. Proc., 2022, pp. 1–10.

    [24] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, “Analyzing and improving the image quality of StyleGAN,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2020, pp. 8110–8119.

    [25] A. Sauer, T. Karras, S. Laine, A. Geiger, and T. Aila, “StyleGAN-T: Unlocking the power of GANs for fast large-scale text-to-image synthesis,” in Proc. Int. Conf. Machine Learning (ICML). PMLR, 2023, pp. 30105–30118.

    [26] Y. Lyu, T. Lin, F. Li, D. He, J. Dong, and T. Tan, “DeltaEdit: Exploring text-free training for text-driven image manipulation,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2023, pp. 6894–6903.

    [27] R. Gal, O. Patashnik, H. Maron, A. H. Bermano, G. Chechik, and D. Cohen-Or, “StyleGAN-NADA: CLIP-guided domain adaptation of image generators,” ACM Trans. Graph. (TOG), vol. 41, no. 4, pp. 1–13, 2022.

    [28] Y. Nitzan, K. Aberman, Q. He, O. Liba, M. Yarom, Y. Gandelsman, I. Mosseri, Y. Pritch, and D. Cohen-Or, “MyStyle: A personalized generative prior,” ACM Trans. Graph. (TOG), vol. 41, no. 6, pp. 1–10, 2022.

    [29] K. Song, L. Han, B. Liu, D. Metaxas, and A. Elgammal, “Diffusion guided domain adaptation of image generators,” arXiv preprint arXiv:2212.04473, 2022.

    [30] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434, 2015.

    [31] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proc. Int. Conf. Machine Learning (ICML). PMLR, 2017, pp. 214–223.

    [32] M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv preprint arXiv:1411.1784, 2014.

    [33] E. Richardson, Y. Alaluf, O. Patashnik, Y. Nitzan, Y. Azar, S. Shapiro, and D. Cohen-Or, “Encoding in style: A StyleGAN encoder for image-to-image translation,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2021, pp. 2287–2296.

    [34] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1125–1134.

    [35] A. Bochkovskii, A. Delaunoy, H. Germain, M. Santos, Y. Zhou, S. R. Richter, and V. Koltun, “DepthPro: Sharp monocular metric depth in less than a second,” arXiv preprint arXiv:2410.02073, 2024.
    Description: Master's thesis
    National Chengchi University
    Department of Computer Science
    111753159
    Source: http://thesis.lib.nccu.edu.tw/record/#G0111753159
    Type: thesis
    Appears in Collections: [Department of Computer Science] Theses

    Files in This Item:
    315901.pdf (14067 KB, Adobe PDF)

