Please use this persistent URL to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/159411
Title: Stable Diffusion用於人像構圖轉換:從我到迷因 (Stable Diffusion used for Portrait Layout Transformation: From Me to Meme)
Author: Li, Chun-An (李峻安)
Contributors: Chi, Ming-Te (紀明德); Li, Chun-An (李峻安)
Keywords: Stable Diffusion; Portrait Style; Meme; Image Composition
Date: 2025
Upload time: 2025-09-01 16:56:43 (UTC+8)
Abstract: In internet culture, memes are rapidly spreading content such as images, videos, or phrases, characterized by high imitability, variability, and a sense of community identity. They often incorporate humor, satire, or topical elements, enabling users to resonate with them quickly and create derivative versions. This study addresses the failure of face detection in meme images caused by deformation. We propose a transformation method based on Stable Diffusion and image segmentation that smooths deformation effects while preserving the key identifying features of the original image, thereby improving face detection accuracy and ensuring visually consistent outputs. Personalized image synthesis has seen significant progress with methods such as InstantID and LoRA; however, their real-world application is limited by the difficulty of detecting faces in memes. Combining a portrait style with a meme's composition also faces practical barriers, including technical expertise, lengthy model fine-tuning, and image screening or preprocessing. To address these challenges, this study introduces Me2Meme, a diffusion-based solution, and evaluates its effectiveness and practicality, offering new insights and applications for cross-disciplinary work in art and image processing.
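The abstract describes a workflow in which a diffusion model smooths meme deformation so that standard face detectors succeed again. As a rough illustration only, and not the thesis's actual Me2Meme implementation, the sketch below runs a gentle Stable Diffusion img2img pass over a deformed meme portrait and then re-checks face detectability with an off-the-shelf detector; the model ID, prompt, strength value, and Haar-cascade detector are all illustrative assumptions.

```python
# Hypothetical sketch (NOT the thesis's Me2Meme pipeline): run a gentle
# Stable Diffusion img2img pass over a deformed meme portrait to smooth the
# deformation, then re-check whether a standard face detector succeeds.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

def count_faces(image: Image.Image) -> int:
    """Count faces with OpenCV's Haar cascade (a stand-in detector)."""
    gray = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    return len(cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5))

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

meme = Image.open("deformed_meme.png").convert("RGB").resize((512, 512))
print("faces detected before:", count_faces(meme))

# Low strength keeps the meme's composition and identity cues; the prompt
# only nudges the model toward an undistorted face.
smoothed = pipe(
    prompt="a natural, undistorted human face, same person, same composition",
    image=meme,
    strength=0.35,
    guidance_scale=7.0,
).images[0]

print("faces detected after:", count_faces(smoothed))
smoothed.save("smoothed_meme.png")
```

The thesis additionally relies on image segmentation to localize the correction; in a sketch like this, a segmentation mask could instead be passed to an inpainting pipeline so that only the face region is edited while the rest of the meme composition is left untouched.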
Description: Master's thesis, National Chengchi University, Department of Computer Science, 111753222
Source: http://thesis.lib.nccu.edu.tw/record/#G0111753222
Type: thesis
Appears in collections: [Department of Computer Science] Theses
Files in this item:
File | Size | Format | Views
322201.pdf | 7868 KB | Adobe PDF | 0
All items in NCCUR (the NCCU Institutional Repository) are protected by copyright.