Please use this persistent URL to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/159411
Title: Stable Diffusion用於人像構圖轉換:從我到迷因 (Stable Diffusion used for Portrait Layout Transformation: From Me to Meme)
Author: Li, Chun-An (李峻安)
Contributors: Chi, Ming-Te (紀明德); Li, Chun-An (李峻安)
Keywords: Stable Diffusion; Portrait Style; Meme; Image Composition
Date: 2025
Upload time: 2025-09-01 16:56:43 (UTC+8)
Abstract: In internet culture, memes are rapidly spreading content such as images, videos, or phrases, characterized by high imitability, variability, and a sense of community identity. They often incorporate humor, satire, or topical elements, enabling users to resonate with them quickly and create derivative versions. This study addresses the failure of face detection in meme images caused by deformation. We propose a transformation method based on Stable Diffusion and image segmentation that smooths deformation effects while preserving the key identifying features of the original image, thereby improving face detection accuracy and ensuring visually consistent outputs. Personalized image synthesis has seen significant progress with methods such as InstantID and LoRA; however, their real-world application is limited by the difficulty of detecting faces in memes. Combining a portrait style with a meme's composition also faces practical barriers, including technical expertise, lengthy model fine-tuning, and image screening or preprocessing. To address these challenges, this study introduces Me2Meme, a diffusion-based solution, and evaluates its effectiveness and practicality, offering new insights and applications for cross-disciplinary work in art and image processing.
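The abstract describes a workflow in which a diffusion model smooths meme deformation so that standard face detectors succeed again. As a rough illustration only, and not the thesis's actual Me2Meme implementation, the sketch below runs a gentle Stable Diffusion img2img pass over a deformed meme portrait and then re-checks face detectability with an off-the-shelf detector; the model ID, prompt, strength value, and Haar-cascade detector are all illustrative assumptions.

```python
# Hypothetical sketch (NOT the thesis's Me2Meme pipeline): run a gentle
# Stable Diffusion img2img pass over a deformed meme portrait to smooth the
# deformation, then re-check whether a standard face detector succeeds.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

def count_faces(image: Image.Image) -> int:
    """Count faces with OpenCV's Haar cascade (a stand-in detector)."""
    gray = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    return len(cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5))

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

meme = Image.open("deformed_meme.png").convert("RGB").resize((512, 512))
print("faces detected before:", count_faces(meme))

# Low strength keeps the meme's composition and identity cues; the prompt
# only nudges the model toward an undistorted face.
smoothed = pipe(
    prompt="a natural, undistorted human face, same person, same composition",
    image=meme,
    strength=0.35,
    guidance_scale=7.0,
).images[0]

print("faces detected after:", count_faces(smoothed))
smoothed.save("smoothed_meme.png")
```

The thesis additionally relies on image segmentation to localize the correction; in a sketch like this, a segmentation mask could instead be passed to an inpainting pipeline so that only the face region is edited while the rest of the meme composition is left untouched.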
Description: Master's thesis, National Chengchi University, Department of Computer Science, 111753222
Source: http://thesis.lib.nccu.edu.tw/record/#G0111753222
Type: thesis
Appears in collections: [Department of Computer Science] Theses
Files in this item:
File | Size | Format | Views
322201.pdf | 7868 KB | Adobe PDF | 0
All items in NCCUR (the NCCU Institutional Repository) are protected by copyright.