References

[1] J. Oppenlaender, "A taxonomy of prompt modifiers for text-to-image generation," Behaviour & Information Technology, pp. 1–14, 2023.
[2] L. McInnes, J. Healy, and J. Melville, "Umap: Uniform manifold approximation and projection for dimension reduction," arXiv preprint arXiv:1802.03426, 2018.
[3] J. Oppenlaender, "Prompt engineering for text-to-image synthesis," figshare. Presentation, 2022. [Online]. Available: https://doi.org/10.6084/m9.figshare.18899801.v1
[4] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever, "Zero-shot text-to-image generation," in International Conference on Machine Learning. PMLR, 2021, pp. 8821–8831.
[5] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. L. Denton, K. Ghasemipour, R. Gontijo Lopes, B. Karagol Ayan, T. Salimans et al., "Photorealistic text-to-image diffusion models with deep language understanding," Advances in Neural Information Processing Systems, vol. 35, pp. 36479–36494, 2022.
[6] V. Liu and L. B. Chilton, "Design guidelines for prompt engineering text-to-image generative models," in Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 2022, pp. 1–23.
[7] Z. J. Wang, E. Montoya, D. Munechika, H. Yang, B. Hoover, and D. H. Chau, "Diffusiondb: A large-scale prompt gallery dataset for text-to-image generative models," arXiv preprint arXiv:2210.14896, 2022.
[8] Y. Feng, X. Wang, K. K. Wong, S. Wang, Y. Lu, M. Zhu, B. Wang, and W. Chen, "Promptmagician: Interactive prompt engineering for text-to-image creation," IEEE Transactions on Visualization and Computer Graphics, 2023.
[9] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "GitHub repository for high-resolution image synthesis with latent diffusion models," https://github.com/CompVis/stable-diffusion?tab=readme-ov-file, 2022.
[10] ——, "High-resolution image synthesis with latent diffusion models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10684–10695.
[11] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.
[12] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., "Learning transferable visual models from natural language supervision," in International Conference on Machine Learning. PMLR, 2021, pp. 8748–8763.
[13] S. Hsueh, M. Ciolfi Felice, S. F. Alaoui, and W. E. Mackay, "What counts as 'creative' work? Articulating four epistemic positions in creativity-oriented HCI research," in Proceedings of the CHI Conference on Human Factors in Computing Systems, 2024, pp. 1–15.
[14] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "Bert: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[15] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "Roberta: A robustly optimized bert pretraining approach," arXiv preprint arXiv:1907.11692, 2019.
[16] M. Grootendorst, "Keybert: Minimal keyword extraction with bert," 2020. [Online]. Available: https://doi.org/10.5281/zenodo.4461265
[17] T. Xie, Q. Li, Y. Zhang, Z. Liu, and H. Wang, "Self-improving for zero-shot named entity recognition with large language models," arXiv preprint arXiv:2311.08921, 2023.
[18] X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, S. Narang, A. Chowdhery, and D. Zhou, "Self-consistency improves chain of thought reasoning in language models," arXiv preprint arXiv:2203.11171, 2022.
[19] W. Shao, R. Zhang, P. Ji, D. Fan, Y. Hu, X. Yan, C. Cui, Y. Tao, L. Mi, and L. Chen, "Astronomical knowledge entity extraction in astrophysics journal articles via large language models," Research in Astronomy and Astrophysics, vol. 24, no. 6, p. 065012, 2024.
[20] J. Ke, K. Ye, J. Yu, Y. Wu, P. Milanfar, and F. Yang, "Vila: Learning image aesthetics from user comments with vision-language pretraining," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 10041–10051.
[21] J. Xu, X. Liu, Y. Wu, Y. Tong, Q. Li, M. Ding, J. Tang, and Y. Dong, "Imagereward: Learning and evaluating human preferences for text-to-image generation," Advances in Neural Information Processing Systems, vol. 36, 2024.
[22] StabilityAI, "Stable diffusion dream studio beta terms of service," https://stability.ai/stablediffusion-terms-of-service, 2022, accessed: 2024-03-17.
[23] W. Wang, H. Bao, S. Huang, L. Dong, and F. Wei, "Minilmv2: Multi-head self-attention relation distillation for compressing pretrained transformers," arXiv preprint arXiv:2012.15828, 2020.
[24] M. Honnibal, I. Montani, S. Van Landeghem, and A. Boyd, "spaCy: Industrial-strength natural language processing in Python," 2020. [Online]. Available: https://doi.org/10.5281/zenodo.1212303
[25] N. Reimers, "Sentence-bert: Sentence embeddings using siamese bert-networks," arXiv preprint arXiv:1908.10084, 2019.
[26] L. Van der Maaten and G. Hinton, "Visualizing data using t-sne," Journal of Machine Learning Research, vol. 9, no. 11, 2008.
[27] L. Derczynski, E. Nichols, M. Van Erp, and N. Limsopatham, "Results of the wnut2017 shared task on novel and emerging entity recognition," in Proceedings of the 3rd Workshop on Noisy User-generated Text, 2017, pp. 140–147.
[28] J. Hessel, A. Holtzman, M. Forbes, R. L. Bras, and Y. Choi, "Clipscore: A reference-free evaluation metric for image captioning," arXiv preprint arXiv:2104.08718, 2021.
[29] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[30] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., "Language models are few-shot learners," Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.