    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/156487


    Title: 基於文字生成圖片之擴散模型的視覺化輔助設計系統
    Visualization-Assisted Design System Based on Text-to-Image Diffusion Model
    Authors: 王良文
    Wang, Liang-Wen
    Contributors: 紀明德
    Chi, Ming-Te
    王良文
    Wang, Liang-Wen
    Keywords: 提示工程
    視覺化
    擴散模型
    文字到圖像生成模型
    命名實體識別
    Prompt Engineering
    Visualization
    Diffusion Models
    Text-to-Image Generation
    Named Entity Recognition
    Date: 2025
    Issue Date: 2025-04-01 12:27:33 (UTC+8)
    Abstract: 近年來,擴散模型顯著提升了文本生成圖像技術的品質,讓使用者能以提示詞創造出高關聯度且前所未見的圖像。然而,生成結果深受提示詞選擇影響,導致初學者難以掌握有效的提示詞設計。為此,本研究提出一套基於視覺化的輔助設計系統,協助使用者理解提示詞與圖像生成之間的關係,並提供優化的提示詞建議。我們利用 DiffusionDB 數據集並結合自然語言處理技術,分析提示詞語義,並運用 UMAP 將高維度提示詞關聯投影至直觀的二維視覺化空間。透過系統的動態迭代機制,使用者可隨時調整提示詞並即時觀察圖像變化,從而獲得創意啟發並創作出多樣的圖像。為了提供更多元的提示詞選擇,我們比較使用者輸入的提示詞與 DiffusionDB 的語義相似度,並進一步探討在標註實體任務中,GPT 模型在不同提示詞組合下的穩定度,以提升提示詞建議系統的可靠性。
    In recent years, diffusion models have greatly improved text-to-image generation, allowing users to produce highly relevant and novel images through prompts. However, the output depends heavily on the choice of prompt, which makes effective prompt design difficult for beginners. This study introduces a visualization-based assistive design system that helps users understand the relationship between prompts and generated images and offers optimized prompt suggestions. The system leverages the DiffusionDB dataset together with natural language processing techniques to analyze prompt semantics, and uses UMAP to project high-dimensional prompt relationships onto an intuitive two-dimensional visualization. Through an iterative refinement loop, users can adjust prompts at any time and observe the resulting images in real time, gaining creative inspiration for diverse outputs. To offer a wider range of prompt choices, the system compares the user's prompt with DiffusionDB entries by semantic similarity. In addition, we examine the stability of GPT models under different prompt combinations in named entity annotation tasks, to improve the reliability of the prompt recommendation system.
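    The suggestion step described in the abstract, comparing a user's prompt against DiffusionDB entries by semantic similarity, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the actual system embeds prompts with a sentence-embedding model, whereas here hand-made toy vectors and the names `gallery` and `suggest_prompts` are hypothetical stand-ins.

    ```python
    from math import sqrt

    def cosine(u, v):
        """Cosine similarity between two equal-length vectors."""
        dot = sum(a * b for a, b in zip(u, v))
        nu = sqrt(sum(a * a for a in u))
        nv = sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    def suggest_prompts(user_vec, gallery, k=2):
        """Rank gallery prompts by similarity to the user's prompt vector."""
        ranked = sorted(gallery.items(),
                        key=lambda kv: cosine(user_vec, kv[1]),
                        reverse=True)
        return [prompt for prompt, _ in ranked[:k]]

    # Toy 3-d "embeddings" standing in for real sentence embeddings.
    gallery = {
        "castle, oil painting":    [0.9, 0.1, 0.0],
        "castle, photorealistic":  [0.7, 0.0, 0.3],
        "cat, watercolor":         [0.0, 1.0, 0.1],
    }
    print(suggest_prompts([1.0, 0.0, 0.1], gallery))
    # → ['castle, oil painting', 'castle, photorealistic']
    ```

    In the full system, the same ranking would feed the 2D UMAP view, so nearby points in the visualization correspond to semantically similar prompts.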
    Description: Master's thesis
    National Chengchi University
    Department of Computer Science
    111753152
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0111753152
    Data Type: thesis
    Appears in Collections: [Department of Computer Science] Theses

    Files in This Item:

    File: 315201.pdf (14,358 KB, Adobe PDF)


    All items in 政大典藏 are protected by copyright, with all rights reserved.



    著作權政策宣告 Copyright Announcement
    1.本網站之數位內容為國立政治大學所收錄之機構典藏,無償提供學術研究與公眾教育等公益性使用,惟仍請適度,合理使用本網站之內容,以尊重著作權人之權益。商業上之利用,則請先取得著作權人之授權。
    The digital content of this website is part of National Chengchi University Institutional Repository. It provides free access to academic research and public education for non-commercial use. Please utilize it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2.本網站之製作,已盡力防止侵害著作權人之權益,如仍發現本網站之數位內容有侵害著作權人權益情事者,請權利人通知本網站維護人員(nccur@nccu.edu.tw),維護人員將立即採取移除該數位著作等補救措施。
    The NCCU Institutional Repository endeavors to protect the rights of copyright owners. If you believe that any material on this website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.