References

[1] J. Oppenlaender, "A taxonomy of prompt modifiers for text-to-image generation," Behaviour & Information Technology, pp. 1–14, 2023.
[2] L. McInnes, J. Healy, and J. Melville, "Umap: Uniform manifold approximation and projection for dimension reduction," arXiv preprint arXiv:1802.03426, 2018.
[3] J. Oppenlaender, "Prompt engineering for text-to-image synthesis," figshare. Presentation, 2022. [Online]. Available: https://doi.org/10.6084/m9.figshare.18899801.v1
[4] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever, "Zero-shot text-to-image generation," in International Conference on Machine Learning. PMLR, 2021, pp. 8821–8831.
[5] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. L. Denton, K. Ghasemipour, R. Gontijo Lopes, B. Karagol Ayan, T. Salimans et al., "Photorealistic text-to-image diffusion models with deep language understanding," Advances in Neural Information Processing Systems, vol. 35, pp. 36479–36494, 2022.
[6] V. Liu and L. B. Chilton, "Design guidelines for prompt engineering text-to-image generative models," in Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 2022, pp. 1–23.
[7] Z. J. Wang, E. Montoya, D. Munechika, H. Yang, B. Hoover, and D. H. Chau, "Diffusiondb: A large-scale prompt gallery dataset for text-to-image generative models," arXiv preprint arXiv:2210.14896, 2022.
[8] Y. Feng, X. Wang, K. K. Wong, S. Wang, Y. Lu, M. Zhu, B. Wang, and W. Chen, "Promptmagician: Interactive prompt engineering for text-to-image creation," IEEE Transactions on Visualization and Computer Graphics, 2023.
[9] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "GitHub repository for high-resolution image synthesis with latent diffusion models," https://github.com/CompVis/stable-diffusion?tab=readme-ov-file, 2022.
[10] ——, "High-resolution image synthesis with latent diffusion models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10684–10695.
[11] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.
[12] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., "Learning transferable visual models from natural language supervision," in International Conference on Machine Learning. PMLR, 2021, pp. 8748–8763.
[13] S. Hsueh, M. Ciolfi Felice, S. F. Alaoui, and W. E. Mackay, "What counts as 'creative' work? Articulating four epistemic positions in creativity-oriented HCI research," in Proceedings of the CHI Conference on Human Factors in Computing Systems, 2024, pp. 1–15.
[14] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "Bert: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[15] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "Roberta: A robustly optimized bert pretraining approach," arXiv preprint arXiv:1907.11692, 2019.
[16] M. Grootendorst, "Keybert: Minimal keyword extraction with bert," 2020. [Online]. Available: https://doi.org/10.5281/zenodo.4461265
[17] T. Xie, Q. Li, Y. Zhang, Z. Liu, and H. Wang, "Self-improving for zero-shot named entity recognition with large language models," arXiv preprint arXiv:2311.08921, 2023.
[18] X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, S. Narang, A. Chowdhery, and D. Zhou, "Self-consistency improves chain of thought reasoning in language models," arXiv preprint arXiv:2203.11171, 2022.
[19] W. Shao, R. Zhang, P. Ji, D. Fan, Y. Hu, X. Yan, C. Cui, Y. Tao, L. Mi, and L. Chen, "Astronomical knowledge entity extraction in astrophysics journal articles via large language models," Research in Astronomy and Astrophysics, vol. 24, no. 6, p. 065012, 2024.
[20] J. Ke, K. Ye, J. Yu, Y. Wu, P. Milanfar, and F. Yang, "Vila: Learning image aesthetics from user comments with vision-language pretraining," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 10041–10051.
[21] J. Xu, X. Liu, Y. Wu, Y. Tong, Q. Li, M. Ding, J. Tang, and Y. Dong, "Imagereward: Learning and evaluating human preferences for text-to-image generation," Advances in Neural Information Processing Systems, vol. 36, 2024.
[22] StabilityAI, "Stable diffusion dream studio beta terms of service," https://stability.ai/stablediffusion-terms-of-service, 2022, accessed: 2024-03-17.
[23] W. Wang, H. Bao, S. Huang, L. Dong, and F. Wei, "Minilmv2: Multi-head self-attention relation distillation for compressing pretrained transformers," arXiv preprint arXiv:2012.15828, 2020.
[24] M. Honnibal, I. Montani, S. Van Landeghem, and A. Boyd, "spaCy: Industrial-strength natural language processing in Python," 2020. [Online]. Available: https://doi.org/10.5281/zenodo.1212303
[25] N. Reimers, "Sentence-bert: Sentence embeddings using siamese bert-networks," arXiv preprint arXiv:1908.10084, 2019.
[26] L. Van der Maaten and G. Hinton, "Visualizing data using t-sne," Journal of Machine Learning Research, vol. 9, no. 11, 2008.
[27] L. Derczynski, E. Nichols, M. Van Erp, and N. Limsopatham, "Results of the wnut2017 shared task on novel and emerging entity recognition," in Proceedings of the 3rd Workshop on Noisy User-generated Text, 2017, pp. 140–147.
[28] J. Hessel, A. Holtzman, M. Forbes, R. L. Bras, and Y. Choi, "Clipscore: A reference-free evaluation metric for image captioning," arXiv preprint arXiv:2104.08718, 2021.
[29] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[30] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., "Language models are few-shot learners," Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.