National Chengchi University Institutional Repository (NCCUR): Item 140.119/151504
RC Version 6.0 © Powered By DSPACE, MIT. Enhanced by NTU Library IR team.


    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/151504


    Title: Text-to-dance mechanism using human pose estimation and stable diffusion
    (Chinese title: 結合肢體動作識別及擴散模型的文字生成舞蹈機制)
    Author: Hung, Chien-Ting (洪健庭)
    Contributors: Liao, Wen-Hung (廖文宏); Hung, Chien-Ting (洪健庭)
    Keywords: Deep Learning; Human Pose Recognition; Generative AI; Text-to-Dance
    Date: 2024
    Uploaded: 2024-06-03 11:42:54 (UTC+8)
    Abstract:
    Pose estimation is an important problem in computer vision: it locates the coordinates of a person's skeletal joints (shoulders, elbows, wrists, and so on) in images and videos. The detected joints show where the person is in the frame and can also be used to infer what action the person is performing.
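To make the pose-estimation output concrete, here is a minimal Python sketch that assumes a detector has already returned 2D pixel coordinates for a few named joints. The joint names, the dictionary format, and the `arm_raised` rule are hypothetical illustrations, not OpenPose's actual output format or the thesis's method; they only show how joint coordinates yield a position (a bounding box) and simple cues for action recognition (a joint angle).

```python
import math
from typing import Dict, Tuple

# Hypothetical format: joint name -> (x, y) pixel coordinates, with y pointing down.
Point = Tuple[float, float]
Skeleton = Dict[str, Point]


def bounding_box(joints: Skeleton) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) enclosing all detected joints."""
    xs = [x for x, _ in joints.values()]
    ys = [y for _, y in joints.values()]
    return min(xs), min(ys), max(xs), max(ys)


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at joint b between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("coincident joints: angle is undefined")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def arm_raised(joints: Skeleton, side: str = "right") -> bool:
    """Hypothetical cue: wrist above shoulder (smaller y in image coordinates)."""
    return joints[f"{side}_wrist"][1] < joints[f"{side}_shoulder"][1]


if __name__ == "__main__":
    pose = {
        "right_shoulder": (100.0, 100.0),
        "right_elbow": (100.0, 150.0),
        "right_wrist": (150.0, 150.0),
    }
    print(bounding_box(pose))
    print(joint_angle(pose["right_shoulder"], pose["right_elbow"], pose["right_wrist"]))
    print(arm_raised(pose))
```

Practical systems usually feed sequences of such skeletons to a learned classifier rather than hand-written rules; the rule here only illustrates that joint coordinates carry information about the action.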
    Diffusion models have attracted wide attention in recent years, above all for their performance in AI-generated content (AIGC). Many text-to-image systems, including DALL·E, Imagen, Midjourney, and Stable Diffusion, are built on diffusion models, and these models also perform strongly on generative tasks beyond image synthesis.
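As background on how diffusion models work, the sketch below implements the forward (noising) process of the DDPM formulation in plain Python: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, where ᾱ_t is the running product of (1 − β_s) for s ≤ t and ε is standard Gaussian noise. A trained network learns to reverse this process, typically by predicting ε, so that new samples can be generated starting from pure noise. The linear β schedule from 1e-4 to 0.02 over 1000 steps is the one used in the original DDPM paper and is shown only for illustration; Stable Diffusion runs a process of this kind in the latent space of an autoencoder and may use a different schedule. The function names are hypothetical.

```python
import math
import random
from typing import List, Optional


def linear_beta_schedule(T: int = 1000, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Noise variances beta_1..beta_T increasing linearly (DDPM-style schedule)."""
    if T < 2:
        raise ValueError("T must be at least 2")
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s); list index 0 holds alpha_bar_1."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0: List[float], t: int, alpha_bars: List[float],
             eps: Optional[List[float]] = None, seed: int = 0) -> List[float]:
    """Draw x_t ~ q(x_t | x_0) in closed form (t is a 0-based index into alpha_bars)."""
    if eps is None:
        rng = random.Random(seed)
        eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]


if __name__ == "__main__":
    abar = cumulative_alphas(linear_beta_schedule())
    x0 = [1.0, -0.5, 0.25]  # toy "image" with three pixel values
    for t in (0, 499, 999):
        print(t, round(abar[t], 5), [round(v, 3) for v in q_sample(x0, t, abar)])
```

As t grows, ᾱ_t shrinks toward zero, so x_t is dominated by noise; the closed form lets training jump to any step t without simulating the steps before it.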
    This thesis uses Stable Diffusion and OpenPose to generate fluent dance movements. Within the proposed framework, Stable Diffusion generates custom character appearances from user-defined text and produces an ordered sequence of unit dance movements; these units are then joined with linear transformations into a coherent full dance sequence. OpenPose performs pose estimation on the continuous dance movements. Together, the two components allow the character's appearance and the order of the dance movements to be configured freely.
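The abstract says unit dance movements are joined with linear transformations. One plausible reading, assumed here and not confirmed by the abstract, is linear interpolation of 2D keypoints: in-between frames are blended from the last pose of one unit to the first pose of the next. The pose representation, the function names, and the default of 8 in-between frames are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: a pose is a list of (x, y) keypoints in a fixed
# joint order; a unit dance movement is a list of poses, one per frame.
Pose = List[Tuple[float, float]]


def lerp_pose(a: Pose, b: Pose, s: float) -> Pose:
    """Linearly interpolate every keypoint: (1 - s) * a + s * b, for s in [0, 1]."""
    if len(a) != len(b):
        raise ValueError("poses must have the same number of keypoints")
    return [((1 - s) * xa + s * xb, (1 - s) * ya + s * yb)
            for (xa, ya), (xb, yb) in zip(a, b)]


def transition(a: Pose, b: Pose, n_frames: int) -> List[Pose]:
    """n_frames in-between poses strictly between a and b (endpoints excluded)."""
    return [lerp_pose(a, b, k / (n_frames + 1)) for k in range(1, n_frames + 1)]


def concatenate_units(units: List[List[Pose]], n_frames: int = 8) -> List[Pose]:
    """Join unit movements in order, bridging each boundary with interpolated frames."""
    sequence: List[Pose] = []
    for unit in units:
        if not unit:
            continue  # skip empty units
        if sequence:
            sequence.extend(transition(sequence[-1], unit[0], n_frames))
        sequence.extend(unit)
    return sequence


if __name__ == "__main__":
    unit_a = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 5.0), (10.0, 5.0)]]
    unit_b = [[(20.0, 25.0), (30.0, 25.0)]]
    dance = concatenate_units([unit_a, unit_b], n_frames=3)
    print(len(dance))  # 2 + 3 + 1 = 6
```

Linear interpolation moves each joint at constant speed along a straight line across the boundary, which is simple and predictable but can look mechanical for large pose changes.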
    By combining these components, the proposed text-to-dance method introduces a new workflow for video production and makes it easier to choose characters, scenes, and character movements during production. Traditionally, every frame had to be drawn by hand or performed by real people following the set choreography, and changing the character meant repeating that work; compared with such approaches, the proposed method saves many steps and much time. It also broadens the scope of image-generation research and, together with AIGC techniques, offers a viable solution for practical applications.
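    Description: Master's thesis
    National Chengchi University
    In-service Master's Program, Department of Computer Science
    110971024
    Source: http://thesis.lib.nccu.edu.tw/record/#G0110971024
    Type: thesis
    Appears in collection: [In-service Master's Program, Department of Computer Science] Theses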

    Files in this item:
    102401.pdf (Adobe PDF, 4608 Kb)


    All items in NCCUR are protected by their original copyright.



    Copyright Announcement
    1. The digital content of this website is part of the National Chengchi University Institutional Repository and is provided free of charge for non-commercial uses such as academic research and public education. Please use it appropriately and reasonably, and respect the rights of copyright owners. For commercial use, obtain authorization from the copyright owner in advance.

    2. Every effort has been made to avoid infringing the rights of copyright owners in producing this website. If you believe that any material on the website infringes copyright, please contact the repository staff (nccur@nccu.edu.tw); we will promptly take remedial action, such as removing the work from the repository, and investigate your claim.