    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/151504


Title: Text-to-dance mechanism using human pose estimation and stable diffusion
Authors: 洪健庭 (Hung, Chien-Ting)
Contributors: 廖文宏 (Liao, Wen-Hung); 洪健庭 (Hung, Chien-Ting)
Keywords: Deep Learning
Human Pose Recognition
Generative AI
Text-to-Dance
    Date: 2024
    Issue Date: 2024-06-03 11:42:54 (UTC+8)
Abstract: Pose estimation is an important problem in computer vision. It involves capturing the coordinates of a person's skeletal joints (shoulders, elbows, wrists, and so on) in images and videos. This not only locates the person in the image but also makes it possible to predict what action the person is performing from the recognized joints.
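To make the joint-extraction step concrete, here is a minimal sketch of recovering 2D joint coordinates from a single frame. The thesis uses OpenPose; since its exact setup is not given in this record, MediaPipe Pose is used below as a stand-in, and the file name dancer.jpg is hypothetical.

import cv2
import mediapipe as mp

# Read one frame of a dance video (hypothetical file name).
image = cv2.imread("dancer.jpg")

# Run a single-image pose estimator; MediaPipe stands in for OpenPose here.
with mp.solutions.pose.Pose(static_image_mode=True) as pose:
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.pose_landmarks:
    h, w, _ = image.shape
    for idx, lm in enumerate(results.pose_landmarks.landmark):
        # Landmarks are normalized to [0, 1]; scale them to pixel coordinates.
        print(idx, int(lm.x * w), int(lm.y * h))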
In recent years, diffusion models have gained significant attention, particularly for their impressive performance in AI-generated content (AIGC). Many text-to-image applications, including DALL·E, Imagen, Midjourney, and Stable Diffusion, are based on diffusion models. These models perform strongly not only on image generation but also on many other generative tasks.
This thesis explores the combined use of Stable Diffusion and OpenPose to generate fluent dance motion. The former generates a custom character appearance and an ordered set of unit-level dance movements from text input; these unit movements are then concatenated with linear transformations into a coherent overall dance sequence. The latter performs pose estimation on the continuous dance movements. Together, this framework allows free configuration of the character's appearance and of the ordering of the dance movements.
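The "linear transformation" used to stitch unit movements together can be read as linear interpolation between poses (cf. reference [23]). The sketch below is a minimal illustration under that assumption: it blends the last pose of one unit movement into the first pose of the next. The 18-joint, 2D keypoint layout is a hypothetical placeholder, not the thesis's actual format.

import numpy as np

def lerp_poses(pose_a, pose_b, steps):
    """Return `steps` intermediate poses blending pose_a into pose_b."""
    ts = np.linspace(0.0, 1.0, steps)
    return np.stack([(1.0 - t) * pose_a + t * pose_b for t in ts])

# Bridge the last pose of clip A into the first pose of clip B with
# 8 in-between frames; arrays are (num_joints, 2) pixel coordinates.
end_of_clip_a = np.random.rand(18, 2)    # placeholder keypoints
start_of_clip_b = np.random.rand(18, 2)  # placeholder keypoints
transition = lerp_poses(end_of_clip_a, start_of_clip_b, steps=8)
print(transition.shape)  # (8, 18, 2)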
Combining these components, the proposed text-to-dance method not only introduces a new mode of production to the video-making field but also makes it easier to choose characters, scenes, and character actions during production. Previously, every frame had to be drawn by hand, or a real performer had to act out the prescribed movements; when the character also needs to be swapped, the proposed method saves many steps and much time compared with traditional approaches. The method thus broadens the scope of video-generation research and, together with AIGC techniques, offers a viable solution for practical applications.
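As a rough illustration of how text-defined appearance and pose conditioning can be combined (in the spirit of references [5] and [6]), here is a hedged sketch using the Hugging Face diffusers library. The record does not name the exact tooling or checkpoints the thesis used; the model IDs, prompt, and file names below are illustrative assumptions.

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Publicly available checkpoints chosen for illustration; the thesis's
# actual models are not specified in this record.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# One skeleton frame from the interpolated dance sequence (hypothetical file).
pose_map = load_image("pose_frame.png")

# The text prompt sets the character's appearance and the scene.
frame = pipe(
    "a dancer in a red dress, studio lighting",
    image=pose_map,
    num_inference_steps=20,
).images[0]
frame.save("dance_frame.png")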
Reference: [1] Cao, Y., Li, S., Liu, Y., Yan, Z., Dai, Y., Yu, P. S., & Sun, L. (2023). A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT. arXiv preprint arXiv:2303.04226.
    [2] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. Advances in neural information processing systems, 27.
    [3] Cao, Z., Simon, T., Wei, S. E., & Sheikh, Y. (2017). Realtime multi-person 2d pose estimation using part affinity fields. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7291-7299).
    [4] Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in neural information processing systems, 33, 6840-6851.
    [5] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10684-10695).
    [6] Zhang, L., & Agrawala, M. (2023). Adding conditional control to text-to-image diffusion models. arXiv preprint arXiv:2302.05543.
[7] Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., ... & Chen, W. (2021). LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
    [8] Li, R., Yang, S., Ross, D. A., & Kanazawa, A. (2021). Ai choreographer: Music conditioned 3d dance generation with aist++. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 13401-13412).
    [9] Tseng, J., Castellon, R., & Liu, K. (2023). Edge: Editable dance generation from music. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 448-458).
    [10] Tevet, G., Raab, S., Gordon, B., Shafir, Y., Cohen-Or, D., & Bermano, A. H. (2022). Human motion diffusion model. arXiv preprint arXiv:2209.14916.
    [11] Wang, T., Li, L., Lin, K., Lin, C. C., Yang, Z., Zhang, H., ... & Wang, L. (2023). DisCo: Disentangled Control for Referring Human Dance Generation in Real World. arXiv preprint arXiv:2307.00040.
    [12] Zhang, M., Guo, X., Pan, L., Cai, Z., Hong, F., Li, H., ... & Liu, Z. (2023). ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model. arXiv preprint arXiv:2304.01116.
    [13] Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
    [14] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. Advances in neural information processing systems, 27.
    [15] Song, J., Meng, C., & Ermon, S. (2020). Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502.
    [16] Esser, P., Rombach, R., & Ommer, B. (2021). Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12873-12883).
    [17] Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18 (pp. 234-241). Springer International Publishing.
    [18] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.
    [19] Wold, S., Esbensen, K., & Geladi, P. (1987). Principal component analysis. Chemometrics and intelligent laboratory systems, 2(1-3), 37-52.
[20] Hung-yi Lee, Machine Learning 2023 (Generative AI) lecture series.
https://youtube.com/playlist?list=PLJV_el3uVTsOePyfmkfivYZ7Rqr2nMk3W&si=bLQJWEJsVmMG1HL3
    [21] Hugging Face – The AI community building the future.
    https://huggingface.co/
    [22] Civitai | Stable Diffusion models, embeddings, LoRAs and more
    https://civitai.com/
[23] Wikipedia: Linear interpolation
    https://en.wikipedia.org/wiki/Linear_interpolation
Description: Master's thesis
National Chengchi University
Executive Master's Program in Computer Science
110971024
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0110971024
    Data Type: thesis
Appears in Collections: [Executive Master's Program in Computer Science] Theses

    Files in This Item:

102401.pdf (4608 KB, Adobe PDF)


All items in 政大典藏 (NCCU Institutional Repository) are protected by copyright, with all rights reserved.

