    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/160072


    Title: 結合關鍵目標擾動增強浮水印技術以抵抗語音克隆攻擊
    Enhancement Watermark with Pivotal Objective Perturbation against Voice Cloning Attack
    Authors: Lin, Bo-Han (林柏含)
    Contributors: Hu, Yuh-Jong (胡毓忠); Lin, Bo-Han (林柏含)
    Keywords: Voice cloning (語音克隆); Anti-spoofing (反電子欺騙); Digital watermarking (數位浮水印); Pivotal objective perturbation noise (關鍵目標擾動雜訊)
    Date: 2025
    Issue Date: 2025-11-03 14:45:04 (UTC+8)
    Abstract: In recent years, rapid advances in generative AI models have driven breakthroughs across many applications. Among them, generative speech synthesis can now imitate human voices with very high fidelity. This rapid progress, however, also brings serious security risks, including identity theft, fraud, and the spread of manipulative content. These threats endanger individual privacy and can have far-reaching effects on social trust and public safety.
    This study proposes a multi-layered defense strategy that combines digital watermarking with pivotal objective perturbation noise to counter these risks. The digital watermark verifies whether audio content is authorized or has been tampered with, while the pivotal objective perturbation noise prevents voice characteristics from being extracted, thereby blocking unauthorized speech generation.
    Experimental results show that with both defense mechanisms combined, watermark detection remains highly reliable, reaching an AUC of 0.993, unaffected by the perturbation noise. Moreover, when a speech synthesis model attempts to generate speech from the protected audio, it fails to produce intelligible output, with a word error rate (WER) as high as 1.01, meaning the generated speech is essentially unintelligible.
    Overall, the two defense mechanisms operate together without degrading each other's performance, providing effective protection for the secure application of speech synthesis technology.
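    As context for the watermarking half of the defense, a minimal spread-spectrum sketch shows how a low-amplitude mark can be embedded and then detected by correlation. This is a classic illustrative technique, not the thesis's actual scheme; the key (7) and amplitude (0.005) are arbitrary assumptions.

    ```python
    import numpy as np

    # Spread-spectrum idea: add a secret pseudo-random ±1 sequence at low
    # amplitude; correlating suspect audio with the same key-derived sequence
    # reveals whether the mark is present.

    def _key_sequence(key: int, n: int) -> np.ndarray:
        return np.random.default_rng(key).choice([-1.0, 1.0], size=n)

    def embed(x: np.ndarray, key: int = 7, alpha: float = 0.005) -> np.ndarray:
        return x + alpha * _key_sequence(key, x.size)

    def detect(x: np.ndarray, key: int = 7) -> float:
        return float(np.dot(x, _key_sequence(key, x.size)) / x.size)  # correlation score

    audio = np.random.default_rng(42).uniform(-0.5, 0.5, 16000)  # placeholder audio
    marked = embed(audio)
    print(detect(marked) > detect(audio))  # True: the mark lifts the correlation by alpha
    ```

    Because the ±1 sequence is roughly orthogonal to the host signal, the correlation of unmarked audio hovers near zero while marked audio shifts by exactly alpha, which is what makes threshold-based detection (and AUC evaluation) possible.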
    Recent advances in generative AI have driven breakthroughs in applications, particularly in speech synthesis, which can accurately mimic human voices. However, this progress raises serious security concerns, including identity theft, fraud, and manipulated content, threatening individual privacy and societal trust.
    This study proposes a dual defense strategy combining digital watermarking and targeted perturbation noise. Watermarking verifies audio authenticity, while perturbation prevents unauthorized voice synthesis.
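    The perturbation half of the defense follows the spirit of adversarial examples: make an imperceptibly small, bounded change to the waveform chosen to move what a model extracts from it. A minimal sketch of that shape, using a gradient-sign step against a hypothetical linear stand-in encoder `W` (the thesis's pivotal objective perturbation targets real synthesis models):

    ```python
    import numpy as np

    # Everything here is a toy assumption: `W` plays the role of a
    # differentiable speaker encoder so the gradient can be written in
    # closed form.

    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 16000)) * 1e-3     # hypothetical linear "encoder"

    def encode(x: np.ndarray) -> np.ndarray:
        return W @ x                               # stand-in speaker embedding

    def perturb(x: np.ndarray, eps: float = 0.002) -> np.ndarray:
        """One FGSM-style ascent step on ||encode(x') - encode(x)||^2."""
        e0 = encode(x)
        delta = rng.standard_normal(x.shape) * 1e-6   # tiny offset (gradient is 0 at x itself)
        grad = 2.0 * W.T @ (encode(x + delta) - e0)   # analytic gradient of the loss
        return np.clip(x + eps * np.sign(grad), -1.0, 1.0)

    x = rng.uniform(-0.5, 0.5, 16000)              # 1 s of placeholder 16 kHz audio
    x_protected = perturb(x)
    print(float(np.max(np.abs(x_protected - x))))  # bounded by eps = 0.002
    ```

    The sign step bounds each sample's change by eps, which is why such perturbations can stay inaudible while still shifting the embedding a cloning model would rely on.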
    Results show that watermark detection remains reliable (AUC 0.9993) despite the perturbation, and voices synthesized from protected audio are unintelligible (WER 1.01). The two mechanisms coexist effectively, providing robust protection for speech synthesis applications.
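    A WER of 1.01 exceeds 1.0, which is possible because insertions count against the reference word count. A standard word-level edit-distance WER, sketched below, makes that concrete:

    ```python
    # WER = (substitutions + deletions + insertions) / reference word count.
    # Since insertions are counted, WER can exceed 1.0 -- a score near or above
    # 1.0 means the output shares almost nothing with the reference transcript.

    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # Word-level Levenshtein distance via dynamic programming.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i                      # delete all reference words
        for j in range(len(hyp) + 1):
            d[0][j] = j                      # insert all hypothesis words
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,        # deletion
                              d[i][j - 1] + 1,        # insertion
                              d[i - 1][j - 1] + sub)  # substitution / match
        return d[-1][-1] / len(ref)

    print(wer("the cat sat", "the cat sat"))  # 0.0
    print(wer("the cat", "dog house mouse"))  # 1.5 (2 substitutions + 1 insertion)
    ```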
    Reference: [1]Anonymous. “Proactive Detection of Voice Cloning with Localized Watermark-ing”. In: arXiv preprint arXiv:2401.17264 (2024).
    [2]Michael Arnold. Techniques and Applications of Digital Watermarking and Content Protection. Artech House, 2003. ISBN: 9781580531115.
    [3]Starling Bank. Starling Bank Launches Safe Phrases Campaign. https://www.starlingbank.com/news/starling-bank-launches-safe-phrases-campaign/. Accessed: 2025-06-29. 2023.
    [4]Nicholas Carlini and David Wagner. “Audio Adversarial Examples: Targeted Attacks on Speech-to-Text”. In: 2018 IEEE Security and Privacy Workshops (SPW) (2018), pp. 1–7. DOI: 10.1109/SPW.2018.00009.
    [5]Hyeonseung Choi, Jihoon Lee, and Youngjin Park. “Robustness of Mel-Spectrogram Features in Speaker Recognition”. In: IEEE Signal Processing Letters 26.8 (2019), pp. 1187–1191. DOI: 10.1109/LSP.2019.2921912.
    [6]Federal Trade Commission. FTC Proposes New Protections to Combat AI Impersonation of Individuals. https://www.ftc.gov/news-events/news/pressreleases/2024/02/ftc-proposes-new-protections-combat-ai-impersonation-individuals. Accessed: 2025-06-29. 2024.
    [7]Keith Ito and Linda Johnson. LJ Speech Dataset. 2017. URL: https://keithito.com/LJ-Speech-Dataset/.
    [8]Jaehyeon Kim, Jungil Kong, and Juhee Son. “Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech”. In: International Conference on Machine Learning (2021). arXiv:2106.06103.
    [9]G. Kubin, B. S. Atal, and W. B. Kleijn. “Performance of noise excitation for unvoiced speech”. In: IEEE Workshop on Speech Coding for Telecommunications (1996).
    [10]Yixin Liu et al. “Stable Unlearnable Example: Enhancing the Robustness of Unlearnable Examples via Stable Error-Minimizing Noise”. In: arXiv preprint arXiv:2302.04847 (2023).
    [11]Trend Micro. Unusual CEO Fraud via Deepfake Audio Steals $243,000 from U.K. Company. https://www.trendmicro.com/vinfo/us/security/news/cyberattacks/unusual-ceo-fraud-via-deepfake-audio-steals-us-243-000-from-u-k-company. Accessed: 2025-06-29. 2019.
    [12]Robin San Roman et al. “Proactive Detection of Voice Cloning with Localized Watermarking”. In: arXiv preprint arXiv:2401.17264 (2024).
    [13]Pindrop Security. Pindrop Security Raises $100 Million to Expand Deepfake Detection Technology. https://www.securityweek.com/pindrop-security-raises-100-million-to-expand-deepfake-detection-technology/. Accessed: 2025-07-08. 2024.
    [14]Xin Shen et al. “Deepfakes: The Coming Infocalypse in Audio and Video”. In: IEEE Transactions on Multimedia 22.10 (2020), pp. 2601–2612. DOI: 10.1109/TMM.2020.2982567.
    [15]Kenneth N. Stevens. Acoustic Phonetics. MIT Press, 1998. ISBN: 9780262194044.
    [16]Christian Szegedy et al. “Intriguing Properties of Neural Networks”. In: International Conference on Learning Representations (ICLR) (2014). arXiv:1312.6199.
    [17]Truecaller. Truecaller Insights 2021 U.S. Spam Scam Report. https://www.truecaller.com/blog/insights/us-spam-scam-report-21. Accessed: 2025-07-08. 2021.
    [18]Changhan Wang et al. “VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation”. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Online: Association for Computational Linguistics, Aug. 2021, pp. 993–1003. DOI: 10.18653/v1/2021.acl-long.80. URL: https://aclanthology.org/2021.acl-long.80/.
    [19]Rui Wang, Xin Zhang, and Yang Liu. “Detecting Audio Deepfakes Using Mel-Spectrogram Features and Convolutional Neural Networks”. In: Computer Speech & Language 68 (2021), p. 101203. DOI: 10.1016/j.csl.2021.101203.
    [20]Zhiyuan Yu et al. “AntiFake: Using Adversarial Audio to Prevent Unauthorized Speech Synthesis”. In: arXiv preprint arXiv:2305.12737 (2023).
    [21]Heiga Zen et al. “LibriTTS: A corpus derived from LibriSpeech for text-to-speech”. In: arXiv preprint arXiv:1904.02882 (2019).
    [22]Zhisheng Zhang et al. “Mitigating Unauthorized Speech Synthesis for Voice Protection”. In: arXiv preprint arXiv:2405.12686 (2024).
    Description: Master's thesis
    National Chengchi University
    資訊科學系碩士在職專班 (Executive Master's Program in Computer Science)
    109971006
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0109971006
    Data Type: thesis
    Appears in Collections: [資訊科學系碩士在職專班] Theses (學位論文)

    Files in This Item:

    100601.pdf (13,149 KB, Adobe PDF)

