Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/160072
| Title: | 結合關鍵目標擾動增強浮水印技術以抵抗語音克隆攻擊 Enhancement Watermark with Pivotal Objective Perturbation against Voice Cloning Attack |
| Authors: | 林柏含 Lin, Bo-Han |
| Contributors: | 胡毓忠 Hu, Yuh-Jong 林柏含 Lin, Bo-Han |
| Keywords: | Voice cloning; Anti-spoofing; Digital watermarking; Pivotal objective perturbation noise |
| Date: | 2025 |
| Issue Date: | 2025-11-03 14:45:04 (UTC+8) |
| Abstract: | Recent advances in generative AI have driven breakthroughs across many applications. Among them, generative speech synthesis can now mimic human voices with remarkable accuracy. This rapid progress, however, raises serious security concerns, including identity theft, fraud, and the spread of manipulated content; such threats endanger not only individual privacy but also societal trust and public safety. This study proposes a multi-layered defense strategy that combines digital watermarking with pivotal objective perturbation noise to address these risks. The digital watermark verifies whether audio content is authorized or has been tampered with, while the perturbation noise prevents voice features from being extracted, thereby blocking unauthorized speech synthesis. Experimental results show that with both defense mechanisms combined, watermark detection remains highly reliable, achieving an AUC of 0.993, unaffected by the perturbation noise. Moreover, when a speech synthesis model attempts to generate speech from protected audio, the generator cannot produce recognizable output: the word error rate (WER) of the synthesized speech reaches 1.01, indicating that it is unintelligible to listeners. Overall, the two defense mechanisms operate in concert without degrading each other's performance, providing effective protection for the secure application of speech synthesis technology. |
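The abstract's central claim, that watermark detection survives the added protective perturbation, can be illustrated with a toy spread-spectrum sketch. This is not the thesis's actual method; the key, amplitudes, and noise levels below are all hypothetical, chosen only to show why a correlation-based detector tolerates small independent noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_watermark(signal, key, alpha=0.01):
    """Add a low-amplitude pseudorandom (+/-1) watermark sequence."""
    return signal + alpha * key

def detect_watermark(signal, key):
    """Normalized correlation between the audio and the secret key."""
    return float(np.dot(signal, key) /
                 (np.linalg.norm(signal) * np.linalg.norm(key)))

n = 16000                          # one second of 16 kHz "audio"
speech = rng.normal(0, 0.1, n)     # stand-in for a speech waveform
key = rng.choice([-1.0, 1.0], n)   # secret watermark key

watermarked = embed_watermark(speech, key)
# An independent protective perturbation, as in the dual-defense setup:
protected = watermarked + rng.normal(0, 0.005, n)

print(detect_watermark(speech, key))     # near zero: no watermark present
print(detect_watermark(protected, key))  # clearly positive: watermark survives
```

Because the perturbation is statistically independent of the key, its contribution to the correlation averages out over the signal length, which is the intuition behind the two mechanisms coexisting without degrading each other.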
| Reference: | [1] Anonymous. “Proactive Detection of Voice Cloning with Localized Watermarking”. In: arXiv preprint arXiv:2401.17264 (2024). [2] Michael Arnold. Techniques and Applications of Digital Watermarking and Content Protection. Artech House, 2003. ISBN: 9781580531115. [3] Starling Bank. Starling Bank Launches Safe Phrases Campaign. https://www.starlingbank.com/news/starling-bank-launches-safe-phrases-campaign/. Accessed: 2025-06-29. 2023. [4] Nicholas Carlini and David Wagner. “Audio Adversarial Examples: Targeted Attacks on Speech-to-Text”. In: 2018 IEEE Security and Privacy Workshops (SPW) (2018), pp. 1–7. DOI: 10.1109/SPW.2018.00009. [5] Hyeonseung Choi, Jihoon Lee, and Youngjin Park. “Robustness of Mel-Spectrogram Features in Speaker Recognition”. In: IEEE Signal Processing Letters 26.8 (2019), pp. 1187–1191. DOI: 10.1109/LSP.2019.2921912. [6] Federal Trade Commission. FTC Proposes New Protections to Combat AI Impersonation of Individuals. https://www.ftc.gov/news-events/news/press-releases/2024/02/ftc-proposes-new-protections-combat-ai-impersonation-individuals. Accessed: 2025-06-29. 2024. [7] Keith Ito and Linda Johnson. LJ Speech Dataset. 2017. URL: https://keithito.com/LJ-Speech-Dataset/. [8] Jaehyeon Kim, Jungil Kong, and Juhee Son. “Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech”. In: International Conference on Machine Learning (2021). arXiv:2106.06103. [9] G. Kubin, B. S. Atal, and W. B. Kleijn. “Performance of noise excitation for unvoiced speech”. In: IEEE Workshop on Speech Coding for Telecommunications (1996). [10] Yixin Liu et al. “Stable Unlearnable Example: Enhancing the Robustness of Unlearnable Examples via Stable Error-Minimizing Noise”. In: arXiv preprint arXiv:2302.04847 (2023). [11] Trend Micro. Unusual CEO Fraud via Deepfake Audio Steals $243,000 from U.K. Company. https://www.trendmicro.com/vinfo/us/security/news/cyberattacks/unusual-ceo-fraud-via-deepfake-audio-steals-us-243-000-from-u-k-company. Accessed: 2025-06-29. 2019. [12] Robin San Roman et al. “Proactive Detection of Voice Cloning with Localized Watermarking”. In: arXiv preprint arXiv:2401.17264 (2024). [13] Pindrop Security. Pindrop Security Raises $100 Million to Expand Deepfake Detection Technology. https://www.securityweek.com/pindrop-security-raises-100-million-to-expand-deepfake-detection-technology/. Accessed: 2025-07-08. 2024. [14] Xin Shen et al. “Deepfakes: The Coming Infocalypse in Audio and Video”. In: IEEE Transactions on Multimedia 22.10 (2020), pp. 2601–2612. DOI: 10.1109/TMM.2020.2982567. [15] Kenneth N. Stevens. Acoustic Phonetics. MIT Press, 1998. ISBN: 9780262194044. [16] Christian Szegedy et al. “Intriguing Properties of Neural Networks”. In: International Conference on Learning Representations (ICLR) (2014). arXiv:1312.6199. [17] Truecaller. Truecaller Insights 2021 U.S. Spam & Scam Report. https://www.truecaller.com/blog/insights/us-spam-scam-report-21. Accessed: 2025-07-08. 2021. [18] Changhan Wang et al. “VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation”. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Online: Association for Computational Linguistics, Aug. 2021, pp. 993–1003. DOI: 10.18653/v1/2021.acl-long.80. URL: https://aclanthology.org/2021.acl-long.80/. [19] Rui Wang, Xin Zhang, and Yang Liu. “Detecting Audio Deepfakes Using Mel-Spectrogram Features and Convolutional Neural Networks”. In: Computer Speech & Language 68 (2021), p. 101203. DOI: 10.1016/j.csl.2021.101203. [20] Zhiyuan Yu et al. “AntiFake: Using Adversarial Audio to Prevent Unauthorized Speech Synthesis”. In: arXiv preprint arXiv:2305.12737 (2023). [21] Heiga Zen et al. “LibriTTS: A corpus derived from LibriSpeech for text-to-speech”. In: arXiv preprint arXiv:1904.02882 (2019). [22] Zhisheng Zhang et al. “Mitigating Unauthorized Speech Synthesis for Voice Protection”. In: arXiv preprint arXiv:2405.12686 (2024). |
| Description: | Master's thesis, National Chengchi University, In-service Master's Program, Department of Computer Science, 109971006 |
| Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0109971006 |
| Data Type: | thesis |
| Appears in Collections: | [In-service Master's Program, Department of Computer Science] Theses |
Files in This Item:
| File | Size | Format |
| 100601.pdf | 13149 Kb | Adobe PDF |
All items in 政大典藏 are protected by copyright, with all rights reserved.