Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/119801
Title: Analysis of Voice Styles Using i-Vector Features (基於 i-Vector 特徵之聲音風格分析)
Authors: Kao, Wen-Tsung (高文聰)
Contributors: Liao, Wen-Hung (廖文宏); Kao, Wen-Tsung (高文聰)
Keywords: Sound style; Machine learning; Pattern recognition; i-Vector; ALIZE
Date: 2018
Issue Date: 2018-08-29 16:04:21 (UTC+8)
Abstract: Many adjectives have been used to describe voice characteristics, yet it is challenging to define sound styles precisely using quantitative measures. In this thesis, we attempt to tackle the sound style classification problem using techniques designed for speaker recognition. Specifically, we employ the i-Vector, a feature widely adopted in speaker identification, together with a support vector machine (SVM) for style classification. To verify the reliability of the i-Vector for describing voice style, we first conducted a series of experiments, including basic speaker recognition, minimum input voice duration, the effect of white noise on speaker verification, dependency on spoken content, sensitivity to different sampling rates, and classification of samples in which voice actors speak in different tones. After confirming the relevance of these features, we selected eight sound styles commonly heard in daily life for classification and examined the consistency of the results, confirming that a speaker recognition system can also be used effectively to identify sound style categories.
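As a rough illustration of the i-Vector plus SVM pipeline described in the abstract (not code from the thesis, which relies on toolkits such as ALIZE and LIBSVM), a minimal sketch in Python with scikit-learn could look as follows; the file names, vector dimensionality, and SVM hyperparameters are hypothetical, and i-Vector extraction is assumed to have been done beforehand.

# Illustrative sketch only: classify pre-extracted i-Vectors into style categories with an SVM.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Hypothetical inputs: one fixed-length i-Vector per utterance
# (extracted beforehand, e.g. with a speaker recognition toolkit)
# and one of eight style labels per utterance.
X = np.load("ivectors.npy")        # shape: (n_utterances, ivector_dim)
y = np.load("style_labels.npy")    # shape: (n_utterances,)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scale the i-Vectors, then fit an RBF-kernel SVM classifier.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
clf.fit(X_train, y_train)

print(classification_report(y_test, clf.predict(X_test)))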
Description: Master's degree, National Chengchi University, In-service Master's Program, Department of Computer Science, 103971014
Source URI: http://thesis.lib.nccu.edu.tw/record/#G0103971014
Data Type: thesis
DOI: 10.6814/THE.NCCU.EMCS.007.2018.B02
Appears in Collections: [Department of Computer Science, In-service Master's Program] Theses
Files in This Item:
File | Size | Format
101401.pdf | 8950 KB | Adobe PDF
All items in 政大典藏 (NCCU Institutional Repository) are protected by copyright, with all rights reserved.