Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/119801
Title: Analysis of Voice Styles Using i-Vector Features (基於 i-Vector 特徵之聲音風格分析)
Authors: Kao, Wen-Tsung (高文聰)
Contributors: Liao, Wen-Hung (廖文宏); Kao, Wen-Tsung (高文聰)
Keywords: Sound style; Machine learning; Pattern recognition; i-Vector; ALIZE
Date: 2018
Issue Date: 2018-08-29 16:04:21 (UTC+8)
Abstract: Many adjectives have been used to describe voice characteristics, yet it is challenging to define sound styles precisely using quantitative measures. In this thesis, we attempt to tackle the sound style classification problem using techniques designed for speaker recognition. Specifically, we employ the i-Vector, a feature widely adopted in speaker identification, together with a support vector machine (SVM) for style classification. To verify the reliability of the i-Vector for describing voice style, we first conducted a series of experiments, including basic speaker recognition, minimum input voice duration, the effect of white noise on speaker verification, dependency on spoken content, sensitivity to different sampling rates, and classification of samples in which voice actors speak in different tones. After confirming the relevance of these features, we selected eight sound styles commonly heard in daily life for classification and examined the consistency of the results, confirming that a speaker recognition system can also be used effectively to identify sound style categories.
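As a rough illustration of the i-Vector plus SVM pipeline described in the abstract (not code from the thesis, which relies on toolkits such as ALIZE and LIBSVM), a minimal sketch in Python with scikit-learn could look as follows; the file names, vector dimensionality, and SVM hyperparameters are hypothetical, and i-Vector extraction is assumed to have been done beforehand.

# Illustrative sketch only: classify pre-extracted i-Vectors into style categories with an SVM.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Hypothetical inputs: one fixed-length i-Vector per utterance
# (extracted beforehand, e.g. with a speaker recognition toolkit)
# and one of eight style labels per utterance.
X = np.load("ivectors.npy")        # shape: (n_utterances, ivector_dim)
y = np.load("style_labels.npy")    # shape: (n_utterances,)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scale the i-Vectors, then fit an RBF-kernel SVM classifier.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
clf.fit(X_train, y_train)

print(classification_report(y_test, clf.predict(X_test)))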
Description: Master's degree, National Chengchi University, In-service Master's Program, Department of Computer Science, 103971014
Source URI: http://thesis.lib.nccu.edu.tw/record/#G0103971014
Data Type: thesis
DOI: 10.6814/THE.NCCU.EMCS.007.2018.B02
Appears in Collections: [Department of Computer Science, In-service Master's Program] Theses
Files in This Item:
File | Size | Format
101401.pdf | 8950 KB | Adobe PDF
All items in 政大典藏 (NCCU Institutional Repository) are protected by copyright, with all rights reserved.