    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/131936


    Title: 混合人聲之聲音場景辨識
    Classification of Acoustic Scenes with Mixtures of Human Voice and Background Audio
    Authors: 李御國
    Li, Yu-Guo
    Contributors: 廖文宏
    Liao, Wen-Hung
    李御國
    Li, Yu-Guo
    Keywords: Convolutional Neural Network
    DCASE Dataset
    Acoustic Scene Classification
    Voice-based Online Identity Verification
    Date: 2020
    Issue Date: 2020-09-02 13:15:07 (UTC+8)
    Abstract: Sounds in everyday environments are never isolated events; multiple sources overlap, which makes environmental sound recognition challenging. This research uses the audio data provided for Task 1 of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE2016) challenge, which comprises environmental recordings of 15 scenes such as beach and tram. The recordings are mixed with the voices of 16 speakers to create a new dataset, and the resulting voice-mixed scenes are analyzed and classified. Acoustic features are extracted as log-Mel spectrograms, a representation commonly used in sound recognition, in order to retain as much of the sound's characteristics as possible. A convolutional neural network (CNN) is then employed to distinguish these overlapping sound scenes. The model achieves an overall average accuracy of 79%, and 93% accuracy for the 'car' scene. We expect the outcome to be applicable as a pre-processing stage for voiceprint recognition in voice-based online identity verification.
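The two data-preparation steps named in the abstract, overlaying a voice on a scene recording and extracting a log-Mel spectrogram, can be sketched as follows. This is a minimal NumPy-only illustration, not the thesis's actual code: the function names, the signal-to-noise ratio parameter `snr_db`, the frame length, the hop size, and the choice of 40 Mel bands are all assumptions made for the example, since the abstract does not state the mixing levels or feature settings.

```python
import numpy as np

def mix_at_snr(background, voice, snr_db):
    """Overlay `voice` on `background` at a chosen voice-to-background power ratio (dB).
    The SNR-based scaling is an assumption; the abstract does not say how levels were set."""
    n = min(len(background), len(voice))
    bg = background[:n].astype(np.float64)
    v = voice[:n].astype(np.float64)
    p_bg = np.mean(bg ** 2)
    p_v = max(np.mean(v ** 2), 1e-12)  # guard against a silent voice clip
    gain = np.sqrt(p_bg * 10.0 ** (snr_db / 10.0) / p_v)
    return bg + gain * v

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose centres are evenly spaced on the Mel scale."""
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr, n_fft=1024, hop=512, n_mels=40):
    """Return a (n_mels, n_frames) log-Mel spectrogram in dB.
    Requires len(signal) >= n_fft; all default sizes are illustrative."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T    # (n_frames, n_mels)
    return 10.0 * np.log10(mel + 1e-10).T                # (n_mels, n_frames)

# Synthetic stand-ins for a scene recording and a voice clip (1 second at 16 kHz).
rng = np.random.default_rng(0)
sr = 16000
scene = rng.normal(size=sr)
voice = 0.1 * rng.normal(size=sr)
mixed = mix_at_snr(scene, voice, snr_db=0.0)
features = log_mel_spectrogram(mixed, sr)
```

The resulting two-dimensional array can be treated like a single-channel image, which is the form a CNN classifier would typically take as input. In practice an audio library would normally load the recordings and compute the features; the hand-written version is used here only to keep the example self-contained.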
    Description: Master's
    National Chengchi University
    In-service Master's Program, Department of Computer Science
    105971016
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0105971016
    Data Type: thesis
    DOI: 10.6814/NCCU202001422
    Appears in Collections: [In-service Master's Program, Department of Computer Science] Theses

    Files in This Item:

    File          Description    Size      Format
    101601.pdf                   4986Kb    Adobe PDF    2393


    All items in the NCCU Institutional Repository (政大典藏) are protected by copyright, with all rights reserved.



    Copyright Announcement
    1. The digital content of this website is part of the National Chengchi University Institutional Repository. It is provided free of charge for non-commercial uses such as academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2. Every effort has been made in building this website to avoid infringing the rights of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.