    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/160082


    Title: Telling compounds and phrases apart in Vietnamese: A random forest classification
    區分越南語中的複合詞和短語:隨機森林分類法
    Authors: Ommen, Sandrien van;Torres, Catalina;Đông, Lâm Quang;Giraud, Anne-Lise;Bickel, Balthasar
    Contributors: Taiwan Journal of Linguistics (台灣語言學期刊)
    Keywords: Phonetics;Random Forest Classification;Prosodic Hierarchy;Speech Production;Chunking;Phonology
    Date: 2025-10
    Issue Date: 2025-11-04 09:56:49 (UTC+8)
    Abstract: Vietnamese is an isolating language with richly productive compounding, but it offers no morphosyntactic, phonotactic, or phonological evidence for a linguistic level between the syllable and the phrase (Schiering et al. 2010). Following Nguyen and Ingram (2007), we model an artificial listener with a Random Forest Classifier to study how well compounds and phrases can be distinguished phonetically. This machine learning algorithm represents the maximal potential of a system to differentiate the two classes on the basis of phonetics alone, and it ranks the importance of each phonetic correlate for differentiating them. The ranking allows an interpretation that goes beyond whether a difference exists on a particular phonetic dimension to how important that difference is. The results confirm that the two classes can be phonetically separated only under conditions of maximal contrast, and that maximal contrast is realized through juncture marking. Furthermore, we show that the two classes cannot be perfectly separated even under maximal contrast, and that the phonetic data yield an across-the-board preference for a compound interpretation, even when the Random Forest Classifier is trained on maximal-contrast data.
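    The approach summarized in the abstract (train a random forest on phonetic measurements, then rank each measurement by its contribution to separating compounds from phrases) can be illustrated with a minimal, standard-library Python sketch. Everything specific here is an assumption made for illustration: the data are synthetic, the feature names (juncture_pause_ms, f0_mean_hz, intensity_db) are hypothetical, and the importance measure is a simple impurity-based (Gini decrease) score. It is not the authors' implementation, data, or feature set.

```python
import random
from collections import Counter

# Hypothetical feature names; the article's actual phonetic correlates may differ.
FEATURES = ["juncture_pause_ms", "f0_mean_hz", "intensity_db"]

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return (feature, threshold, gain) with the largest Gini decrease, or None."""
    parent, best = gini(labels), None
    for f in features:
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or parent - child > best[2]:
                best = (f, t, parent - child)
    return best

def grow(rows, labels, depth, n_try, rng, importance):
    """Grow one tree; each node considers a random subset of n_try features."""
    split = None
    if depth > 0 and len(set(labels)) > 1:
        split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_try))
    if split is None or split[2] <= 0:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t, gain = split
    importance[f] += gain * len(labels)  # impurity decrease weighted by node size
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            grow([rows[i] for i in left], [labels[i] for i in left],
                 depth - 1, n_try, rng, importance),
            grow([rows[i] for i in right], [labels[i] for i in right],
                 depth - 1, n_try, rng, importance))

def fit_forest(rows, labels, n_trees=25, depth=3, seed=0):
    """Fit bootstrap-sampled trees; return (trees, normalized importances)."""
    rng = random.Random(seed)
    n_feat = len(rows[0])
    n_try = max(1, round(n_feat ** 0.5))
    importance = [0.0] * n_feat
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        trees.append(grow([rows[i] for i in idx], [labels[i] for i in idx],
                          depth, n_try, rng, importance))
    total = sum(importance) or 1.0
    return trees, [v / total for v in importance]

def predict(trees, row):
    """Majority vote over all trees."""
    votes = []
    for node in trees:
        while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
            f, t, left, right = node
            node = left if row[f] <= t else right
        votes.append(node)
    return Counter(votes).most_common(1)[0][0]

# Synthetic data (an assumption): only the juncture pause differs between classes.
data_rng = random.Random(1)
data = []
for label, pause_mean in (("compound", 20.0), ("phrase", 60.0)):
    for _ in range(100):
        row = [max(0.0, data_rng.gauss(pause_mean, 15.0)),
               data_rng.gauss(200.0, 30.0),
               data_rng.gauss(65.0, 5.0)]
        data.append((row, label))
data_rng.shuffle(data)
train, test = data[:140], data[140:]

trees, importance = fit_forest([r for r, _ in train], [y for _, y in train])
accuracy = sum(predict(trees, r) == y for r, y in test) / len(test)

print(f"held-out accuracy: {accuracy:.2f}")
for name, score in sorted(zip(FEATURES, importance), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```

    In this toy setup the classes overlap on the one informative dimension, so the forest cannot separate them perfectly, and the importance ranking singles out that dimension. The sketch only illustrates the general technique; it says nothing about the magnitudes or rankings reported in the article.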
    Relation: 台灣語言學期刊 (Taiwan Journal of Linguistics), 23(3), 23-48
    Data Type: article
    DOI Link: https://doi.org/10.6519/TJL.202509_23(3).0002
    DOI: 10.6519/TJL.202509_23(3).0002
    Appears in Collections: [Taiwan Journal of Linguistics (台灣語言學期刊), THCI Core] Journal Articles

    Files in This Item:

    File          Description   Size      Format
    23-3-2.pdf                  1207Kb    Adobe PDF


    All items in the NCCU Institutional Repository (政大典藏) are protected by copyright, with all rights reserved.

