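Abstract: | This project adopts a cross-age quantitative design to collect, on a large scale, the categories of errors made by native Thai speakers (aged 3 to 18) learning Mandarin, based on a corpus with good reliability and validity and on error data accompanied by acoustic parameters. The error categories cover error counts and substitution patterns in phonetics, semantics and syntax. The first two years of the project focus on lexical and phonetic errors. The first year takes the word as its starting point and works downward from the word to errors at the underlying phonological level, covering error categories and substitutions in syllable count, monosyllabic/disyllabic words, multisyllabic words, syllable structure, consonants, vowels, tones and syllable prosody; for lexical errors, it records Thai students' single words, word combinations, vocabulary size, semantic content and classification, and substitution errors related to part of speech. The third year extends the topic to the syntactic phrases that link words, recording errors in syntactic structure, word-order errors in short sentences, and compound sentences that require conjunctions. The project hopes that the phonetic, semantic and syntactic data will bring a deeper understanding to fields related to research on teaching Mandarin. The participants are mainly non-ethnic-Chinese students whose native language is Thai at Thai-Chinese International School in Bangkok, Thailand. The study is quantitative and collects a large amount of data from all age groups concurrently: a preschool group (kindergarten children aged 3 to 6), a child group (primary school pupils aged 7 to 12), a junior group (junior high school students aged 13 to 15) and a youth group (senior high school students aged 16 to 18). All speech data are manually segmented into words with the acoustic software PRAAT, and semantic and word-form classification is tagged according to word frequency in the Academia Sinica Balanced Corpus (Tseng, 2013). Semantic categories are evaluated with the lexical assessment criteria of Liu and Chen (劉惠美及陳昱君, 2015), and syntactic categories with the syntactic assessment designed by Lin (2009). The project plans to investigate five topics in depth: (1) phonetic/phonological error categories in Taiwan Mandarin, covering prosodic structure, the relation between tone errors and syllable-count errors, tone errors, consonant errors, vowel errors, and the relation between consonant and vowel errors; (2) lexical error categories in Taiwan Mandarin, covering part-of-speech classification, the ranking and frequency of part-of-speech errors, and semantic relatedness between words; (3) sentence-pattern error categories in Taiwan Mandarin, covering word-order types, mean length of utterance, transposition of simple sentence patterns, embedded clauses, and even complex patterns such as relative clauses, serial verb constructions, pivotal constructions and subject/object complement clauses; (4) a comparison of error categories in learning Taiwan Mandarin as a foreign language and in first language acquisition; (5) applications to linguistic theory and second language theory. In this project, the PI is mainly responsible for collecting data, lexical and phonetic analysis, and PRAAT-related matters; the Co-PI, Professor 郭怡君 of National Chiayi University, is responsible for lexical and syntactic analysis and interpretation; and the associate investigator, Professor 劉昭麟 of the Department of Computer Science, National Chengchi University, contributes to the web platform, computer applications, the R language and other statistical analysis. The aim of this project is to provide a detailed analysis of error patterns drawn from a cross-sectional observational study of Thai speakers (aged 3 to 18; four groups) learning Taiwan Mandarin as a foreign language at Thai-Chinese International School in Bangkok, Thailand. The novelty of this work lies in collecting and analyzing errors on a large scale through a highly reliable corpus and in providing PRAAT acoustic parameters for the assessed error distribution, looking at the frequency and the various patterns of the linguistic units involved. The units involved in the errors are classified by phonological, lexical and syntactic structure, starting with the segmentation of word shapes into monosyllabic, disyllabic or multisyllabic forms according to the number of syllables.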
The next step is to examine the lower level, which comprises consonants, vowels, consonant-vowel interaction and tones, as well as the higher-level relationship between the syllable and prosodic levels (i.e., tones), to see whether Thai speakers show a segmental-prosodic relationship, such as a rhythmic effect, when they learn a different tone language. The units involved in lexical errors include parts of speech, content/function words, single words, word combinations, and lexical-semantic relationships (i.e., semantic features, semantically related associates, or general taxonomies of semantic relatedness). The units involved in syntactic errors include simple sentence structures (noun phrases, verb phrases, questions and negations) as well as complex syntactic frames, which contain embedded sentences, compound sentences, serial verb constructions, pivotal constructions, subject-verb sequences, verb-object sequences, subject-verb-object sequences and many other conjunction structures. The word segmentation, the tagging system and the evaluation processes are based on Tseng (2013), Liu & Chen (2015) and Lin (2008), respectively. A pilot study has been conducted, and the results show that some phonological errors can be influenced by learners' own creative manipulations regardless of their first language background, whereas syntactic errors, especially in word order, have so far entirely followed the learners' syntactic knowledge of their first language, in a way that reveals their current knowledge and abilities. In this research project, the PI and the research team will mainly collect the data, set up the PRAAT package, and analyze the error patterns by segmenting the linguistic components in the dataset, covering all the phonological errors and segmental-prosodic units. The Co-PI will help analyze the lexical-semantic errors as well as the syntactic errors. 
The associate investigator will work on the computing techniques and run the statistical programs. The PI and the whole research team hope to contribute to a growing body of knowledge and understanding from the corpus, covering the frequency of error units, error patterns and error distribution among Thai speakers; the findings will be essential for research in language teaching, linguistics and even the clinical domain. |