National Chengchi University Institutional Repository (NCCUR): Item 140.119/140442


    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/140442


    Title: 臺灣客語語料庫建置與客語詞彙使用初探
    The Construction of Taiwan Hakka Corpus and Preliminary Analysis of Hakka Lexical Usage
    Authors: 賴惠玲;葉秋杏;劉吉軒
    Lai, Huei-ling; Yeh, Chiou-shing; Liu, Jyi-shane
    Contributor: Department of English
    Keywords: Taiwan Hakka Corpus; Corpus construction; Endangered language; High-frequency words; Zipf's law
    Date: 2021-10
    Upload time: 2022-06-28 13:55:26 (UTC+8)
    Abstract: This paper introduces the Taiwan Hakka Corpus, which is currently under construction. It is the first part-of-speech-tagged Hakka corpus in Taiwan to combine written and spoken data, and it systematically collects material from the six varieties of Taiwan Hakka. To overcome the many challenges met during construction, the corpus formulates standards that reflect authentic Hakka usage, resolves problems of Hakka character choice and the input of rarely used characters, provides an interface entirely in Chinese, and independently develops a concordance and word segmentation system dedicated to Hakka. Taking high-frequency words as a starting point, the paper then compares the frequency rankings of the top 300 words in three corpora to examine whether each language conforms to Zipf's law: Taiwan Hakka (Taiwan Hakka Corpus), Taiwan Mandarin (Academia Sinica Balanced Corpus of Modern Chinese [Sinica Corpus]), and American English (Corpus of Contemporary American English [COCA]). It goes on to compare the ten most frequent words of Taiwan Hakka and Taiwan Mandarin, showing how corpus research combines quantitative statistics with qualitative textual analysis, an empirical study made possible for Taiwan Hakka by the construction of this corpus.
    Relation: 數位典藏與數位人文 (Journal of Digital Archives and Digital Humanities), 8, 75-131
    Type: article
    DOI link: https://doi.org/10.6853/DADH.202110_(8).0003
    DOI: 10.6853/DADH.202110_(8).0003
    Appears in collections: [Department of English] Journal Articles
    [Department of Computer Science] Journal Articles
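
    Note on the Zipf's law check mentioned in the abstract: Zipf's law says that a word's frequency is roughly proportional to 1 / rank^s, with s close to 1, so ranked frequencies fall near a straight line of slope -s on log-log axes. The sketch below is a minimal illustration of that idea and is not the authors' procedure. The function name `zipf_exponent` and the synthetic counts are assumptions made for this example; no figures from the Taiwan Hakka Corpus, Sinica Corpus, or COCA are used.

```python
import math


def zipf_exponent(frequencies):
    """Estimate s in f(r) ~ C / r**s by least squares on log-log data.

    Hypothetical helper for illustration. Returns (s, r_squared), where
    r_squared measures how close the ranked counts are to a straight line.
    """
    ranked = sorted(frequencies, reverse=True)
    if len(ranked) < 2 or ranked[-1] <= 0:
        raise ValueError("need at least two strictly positive counts")

    # x = log(rank), y = log(frequency); ranks start at 1.
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)

    slope = sxy / sxx
    # If every count is identical, the line is flat and fits exactly.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return -slope, r_squared


# Synthetic counts for 300 ranks that follow an ideal Zipf curve (s = 1).
ideal_counts = [1_000_000 / rank for rank in range(1, 301)]
s, r2 = zipf_exponent(ideal_counts)
print(f"estimated exponent: {s:.3f}, R^2: {r2:.3f}")
```

    With real data, the input would be the counts of the top 300 words of one corpus; an exponent near 1 and an R^2 near 1 would indicate that the ranking is consistent with Zipf's law.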

    Files in this item:

    File      Description   Size       Format      Views
    13.pdf                  12627Kb    Adobe PDF   2253


    All items in NCCUR are protected by original copyright.



    Copyright Announcement
    1. The digital content of this website is part of the National Chengchi University Institutional Repository. It is provided free of charge for non-commercial uses such as academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2. Every effort has been made in producing this website to avoid infringing the rights of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.