    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/115435


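    Title: From Classical Chinese to Vernacular Chinese: Exploring Language Change through Ecological Change (從文言文到白話文:由生態變遷探討語言變化)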
    Authors: 余清祥
    Contributors: Department of Statistics
    Keywords: Text mining; biodiversity; New Youth (《新青年》); exploratory data analysis; similarity index
    Classical and Modern Chinese;New Youth Magazine;Unstructured Data;Exploratory Data Analysis;Logistic Regression
    Date: 2015
    Issue Date: 2017-12-26 17:46:39 (UTC+8)
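    Abstract: Most of our fellow citizens have a Chinese vocabulary of about 13,000 characters, of which a little over 5,000 are in common use. The characters people know and commonly use differ little, but the way they use them differs greatly; that is, preference for or aversion to particular characters varies from person to person. If characters are viewed as biological species and each person's writing style as a distinct ecosystem, then changes in usage habits resemble ecological change, and ideas from species evolution and ecological structure can be used to compare writing styles and to trace how a style changes. This project uses the eleven volumes of New Youth (《新青年》) and missionary periodicals as material, combining ecological-change concepts with statistical methods to compare the volumes, observe the course of the language transition, and look for indicators that can objectively distinguish classical Chinese texts from vernacular Chinese texts. Taking the volume as the unit of analysis, we quantify the textual structure of each volume of New Youth from several angles, including character usage, sentence usage, the use of classical and vernacular function words, and the sharing of common characters and words, and we present the results through a combination of charts and tables to examine the course and characteristics of the magazine's language change. This covers both the mechanism of the shift from classical to vernacular Chinese and the evolution of the vernacular itself. Next, based on the exploratory results for each volume, we look for text feature variables that separate the classical and vernacular forms. Using Volumes 1 and 7 of New Youth as training samples, we combine principal components and logistic regression to train a classifier for classical and vernacular articles, and then test it on Volume 4. The results confirm that the extracted text variables effectively distinguish articles in the two forms. In addition, drawing on the exploratory results and on the experience of humanities scholars, we explore the change in language form in the magazine's later period, namely from the vernacular of the May Fourth era to the vernacular characterized by "Red Chinese" (the vernacular used in China after World War II). Training on Volumes 7 and 11 confirms a clear difference between the language forms of these two volumes. We also add Taiwan's United Daily News (《聯合報》) and mainland China's People's Daily (《人民日報》) for classification prediction and find a clear difference in the language tendencies of the two newspapers, which merits further in-depth study.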
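
    The ecological analogy in the abstract (characters as species, a text as an ecosystem) can be illustrated with similarity indices borrowed from ecology. The sketch below is only an illustration under stated assumptions: the record does not name the indices used in the report, so Jaccard and Morisita-Horn are stand-ins, and the two sample sentences and all function names are hypothetical.

```python
from collections import Counter

def char_counts(text):
    """Count CJK characters only, ignoring punctuation, digits and whitespace."""
    return Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")

def jaccard(a, b):
    """Share of character types found in both texts (presence/absence only)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def morisita_horn(a, b):
    """Abundance-based overlap: 1.0 for identical relative frequencies,
    0.0 when the two texts share no characters."""
    na, nb = sum(a.values()), sum(b.values())
    if na == 0 or nb == 0:
        return 0.0
    cross = sum(a[ch] * b[ch] for ch in a.keys() & b.keys())
    da = sum(v * v for v in a.values()) / (na * na)
    db = sum(v * v for v in b.values()) / (nb * nb)
    return 2 * cross / ((da + db) * na * nb)

# Toy inputs (assumptions, not data from the report).
classical = "學而時習之不亦說乎"
vernacular = "我們的學校是很好的"

c, v = char_counts(classical), char_counts(vernacular)
print(round(jaccard(c, v), 4))        # 0.0625 (1 shared type out of 16)
print(round(morisita_horn(c, v), 4))  # 0.1
```

    In practice the inputs would be whole volumes rather than single sentences, and the pairwise index values could be arranged in a volume-by-volume matrix to see where the usage "ecosystem" shifts.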
    Many scholars have studied the language revolution in the early Republic. Our team has also contributed by bringing in digital humanities methods to find effective ways for a computer to tell classical Chinese from modern Chinese. While analyzing data from The New Youth magazine, we noticed something rarely discussed by modern scholars: the language changed significantly again in Volume 8 of the magazine. In this study, we adapt the idea of unsupervised learning from statistical learning theory to define and identify important variables. In particular, we use Exploratory Data Analysis (EDA, proposed by the statistician J. W. Tukey in 1977) to evaluate potential variables that can differentiate the language styles of Volume 7 and Volumes 8-11 of The New Youth magazine. Our study shows that the styles of articles from United Daily News are close to that of Volume 7 of The New Youth magazine, while articles from People's Daily have about equal probability of being classified to Volume 7 or to Volumes 8-11. This implies that the Chinese writing styles of Taiwan and mainland China are different and that the writing style of mainland China is likely to be influenced by the Soviet language.
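
    The classification step mentioned in the abstract (train on two volumes, then score unseen articles) might look roughly like the sketch below. It is a simplified illustration, not the report's method: the principal components step is omitted, the function-word lists and training sentences are hypothetical, and the logistic regression is a plain gradient-descent fit written with the standard library only.

```python
import math

# Hypothetical marker lists chosen for illustration; not the report's variables.
CLASSICAL_MARKERS = "之乎者也矣焉哉"
VERNACULAR_MARKERS = "的了嗎呢們這那"

def features(text):
    """Return [classical-marker rate, vernacular-marker rate] per character."""
    n = max(len(text), 1)
    return [sum(text.count(m) for m in CLASSICAL_MARKERS) / n,
            sum(text.count(m) for m in VERNACULAR_MARKERS) / n]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=1.0, epochs=2000):
    """Fit logistic regression weights (bias first) by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Probability that x belongs to class 1 (vernacular)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Toy training set (assumption): label 0 = classical, 1 = vernacular.
train_texts = ["學而時習之不亦說乎", "逝者如斯夫不舍晝夜",
               "我們的學校是很好的", "你吃了飯嗎這是我的"]
labels = [0, 0, 1, 1]
w = train([features(t) for t in train_texts], labels)

# Score two unseen toy sentences; values above 0.5 lean vernacular.
for text in ["天下之大無奇不有也", "他們的書在那裡了"]:
    print(text, round(predict_proba(w, features(text)), 3))
```

    With real data, each article from the training volumes would contribute one feature vector, and the predicted probabilities for newspaper articles would indicate which volume group their style resembles.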
    Relation: Project period: 2015/08/01~2017/07/31
    104-2420-H-004-036-MY2
    Data Type: report
    Appears in Collections: [Department of Computer Science] National Science Council (NSC) Research Projects

    Files in This Item:

    File                          Description  Size   Format
    104-2420-H-004-036-MY2.pdf                 636Kb  Adobe PDF


