    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/116837


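    Title: From Classical Chinese to Modern Chinese: A Study of Function Words from Xin Qing Nian
    Chinese title: 從文言到白話:《新青年》雜誌語言變化統計研究 ("From Classical to Vernacular: A Statistical Study of Language Change in Xin Qing Nian")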
    Authors: Ho, Li-Hsing (何立行); Yue, Ching-Syang (余清祥); Cheng, Wen-Huei (鄭文惠)
    Keywords: Stylistic Analysis; May Fourth Movement; La Jeunesse (Xin Qing Nian); Function Words Analysis; Species Diversity
    Date: 2014-12
    Issue Date: 2018-04-10 18:04:56 (UTC+8)
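    Abstract: One of the most important differences between modern and classical Chinese is that the modern written language is based mainly on the vernacular style (語體文), also called baihua (白話文), in contrast to classical wenyan (文言文). Scholars tracing the rise of the modern written vernacular usually date its beginnings to periodicals run by missionaries in the late Qing, but it was the May Fourth Movement that allowed the vernacular to actually replace classical Chinese as the mainstream written language, and no publication of the May Fourth period promoted the vernacular more vigorously than the magazine Xin Qing Nian (《新青年》). Previous research has relied mainly on close reading, examining the magazine's contribution to the spread of vernacular writing and the growth of vernacular literature through its theoretical arguments, creative practice, and advocacy. Yet many questions remain. After literati and scholars advanced their proposals in periodicals such as Xin Qing Nian, how long did it take for written Chinese to shift from a classical-dominated to a vernacular-dominated language across society as a whole? What did the transition look like? When did the vernacular overtake classical Chinese, and how can this be demonstrated? Traditional methods can hardly answer these questions, because no researcher, however diligent, can read the vast body of surviving texts from before and after May Fourth in chronological order, label each one as classical or vernacular, and tally the shift by hand. Can digital methods open another route to the answers? Building an effective tool that classifies texts as classical or vernacular objectively, rather than by intuition, may be a direction worth exploring.
    Using the full text of all eleven volumes of Xin Qing Nian, this article compares the volumes statistically in order to trace the language shift and to look for indicators on which an objective classical/vernacular classifier could be built. The methods fall into two broad categories: supervised learning and unsupervised learning. The first fixes the comparison indicators (variables or keywords) in advance and then analyzes how they behave in each volume; the second presupposes no target and examines writing style from several angles to find the key factors that separate classical from vernacular texts. For the supervised analysis we use selected function words (虛字) characteristic of classical and of vernacular Chinese. Function words rather than content words were chosen to minimize the influence of a text's subject matter on its linguistic form, and to test whether habitual function words can distinguish classical from vernacular texts. The unsupervised analysis focuses on character usage and sentence structure, since statistics such as vocabulary size and frequency of use have long served in comparative literature to characterize writing style.
    Both the supervised function-word analysis and the unsupervised analysis of character usage reflect the change in style between the early and late volumes of Xin Qing Nian; for developing an objective classification tool, function words may be the more promising indicator. Notably, comparisons of total character count, number of distinct characters, and characters per sentence reveal a clear difference between the two registers: classical texts use fewer characters in total but more distinct characters, whereas vernacular texts use more characters in total but fewer distinct ones. This suggests that the vernacular served chiefly the goal of popular enlightenment, which favors longer texts built from a smaller inventory of characters. We may also borrow the concept of biodiversity to ask how the internal "ecosystems" of classical and vernacular Chinese differ, and to consider which other linguistic features, beyond function words, vocabulary size, and usage proportions, could serve as objective distinguishing indicators and deserve further development.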
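The three surface statistics the abstract compares (total characters, distinct characters, characters per sentence) are straightforward to compute. The sketch below is illustrative only, not the authors' code: it assumes that a "character" is any CJK Unified Ideograph in the range U+4E00 to U+9FFF (so punctuation, digits and Latin letters are ignored) and that sentences end at 。, ！ or ？. The function name `text_stats` is hypothetical.

```python
import re

# Assumption: count only CJK Unified Ideographs (U+4E00 to U+9FFF);
# punctuation, digits and Latin letters are ignored.
HAN = re.compile(r"[\u4e00-\u9fff]")
# Assumption: a sentence ends at full-width 。！？ or ASCII ! ?
SENTENCE_END = re.compile(r"[。！？!?]+")


def text_stats(text: str) -> dict:
    """Return total characters, distinct characters and mean characters per sentence."""
    chars = HAN.findall(text)
    # Keep only sentence pieces that contain at least one Han character.
    sentences = [s for s in SENTENCE_END.split(text) if HAN.search(s)]
    return {
        "total": len(chars),
        "distinct": len(set(chars)),
        "per_sentence": len(chars) / len(sentences) if sentences else 0.0,
    }
```

For example, `text_stats("學而時習之，不亦說乎？")` counts 9 characters, all distinct, in one sentence, while `text_stats("我們的時代是我們的。")` counts 9 characters but only 6 distinct ones. The ratio of distinct to total characters (a type-token ratio) is one way to express the contrast the abstract reports; because that ratio falls as texts get longer, samples of similar length should be compared. Characters outside the basic block (e.g. CJK Extension A) are silently dropped under this assumption.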
    Is it possible for computers to tell whether a text was written in classical Chinese or in vernacular modern Chinese? Can new developments in digital humanities help trace the transformation of written Chinese during the late Qing and early Republic? As previous scholars have pointed out, in the early stage of modern Chinese, missionaries and reformists used the vernacular only as a tool to enlighten the public. Classical Chinese remained the standard written language until the May Fourth Movement of 1919, when Xin Qing Nian became the most influential publication. Over the last century, scholars have scrutinized the theoretical arguments and creative writing in Xin Qing Nian and several other progressive magazines to reconstruct the history of this language change. But questions such as how long it took the literati and the general public to adopt the vernacular as the written standard, or how the new standard spread from radical revolutionary magazines to other publications such as entertainment magazines and newspapers, remain unanswered. If computers can be taught to distinguish classical from modern Chinese, many more digitized texts from the period could be brought in to answer those questions. To this end, we adopt the concept of "genome mapping" to differentiate classical from modern Chinese. We propose two approaches, supervised and unsupervised learning, to compare the writing styles of classical and modern Chinese, and in addition to concepts and methods from lexical analysis we also adapt ideas from ecology. Supervised learning, which fixes the keywords in advance, has long been used in linguistics to attribute authorship. We choose ten function words each for classical and modern Chinese as keywords, and use Gini's index for volumes 1 and 11 of Xin Qing Nian to demonstrate the comparison.
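The supervised approach starts from a fixed keyword list. The abstract says ten function words were chosen for each register but does not list them in this record, so the two sets below are hypothetical, illustrative single-character examples (classical particles such as 之 and 乎, vernacular particles such as 的 and 了), not the paper's lists. The sketch counts them in a text and reports the vernacular share of all matched function words; `function_word_profile` is a hypothetical name.

```python
from collections import Counter

# Hypothetical illustrative keyword sets; NOT the ten-word lists used in the paper.
CLASSICAL = set("之乎者也矣焉哉")
VERNACULAR = set("的了嗎呢吧麼著")


def function_word_profile(text: str) -> dict:
    """Count classical and vernacular function words (single characters) in a text."""
    counts = Counter(text)
    classical = sum(counts[c] for c in CLASSICAL)
    vernacular = sum(counts[c] for c in VERNACULAR)
    matched = classical + vernacular
    return {
        "classical": classical,
        "vernacular": vernacular,
        # Share of vernacular markers among all matched function words;
        # None when the text contains none of the listed characters.
        "vernacular_share": vernacular / matched if matched else None,
    }
```

`function_word_profile("學而時習之，不亦說乎？")` finds 之 and 乎 and gives a vernacular share of 0.0, while `function_word_profile("我們的時代是我們的。")` finds 的 twice and gives 1.0. Counting single characters keeps the sketch simple, but it over-counts when a particle sits inside a longer word (之 in 之一, for instance), and multi-character function words would need substring matching instead.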
    There is no standard operating procedure for unsupervised learning, which is the main reason this type of approach is difficult to implement. For the unsupervised analysis we use diversity indices, such as Gini's index, entropy, and Simpson's index, to measure the statistical dispersion and evenness (or equality) of the characters used. Our analyses suggest that the later volumes (such as Volume 11) have lower species diversity, meaning that their articles can be read without recognizing as many different characters, which matches the purpose of the May Fourth Movement.
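The diversity indices named in the abstract can be computed from the frequency of each distinct character, treating characters as "species". The sketch below uses standard textbook definitions, which are assumptions since the paper's exact formulas are not given in this record: Shannon entropy H = -Σ p_i ln p_i, Simpson's index D = Σ p_i² (a concentration measure; some authors report 1 - D instead), and the Gini coefficient of the per-character counts (0 when every character is used equally often). In ecology "Gini index" sometimes means the Gini-Simpson index 1 - Σ p_i²; which variant the paper uses is not stated here. The input is assumed to be an iterable of characters already stripped of punctuation, and `diversity_indices` is a hypothetical name.

```python
import math
from collections import Counter


def diversity_indices(chars) -> dict:
    """Entropy, Simpson's D and Gini coefficient of a character frequency distribution."""
    counts = sorted(Counter(chars).values())  # ascending count per distinct character
    n = len(counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("no characters to measure")
    p = [c / total for c in counts]
    entropy = -sum(pi * math.log(pi) for pi in p)  # Shannon entropy, natural log
    simpson = sum(pi * pi for pi in p)             # Simpson's concentration, sum of p_i^2
    # Gini coefficient of the sorted counts x_1 <= ... <= x_n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    gini = 2 * sum(i * c for i, c in enumerate(counts, start=1)) / (n * total) - (n + 1) / n
    return {"types": n, "entropy": entropy, "simpson": simpson, "gini": gini}
```

For the toy input "之之乎", the function returns two types, Simpson's D = 5/9, a Gini coefficient of 1/6, and entropy ln 3 - (2/3) ln 2 ≈ 0.637. Under these definitions, a volume with lower diversity would show lower entropy and a higher Simpson's D.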
    Relation: 東亞觀念史集刊 (Journal of the History of Ideas in East Asia), 7, 427-454
    Data Type: article
    Appears in Collections: [東亞觀念史集刊] Journal articles
