    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/141544


    Title: 從文字探勘比較臺灣與中國之寫作風格—以《聯合報》與《人民日報》為例
    Comparing the Writing Style of Newspaper between Taiwan and China
    Authors: 吳蒨芸
    Wu, Qian Yun
    Contributors: 陳麗霞
    余清祥

    吳蒨芸
    Wu, Qian Yun
    Keywords: Text mining
    Style change
    Exploratory data analysis
    Association index
    Cross-strait differences
    Date: 2022
    Issue Date: 2022-09-02 14:45:06 (UTC+8)
    Abstract: Taiwan and China are both Chinese-speaking societies, and Chinese is the common language of residents on both sides of the Taiwan Strait. Nevertheless, written and spoken usage differ considerably between the two sides, and these differences appear to have grown over time; recent shifts in the international situation have made the effects of the cross-strait division even more noticeable. This thesis uses Taiwan's United Daily News and China's People's Daily as research material to examine how writing style on the two sides of the Strait has changed, and how the two sides differ, since the first-generation leaders Chiang Kai-shek and Mao Zedong, using text analysis to study changes in word usage and their meanings. In addition to comparing the diversity and unevenness of word usage through exploratory data analysis (EDA), the thesis proposes a method for measuring the association between two-character and multi-character words, and uses it to extract representative concepts of Taiwanese and Chinese thought in recent years and to compare their major changes and differences. The data consist of front-page reports in People's Daily from 1946 to 2020 and editorials in United Daily News from 1960 to 2020; these articles mostly concern international relations and national-level events rather than local affairs or social news.
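The abstract does not name the exact diversity and unevenness measures. As a minimal sketch, assuming the text has already been word-segmented, word diversity can be taken as the type-token ratio (distinct words divided by total words, the measure studied by Manschreck et al., 1981) and unevenness as the Gini coefficient of word frequencies. Both choices and the helper names `type_token_ratio` and `gini` are illustrative assumptions, not necessarily the definitions used in the thesis.

```python
from collections import Counter


def type_token_ratio(tokens):
    """Assumed diversity measure: distinct words / total words."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    return len(set(tokens)) / len(tokens)


def gini(tokens):
    """Assumed unevenness measure: Gini coefficient of word frequencies.

    0 means every distinct word occurs equally often; values close to 1
    mean a few words account for most of the occurrences.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    counts = sorted(Counter(tokens).values())  # ascending frequencies
    n, total = len(counts), sum(counts)
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, for i = 1..n
    weighted = sum(i * x for i, x in enumerate(counts, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Toy input: a pre-segmented passage (word segmentation is assumed done upstream)
tokens = ["中國", "人民", "中國", "政府", "人民", "中國"]
print(round(type_token_ratio(tokens), 4))  # 0.5
print(round(gini(tokens), 4))              # 0.2222
```

Because the type-token ratio falls as a text grows longer, comparisons across years are fairer when each year is measured on samples of equal size.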
    The two newspapers differ markedly in word diversity. The word diversity of People's Daily was lowest in the 1960s, during the Cultural Revolution, then rose year by year until the late 1980s and has declined since; the word diversity of United Daily News declined until 1990 and then rose substantially. Using the 500 most frequent two-character words as explanatory variables in a clustering model divides the Taiwanese and Chinese texts into four eras. Applying the two-character word association analysis proposed in this thesis shows that pairing each word with "the next two-character word within the same sentence" yields better word groups, and these groups agree closely with historical developments. For example, with "China" as the antecedent word in People's Daily, the most strongly associated words in the four eras are "people, ambassador, characteristics, characteristics". This reflects the nationalist emphasis on "China→people" in the early years of the People's Republic, the appearance of "China→ambassador" as China sought to enter the international arena (e.g., the United Nations), and, after economic opening, the emphasis on "China→characteristics" as China promoted "socialism with Chinese characteristics" to the world. Such word-to-word relationships can be used to describe a concept or an issue; in future work, collaboration with humanities scholars could use word-cluster associations to identify the characteristics and main ideas of articles.
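One way to read the "next two-character word within the same sentence" rule is sketched below. It assumes sentences are already split and word-segmented, pairs each word with the first two-character word that follows it in the same sentence, and scores each pair with a Jaccard-style ratio (the Jaccard index appears in the reference list via Real and Vargas, 1996). The scoring formula and the function names `next_word_pairs` and `strongest_followers` are hypothetical; the association index actually used in the thesis may differ.

```python
from collections import Counter


def next_word_pairs(sentences):
    """Count (word, next two-character word) pairs inside each sentence.

    `sentences` is a list of word-segmented sentences (lists of strings).
    Each word is paired with the first two-character word that follows it
    in the same sentence; pairs never cross a sentence boundary.
    """
    pairs = Counter()
    for words in sentences:
        for i, word in enumerate(words):
            for later in words[i + 1:]:
                if len(later) == 2:  # skip one-, three- and four-character words
                    pairs[(word, later)] += 1
                    break
    return pairs


def strongest_followers(sentences, antecedent, top=3):
    """Rank the followers of `antecedent` by a Jaccard-style score.

    Assumption: score = n(A->B) / (n(A) + n(B) - n(A->B)), using word counts
    in place of the sets of the classical Jaccard index.
    """
    pairs = next_word_pairs(sentences)
    freq = Counter(w for words in sentences for w in words)
    scores = {
        b: n_ab / (freq[a] + freq[b] - n_ab)
        for (a, b), n_ab in pairs.items()
        if a == antecedent
    }
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


sentences = [
    ["中國", "人民", "站", "起來"],
    ["中國", "的", "人民"],
    ["中國", "特色", "社會主義"],
]
for word, score in strongest_followers(sentences, "中國"):
    print(word, round(score, 3))
# 人民 0.667
# 特色 0.333
```

Restricting pairs to a single sentence keeps unrelated clauses from being linked, and dividing by both words' frequencies keeps very common words from ranking high on volume alone.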
    Reference: I. Chinese-language references
    1.何立行、余清祥、鄭文惠(2014),「從文言到白話:《新青年》雜誌語言變化統計研究」,《東亞觀念史集刊》,7,頁427-454。
    2.余清祥(1998),「統計在紅樓夢的應用」,《政大學報》,76,頁303-327。
    3.余清祥、葉昱廷(2020),「以文字探勘技術分析臺灣四大報文字風格」,《數位典藏與數位人文》,6。
    4.林志軒(2020),「維度縮減於文本風格之應用研究」,政治大學統計學系學位論文。
    5.林晏辰(2020),「中文關鍵詞偵測的探討」,政治大學統計學系學位論文。
    6.金觀濤(2011),「數位人文研究的理論基礎」,收錄於《數位人文研究的新視野:基礎與想像》,項潔主編,頁45-61,臺灣大學。
    7.洪嘉馡、黃居仁、馬偉雲、中央研究院語言學研究所、中央研究院資訊科學研究所(2008),「語料庫為本的兩岸對應詞彙發掘」,《語言暨語言學》,9(2),頁221-238。
    8.夏天(2013),「詞語位置加權TextRank的關鍵詞抽取研究」,《現代圖書情報技術》,9,頁30-34。
    9.徐超(2017),「《人民日報》社論詞彙統計與分析」,《采寫編》,(3),頁144-145。
    10.梁家安(2017),「從國共內戰到改革開放: 人民日報風格變遷之量化研究」,政治大學統計學系學位論文。
    11.馮建三(2020),「分析台灣主要報紙的兩岸新聞與言論聚焦在《聯合報》(1951-2019)」,《台灣社會研究季刊》,115,頁151-235。
    12.黃秋林、吳本虎(2009),「政治隱喻的歷時分析——基於《人民日報》(1978—2007) 兩會社論的研究」,《語言教學與研究》,(5),頁91-96。

    II. English-language references
    1. Archer, J. and Jockers, M.L. (2016). The Bestseller Code, New York: St. Martin’s Press.
    2. Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2019). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1.
    3. Wasserman, S. and Faust, K. (1994). “Social Network Analysis in the Social and Behavioral Sciences”, in Social Network Analysis: Methods and Applications, Cambridge University Press.
    4. Kumar, A., Dabas, V., and Hooda, P. (2020). “Text classification algorithms for mining unstructured data: a SWOT analysis”, International Journal of Information Technology, Vol. 12, 1159-1169.
    5. Manschreck, T. C., Maher, B. A., and Ader, D. N. (1981). “Formal thought disorder, the type-token ratio and disturbed voluntary motor movement in schizophrenia”, British Journal of Psychiatry, Vol. 139, 7-15.
    6. Mihalcea, R. and Tarau, P. (2004). “TextRank: Bringing order into texts”, in Proceedings of the Conference on Empirical Methods in Natural Language Processing, Vol. 4(4), 404-411.
    7. Namugenyi, C., Nimmagadda, S.L., and Reiners, T. (2019). “Design of a SWOT Analysis Model and its Evaluation in Diverse Digital Business Ecosystem Contexts”, Procedia Computer Science, Vol. 159, 1145-1154.
    8. Real, R. and Vargas, J. M. (1996). “The probabilistic basis of Jaccard’s index of similarity”, Systematic Biology, Vol. 45(3), 380-385.
    9. Siddiqi, S. and Sharan, A. (2015). “Keyword and keyphrase extraction techniques: a literature review”, International Journal of Computer Applications, Vol. 109(2), 18-23.
    10. Yue, C.J., Ho, L., Pan, Y., and Cheng, W. (2016). “A Quantitative Study of Chinese Writing Style based on the New Youth Magazine”, Concepts & Context in East Asia, Vol. 5, 87-102.
    11. Yue, J.C. and Clayton, M.K. (2005). “A similarity measure based on species proportions”, Communications in Statistics-Theory and Methods, Vol. 34(11), 2123-2131.
    Description: Master's thesis
    National Chengchi University
    Department of Statistics
    109354001
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0109354001
    Data Type: thesis
    DOI: 10.6814/NCCU202201492
    Appears in Collections: [Department of Statistics] Theses

    Files in This Item:

    400101.pdf (Adobe PDF, 4003 Kb)


    All items in the NCCU Institutional Repository are protected by copyright, with all rights reserved.



    Copyright Announcement
    1. The digital content of this website is part of the National Chengchi University Institutional Repository and is provided free of charge for non-commercial purposes such as academic research and public education. Please use it in a proper and reasonable manner that respects the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2. Every effort has been made to prevent this website from infringing the rights of copyright owners. If you believe that any material on the website infringes copyright, please notify our staff (nccur@nccu.edu.tw). We will immediately take remedial measures, such as removing the work from the repository, and investigate your claim.