National Chengchi University Institutional Repository (NCCUR): Item 140.119/136830


    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/136830


    Title: 運用文字探勘分析人民日報的風格變遷
    A Study of Writing Style of The People’s Daily
    Author: 陳庭偉 (Chen, Ting-Wei)
    Contributors: 陳麗霞
    余清祥
    陳庭偉 (Chen, Ting-Wei)
    Keywords: Writing Style
    Style Change
    Cluster Analysis
    Keyword
    Variable Selection
    Date: 2021
    Uploaded: 2021-09-02 15:38:09 (UTC+8)
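    Abstract: The growth of big data has driven the digitization of all kinds of data, and text mining is a prime example, with applications across many fields; writing style is one of its common topics. Writing style, however, is easily influenced by subject matter: even for the same author or text, word usage may differ with factors such as time and setting. Take the People’s Daily, the official newspaper of the Chinese Communist Party: its content and subject matter not only reflect the character of each era but also take official positions and aims into account, so changes in the paper’s characteristics can reflect the political and social changes since the Communist Party founded the state. This study therefore examines changes in the style of the People’s Daily. By comparing differences in wording across years, it uses statistical methods and clustering to delineate distinct periods. It also applies several keyword-detection indicators to select representative words for each period as explanatory variables for classification, aiming to balance accuracy, computation speed, and interpretability.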
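    This study uses front-page reports of the People’s Daily from 1949 to 2019 as its material, because front-page content mostly concerns major national and international affairs, which avoids the heterogeneity in wording that local affairs might introduce. It begins with exploratory data analysis, covering characters, words, and the Jaccard and Yue similarity indices computed on characters and words, to uncover the basic textual properties of the People’s Daily. It then applies cluster analysis to divide modern China into several periods and compares the result with periodizations by experts. The findings are as follows. Differences between periods show more clearly through two-character words. When clustering on two-character words or on the similarity indices, the People’s Daily can be divided into four periods (which might be named “Founding of the State”, “Cultural Revolution”, “Reform and Opening Up”, and “Modernization”). The results of the different clustering methods are highly consistent, and wording style differs markedly between periods. In addition, for selecting explanatory variables for classification, the representative-word detection indicator proposed in this study performs best: in accuracy, computation speed, and interpretability alike, it outperforms the chi-square indicator and dimensionality-reduction methods.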
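The two similarity indices named in the abstract can be illustrated with a small sketch. This is not the thesis's code. It assumes that "Yue" refers to the Yue-Clayton similarity index computed from relative frequencies, and it uses made-up counts for two hypothetical years. The function names `jaccard`, `yue_clayton`, and `char_bigrams` are illustrative, and raw character bigrams stand in for two-character words, which the thesis may instead obtain through word segmentation.

```python
from collections import Counter


def char_bigrams(text: str) -> Counter:
    """Count overlapping two-character sequences (a stand-in for segmented words)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def jaccard(a: Counter, b: Counter) -> float:
    """Share of distinct terms that occur in both profiles (presence only)."""
    terms_a, terms_b = set(a), set(b)
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def yue_clayton(a: Counter, b: Counter) -> float:
    """Yue-Clayton index (assumed definition):
    sum(p*q) / (sum(p^2) + sum(q^2) - sum(p*q)),
    where p and q are the relative term frequencies of the two profiles."""
    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a == 0 or total_b == 0:
        return 0.0
    vocab = set(a) | set(b)
    # Counter returns 0 for terms that are missing from a profile.
    p = {t: a[t] / total_a for t in vocab}
    q = {t: b[t] / total_b for t in vocab}
    cross = sum(p[t] * q[t] for t in vocab)
    denom = sum(v * v for v in p.values()) + sum(v * v for v in q.values()) - cross
    return cross / denom


# Toy term counts for two hypothetical years (invented, not thesis data).
year_a = Counter({"革命": 3, "建設": 1})
year_b = Counter({"改革": 2, "建設": 2})

print(jaccard(year_a, year_b))      # one shared term out of three distinct terms
print(yue_clayton(year_a, year_b))  # frequency-weighted overlap
```

Jaccard looks only at whether a term is present, while the Yue-Clayton index weights terms by how often they occur, so the two can rank pairs of years differently. A matrix of such pairwise values between years is the kind of input a clustering step could work from.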
    Big data has advanced the quantitative analysis of all kinds of data, and text mining is one example. Identifying an author’s writing style is a popular topic in text mining. However, writing style can be affected by factors such as the theme and language of the articles. Take the People’s Daily, the official newspaper of the Central Committee of the Chinese Communist Party, as an example. The Chinese Communist Party attaches great importance to the People’s Daily and has given strong guidance to its work in all periods of revolution, construction, and reform. In other words, text analysis of the People’s Daily may reveal changes in the political and social environment of the Chinese Communist Party, and we want to know whether the different periods of China (1949~2019) can be distinguished through text analysis of its articles.
    We first conduct exploratory data analysis, covering characters, words, and the Jaccard and Yue indices. We then use cluster analysis to divide modern China into several periods and compare the result with the periodizations proposed by experts. We find that the differences between periods can be seen more clearly through two-character words. If two-character words or the similarity indices are used for clustering, the People’s Daily can be divided into four periods. In addition, we use multiple keyword indicators to select representative words for each period and use these words as explanatory variables for classification. In terms of accuracy, computation speed, and interpretability, this approach performs better than chi-square indicators or dimensionality reduction methods.
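The chi-square indicator that the abstract uses as a comparison baseline can be sketched as follows. This record does not describe the thesis's own representative-word indicator, so it is not reproduced here. The sketch assumes the common document-frequency form of the chi-square score, built from a 2x2 table of period membership against term presence. The helper names `chi_square` and `rank_terms` and the toy documents are hypothetical.

```python
from collections import Counter


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 table.

    a: documents in the period that contain the term
    b: documents outside the period that contain the term
    c: documents in the period without the term
    d: documents outside the period without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def rank_terms(docs, period):
    """Rank terms by chi-square association with one period.

    docs is a list of (period_label, set_of_terms) pairs.
    """
    n_in = sum(1 for label, _ in docs if label == period)
    n_out = len(docs) - n_in
    df_in, df_out = Counter(), Counter()
    for label, terms in docs:
        # Each document contributes at most 1 to a term's document frequency.
        (df_in if label == period else df_out).update(terms)
    scores = {}
    for term in set(df_in) | set(df_out):
        a, b = df_in[term], df_out[term]
        scores[term] = chi_square(a, b, n_in - a, n_out - b)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus: two hypothetical periods with two documents each (invented data).
toy_docs = [
    ("A", {"x", "z"}),
    ("A", {"x"}),
    ("B", {"y", "z"}),
    ("B", {"y"}),
]
ranking = rank_terms(toy_docs, "A")
print(ranking)
```

The statistic is symmetric: a term that is conspicuously absent from a period scores as high as one that is concentrated in it. A selection step that wants only representative words for a period would therefore also need to check the direction of the association.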
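    Description: Master’s
    National Chengchi University
    Department of Statistics
    108354007
    Source: http://thesis.lib.nccu.edu.tw/record/#G0108354007
    Data type: thesis
    DOI: 10.6814/NCCU202101184
    Appears in collections: [Department of Statistics] Theses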

    Files in this item:

    File         Size     Format      Views
    400701.pdf   4793Kb   Adobe PDF   20


    All items in NCCUR are protected by the original copyright.



    Copyright Announcement
    1. The digital content of this website is part of National Chengchi University Institutional Repository. It provides free access to academic research and public education for non-commercial use. Please utilize it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2. NCCU Institutional Repository is made to protect the interests of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.