政大機構典藏-National Chengchi University Institutional Repository(NCCUR):Item 140.119/153362
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/153362


    Title: 臺灣碩博士論文之文字分析—以商業及管理學門摘要為例
    Text Analysis of Master’s and Doctoral Theses in Taiwan: A Study on Abstracts in the Field of Business and Administration
    Authors: 劉貞莉
    Liu, Chen-Li
    Contributors: 陳怡如
    余清祥

    Chen, Yi-Ju
    Yue, Ching-Syang

    劉貞莉
    Liu, Chen-Li
    Keywords: Text Analysis
    Chinese Word Segmentation
    Exploratory Data Analysis
    Text Classification
    Association Analysis
    Date: 2024
    Issue Date: 2024-09-04 14:55:57 (UTC+8)
    Abstract: Since its invention, writing has been an essential tool for passing on knowledge, stories, and emotions. Text analysis makes it possible to trace cultural and technological developments, social characteristics, and patterns of change across eras, and to identify the key factors behind them. An abstract is a condensed version of an article or book and usually reveals the essentials of the full text; from the abstract of an academic paper, for example, readers should be able to learn the research objectives, conclusions, and main insights. This study examines the abstracts of master's and doctoral theses in the field of business and administration (BA) in Taiwan from academic years 107 to 109 (2018 to 2020). In addition to summarizing word usage and other aspects of writing style, it uses tools such as cluster analysis to dissect the textual style of the three sections of an abstract and to compare the characteristics of theses across BA disciplines, with the goal of helping readers write and read BA theses.
    Modern written Chinese is mainly vernacular, and its basic unit is usually a word made up of two or more characters, so word segmentation is the first step in analysis: it yields terms that are closer to the intended meaning. We first evaluate two word segmentation packages, Jieba and CKIP, comparing them in terms of execution time, number of words, word proportions, word types, and segmentation precision, to provide a reference for users analyzing Chinese text. The analysis of the abstracts relies mainly on exploratory data analysis. Each abstract is manually labeled into three sections (Motivations & Purposes, Methods & Materials, and Conclusions & Suggestions), and the writing styles of the ten BA disciplines are explored through common terms, word diversity, and co-occurring terms derived from the segmentation results. The results show that CKIP captures the terms customarily used in Taiwanese BA thesis abstracts and aligns better overall with the expectations of this study. The three sections of an abstract have clearly distinct features and formats, and the ten BA disciplines fall into three clusters: healthcare management, accounting, and all other disciplines. Furthermore, classification models that use the common terms and co-occurring terms of each cluster and each section as explanatory variables can effectively distinguish both the three clusters of BA disciplines and the three sections of an abstract.
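
    The abstract names word diversity and co-occurring terms as style features but does not say how they are computed. The following is a minimal sketch, not the thesis's actual method: it assumes the text has already been segmented into tokens (for example by Jieba or CKIP), uses Shannon entropy and Simpson's index as two possible diversity measures, and counts co-occurrence at the level of one abstract section. The function names and the sample tokens are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the token frequency distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def simpson_index(tokens):
    """Simpson's index: sum of squared token proportions (lower = more diverse)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())


def cooccurrence_counts(documents):
    """Count how many documents contain each unordered pair of distinct tokens."""
    pairs = Counter()
    for tokens in documents:
        # Sort the unique tokens so each pair has one canonical key order.
        for a, b in combinations(sorted(set(tokens)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical, already-segmented abstract sections (illustrative only).
sections = [
    ["本", "研究", "探討", "品牌", "形象"],
    ["本", "研究", "採用", "問卷", "調查"],
    ["研究", "結果", "顯示", "品牌", "形象"],
]

all_tokens = [token for section in sections for token in section]
print("Shannon entropy:", shannon_entropy(all_tokens))
print("Simpson index:", simpson_index(all_tokens))
print("Most frequent pairs:", cooccurrence_counts(sections).most_common(3))
```

    Counts of this kind, computed per section or per discipline, could then serve as explanatory variables in a classifier of the sort the abstract describes; the choice of measures and of the co-occurrence window here is an assumption.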
    Description: Master's thesis
    National Chengchi University
    Department of Statistics
    111354014
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0111354014
    Data Type: thesis
    Appears in Collections: [Department of Statistics] Theses

    Files in This Item:

    File          Size       Format
    401401.pdf    15837Kb    Adobe PDF


    All items in 政大典藏 are protected by copyright, with all rights reserved.

