政大機構典藏-National Chengchi University Institutional Repository(NCCUR):Item 140.119/132125


Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/132125

Title:	基於主動式學習之古漢語斷句系統發展與應用研究 (Development and Application of an Ancient Chinese Sentence Segmentation System Based on Active Learning)
Authors:	徐志帆 (Hsu, Chih-Fan); 張鐘 (Chang, Chung)
Contributors:	圖資與檔案學刊
Keywords:	Digital humanities (數位人文); Active learning (主動學習); Machine learning (機器學習); Automatic ancient Chinese sentence segmentation (自動化古漢語斷句); Human-computer interaction (人機互動)
Date:	2019-12
Issue Date:	2020-10-07 11:54:23 (UTC+8)
Abstract:	This study develops an active-learning-based sentence segmentation system for ancient Chinese texts in support of digital humanities research. By combining active learning with machine learning algorithms in a human-machine cooperation mode, the system aims to reduce the training corpus needed to build an automatic ancient Chinese sentence segmentation model and to help humanities researchers segment previously uninterpreted texts more efficiently. Two experiments were conducted to develop and evaluate the system. The first experiment compared segmentation models built with different algorithms and feature templates under two text-selection methods, sequential selection and active learning selection, in order to identify the most suitable algorithm and feature template for the system. The results show that conditional random fields combined with a three-word feature template learn effectively under active learning and are therefore appropriate for building the active learning sentence segmentation model. In the second experiment, six humanities researchers used the system to segment assigned ancient Chinese texts. The segmentation models built from each researcher's annotations were compared and analyzed, and semi-structured interviews were conducted to gain an in-depth understanding of the researchers' experience with the system and their suggestions. The results show that the system effectively learns from humanities researchers' segmentation annotations and that the model's predictions improve continually through human-machine cooperation. According to the interviews, the researchers evaluated the system's workflow and interface positively, and most of them indicated that the segmentation prediction function effectively assisted their segmentation work. In future work, prediction could be further improved by embedding a named entity model or by designing feature templates based on other ancient Chinese features, such as phonological features or POS tagging. The system could also be developed into a digital humanities learning platform for training in ancient Chinese sentence segmentation.
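The abstract gives no implementation details, so the following Python is only an illustrative sketch of the two ideas it names, not the authors' system. It assumes a character-level labeling scheme ("E" for a character followed by a break, "O" otherwise), a feature template built from a three-character window, and least-confidence sampling as the active learning criterion. All function names, the label set, and the `predict_break_probs` callback are hypothetical; in practice the callback could be backed by the per-character marginals of a conditional random field.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed set of punctuation marks that count as sentence breaks.
BREAK_MARKS = "，。？！；：、"


def labels_from_punctuated(text: str, marks: str = BREAK_MARKS) -> Tuple[str, List[str]]:
    """Strip break marks and label each remaining character.

    A character is labeled "E" if a break mark followed it, else "O".
    """
    chars: List[str] = []
    labels: List[str] = []
    for ch in text:
        if ch in marks:
            if labels:
                labels[-1] = "E"
        else:
            chars.append(ch)
            labels.append("O")
    return "".join(chars), labels


def char_features(text: str, i: int) -> Dict[str, str]:
    """Features for position i drawn from a three-character window."""
    prev_c = text[i - 1] if i > 0 else "<BOS>"
    next_c = text[i + 1] if i < len(text) - 1 else "<EOS>"
    cur = text[i]
    return {
        "c-1": prev_c,
        "c0": cur,
        "c+1": next_c,
        "c-1c0": prev_c + cur,
        "c0c+1": cur + next_c,
        "c-1c0c+1": prev_c + cur + next_c,
    }


def least_confidence(break_probs: Sequence[float]) -> float:
    """Mean per-character uncertainty, where each term is 1 - max(p, 1 - p)."""
    if not break_probs:
        return 0.0
    return sum(1.0 - max(p, 1.0 - p) for p in break_probs) / len(break_probs)


def select_for_annotation(
    pool: Sequence[str],
    predict_break_probs: Callable[[str], Sequence[float]],
    k: int,
) -> List[str]:
    """Return the k unlabeled passages the current model is least sure about."""
    ranked = sorted(
        pool,
        key=lambda passage: least_confidence(predict_break_probs(passage)),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    plain, labels = labels_from_punctuated("學而時習之，不亦說乎？")
    print(plain, labels)
    print(char_features(plain, 0))

    # Toy stand-in for a trained model: unsure (0.5) after "之", confident elsewhere.
    def toy_probs(passage: str) -> List[float]:
        return [0.5 if ch == "之" else 0.9 for ch in passage]

    print(select_for_annotation(["學而時習", "之之之", "不亦說乎"], toy_probs, k=1))
```

In a human-machine loop of the kind the abstract describes, the selected passages would be sent to a researcher for segmentation, added to the training set, and the model retrained before the next selection round. Sequential selection, the baseline named in the abstract, would simply take the next k passages in document order.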
Relation:	圖資與檔案學刊, 95, 117-145
Data Type:	article
DOI link:	https://doi.org/10.6575/JILA.201912_(95).0004
DOI:	10.6575/JILA.201912_(95).0004
Appears in Collections:	[圖資與檔案學刊] Journal Articles (期刊論文)

Files in This Item:

File	Description	Size	Format
142.pdf		1053Kb	Adobe PDF

All items in 政大典藏 are protected by copyright, with all rights reserved.

