|
Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/58981
|
Title: | 中英文語句語意推論 Textual Entailment Recognition for Chinese and English |
Authors: | 黃瑋杰 Huang, Wei Jie |
Contributors: | 劉昭麟 Liu, Chao Lin 黃瑋杰 Huang, Wei Jie |
Keywords: | 語句推論 近義詞判定 經驗法則 機器學習 Entailment Recognition Near Synonym Recognition Heuristic Functions Machine Learning |
Date: | 2012 |
Issue Date: | 2013-07-23 13:20:37 (UTC+8) |
Abstract: | Textual inference has become increasingly important in natural language processing research areas such as information retrieval (IR), information extraction (IE), automatic summarization, and intelligent tutoring systems (ITS). The topic has drawn growing attention since the first Recognizing Textual Entailment challenge (RTE-1) was held in 2005, and the Recognizing Inference in Text (RITE-1) task has since provided an evaluation platform for entailment recognition in Asian languages, including Chinese. In this research, we built a model that computes entailment relations with a set of heuristic functions designed from textual analysis. We also propose a method for measuring the semantic similarity between two Chinese words based on E-HowNet, which strengthens the model's understanding of sentence meaning and improves its inference performance. In addition, following earlier machine learning approaches, we use these functions to extract features, including lexical semantics, syntactic structure, POS tags, word coverage ratio, and word dependency relations, and train classification models with several algorithms, such as SVM, J48, and Linear Regression, to judge entailment relations. Experimental results show that both of our systems perform well on Chinese textual entailment and placed second in the NTCIR-10 RITE-2 competition. Our analysis of the machine learning classifiers also shows that the Chinese and English corpora have different characteristics and different sets of effective features for recognizing entailment. Finally, we evaluated the inference system on reading comprehension tests to understand its performance on a practical application. The results indicate that the system could serve as the basis for an intelligent tutoring system for reading comprehension, helping students improve their reading comprehension and helping teachers write better reading tests. |
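As a rough illustration of the kind of lexical heuristic the abstract mentions, the sketch below computes a word coverage ratio between a text and a hypothesis and applies a simple threshold. It is a minimal sketch under stated assumptions and not the thesis's actual method: the function names `coverage_ratio` and `entails`, the 0.8 threshold, and the pre-tokenized English input are all hypothetical choices made for this example, and the E-HowNet similarity and trained classifiers described in the abstract are not shown.

```python
from typing import Sequence


def coverage_ratio(text_tokens: Sequence[str], hyp_tokens: Sequence[str]) -> float:
    """Return the fraction of hypothesis tokens that also occur in the text.

    Hypothetical helper for illustration; an empty hypothesis yields 0.0.
    """
    if not hyp_tokens:
        return 0.0
    text_vocab = set(text_tokens)
    covered = sum(1 for tok in hyp_tokens if tok in text_vocab)
    return covered / len(hyp_tokens)


def entails(text_tokens: Sequence[str], hyp_tokens: Sequence[str],
            threshold: float = 0.8) -> bool:
    """Guess entailment when coverage reaches the (assumed) threshold."""
    return coverage_ratio(text_tokens, hyp_tokens) >= threshold


if __name__ == "__main__":
    text = "the cat sat on the mat".split()
    print(coverage_ratio(text, "the cat sat".split()))
    print(entails(text, "the dog sat".split()))
```

In a feature-based setup like the one the abstract describes, a value such as this ratio would be one feature among several passed to a classifier instead of being thresholded directly.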
Description: | Master's thesis, National Chengchi University, Department of Computer Science, 100753014, 101 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0100753014 |
Data Type: | thesis |
Appears in Collections: | [Department of Computer Science] Theses
|
Files in This Item:
File |
Size | Format | |
301401.pdf | 8193Kb | Adobe PDF2 | 1001 | View/Open |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|