National Chengchi University Institutional Repository (NCCUR): Item 140.119/37120
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/37120


    Title: 英漢專利文書文句對列與應用
    English and Chinese Sentence Alignment for Statements in Patent Documents and its Applications
    Authors: 田侃文
    Contributors: 曾元顯
    劉昭麟

    田侃文
    Keywords: patent specifications
    computer-assisted machine translation
    sentence alignment
    cosine similarity
    dynamic programming algorithm
    Date: 2008
    Issue Date: 2009-09-19 12:11:46 (UTC+8)
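    Abstract: With globalization, countries around the world translate patent documents across languages. For patent translation and cross-language retrieval, a large and accurate parallel corpus of patent documents supports related research. Aligning the sentences of a parallel corpus by hand is very time-consuming. This study therefore applies preprocessing techniques, including sentence segmentation, word segmentation, and English stemming, together with a Chinese-English table of technical terms. It adjusts the weights of matched term pairs using term-frequency statistics and uses the cosine similarity between sentences as a supplementary score to compute the similarity between Chinese and English sentences. A dynamic programming algorithm then selects the best combination of alignments, yielding a Chinese-English sentence alignment system. Alignment quality is evaluated by precision and recall. The aligned sentence pairs are also used as training data for word reordering in a computer-assisted machine translation system, with test items from the 2003 Trends in International Mathematics and Science Study (TIMSS) as the translation target and BLEU and NIST as the evaluation metrics. The experiments show that the system reaches a precision of 0.995 in the 1:1 alignment mode, and that the large set of Chinese-English sentence pairs selected with a threshold does improve the translation quality of the computer-assisted machine translation system.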
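    As an illustration of the cosine-similarity component mentioned in the abstract, the following minimal Python sketch maps segmented Chinese tokens into English through a bilingual term table and compares the result with the English tokens. The function name `cosine_similarity`, the `term_table` dictionary, and the plain term-count vectors are assumptions made for this example; the thesis describes its own weighting and cosine variant, which this sketch does not reproduce.

```python
import math
from collections import Counter


def cosine_similarity(zh_tokens, en_tokens, term_table):
    """Cosine between term-count vectors of an English sentence and a
    Chinese sentence mapped into English via a bilingual term table.

    term_table is a hypothetical dict: Chinese term -> list of English terms.
    """
    # Map each Chinese token to its English counterparts; tokens without
    # an entry in the table contribute nothing.
    mapped = Counter(en for zh in zh_tokens for en in term_table.get(zh, ()))
    english = Counter(en_tokens)

    # Dot product over the shared vocabulary (Counter returns 0 if absent).
    dot = sum(count * english[word] for word, count in mapped.items())
    norm = (math.sqrt(sum(v * v for v in mapped.values()))
            * math.sqrt(sum(v * v for v in english.values())))
    return dot / norm if norm else 0.0
```

    In a pipeline of the kind the abstract outlines, the English tokens would be stemmed and the Chinese sentence segmented before this function is called; both steps are left out here.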
    The importance of cross-language translation of patent documents has grown substantially as a result of globalization. Accurately aligned parallel corpora support research that depends on bilingual data, such as computer-aided translation and cross-language information retrieval. Collecting parallel data manually is time-consuming; therefore, an English-Chinese sentence alignment system was built to automate this process.
    The system uses a variety of natural language preprocessing techniques, such as stemming of English words. Sentences are aligned using a score with two components. The first considers the number and weight of aligned word pairs in the Chinese and English sentences. The second comes from a special way of computing the cosine value of Chinese and English sentence pairs. Precision and recall were used to evaluate the quality of the aligned results, and the 1:1 alignment achieved a precision of 0.995. In addition, the aligned sentences were used as training data for a machine translation system applied to TIMSS test items; experimental results show that the aligned sentences are helpful for the translation system.
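    The alignment and evaluation steps described in the abstract can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the thesis's implementation: it takes a precomputed similarity matrix `sim`, allows only 1:1 matches and skipped sentences, and the names `align`, `skip_penalty`, `min_score`, and `precision_recall` are all hypothetical.

```python
def align(sim, skip_penalty=-0.1, min_score=0.0):
    """Pick the monotone alignment with the highest total score.

    sim[i][j] is the similarity of English sentence i and Chinese sentence j.
    Only 1:1 matches and skips (1:0, 0:1) are modelled in this sketch.
    Returns the matched (i, j) pairs whose similarity is at least min_score.
    """
    n = len(sim)
    m = len(sim[0]) if sim else 0
    best = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            moves = []
            if i and j:  # 1:1 match of sentences i-1 and j-1
                moves.append((best[i - 1][j - 1] + sim[i - 1][j - 1], (i - 1, j - 1)))
            if i:        # English sentence i-1 left unaligned
                moves.append((best[i - 1][j] + skip_penalty, (i - 1, j)))
            if j:        # Chinese sentence j-1 left unaligned
                moves.append((best[i][j - 1] + skip_penalty, (i, j - 1)))
            if moves:
                best[i][j], back[i][j] = max(moves, key=lambda t: t[0])

    # Trace back from the end, keeping only diagonal (1:1) steps.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if (pi, pj) == (i - 1, j - 1) and sim[pi][pj] >= min_score:
            pairs.append((pi, pj))
        i, j = pi, pj
    pairs.reverse()
    return pairs


def precision_recall(predicted, gold):
    """Precision and recall of predicted pairs against gold-standard pairs."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

    Raising `min_score` keeps fewer but more reliable pairs, which mirrors the idea of filtering aligned sentences with a threshold before using them as training data.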
    Description: Master's
    National Chengchi University
    Department of Computer Science
    96753027
    97
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0096753027
    Data Type: thesis
    Appears in Collections: [Department of Computer Science] Theses

    Files in This Item:

    File        Size    Format
    302701.pdf  313Kb   Adobe PDF
    302702.pdf  273Kb   Adobe PDF
    302703.pdf  270Kb   Adobe PDF
    302704.pdf  390Kb   Adobe PDF
    302705.pdf  325Kb   Adobe PDF
    302706.pdf  363Kb   Adobe PDF
    302707.pdf  444Kb   Adobe PDF
    302708.pdf  601Kb   Adobe PDF
    302709.pdf  768Kb   Adobe PDF
    302710.pdf  303Kb   Adobe PDF
    302711.pdf  365Kb   Adobe PDF
    302712.pdf  2220Kb  Adobe PDF


    All items in the NCCU Institutional Repository are protected by copyright, with all rights reserved.

