    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/142642


    Title: 基於詞組的注意力機制用於長文轉換器模型
    Token-wise Attention Mechanism for Long Input Transformer Models
    Authors: 賴建郡
    Lai, Jian-Jyun
    Contributors: 黃瀚萱
    Huang, Hen-Hsen
    賴建郡
    Lai, Jian-Jyun
    Keywords: Natural language processing
    Long text processing
    Transformer
    Attention mechanism
    Token-wise analysis
    Date: 2022
    Issue Date: 2022-12-02 15:20:46 (UTC+8)
    Abstract: Transformer-based models are the mainstream architecture in natural language processing (NLP). Pre-training such a model on a large corpus and then fine-tuning it separately for each downstream task is generally regarded as effective. Within the Transformer, the attention mechanism is the key to gathering information from a sequence. Because of how attention is structured, however, its computation time and memory usage grow sharply as the sequence gets longer, and the performance of Transformer-based models on long-sequence tasks still leaves room for improvement.

    In this work, we redefine the range that the attention mechanism attends to on a token-wise basis, using two approaches: POS tagging and independent token attention. We also compute the attention matrix in split pieces to reduce memory usage.

    On long-sequence classification and question-answering tasks, models using the independent attention mechanism perform competitively with Longformer, a leading long-sequence model, while using less memory, which makes them easier to apply to NLP tasks.
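    The abstract does not spell out how the attention matrix is split. One generic way to lower peak memory is to process the query rows in blocks, so that only a block-by-n slice of the score matrix exists at any moment. The following is a minimal sketch of that general idea, not the thesis's implementation; the function names and the chunk_size parameter are illustrative assumptions.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention for a block of query rows.

    Materializes a len(queries) x len(keys) score matrix, which is the
    part whose size grows with the sequence length.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    scores = [[scale * sum(a * b for a, b in zip(q, k)) for k in keys]
              for q in queries]
    weights = [softmax(row) for row in scores]
    dim = len(values[0])
    return [[sum(w * v[j] for w, v in zip(row, values)) for j in range(dim)]
            for row in weights]


def chunked_attention(queries, keys, values, chunk_size):
    """Same result as attend(), but only chunk_size x n scores exist at once.

    chunk_size is a hypothetical tuning knob, not a value from the thesis.
    """
    out = []
    for start in range(0, len(queries), chunk_size):
        block = queries[start:start + chunk_size]
        out.extend(attend(block, keys, values))
    return out
```

    Because softmax is applied row by row, splitting along the query axis leaves the output unchanged; it only trades one large score matrix for several smaller ones computed in sequence.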
    Reference: [1] Iz Beltagy, Matthew E. Peters, and Arman Cohan. Longformer: The long-document transformer, 2020.
    [2] Mandar Joshi, Eunsol Choi, Daniel Weld, and Luke Zettlemoyer. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1601–1611, Vancouver, Canada, July 2017. Association for Computational Linguistics. doi: 10.18653/v1/P17-1147. URL https://aclanthology.org/P17-1147.
    [3] Johannes Kiesel, Maria Mestre, Rishabh Shukla, Emmanuel Vincent, Payam Adineh, David Corney, Benno Stein, and Martin Potthast. SemEval-2019 task 4: Hyperpartisan news detection. In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 829–839, Minneapolis, Minnesota, USA, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/S19-2145. URL https://aclanthology.org/S19-2145.
    [4] Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, and Kentaro Inui. Attention is not only a weight: Analyzing transformers with vector norms, 2020.
    [5] Quoc V. Le and Tomas Mikolov. Distributed representations of sentences and documents, 2014.
    [6] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space, 2013.
    [7] Yi Tay, Mostafa Dehghani, Dara Bahri, and Donald Metzler. Efficient transformers: A survey, 2020.
    [8] Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, and Che Zheng. Synthesizer: Rethinking self-attention in transformer models, 2021.
    [9] Trieu H. Trinh and Quoc V. Le. A simple method for commonsense reasoning, 2019.
    [10] Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, and Yejin Choi. Defending against neural fake news, 2020.
    [11] Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books, 2015.
    Description: Master's
    National Chengchi University
    Department of Computer Science
    109753205
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0109753205
    Data Type: thesis
    DOI: 10.6814/NCCU202201686
    Appears in Collections: [Department of Computer Science] Theses

    Files in This Item:

    File          Size       Format
    320501.pdf    35513Kb    Adobe PDF


    All items in 政大典藏 are protected by copyright, with all rights reserved.

