Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/142642
|
Title: | Token-wise Attention Mechanism for Long Input Transformer Models |
Authors: | 賴建郡 Lai, Jian-Jyun |
Contributors: | 黃瀚萱 Huang, Hen-Hsen; 賴建郡 Lai, Jian-Jyun |
Keywords: | Natural language processing; Long text processing; Transformer; Attention mechanism; Token-wise analysis |
Date: | 2022 |
Issue Date: | 2022-12-02 15:20:46 (UTC+8) |
Abstract: | Transformer-based models are the mainstream architecture in natural language processing (NLP). Pre-training such a model on large corpora and then fine-tuning it for each downstream task is widely regarded as effective. Within a Transformer, the attention mechanism is what allows the model to gather information across a sequence. However, because of how attention is structured, its computation is time-consuming and its memory usage grows sharply as sequence length increases. The performance of Transformer-based models on long-sequence tasks also leaves considerable room for improvement.
In this work, we redefine the range that the attention mechanism attends to on a token-wise basis, in two variants: POS tagging and independent token attention. In addition, we compute the attention matrix in split pieces to reduce memory usage.
On long-sequence classification and question-answering tasks, models using the independent attention mechanism achieve performance competitive with Longformer while using less memory, which makes them easier to apply to NLP tasks. |
Reference: | [1] Iz Beltagy, Matthew E. Peters, and Arman Cohan. Longformer: The long-document transformer, 2020.
[2] Mandar Joshi, Eunsol Choi, Daniel Weld, and Luke Zettlemoyer. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1601–1611, Vancouver, Canada, July 2017. Association for Computational Linguistics. doi: 10.18653/v1/P17-1147. URL https://aclanthology.org/P17-1147.
[3] Johannes Kiesel, Maria Mestre, Rishabh Shukla, Emmanuel Vincent, Payam Adineh, David Corney, Benno Stein, and Martin Potthast. SemEval-2019 task 4: Hyperpartisan news detection. In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 829–839, Minneapolis, Minnesota, USA, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/S19-2145. URL https://aclanthology.org/S19-2145.
[4] Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, and Kentaro Inui. Attention is not only a weight: Analyzing transformers with vector norms, 2020.
[5] Quoc V. Le and Tomas Mikolov. Distributed representations of sentences and documents, 2014.
[6] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space, 2013.
[7] Yi Tay, Mostafa Dehghani, Dara Bahri, and Donald Metzler. Efficient transformers: A survey, 2020.
[8] Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, and Che Zheng. Synthesizer: Rethinking self-attention in transformer models, 2021.
[9] Trieu H. Trinh and Quoc V. Le. A simple method for commonsense reasoning, 2019.
[10] Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, and Yejin Choi. Defending against neural fake news, 2020.
[11] Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books, 2015. |
Description: | Master's thesis; National Chengchi University; Department of Computer Science; 109753205 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0109753205 |
Data Type: | thesis |
DOI: | 10.6814/NCCU202201686 |
Appears in Collections: | [Department of Computer Science] Theses
|
Files in This Item:
File | Description | Size | Format | |
320501.pdf | | 35513Kb | Adobe PDF2 | 90 | View/Open |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|