|
Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/152579
|
Title: | Knowledge Retrieval based on Generative AI (以生成式AI為基礎的知識檢索) |
Authors: | Yang, Te-Lun (楊德倫) |
Contributors: | Liu, Jyi-Shane (劉吉軒); Jang, Jyh-Shing (張智星); Yang, Te-Lun (楊德倫) |
Keywords: | Retrieval Augmented Generation; Dense Vector Search; Re-ranking; Large Language Model; Evaluation |
Date: | 2024 |
Issue Date: | 2024-08-05 12:47:14 (UTC+8) |
Abstract: | This study aims to build a question-answering system based on Retrieval-Augmented Generation (RAG). Chinese Wikipedia and Lawbank serve as the retrieval data sources, and TTQA and TMMLU+ serve as the evaluation datasets. BGE-M3 and BGE-reranker are used to rank retrieved results by their relevance to the question, and the most relevant results are given to a Large Language Model (LLM) as reference material to help it answer correctly, thereby establishing a knowledge retrieval system based on generative AI.
System performance is assessed in two stages, reflecting how the system behaves both when operating independently and when assisting people: (1) The automatic evaluation compares the labels generated by the model (e.g., A, B, C, D) with the actual answers (ground truth) and computes accuracy. This stage measures performance under standardized conditions without human intervention. (2) The assisted performance evaluation tests the system's effectiveness in practical use. Twenty finance-related multiple-choice questions are randomly selected, and 20 participants with non-financial backgrounds answer them in two stages. In the first stage, participants answer independently; in the second stage, they receive relevant reference information generated by the system for each question and use it to assist their answers. This part of the evaluation examines whether the system's help actually improves participants' accuracy, thereby validating the system's added value in practical applications. Together, the two stages quantify the system's automatic performance and its usefulness in assisted, real-world scenarios, giving a comprehensive assessment of overall efficacy.
The main contributions of this research are as follows: (1) Addressing the shortcomings of LLMs on knowledge-intensive tasks: BGE-M3 (an embedding model) is used for dense vector retrieval to obtain results with high similarity to the query as the LLM's source of reference knowledge, and BGE-reranker (a re-ranking model) is used to reorder the retrieved results and identify the information that is actually highly relevant to the query as the basis for the LLM's answer. This reduces the impact of hallucinations and allows the latest authorized or publicly available knowledge sources to be obtained dynamically according to task requirements. (2) Protecting data privacy and reducing dependency on commercial services: through a customized RAG architecture, the LLM can be set up and run locally, with no need to transmit private data to the servers of external commercial service providers. This strengthens data security and privacy protection, reduces dependency on commercial services, and lowers potential operating costs and the risk of privacy breaches. |
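The retrieve-then-rerank flow and the accuracy-based automatic evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the toy vectors stand in for embeddings that a model such as BGE-M3 would produce, `score_fn` is a hypothetical placeholder for a re-ranking model such as BGE-reranker, and all function names are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dense_retrieve(query_vec, corpus, top_k):
    """Return the ids of the top_k documents most similar to the query.

    corpus is a list of (doc_id, vector) pairs; in a real system the
    vectors would come from an embedding model (assumption: toy data here).
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]


def rerank(query, candidates, score_fn, top_n):
    """Reorder candidates by a relevance score and keep the best top_n.

    score_fn(query, doc_id) is a hypothetical stand-in for a re-ranking model.
    """
    ranked = sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)
    return ranked[:top_n]


def accuracy(predicted, ground_truth):
    """Fraction of predicted labels (e.g. 'A'..'D') matching the ground truth."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not ground_truth:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return correct / len(ground_truth)
```

In this sketch, the ids returned by `dense_retrieve` would be passed to `rerank`, the top-ranked passages would be placed in the LLM prompt as reference material, and `accuracy` would then be computed over the answer labels the LLM produces.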
Description: | Master's thesis; National Chengchi University; Department of Computer Science; 111971029 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0111971029 |
Data Type: | thesis |
Appears in Collections: | [Department of Computer Science] Theses
|
Files in This Item:
File | Description | Size | Format |
102901.pdf | | 6286Kb | Adobe PDF |
|
All items in the NCCU Institutional Repository (政大典藏) are protected by copyright, with all rights reserved.
|