Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/145744
Title: | 基於BERT模型的專利相似度計算:以台灣金融科技專利為例 Using BERT to Analyze Patent Similarity: A Study of Taiwan’s FinTech Patents |
Authors: | 蔡孟純 Tsai, Meng-Chun |
Contributors: | 宋皇志 蔡炎龍 Sung, Huang-Chih Tsai, Yen-Lung 蔡孟純 Tsai, Meng-Chun |
Keywords: | 專利檢索; 自然語言處理; 深度學習; BERT; 中文專利; Patent Retrieval; Natural Language Processing; Deep Learning; BERT; Chinese patents |
Date: | 2023 |
Issue Date: | 2023-07-06 16:22:52 (UTC+8) |
Abstract: | With rapid technological advancement and intensified global competition, the value and protection of patents are increasingly emphasized across industries. This holds particularly true in FinTech, where traditional financial institutions face significant competitive threats from technology companies, compelling them to reassess their competitiveness and capacity for innovation. Patents are not only a vital means of protecting technological innovation but also a key element for businesses seeking competitive advantage and market share. In particular, patent retrieval provides crucial business and legal information that helps companies make decisions and protect their interests. However, ever-expanding patent databases, together with the technical terminology and complex language of patent documents, make patent retrieval costly in time and demanding in expertise. This research uses the BERT model to propose a more accurate and efficient method for patent similarity calculation, thereby easing the burden on patent professionals.
To further explore the applicability of deep learning models to Chinese texts, this study focuses on FinTech patents in Taiwan and employs BERT-Base-Chinese. Patent claims and titles serve as the primary model inputs; after BERT transforms them into representative vectors, we conduct similarity retrieval analysis. To evaluate the experimental results, we extract two categories of test data from a total dataset of 13,478 patents: a set of cited patents (2,123 records) and a set of rejected patent cases (640 records). The results show that the Chinese text vectors produced by BERT preserve semantics to a certain extent. In testing, both data sets perform well in the early stage of the ranking, particularly before the first quartile of the dataset: similarity retrieval filters out approximately 99% of the data while identifying the 1% of critical records. However, the experiment reveals limitations in ranking performance and standard deviation in the middle and later stages. Overall, this study confirms the potential of the BERT model for Chinese patent retrieval. It can serve as an auxiliary retrieval tool, helping patent searchers find relevant documents more effectively under limited conditions and reducing the risk of overlooking important patents. |
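The retrieval step the abstract describes — ranking patents by the similarity of their BERT-derived vectors — can be sketched as follows. This is an illustrative sketch only: the patent IDs and the 4-dimensional toy embeddings are hypothetical stand-ins for the 768-dimensional vectors that BERT-Base-Chinese produces, and cosine similarity is assumed here as the metric; the thesis does not specify its exact similarity measure in this abstract.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, patent_vecs):
    """Return (patent_id, score) pairs sorted by descending similarity."""
    scored = [(pid, cosine_similarity(query_vec, vec))
              for pid, vec in patent_vecs.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)

# Toy 4-dimensional stand-ins for 768-dimensional BERT sentence vectors.
query = [0.9, 0.1, 0.0, 0.2]
corpus = {
    "TW-A": [0.8, 0.2, 0.1, 0.1],   # semantically close to the query
    "TW-B": [0.0, 0.9, 0.1, 0.0],   # unrelated
    "TW-C": [0.1, 0.0, 0.9, 0.1],   # unrelated
}
ranking = rank_by_similarity(query, corpus)
print([pid for pid, _ in ranking])  # the closest patent ranks first
```

In a full pipeline, each patent's claims and title would first be encoded by BERT-Base-Chinese into a fixed-length vector, after which ranking the whole collection against a query vector lets a searcher discard the low-similarity tail — the "filter out ~99% of the data" behavior the abstract reports.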
Description: | Master's thesis. National Chengchi University, Graduate Institute of Technology Management and Intellectual Property. 109364101 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0109364101 |
Data Type: | thesis |
Appears in Collections: | [Graduate Institute of Technology Management and Intellectual Property] Theses
Files in This Item:
File | Description | Size | Format
410101.pdf | | 1017Kb | Adobe PDF
All items in 政大典藏 are protected by copyright, with all rights reserved.