Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/159097
|
Title: | LLM提示工程與查核報告能否提升財報舞弊偵測? Does LLM Prompt Engineering and Audit Report Embedding Improve Financial Fraud Detection? |
Authors: | 張永愛 Chang, Yung-Ai |
Contributors: | Chuang, Hao-Chun (莊皓鈞); Chou, Yen-Chun (周彥君); Chang, Yung-Ai (張永愛) |
Keywords: | Financial Statement Fraud Detection; Auditor Report Embedding; Prompt Engineering; BERT; SBERT; Isolation Forest; SHAP Values |
Date: | 2025 |
Issue Date: | 2025-09-01 15:05:34 (UTC+8) |
Abstract: | This study examines whether combining prompt engineering for large language models (LLMs) with embeddings of auditor reports improves the detection of financial statement fraud. Whereas earlier approaches relied solely on numerical financial and non-financial indicators, this research adds textual data: ChatGPT-4o is used to extract five semantic dimensions and associated keywords that are closely related to fraud risk, and language models such as BERT and Sentence-BERT (SBERT) then vectorize the text to build semantically meaningful textual indicators. The empirical data cover companies listed on the Taiwan Stock Exchange, the OTC (Over-the-Counter) market, the Emerging Stock Board, and the Innovation Board. Fraud samples are drawn from financial misstatement litigation cases disclosed by the Securities and Futures Investors Protection Center, and normal samples are matched by industry and reporting period. The analysis applies Isolation Forest (IF), an unsupervised anomaly detection method, and uses SHAP values to make the model more interpretable. The results show that adding textual indicators improves both the sensitivity and the precision of fraud detection. In particular, under balanced sampling, the "Key Audit Matters + Year" model identified twice as many true positives as the full-feature model while producing fewer false positives. SBERT improved recall but also produced more false positives than the BERT-based model, so the choice between them depends on the application context. The study concludes that semantic signals in auditor reports are highly indicative of fraud risk and offers regulators and companies a practical early-warning method. |
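The anomaly-detection step described in the abstract can be illustrated with a minimal sketch. This record does not give the thesis's implementation, features, or hyperparameters, so everything below is an assumption for illustration: a small pure-Python isolation forest in the style of Liu et al. (2008), a hypothetical two-column feature layout (one financial ratio and one text-similarity indicator), synthetic data with two injected outliers, and a top-k flagging rule. SHAP attribution and the BERT/SBERT embedding step are not shown.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree search over
    n points; used to normalise tree depths (Liu et al., 2008)."""
    if n <= 1:
        return 0.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Split on a random feature and random threshold until rows are isolated."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(row, node, depth=0):
    """Depth at which `row` lands in a leaf, plus a correction for leaf size."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["feature"]] < node["split"] else node["right"]
    return path_length(row, child, depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Return one score in (0, 1) per row; higher means easier to isolate."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2.0 ** (-(sum(path_length(row, t) for t in trees) / n_trees) / norm)
            for row in rows]


# Hypothetical toy data: (financial ratio, text-similarity indicator) per firm-year.
data_rng = random.Random(42)
rows = [(data_rng.gauss(0.5, 0.05), data_rng.gauss(0.2, 0.05)) for _ in range(40)]
labels = [0] * 40
rows += [(0.9, 0.9), (0.1, 0.95)]  # two injected "fraud-like" outliers
labels += [1, 1]

scores = anomaly_scores(rows)
k = 4  # assumed review budget: flag the k highest-scoring observations
flagged = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)[:k]
tp = sum(labels[i] for i in flagged)
fp = k - tp
print(f"TP={tp} FP={fp} recall={tp / sum(labels):.2f} precision={tp / k:.2f}")
```

Raising `k` in this sketch can only add true positives at the cost of more false positives, which is the same kind of recall versus false-positive trade-off the abstract reports between the SBERT and BERT variants.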
Description: | Master's degree; National Chengchi University; Department of Management Information Systems; 112356037 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0112356037 |
Data Type: | thesis |
Appears in Collections: | [Department of Management Information Systems] Theses
|
Files in This Item:
File | Description | Size | Format | |
603701.pdf | | 4024Kb | Adobe PDF | 0 |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|