Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/153552
|
Title: | 辨識華語健康照護資料集內的症狀 Symptom name recognition in a Mandarin Chinese healthcare dataset |
Authors: | 徐宜君 Hsu, Yi-Chun |
Contributors: | 張瑜芸 Chang, Yu-Yun; 徐宜君 Hsu, Yi-Chun |
Keywords: | Biomedical NER; Symptom name recognition; Linguistic analysis |
Date: | 2024 |
Issue Date: | 2024-09-04 16:03:17 (UTC+8) |
Abstract: | With the growing need for biomedical named entity recognition (NER), this study targets the extraction of medical entities, especially symptoms, from a Mandarin Chinese healthcare dataset, because symptoms are comparatively difficult to recognize. The study aims to identify which composition forms of symptoms are difficult to recognize and to examine how four linguistic features (radicals, POS tags, semantic roles, and causative verbs) affect symptom recognition across composition forms. The method involves revising the original symptom annotations according to their composition forms and applying two models, a conditional random field (CRF) and BERT-CRF, combined with the four features, to the symptom recognition task.
The results show that symptoms with the composition forms "modifier and head" and "modifier and theme and head" were difficult to recognize. Two features, POS tags at the syntactic level and semantic roles at the semantic level, may influence symptom recognition performance. In conclusion, this study provides a linguistic analysis of the symptom recognition task based on the composition forms of symptoms and on features at four linguistic levels, which can be used to evaluate performance on this task. The task may also help improve the efficiency of doctor-patient communication. |
Description: | Master's; National Chengchi University; Graduate Institute of Linguistics; 109555003 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0109555003 |
Data Type: | thesis |
Appears in Collections: | [Graduate Institute of Linguistics] Theses
|
Files in This Item:
File | Description | Size | Format |
500301.pdf | | 1257Kb | Adobe PDF |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|