政大機構典藏 - National Chengchi University Institutional Repository (NCCUR): Item 140.119/153552
RC Version 6.0 © Powered By DSPACE, MIT. Enhanced by NTU Library IR team.
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/153552


    Title: 辨識華語健康照護資料集內的症狀
    Symptom name recognition in a Mandarin Chinese healthcare dataset
    Authors: 徐宜君
    Hsu, Yi-Chun
    Contributors: 張瑜芸
    Chang, Yu-Yun
    徐宜君
    Hsu, Yi-Chun
    Keywords: 生物醫學命名實體識別 (Biomedical NER)
    症狀辨識 (Symptom name recognition)
    語言學分析 (Linguistic analysis)
    Date: 2024
    Issue Date: 2024-09-04 16:03:17 (UTC+8)
    Abstract:
    As the need for biomedical named entity recognition (NER) continues to grow, this study targets the extraction of medical entities, especially symptoms, from a Mandarin Chinese healthcare dataset, because symptoms are relatively difficult to recognize. It aims to identify which composition forms of symptoms are difficult to recognize and to examine how four linguistic features, namely radicals, part-of-speech (POS) tags, semantic roles, and causative verbs, affect the recognition of symptoms in different composition forms. The method involves revising the original symptom annotations according to their composition forms and applying two models, a conditional random field (CRF) and BERT with a CRF layer (BERT-CRF), combined with the four features, to the symptom recognition task.
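    The abstract does not say how the four features are fed to the CRF. One common setup, sketched here as an assumption rather than the thesis's actual code, is character-level BIO tagging in which every character receives a feature dictionary; lists of such dictionaries are the input format that CRF toolkits such as sklearn-crfsuite accept. The radical table, the causative-verb set, the POS tags, the semantic-role labels, the example sentence, and the function name `char_features` are all hypothetical illustrations; a real pipeline would take POS tags and semantic roles from an external tagger.

```python
from typing import Dict, List, Tuple

# Hypothetical radical lookup; a real system would use a complete character-to-radical table.
RADICALS: Dict[str, str] = {"吃": "口", "藥": "艸", "讓": "言", "我": "戈", "頭": "頁", "痛": "疒"}
# Hypothetical set of causative verbs.
CAUSATIVES = {"使", "讓", "令"}

# (word, POS tag, semantic role); tags and roles would come from an external tagger.
Word = Tuple[str, str, str]


def char_features(words: List[Word]) -> List[Dict[str, object]]:
    """Expand word-level annotations into one CRF feature dict per character."""
    chars: List[Tuple[str, str, str, bool]] = []
    seen_causative = False
    for word, pos, role in words:
        for ch in word:
            # Each character inherits its word's POS tag and semantic role.
            chars.append((ch, pos, role, seen_causative))
        if word in CAUSATIVES:
            seen_causative = True  # flags only characters after the causative verb

    feats: List[Dict[str, object]] = []
    for i, (ch, pos, role, after_causative) in enumerate(chars):
        feats.append({
            "char": ch,
            "radical": RADICALS.get(ch, "UNK"),
            "pos": pos,
            "role": role,
            "after_causative": after_causative,
            # One-character context window on each side.
            "prev_char": chars[i - 1][0] if i > 0 else "<S>",
            "next_char": chars[i + 1][0] if i + 1 < len(chars) else "</S>",
        })
    return feats


# Illustrative sentence 吃藥讓我頭痛 ("taking the medicine gives me a headache");
# the tags and roles below are made up for the example.
sentence: List[Word] = [("吃藥", "VA", "reason"), ("讓", "VL", "Head"),
                        ("我", "Nh", "goal"), ("頭痛", "VH", "theme")]
X = char_features(sentence)
y = ["O", "O", "O", "O", "B-SYM", "I-SYM"]  # BIO labels for the symptom 頭痛
```

    Because each character inherits its word's POS tag and semantic role, the two characters of 頭痛 ("headache") share those features but differ in radical (頁 vs. 疒), which is where a radical feature can add information.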

    The results revealed that symptoms with the composition forms "modifier and head" and "modifier and theme and head" were difficult to recognize. Two features, POS tags at the syntactic level and semantic roles at the semantic level, may influence symptom recognition performance. In conclusion, this study provides a linguistic analysis of the symptom recognition task based on the composition forms of symptoms and on features at four linguistic levels; this analysis can be used to evaluate performance on the task. In addition, the task may help improve the efficiency of doctor-patient communication.
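    Finding which composition forms are hard to recognize requires a score per form. A minimal sketch, assuming that each gold symptom span carries a composition-form label from the revised annotation while the model's predicted spans do not: under that assumption the natural per-form metric is exact-match recall. The function name `recall_by_form`, the span encoding, and the toy data (including the form label "head") are hypothetical and are not results from the thesis.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def recall_by_form(gold: Iterable[Tuple[Span, str]], predicted: Set[Span]) -> Dict[str, float]:
    """Exact-match recall of gold symptom spans, grouped by composition form."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for span, form in gold:
        totals[form] += 1
        if span in predicted:  # a partial overlap such as (5, 7) vs. (5, 8) counts as a miss
            hits[form] += 1
    return {form: hits[form] / totals[form] for form in totals}


# Toy data for illustration only.
gold = [((0, 2), "head"), ((5, 8), "modifier and head"),
        ((10, 14), "modifier and theme and head"), ((20, 22), "head")]
predicted = {(0, 2), (5, 7), (20, 22)}
print(recall_by_form(gold, predicted))
# {'head': 1.0, 'modifier and head': 0.0, 'modifier and theme and head': 0.0}
```

    Exact matching penalizes boundary errors, which matter most for multi-part forms; a partial-overlap criterion is a possible alternative design choice.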
    Description: Master's thesis (碩士)
    National Chengchi University (國立政治大學)
    Graduate Institute of Linguistics (語言學研究所)
    109555003
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0109555003
    Data Type: thesis
    Appears in Collections:[Graduate Institute of Linguistics] Theses

    Files in This Item:

    File          Size     Format
    500301.pdf    1257 KB  Adobe PDF


    All items in the NCCU Institutional Repository (政大典藏) are protected by copyright, with all rights reserved.



    Copyright Announcement
    1. The digital content of this website is part of the National Chengchi University Institutional Repository and is provided free of charge for non-commercial, public-interest use such as academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2. Every effort has been made to prevent this website from infringing the rights of copyright owners. If you believe that any digital content on this website infringes copyright, please notify the repository staff (nccur@nccu.edu.tw), who will promptly take remedial measures, such as removing the work from the repository, and investigate the claim.