Research on entity relationship extraction of Chinese medical literature and application in diabetes medical literature_Journal of Biomedical Engineering

Authors：

FAN Zhiyuan ¹ ,  HE Xuan ^1,2 , LIANG Pin ¹ , LU Jing ¹ ,  KANG Yan ³

1. College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110819, P.R.China;
2. Neusoft Research of Intelligent Healthcare Technology, Co. Ltd., Shenyang 110819, P.R.China;
3. College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, Guangdong 518118, P.R.China;

Corresponding author：

HE Xuan, Email: hexuan@bmie.neu.edu.cn; KANG Yan, Email: kangyan@sztu.edu.cn

Keywords：

medical literature; entity relationship extraction; convolutional neural network; knowledge graph; diabetes

DOI：

10.7507/1001-5515.202001009

Medical literature contains a wealth of valuable medical knowledge. Research on entity relationship extraction from medical literature has made great progress, but as the volume of medical literature grows exponentially, annotating medical text has become a major bottleneck. Distant supervision has been proposed to avoid the time-consuming and labor-intensive work of manual annotation, but it introduces a large amount of label noise. This paper proposes a novel neural network structure based on a convolutional neural network that mitigates this noise. The model uses a multi-window convolutional neural network to extract sentence features automatically. Once the sentence vectors are obtained, an attention mechanism selects the sentences that actually express the target relationship. In particular, an entity type (ET) embedding method is proposed that adds entity type features for relationship classification, and sentence-level attention is used in relation extraction to address the unavoidable labeling errors in the training text. In an experiment on 968 medical publications on diabetes, the proposed model outperformed the baseline model and reached an F1-score of 93.15%. Finally, the 11 extracted relationship types were stored as triples, which were used to build a medical knowledge graph of complex relationships with 33 347 nodes and 43 686 relationship edges. The experimental results show that the proposed algorithm is superior to the best-performing reference system for relationship extraction.
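
The sentence-level attention step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dot-product scoring, the relation query vector, the function names, and the toy sentence vectors are all assumptions made for illustration, and in the paper the sentence vectors would come from the multi-window convolutional encoder rather than being written by hand. The sketch only shows the general idea that sentences matching the target relation receive larger weights, so a mislabeled sentence contributes less to the combined representation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend_over_bag(sentence_vectors, relation_query):
    """Weight each sentence vector by its match with a relation query vector.

    Returns (weights, bag_vector), where bag_vector is the attention-weighted
    sum of the sentence vectors. Dot-product scoring is an assumption here.
    """
    scores = [dot(vec, relation_query) for vec in sentence_vectors]
    weights = softmax(scores)
    dim = len(relation_query)
    bag_vector = [
        sum(w * vec[i] for w, vec in zip(weights, sentence_vectors))
        for i in range(dim)
    ]
    return weights, bag_vector


# Hypothetical toy data: three sentences that mention the same entity pair.
bag = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],  # stands in for a sentence with a noisy label
]
query = [1.0, 0.0, 0.0]  # hypothetical vector for the target relation

weights, bag_vector = attend_over_bag(bag, query)
print(weights)
print(bag_vector)
```

In this toy example the third sentence scores lowest against the query, so it receives the smallest weight in the combined vector that a relation classifier would then consume.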

Citation： FAN Zhiyuan, HE Xuan, LIANG Pin, LU Jing, KANG Yan. Research on entity relationship extraction of Chinese medical literature and application in diabetes medical literature. Journal of Biomedical Engineering, 2021, 38(3): 563-573. doi: 10.7507/1001-5515.202001009
