Objective To evaluate the accuracy of three large language models (LLMs), ChatGPT, Grok, and DeepSeek, in predicting the natural outcome of pediatric ventricular septal defect (VSD), and to quantify how their predictions diverge from actual clinical outcomes, in order to assess whether LLMs can help clinicians provide personalized management recommendations. Methods Clinical data were retrospectively analyzed for pediatric patients with VSD admitted to Children's Hospital of Nanjing Medical University between October and December 2020. VSD severity, probability of spontaneous closure, and necessity of surgery were each assessed by ChatGPT, Grok, DeepSeek, and an expert panel. Differences between groups were analyzed, and each group's assessments were compared with the actual outcomes. Model stability was evaluated across three repeated assessments by each LLM. Results A total of 146 children were enrolled, including 87 (59.6%) males and 59 (40.4%) females, with a median age at first diagnosis of 2.0 months (IQR: 1.1-3.4). The Grok group differed significantly from the expert panel in assessing the probability of spontaneous closure and the necessity of surgery (P=0.01 and P=0.02, respectively). The ChatGPT group also differed from the expert panel in assessing the necessity of surgery (P=0.05). Compared with the actual clinical outcomes, only the Grok group showed a significant difference (P<0.05), while ChatGPT achieved the highest consistency between predicted and actual outcomes. Within each LLM group, the three repeated assessments showed no statistically significant differences (all P>0.05). Conclusion LLMs demonstrate potential and high stability in predicting the natural outcome of VSD. ChatGPT in particular shows the highest consistency between its assessments and actual outcomes. LLMs can serve as an auxiliary tool to support the formulation of personalized management strategies.
Objective To investigate the effectiveness of a virtual simulation teaching system based on DeepSeek technology in the clinical training of neonatal resuscitation. Methods A total of 48 clinical medicine (“5+3” integrated program) students from the 2020 cohort, interning in the Department of Neonatology of the First Affiliated Hospital of Harbin Medical University between January and June 2025, were randomly assigned to a trial group (n=24) or a control group (n=24). The trial group was trained with a virtual simulation teaching system integrated with DeepSeek technology, featuring dynamic physiological response, natural language interaction, and hierarchical intelligent feedback. The control group received traditional virtual simulation teaching. After 6 hours of teaching, outcomes were evaluated through theoretical assessments, an objective structured clinical examination (OSCE), key decision-making accuracy, and teaching satisfaction questionnaires. Results The theoretical score (93.5±3.3 vs. 84.7±4.9), OSCE score (95.3±2.6 vs. 86.1±4.3), and key decision-making accuracy [(91.6±3.7)% vs. (77.3±6.4)%] of the trial group were significantly higher than those of the control group (P<0.001). The trial group outperformed the control group in all OSCE subdomain scores (rapid assessment and initial resuscitation: 24.1±1.0 vs. 21.3±2.2; positive pressure ventilation: 23.8±1.2 vs. 20.1±2.1; chest compressions: 18.9±1.1 vs. 16.2±1.8; drug therapy: 14.3±0.9 vs. 12.0±1.5; teamwork and communication: 14.2±0.8 vs. 11.5±1.6) and in the accuracy rates for all key decision points [whether to initiate initial resuscitation: (93.8±4.5)% vs. (82.1±8.7)%; whether to initiate positive pressure ventilation: (92.9±5.3)% vs. (79.6±10.2)%; whether to correct ventilation: (90.4±6.1)% vs. (75.0±11.5)%; whether to initiate chest compressions: (94.2±5.1)% vs. (70.8±12.3)%; whether to use epinephrine: (92.5±6.2)% vs. (68.3±14.1)%]; all differences were statistically significant (P<0.001). In the teaching satisfaction survey, satisfaction rates in the trial group were significantly higher than those in the control group for immersion (95.8% vs. 54.2%), knowledge understanding (91.7% vs. 58.3%), skill improvement (91.7% vs. 62.5%), decision-making logic (95.8% vs. 50.0%), feedback effectiveness (95.8% vs. 41.7%), and learning confidence (91.7% vs. 45.8%) (all P<0.05). Conclusion The DeepSeek-based virtual simulation teaching system can effectively enhance the quality of neonatal resuscitation training, significantly improving students’ clinical decision-making abilities, operational skills, and teamwork competence, and shows good application prospects in medical education.
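As a rough plausibility check on the between-group comparisons reported above, the following sketch computes a Welch t statistic from the published summary statistics (mean±SD, n=24 per group). The abstract does not state which statistical test the authors used, so the choice of Welch's unequal-variance test and the helper name `welch_t` are assumptions made here for illustration only.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for Welch's unequal-variance t-test from summary statistics.

    Hypothetical helper for illustration; not the authors' analysis code.
    """
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


N = 24  # students per group, as reported in the abstract

# (mean, SD) for trial vs. control, copied from the reported results
reported = {
    "theoretical score": ((93.5, 3.3), (84.7, 4.9)),
    "OSCE score": ((95.3, 2.6), (86.1, 4.3)),
    "key decision-making accuracy (%)": ((91.6, 3.7), (77.3, 6.4)),
}

results = {}
for name, ((m1, s1), (m2, s2)) in reported.items():
    t, df = welch_t(m1, s1, N, m2, s2, N)
    results[name] = (t, df)
    print(f"{name}: t = {t:.2f}, df = {df:.1f}")
```

Under these assumptions each t statistic comes out above 7, which is in line with the reported P<0.001. This is a consistency check on the summary figures, not a reproduction of the original analysis.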