Medical visual question answering (MVQA) plays a crucial role in computer-aided diagnosis and telemedicine. Because MVQA datasets are small and unevenly annotated, most existing methods rely on additional datasets for pre-training and use a discriminative formulation that predicts answers from a predefined label set, which makes the model prone to overfitting in low-resource domains. To address these problems, we propose an image-aware generative MVQA method based on image caption prompts. First, we combine a dual visual feature extractor with a progressive bilinear attention interaction module to extract multi-level image features. Second, we propose an image caption prompt method that guides the model to better understand the image. Finally, an image-aware generative model generates the answers. Experimental results show that the proposed method outperforms existing models on the MVQA task, achieving efficient visual feature extraction and flexible, accurate answer generation at low computational cost in low-resource domains. This is of great significance for personalized precision medicine, reducing the medical burden, and improving diagnostic efficiency.
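The abstract does not give implementation details, so the following is a minimal PyTorch sketch only: it shows how a cross-attention block (standing in for the paper's bilinear attention interaction, whose exact form is not published here) could progressively fuse two visual streams, and how a caption prompt could be prepended to the question for a generative decoder. All module names, dimensions, the depth, and the prompt template are assumptions, not the authors' method.

```python
import torch
import torch.nn as nn

class BilinearAttentionBlock(nn.Module):
    """One cross-stream attention step (assumed stand-in for the paper's
    bilinear attention interaction module)."""
    def __init__(self, dim):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
        self.norm = nn.LayerNorm(dim)

    def forward(self, a, b):
        # a: (B, Na, D) features from one extractor (e.g. a CNN)
        # b: (B, Nb, D) features from the other (e.g. a vision transformer)
        att = torch.einsum("bnd,bmd->bnm", self.q(a), self.k(b)) / a.size(-1) ** 0.5
        return self.norm(a + torch.einsum("bnm,bmd->bnd", att.softmax(-1), self.v(b)))

class ProgressiveFusion(nn.Module):
    """Apply the interaction block repeatedly ("progressively") to refine one
    stream against the other; a depth of 3 is an arbitrary choice here."""
    def __init__(self, dim, depth=3):
        super().__init__()
        self.blocks = nn.ModuleList(BilinearAttentionBlock(dim) for _ in range(depth))

    def forward(self, cnn_feats, vit_feats):
        x = cnn_feats
        for blk in self.blocks:
            x = blk(x, vit_feats)
        return x

def build_prompt(caption: str, question: str) -> str:
    # Caption prompt: a generated image caption is prepended so the generative
    # decoder sees a textual summary of the image (template is assumed).
    return f"caption: {caption} question: {question} answer:"

# Usage with dummy features
fusion = ProgressiveFusion(dim=256)
fused = fusion(torch.randn(2, 49, 256), torch.randn(2, 197, 256))
print(fused.shape)  # torch.Size([2, 49, 256])
```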
Early diagnosis and treatment of colorectal polyps are crucial for preventing colorectal cancer. This paper proposes a lightweight convolutional neural network for the automatic detection and auxiliary diagnosis of colorectal polyps. First, a 53-layer convolutional backbone incorporating a spatial pyramid pooling module extracts features at different receptive field sizes. Next, a feature pyramid network performs cross-scale fusion of the backbone's feature maps, and a spatial attention module enhances the perception of polyp boundaries and details. Finally, a positional pattern attention module automatically mines and integrates key features across feature-map levels, enabling rapid, efficient, and accurate automatic detection of colorectal polyps. Evaluated on a clinical dataset, the model achieves an accuracy of 0.9982, a recall of 0.9988, an F1 score of 0.9984, and a mean average precision (mAP) of 0.9953 at an intersection-over-union (IoU) threshold of 0.5, with a frame rate of 74 frames per second and a parameter count of 9.08 M. Compared with existing mainstream methods, the proposed method is lightweight, has low hardware requirements, and offers high detection speed and accuracy, making it a feasible technique and an important tool for the early detection and diagnosis of colorectal cancer.
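For illustration only, here is a minimal PyTorch sketch of two generic building blocks of the kind the abstract names: a YOLO-style spatial pyramid pooling block and a CBAM-style spatial attention mask. The pooling kernel sizes and channel counts are assumptions, and the paper's positional pattern attention module is not reproduced here.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling: parallel max-pools give several receptive fields
    at the same resolution (kernel sizes 5/9/13 are assumed, YOLO-style)."""
    def __init__(self, channels, pool_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in pool_sizes
        )
        self.fuse = nn.Conv2d(channels * (len(pool_sizes) + 1), channels, kernel_size=1)

    def forward(self, x):
        # Concatenate the input with its pooled variants along channels, then mix.
        return self.fuse(torch.cat([x] + [pool(x) for pool in self.pools], dim=1))

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: a learned per-location mask that can
    emphasize boundary and detail regions of the feature map."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)      # (B, 1, H, W) channel-average
        mx, _ = x.max(dim=1, keepdim=True)     # (B, 1, H, W) channel-max
        mask = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * mask                        # re-weight each spatial location

# Usage with a dummy backbone feature map
feat = torch.randn(1, 256, 13, 13)
out = SpatialAttention()(SPP(256)(feat))
print(out.shape)  # torch.Size([1, 256, 13, 13])
```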
Objectives: To develop a fundus photography (FP) image lesion recognition model based on the EfficientNet lightweight convolutional neural network architecture, and to preliminarily evaluate its recognition performance. Methods: A diagnostic test. Data were collected in the Department of Ophthalmology at Sichuan Provincial People's Hospital from June 2023 to June 2025. A lightweight 16-category lesion recognition model was constructed with deep learning from 610 072 FP images, sourced from Sichuan Provincial People's Hospital and the APTOS, Diabetic Retinopathy_2015, Diabetic Retinopathy_2019, and Retinal Disease datasets. Model performance was evaluated in two ways: first, testing on four independent external validation sets using metrics such as accuracy, F1 score (the harmonic mean of precision and recall), and the area under the receiver operating characteristic curve (AUC) to measure the model's generalizability and accuracy; second, comparing the model's classifications with those of junior and mid-level ophthalmologists (two of each) using the overlapping confidence interval (CI) comparison method to assess the level of clinical experience to which the model's performance corresponds. Results: The model achieved an accuracy of 96.78% (59 039/61 003), an F1 score of 82.51% (50 334/61 003), and an AUC of 99.93% (60 960/61 003) on the validation set. On the four external validation sets, it achieved an average accuracy of 87.77% (57 358/65 350), an average precision of 87.06% (56 894/65 350), and an average Kappa value of 82.28%. The average accuracy of FP image lesion identification was 79.00% (79/100) (95%CI 67.71-90.29) for junior ophthalmologists and 87.00% (87/100) (95%CI 77.68-96.32) for mid-level ophthalmologists. Conclusions: A 16-category FP image lesion recognition model was successfully constructed based on the EfficientNet lightweight convolutional neural network architecture. Its clinical performance preliminarily reaches the level of mid-level ophthalmologists.
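As a sketch of the described pipeline (not the authors' code), the snippet below adapts a torchvision EfficientNet to 16 lesion categories and computes accuracy, macro-F1, and one-vs-rest AUC with scikit-learn. The B0 variant is an assumption (the abstract does not state which EfficientNet was used), as is the hypothetical data loader yielding (images, labels) batches.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

NUM_CLASSES = 16  # the 16 lesion categories

# Load an ImageNet-pretrained EfficientNet-B0 (variant chosen for illustration)
# and replace its final linear layer for 16-way classification.
model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, NUM_CLASSES)

@torch.no_grad()
def evaluate(model, loader, device="cpu"):
    """Compute accuracy, macro-F1, and one-vs-rest AUC on a validation loader
    (a hypothetical DataLoader of (images, labels) batches)."""
    model.eval().to(device)
    labels, preds, probs = [], [], []
    for images, targets in loader:
        logits = model(images.to(device))
        p = logits.softmax(dim=-1).cpu()
        probs.append(p)
        preds.append(p.argmax(dim=-1))
        labels.append(targets)
    y = torch.cat(labels).numpy()
    y_hat = torch.cat(preds).numpy()
    y_prob = torch.cat(probs).numpy()
    return {
        "accuracy": accuracy_score(y, y_hat),
        "f1_macro": f1_score(y, y_hat, average="macro"),
        # multiclass AUC requires every class to appear in y at least once
        "auc_ovr": roc_auc_score(y, y_prob, multi_class="ovr"),
    }
```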