Deep learning-based automatic classification of diabetic retinopathy (DR) helps to enhance the accuracy and efficiency of auxiliary diagnosis. This paper presents an improved residual network model for classifying DR into five severity levels. First, the convolution in the first layer of the residual network was replaced with three smaller convolutions to reduce the computational load of the network. Second, to address inaccurate classification caused by the minimal differences between severity levels, a mixed attention mechanism was introduced so that the model focuses on the crucial features of the lesions. Finally, to better extract the morphological features of the lesions in DR images, cross-layer fusion convolutions were used instead of the conventional residual structure. To validate its effectiveness, the improved model was applied to the Kaggle APTOS 2019 Blindness Detection competition dataset. The experimental results demonstrated that the proposed model achieved a classification accuracy of 97.75% and a Kappa value of 0.9717 across the five DR severity levels. Compared with existing models, this approach shows significant advantages in classification accuracy and overall performance.
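The computational argument for replacing one large convolution with a stack of smaller ones can be sketched with a simple weight count. The channel width (64 in and out) and kernel sizes below are illustrative assumptions based on the classic 7×7-versus-three-3×3 comparison, not values taken from the paper:

```python
import numpy as np

def conv_params(c_in, c_out, k):
    """Weight count of a single k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

C = 64  # assumed equal input/output channel width

# One 7x7 convolution versus three stacked 3x3 convolutions: both cover
# a 7x7 receptive field, but the stack needs 27*C*C weights vs 49*C*C.
single = conv_params(C, C, 7)
triple = 3 * conv_params(C, C, 3)

print(single, triple)
assert triple < single
```

The same comparison holds for multiply-accumulate counts at a fixed output resolution, which is the usual motivation for this kind of stem replacement.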
Accurate segmentation of pediatric echocardiograms is a challenging task, because significant heart-size changes with age and a faster heart rate lead to more blurred boundaries on cardiac ultrasound images than in adults. To address these problems, a dual-decoder network model combining channel attention and scale attention is proposed in this paper. Firstly, an attention-guided decoder with a deep supervision strategy is used to obtain attention maps for the ventricular regions. Then, the generated ventricular attention is fed back to multiple layers of the network through skip connections to adjust the feature weights generated by the encoder and highlight the left and right ventricular areas. Finally, a scale attention module and a channel attention module are utilized to enhance the edge features of the left and right ventricles. The experimental results demonstrate that the proposed method achieves an average Dice coefficient of 90.63% on a self-acquired bilateral ventricular segmentation dataset, outperforming both conventional and state-of-the-art methods in the field of medical image segmentation. More importantly, the method segments the ventricular edges more accurately. These results can provide a new solution for pediatric echocardiographic bilateral ventricular segmentation and the subsequent auxiliary diagnosis of congenital heart disease.
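The idea of feeding a decoder-generated attention map back through a skip connection can be sketched with NumPy standing in for the network layers. The shapes and the residual `1 + A` gating below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Encoder feature map (channels, height, width) and a single-channel
# ventricular attention map produced by the attention-guided decoder.
feat = rng.standard_normal((8, 16, 16))
attn = sigmoid(rng.standard_normal((1, 16, 16)))  # values in (0, 1)

# Skip-connection re-weighting: emphasise ventricular regions while the
# residual "1 +" term keeps non-attended features from being zeroed out.
gated = feat * (1.0 + attn)

assert gated.shape == feat.shape
```

The multiplicative gate broadcasts the single-channel attention over all feature channels, which is the usual way such maps adjust encoder features.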
The skin is the largest organ of the human body, and many visceral diseases are directly reflected on the skin, so accurate segmentation of skin lesion images is of great clinical significance. To address the complex colors, blurred boundaries, and uneven scale information characteristic of such images, a skin lesion image segmentation method based on dense atrous spatial pyramid pooling (DenseASPP) and an attention mechanism is proposed. The method is based on the U-shaped network (U-Net). Firstly, a new encoder is designed that replaces ordinary convolutional stacking with a large number of residual connections, which effectively retains key features even as network depth grows. Secondly, channel attention is fused with spatial attention, and residual connections are added so that the network can adaptively learn the channel and spatial features of images. Finally, the DenseASPP module is introduced and redesigned to expand the receptive field and obtain multi-scale feature information. The proposed algorithm obtained satisfactory results on the official public dataset of the International Skin Imaging Collaboration (ISIC 2016): the mean Intersection over Union (mIOU), sensitivity (SE), precision (PC), accuracy (ACC), and Dice coefficient (Dice) are 0.9018, 0.9459, 0.9487, 0.9681, and 0.9473, respectively. The experimental results demonstrate that the method improves the segmentation of skin lesion images and is expected to provide auxiliary diagnosis for professional dermatologists.
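A minimal sketch of fusing channel attention with spatial attention under a residual connection is given below. The sigmoid-of-pooled-means gating is a simplified stand-in for the learned layers the paper would use; shapes are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    """Channel weights from global average pooling: (C, H, W) -> (C, 1, 1)."""
    w = sigmoid(x.mean(axis=(1, 2), keepdims=True))
    return x * w

def spatial_attention(x):
    """Spatial weights from channel-wise mean pooling: -> (1, H, W)."""
    w = sigmoid(x.mean(axis=0, keepdims=True))
    return x * w

def fused_attention(x):
    # Channel then spatial attention, plus a residual connection so the
    # block can fall back toward the identity mapping.
    return x + spatial_attention(channel_attention(x))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
y = fused_attention(x)
assert y.shape == x.shape
```

The residual term is what lets the network learn attention as a refinement rather than a replacement of the input features.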
Glioma is a primary brain tumor with a high incidence rate. High-grade gliomas (HGG) have the highest degree of malignancy and the poorest survival. Surgical resection followed by adjuvant chemoradiotherapy is common in clinical treatment, so accurate segmentation of tumor-related areas is of great significance for patient care. To improve the segmentation accuracy of HGG, this paper proposes a multi-modal glioma semantic segmentation network with multi-scale feature extraction and a multi-attention fusion mechanism. The main contributions are: (1) multi-scale residual structures were used to extract features from multi-modal glioma magnetic resonance imaging (MRI); (2) two types of attention modules were used to aggregate features in the channel and spatial dimensions; (3) to improve the segmentation performance of the whole network, a branch classifier was constructed with an ensemble learning strategy to adjust and correct the classification results of the backbone classifier. The experimental results showed that the Dice coefficients of the proposed segmentation method were 0.9097, 0.8773, and 0.8396 for whole tumor, tumor core, and enhancing tumor, respectively, and the segmentation results had good boundary continuity in the three-dimensional direction. Therefore, the proposed semantic segmentation network has good segmentation performance for high-grade glioma lesions.
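One common way a branch classifier can "adjust and correct" a backbone classifier under an ensemble strategy is to average their class probabilities. The logits, number of branches, and equal weighting below are illustrative assumptions rather than the paper's exact scheme:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits over 4 tumour sub-region classes from the backbone
# classifier and two auxiliary branch classifiers.
backbone = np.array([[2.0, 0.5, 0.1, -1.0]])
branch_1 = np.array([[1.5, 1.0, 0.0, -0.5]])
branch_2 = np.array([[1.8, 0.2, 0.3, -0.8]])

# Ensemble correction: average the class probabilities so the branches
# can pull a mistaken backbone prediction toward the consensus.
probs = (softmax(backbone) + softmax(branch_1) + softmax(branch_2)) / 3.0
pred = int(probs.argmax(axis=-1)[0])

assert np.allclose(probs.sum(), 1.0)
```

Averaging probabilities (rather than logits) keeps each classifier's contribution on the same scale regardless of its logit magnitudes.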
To address the challenges in blood cell recognition caused by diverse morphology, dense distribution, and abundant small-target information, this paper proposes a blood cell detection algorithm: the "You Only Look Once" model based on hybrid attention and deep over-parameterization (HADO-YOLO). First, a hybrid attention mechanism is introduced into the backbone network to enhance the model's sensitivity to detailed features. Second, the standard convolution layers with downsampling in the neck network are replaced with deep over-parameterized convolutions to expand the receptive field and improve feature representation. Finally, the detection head is decoupled to enhance the model's robustness in detecting abnormal cells. Experimental results on the Blood Cell Counting Dataset (BCCD) demonstrate that the HADO-YOLO algorithm achieves a mean average precision of 90.2% and a precision of 93.8%, outperforming the baseline YOLO model. Compared with existing blood cell detection methods, the proposed algorithm achieves state-of-the-art detection performance. In conclusion, HADO-YOLO offers a more efficient and accurate solution for identifying various types of blood cells, providing valuable technical support for future clinical diagnostic applications.
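The appeal of deep over-parameterized convolutions is that the extra training-time parameters fold into a single kernel at inference. The sketch below reduces the idea to 1×1 convolutions (plain matrix products); the shapes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Over-parameterised layer: during training it carries two factors,
# an extra factor W (16 x 32) composed with a base kernel D (32 x 8).
W = rng.standard_normal((16, 32))
D = rng.standard_normal((32, 8))
x = rng.standard_normal((8, 100))   # 8 input channels, 100 positions

y_train = W @ (D @ x)   # over-parameterised form used while training
W_fold = W @ D          # the two factors fold into one 16 x 8 kernel
y_infer = W_fold @ x    # deployment form: same output, no extra cost

assert np.allclose(y_train, y_infer)
```

The folded kernel is why such layers can add trainable capacity without increasing inference-time computation.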
Early screening based on computed tomography (CT) pulmonary nodule detection is an important means of reducing lung cancer mortality, and in recent years the three-dimensional convolutional neural network (3D CNN) has achieved continued success in the field of lung nodule detection. We propose a pulmonary nodule detection algorithm using a 3D CNN based on a multi-scale attention mechanism. To handle the varied sizes and shapes of lung nodules, we designed a multi-scale feature extraction module to extract the corresponding features at different scales. Through the attention module, correlation information between features was mined from both spatial and channel perspectives to strengthen the features. The extracted features were then fed into a pyramid-like fusion mechanism, so that the fused features contain both deep semantic information and shallow location information, which benefits target positioning and bounding-box regression. On the representative LUNA16 dataset, this method significantly improved detection sensitivity compared with other advanced methods, and can provide a theoretical reference for clinical medicine.
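Pyramid-style fusion typically upsamples a deep, semantically rich feature map and adds it to a shallower, spatially precise one. A 2D NumPy sketch is below (the paper works in 3D and would align channel counts with 1×1 convolutions; both simplifications are assumptions):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
deep = rng.standard_normal((8, 4, 4))      # deep: rich semantics, coarse grid
shallow = rng.standard_normal((8, 8, 8))   # shallow: precise locations

# Fusion: upsampled deep semantics plus shallow localisation cues, so the
# result supports both classification and bounding-box regression.
fused = shallow + upsample2x(deep)

assert fused.shape == shallow.shape
```

Element-wise addition after upsampling is the standard feature-pyramid recipe; concatenation followed by a convolution is a common alternative.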
The brain-computer interface (BCI) based on motor imagery electroencephalography (MI-EEG) enables direct information interaction between the human brain and external devices. In this paper, a multi-scale EEG feature extraction convolutional neural network model based on time-series data enhancement is proposed for decoding MI-EEG signals. First, an EEG signal augmentation method was proposed that increases the information content of training samples without changing the length of the time series, while completely retaining the original features. Then, multiple holistic and detailed features of the EEG data were adaptively extracted by a multi-scale convolution module, and the features were fused and filtered by a parallel residual module and channel attention. Finally, classification results were output by a fully connected network. Experimental results on the BCI Competition IV 2a and 2b datasets showed that the proposed model achieved average classification accuracies of 91.87% and 87.85%, respectively, on the motor imagery task, with higher accuracy and stronger robustness than existing baseline models. The proposed model requires no complex signal pre-processing and has the advantage of multi-scale feature extraction, giving it high practical application value.
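One family of length-preserving EEG augmentations splits same-class trials into time segments and recombines them. The segment-swap scheme below is an illustrative assumption, not necessarily the paper's exact augmentation; it only demonstrates the length-preserving property:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical MI-EEG trials of the same class: (channels, time_points).
trial_a = rng.standard_normal((3, 1000))
trial_b = rng.standard_normal((3, 1000))

def recombine(a, b, n_seg=4):
    """Split two same-class trials into n_seg time segments and
    interleave them: the new trial mixes information from both trials
    while keeping the original time-series length."""
    seg_a = np.array_split(a, n_seg, axis=1)
    seg_b = np.array_split(b, n_seg, axis=1)
    parts = [seg_a[i] if i % 2 == 0 else seg_b[i] for i in range(n_seg)]
    return np.concatenate(parts, axis=1)

aug = recombine(trial_a, trial_b)
assert aug.shape == trial_a.shape  # time-series length unchanged
```

Because every segment comes from a genuine trial of the same class, the augmented sample keeps class-relevant temporal structure intact.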
Current epilepsy prediction methods do not effectively characterize the multi-domain features of complex long-term electroencephalogram (EEG) data, leading to suboptimal prediction performance. Therefore, this paper proposes a novel multi-scale sparse adaptive convolutional network based on a multi-head attention mechanism (MS-SACN-MM) to effectively characterize these multi-domain features. The model first preprocesses the EEG data, constructs multiple convolutional layers to avoid information overload, and uses a multi-layer perceptron and a multi-head attention mechanism to focus the network on critical pre-seizure features. It then adopts a focal loss training strategy to alleviate class imbalance and enhance the model's robustness. Experimental results show that on the public CHB-MIT dataset created by MIT and Boston Children's Hospital, the MS-SACN-MM model achieves a maximum accuracy of 0.999 for predicting seizures 10-15 minutes in advance. This demonstrates good predictive performance and holds significant importance for early intervention and intelligent clinical management of epilepsy patients.
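The focal loss used to combat class imbalance down-weights well-classified examples so training concentrates on hard, minority-class (pre-seizure) samples. A standard binary form is sketched below; the γ = 2 and α = 0.25 defaults are the usual choices, assumed here rather than taken from the paper:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).
    p: predicted probability of the positive class; y: 0/1 label."""
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# An easy, well-classified positive contributes far less than a hard
# one, which is what mitigates class imbalance during training.
easy = focal_loss(np.array([0.95]), np.array([1]))
hard = focal_loss(np.array([0.30]), np.array([1]))
assert easy[0] < hard[0]
```

With γ = 0 the modulating factor vanishes and the expression reduces to ordinary (α-weighted) cross-entropy.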
Colorectal polyps are important early markers of colorectal cancer, and their early detection is crucial for cancer prevention. Although existing polyp segmentation models have achieved certain results, they still face challenges such as diverse polyp morphology, blurred boundaries, and insufficient feature extraction. To address these issues, this study proposes a parallel coordinate fusion network (PCFNet) aimed at improving the accuracy and robustness of polyp segmentation. PCFNet integrates parallel convolutional modules and a coordinate attention mechanism, preserving global feature information while precisely capturing detailed features, and thereby effectively segmenting polyps with complex boundaries. Experimental results on Kvasir-SEG and CVC-ClinicDB demonstrate the outstanding performance of PCFNet across multiple metrics. Specifically, on the Kvasir-SEG dataset, PCFNet achieved an F1-score of 0.8974 and a mean intersection over union (mIoU) of 0.8358; on the CVC-ClinicDB dataset, it attained an F1-score of 0.9398 and an mIoU of 0.8923. Compared with other methods, PCFNet shows significant improvements across all performance metrics, particularly in multi-scale feature fusion and spatial information capture. The proposed method provides a more reliable AI-assisted diagnostic tool for early colorectal cancer screening.
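Coordinate attention differs from plain channel attention by pooling along each spatial axis separately, so the resulting weights keep positional information along the other axis. The sketch below omits the learned transforms and gates with sigmoids of the pooled means, both simplifying assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x):
    """Simplified coordinate attention for a (C, H, W) feature map."""
    h_pool = x.mean(axis=2, keepdims=True)   # (C, H, 1): per-row context
    w_pool = x.mean(axis=1, keepdims=True)   # (C, 1, W): per-column context
    # Direction-aware gating: the two factors broadcast back over the map,
    # so the weight at (h, w) depends on both its row and its column.
    return x * sigmoid(h_pool) * sigmoid(w_pool)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
y = coordinate_attention(x)
assert y.shape == x.shape
```

The direction-aware factorization is what lets this style of attention localize boundary structures better than a single global pooling.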
Accurate segmentation of whole slide images is of great significance for the diagnosis of pancreatic cancer. However, developing an automatic model is challenging due to the complex content, limited samples, and high sample heterogeneity of pathological images. This paper presents a multi-tissue segmentation model for whole slide images of pancreatic cancer. We introduced an attention mechanism into the building blocks and designed a multi-task learning framework with appropriate auxiliary tasks to enhance model performance. The model was trained and tested on the pancreatic cancer pathological image dataset from Shanghai Changhai Hospital, with data from The Cancer Genome Atlas (TCGA) used as an independent external validation cohort. The F1 scores of the model exceeded 0.97 and 0.92 on the internal and external datasets, respectively, and its generalization performance was significantly better than that of the baseline method. These results demonstrate that the proposed model can accurately segment eight kinds of tissue regions in whole slide images of pancreatic cancer, providing a reliable basis for clinical diagnosis.
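In a multi-task framework of this kind, the training signal is typically a weighted sum of the main segmentation loss and the auxiliary-task losses, so the shared encoder is regularized by the extra supervision. The weighting and the specific auxiliary losses below are illustrative assumptions:

```python
import numpy as np

def total_loss(seg_loss, aux_losses, aux_weight=0.4):
    """Multi-task objective: main loss plus down-weighted auxiliary losses.
    The 0.4 weight is a hypothetical hyper-parameter, not the paper's."""
    return seg_loss + aux_weight * sum(aux_losses)

seg = 0.52                  # hypothetical multi-tissue segmentation loss
aux = [0.30, 0.25]          # hypothetical auxiliary-task losses
loss = total_loss(seg, aux)

assert np.isclose(loss, 0.52 + 0.4 * 0.55)
```

Keeping the auxiliary weight below 1 ensures the auxiliary tasks guide, rather than dominate, the shared representation.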