Fatigue driving is one of the leading causes of traffic accidents, posing a significant threat to drivers and road safety. Most existing methods study whole-brain multi-channel electroencephalogram (EEG) signals, which involve a large number of channels, complex data processing, and cumbersome wearable devices. To address this issue, this paper proposes a fatigue detection method based on frontal EEG signals and constructs a fatigue driving detection model using an asymptotic hierarchical fusion network. The model employs a hierarchical fusion strategy, integrating an attention mechanism module into the multi-level convolutional module. By combining cross-attention and self-attention mechanisms, it effectively fuses the hierarchical semantic features of power spectral density (PSD) and differential entropy (DE), enhancing the learning of feature dependencies and interactions. Experimental validation was conducted on the public SEED-VIG dataset, where the proposed model achieved an accuracy of 89.80% using only four frontal EEG channels. Comparative experiments with existing methods demonstrate that the proposed model achieves high accuracy and superior practicality, providing valuable technical support for fatigue driving monitoring and prevention.
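As a concrete illustration of the fusion idea described above, the following is a minimal PyTorch sketch (not the authors' exact module) in which PSD features query DE features through cross-attention and the result is refined by self-attention; the embedding width, head count, and one-token-per-channel layout are assumptions made for illustration.

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, psd_feat, de_feat):
        # Cross-attention: PSD tokens attend to DE tokens.
        fused, _ = self.cross(psd_feat, de_feat, de_feat)
        fused = self.norm1(fused + psd_feat)
        # Self-attention refines dependencies within the fused stream.
        refined, _ = self.self_attn(fused, fused, fused)
        return self.norm2(refined + fused)

# Toy shapes: batch of 8, one embedded token per frontal EEG channel.
psd = torch.randn(8, 4, 64)
de = torch.randn(8, 4, 64)
print(CrossAttentionFusion()(psd, de).shape)  # torch.Size([8, 4, 64])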
The diagnosis of hypertrophic cardiomyopathy (HCM) is of great significance for early risk stratification of sudden cardiac death and for the screening of familial genetic disease. This research proposed an automatic HCM detection method based on a convolutional neural network (CNN) model, using the single-lead electrocardiogram (ECG) signal as the research object. First, the R-wave peak locations of the single-lead ECG signal were determined; the signal was then segmented and resampled into individual heartbeats, and a CNN model was built to automatically extract deep features from the ECG signal and perform automatic classification for HCM detection. The experimental data were derived from 108 ECG records extracted from three public databases provided by PhysioNet; the dataset established in this research consists of 14,459 heartbeats, each containing 128 sampling points. The results revealed that the optimized CNN model could effectively detect HCM, with accuracy, sensitivity, and specificity of 95.98%, 98.03%, and 95.79%, respectively. This research introduced a deep learning method for analyzing the single-lead ECG of HCM patients, which not only overcomes the technical limitations of conventional detection methods based on multi-lead ECG, but also has important application value in assisting doctors with fast and convenient large-scale preliminary HCM screening.
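For readers seeking a starting point, the sketch below shows a compact 1D CNN of the kind described, classifying 128-sample heartbeat segments into HCM versus non-HCM; the layer sizes are illustrative assumptions, not the paper's optimized configuration.

import torch
import torch.nn as nn

class HeartbeatCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):
        # x: (batch, 1, 128) resampled heartbeat segments.
        return self.classifier(self.features(x).squeeze(-1))

beats = torch.randn(32, 1, 128)
print(HeartbeatCNN()(beats).shape)  # torch.Size([32, 2])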
Manual segmentation of coronary arteries in computed tomography angiography (CTA) images is inefficient, and existing deep learning segmentation models often exhibit low accuracy on coronary artery images. Inspired by the Transformer architecture, this paper proposes a novel segmentation model, the double parallel encoder u-net with transformers (DUNETR). The network employs a dual-encoder design that integrates Transformers and convolutional neural networks (CNNs): the Transformer encoder recasts three-dimensional (3D) coronary artery data as a one-dimensional (1D) sequence problem, effectively capturing global multi-scale feature information, while the CNN encoder extracts local features of the 3D coronary arteries. The complementary features extracted by the two encoders are fused through the noise reduction feature fusion (NRFF) module and passed to the decoder. Experimental results on a public dataset demonstrated that the proposed DUNETR model achieved a Dice similarity coefficient of 81.19% and a recall of 80.18%, improvements of 0.49% and 0.46%, respectively, over the next best model in comparative experiments, surpassing other conventional deep learning methods. The integration of Transformers and CNNs as dual encoders enables the extraction of rich feature information, significantly enhancing the effectiveness of 3D coronary artery segmentation. Additionally, this model provides a novel approach for segmenting other vascular structures.
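The dual-encoder idea can be sketched in a few lines of PyTorch: a 3D CNN branch extracts local features while a Transformer branch attends over a flattened patch sequence, and a plain concatenation-plus-convolution stands in for the paper's NRFF fusion module. All dimensions here are toy assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoderBlock(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv3d(1, dim, 3, padding=1), nn.ReLU())
        self.embed = nn.Conv3d(1, dim, kernel_size=4, stride=4)  # patch embedding
        self.transformer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                                      batch_first=True)
        self.fuse = nn.Conv3d(2 * dim, dim, 1)  # stand-in for NRFF fusion

    def forward(self, x):
        # x: (B, 1, D, H, W) CTA sub-volume.
        local = self.cnn(x)                               # local 3D features
        tok = self.embed(x)                               # (B, dim, D/4, H/4, W/4)
        b, c, d, h, w = tok.shape
        seq = tok.flatten(2).transpose(1, 2)              # 3D volume -> 1D sequence
        glob = self.transformer(seq).transpose(1, 2).reshape(b, c, d, h, w)
        glob = F.interpolate(glob, size=x.shape[2:], mode='trilinear',
                             align_corners=False)
        return self.fuse(torch.cat([local, glob], dim=1))

vol = torch.randn(1, 1, 32, 32, 32)
print(DualEncoderBlock()(vol).shape)  # torch.Size([1, 32, 32, 32, 32])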
Pneumoconiosis ranks first among the newly reported occupational diseases in China each year, and imaging remains one of the main clinical diagnostic methods. However, manual film reading demands highly experienced physicians, staging pneumoconiosis from images is difficult, and factors such as the uneven distribution of medical resources make misdiagnosis and missed diagnosis likely in primary healthcare institutions. Computer-aided diagnosis systems enable rapid screening of pneumoconiosis, assisting clinicians in identification and diagnosis and improving diagnostic efficiency. As an important branch of deep learning, the convolutional neural network (CNN) excels at visual tasks such as image segmentation, image classification, and object detection owing to its local connectivity and weight sharing, and it has been widely applied to computer-aided diagnosis of pneumoconiosis in recent years. This literature review is organized into three parts according to the main applications of CNNs (VGG, U-Net, ResNet, DenseNet, CheXNet, Inception-V3, and ShuffleNet) in the imaging diagnosis of pneumoconiosis: CNNs for pneumoconiosis screening, CNNs for staging of pneumoconiosis, and CNNs for segmentation of pneumoconiosis lesions. It summarizes the methods, advantages and disadvantages, and optimization ideas of CNNs applied to pneumoconiosis images, and provides a reference for further development of computer-aided diagnosis of pneumoconiosis.
The processing mechanism of the human brain for speech information is a significant source of inspiration for the study of speech enhancement technology. Attention and lateral inhibition are key mechanisms in auditory information processing that can selectively enhance specific information. Building on this, the study introduces a dual-branch U-Net that integrates lateral inhibition and feedback-driven attention mechanisms. Noisy speech signals were fed into the first U-Net branch, from which time-frequency units with high confidence were selectively fed back. The resulting activation-layer gradients, combined with the lateral inhibition mechanism, were used to compute attention maps. These maps were then concatenated onto the second U-Net branch, directing the network's focus and achieving selective enhancement of auditory speech signals. The enhancement effect was evaluated with five metrics, including perceptual evaluation of speech quality, and the method was compared with five other methods: Wiener, SEGAN, PHASEN, Demucs, and GRN. The experimental results demonstrated that the proposed method improved speech enhancement in various noise scenarios by 18% to 21% over the baseline network across multiple performance metrics. The improvement was particularly notable under low signal-to-noise ratio conditions, where the proposed method exhibited a significant performance advantage over the other methods. The speech enhancement technique based on lateral inhibition and feedback-driven attention mechanisms holds significant potential in auditory speech enhancement, making it suitable for clinical practices related to artificial cochleae and hearing aids.
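The feedback-attention step can be made concrete with a schematic sketch, under two explicit assumptions: Grad-CAM-style gradients stand in for the paper's feedback signal, and a centre-surround kernel stands in for lateral inhibition. The toy branch-1 layer and thresholds below are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())  # toy branch-1 layer
head = nn.Conv2d(8, 1, 1)                                       # mask head

spec = torch.randn(1, 1, 64, 64, requires_grad=True)  # noisy spectrogram
act = conv(spec)
mask = torch.sigmoid(head(act))

# Feed back only high-confidence time-frequency units.
feedback = (mask * (mask > 0.8)).sum()
grads, = torch.autograd.grad(feedback, act)

# Channel-averaged gradient map, sharpened by lateral inhibition:
# positive centre, negative surround (difference-of-Gaussians style).
attn = grads.mean(dim=1, keepdim=True)
kernel = -torch.ones(1, 1, 3, 3) / 8.0
kernel[0, 0, 1, 1] = 1.0
attn = F.relu(F.conv2d(attn, kernel, padding=1))
print(attn.shape)  # attention map concatenated onto branch 2: (1, 1, 64, 64)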
Glioma is a primary brain tumor with a high incidence rate. High-grade gliomas (HGG) have the highest degree of malignancy and the lowest survival rate. Surgical resection and postoperative adjuvant chemoradiotherapy are commonly used in clinical treatment, so accurate segmentation of tumor-related areas is of great significance for patient treatment. To improve the segmentation accuracy of HGG, this paper proposes a multi-modal glioma semantic segmentation network with multi-scale feature extraction and a multi-attention fusion mechanism. The main contributions are as follows: (1) multi-scale residual structures were used to extract features from multi-modal glioma magnetic resonance imaging (MRI); (2) two types of attention modules were used to aggregate features along the channel and spatial dimensions; (3) to improve the segmentation performance of the whole network, a branch classifier was constructed using an ensemble learning strategy to adjust and correct the classification results of the backbone classifier. The experimental results showed that the Dice coefficients of the proposed segmentation method were 0.9097, 0.8773, and 0.8396 for the whole tumor, tumor core, and enhancing tumor, respectively, and the segmentation results showed good boundary continuity in the three-dimensional direction. The proposed semantic segmentation network therefore exhibits good segmentation performance for high-grade glioma lesions.
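The two attention types in contribution (2) can be illustrated with a CBAM-style sketch; the paper's exact modules may differ, and the channel counts below are toy values.

import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction),
                                 nn.ReLU(),
                                 nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv3d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # x: (B, C, D, H, W) multi-modal MRI features.
        b, c = x.shape[:2]
        # Channel attention: squeeze spatial dims, re-weight channels.
        w = torch.sigmoid(self.mlp(x.mean(dim=(2, 3, 4)))).view(b, c, 1, 1, 1)
        x = x * w
        # Spatial attention: pool across channels, re-weight voxels.
        pooled = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)
        return x * torch.sigmoid(self.spatial(pooled))

feat = torch.randn(2, 16, 8, 32, 32)
print(ChannelSpatialAttention(16)(feat).shape)  # torch.Size([2, 16, 8, 32, 32])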
Most current medical image segmentation models are primarily built upon the U-shaped network (U-Net) architecture, which has certain limitations in capturing both global contextual information and fine-grained details. To address this issue, this paper proposes a novel U-shaped network model, termed the Multi-View U-Net (MUNet), which integrates self-attention and multi-view attention mechanisms. Specifically, a newly designed multi-view attention module is introduced to aggregate semantic features from different perspectives, thereby enhancing the representation of fine details in images. Additionally, the MUNet model leverages a self-attention encoding block to extract global image features, and by fusing global and local features, it improves segmentation performance. Experimental results demonstrate that the proposed model achieves superior segmentation performance in coronary artery image segmentation tasks, significantly outperforming existing models. By incorporating self-attention and multi-view attention mechanisms, this study provides a novel and efficient modeling approach for medical image segmentation, contributing to the advancement of intelligent medical image analysis.
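A minimal sketch of one plausible reading of "multi-view" attention follows, assuming the views are different spatial axes of the feature map attended to separately and then aggregated; the authors' module may be defined differently.

import torch
import torch.nn as nn

class MultiViewAttention(nn.Module):
    def __init__(self, dim=32, heads=4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (B, C, H, W) feature map.
        b, c, h, w = x.shape
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)   # attend along width
        rows, _ = self.row_attn(rows, rows, rows)
        cols = x.permute(0, 3, 2, 1).reshape(b * w, h, c)   # attend along height
        cols, _ = self.col_attn(cols, cols, cols)
        rows = rows.reshape(b, h, w, c).permute(0, 3, 1, 2)
        cols = cols.reshape(b, w, h, c).permute(0, 3, 2, 1)
        return x + rows + cols                              # aggregate the views

feat = torch.randn(2, 32, 16, 16)
print(MultiViewAttention()(feat).shape)  # torch.Size([2, 32, 16, 16])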
High-resolution (HR) magnetic resonance imaging (MRI) or computed tomography (CT) images can provide clearer anatomical details of the human body, facilitating early diagnosis of disease. However, owing to the imaging system, imaging environment, and human factors, clear high-resolution images are difficult to obtain. In this paper, we proposed a novel medical image super-resolution (SR) reconstruction method via a multi-scale information distillation (MSID) network in the non-subsampled shearlet transform (NSST) domain, namely the NSST-MSID network. We first proposed an MSID network consisting mainly of a series of stacked MSID blocks that fully exploit image features and effectively restore low-resolution (LR) images to HR images. In addition, most previous methods predict the HR image in the spatial domain, producing over-smoothed outputs and losing texture details; we therefore cast the medical image SR task as the prediction of NSST coefficients, which enables the MSID network to preserve richer structural details than prediction in the spatial domain. Finally, experimental results on our constructed medical image datasets demonstrated that the proposed method achieved better peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and root mean square error (RMSE) values and preserved global topological structure and local texture detail better than other outstanding methods, achieving a good medical image reconstruction effect.
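A hedged sketch of one information-distillation block follows, in the IMDN-style channel-splitting spirit; the paper's MSID block and its NSST front end are more elaborate, and the channel split below is an assumption.

import torch
import torch.nn as nn

class DistillationBlock(nn.Module):
    def __init__(self, channels=64, distill=16):
        super().__init__()
        self.distill = distill
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels - distill, channels, 3, padding=1)
        self.fuse = nn.Conv2d(channels + distill, channels, 1)
        self.act = nn.LeakyReLU(0.05)

    def forward(self, x):
        out = self.act(self.conv1(x))
        # Retain ("distill") a slice of channels; refine the remainder.
        kept, rest = torch.split(out, [self.distill, out.shape[1] - self.distill], 1)
        rest = self.act(self.conv2(rest))
        return self.fuse(torch.cat([kept, rest], dim=1)) + x  # residual connection

coeffs = torch.randn(1, 64, 48, 48)       # stand-in for NSST coefficient maps
print(DistillationBlock()(coeffs).shape)  # torch.Size([1, 64, 48, 48])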
Three-dimensional (3D) deformable image registration plays a critical role in 3D medical image processing. This technique aligns images from different time points, modalities, or individuals in 3D space, enabling the comparison and fusion of anatomical or functional information. To simultaneously capture the local details of anatomical structures and the long-range dependencies in 3D medical images, while reducing the high costs of manual annotations, this paper proposes an unsupervised 3D medical image registration method based on shifted window Transformer and convolutional neural network (CNN), termed Swin Transformer-CNN-hybrid network (STCHnet). In the encoder part, STCHnet uses Swin Transformer and CNN to extract global and local features from 3D images, respectively, and optimizes feature representation through feature fusion. In the decoder part, STCHnet utilizes Swin Transformer to integrate information globally, and CNN to refine local details, reducing the complexity of the deformation field while maintaining registration accuracy. Experiments on the information extraction from images (IXI) and open access series of imaging studies (OASIS) datasets, along with qualitative and quantitative comparisons with existing registration methods, demonstrate that the proposed STCHnet outperforms baseline methods in terms of Dice similarity coefficient (DSC) and standard deviation of the log-Jacobian determinant (SDlogJ), achieving improved 3D medical image registration performance under unsupervised conditions.
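The SDlogJ metric named above can be computed with a few lines of PyTorch: finite-difference gradients of a dense 3D displacement field give the Jacobian of the transform, and the standard deviation of the log-determinant measures deformation smoothness. This is a small sketch of the metric, not of STCHnet itself.

import torch

def sd_log_jacobian(disp):
    """disp: (3, D, H, W) displacement field in voxel units."""
    grads = torch.gradient(disp, dim=(1, 2, 3))      # d(disp)/d(z, y, x)
    # Jacobian of phi(x) = x + disp(x): J = I + grad(disp).
    J = torch.stack(grads, dim=1)                    # (3, 3, D, H, W)
    J = J + torch.eye(3).view(3, 3, 1, 1, 1)
    J = J.permute(2, 3, 4, 0, 1)                     # (D, H, W, 3, 3)
    det = torch.linalg.det(J).clamp(min=1e-9)        # guard against folded voxels
    return torch.log(det).std()

disp = 0.1 * torch.randn(3, 16, 16, 16)              # toy deformation field
print(sd_log_jacobian(disp))                         # near 0 => smooth field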
This study aims to optimize surface electromyography (sEMG)-based gesture recognition, focusing on the impact of muscle fatigue on recognition performance. An innovative real-time analysis algorithm is proposed that extracts muscle fatigue features in real time and fuses them into the gesture recognition process. Based on self-collected data, this paper applies algorithms such as convolutional neural networks and long short-term memory networks to analyze muscle fatigue feature extraction in depth, and compares the impact of muscle fatigue features on the performance of sEMG-based gesture recognition tasks. The results show that by fusing muscle fatigue features in real time, the proposed algorithm improves gesture recognition accuracy at different fatigue levels, and the average recognition accuracy across subjects is also improved. In summary, the algorithm not only improves the adaptability and robustness of the gesture recognition system, but its development process can also provide new insights into gesture recognition technology in the field of biomedical engineering.
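One classic fatigue indicator that could feed such a fusion is the median frequency of the sEMG power spectrum, which drops as a muscle fatigues; the sketch below computes it per window (the sampling rate and window length are illustrative assumptions, and the paper's actual fatigue features may differ).

import numpy as np

def median_frequency(emg, fs=1000.0):
    """Median frequency (Hz) of a 1-D sEMG window sampled at fs."""
    spectrum = np.abs(np.fft.rfft(emg - emg.mean())) ** 2
    freqs = np.fft.rfftfreq(len(emg), d=1.0 / fs)
    cumulative = np.cumsum(spectrum)
    # Frequency below which half of the total spectral power lies.
    return freqs[np.searchsorted(cumulative, cumulative[-1] / 2.0)]

window = np.random.randn(1000)           # one 1 s window of raw sEMG
mf = median_frequency(window)            # falling MF across windows => fatigue
print(f"median frequency: {mf:.1f} Hz")  # e.g. appended to the learned features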