Objective To propose a heart sound segmentation method based on a multi-feature fusion network. Methods Data were obtained from the CinC/PhysioNet 2016 Challenge dataset (a total of 3 153 recordings from 764 patients, about 91.93% of whom were male, with an average age of 30.36 years). Firstly, features were extracted in the time domain and the time-frequency domain respectively, and redundant features were removed by feature dimensionality reduction. Then, the best-performing features were selected separately from the two feature spaces through feature selection. Next, multi-feature fusion was completed through multi-scale dilated convolution, cooperative fusion, and a channel attention mechanism. Finally, the fused features were fed into a bidirectional gated recurrent unit (BiGRU) network to obtain the heart sound segmentation results. Results The proposed method achieved a precision, recall and F1 score of 96.70%, 96.99%, and 96.84%, respectively. Conclusion The multi-feature fusion network proposed in this study achieves better heart sound segmentation performance and can provide high-accuracy segmentation support for the design of automatic heart disease analysis based on heart sounds.
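The fusion-then-sequence-labeling pipeline described above can be illustrated with a minimal PyTorch sketch: parallel dilated convolutions at several scales, a squeeze-and-excitation-style channel attention step, and a BiGRU emitting per-frame state logits. All layer sizes, the input feature count, and the four-state output are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of multi-scale dilated convolution + channel attention + BiGRU.
# Dimensions and the 4-state output (e.g. S1/systole/S2/diastole) are assumptions.
import torch
import torch.nn as nn

class FusionBiGRUSegmenter(nn.Module):
    def __init__(self, in_channels=8, hidden=64, n_states=4):
        super().__init__()
        # multi-scale dilated convolutions over the frame-level feature sequence
        self.branches = nn.ModuleList([
            nn.Conv1d(in_channels, hidden, kernel_size=3, dilation=d, padding=d)
            for d in (1, 2, 4)
        ])
        fused = hidden * 3
        # channel attention (squeeze-and-excitation style) over fused channels
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(fused, fused // 4), nn.ReLU(),
            nn.Linear(fused // 4, fused), nn.Sigmoid())
        self.bigru = nn.GRU(fused, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_states)   # per-frame state logits

    def forward(self, x):                    # x: (batch, channels, frames)
        z = torch.cat([b(x) for b in self.branches], dim=1)
        z = z * self.attn(z).unsqueeze(-1)   # re-weight fused channels
        out, _ = self.bigru(z.transpose(1, 2))
        return self.head(out)                # (batch, frames, n_states)

logits = FusionBiGRUSegmenter()(torch.randn(2, 8, 400))
```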
Speech feature learning is at the core of speech-based recognition methods for mental illness. Deep feature learning can extract speech features automatically, but it is limited by small sample sizes. Traditional feature extraction (original features) can avoid the impact of small samples, but it relies heavily on expert experience and adapts poorly. To solve this problem, this paper proposes a deep embedded hybrid feature sparse stack autoencoder manifold ensemble algorithm. Firstly, based on prior knowledge, psychotic speech features are extracted to construct the original features. Secondly, the original features are embedded into a sparse stack autoencoder (deep network), and the output of the hidden layer is filtered to enhance the complementarity between the deep features and the original features. Thirdly, an L1 regularization feature selection mechanism is designed to compress the dimensionality of the mixed feature set composed of deep features and original features. Finally, a weighted locality preserving projection algorithm and an ensemble learning mechanism are designed to construct a manifold projection classifier ensemble model, which further improves the classification stability of feature fusion under small samples. In addition, this paper designs a medium-to-large-scale psychotic speech collection program for the first time, and collects and constructs a large-scale Chinese psychotic speech database for validating psychotic speech recognition algorithms. The experimental results show that the main innovations of the algorithm are effective, and its classification accuracy is better than that of other representative algorithms, with a maximum improvement of 3.3%. In conclusion, this paper proposes a new psychotic speech recognition method based on an embedded hybrid sparse stack autoencoder and manifold ensemble, which effectively improves the recognition rate of psychotic speech.
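As a rough illustration of the hybrid-feature pipeline, the scikit-learn sketch below concatenates handcrafted and deep features, compresses them with L1-based selection, and votes over several projection-plus-classifier members. The weighted locality preserving projection is approximated by PCA, the data are synthetic, and the feature dimensions, member count, and hyperparameters are all assumptions.

```python
# Hybrid (original + deep) features -> L1 selection -> projection + classifier ensemble.
# Synthetic data; PCA stands in for the weighted locality preserving projection.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import VotingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_handcrafted = rng.normal(size=(120, 40))    # prior-knowledge speech features (synthetic)
X_deep = rng.normal(size=(120, 64))           # sparse stacked autoencoder hidden outputs (synthetic)
y = rng.integers(0, 2, size=120)              # patient vs. control labels (synthetic)

X_mixed = np.hstack([X_handcrafted, X_deep])  # mixed feature set

# L1-regularized selection keeps the 30 strongest features of the mixed set
l1_select = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
    threshold=-np.inf, max_features=30)

# ensemble of projection + classifier members (soft voting)
members = [(f"m{k}", make_pipeline(PCA(n_components=8 + 4 * k),
                                   SVC(probability=True, random_state=k)))
           for k in range(3)]
model = make_pipeline(l1_select, VotingClassifier(members, voting="soft"))
model.fit(X_mixed, y)
print(model.score(X_mixed, y))
```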
Current studies on electroencephalogram (EEG) emotion recognition primarily concentrate on discrete stimulus paradigms under controlled laboratory settings, which cannot adequately represent the dynamic transition characteristics of emotional states during multi-context interactions. To address this issue, this paper proposes a novel method for emotion transition recognition that leverages a cross-modal feature fusion and global perception network (CFGPN). Firstly, an experimental paradigm encompassing six types of emotion transition scenarios was designed, and EEG and eye movement data were simultaneously collected from 20 participants, each annotated with dynamic continuous emotion labels. Subsequently, deep canonical correlation analysis integrated with a cross-modal attention mechanism was employed to fuse features from the EEG and eye movement signals, yielding multimodal feature vectors enriched with highly discriminative emotional information. These vectors were then fed into a parallel hybrid architecture combining convolutional neural networks (CNNs) and Transformers: the CNN captures local time-series features, whereas the Transformer leverages its strong global perception capability to model long-range temporal dependencies, enabling accurate dynamic emotion transition recognition. The results demonstrate that the proposed method achieves the lowest mean square error in both valence and arousal recognition tasks on the dynamic emotion transition dataset and a classic multimodal emotion dataset, and exhibits superior recognition accuracy and stability compared with five existing unimodal and six multimodal deep learning models. The approach enhances both adaptability and robustness in recognizing emotional state transitions in real-world scenarios, showing promising potential for applications in biomedical engineering.
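A minimal PyTorch sketch of the parallel CNN + Transformer stage is given below, operating on an already-fused multimodal feature sequence; the deep canonical correlation analysis and cross-modal attention fusion are abstracted into the input tensor, and all dimensions are illustrative assumptions.

```python
# Parallel CNN (local patterns) + Transformer (long-range dependencies) over a
# fused feature sequence, regressing valence and arousal. Dimensions are assumed.
import torch
import torch.nn as nn

class CNNTransformerRegressor(nn.Module):
    def __init__(self, feat_dim=64, hidden=64):
        super().__init__()
        # CNN branch: local time-series features
        self.cnn = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU())
        # Transformer branch: global temporal perception
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(hidden + feat_dim, 2)   # valence and arousal

    def forward(self, x):                        # x: (batch, time, feat_dim)
        local = self.cnn(x.transpose(1, 2)).mean(dim=-1)
        global_ = self.transformer(x).mean(dim=1)
        return self.head(torch.cat([local, global_], dim=-1))

pred = CNNTransformerRegressor()(torch.randn(4, 128, 64))   # (4, 2)
```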
The task of automatic generation of medical image reports faces various challenges, such as diverse disease types and a lack of professionalism and fluency in report descriptions. To address these issues, this paper proposes a memory-driven multimodal medical imaging report generation method (mMIRmd). Firstly, a hierarchical vision Transformer using shifted windows (Swin-Transformer) is utilized to extract multi-perspective visual features of patient medical images, and semantic features of textual medical history information are extracted using bidirectional encoder representations from Transformers (BERT). Subsequently, the visual and semantic features are integrated to enhance the model's ability to recognize different disease types. Furthermore, a word vector dictionary pre-trained on medical text is employed to encode the labels of the visual features, thereby enhancing the professionalism of the generated reports. Finally, a memory-driven module is introduced in the decoder to address long-distance dependencies in medical image data. This study is validated on the chest X-ray dataset collected at Indiana University (IU X-Ray) and the medical information mart for intensive care chest X-ray dataset (MIMIC-CXR) released by the Massachusetts Institute of Technology and Beth Israel Deaconess Medical Center. Experimental results indicate that the proposed method can better focus on the affected areas, improve the accuracy and fluency of report generation, and assist radiologists in quickly completing medical image report writing.
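The encoder side of such a pipeline might be sketched as below: Swin visual features and BERT semantic features are concatenated into a joint representation that a report decoder could consume. The memory-driven decoder is omitted; the torchvision Swin-T backbone, the bert-base-uncased checkpoint (fetched from the Hugging Face hub), and all dimensions are assumptions made for illustration.

```python
# Joint visual + textual feature encoder sketch; the report decoder is omitted.
import torch
import torch.nn as nn
from torchvision.models import swin_t
from transformers import BertModel, BertTokenizer

class VisualTextEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision = swin_t(weights=None)
        self.vision.head = nn.Identity()           # 768-d pooled visual feature
        self.text = BertModel.from_pretrained("bert-base-uncased")
        self.proj = nn.Linear(768 + 768, 512)      # joint feature for a decoder

    def forward(self, image, input_ids, attention_mask):
        v = self.vision(image)                                        # (B, 768)
        t = self.text(input_ids=input_ids,
                      attention_mask=attention_mask).pooler_output    # (B, 768)
        return self.proj(torch.cat([v, t], dim=-1))

tok = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["no prior cardiac history"], return_tensors="pt",
            padding=True, truncation=True)
feat = VisualTextEncoder()(torch.randn(1, 3, 224, 224),
                           batch["input_ids"], batch["attention_mask"])
```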
Colorectal polyps are important early markers of colorectal cancer, and their early detection is crucial for cancer prevention. Although existing polyp segmentation models have achieved certain results, they still face challenges such as diverse polyp morphology, blurred boundaries, and insufficient feature extraction. To address these issues, this study proposes a parallel coordinate fusion network (PCFNet), aiming to improve the accuracy and robustness of polyp segmentation. PCFNet integrates parallel convolutional modules and a coordinate attention mechanism, preserving global feature information while precisely capturing detailed features, thereby effectively segmenting polyps with complex boundaries. Experimental results on Kvasir-SEG and CVC-ClinicDB demonstrate the outstanding performance of PCFNet across multiple metrics. Specifically, on the Kvasir-SEG dataset, PCFNet achieved an F1-score of 0.897 4 and a mean intersection over union (mIoU) of 0.835 8; on the CVC-ClinicDB dataset, it attained an F1-score of 0.939 8 and an mIoU of 0.892 3. Compared with other methods, PCFNet shows significant improvements across all performance metrics, particularly in multi-scale feature fusion and spatial information capture, demonstrating its innovation. The proposed method provides a more reliable AI-assisted diagnostic tool for early colorectal cancer screening.
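A compact PyTorch sketch of the two ingredients named above, parallel convolution paths followed by a coordinate attention step, is given below; channel counts, kernel sizes, and the reduction ratio are illustrative assumptions rather than PCFNet's actual configuration.

```python
# Parallel convolutions + coordinate attention; sizes are illustrative assumptions.
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Re-weights a feature map with separate attention along height and width."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(8, channels // reduction)
        self.squeeze = nn.Sequential(nn.Conv2d(channels, mid, 1), nn.ReLU())
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        pooled_h = x.mean(dim=3, keepdim=True)                   # (b, c, h, 1)
        pooled_w = x.mean(dim=2, keepdim=True).transpose(2, 3)   # (b, c, w, 1)
        y = self.squeeze(torch.cat([pooled_h, pooled_w], dim=2))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                    # attention over rows
        a_w = torch.sigmoid(self.conv_w(y_w.transpose(2, 3)))    # attention over columns
        return x * a_h * a_w

class ParallelConvBlock(nn.Module):
    """Two parallel convolution paths (different kernels) plus coordinate attention."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.path3 = nn.Conv2d(in_ch, out_ch // 2, 3, padding=1)
        self.path5 = nn.Conv2d(in_ch, out_ch // 2, 5, padding=2)
        self.attn = CoordinateAttention(out_ch)

    def forward(self, x):
        z = torch.relu(torch.cat([self.path3(x), self.path5(x)], dim=1))
        return self.attn(z)

out = ParallelConvBlock(3, 32)(torch.randn(1, 3, 256, 256))   # (1, 32, 256, 256)
```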
Lung nodules are the main manifestation of early lung cancer, so accurate detection of lung nodules is of great significance for the early diagnosis and treatment of lung cancer. However, rapid and accurate detection of pulmonary nodules is a challenging task because of the complex background, the large detection range of pulmonary computed tomography (CT) images, and the varying sizes and shapes of pulmonary nodules. Therefore, this paper proposes a multi-scale feature fusion algorithm for the automatic detection of pulmonary nodules. Firstly, a three-tier modular detection model was designed based on VGG16, a deep convolutional network for large-scale image recognition. The first-tier module of the network extracts the features of pulmonary nodules in CT images and roughly estimates their locations. The second-tier module fuses multi-scale image features to further enhance the details of pulmonary nodules. The third-tier module fuses and analyzes the features of the first two tiers to obtain multi-scale candidate boxes of pulmonary nodules. Finally, the multi-scale candidate boxes were filtered by non-maximum suppression to obtain the final locations of pulmonary nodules. The algorithm was validated on the public LIDC-IDRI dataset, achieving an average detection accuracy of 90.9%.
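The multi-scale fusion and candidate filtering steps can be illustrated with the rough sketch below, which taps VGG16 feature maps at two depths, upsamples and concatenates them, and filters toy candidate boxes with non-maximum suppression; the layer indices, threshold, and boxes are assumptions, and the real detection heads are omitted.

```python
# VGG16 multi-scale feature fusion followed by NMS over toy candidate boxes.
import torch
import torch.nn as nn
from torchvision.models import vgg16
from torchvision.ops import nms

backbone = vgg16(weights=None).features
stage1, stage2 = backbone[:16], backbone[16:23]    # up to conv3_3, then up to conv4_3

ct_slice = torch.randn(1, 3, 224, 224)             # one CT slice, 3-channel input (assumed)
f1 = stage1(ct_slice)                              # (1, 256, 56, 56): coarse location cues
f2 = stage2(f1)                                    # (1, 512, 28, 28): finer semantics
f2_up = nn.functional.interpolate(f2, size=f1.shape[-2:], mode="bilinear",
                                  align_corners=False)
fused = torch.cat([f1, f2_up], dim=1)              # multi-scale fused feature map

# toy candidate boxes (x1, y1, x2, y2) with scores, as a detection head might emit
boxes = torch.tensor([[30., 30., 60., 60.], [32., 32., 62., 62.],
                      [100., 100., 130., 130.]])
scores = torch.tensor([0.9, 0.8, 0.7])
keep = nms(boxes, scores, iou_threshold=0.5)       # indices of surviving candidates
```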
In lower limb rehabilitation training, fatigue estimation is of great significance for improving the accuracy of intention recognition and avoiding secondary injury. However, most existing methods consider only surface electromyography (sEMG) features and ignore electrocardiogram (ECG) features when performing fatigue estimation, which leads to low and unstable recognition performance. To address this problem, a method that uses fused features of ECG and sEMG signals to estimate fatigue during lower limb rehabilitation was proposed, and an improved particle swarm optimization-support vector machine (improved PSO-SVM) classifier was designed to identify the fused feature vector. The three states of relaxation, transition, and fatigue were accurately recognized, with recognition rates of 98.5%, 93.5%, and 95.5%, respectively. Comparative experiments showed that the average recognition rate of this method was 4.50% higher than that of sEMG features alone, and 13.66% higher than that of combined ECG and sEMG features without feature fusion. This proves that fusing features of ECG and sEMG signals during lower limb rehabilitation training enables more accurate fatigue recognition.
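A simplified sketch of the fusion-then-classification step is shown below: synthetic ECG and sEMG feature vectors are concatenated and fed to an SVM whose (C, gamma) parameters are tuned by a plain particle swarm search standing in for the improved PSO; the actual feature extraction and the PSO improvements are beyond this sketch.

```python
# Fused ECG + sEMG features classified by an SVM tuned with a basic PSO.
# Data are synthetic; the "improved" PSO variant is not reproduced here.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
ecg_feats = rng.normal(size=(200, 6))        # e.g. heart-rate-variability style features (synthetic)
semg_feats = rng.normal(size=(200, 10))      # e.g. RMS / median-frequency style features (synthetic)
X = np.hstack([ecg_feats, semg_feats])       # fused feature vector
y = rng.integers(0, 3, size=200)             # relaxation / transition / fatigue labels (synthetic)

def fitness(p):                              # p = (log10 C, log10 gamma)
    clf = SVC(C=10 ** p[0], gamma=10 ** p[1])
    return cross_val_score(clf, X, y, cv=3).mean()

n_particles, dims = 10, 2
pos = rng.uniform(-2, 2, size=(n_particles, dims))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmax()].copy()

for _ in range(15):
    r1, r2 = rng.random((n_particles, dims)), rng.random((n_particles, dims))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, -3, 3)
    vals = np.array([fitness(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()].copy()

best_clf = SVC(C=10 ** gbest[0], gamma=10 ** gbest[1]).fit(X, y)
```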
Effective classification of multi-task motor imagery electroencephalogram (EEG) signals helps to achieve accurate multi-dimensional human-computer interaction, and exploiting the strong inter-subject specificity in the frequency domain can improve classification accuracy and robustness. Therefore, this paper proposed a multi-task EEG signal classification method based on adaptive time-frequency common spatial pattern (CSP) combined with a convolutional neural network (CNN). The subjects' personalized rhythm characteristics were extracted by adaptive spectrum awareness, spatial characteristics were calculated using one-versus-rest CSP, and composite time-domain characteristics were then derived to construct multi-level fused spatial-temporal-frequency features. Finally, the CNN was used to perform high-precision and highly robust four-task classification. The proposed algorithm was verified on a self-collected dataset of 10 subjects (33 ± 3 years old, inexperienced) and on dataset 2a of the 4th Brain-Computer Interface Competition (BCI Competition IV-2a). The average accuracy of the proposed algorithm for four-task classification reached 93.96% and 84.04%, respectively. Compared with other advanced algorithms, the average classification accuracy of the proposed algorithm was significantly improved, and the spread of accuracy across subjects was significantly reduced on the public dataset. The results show that the proposed algorithm performs well in multi-task classification and can effectively improve classification accuracy and robustness.
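The one-versus-rest CSP feature extraction can be sketched compactly in NumPy as below; the adaptive spectrum selection, the composite time-domain features, and the CNN classifier are omitted (an SVM stands in to keep the example short), and all trial counts and dimensions are synthetic assumptions.

```python
# One-versus-rest CSP spatial filtering with log-variance features; SVM stands
# in for the CNN. Trials are synthetic (trials x channels x samples).
import numpy as np
from scipy.linalg import eigh
from sklearn.svm import SVC

def class_cov(trials):
    # average spatial covariance, each trial normalized by its trace
    covs = [t @ t.T / np.trace(t @ t.T) for t in trials]
    return np.mean(covs, axis=0)

def ovr_csp_filters(X, y, n_pairs=2):
    filters = []
    for c in np.unique(y):
        Ca, Cb = class_cov(X[y == c]), class_cov(X[y != c])
        # generalized eigendecomposition; extreme eigenvectors maximize variance contrast
        _, W = eigh(Ca, Ca + Cb)
        filters.append(np.hstack([W[:, :n_pairs], W[:, -n_pairs:]]))
    return np.hstack(filters)                 # (channels, n_classes * 2 * n_pairs)

def csp_features(X, W):
    feats = []
    for trial in X:
        z = W.T @ trial                       # spatially filtered signals
        var = z.var(axis=1)
        feats.append(np.log(var / var.sum())) # normalized log-variance features
    return np.array(feats)

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 22, 250))            # trials x EEG channels x samples (synthetic)
y = rng.integers(0, 4, size=80)               # four motor imagery tasks (synthetic)

W = ovr_csp_filters(X, y)
clf = SVC().fit(csp_features(X, W), y)
```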
PET/CT imaging, which combines positron emission tomography (PET) and computed tomography (CT), is among the most advanced imaging examination methods currently available and is mainly used for tumor screening, differential diagnosis of benign and malignant tumors, and staging and grading. This paper proposes a method for breast cancer lesion segmentation based on PET/CT bimodal images and designs a dual-path U-Net framework, which mainly includes three modules: an encoder module, a feature fusion module, and a decoder module. The encoder module uses traditional convolution to extract features from each single-modality image; the feature fusion module adopts collaborative-learning feature fusion and uses a Transformer to extract global features of the fused image; the decoder module mainly uses a multi-layer perceptron to achieve lesion segmentation. The algorithm was evaluated on actual clinical PET/CT data. The experimental results show that the precision, recall, and accuracy of breast cancer lesion segmentation were 95.67%, 97.58%, and 96.16%, respectively, all better than the baseline algorithm. These results validate the rationality of combining convolution and Transformer for single-modal and bimodal feature extraction in this study, and provide a reference for feature extraction in tasks such as multimodal medical image segmentation or classification.
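A minimal PyTorch sketch of the dual-path idea follows: separate convolutional encoders for the PET and CT images, a Transformer over the concatenated token sequence as the fusion step, and an MLP head predicting a reduced-resolution lesion mask; the channel counts, patch size, and depth are illustrative assumptions rather than the paper's configuration.

```python
# Dual convolutional encoders + Transformer fusion + MLP decoder; sizes assumed.
import torch
import torch.nn as nn

class DualPathSegmenter(nn.Module):
    def __init__(self, dim=64, out_size=32):
        super().__init__()
        def encoder():
            return nn.Sequential(
                nn.Conv2d(1, dim, 4, stride=4), nn.ReLU(),    # 128 -> 32 tokens per side
                nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU())
        self.enc_pet, self.enc_ct = encoder(), encoder()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                     nn.Linear(dim, 1))        # per-pixel lesion logit
        self.out_size = out_size

    def forward(self, pet, ct):                  # (B, 1, 128, 128) each
        fp = self.enc_pet(pet).flatten(2).transpose(1, 2)      # (B, N, dim)
        fc = self.enc_ct(ct).flatten(2).transpose(1, 2)
        tokens = self.fusion(torch.cat([fp, fc], dim=1))       # global fusion over both paths
        fp_f, fc_f = tokens.chunk(2, dim=1)
        mask = self.decoder(torch.cat([fp_f, fc_f], dim=-1))   # (B, N, 1)
        b = mask.shape[0]
        return mask.transpose(1, 2).reshape(b, 1, self.out_size, self.out_size)

logits = DualPathSegmenter()(torch.randn(2, 1, 128, 128), torch.randn(2, 1, 128, 128))
```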
Diabetic retinopathy (DR) and its complication, diabetic macular edema (DME), are major causes of visual impairment and even blindness. The occurrence of DR and DME is pathologically interconnected, and their clinical diagnoses are closely related. Joint learning can help improve the accuracy of diagnosis. This paper proposed a novel adaptive lesion-aware fusion network (ALFNet) to facilitate the joint grading of DR and DME. ALFNet employed DenseNet-121 as the backbone and incorporated an adaptive lesion attention module (ALAM) to capture the distinct lesion characteristics of DR and DME. A deep feature fusion module (DFFM) with a shared-parameter local attention mechanism was designed to learn the correlation between the two diseases. Furthermore, a four-branch composite loss function was introduced to enhance the network’s multi-task learning capability. Experimental results demonstrated that ALFNet achieved superior joint grading performance on the Messidor dataset, with joint accuracy rates of 0.868 (DR 2 & DME 3), outperforming state-of-the-art methods. These results highlight the unique advantages of the proposed approach in the joint grading of DR and DME, thereby improving the efficiency and accuracy of clinical decision-making.
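A hedged sketch of the joint-grading skeleton is given below: a shared DenseNet-121 backbone with separate DR and DME heads trained under a summed cross-entropy loss. The adaptive lesion attention module, the shared-parameter fusion module, and the four-branch composite loss are simplified away; the class counts and equal loss weighting are assumptions.

```python
# Shared DenseNet-121 backbone with two grading heads and a joint multi-task loss.
import torch
import torch.nn as nn
from torchvision.models import densenet121

class JointGradingNet(nn.Module):
    def __init__(self, n_dr=5, n_dme=3):               # class counts assumed
        super().__init__()
        backbone = densenet121(weights=None)
        feat_dim = backbone.classifier.in_features      # 1024 for DenseNet-121
        backbone.classifier = nn.Identity()              # expose pooled features
        self.backbone = backbone
        self.dr_head = nn.Linear(feat_dim, n_dr)
        self.dme_head = nn.Linear(feat_dim, n_dme)

    def forward(self, x):
        f = self.backbone(x)
        return self.dr_head(f), self.dme_head(f)

model = JointGradingNet()
images = torch.randn(2, 3, 224, 224)                    # toy fundus image batch
dr_labels, dme_labels = torch.tensor([0, 3]), torch.tensor([1, 2])
dr_logits, dme_logits = model(images)
loss = nn.CrossEntropyLoss()(dr_logits, dr_labels) + \
       nn.CrossEntropyLoss()(dme_logits, dme_labels)    # summed joint loss
loss.backward()
```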