Colorectal cancer typically originates from the malignant transformation of colonic polyps, making the automatic and accurate segmentation of colonic polyps crucial for clinical diagnosis. Deep learning techniques such as U-Net and Transformer can effectively extract implicit features from medical images, and thus have significant potential in colonic polyp image segmentation. This paper first introduced commonly used evaluation metrics and datasets for colonic polyp segmentation. It then reviewed the application of segmentation models based on U-Net, Transformer, and their hybrid approaches in this domain. Finally, it summarized the improvement methods, advantages, and limitations of polyp segmentation algorithms, discussed the challenges faced by U-Net- and Transformer-based models, and provided an outlook on future research directions in this field.
Manual segmentation of coronary arteries in computed tomography angiography (CTA) images is inefficient, and existing deep learning segmentation models often exhibit low accuracy on coronary artery images. Inspired by the Transformer architecture, this paper proposes a novel segmentation model, the double parallel encoder u-net with transformers (DUNETR). The network employs a dual-encoder design that integrates a Transformer and a convolutional neural network (CNN). The Transformer encoder converts the three-dimensional (3D) coronary artery data into a one-dimensional (1D) sequence, which allows it to capture global multi-scale feature information, while the CNN encoder extracts local features of the 3D coronary arteries. The complementary features extracted by the two encoders are fused by the noise reduction feature fusion (NRFF) module and passed to the decoder. On a public dataset, DUNETR achieved a Dice similarity coefficient of 81.19% and a recall of 80.18%, improvements of 0.49% and 0.46%, respectively, over the next best model in the comparative experiments, and it surpassed the other conventional deep learning methods compared. Integrating a Transformer and a CNN as dual encoders enables the extraction of rich feature information and improves 3D coronary artery segmentation. The model also offers a new approach for segmenting other vascular structures.
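For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how a Dice similarity coefficient and a recall are commonly computed from a predicted and a reference binary mask. The function name `dice_and_recall` and the toy masks are illustrative assumptions and are not taken from the paper; a 3D volume is assumed to have been flattened into a list of 0/1 voxel labels.

```python
def dice_and_recall(pred, truth):
    """Return (Dice, recall) for two equal-length binary masks (0/1 values)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Count true positives, false positives, and false negatives voxel by voxel.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    # Convention assumed here: two empty masks count as a perfect match.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return dice, recall


# Hypothetical flattened masks, for illustration only.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
dice, recall = dice_and_recall(pred, truth)
print(f"Dice = {dice:.4f}, recall = {recall:.4f}")
```

Dice penalizes both false positives and false negatives, whereas recall ignores false positives, which is why the two are usually reported together.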
Accurate detection of cephalometric landmarks is crucial for orthodontic diagnosis and treatment planning. Current landmark detection methods are mainly divided into heatmap-based and regression-based approaches. However, these methods often rely on running multiple models in parallel to improve accuracy, which significantly increases the complexity of training and deployment. This paper presents a novel regression method that simultaneously detects all cephalometric landmarks in high-resolution X-ray images. Using the encoder module of the Transformer, a dual-encoder model was designed to achieve coarse-to-fine localization of cephalometric landmarks. The model consists of three main components: a feature extraction module, a reference encoder module, and a fine-tuning encoder module, which are responsible for feature extraction and fusion of X-ray images, coarse localization of the landmarks, and fine localization of the landmarks, respectively. The model is fully end-to-end differentiable and can learn the interrelationships between cephalometric landmarks. Experimental results showed that the successful detection rate (SDR) of the proposed algorithm was superior to that of existing methods: it attained the highest 2 mm SDR of 89.51% on test set 1 of the ISBI2015 dataset and 90.68% on the test set of the ISBI2023 dataset. The method also reduces memory consumption and broadens the model’s applicability, providing more reliable technical support for orthodontic diagnosis and treatment planning.
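The 2 mm SDR quoted above is the fraction of landmarks whose predicted position lies within 2 mm of the reference annotation. A minimal sketch of that calculation follows. It assumes that coordinates have already been converted from pixels to millimetres; the function name `successful_detection_rate` and the sample coordinates are hypothetical and do not come from the paper.

```python
import math


def successful_detection_rate(pred_mm, truth_mm, radius_mm=2.0):
    """Fraction of landmarks whose radial error is at most radius_mm."""
    if len(pred_mm) != len(truth_mm) or not truth_mm:
        raise ValueError("need two non-empty landmark lists of equal length")
    # A landmark counts as detected when its Euclidean error is within the radius.
    hits = sum(
        1 for p, t in zip(pred_mm, truth_mm) if math.dist(p, t) <= radius_mm
    )
    return hits / len(truth_mm)


# Hypothetical landmark coordinates in millimetres, for illustration only.
truth_mm = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0), (40.0, 40.0)]
pred_mm = [(10.0, 11.0), (20.0, 23.0), (31.5, 30.0), (40.0, 40.0)]
sdr_2mm = successful_detection_rate(pred_mm, truth_mm, radius_mm=2.0)
print(f"2 mm SDR = {sdr_2mm:.2%}")
```

In this toy case the radial errors are 1, 3, 1.5, and 0 mm, so three of the four landmarks fall inside the 2 mm radius.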
Protein lysine β-hydroxybutyrylation (Kbhb) is a newly discovered post-translational modification associated with a wide range of biological processes. Identifying Kbhb sites is critical for a better understanding of its mechanism of action. However, biochemical experimental methods for probing Kbhb sites are costly and time-consuming. Therefore, a feature embedding learning method based on the Transformer encoder was proposed to predict Kbhb sites. In this method, amino acid residues were mapped to numerical vectors according to their amino acid class and position through a learnable feature embedding. The Transformer encoder was then used to extract discriminative features, and a bidirectional long short-term memory network (BiLSTM) was used to capture the correlation between different features. A benchmark dataset was constructed, and a Kbhb site predictor, AutoTF-Kbhb, was implemented based on the proposed method. Experimental results showed that the proposed feature embedding learning method could extract effective features. AutoTF-Kbhb achieved an area under the curve (AUC) of 0.87 and a Matthews correlation coefficient (MCC) of 0.37 on the independent test set, significantly outperforming the other methods compared. AutoTF-Kbhb can therefore be used as an auxiliary means of identifying Kbhb sites.
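Because site-prediction datasets are often imbalanced, the MCC reported above is a useful complement to the AUC. The sketch below computes the MCC from a binary confusion matrix using the standard formula. The function name and the counts are hypothetical examples and are not results from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0.0 when any marginal total is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
print(f"MCC = {mcc:.4f}")
```

The MCC ranges from -1 to 1: a value of 1 indicates perfect prediction, and a value near 0 indicates performance no better than chance.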
To address the spatial inductive bias and the lack of an effective representation of global contextual information in colon polyp image segmentation, which lead to loss of edge details and mis-segmentation of lesion areas, a colon polyp segmentation method combining a Transformer with cross-level phase awareness is proposed. First, starting from the perspective of global feature transformation, the method uses a hierarchical Transformer encoder to extract semantic information and spatial details of lesion areas layer by layer. Second, a phase-aware fusion module (PAFM) is designed to capture cross-level interaction information and effectively aggregate multi-scale contextual information. Third, a position oriented functional module (POF) is designed to integrate global and local feature information, fill semantic gaps, and suppress background noise. Fourth, a residual axis reverse attention module (RA-IA) is used to improve the network’s ability to recognize edge pixels. The method was tested on the public datasets CVC-ClinicDB, Kvasir, CVC-ColonDB, and ETIS, achieving Dice similarity coefficients of 94.04%, 92.04%, 80.78%, and 76.80% and mean intersection over union values of 89.31%, 86.81%, 73.55%, and 69.10%, respectively. The experimental results show that the proposed method can effectively segment colon polyp images, offering a new approach to the diagnosis of colon polyps.
The synergistic effect of drug combinations can overcome acquired resistance to single-drug therapy and has great potential for the treatment of complex diseases such as cancer. In this study, to explore how interactions between different drug molecules affect the efficacy of anticancer drugs, we proposed a Transformer-based deep learning prediction model, SMILESynergy. First, drug molecules were represented as text using the simplified molecular input line entry system (SMILES), and alternative SMILES strings for each molecule were generated through SMILES enumeration for data augmentation. Then, the attention mechanism of the Transformer was used to encode and decode the augmented drug molecules, and finally a multi-layer perceptron (MLP) was connected to obtain the synergy value of the drug combination. Experimental results showed that the model achieved a mean squared error of 51.34 in regression analysis and an accuracy of 0.97 in classification analysis, outperforming the DeepSynergy and MulinputSynergy models. SMILESynergy can therefore assist researchers in rapidly screening optimal drug combinations to improve cancer treatment outcomes.
Medical cross-modal retrieval aims to achieve semantic similarity search between different modalities of medical cases, such as quickly locating relevant ultrasound images through ultrasound reports, or using ultrasound images to retrieve matching reports. However, existing medical cross-modal hash retrieval methods face significant challenges, including semantic and visual differences between modalities and the scalability issues of hash algorithms in handling large-scale data. To address these challenges, this paper proposes a Medical image Semantic Alignment Cross-modal Hashing based on Transformer (MSACH). The algorithm employed a segmented training strategy, combining modality feature extraction and hash function learning, effectively extracting low-dimensional features containing important semantic information. A Transformer encoder was used for cross-modal semantic learning. By introducing manifold similarity constraints, balance constraints, and a linear classification network constraint, the algorithm enhanced the discriminability of the hash codes. Experimental results demonstrated that the MSACH algorithm improved the mean average precision (MAP) by 11.8% and 12.8% on two datasets compared to traditional methods. The algorithm exhibits outstanding performance in enhancing retrieval accuracy and handling large-scale medical data, showing promising potential for practical applications.
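To make the retrieval metric concrete, the sketch below ranks database items by the Hamming distance between binary hash codes and computes the mean average precision (MAP) over a set of queries. This is a generic illustration of hash-based retrieval evaluation and not the MSACH evaluation protocol, which the abstract does not specify (for example, whether MAP is truncated at the top-k results). All function names, codes, and relevance labels are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def average_precision(query_code, db_codes, relevant):
    """AP for one query; relevant[i] marks whether database item i matches it."""
    # Rank by Hamming distance; ties keep database order (sorted is stable).
    order = sorted(
        range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i])
    )
    hits, precisions = 0, []
    for rank, idx in enumerate(order, start=1):
        if relevant[idx]:
            hits += 1
            precisions.append(hits / rank)  # precision at each relevant hit
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(query_codes, db_codes, relevance):
    """Mean of the per-query average precisions."""
    aps = [
        average_precision(q, db_codes, rel)
        for q, rel in zip(query_codes, relevance)
    ]
    return sum(aps) / len(aps)


# Hypothetical 4-bit codes: e.g. report queries against an image database.
db_codes = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1), (0, 1, 1, 1)]
query_codes = [(0, 0, 0, 0), (1, 1, 1, 1)]
relevance = [[True, False, False, True], [False, False, True, True]]
map_score = mean_average_precision(query_codes, db_codes, relevance)
print(f"MAP = {map_score:.4f}")
```

Comparing short binary codes by Hamming distance is what makes hashing attractive for large-scale retrieval, since the distance reduces to counting differing bits.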
Leukemia is a common and dangerous blood disease for which early diagnosis and treatment are very important. At present, the diagnosis of leukemia relies heavily on morphological examination of blood cell images by pathologists, which is tedious and time-consuming. The diagnostic results are also highly subjective, which may lead to misdiagnosis and missed diagnosis. To address these problems, we proposed an improved Vision Transformer model for blood cell recognition. First, a Faster R-CNN network was used to locate and extract individual blood cells from the original images. Then, each single-cell image was split into multiple image patches, which were fed into the encoder layers for feature extraction. Building on the self-attention mechanism of the Transformer, we proposed a sparse attention module that focuses on the discriminative parts of blood cell images and improves the fine-grained feature representation ability of the model. Finally, a contrastive loss function was adopted to further increase the inter-class difference and intra-class consistency of the extracted features. Experimental results showed that the proposed model outperformed the other approaches and improved the accuracy to 91.96% on the Munich single-cell morphological dataset of leukocytes; it is expected to provide a reference for physicians’ clinical diagnosis.
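The patch-splitting step described above can be sketched as follows. This is a simplified illustration that assumes a single-channel image stored as a list of rows and non-overlapping square patches; the paper's actual patch size, channel handling, and embedding layer are not stated in the abstract, and the function name `split_into_patches` is hypothetical.

```python
def split_into_patches(image, patch):
    """Split an H x W image (list of rows) into flattened patch x patch tiles.

    Patches are returned in row-major order, the order in which a Vision
    Transformer would typically read them as a token sequence.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one tile into a single vector of pixel values.
            patches.append(
                [
                    image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)
                ]
            )
    return patches


# Toy 4 x 4 "image" whose pixel values are 0..15, for illustration only.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, patch=2)
for p in patches:
    print(p)
```

Each flattened patch would then be linearly projected and combined with a position embedding before entering the encoder. That stage is omitted here.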
Objective To observe the differences in protein content of three transforming growth factor-beta (TGF-β) isoforms, β1, β2, and β3, and their receptor (I) in hypertrophic scar and normal skin, and to explore their influence on scar formation. Methods Eight cases of hypertrophic scar and the corresponding normal skin were examined to compare the expression and distribution of TGF-β1, β2, β3, and receptor (I) using immunohistochemistry and routine pathological methods. Results Positive signals of TGF-β1, β2, and β3 could all be detected in normal skin, mainly in the cytoplasm and extracellular matrix of epidermal cells; these factors were also found in interfollicular keratinocytes and sweat gland cells, and the positive particles of TGF-β R(I) were mostly located in the membranes of keratinocytes and some fibroblasts. In hypertrophic scar, TGF-β1 and β3 could be detected in epidermal basal cells, whereas TGF-β2 was distributed chiefly in epidermal cells and some fibroblasts; the protein contents of TGF-β1 and β3 were significantly lower than those in normal skin, while the TGF-β2 content did not differ significantly from that in normal skin. The distribution and content of TGF-β R(I) showed no obvious difference between the two kinds of tissue. Conclusion The different expression and distribution of TGF-β1, β2, and β3 between hypertrophic scar and normal skin may be associated with the mechanism controlling scar formation, in which the roles of TGF-β R(I) and downstream signaling factors need further study.
Objective To observe the expression of heat shock protein 47 (HSP47) in the pre-retinal membranes of proliferative vitreoretinopathy (PVR) and the influence of transforming growth factor-β2 (TGF-β2) on the expression of HSP47 in retinal pigment epithelial (RPE) cells.
Methods Pre-retinal membranes were collected and examined by hematoxylin-eosin, Masson, and immunohistochemical staining. Cultured ARPE-19 cells were treated with TGF-β2 at a series of concentrations (0, 1, 5, 10 ng/ml) and for a series of durations (0, 12, 24, 48 hours). The mRNA and protein expression of HSP47 and Col-Ⅰ was then measured by fluorescence quantitative reverse transcription polymerase chain reaction and Western blot, respectively.
Results Many epithelial cells containing pigment granules and abundant accumulated collagen were observed in the pre-retinal membranes of PVR, and positive HSP47 expression was observed in the cytoplasm and stroma of most of the epithelioid cells. Compared with the 0 ng/ml group, HSP47 mRNA expression in ARPE-19 cells was up-regulated 1.32-, 2.35-, and 1.85-fold, with significant differences in all groups (F=27.21, P<0.05), and HSP47 protein expression was up-regulated 2.33-, 2.89-, and 2.60-fold, with significant differences in all groups (F=39.78, P<0.05). Col-Ⅰ mRNA expression was up-regulated 1.29-, 1.52-, and 2.11-fold, with significant differences in all groups (F=23.45, P<0.05), and Col-Ⅰ protein expression was up-regulated 1.18-, 1.49-, and 2.11-fold, with significant differences in all groups (F=29.10, P<0.05). Compared with the 0 hour group, HSP47 mRNA expression was up-regulated 1.56-, 1.84-, and 2.86-fold in ARPE-19 cells stimulated with 5 ng/ml TGF-β2 for 12, 24, and 48 hours, respectively, and the differences were all significant (F=31.56, P<0.05); HSP47 protein expression was up-regulated 2.08-, 2.37-, and 2.80-fold, and the differences were all significant (F=49.18, P<0.05). Col-Ⅰ mRNA expression was up-regulated 1.57-, 1.86-, and 2.78-fold, and the differences were all significant (F=54.43, P<0.05); Col-Ⅰ protein expression was up-regulated 1.38-, 1.59-, and 2.16-fold, and the differences were all significant (F=42.52, P<0.05).
Conclusion TGF-β2 may play a role in the pathologic process of PVR by promoting the expression of HSP47 and thereby increasing the synthesis and accumulation of Col-Ⅰ.