Objective: To characterize the expression patterns of tertiary lymphoid structure (TLS)-related genes in non-small cell lung cancer (NSCLC), to construct a machine learning-based prognostic prediction model for NSCLC patients, and to evaluate the correlation between the TLS risk score and both tumor immune microenvironment characteristics and potential immunotherapy benefit.

Methods: The training cohort was derived from the NSCLC dataset of The Cancer Genome Atlas (TCGA) database and included 994 tumor samples with survival time >0 days (plus 110 normal tissue samples for differential expression analysis). External validation cohorts were obtained from the Gene Expression Omnibus (GEO) database: GSE30219 (n=289) and GSE72094 (n=398). Consensus clustering based on the expression levels of TLS-related genes was performed to identify TLS-associated molecular subtypes. Weighted gene co-expression network analysis (WGCNA) was applied to screen co-expression modules significantly correlated with the TLS subtypes. To construct the TLS prognostic model, 101 combinations of 10 machine learning algorithms were used for model training and selection. The selected TLS prognostic model was systematically evaluated for predictive performance in both the training cohort and the external validation cohorts. Associations between the model and clinical characteristics, as well as immune microenvironment indicators, were also analyzed.

Results: Consensus clustering identified three TLS molecular subtypes in the TCGA-NSCLC cohort (n=994): C1 (n=441), C2 (n=263), and C3 (n=290). These subtypes differed in overall survival, clinical characteristics, and immune infiltration levels. With a soft-thresholding power of β=9, WGCNA identified seven co-expression modules, of which the blue module (r=0.32) and the yellow module (r=0.44) showed the highest correlations with the TLS subtypes. From the 758 genes in these two modules, univariate Cox regression analysis selected 32 prognosis-related genes. The optimal TLS prognostic model was selected from the 101 algorithm combinations and validated in the external cohorts. The model stratified patients into high-risk and low-risk groups and showed stable prognostic discrimination in the TCGA, GSE30219, and GSE72094 datasets. Immune infiltration analysis revealed significantly higher infiltration levels of multiple immune cell types in the low-risk group. Drug sensitivity analysis indicated that the low-risk group was more sensitive to cisplatin, docetaxel, gemcitabine, and paclitaxel. In addition, pharmacological screening identified four potential candidate drugs for high-risk patients in the Cancer Therapeutics Response Portal (CTRP) database (BI-2536, GSK461364, Paclitaxel, SB-743921) and three in the Profiling Relative Inhibition Simultaneously in Mixtures (PRISM) database (Epothilone-b, Mitoxantrone, Volasertib).

Conclusion: The TLS risk score is an independent prognostic factor that effectively predicts outcomes in NSCLC patients and represents a potential biomarker for NSCLC.
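
To illustrate how a risk score can divide patients into high-risk and low-risk groups, the sketch below computes a linear score from gene expression values and splits the cohort at its median. This is only an illustrative sketch: the abstract does not report the form of the final model, its genes, its weights, or the cutoff used, so the linear form, the median cutoff, the gene names (`GENE_A`, `GENE_B`, `GENE_C`), the coefficients, and the patient values are all hypothetical assumptions.

```python
from statistics import median

# Hypothetical gene names and coefficients, for illustration only; the
# abstract does not report the model's genes or weights.
COEFFICIENTS = {"GENE_A": 0.40, "GENE_B": -0.25, "GENE_C": 0.10}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Linear risk score: sum of coefficient * expression over the model genes."""
    return sum(weight * expression[gene] for gene, weight in coefficients.items())


def stratify(scores, cutoff=None):
    """Label each patient 'high' or 'low' risk relative to a cutoff.

    Assumption: when no cutoff is given, the cohort median is used.
    """
    if cutoff is None:
        cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Made-up expression values for four hypothetical patients.
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.5},
    "P2": {"GENE_A": 0.5, "GENE_B": 3.0, "GENE_C": 1.0},
    "P3": {"GENE_A": 1.5, "GENE_B": 0.5, "GENE_C": 2.0},
    "P4": {"GENE_A": 0.2, "GENE_B": 2.0, "GENE_C": 0.1},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = stratify(scores)
print(groups)
```

In practice the cutoff would normally be fixed on the training cohort and then applied unchanged to each validation cohort, or recomputed per cohort; the abstract does not say which approach was taken.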