Screening of immune related gene and survival prediction of lung adenocarcinoma patients based on LightGBM model

Journal of Biomedical Engineering

Authors:

MENG Xiangfu, TIAN Youfa, ZHANG Xiaoyan

1. School of Electronics and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125000, P. R. China;

Corresponding author:

MENG Xiangfu, Email: marxi@126.com

Keywords:

Lung adenocarcinoma; Bioinformatics; Ensemble learning; Immune related gene; LightGBM

DOI:

10.7507/1001-5515.202305038

Abstract:

Lung cancer is one of the malignant tumors that pose the greatest threat to human health. Studies have shown that some genes play an important regulatory role in its occurrence and development. In this paper, we propose a LightGBM ensemble learning method that builds a prognostic model from immune-related gene (IRG) expression profile data and clinical data to predict the survival rate of lung adenocarcinoma patients. First, the Limma package was used for differential gene expression analysis. Cox proportional hazards (CoxPH) regression analysis was then used to screen for prognosis-related IRGs, and the XGBoost algorithm was used to score the importance of the IRG features. Finally, LASSO regression analysis was used to select the IRGs for constructing the prognostic model, which yielded 17 IRG features. LightGBM was trained on these screened IRGs, and the K-means algorithm was used to divide the patients into three groups. For predicting the survival of the three groups of patients, the model output achieved areas under the receiver operating characteristic (ROC) curve (AUC) of 0.96, 0.98 and 0.96, respectively. The experimental results show that the proposed model can divide patients with lung adenocarcinoma into three groups by 5-year survival rate: above 65% (group 1), between 30% and 65% (group 2) and below 30% (group 3). The model can also accurately predict the 5-year survival rate of these patients.
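The abstract reports one AUC for each of the three patient groups but does not say how the three figures were computed. A common way to obtain per-group AUCs is a one-vs-rest scheme. Each group is treated in turn as the positive class against the other two, and AUC is calculated as the Mann-Whitney probability that a positive patient is scored above a negative one.

The sketch below illustrates that idea under stated assumptions; it is not the authors' code. The following are all hypothetical:

* the function names;
* the layout of the probability matrix;
* the handling of rates of exactly 65% or 30%;
* the six-patient toy data.

AUC measures how well the scores rank patients, which is a different quantity from classification accuracy.

```python
from typing import Dict, List, Sequence

GROUPS = (1, 2, 3)


def group_for_survival_rate(rate: float) -> int:
    """Name a patient group by its 5-year survival rate (a fraction in [0, 1]).

    Cut-offs follow the abstract: >65% -> group 1, 30%-65% -> group 2,
    <30% -> group 3. Sending exactly 0.65 to group 2 and exactly 0.30 to
    group 3 is an assumption; the abstract does not specify boundaries.
    """
    if rate > 0.65:
        return 1
    if rate > 0.30:
        return 2
    return 3


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Binary ROC AUC as the Mann-Whitney probability P(score_pos > score_neg).

    Tied positive/negative pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(
    true_groups: Sequence[int], group_probs: Sequence[Sequence[float]]
) -> Dict[int, float]:
    """One AUC per group: that group is positive, the other two are negative.

    group_probs[i][k] is a (hypothetical) model's probability that patient i
    belongs to GROUPS[k].
    """
    aucs: Dict[int, float] = {}
    for k, g in enumerate(GROUPS):
        labels = [1 if t == g else 0 for t in true_groups]
        scores = [probs[k] for probs in group_probs]
        aucs[g] = roc_auc(labels, scores)
    return aucs


# Made-up toy data for six patients; these are NOT values from the paper.
TOY_GROUPS: List[int] = [1, 1, 2, 2, 3, 3]
TOY_PROBS: List[List[float]] = [
    [0.80, 0.15, 0.05],
    [0.60, 0.30, 0.10],
    [0.20, 0.70, 0.10],
    [0.65, 0.30, 0.05],  # a group-2 patient the toy model scores as group 1
    [0.10, 0.20, 0.70],
    [0.30, 0.30, 0.40],
]

if __name__ == "__main__":
    print(one_vs_rest_auc(TOY_GROUPS, TOY_PROBS))  # {1: 0.875, 2: 0.875, 3: 1.0}
```

On the toy data, the one misranked group-2 patient lowers the AUC for groups 1 and 2 to 0.875, while group 3 stays at 1.0. An equivalent library route is scikit-learn's `roc_auc_score` with `multi_class="ovr"`, whose averaged result can be compared with these per-group values.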

Citation: MENG Xiangfu, TIAN Youfa, ZHANG Xiaoyan. Screening of immune related gene and survival prediction of lung adenocarcinoma patients based on LightGBM model. Journal of Biomedical Engineering, 2024, 41(1): 70-79. doi: 10.7507/1001-5515.202305038
