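Construction of a machine learning classification prediction model for vancomycin trough blood concentrations based on the MIMIC-Ⅳ database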
TITLE: | Construction of a machine learning classification prediction model for vancomycin trough blood concentrations based on the MIMIC-Ⅳ database |
ABSTRACT: | OBJECTIVE To construct a classification prediction model for vancomycin trough blood concentrations and to optimize precision dosing strategies for the drug. METHODS Patient records meeting the inclusion criteria were extracted from the Medical Information Mart for Intensive Care (MIMIC-Ⅳ) database. After data cleaning and preprocessing, a final cohort of 9,902 patients was analyzed. Features were selected by combining correlation analysis with the Boruta feature selection algorithm. Vancomycin trough blood concentrations were discretized into three categories according to the clinical therapeutic window: low (<10 μg/mL), intermediate (10-20 μg/mL), and high (≥20 μg/mL). Six machine learning algorithms were used to construct classification models: tabular prior-data fitted network (TabPFN), logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), support vector machine (SVM), and K-nearest neighbors (KNN). Model performance was evaluated by 10-fold cross-validation (10-CV); the primary metrics were accuracy, balanced accuracy, macro-averaged precision, macro-averaged recall, macro-averaged F1, and the one-vs-rest area under the multiclass receiver operating characteristic curve (OvR-AUC). Shapley Additive Explanations (SHAP) was used to analyze the direction and magnitude of each feature's effect on the model's predictions. RESULTS The RF and TabPFN models performed best (accuracy of 0.7414 and 0.7377, and OvR-AUC of 0.9070 and 0.8958, respectively). The XGBoost model showed moderate performance, while the LR, SVM, and KNN models performed relatively poorly. Confusion matrix heatmaps showed that both RF and TabPFN predicted the high-concentration category more accurately but performed slightly worse on the low and intermediate concentration categories. Bootstrap resampling combined with 10-CV showed that the RF model was stable across all evaluation metrics (accuracy: 0.7414; balanced accuracy: 0.7403; macro-averaged precision: 0.7321; macro-averaged recall: 0.7360; macro-averaged F1: 0.7360; OvR-AUC: 0.9070), indicating good classification performance and discriminative ability. SHAP analysis showed that key features, including creatinine, urea nitrogen, and the daily cumulative dose and administration frequency of vancomycin, had a significant impact on the prediction results. CONCLUSIONS The RF and TabPFN models show certain advantages in the classification prediction of vancomycin trough blood concentrations; however, their performance on the low and intermediate concentration categories still has room for improvement. |
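The discretization rule and the macro-averaged metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `trough_class` and `macro_metrics` are hypothetical, the sample concentrations and predictions are made up for illustration, and the sketch assumes that the intermediate class covers 10 to just under 20 μg/mL (the abstract gives "10-20" and "≥20"). Macro F1 is taken here as the unweighted mean of the per-class F1 scores, which is one common definition.

```python
from collections import Counter

# Class names follow the three categories in the abstract.
LABELS = ("low", "intermediate", "high")


def trough_class(conc_ug_per_ml: float) -> str:
    """Map a vancomycin trough concentration (ug/mL) to a class label.

    Assumption: the intermediate class is 10 <= c < 20, so that a value
    of exactly 20 falls in the high class (>= 20), as the abstract states.
    """
    if conc_ug_per_ml < 10:
        return "low"
    if conc_ug_per_ml < 20:
        return "intermediate"
    return "high"


def macro_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall and F1."""
    true_n = Counter(y_true)   # number of true samples per class
    pred_n = Counter(y_pred)   # number of predicted samples per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precisions, recalls, f1s = [], [], []
    for lab in labels:
        prec = tp[lab] / pred_n[lab] if pred_n[lab] else 0.0
        rec = tp[lab] / true_n[lab] if true_n[lab] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,  # mean of per-class F1 scores
    }


# Made-up measured trough values (ug/mL); NOT data from the study.
measured = [6.2, 8.9, 12.5, 15.0, 19.9, 20.0, 27.3, 31.1]
y_true = [trough_class(c) for c in measured]

# Made-up model predictions, for illustration only.
y_pred = ["low", "intermediate", "intermediate", "intermediate",
          "high", "high", "high", "high"]

metrics = macro_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Macro averaging weights the three classes equally regardless of how many patients fall in each, which is why it is informative when a model does well on one category (here, high concentrations) and worse on the others.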
JOURNAL ISSUE: | 2025, Vol. 36, No. 19 |
AUTHORS: | LIN Xiaohui (林小惠), WANG Yujia (汪余嘉), ZHANG Lingling (张玲玲), XU Shuanglin (许双临) |
KEYWORDS: | machine learning; vancomycin; blood concentration; MIMIC-Ⅳ database; classification prediction |