Gestational diabetes prediction in pregnancy: A machine learning and data preprocessing approach

Block diagram of the presented GDM classification into diabetes (positive) or non-diabetes (negative)

Abstract

Gestational diabetes mellitus (GDM) is characterized by glucose intolerance during pregnancy, resulting in elevated blood glucose levels and both short-term and long-term health burdens. Early screening would therefore help reduce the complications associated with GDM and adverse pregnancy outcomes. Machine learning (ML) algorithms are a promising alternative to manual early-stage GDM assessment. In this article, we propose an ML pipeline that employs five distinct classifiers: decision trees (DT), linear discriminant analysis (LDA), logistic regression (LR), XGBoost (XGB), and Gaussian naive Bayes (GNB). Our framework incorporates the essential preprocessing stages, namely filling in missing values, selecting important features, tuning hyperparameters, and applying stratified K-fold cross-validation, to improve the model's robustness and precision. Of the three missing-data imputation techniques we compare, K-nearest neighbors (KNN) imputation performs best in the proposed framework. In addition, a feature selection procedure retains eight of the fifteen features. Finally, when the XGB classifier is combined with the presented preprocessing, performance improves by a significant margin, reaching the highest accuracy of 0.9719 and an area under the ROC curve of 0.9982. This promising result makes our pipeline useful for early-stage GDM prediction.
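The preprocessing stages named in the abstract can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' implementation: the synthetic data, the ANOVA F-score selector, the neighbor count, the fold count, and the helper names `build_pipeline`, `make_toy_data`, and `evaluate` are all assumptions, hyperparameter tuning is left out, and XGB is omitted only to keep the example dependent on scikit-learn alone.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


def build_pipeline(classifier, n_features=8, n_neighbors=5):
    """KNN imputation, then keep n_features features, then classify.

    The selector (ANOVA F-score) and n_neighbors=5 are assumptions;
    the abstract does not say which settings the paper uses.
    """
    return Pipeline([
        ("impute", KNNImputer(n_neighbors=n_neighbors)),
        ("select", SelectKBest(score_func=f_classif, k=n_features)),
        ("clf", classifier),
    ])


def make_toy_data(n_samples=300, missing_rate=0.05, seed=0):
    """Synthetic stand-in for a 15-feature GDM dataset (hypothetical data)."""
    X, y = make_classification(
        n_samples=n_samples, n_features=15, n_informative=8, random_state=seed
    )
    rng = np.random.default_rng(seed)
    X[rng.random(X.shape) < missing_rate] = np.nan  # inject missing values
    return X, y


def evaluate(X, y, n_splits=5, seed=0):
    """Mean stratified K-fold accuracy for four of the five classifiers."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    classifiers = {
        "DT": DecisionTreeClassifier(random_state=seed),
        "LDA": LinearDiscriminantAnalysis(),
        "LR": LogisticRegression(max_iter=1000),
        "GNB": GaussianNB(),
    }
    return {
        name: cross_val_score(
            build_pipeline(clf), X, y, cv=cv, scoring="accuracy"
        ).mean()
        for name, clf in classifiers.items()
    }


X, y = make_toy_data()
scores = evaluate(X, y)
for name, acc in scores.items():
    print(f"{name}: mean accuracy {acc:.3f}")
```

Placing imputation and feature selection inside the `Pipeline` means both are refit on each training fold, so no information from the held-out fold leaks into preprocessing. The accuracies printed here come from synthetic data and are not comparable to the figures reported in the abstract.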

Publication
2023 26th International Conference on Computer and Information Technology (ICCIT), Bangladesh, 13-15 December 2023, pp. 1-6
