Abstract:
Diabetes is a major non-communicable disease that carries many associated health risks and is increasing rapidly in low- and middle-income countries such as Bangladesh. Class imbalance in datasets is a serious issue: it can bias diabetes predictions towards the majority class, reducing the reliability of machine learning models. Given the risks associated with diabetes, a decrease in recall can have life-threatening consequences. To tackle this problem, cost-sensitive learning and the synthetic minority oversampling technique (SMOTE) have been applied to the PIMA Indian dataset. The models have then been tested on the PIMA test set as well as on a dataset collected from Kurmitola General Hospital (KGH), Dhaka, Bangladesh. Our results demonstrate that the proposed approach improves the reliability of previous ML models in predicting diabetes among the Bangladeshi female population.
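For readers unfamiliar with the two techniques named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a SMOTE-style step that interpolates between a minority sample and one of its k nearest minority neighbours, and a simple inverse-frequency class weighting as one possible form of cost-sensitive learning. The function names, the choice of k, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (n_classes * count).
    This is an assumed weighting scheme, chosen only for illustration."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, n_classes = len(labels), len(counts)
    return {c: n / (n_classes * cnt) for c, cnt in counts.items()}


# Hypothetical toy data: four minority-class samples with two features each.
toy_minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [2.5, 2.5]]
toy_synthetic = smote_oversample(toy_minority, n_new=5)
toy_weights = balanced_class_weights([0, 0, 0, 1])
```

Because every synthetic point lies on a segment between two existing minority samples, oversampling adds no values outside the observed minority range. The class weights would typically be passed to a classifier's loss so that misclassifying the rarer class costs more, which is the property that protects recall.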
Citation:
B. Pranto, S. M. Mehnaz, S. Momen and S. M. Huq, "Prediction of diabetes using cost sensitive learning and oversampling techniques on Bangladeshi and Indian female patients," 2020 5th International Conference on Information Technology Research (ICITR), 2020, pp. 1-6, doi: 10.1109/ICITR51448.2020.9310892.