Abstract:
Type 2 diabetes is a growing and potentially fatal disease worldwide. Knowledge of its significant risk factors is useful for keeping the disease under control. This study identified eight significant risk factors for type 2 diabetes in a data set from the UCI Machine Learning Repository using point-biserial correlation. To develop an accurate predictive model for the presence of diabetes based on these risk factors, a binary logistic regression approach was applied. Because the performance of a predictive model is overestimated when it is assessed on the same sample of subjects used to construct it, five-fold cross-validation was applied to validate the predictive ability of the developed model. The fitted model shows low optimism (0.0108) and a high c-statistic (0.8512), indicating acceptable discriminative power for type 2 diabetes. Glucose level, BMI, and diabetes pedigree have a significant influence on the classification of a patient as having type 2 diabetes.
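The workflow summarized above can be illustrated with a minimal sketch. It is not the study's code and does not use the study's data. It assumes a small synthetic sample with one informative feature (labelled `glucose` purely for illustration) and one noise feature, fits the logistic model by plain gradient descent, and takes optimism to be the apparent c-statistic minus the mean c-statistic over the five held-out folds. All function names are hypothetical.

```python
import math
import random

def point_biserial(x, y):
    """Point-biserial correlation between a continuous x and a binary (0/1) y."""
    n = len(x)
    g1 = [xi for xi, yi in zip(x, y) if yi == 1]
    g0 = [xi for xi, yi in zip(x, y) if yi == 0]
    mean = sum(x) / n
    sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)  # population SD
    m1, m0 = sum(g1) / len(g1), sum(g0) / len(g0)
    return (m1 - m0) / sd * math.sqrt(len(g1) * len(g0) / n ** 2)

def c_statistic(scores, labels):
    """Share of (case, non-case) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def predict(w, row):
    """Predicted probability from weights w (intercept first)."""
    z = w[0] + sum(wj * v for wj, v in zip(w[1:], row))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Binary logistic regression fitted by batch gradient descent (illustrative)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, yi in zip(X, y):
            err = predict(w, row) - yi
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def cv_optimism(X, y, k=5, seed=0):
    """Return (apparent, cross-validated, optimism) c-statistics for k-fold CV."""
    w = fit_logistic(X, y)
    apparent = c_statistic([predict(w, r) for r in X], y)
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    held_out = []
    for fold in (idx[i::k] for i in range(k)):
        test = set(fold)
        train = [i for i in idx if i not in test]
        w_k = fit_logistic([X[i] for i in train], [y[i] for i in train])
        held_out.append(c_statistic([predict(w_k, X[i]) for i in fold],
                                    [y[i] for i in fold]))
    validated = sum(held_out) / k
    return apparent, validated, apparent - validated

# Synthetic, standardized data (an assumption; not the UCI data set).
rng = random.Random(42)
X, y = [], []
for _ in range(200):
    glucose, noise = rng.gauss(0, 1), rng.gauss(0, 1)
    p = 1.0 / (1.0 + math.exp(-1.5 * glucose))
    X.append([glucose, noise])
    y.append(1 if rng.random() < p else 0)

r_glucose = point_biserial([row[0] for row in X], y)
apparent, validated, optimism = cv_optimism(X, y)
print(f"r_pb(glucose) = {r_glucose:.3f}")
print(f"apparent c = {apparent:.4f}, 5-fold c = {validated:.4f}, optimism = {optimism:.4f}")
```

An optimism close to zero means the c-statistic measured on the fitting sample is close to the value estimated on held-out folds, which is the sense in which a small optimism supports the reported discrimination.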