Institutional-Repository, University of Moratuwa.  

A Statistical comparison between genetic algorithm and logistic regression for a clinical study

dc.contributor.advisor Daundasekera WB
dc.contributor.advisor Edirisinghe PM
dc.contributor.author Aththanayake AMSMCM
dc.date.accessioned 2020
dc.date.available 2020
dc.date.issued 2020
dc.identifier.uri http://dl.lib.uom.lk/handle/123/16903
dc.description.abstract Identifying the combination of variables that cause infections or infectious diseases is one of the main tasks of clinical models in medicine. Forward and backward variable selection techniques in Logistic Regression (LR) are widely used in such situations; these techniques assume a linear relationship between the independent variables and the log-odds of the outcome, and the absence of multicollinearity. More often than not, the observed data do not satisfy these assumptions, and LR may then not be applicable. The Genetic Algorithm (GA), which does not depend on such pre-defined assumptions, has proven to be better under these circumstances. By evaluating the prediction rates of the LR and GA techniques, this study aimed to apply binary LR and GA to reduce the number of variables in a sample of clinical data, and to compare their goodness-of-fit statistics in order to identify the better variable-reduction method. Three models were built using 40 independent variables (3 non-categorical and 37 categorical) on a sample of 497 observations collected from children under 5 years of age with suspected respiratory syncytial virus (RSV) infection who were admitted to Kegalle Base Hospital between May 2016 and July 2018. The binary dependent variable indicates whether a suspected child tested RSV positive or negative. The two GAs used the log-likelihood and the Area Under the Curve (AUC), respectively, as their fitness functions. The goodness of fit of the three models was compared using the following statistics: -2 log-likelihood, pseudo R-square values, percentage correctly classified, specificity, and sensitivity. The results showed that the log-likelihood GA produced better goodness-of-fit measures than the other two methods. However, LR reduced the 40 variables to 8 with fewer iterations, whereas the two GAs each reduced them to 17 variables for predicting RSV infection status. This study suggests that LR has better predictive power with the most strongly associated combination of variables.
However, in the context of this study, GA performed better when the predefined assumptions were not satisfied, and in solving high-dimensional classification problems in a large or complex search space. en_US
dc.language.iso en en_US
dc.subject MATHEMATICS – Dissertations en_US
dc.subject BUSINESS STATISTICS – Dissertations en_US
dc.subject CLINICAL DATA en_US
dc.subject FITNESS FUNCTION en_US
dc.subject GENETIC ALGORITHM en_US
dc.subject LOGISTIC REGRESSION en_US
dc.subject RESPIRATORY SYNCYTIAL VIRUS en_US
dc.title A Statistical comparison between genetic algorithm and logistic regression for a clinical study en_US
dc.type Thesis-Full-text en_US
dc.identifier.faculty Engineering en_US
dc.identifier.degree MSc in Business Statistics en_US
dc.identifier.department Department of Mathematics en_US
dc.date.accept 2020
dc.identifier.accno TH4487 en_US
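The goodness-of-fit statistics named in the abstract can be computed from the observed 0/1 outcomes and a model's predicted probabilities. The following is a minimal sketch, not the thesis's own code: the abstract does not say which pseudo R-square values were reported, so McFadden, Cox & Snell, and Nagelkerke are shown as common choices, and the 0.5 classification cutoff is an assumption. The names `goodness_of_fit` and `FitSummary` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class FitSummary:
    minus_2ll: float        # -2 log-likelihood of the fitted model
    mcfadden_r2: float      # 1 - LL_model / LL_null
    cox_snell_r2: float     # 1 - exp(2 (LL_null - LL_model) / n)
    nagelkerke_r2: float    # Cox & Snell rescaled so its maximum is 1
    percent_correct: float  # percentage correctly classified at the cutoff
    sensitivity: float      # TP / (TP + FN)
    specificity: float      # TN / (TN + FP)


def goodness_of_fit(y, p, cutoff=0.5):
    """Summarise a binary model from labels y (0/1) and predicted P(y = 1).

    Hypothetical helper; the 0.5 cutoff is an assumed default.
    """
    if not y or len(y) != len(p):
        raise ValueError("y and p must be non-empty and of equal length")
    n = len(y)
    n1 = sum(y)
    n0 = n - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("both outcome classes must be present")

    eps = 1e-12  # guard against log(0) when a probability is exactly 0 or 1
    ll_model = sum(
        yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
        for yi, pi in zip(y, p)
    )
    # The intercept-only (null) model predicts the sample proportion for everyone.
    p_bar = n1 / n
    ll_null = n1 * math.log(p_bar) + n0 * math.log(1 - p_bar)

    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    nagelkerke = cox_snell / (1 - math.exp(2 * ll_null / n))

    # Confusion-matrix counts at the chosen cutoff.
    tp = fn = tn = fp = 0
    for yi, pi in zip(y, p):
        predicted = 1 if pi >= cutoff else 0
        if yi == 1 and predicted == 1:
            tp += 1
        elif yi == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1

    return FitSummary(
        minus_2ll=-2 * ll_model,
        mcfadden_r2=1 - ll_model / ll_null,
        cox_snell_r2=cox_snell,
        nagelkerke_r2=nagelkerke,
        percent_correct=100.0 * (tp + tn) / n,
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
    )


if __name__ == "__main__":
    # Toy data for illustration only; not from the RSV sample.
    print(goodness_of_fit([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.2]))
```

A lower -2 log-likelihood and higher pseudo R-square, sensitivity, and specificity indicate a better fit, which is how the three models (LR, log-likelihood GA, AUC GA) could be ranked on the same data.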

