A Statistical comparison between genetic algorithm and logistic regression for a clinical study

Aththanayake AMSMCM

dc.contributor.advisor	Daundasekera WB
dc.contributor.advisor	Edirisinghe PM
dc.contributor.author	Aththanayake AMSMCM
dc.date.accessioned	2020
dc.date.available	2020
dc.date.issued	2020
dc.identifier.uri	http://dl.lib.uom.lk/handle/123/16903
dc.description.abstract	Identifying a combination of variables causing infections or infectious diseases is one of the main tasks of clinical models in medicine. Forward and backward variable selection techniques in Logistic Regression (LR) are widely used in such situations; LR assumes linearity of the independent variables and the absence of multicollinearity. Often, the observed data do not satisfy these assumptions, and LR is then not applicable. The Genetic Algorithm (GA), which does not depend on predefined assumptions, has proven to be better under such circumstances. The objective of this study was to apply binary LR and GA to reduce the number of variables in a sample of clinical data and, by evaluating the prediction rates of the two techniques, to compare their goodness-of-fit statistics and identify the better variable reduction method. Three models were built using 40 independent variables (3 non-categorical and 37 categorical) on a sample of 497 observations collected from children under 5 years of age with suspected respiratory syncytial virus (RSV) infection who were admitted to the Kegalle Base Hospital from May 2016 to July 2018. The binary dependent variable indicates whether the child is RSV positive or negative. Log-likelihood and Area Under Curve (AUC) serve as the fitness functions of the two GAs. The goodness of fit of the three models was compared using the following statistical measures: -2 log-likelihood, pseudo R-square values, correctly classified percentage, specificity, and sensitivity. The results showed that the log-likelihood GA produced better goodness-of-fit measures than the other two methods. However, LR reduced the 40 variables to 8 with a lower number of iterations, while the two GAs reduced them to 17 variables to predict the status of RSV infection. This study suggests that LR has better predictive power with the most associated combination of variables. 
However, GA was indicated to be better for analysis when the predefined assumptions were not satisfied and for solving high-dimensional classification problems in a large or complex search space, as discussed in the background of the study.	en_US
dc.language.iso	en	en_US
dc.subject	MATHEMATICS- Dissertations	en_US
dc.subject	BUSINESS STATISTICS – Dissertations	en_US
dc.subject	CLINICAL DATA	en_US
dc.subject	FITNESS FUNCTION	en_US
dc.subject	GENETIC ALGORITHM	en_US
dc.subject	LOGISTIC REGRESSION	en_US
dc.subject	RESPIRATORY SYNCYTIAL VIRUS	en_US
dc.title	A Statistical comparison between genetic algorithm and logistic regression for a clinical study	en_US
dc.type	Thesis-Full-text	en_US
dc.identifier.faculty	Engineering	en_US
dc.identifier.degree	MSc in Business Statistics	en_US
dc.identifier.department	Department of Mathematics	en_US
dc.date.accept	2020
dc.identifier.accno	TH4487	en_US
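
The abstract compares models on -2 log-likelihood, correctly classified percentage, sensitivity, and specificity. The block below is a minimal sketch of how those four measures could be computed from observed binary outcomes and predicted probabilities. It is not taken from the thesis: the function name `goodness_of_fit`, the 0.5 classification threshold, and the toy data are all assumptions made for illustration, and pseudo R-square is left out because the record does not say which variant was used.

```python
import math


def goodness_of_fit(y_true, p_hat, threshold=0.5):
    """Compute fit measures for a binary classifier.

    y_true    : observed outcomes coded 0/1 (e.g. 1 = RSV positive)
    p_hat     : predicted probabilities of the positive class
    threshold : cut-off for classifying as positive (0.5 is an assumption)
    """
    eps = 1e-12  # keeps log() away from zero
    log_lik = 0.0
    tp = tn = fp = fn = 0
    for y, p in zip(y_true, p_hat):
        p = min(max(p, eps), 1.0 - eps)
        # Bernoulli log-likelihood contribution of one observation
        log_lik += y * math.log(p) + (1 - y) * math.log(1.0 - p)
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    n = tp + tn + fp + fn
    return {
        "neg2_log_lik": -2.0 * log_lik,
        "correct_pct": 100.0 * (tp + tn) / n,
        # sensitivity: share of true positives among actual positives
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # specificity: share of true negatives among actual negatives
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy data, invented for illustration only (not from the study)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
p_hat = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.3, 0.7]
stats = goodness_of_fit(y_true, p_hat)
print(stats)
```

A lower -2 log-likelihood and higher values of the other three measures indicate a better fit, which is the sense in which the abstract ranks the three models.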