Institutional-Repository, University of Moratuwa.  

Usage of topic modeling method for high dimensional gene expression data analysis

dc.contributor.author Senadheera, SPBM
dc.contributor.author Weerasinghe, AR
dc.contributor.editor Ganegoda, GU
dc.contributor.editor Mahadewa, KT
dc.date.accessioned 2022-11-10T03:10:32Z
dc.date.available 2022-11-10T03:10:32Z
dc.date.issued 2021-12
dc.identifier.citation S. P. B. M. Senadheera and A. R. Weerasinghe, "Usage of Topic Modeling Method for High Dimensional Gene Expression Data Analysis," 2021 6th International Conference on Information Technology Research (ICITR), 2021, pp. 1-6, doi: 10.1109/ICITR54349.2021.9657380. en_US
dc.identifier.uri http://dl.lib.uom.lk/handle/123/19455
dc.description.abstract Gene expression data analysis is a major area in biological system interpretation. Since gene expression data have large numbers of variables, high dimensional clustering methods are required for analysis. The objectives of this study were to understand the effectiveness of different clustering methods in gene expression data analysis based on biological relatedness, and to study the advantages and disadvantages of different clustering strategies in gene expression analysis. The data were obtained from the GSE19830 dataset and the brain tumor data (TCGA project). To test hard clustering, hierarchical clustering and fuzzy clustering, the K-means algorithm, HClust and topic modeling were used respectively. Prior knowledge about the dataset was required to define the number of clusters (K). Initially, the GSE19830 (Brain, Lung, Liver tissue mixture) dataset was used for developing the clusters. All models clustered the observations similarly to the physical tags in the dataset. Secondly, clustering methods were developed with the brain tumor dataset consisting of 202 samples (four specified physically categorized tumors). According to hierarchical clustering and topic modeling, when analyzing similar tissues, gene expression tumor subtypes (clusters) were not aligned with physical categorization. Finally, 81 cancer genes were filtered and used to generate a topic model. In order to understand the biological relevance of the final model, Reactome and PCViz tools were used. Reactome results supported the topics developed from topic modeling. According to the results, in high dimensional data analysis, topic modeling was found to be a promising approach for gene expression based clustering, while K-means was found to be inappropriate for gene clustering. en_US
dc.language.iso en en_US
dc.publisher Faculty of Information Technology, University of Moratuwa. en_US
dc.relation.uri https://ieeexplore.ieee.org/document/9657380 en_US
dc.subject Topic modeling en_US
dc.subject Clustering en_US
dc.subject Gene expression en_US
dc.title Usage of topic modeling method for high dimensional gene expression data analysis en_US
dc.type Conference-Full-text en_US
dc.identifier.faculty IT en_US
dc.identifier.department Information Technology Research Unit, Faculty of Information Technology, University of Moratuwa. en_US
dc.identifier.year 2021 en_US
dc.identifier.conference 6th International Conference in Information Technology Research 2021 en_US
dc.identifier.place Moratuwa, Sri Lanka en_US
dc.identifier.proceeding Proceedings of the 6th International Conference in Information Technology Research 2021 en_US
dc.identifier.doi 10.1109/ICITR54349.2021.9657380 en_US
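
The abstract contrasts hard clustering (K-means) with soft, mixed-membership clustering (topic modeling). As a minimal illustration of that distinction only, and not the authors' pipeline, the Python sketch below runs a small K-means on invented expression profiles and then evaluates the fuzzy c-means membership formula (fuzzifier m = 2) against the resulting centroids, as a simple stand-in for the per-sample proportions a topic model would produce. The data values, sample names, and function names are all assumptions made for this example.

```python
import math

# Hypothetical expression profiles (sample -> expression of three genes).
# These numbers are invented for illustration; they are not from GSE19830 or TCGA.
SAMPLES = {
    "pure_A": [9.0, 1.0, 1.0],
    "pure_B": [1.0, 9.0, 1.0],
    "mostly_A": [7.0, 3.0, 1.0],
    "mixed_AB": [4.5, 5.5, 1.0],
}


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Hard assignment: index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
        for p in points
    ]


def kmeans(points, init_centroids, iters=10):
    """Lloyd's algorithm with fixed initial centroids, so the run is deterministic."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(iters):
        labels = assign(points, centroids)
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centroids), centroids


def soft_memberships(point, centroids):
    """Fuzzy c-means membership formula with m = 2: weights proportional to 1 / d^2."""
    d2 = [sq_dist(point, c) for c in centroids]
    if any(d == 0.0 for d in d2):
        # A point sitting exactly on a centroid belongs fully to it.
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        return [h / sum(hits) for h in hits]
    inv = [1.0 / d for d in d2]
    total = sum(inv)
    return [w / total for w in inv]


names = list(SAMPLES)
points = [SAMPLES[n] for n in names]

# Use the two "pure" profiles as starting centroids (an assumption of this sketch).
labels, centroids = kmeans(points, [points[0], points[1]])
memberships = {n: soft_memberships(p, centroids) for n, p in zip(names, points)}

for n, lab in zip(names, labels):
    shares = ", ".join(f"{u:.2f}" for u in memberships[n])
    print(f"{n}: hard cluster {lab}, soft memberships [{shares}]")
```

In this toy setting the hard labels force the mixed sample into a single cluster, whereas the soft memberships keep a share for each cluster. That is the kind of graded assignment that mixed-membership models such as topic models are designed to express.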


This item appears in the following Collection(s)

  • ICITR - 2021 [39]
    International Conference on Information Technology Research (ICITR)
