dc.contributor.advisor Jayasena S
dc.contributor.author Brihadiswaran G
dc.date.accessioned 2021T03:23:16Z
dc.date.available 2021T03:23:16Z
dc.date.issued 2021
dc.identifier.citation Brihadiswaran G. (2021). Accelerating K-MER counting for genomic analysis [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/22662
dc.identifier.uri http://dl.lib.uom.lk/handle/123/22662
dc.description.abstract k-mer counting is the process of counting the substrings of length k in a sequence. It is an important step in many bioinformatics applications, including genome assembly, sequence error correction, and sequence alignment. Although generating k-mer histograms seems simple and straightforward, processing large datasets efficiently with limited resources, especially memory, is very challenging. As advances in next-generation sequencing technologies have led to tremendous growth in genomic data, k-mer counters must become faster and more efficient. Much work has been done in the past decade to optimize k-mer counting. Frigate, a fast and efficient tool capable of counting and querying k-mers, is presented. Its in-memory design uses multithreaded, lock-free data structures to improve performance. Thread synchronization is handled using the compare-and-swap technique. The parallel processing pipeline of Frigate is the result of careful performance engineering and design. Frigate was developed with an emphasis on values of k less than 20, aiming to maximize performance by employing different algorithms for different ranges of k. The performance of Frigate was compared with six state-of-the-art k-mer counters: Jellyfish, DSK, Gerbil, CHTKC, KMC2, and KMC3, using two real-world datasets. The experiments were carried out for k values of 10, 15, and 17, with thread counts in the range [1, 32]. The results show that Frigate achieves performance comparable to that of its competitors, or a speedup of up to 2-3x, especially for large datasets. The k-mer counters were analyzed in terms of running time, memory usage, and scalability. The correctness of Frigate was evaluated by comparing its k-mer frequency histogram with those of the other k-mer counters. Frigate is written in C and is freely available at https://github.com/Gunavaran/frigate under the MIT license. en_US
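As a rough illustration of the two ideas named in the abstract (counting substrings of length k, and synchronizing threads with compare-and-swap), the following minimal C11 sketch may help. It is not Frigate's implementation. The function names (encode_base, cas_increment, count_kmers), the 2-bit base encoding, and the direct-addressed table of 4^k counters are all assumptions made for this example. The sketch does not handle canonical (reverse-complement) k-mers and is limited to k <= 15 so that a k-mer fits in a 32-bit index.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Map a nucleotide to a 2-bit code; return -1 for anything else (e.g. 'N'). */
static int encode_base(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default:            return -1;
    }
}

/* Lock-free increment: retry the compare-and-swap until it succeeds.
 * On failure, `old` is refreshed with the value currently in the slot. */
static void cas_increment(_Atomic uint32_t *slot) {
    uint32_t old = atomic_load(slot);
    while (!atomic_compare_exchange_weak(slot, &old, old + 1)) {
        /* another thread updated the slot first; try again */
    }
}

/* Count every k-mer of `seq` into `table`, which must hold 4^k zeroed slots.
 * Hypothetical helper for illustration; assumes 1 <= k <= 15. */
static void count_kmers(const char *seq, int k, _Atomic uint32_t *table) {
    assert(k >= 1 && k <= 15);
    const uint32_t mask = (1u << (2 * k)) - 1u; /* keeps the low 2k bits */
    uint32_t kmer = 0;  /* rolling 2-bit encoding of the last k bases */
    int valid = 0;      /* number of consecutive valid bases seen, capped at k */

    for (size_t i = 0; seq[i] != '\0'; i++) {
        int code = encode_base(seq[i]);
        if (code < 0) {           /* ambiguous base: restart the window */
            valid = 0;
            kmer = 0;
            continue;
        }
        kmer = ((kmer << 2) | (uint32_t)code) & mask;
        if (valid < k) valid++;
        if (valid == k) cas_increment(&table[kmer]);
    }
}
```

With k = 2 and a zero-initialized table of 16 counters, calling count_kmers("ACGTAC", 2, table) leaves a count of 2 in slot 1 (AC) and a count of 1 in each of slots 6 (CG), 11 (GT), and 12 (TA). Because every update goes through cas_increment, several threads could in principle call count_kmers on different reads against the same table without locks. A direct-addressed table is only practical for small k, which is one plausible reason a counter might switch algorithms across ranges of k.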
dc.language.iso en en_US
dc.subject K-MER COUNTING
dc.subject GENOME ANALYSIS
dc.subject PERFORMANCE ENGINEERING
dc.subject PARALLEL COMPUTING
dc.subject COMPUTER SCIENCE - Dissertation
dc.subject COMPUTER SCIENCE AND ENGINEERING - Dissertation
dc.subject MSc (Major Component Research)
dc.title Accelerating K-MER counting for genomic analysis en_US
dc.type Thesis-Abstract en_US
dc.identifier.faculty Engineering en_US
dc.identifier.degree Master of Science (Major Component of Research) en_US
dc.identifier.department Department of Computer Science & Engineering en_US
dc.date.accept 2021
dc.identifier.accno TH5106 en_US

