Accelerating K-MER counting for genomic analysis

Brihadiswaran G

dc.contributor.advisor	Jayasena S
dc.contributor.author	Brihadiswaran G
dc.date.accessioned	2021T03:23:16Z
dc.date.available	2021T03:23:16Z
dc.date.issued	2021
dc.identifier.citation	Brihadiswaran G. (2021). Accelerating K-MER counting for genomic analysis [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/22662
dc.identifier.uri	http://dl.lib.uom.lk/handle/123/22662
dc.description.abstract	k-mer counting is the process of counting the substrings of length k in a sequence. It is an important step in many bioinformatics applications, including genome assembly, sequence error correction, and sequence alignment. Even though generating k-mer histograms seems simple and straightforward, processing large datasets efficiently with limited resources, especially memory, is very challenging. As advancements in next-generation sequencing technologies have resulted in a tremendous growth of genomic data, k-mer counters must become faster and more efficient. A lot of work has been done in the past decade to optimize k-mer counting. Frigate, a fast and efficient tool capable of counting and querying k-mers, is presented. Its in-memory design utilizes multithreaded, lock-free data structures to improve performance. Thread synchronization is handled using the compare-and-swap technique. The parallel processing pipeline of Frigate is the result of careful performance engineering and design. Frigate was developed with an emphasis on values of k less than 20, aiming to maximize performance by employing different algorithms for different ranges of k values. The performance of Frigate was compared with six state-of-the-art k-mer counters: Jellyfish, DSK, Gerbil, CHTKC, KMC2, and KMC3, using two real-world datasets. The experiments were carried out for k values of 10, 15, and 17 using varying numbers of threads in the range [1, 32]. The results show that Frigate achieves comparable performance or up to a 2-3x speedup compared to its competitors, especially for large datasets. The k-mer counters were analyzed based on running time, amount of memory used, and scalability. The correctness of Frigate was evaluated by comparing its k-mer frequency histogram with those of the other k-mer counters. Frigate is written in C and freely available at https://github.com/Gunavaran/frigate under the MIT license.	en_US
dc.language.iso	en	en_US
dc.subject	K-MER COUNTING
dc.subject	GENOME ANALYSIS
dc.subject	PERFORMANCE ENGINEERING
dc.subject	PARALLEL COMPUTING
dc.subject	COMPUTER SCIENCE - Dissertation
dc.subject	COMPUTER SCIENCE AND ENGINEERING - Dissertation
dc.subject	MSc (Major Component Research)
dc.title	Accelerating K-MER counting for genomic analysis	en_US
dc.type	Thesis-Abstract	en_US
dc.identifier.faculty	Engineering	en_US
dc.identifier.degree	Master of Science (Major Component of Research)	en_US
dc.identifier.department	Department of Computer Science & Engineering	en_US
dc.date.accept	2021
dc.identifier.accno	TH5106	en_US
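
The abstract describes counting k-mers with lock-free data structures synchronized through compare-and-swap. The following is a minimal sketch of that general idea in C, not Frigate's actual implementation. It assumes a small k (at most 15), a 2-bit encoding of A, C, G, T, and a direct-addressed table of 4^k counters. The function names (encode_base, cas_increment, count_kmers, kmer_index) are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Map a nucleotide to a 2-bit code; returns -1 for any other symbol (e.g. 'N'). */
static int encode_base(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default:            return -1;
    }
}

/* Lock-free increment of one counter using a compare-and-swap retry loop. */
void cas_increment(_Atomic uint32_t *slot) {
    uint32_t old = atomic_load(slot);
    /* On failure, `old` is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak(slot, &old, old + 1)) {
    }
}

/*
 * Count every k-mer of seq[0..len) in `table`, which must hold 4^k
 * zero-initialized counters (assumption: direct addressing, so k <= 15;
 * k = 15 already needs 4 GiB of counters).
 */
void count_kmers(const char *seq, size_t len, unsigned k,
                 _Atomic uint32_t *table) {
    assert(k >= 1 && k <= 15);
    const uint32_t mask = (1u << (2 * k)) - 1u; /* keeps the last k bases */
    uint32_t kmer = 0;
    unsigned valid = 0; /* number of consecutive valid bases, capped at k */

    for (size_t i = 0; i < len; i++) {
        int code = encode_base(seq[i]);
        if (code < 0) { /* a non-ACGT symbol breaks the current window */
            kmer = 0;
            valid = 0;
            continue;
        }
        kmer = ((kmer << 2) | (uint32_t)code) & mask;
        if (valid < k) valid++;
        if (valid == k) cas_increment(&table[kmer]);
    }
}

/* Table index of a k-mer string made only of A, C, G, T (used for queries). */
uint32_t kmer_index(const char *kmer, unsigned k) {
    uint32_t idx = 0;
    for (unsigned i = 0; i < k; i++)
        idx = (idx << 2) | (uint32_t)encode_base(kmer[i]);
    return idx;
}
```

In a multithreaded setting, each thread could call count_kmers on its own chunk of the input while sharing one table. The chunks would need to overlap by k-1 bases so that k-mers spanning a chunk boundary are not lost. The count of a k-mer is then read back with atomic_load(&table[kmer_index(kmer, k)]).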