Abstract:
Data Mining, by definition, deals with large volumes of data. Ever-growing
volumes of data and increasing demand for data-driven decisions are placing
new requirements on Data Mining algorithms. To respond to these demands, Data
Mining practitioners are focusing on improving speed and turnaround time without
compromising accuracy.
Among the different approaches to improving speed, one gaining increased
attention is the use of GPUs. The ability of GPUs to perform parallel execution at a
massive scale, together with the inherently repetitive nature of Data Mining
workloads, makes GPUs a strong candidate for improving speed.
Another area receiving increased attention is the use of Bitmaps in Data Mining
algorithms. Bitmap representations have been used extensively in analytical queries
because they represent data concisely and simplify processing.
A number of studies have combined these two techniques to achieve greater
performance improvements. However, most of those studies revolve around frequent
itemset mining (FIM) algorithms, whose processing aligns naturally with Bitmap
representations.
In this study, we explore the use of Bitmap techniques on GPUs to speed up a class
of Data Mining algorithms, namely counting based algorithms. A counting based
algorithm can be defined as an algorithm that can be separated into two distinct
phases: a pattern counting phase and a model building phase. We propose a framework
based on Bitmap techniques that speeds up these counting based algorithms on GPUs.
The proposed framework uses both the CPU and the GPU for algorithm execution, with
the core computation delegated to the GPU. We implement two algorithms, Naïve Bayes
and Decision Trees, using the framework, both of which outperform their CPU
counterparts by several orders of magnitude.
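As an illustration of the pattern counting phase only, the following minimal CPU-side
sketch shows how Bitmaps can turn co-occurrence counting into a bitwise AND followed
by a population count. It is not the framework proposed in the thesis: the function
names (build_bitmaps, popcount, count_pairs) and the toy data are assumptions made
for this example, and Python integers stand in for the bit vectors that a GPU
implementation would process in parallel.

```python
from collections import defaultdict


def build_bitmaps(column):
    """Map each distinct value in a column to an integer used as a bitmap.

    Bit i of the bitmap is set when row i holds that value.
    (Hypothetical helper, named for this example only.)
    """
    bitmaps = defaultdict(int)
    for row, value in enumerate(column):
        bitmaps[value] |= 1 << row
    return dict(bitmaps)


def popcount(bitmap):
    """Count the set bits in a non-negative integer bitmap."""
    return bin(bitmap).count("1")


def count_pairs(attr_bitmaps, class_bitmaps):
    """Count rows for every (attribute value, class label) pair.

    Each count is one bitwise AND plus one population count, which is the
    kind of repetitive, independent work that parallelises well.
    """
    return {
        (a, c): popcount(a_bits & c_bits)
        for a, a_bits in attr_bitmaps.items()
        for c, c_bits in class_bitmaps.items()
    }


# Toy data set (assumed for illustration): one attribute and one class column.
outlook = ["sunny", "rain", "sunny", "overcast", "rain", "sunny"]
play = ["no", "yes", "no", "yes", "yes", "yes"]

counts = count_pairs(build_bitmaps(outlook), build_bitmaps(play))
print(counts[("sunny", "no")])  # rows 0 and 2 are both "sunny" and "no"
```

Counts of this kind are the inputs a model building phase would consume, for example
to estimate the conditional probabilities of Naïve Bayes or to score candidate splits
in a Decision Tree.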
Citation:
De Silva, A. (2019). Techniques to speed-up counting based data mining algorithms on GPUS [Master’s thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.mrt.ac.lk/handle/123/15855