Abstract:
Most well-established clustering algorithms assume that the underlying cluster structure of a data set does not change over time. Consequently, they cannot efficiently identify cluster structures in the large-scale dynamic data sources that are now available. The literature reports many attempts to address this issue by extending well-established static clustering algorithms to the dynamic context; Incremental K-Means and Incremental DBSCAN are two examples. These methods either update the model periodically according to changes in the database or adjust the model parameters incrementally whenever new data appear. Fuzzy Logic and Rough Set concepts have also been employed to deal with the uncertainty in dynamic data sources. In addition, Multi-agent technology has been used to address data clustering issues in both static and dynamic contexts. PADMA and PAPYRUS are two Multi-agent clustering systems reported in the 1990s. Moreover, Chaimontree, Atkinson, and Coenen developed a Multi-agent based clustering system that improves the initial cluster configuration through interaction and negotiation among cluster agents, each of which represents a cluster in the data set.
Although there have been many attempts to develop agent-based clustering algorithms, few reported works address the identification of partitional clusters in a dynamic data source. The study presented in this thesis proposes a Multi-agent based approach to this problem. A set of partitional clusters is identified through interactions and negotiations among the agents that represent the data records in the data source. The identified potential clusters are then assigned to what are called Cluster agents. Through interactions and negotiations between Cluster agents and Data Record agents, the identified cluster configuration is continuously improved according to internal cluster evaluation measures.
The proposed method is evaluated on synthetic data sets with different numbers of clusters in 2D and 3D spaces. The results indicate that the proposed method successfully identifies the clusters in those data sets with minimal human intervention.