Scalable clustering using MapReduce programming model

Santra Abhishek(1,*), Agarwal Anurag(1,**)
(1) Department of Computer Science, University of Delhi, Delhi
Email: *abhishek.santra@gmail.com, **anuragagarwal90@gmail.com
Online published on 22 June, 2017.

Abstract

The aim is to implement a clustering algorithm that runs in a distributed computing environment. For this purpose, a multi-node Hadoop cluster supporting the Hadoop Distributed File System (HDFS) and the MapReduce programming model has been set up. In this paper, Exclusive and Complete Clustering (ExCC), a grid-based algorithm, is implemented for massive data sets by scheduling consecutive MapReduce jobs. Experiments were performed to study the functional characteristics of ExCC in the distributed environment under different parameter settings, and they yield an optimal cluster setting of four datanodes with a 64 MB block size.

Keywords

Grid-based, Incremental, Exclusive, Complete and Scalable Clustering, Distributed Environment, Hadoop Cluster, Hadoop Distributed File System, MapReduce Jobs.
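As a rough illustration of how a grid-based clustering can be split into consecutive MapReduce jobs, the Python sketch below simulates the map, shuffle and reduce phases in memory. It is not ExCC and uses no Hadoop API. The cell size, the density threshold `min_points`, the rule that diagonally touching cells are adjacent, and the helper names `run_job`, `neighbours` and `grid_cluster` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product

def run_job(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: collect all values per key
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

def neighbours(cell):
    """All cells within one step in every dimension, including the cell itself."""
    for offset in product((-1, 0, 1), repeat=len(cell)):
        yield tuple(c + o for c, o in zip(cell, offset))

def grid_cluster(points, cell_size, min_points):
    """Hypothetical grid-based clustering as a chain of simulated MapReduce jobs.

    Returns a dict mapping each dense cell to a cluster label (the smallest
    cell index in its connected group). Sparse cells are treated as noise.
    """
    # Job 1: map each point to its grid cell, reduce to per-cell counts,
    # and keep only the dense cells.
    def cell_mapper(point):
        yield tuple(int(x // cell_size) for x in point), 1

    def density_reducer(cell, ones):
        if sum(ones) >= min_points:
            yield cell, cell  # every dense cell starts as its own cluster

    labels = dict(run_job(points, cell_mapper, density_reducer))
    dense = set(labels)

    # Jobs 2..k: each job sends a cell's label to its dense neighbours and
    # keeps the minimum; the driver schedules jobs until no label changes.
    def label_mapper(item):
        cell, label = item
        for n in neighbours(cell):
            if n in dense:
                yield n, label

    def min_reducer(cell, candidate_labels):
        yield cell, min(candidate_labels)

    while True:
        new_labels = dict(run_job(labels.items(), label_mapper, min_reducer))
        if new_labels == labels:
            return labels
        labels = new_labels

points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (1.1, 0.2), (1.3, 0.4),
          (1.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.3, 5.1), (9.0, 9.0)]
print(grid_cluster(points, cell_size=1.0, min_points=2))
# {(0, 0): (0, 0), (1, 0): (0, 0), (5, 5): (5, 5)}
```

In this sketch the first job turns raw points into dense grid cells, and each later job passes cluster labels between neighbouring dense cells. On a real Hadoop cluster, each `run_job` call would roughly correspond to a separate job whose output is written to HDFS and read by the next job in the chain.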