Clusutils


Introduction

The Clusutils package contains two programs to assist researchers who study data clustering algorithms. The clusgen program generates artificial data sets that are designed to contain distinct clusters. Its generation algorithms are based on the work of Milligan [1]. The cluscomp program compares partitions of a data set and returns an index between -1 and +1 that measures how closely the partitions correspond to each other. Its comparison algorithms are based on those described by Hubert and Arabie [2].
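As a rough illustration of the kind of index cluscomp reports, the sketch below computes the adjusted Rand index, the chance-corrected agreement measure described by Hubert and Arabie [2]. It is not taken from the Clusutils source. The function name adjusted_rand_index, the helper pair_count, the representation of a partition as a vector of integer labels, and the value returned in degenerate cases are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Number of unordered pairs that can be drawn from n items: C(n, 2).
// Hypothetical helper, not part of Clusutils.
static double pair_count(double n) { return n * (n - 1.0) / 2.0; }

// Adjusted Rand index of two labelings of the same objects.
// a[i] and b[i] are the cluster labels of object i in each partition.
// Hypothetical function; the real cluscomp interface may differ.
double adjusted_rand_index(const std::vector<int>& a,
                           const std::vector<int>& b) {
    assert(a.size() == b.size() && "partitions must label the same objects");
    const std::size_t n = a.size();
    if (n < 2) return 1.0;  // assumption: treat tiny inputs as identical

    // Contingency table: cell counts plus row and column totals.
    std::map<std::pair<int, int>, std::size_t> cell;
    std::map<int, std::size_t> row, col;
    for (std::size_t i = 0; i < n; ++i) {
        ++cell[std::make_pair(a[i], b[i])];
        ++row[a[i]];
        ++col[b[i]];
    }

    // Count object pairs placed together in both partitions, in a, and in b.
    double sum_cells = 0.0, sum_rows = 0.0, sum_cols = 0.0;
    for (const auto& kv : cell) sum_cells += pair_count(static_cast<double>(kv.second));
    for (const auto& kv : row)  sum_rows  += pair_count(static_cast<double>(kv.second));
    for (const auto& kv : col)  sum_cols  += pair_count(static_cast<double>(kv.second));

    // Expected agreement under random labeling, and the maximum attainable.
    const double expected = sum_rows * sum_cols / pair_count(static_cast<double>(n));
    const double maximum  = 0.5 * (sum_rows + sum_cols);

    // Degenerate case: both partitions are one big cluster, or both are
    // all singletons. They are identical, so this sketch returns 1.
    if (maximum == expected) return 1.0;

    return (sum_cells - expected) / (maximum - expected);
}
```

For two labelings that differ only in the names of the clusters, such as {0,0,1,1} and {5,5,7,7}, this function returns 1. For {0,0,1,1} against {0,1,0,1} it returns -0.5, which shows how the index falls below zero when the partitions agree less than chance would predict.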

The programs in the Clusutils package are written from scratch in Standard C++ in order to provide portable, optimized versions. The original FORTRAN and DOS implementations of the algorithms above are available on the web, but they are not portable and restrict the data to a small number of dimensions.

The Clusutils package is currently being developed on a machine running the Linux operating system, but it should compile with any reasonably modern C++ compiler.

References

[1] Milligan, Glenn W. An Algorithm for Generating Artificial Test Clusters. Psychometrika, 50(1):123-127, 1985.

[2] Hubert, Lawrence and Arabie, Phipps. Comparing Partitions. Journal of Classification, 2:193-218, 1985.