cluscomp output
Cluscomp has very simple output. For two given partitions of a dataset, it prints a single line like this:
This line says that the two partitionings of the dataset correspond to each other to the degree given by the reported value.

As explained in the Manual, cluscomp currently offers two different measurements of partition correspondence: the Hubert-Arabie Object Triple index (the default) and the Corrected Rand Index, which is selected with the -r option.

Simple datasets

As a simple example, consider the dataset shown here:

Let's say that you want to test some clustering software you are developing.
As a first run, your software partitions the original dataset nearly correctly,
except for one misclassified point:
As a second run, your software partitions the original dataset slightly
worse: there are now two misclassified points:
The third time you run it, your software partitions the original dataset
pretty badly. It has merged two of the clusters into one:
For the sake of illustration, let's say that you try two more times,
and each time the software misclassifies one more point:
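
The runs described in this section can be imitated on a toy labelling to see how a correspondence index reacts to them. The following is a minimal Python sketch, assuming that the "Corrected Rand Index" is the standard chance-corrected (adjusted) Rand index; it is not cluscomp's own code, and the function name `corrected_rand` and the twelve-point labellings are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import comb


def corrected_rand(labels_a, labels_b):
    """Chance-corrected Rand index of two labellings of the same points.

    Assumption: this is the standard adjusted Rand formulation; it is a
    sketch for illustration, not cluscomp's implementation.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same points")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: nothing to disagree about

    # Pairs of points placed together in both partitions, in A, and in B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are one big cluster
    return (together_both - expected) / (maximum - expected)


# Hypothetical toy data: three clusters of four points each.
truth = [0] * 4 + [1] * 4 + [2] * 4
one_off = [1] + truth[1:]            # one misclassified point
merged = [0] * 8 + [2] * 4           # two clusters merged into one

for name, run in [("identical", truth), ("one_off", one_off), ("merged", merged)]:
    print(name, round(corrected_rand(truth, run), 3))
```

On this toy data the score is highest for the identical partition, lower with one misclassified point, and lower again when two clusters are merged. The index depends only on which points share a cluster, so renaming the cluster labels does not change the result.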
Larger datasets

Here is an example with a larger dataset. In the original set A,
there were 5 clusters, each of 200 "normal" points, and 50 "outlier"
points. That set can be seen here:

The first attempt at partitioning the raw data produced a dataset B,
which is shown in the next image:
The second attempt produced the dataset C, shown here: