Class ClusterEvaluation

  • All Implemented Interfaces:
    java.io.Serializable, RevisionHandler

    public class ClusterEvaluation
    extends java.lang.Object
    implements java.io.Serializable, RevisionHandler
    Class for evaluating clustering models.

    Valid options are:

    -t name of the training file
    Specify the training file.

    -T name of the test file
    Specify the test file to apply clusterer to.

    -d name of file to save clustering model to
    Specify output file.

    -l name of file to load clustering model from
    Specify input file.

    -p attribute range
    Output predictions. Predictions are for the training file if only the training file is specified, otherwise they are for the test file. The range specifies attribute values to be output with the predictions. Use '-p 0' for none.

    -x num folds
    Set the number of folds for a cross-validation of the training data. Cross-validation can only be done for distribution (density-based) clusterers and is performed if the test file is missing.

    -s num
    Sets the seed for randomizing the data for cross-validation.

    -c class
    Set the class attribute. If set, then class based evaluation of clustering is performed.

    -g name of graph file
    Outputs the graph representation of the clusterer to the file. Only available for clusterers that implement the weka.core.Drawable interface.

    Version:
    $Revision: 7753 $
    Author:
    Mark Hall (mhall@cs.waikato.ac.nz)
    See Also:
    Drawable, Serialized Form
    • Constructor Summary

      Constructors 
      Constructor Description
      ClusterEvaluation()
      Constructor.
    • Method Summary

      Modifier and Type Method Description
      java.lang.String clusterResultsToString()
      Returns the results of clustering.
      static java.lang.String crossValidateModel(java.lang.String clustererString, Instances data, int numFolds, java.lang.String[] options, java.util.Random random)
      Performs a cross-validation for a DensityBasedClusterer on a set of instances.
      static double crossValidateModel(DensityBasedClusterer clusterer, Instances data, int numFolds, java.util.Random random)
      Performs a cross-validation for a DensityBasedClusterer on a set of instances.
      boolean equals(java.lang.Object obj)
      Tests whether the current evaluation object is equal to another evaluation object.
      static java.lang.String evaluateClusterer(Clusterer clusterer, java.lang.String[] options)
      Evaluates a clusterer with the options given in an array of strings.
      void evaluateClusterer(Instances test)
      Evaluates the clusterer on a set of instances.
      void evaluateClusterer(Instances test, java.lang.String testFileName)
      Evaluates the clusterer on a set of instances.
      void evaluateClusterer(Instances test, java.lang.String testFileName, boolean outputModel)
      Evaluates the clusterer on a set of instances.
      int[] getClassesToClusters()
      Returns the array (ordered by cluster number) of minimum-error class-to-cluster mappings.
      double[] getClusterAssignments()
      Returns an array of cluster assignments corresponding to the most recent set of instances clustered.
      double getLogLikelihood()
      Returns the log likelihood corresponding to the most recent set of instances clustered.
      int getNumClusters()
      Returns the number of clusters found for the most recent call to evaluateClusterer.
      java.lang.String getRevision()
      Returns the revision string.
      static void main(java.lang.String[] args)
      Main method for testing this class.
      static void mapClasses(int numClusters, int lev, int[][] counts, int[] clusterTotals, double[] current, double[] best, int error)
      Finds the minimum-error mapping of classes to clusters.
      void setClusterer(Clusterer clusterer)
      Sets the clusterer.
      • Methods inherited from class java.lang.Object

        getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • ClusterEvaluation

        public ClusterEvaluation()
        Constructor. Sets defaults for each member variable; the default clusterer is EM.
    • Method Detail

      • setClusterer

        public void setClusterer(Clusterer clusterer)
        Sets the clusterer.
        Parameters:
        clusterer - the clusterer to use
      • clusterResultsToString

        public java.lang.String clusterResultsToString()
        Returns the results of clustering.
        Returns:
        a string detailing the results of clustering a data set
      • getNumClusters

        public int getNumClusters()
        Returns the number of clusters found for the most recent call to evaluateClusterer.
        Returns:
        the number of clusters found
      • getClusterAssignments

        public double[] getClusterAssignments()
        Returns an array of cluster assignments corresponding to the most recent set of instances clustered.
        Returns:
        an array of cluster assignments
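
        A common follow-up is to tally how many instances landed in each cluster. The sketch below does that for an array shaped like the one returned here. It is a standalone illustration, not Weka code: the class and method names are hypothetical, and it assumes each entry is a cluster index stored as a double, with a negative value marking an instance that could not be clustered.

```java
final class AssignmentSummarySketch {

    /** Counts instances per cluster; entries outside [0, numClusters) are ignored. */
    static int[] clusterSizes(double[] assignments, int numClusters) {
        int[] sizes = new int[numClusters];
        for (double a : assignments) {
            // Assumption: a negative (or NaN) entry means "not assigned to any cluster".
            if (Double.isNaN(a) || a < 0) {
                continue;
            }
            int cluster = (int) a;
            if (cluster < numClusters) {
                sizes[cluster]++;
            }
        }
        return sizes;
    }

    /** Counts entries that carry no cluster index (negative or NaN values). */
    static int countUnassigned(double[] assignments) {
        int n = 0;
        for (double a : assignments) {
            if (Double.isNaN(a) || a < 0) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        double[] assignments = {0, 1, 1, -1, 2, 1};
        System.out.println(java.util.Arrays.toString(clusterSizes(assignments, 3)));
        System.out.println(countUnassigned(assignments));
    }
}
```

        With a real evaluation object, the array from getClusterAssignments() and the count from getNumClusters() would take the place of the literal values used in main.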
      • getClassesToClusters

        public int[] getClassesToClusters()
        Returns the array (ordered by cluster number) of minimum-error class-to-cluster mappings.
        Returns:
        an array of class to cluster mappings
      • getLogLikelihood

        public double getLogLikelihood()
        Returns the log likelihood corresponding to the most recent set of instances clustered.
        Returns:
        the log likelihood
      • evaluateClusterer

        public void evaluateClusterer(Instances test)
                               throws java.lang.Exception
        Evaluates the clusterer on a set of instances. Calculates clustering statistics and stores cluster assignments for the instances in m_clusterAssignments.
        Parameters:
        test - the set of instances to cluster
        Throws:
        java.lang.Exception - if something goes wrong
      • evaluateClusterer

        public void evaluateClusterer(Instances test,
                                      java.lang.String testFileName)
                               throws java.lang.Exception
        Evaluates the clusterer on a set of instances. Calculates clustering statistics and stores cluster assignments for the instances in m_clusterAssignments.
        Parameters:
        test - the set of instances to cluster
        testFileName - the name of the test file for incremental testing; not used if "" or null
        Throws:
        java.lang.Exception - if something goes wrong
      • evaluateClusterer

        public void evaluateClusterer(Instances test,
                                      java.lang.String testFileName,
                                      boolean outputModel)
                               throws java.lang.Exception
        Evaluates the clusterer on a set of instances. Calculates clustering statistics and stores cluster assignments for the instances in m_clusterAssignments.
        Parameters:
        test - the set of instances to cluster
        testFileName - the name of the test file for incremental testing; not used if "" or null
        outputModel - true if the clustering model is to be output as well as the stats
        Throws:
        java.lang.Exception - if something goes wrong
      • mapClasses

        public static void mapClasses(int numClusters,
                                      int lev,
                                      int[][] counts,
                                      int[] clusterTotals,
                                      double[] current,
                                      double[] best,
                                      int error)
        Finds the minimum error mapping of classes to clusters. Recursively considers all possible class to cluster assignments.
        Parameters:
        numClusters - the number of clusters
        lev - the cluster being processed
        counts - the counts of classes in clusters
        clusterTotals - the total number of examples in each cluster
        current - the current path through the class to cluster assignment tree
        best - the best assignment path seen
        error - accumulates the error for a particular path
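
        The recursive search described above can be sketched in isolation. The code below is an illustration under stated assumptions, not the Weka implementation: the class name, method names, and simplified signature are hypothetical. It assumes counts[c][k] holds the number of instances of class k in cluster c, that each class labels at most one cluster, and that a cluster may be left without a class (marked -1), in which case all of its instances count as errors.

```java
import java.util.Arrays;

final class ClassToClusterSketch {
    private int[] bestMap;
    private int bestError = Integer.MAX_VALUE;

    /** Returns, for each cluster, the class index assigned to it, or -1 for none. */
    static int[] minErrorMapping(int[][] counts) {
        int numClasses = counts.length == 0 ? 0 : counts[0].length;
        ClassToClusterSketch s = new ClassToClusterSketch();
        s.search(counts, 0, new int[counts.length], new boolean[numClasses], 0);
        return s.bestMap;
    }

    /** Number of instances whose class differs from the class mapped to their cluster. */
    static int mappingError(int[][] counts, int[] mapping) {
        int error = 0;
        for (int c = 0; c < counts.length; c++) {
            int total = Arrays.stream(counts[c]).sum();
            error += mapping[c] < 0 ? total : total - counts[c][mapping[c]];
        }
        return error;
    }

    /** Tries every assignment for cluster lev, then recurses to the next cluster. */
    private void search(int[][] counts, int lev, int[] current, boolean[] used, int error) {
        if (error >= bestError) {
            return; // this path cannot beat the best complete assignment seen so far
        }
        if (lev == counts.length) {
            bestError = error;
            bestMap = current.clone();
            return;
        }
        int total = Arrays.stream(counts[lev]).sum();
        // Option 1: leave this cluster without a class.
        current[lev] = -1;
        search(counts, lev + 1, current, used, error + total);
        // Option 2: give it any class not yet taken by an earlier cluster.
        for (int k = 0; k < counts[lev].length; k++) {
            if (used[k]) {
                continue;
            }
            used[k] = true;
            current[lev] = k;
            search(counts, lev + 1, current, used, error + total - counts[lev][k]);
            used[k] = false;
        }
    }

    public static void main(String[] args) {
        int[][] counts = {{8, 1}, {2, 9}};
        int[] mapping = minErrorMapping(counts);
        System.out.println(Arrays.toString(mapping) + " error=" + mappingError(counts, mapping));
    }
}
```

        Enumerating every assignment grows factorially with the number of classes, so the early return on error >= bestError matters in practice; it discards a path as soon as its accumulated error can no longer improve on the best one found.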
      • evaluateClusterer

        public static java.lang.String evaluateClusterer(Clusterer clusterer,
                                                         java.lang.String[] options)
                                                  throws java.lang.Exception
        Evaluates a clusterer with the options given in an array of strings. The string following "-t" names the training file and the string following "-T" names the test file. If the test file is missing, a stratified ten-fold cross-validation is performed (distribution clusterers only). Use "-x" to change the number of folds and "-s" to change the random seed. If the "-p" option is present, the prediction for each test instance is output. If an object file is named with "-l", the clusterer is loaded from that file. If an object file is named with "-d", the clusterer built from the training data is saved to that file.
        Parameters:
        clusterer - machine learning clusterer
        options - the array of strings containing the options
        Returns:
        a string describing the results
        Throws:
        java.lang.Exception - if model could not be evaluated successfully
      • crossValidateModel

        public static double crossValidateModel(DensityBasedClusterer clusterer,
                                                Instances data,
                                                int numFolds,
                                                java.util.Random random)
                                         throws java.lang.Exception
        Performs a cross-validation for a DensityBasedClusterer on a set of instances.
        Parameters:
        clusterer - the clusterer to use
        data - the training data
        numFolds - number of folds of cross validation to perform
        random - random number generator used for the cross-validation
        Returns:
        the cross-validated log-likelihood
        Throws:
        java.lang.Exception - if an error occurs
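
        To make "cross-validated log-likelihood" concrete, the sketch below runs a k-fold loop with a toy density model. It is a standalone illustration, not the Weka implementation: the class and method names are hypothetical, a single one-dimensional Gaussian stands in for a DensityBasedClusterer, and the result is assumed to be the summed held-out log density divided by the total number of instances (this page does not state how the Weka method normalises its result).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class CvLogLikelihoodSketch {

    /** Hypothetical stand-in for a density-based clusterer: one 1-D Gaussian. */
    static final class Gaussian {
        final double mean;
        final double variance;

        Gaussian(List<Double> train) {
            double sum = 0;
            for (double x : train) {
                sum += x;
            }
            mean = sum / train.size();
            double ss = 0;
            for (double x : train) {
                ss += (x - mean) * (x - mean);
            }
            // Floor the variance so the log density stays finite on degenerate folds.
            variance = Math.max(ss / train.size(), 1e-6);
        }

        double logDensity(double x) {
            return -0.5 * (Math.log(2 * Math.PI * variance) + (x - mean) * (x - mean) / variance);
        }
    }

    /** Average held-out log density per instance over numFolds folds. */
    static double crossValidatedLogLikelihood(double[] data, int numFolds, Random random) {
        if (numFolds < 2 || numFolds > data.length) {
            throw new IllegalArgumentException("numFolds must be between 2 and data.length");
        }
        List<Double> shuffled = new ArrayList<>();
        for (double x : data) {
            shuffled.add(x);
        }
        Collections.shuffle(shuffled, random);

        double total = 0;
        for (int fold = 0; fold < numFolds; fold++) {
            List<Double> train = new ArrayList<>();
            List<Double> test = new ArrayList<>();
            for (int i = 0; i < shuffled.size(); i++) {
                // Every numFolds-th instance, offset by the fold index, is held out.
                (i % numFolds == fold ? test : train).add(shuffled.get(i));
            }
            Gaussian model = new Gaussian(train); // fit on the training folds only
            for (double x : test) {
                total += model.logDensity(x); // score instances the model has not seen
            }
        }
        return total / data.length;
    }

    public static void main(String[] args) {
        double[] data = {0, 0.1, -0.1, 0.05, -0.05, 0.02};
        System.out.println(crossValidatedLogLikelihood(data, 3, new Random(1)));
    }
}
```

        Passing a Random built from a fixed seed makes the shuffle, and therefore the fold membership and the returned value, repeatable between runs.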
      • crossValidateModel

        public static java.lang.String crossValidateModel(java.lang.String clustererString,
                                                          Instances data,
                                                          int numFolds,
                                                          java.lang.String[] options,
                                                          java.util.Random random)
                                                   throws java.lang.Exception
        Performs a cross-validation for a DensityBasedClusterer on a set of instances.
        Parameters:
        clustererString - a string naming the class of the clusterer
        data - the data on which the cross-validation is to be performed
        numFolds - the number of folds for the cross-validation
        options - the options to the clusterer
        random - a random number generator
        Returns:
        a string containing the cross-validated log likelihood
        Throws:
        java.lang.Exception - if a clusterer could not be generated
      • equals

        public boolean equals(java.lang.Object obj)
        Tests whether the current evaluation object is equal to another evaluation object.
        Overrides:
        equals in class java.lang.Object
        Parameters:
        obj - the object to compare against
        Returns:
        true if the two objects are equal
      • getRevision

        public java.lang.String getRevision()
        Returns the revision string.
        Specified by:
        getRevision in interface RevisionHandler
        Returns:
        the revision
      • main

        public static void main(java.lang.String[] args)
        Main method for testing this class.
        Parameters:
        args - the options