Class JRip

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, AdditionalMeasureProducer, CapabilitiesHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler

    public class JRip
    extends Classifier
    implements AdditionalMeasureProducer, WeightedInstancesHandler, TechnicalInformationHandler
    This class implements a propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was proposed by William W. Cohen as an optimized version of IREP.

    The algorithm is briefly described as follows:

    Initialize RS = {}, and for each class from the less prevalent one to the more frequent one, DO:

    1. Building stage:
    Repeat 1.1 and 1.2 until the descrition length (DL) of the ruleset and examples is 64 bits greater than the smallest DL met so far, or there are no positive examples, or the error rate >= 50%.

    1.1. Grow phase:
    Grow one rule by greedily adding antecedents (or conditions) to the rule until the rule is perfect (i.e. 100% accurate). The procedure tries every possible value of each attribute and selects the condition with highest information gain: p(log(p/t)-log(P/T)).

    1.2. Prune phase:
    Incrementally prune each rule and allow the pruning of any final sequences of the antecedents;The pruning metric is (p-n)/(p+n) -- but it's actually 2p/(p+n) -1, so in this implementation we simply use p/(p+n) (actually (p+1)/(p+n+2), thus if p+n is 0, it's 0.5).

    2. Optimization stage:
    after generating the initial ruleset {Ri}, generate and prune two variants of each rule Ri from randomized data using procedure 1.1 and 1.2. But one variant is generated from an empty rule while the other is generated by greedily adding antecedents to the original rule. Moreover, the pruning metric used here is (TP+TN)/(P+N).Then the smallest possible DL for each variant and the original rule is computed. The variant with the minimal DL is selected as the final representative of Ri in the ruleset.After all the rules in {Ri} have been examined and if there are still residual positives, more rules are generated based on the residual positives using Building Stage again.
    3. Delete the rules from the ruleset that would increase the DL of the whole ruleset if it were in it. and add resultant ruleset to RS.
    ENDDO

    Note that there seem to be 2 bugs in the original ripper program that would affect the ruleset size and accuracy slightly. This implementation avoids these bugs and thus is a little bit different from Cohen's original implementation. Even after fixing the bugs, since the order of classes with the same frequency is not defined in ripper, there still seems to be some trivial difference between this implementation and the original ripper, especially for audiology data in UCI repository, where there are lots of classes of few instances.

    Details please see:

    William W. Cohen: Fast Effective Rule Induction. In: Twelfth International Conference on Machine Learning, 115-123, 1995.

    PS. We have compared this implementation with the original ripper implementation in aspects of accuracy, ruleset size and running time on both artificial data "ab+bcd+defg" and UCI datasets. In all these aspects it seems to be quite comparable to the original ripper implementation. However, we didn't consider memory consumption optimization in this implementation.

    BibTeX:

     @inproceedings{Cohen1995,
        author = {William W. Cohen},
        booktitle = {Twelfth International Conference on Machine Learning},
        pages = {115-123},
        publisher = {Morgan Kaufmann},
        title = {Fast Effective Rule Induction},
        year = {1995}
     }
     

    Valid options are:

     -F <number of folds>
      Set number of folds for REP
      One fold is used as pruning set.
      (default 3)
     -N <min. weights>
      Set the minimal weights of instances
      within a split.
      (default 2.0)
     -O <number of runs>
      Set the number of runs of
      optimizations. (Default: 2)
     -D
      Set whether turn on the
      debug mode (Default: false)
     -S <seed>
      The seed of randomization
      (Default: 1)
     -E
      Whether NOT check the error rate>=0.5
      in stopping criteria  (default: check)
     -P
      Whether NOT use pruning
      (default: use pruning)
    Version:
    $Revision: 8119 $
    Author:
    Xin Xu (xx5@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
    See Also:
    Serialized Form
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      class  JRip.Antd
      The single antecedent in the rule, which is composed of an attribute and the corresponding value.
      class  JRip.NominalAntd
      The antecedent with nominal attribute
      class  JRip.NumericAntd
      The antecedent with numeric attribute
      class  JRip.RipperRule
      This class implements a single rule that predicts specified class.
    • Constructor Summary

      Constructors 
      Constructor Description
      JRip()  
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      void buildClassifier​(Instances instances)
      Builds Ripper in the order of class frequencies.
      java.lang.String checkErrorRateTipText()
      Returns the tip text for this property
      java.lang.String debugTipText()
      Returns the tip text for this property
      double[] distributionForInstance​(Instance datum)
      Classify the test instance with the rule learner and provide the class distributions
      java.util.Enumeration enumerateMeasures()
      Returns an enumeration of the additional measure names
      java.lang.String foldsTipText()
      Returns the tip text for this property
      Capabilities getCapabilities()
      Returns default capabilities of the classifier.
      boolean getCheckErrorRate()
      Gets whether to check for error rate is in stopping criterion
      boolean getDebug()
      Gets whether debug information is output to the console
      int getFolds()
      Gets the number of folds
      double getMeasure​(java.lang.String additionalMeasureName)
      Returns the value of the named measure
      double getMinNo()
      Gets the minimum total weight of the instances in a rule
      int getOptimizations()
      Gets the the number of optimization runs
      java.lang.String[] getOptions()
      Gets the current settings of the Classifier.
      java.lang.String getRevision()
      Returns the revision string.
      FastVector getRuleset()
      Get the ruleset generated by Ripper
      RuleStats getRuleStats​(int pos)
      Get the statistics of the ruleset in the given position
      long getSeed()
      Gets the current seed value to use in randomizing the data
      TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      boolean getUsePruning()
      Gets whether pruning is performed
      java.lang.String globalInfo()
      Returns a string describing classifier
      java.util.Enumeration listOptions()
      Returns an enumeration describing the available options Valid options are:
      static void main​(java.lang.String[] args)
      Main method.
      java.lang.String minNoTipText()
      Returns the tip text for this property
      java.lang.String optimizationsTipText()
      Returns the tip text for this property
      java.lang.String seedTipText()
      Returns the tip text for this property
      void setCheckErrorRate​(boolean d)
      Sets whether to check for error rate is in stopping criterion
      void setDebug​(boolean d)
      Sets whether debug information is output to the console
      void setFolds​(int fold)
      Sets the number of folds to use
      void setMinNo​(double m)
      Sets the minimum total weight of the instances in a rule
      void setOptimizations​(int run)
      Sets the number of optimization runs
      void setOptions​(java.lang.String[] options)
      Parses a given list of options.
      void setSeed​(long s)
      Sets the seed value to use in randomizing the data
      void setUsePruning​(boolean d)
      Sets whether pruning is performed
      java.lang.String toString()
      Prints the all the rules of the rule learner.
      java.lang.String usePruningTipText()
      Returns the tip text for this property
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • JRip

        public JRip()
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing classifier
        Returns:
        a description suitable for displaying in the explorer/experimenter gui
      • getTechnicalInformation

        public TechnicalInformation getTechnicalInformation()
        Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
        Specified by:
        getTechnicalInformation in interface TechnicalInformationHandler
        Returns:
        the technical information about this class
      • listOptions

        public java.util.Enumeration listOptions()
        Returns an enumeration describing the available options Valid options are:

        -F number
        The number of folds for reduced error pruning. One fold is used as the pruning set. (Default: 3)

        -N number
        The minimal weights of instances within a split. (Default: 2)

        -O number
        Set the number of runs of optimizations. (Default: 2)

        -D
        Whether turn on the debug mode -S number
        The seed of randomization used in Ripper.(Default: 1)

        -E
        Whether NOT check the error rate >= 0.5 in stopping criteria. (default: check)

        -P
        Whether NOT use pruning. (default: use pruning)

        Specified by:
        listOptions in interface OptionHandler
        Overrides:
        listOptions in class Classifier
        Returns:
        an enumeration of all the available options
      • setOptions

        public void setOptions​(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a given list of options.

        Valid options are:

         -F <number of folds>
          Set number of folds for REP
          One fold is used as pruning set.
          (default 3)
         -N <min. weights>
          Set the minimal weights of instances
          within a split.
          (default 2.0)
         -O <number of runs>
          Set the number of runs of
          optimizations. (Default: 2)
         -D
          Set whether turn on the
          debug mode (Default: false)
         -S <seed>
          The seed of randomization
          (Default: 1)
         -E
          Whether NOT check the error rate>=0.5
          in stopping criteria  (default: check)
         -P
          Whether NOT use pruning
          (default: use pruning)
        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class Classifier
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
      • getOptions

        public java.lang.String[] getOptions()
        Gets the current settings of the Classifier.
        Specified by:
        getOptions in interface OptionHandler
        Overrides:
        getOptions in class Classifier
        Returns:
        an array of strings suitable for passing to setOptions
      • enumerateMeasures

        public java.util.Enumeration enumerateMeasures()
        Returns an enumeration of the additional measure names
        Specified by:
        enumerateMeasures in interface AdditionalMeasureProducer
        Returns:
        an enumeration of the measure names
      • getMeasure

        public double getMeasure​(java.lang.String additionalMeasureName)
        Returns the value of the named measure
        Specified by:
        getMeasure in interface AdditionalMeasureProducer
        Parameters:
        additionalMeasureName - the name of the measure to query for its value
        Returns:
        the value of the named measure
        Throws:
        java.lang.IllegalArgumentException - if the named measure is not supported
      • foldsTipText

        public java.lang.String foldsTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setFolds

        public void setFolds​(int fold)
        Sets the number of folds to use
        Parameters:
        fold - the number of folds
      • getFolds

        public int getFolds()
        Gets the number of folds
        Returns:
        the number of folds
      • minNoTipText

        public java.lang.String minNoTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setMinNo

        public void setMinNo​(double m)
        Sets the minimum total weight of the instances in a rule
        Parameters:
        m - the minimum total weight of the instances in a rule
      • getMinNo

        public double getMinNo()
        Gets the minimum total weight of the instances in a rule
        Returns:
        the minimum total weight of the instances in a rule
      • seedTipText

        public java.lang.String seedTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setSeed

        public void setSeed​(long s)
        Sets the seed value to use in randomizing the data
        Parameters:
        s - the new seed value
      • getSeed

        public long getSeed()
        Gets the current seed value to use in randomizing the data
        Returns:
        the seed value
      • optimizationsTipText

        public java.lang.String optimizationsTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setOptimizations

        public void setOptimizations​(int run)
        Sets the number of optimization runs
        Parameters:
        run - the number of optimization runs
      • getOptimizations

        public int getOptimizations()
        Gets the the number of optimization runs
        Returns:
        the number of optimization runs
      • debugTipText

        public java.lang.String debugTipText()
        Returns the tip text for this property
        Overrides:
        debugTipText in class Classifier
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setDebug

        public void setDebug​(boolean d)
        Sets whether debug information is output to the console
        Overrides:
        setDebug in class Classifier
        Parameters:
        d - whether debug information is output to the console
      • getDebug

        public boolean getDebug()
        Gets whether debug information is output to the console
        Overrides:
        getDebug in class Classifier
        Returns:
        whether debug information is output to the console
      • checkErrorRateTipText

        public java.lang.String checkErrorRateTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setCheckErrorRate

        public void setCheckErrorRate​(boolean d)
        Sets whether to check for error rate is in stopping criterion
        Parameters:
        d - whether to check for error rate is in stopping criterion
      • getCheckErrorRate

        public boolean getCheckErrorRate()
        Gets whether to check for error rate is in stopping criterion
        Returns:
        true if checking for error rate is in stopping criterion
      • usePruningTipText

        public java.lang.String usePruningTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setUsePruning

        public void setUsePruning​(boolean d)
        Sets whether pruning is performed
        Parameters:
        d - Whether pruning is performed
      • getUsePruning

        public boolean getUsePruning()
        Gets whether pruning is performed
        Returns:
        true if pruning is performed
      • getRuleset

        public FastVector getRuleset()
        Get the ruleset generated by Ripper
        Returns:
        the ruleset
      • getRuleStats

        public RuleStats getRuleStats​(int pos)
        Get the statistics of the ruleset in the given position
        Parameters:
        pos - the position of the stats, assuming correct
        Returns:
        the statistics of the ruleset in the given position
      • buildClassifier

        public void buildClassifier​(Instances instances)
                             throws java.lang.Exception
        Builds Ripper in the order of class frequencies. For each class it's built in two stages: building and optimization
        Specified by:
        buildClassifier in class Classifier
        Parameters:
        instances - the training data
        Throws:
        java.lang.Exception - if classifier can't be built successfully
      • distributionForInstance

        public double[] distributionForInstance​(Instance datum)
        Classify the test instance with the rule learner and provide the class distributions
        Overrides:
        distributionForInstance in class Classifier
        Parameters:
        datum - the instance to be classified
        Returns:
        the distribution
      • toString

        public java.lang.String toString()
        Prints the all the rules of the rule learner.
        Overrides:
        toString in class java.lang.Object
        Returns:
        a textual description of the classifier
      • main

        public static void main​(java.lang.String[] args)
        Main method.
        Parameters:
        args - the options for the classifier