Class InterquartileRange
java.lang.Object
  weka.filters.Filter
    weka.filters.SimpleFilter
      weka.filters.SimpleBatchFilter
        weka.filters.unsupervised.attribute.InterquartileRange
All Implemented Interfaces:
java.io.Serializable, CapabilitiesHandler, OptionHandler, RevisionHandler
public class InterquartileRange extends SimpleBatchFilter
A filter for detecting outliers and extreme values based on interquartile ranges. The filter skips the class attribute.
Outliers:
Q3 + OF*IQR < x <= Q3 + EVF*IQR
or
Q1 - EVF*IQR <= x < Q1 - OF*IQR
Extreme values:
x > Q3 + EVF*IQR
or
x < Q1 - EVF*IQR
Key:
Q1 = 25% quartile
Q3 = 75% quartile
IQR = Interquartile Range, difference between Q1 and Q3
OF = Outlier Factor
EVF = Extreme Value Factor
Valid options are:
-D Turns on output of debugging information.
-R <col1,col2-col4,...> Specifies the list of columns on which to base outlier/extreme value detection. If an instance is an outlier/extreme value in at least one of those attributes, it is tagged accordingly. 'first' and 'last' are valid indexes. (default: none)
-O <num> The factor for outlier detection. (default: 3)
-E <num> The factor for extreme values detection. (default: 2*Outlier Factor)
-E-as-O Tags extreme values also as outliers. (default: off)
-P Generates Outlier/ExtremeValue pair for each numeric attribute in the range, not just a single indicator pair for all the attributes. (default: off)
-M Generates an additional attribute 'Offset' per Outlier/ExtremeValue pair that contains the multiplier by which the value is off the median: value = median + 'multiplier' * IQR. Note: implicitly sets '-P'. (default: off)
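The outlier and extreme-value ranges defined above can be sketched in plain Java. This is a minimal illustration, not the filter's implementation: the class `IqrTagSketch`, the enum `Tag`, and the method `classify` are hypothetical names, the quartiles are passed in rather than computed, and the sketch assumes EVF >= OF.

```java
public class IqrTagSketch {
    /** Hypothetical labels; the real filter emits indicator attributes instead. */
    public enum Tag { NORMAL, OUTLIER, EXTREME }

    /**
     * Applies the documented inequalities:
     * extreme if x > Q3 + EVF*IQR or x < Q1 - EVF*IQR,
     * outlier if x lies beyond the OF thresholds but not beyond the EVF thresholds.
     */
    public static Tag classify(double x, double q1, double q3, double of, double evf) {
        double iqr = q3 - q1;
        if (x > q3 + evf * iqr || x < q1 - evf * iqr) {
            return Tag.EXTREME;
        }
        if (x > q3 + of * iqr || x < q1 - of * iqr) {
            return Tag.OUTLIER;
        }
        return Tag.NORMAL;
    }

    public static void main(String[] args) {
        // Q1 = 10, Q3 = 20, so IQR = 10; OF = 3 and EVF = 6 follow the documented defaults.
        System.out.println(classify(60, 10, 20, 3, 6)); // OUTLIER (50 < 60 <= 80)
        System.out.println(classify(81, 10, 20, 3, 6)); // EXTREME (81 > 80)
        System.out.println(classify(50, 10, 20, 3, 6)); // NORMAL (50 is not > 50)
    }
}
```

Testing the extreme-value condition first leaves exactly the half-open outlier intervals from the definition for the second test.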
Thanks to Dale for a few brainstorming sessions.
- Version:
- $Revision: 9529 $
- Author:
- Dale Fletcher (dale at cs dot waikato dot ac dot nz), fracpete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Field Summary
Modifier and Type    Field          Description
static int           NON_NUMERIC    indicator for non-numeric attributes
-
Constructor Summary
Constructor
InterquartileRange()
-
Method Summary
Modifier and Type    Method
    Description
java.lang.String    attributeIndicesTipText()
    Returns the tip text for this property
java.lang.String    detectionPerAttributeTipText()
    Returns the tip text for this property
java.lang.String    extremeValuesAsOutliersTipText()
    Returns the tip text for this property
java.lang.String    extremeValuesFactorTipText()
    Returns the tip text for this property
java.lang.String    getAttributeIndices()
    Gets the current range selection
Capabilities    getCapabilities()
    Returns the Capabilities of this filter.
boolean    getDetectionPerAttribute()
    Gets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
boolean    getExtremeValuesAsOutliers()
    Get whether extreme values are also tagged as outliers.
double    getExtremeValuesFactor()
    Gets the factor for determining the thresholds for extreme values.
java.lang.String[]    getOptions()
    Gets the current settings of the filter.
double    getOutlierFactor()
    Gets the factor for determining the thresholds for outliers.
boolean    getOutputOffsetMultiplier()
    Gets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier the value is off the median: value = median + 'multiplier' * IQR.
java.lang.String    getRevision()
    Returns the revision string.
java.lang.String    globalInfo()
    Returns a string describing this filter
java.util.Enumeration    listOptions()
    Returns an enumeration describing the available options.
static void    main(java.lang.String[] args)
    Main method for testing this class.
java.lang.String    outlierFactorTipText()
    Returns the tip text for this property
java.lang.String    outputOffsetMultiplierTipText()
    Returns the tip text for this property
void    setAttributeIndices(java.lang.String value)
    Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
void    setAttributeIndicesArray(int[] value)
    Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
void    setDetectionPerAttribute(boolean value)
    Set whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
void    setExtremeValuesAsOutliers(boolean value)
    Set whether extreme values are also tagged as outliers.
void    setExtremeValuesFactor(double value)
    Sets the factor for determining the thresholds for extreme values.
void    setOptions(java.lang.String[] options)
    Parses a list of options for this object.
void    setOutlierFactor(double value)
    Sets the factor for determining the thresholds for outliers.
void    setOutputOffsetMultiplier(boolean value)
    Set whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier the value is off the median: value = median + 'multiplier' * IQR.
-
Methods inherited from class weka.filters.SimpleBatchFilter
batchFinished, input
-
Methods inherited from class weka.filters.SimpleFilter
debugTipText, getDebug, setDebug, setInputFormat
-
Methods inherited from class weka.filters.Filter
batchFilterFile, filterFile, getCapabilities, getOutputFormat, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, numPendingOutput, output, outputPeek, toString, useFilter, wekaStaticWrapper
-
-
-
-
Field Detail
-
NON_NUMERIC
public static final int NON_NUMERIC
indicator for non-numeric attributes
- See Also:
- Constant Field Values
-
-
Method Detail
-
globalInfo
public java.lang.String globalInfo()
Returns a string describing this filter
- Specified by:
globalInfo
in class SimpleFilter
- Returns:
- a description of the filter suitable for displaying in the explorer/experimenter gui
-
listOptions
public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
listOptions
in interface OptionHandler
- Overrides:
listOptions
in class SimpleFilter
- Returns:
- an enumeration of all the available options.
-
setOptions
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Parses a list of options for this object. Valid options are:
-D Turns on output of debugging information.
-R <col1,col2-col4,...> Specifies the list of columns on which to base outlier/extreme value detection. If an instance is an outlier/extreme value in at least one of those attributes, it is tagged accordingly. 'first' and 'last' are valid indexes. (default: none)
-O <num> The factor for outlier detection. (default: 3)
-E <num> The factor for extreme values detection. (default: 2*Outlier Factor)
-E-as-O Tags extreme values also as outliers. (default: off)
-P Generates Outlier/ExtremeValue pair for each numeric attribute in the range, not just a single indicator pair for all the attributes. (default: off)
-M Generates an additional attribute 'Offset' per Outlier/ExtremeValue pair that contains the multiplier by which the value is off the median: value = median + 'multiplier' * IQR. Note: implicitly sets '-P'. (default: off)
- Specified by:
setOptions
in interface OptionHandler
- Overrides:
setOptions
in class SimpleFilter
- Parameters:
options
- the list of options as an array of strings
- Throws:
java.lang.Exception
- if an option is not supported
- See Also:
SimpleFilter.reset()
-
getOptions
public java.lang.String[] getOptions()
Gets the current settings of the filter.
- Specified by:
getOptions
in interface OptionHandler
- Overrides:
getOptions
in class SimpleFilter
- Returns:
- an array of strings suitable for passing to setOptions
-
attributeIndicesTipText
public java.lang.String attributeIndicesTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getAttributeIndices
public java.lang.String getAttributeIndices()
Gets the current range selection
- Returns:
- a string containing a comma separated list of ranges
-
setAttributeIndices
public void setAttributeIndices(java.lang.String value)
Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
- Parameters:
value
- a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1.
e.g.: first-3,5,6-last
- Throws:
java.lang.IllegalArgumentException
- if an invalid range list is supplied
-
setAttributeIndicesArray
public void setAttributeIndicesArray(int[] value)
Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
- Parameters:
value
- an array containing indexes of attributes to work on. Since the array will typically come from a program, attributes are indexed from 0.
- Throws:
java.lang.IllegalArgumentException
- if an invalid set of ranges is supplied
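The two setters differ in index base: the range string is 1-based, the array is 0-based. The sketch below shows the correspondence by expanding a range string into 0-based indices. It is a simplified stand-in written for illustration, not Weka's own range parser; the class `RangeIndexSketch`, its methods, and the `numAttributes` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeIndexSketch {
    /** Expands a 1-based range string such as "first-3,5,6-last" into 0-based indices. */
    public static int[] toZeroBased(String ranges, int numAttributes) {
        List<Integer> out = new ArrayList<>();
        for (String part : ranges.split(",")) {
            String[] ends = part.trim().split("-");
            int lo = parseIndex(ends[0], numAttributes);
            int hi = ends.length > 1 ? parseIndex(ends[1], numAttributes) : lo;
            if (lo > hi) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            for (int i = lo; i <= hi; i++) {
                out.add(i - 1); // shift from user-facing 1-based to program-facing 0-based
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Resolves 'first', 'last', or a number to a 1-based index and checks its bounds. */
    private static int parseIndex(String s, int numAttributes) {
        String t = s.trim();
        int idx;
        if (t.equals("first")) {
            idx = 1;
        } else if (t.equals("last")) {
            idx = numAttributes;
        } else {
            idx = Integer.parseInt(t); // NumberFormatException is an IllegalArgumentException
        }
        if (idx < 1 || idx > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + s);
        }
        return idx;
    }

    public static void main(String[] args) {
        // With 8 attributes, "first-3,5,6-last" selects 1-3, 5, and 6-8 (1-based).
        System.out.println(java.util.Arrays.toString(toZeroBased("first-3,5,6-last", 8)));
        // prints [0, 1, 2, 4, 5, 6, 7]
    }
}
```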
-
outlierFactorTipText
public java.lang.String outlierFactorTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setOutlierFactor
public void setOutlierFactor(double value)
Sets the factor for determining the thresholds for outliers.
- Parameters:
value
- the factor.
-
getOutlierFactor
public double getOutlierFactor()
Gets the factor for determining the thresholds for outliers.
- Returns:
- the factor.
-
extremeValuesFactorTipText
public java.lang.String extremeValuesFactorTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setExtremeValuesFactor
public void setExtremeValuesFactor(double value)
Sets the factor for determining the thresholds for extreme values.
- Parameters:
value
- the factor.
-
getExtremeValuesFactor
public double getExtremeValuesFactor()
Gets the factor for determining the thresholds for extreme values.
- Returns:
- the factor.
-
extremeValuesAsOutliersTipText
public java.lang.String extremeValuesAsOutliersTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setExtremeValuesAsOutliers
public void setExtremeValuesAsOutliers(boolean value)
Sets whether extreme values are also tagged as outliers.
- Parameters:
value
- whether or not to tag extreme values also as outliers.
-
getExtremeValuesAsOutliers
public boolean getExtremeValuesAsOutliers()
Gets whether extreme values are also tagged as outliers.
- Returns:
- true if extreme values are also tagged as outliers.
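By the range definitions, an extreme value falls outside the outlier interval, so it is not an outlier unless this flag is on. A minimal sketch of how the flag could affect the two indicators follows. The class `ExtremeAsOutlierSketch` and the method `flags` are hypothetical, the quartiles are supplied by the caller, and the filter's real output is a pair of indicator attributes rather than a boolean array.

```java
public class ExtremeAsOutlierSketch {
    /**
     * Returns {outlier, extreme} for a single value.
     * With extremeAsOutlier = false, the two flags are mutually exclusive.
     * With extremeAsOutlier = true, every extreme value is also flagged as an outlier.
     */
    public static boolean[] flags(double x, double q1, double q3,
                                  double of, double evf, boolean extremeAsOutlier) {
        double iqr = q3 - q1;
        boolean extreme = x > q3 + evf * iqr || x < q1 - evf * iqr;
        boolean beyondOutlierThreshold = x > q3 + of * iqr || x < q1 - of * iqr;
        boolean outlier = beyondOutlierThreshold && (!extreme || extremeAsOutlier);
        return new boolean[] { outlier, extreme };
    }

    public static void main(String[] args) {
        // Q1 = 10, Q3 = 20, OF = 3, EVF = 6: values above 80 are extreme.
        boolean[] off = flags(100, 10, 20, 3, 6, false);
        boolean[] on = flags(100, 10, 20, 3, 6, true);
        System.out.println(off[0] + " " + off[1]); // false true
        System.out.println(on[0] + " " + on[1]);   // true true
    }
}
```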
-
detectionPerAttributeTipText
public java.lang.String detectionPerAttributeTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setDetectionPerAttribute
public void setDetectionPerAttribute(boolean value)
Sets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
- Parameters:
value
- whether or not to generate indicator attribute pairs for each numeric attribute.
-
getDetectionPerAttribute
public boolean getDetectionPerAttribute()
Gets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
- Returns:
- true if indicator attribute pairs are generated for each numeric attribute.
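The difference between the two modes can be sketched as follows, assuming each selected numeric attribute has already been checked individually. In single-pair mode an instance is tagged if at least one attribute is flagged, as the -R option describes. The class `IndicatorPairSketch` and the method `indicators` are hypothetical, and only one of the two indicators (for example Outlier) is modelled.

```java
public class IndicatorPairSketch {
    /**
     * flaggedPerAttribute[i] says whether attribute i is flagged for this instance.
     * Per-attribute mode keeps one indicator per attribute;
     * single-pair mode collapses them with a logical OR.
     */
    public static boolean[] indicators(boolean[] flaggedPerAttribute, boolean perAttribute) {
        if (perAttribute) {
            return flaggedPerAttribute.clone();
        }
        boolean any = false;
        for (boolean flagged : flaggedPerAttribute) {
            any |= flagged;
        }
        return new boolean[] { any };
    }

    public static void main(String[] args) {
        boolean[] flagged = { false, true, false };
        System.out.println(indicators(flagged, true).length);  // 3
        System.out.println(indicators(flagged, false)[0]);     // true
    }
}
```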
-
outputOffsetMultiplierTipText
public java.lang.String outputOffsetMultiplierTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setOutputOffsetMultiplier
public void setOutputOffsetMultiplier(boolean value)
Sets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier by which the value is off the median: value = median + 'multiplier' * IQR.
- Parameters:
value
- whether or not to generate the additional attribute.
-
getOutputOffsetMultiplier
public boolean getOutputOffsetMultiplier()
Gets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier by which the value is off the median: value = median + 'multiplier' * IQR.
- Returns:
- true if the additional attribute is generated.
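Solving the stated relation value = median + multiplier * IQR for the multiplier gives multiplier = (value - median) / IQR. The sketch below applies that rearrangement. The class `OffsetMultiplierSketch` and the method `multiplier` are hypothetical, and how the filter itself treats an IQR of zero is not covered by this page, so the sketch simply lets the division produce an infinite or NaN result in that case.

```java
public class OffsetMultiplierSketch {
    /** Rearranges value = median + multiplier * IQR to recover the multiplier. */
    public static double multiplier(double value, double median, double iqr) {
        return (value - median) / iqr;
    }

    public static void main(String[] args) {
        // median = 15, IQR = 10: a value of 45 lies 3 IQRs above the median.
        System.out.println(multiplier(45, 15, 10)); // 3.0
        // A value below the median yields a negative multiplier.
        System.out.println(multiplier(5, 15, 10));  // -1.0
    }
}
```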
-
getCapabilities
public Capabilities getCapabilities()
Returns the Capabilities of this filter.
- Specified by:
getCapabilities
in interface CapabilitiesHandler
- Overrides:
getCapabilities
in class Filter
- Returns:
- the capabilities of this object
- See Also:
Capabilities
-
getRevision
public java.lang.String getRevision()
Returns the revision string.
- Specified by:
getRevision
in interface RevisionHandler
- Overrides:
getRevision
in class Filter
- Returns:
- the revision
-
main
public static void main(java.lang.String[] args)
Main method for testing this class.
- Parameters:
args
- should contain arguments to the filter: use -h for help
-
-