Package weka.datagenerators.clusterers
Class BIRCHCluster
- java.lang.Object
  - weka.datagenerators.DataGenerator
    - weka.datagenerators.ClusterGenerator
      - weka.datagenerators.clusterers.BIRCHCluster
- All Implemented Interfaces:
java.io.Serializable, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler
public class BIRCHCluster extends ClusterGenerator implements TechnicalInformationHandler
Cluster data generator designed for the BIRCH System.
The dataset is generated with instances in K clusters.
Instances are 2-d data points.
Each cluster is characterized by the number of data points in it, its radius and its center. The location of the cluster centers is determined by the pattern parameter. Three patterns are currently supported: grid, sine and random.
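The description above can be illustrated with a minimal, self-contained sketch of the grid pattern. It is not the code used by BIRCHCluster: the class name GridClusterSketch, the row-by-row grid layout, the spacing parameter and the uniform sampling inside a disc are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; this is NOT the implementation inside weka's BIRCHCluster.
final class GridClusterSketch {

    /** Places k centers row by row on a square grid (assumed layout and spacing). */
    static double[][] gridCenters(int k, double spacing) {
        int side = (int) Math.ceil(Math.sqrt(k));
        double[][] centers = new double[k][2];
        for (int i = 0; i < k; i++) {
            centers[i][0] = (i % side) * spacing;
            centers[i][1] = (i / side) * spacing;
        }
        return centers;
    }

    /** Draws n 2-d points uniformly from the disc of the given radius around a center. */
    static List<double[]> sampleCluster(double[] center, double radius, int n, Random rnd) {
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            double angle = 2 * Math.PI * rnd.nextDouble();
            // The square root keeps the point density uniform over the disc area.
            double dist = radius * Math.sqrt(rnd.nextDouble());
            points.add(new double[] {
                center[0] + dist * Math.cos(angle),
                center[1] + dist * Math.sin(angle)});
        }
        return points;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        // Four clusters, three points each, printed as x,y pairs.
        for (double[] center : gridCenters(4, 4.0)) {
            for (double[] p : sampleCluster(center, 1.0, 3, rnd)) {
                System.out.printf("%.3f,%.3f%n", p[0], p[1]);
            }
        }
    }
}
```

Each cluster in the sketch is fully described by the three quantities named above: its number of points, its radius and its center.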
For more information refer to:
Tian Zhang, Raghu Ramakrishnan, Miron Livny: BIRCH: An Efficient Data Clustering Method for Very Large Databases. In: ACM SIGMOD International Conference on Management of Data, 103-114, 1996.
BibTeX:
@inproceedings{Zhang1996,
   author = {Tian Zhang and Raghu Ramakrishnan and Miron Livny},
   booktitle = {ACM SIGMOD International Conference on Management of Data},
   pages = {103-114},
   publisher = {ACM Press},
   title = {BIRCH: An Efficient Data Clustering Method for Very Large Databases},
   year = {1996}
}
Valid options are:
-h Prints this help.
-o <file> The name of the output file, otherwise the generated data is printed to stdout.
-r <name> The name of the relation.
-d Whether to print debug information.
-S The seed for the random function (default 1).
-a <num> The number of attributes (default 10).
-c Class flag. If set, the cluster is listed in an extra attribute.
-b <range> The indices for boolean attributes.
-m <range> The indices for nominal attributes.
-k <num> The number of clusters (default 4)
-G Set pattern to grid (default is random). This flag cannot be used at the same time as flag I. The pattern is random if neither flag G nor flag I is set.
-I Set pattern to sine (default is random). This flag cannot be used at the same time as flag G. The pattern is random if neither flag G nor flag I is set.
-N <num>..<num> The range of number of instances per cluster (default 1..50). Lower number must be between 0 and 2500, upper number must be between 50 and 2500.
-R <num>..<num> The range of radius per cluster (default 0.1..1.4142135623730951). Lower number must be between 0 and SQRT(2), upper number must be between SQRT(2) and SQRT(32).
-M <num> The distance multiplier (default 4.0).
-C <num> The number of cycles (default 4).
-O Flag for input order ORDERED. If the flag is not set, the input order is RANDOMIZED. RANDOMIZED is currently not implemented, so the input order is always ORDERED.
-P <num> The noise rate in percent (default 0.0). Can be between 0% and 30%. (Remark: The original algorithm only allows noise up to 10%.)
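The range options -N and -R take a value of the form lower..upper with the bounds listed above. The following sketch shows one way such a value could be parsed and checked. It is an illustration only, not the parsing code of BIRCHCluster: the class and method names are hypothetical, and treating the documented bounds as inclusive is an assumption.

```java
// Illustrative sketch only; the parsing inside weka's BIRCHCluster may differ.
final class RangeOptionSketch {

    /** Parses a "lower..upper" string into {lower, upper}. */
    static double[] parseRange(String value) {
        int sep = value.indexOf("..");
        if (sep < 0) {
            throw new IllegalArgumentException("Expected lower..upper but got: " + value);
        }
        double lower = Double.parseDouble(value.substring(0, sep));
        double upper = Double.parseDouble(value.substring(sep + 2));
        if (lower > upper) {
            throw new IllegalArgumentException("Lower bound exceeds upper bound: " + value);
        }
        return new double[] {lower, upper};
    }

    /** Bounds documented for -N: lower in [0, 2500], upper in [50, 2500] (inclusive by assumption). */
    static boolean isValidInstanceRange(double[] r) {
        return r[0] >= 0 && r[0] <= 2500 && r[1] >= 50 && r[1] <= 2500;
    }

    /** Bounds documented for -R: lower in [0, SQRT(2)], upper in [SQRT(2), SQRT(32)] (inclusive by assumption). */
    static boolean isValidRadiusRange(double[] r) {
        double sqrt2 = Math.sqrt(2);
        return r[0] >= 0 && r[0] <= sqrt2 && r[1] >= sqrt2 && r[1] <= Math.sqrt(32);
    }

    public static void main(String[] args) {
        // The documented defaults for -N and -R.
        System.out.println(isValidInstanceRange(parseRange("1..50")));
        System.out.println(isValidRadiusRange(parseRange("0.1..1.4142135623730951")));
    }
}
```

Note that the first ".." in a value such as 0.1..1.4 is the separator, because a single decimal point is never followed directly by another dot.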
- Version:
- $Revision: 1.8 $
- Author:
- Gabi Schmidberger (gabi@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
static int GRID - Constant set for choice of pattern.
static int ORDERED - Constant set for input order (option O).
static int RANDOM - Constant set for choice of pattern.
static int RANDOMIZED - Constant set for input order (default).
static int SINE - Constant set for choice of pattern.
static Tag[] TAGS_INPUTORDER - The input order tags.
static Tag[] TAGS_PATTERN - The pattern tags.
-
Constructor Summary
Constructors (Constructor, Description):
BIRCHCluster() - Initializes the generator with default values.
-
Method Summary
Methods (Modifier and Type, Method, Description):
Instances defineDataFormat() - Initializes the format for the dataset produced.
java.lang.String distMultTipText() - Returns the tip text for this property.
Instance generateExample() - Generate an example of the dataset.
Instances generateExamples() - Generate all examples of the dataset.
Instances generateExamples(java.util.Random random, Instances format) - Generate all examples of the dataset.
java.lang.String generateFinished() - Compiles documentation about the data generation after the generation process.
java.lang.String generateStart() - Compiles documentation about the data generation before the generation process.
double getDistMult() - Gets the distance multiplier.
SelectedTag getInputOrder() - Gets the input order.
int getMaxInstNum() - Gets the upper boundary for instances per cluster.
double getMaxRadius() - Gets the upper boundary for the radiuses of the clusters.
int getMinInstNum() - Gets the lower boundary for instances per cluster.
double getMinRadius() - Gets the lower boundary for the radiuses of the clusters.
double getNoiseRate() - Gets the percentage of noise set.
int getNumClusters() - Gets the number of clusters the dataset should have.
int getNumCycles() - Gets the number of cycles.
java.lang.String[] getOptions() - Gets the current settings of the data generator BIRCHCluster.
boolean getOrderedFlag() - Gets the ordered flag (option O).
SelectedTag getPattern() - Gets the pattern type.
java.lang.String getRevision() - Returns the revision string.
boolean getSingleModeFlag() - Gets the single mode flag.
TechnicalInformation getTechnicalInformation() - Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
java.lang.String globalInfo() - Returns a string describing this data generator.
java.lang.String inputOrderTipText() - Returns the tip text for this property.
java.util.Enumeration listOptions() - Returns an enumeration describing the available options.
static void main(java.lang.String[] args) - Main method for testing this class.
java.lang.String maxInstNumTipText() - Returns the tip text for this property.
java.lang.String maxRadiusTipText() - Returns the tip text for this property.
java.lang.String minInstNumTipText() - Returns the tip text for this property.
java.lang.String minRadiusTipText() - Returns the tip text for this property.
java.lang.String noiseRateTipText() - Returns the tip text for this property.
java.lang.String numClustersTipText() - Returns the tip text for this property.
java.lang.String numCyclesTipText() - Returns the tip text for this property.
java.lang.String patternTipText() - Returns the tip text for this property.
void setDistMult(double newDistMult) - Sets the distance multiplier.
void setInputOrder(SelectedTag value) - Sets the input order.
void setMaxInstNum(int newMaxInstNum) - Sets the upper boundary for instances per cluster.
void setMaxRadius(double newMaxRadius) - Sets the upper boundary for the radiuses of the clusters.
void setMinInstNum(int newMinInstNum) - Sets the lower boundary for instances per cluster.
void setMinRadius(double newMinRadius) - Sets the lower boundary for the radiuses of the clusters.
void setNoiseRate(double newNoiseRate) - Sets the percentage of noise set.
void setNumClusters(int numClusters) - Sets the number of clusters the dataset should have.
void setNumCycles(int newNumCycles) - Sets the number of cycles.
void setOptions(java.lang.String[] options) - Parses a list of options for this object.
void setPattern(SelectedTag value) - Sets the pattern type.
-
Methods inherited from class weka.datagenerators.ClusterGenerator
booleanColsTipText, classFlagTipText, getBooleanCols, getClassFlag, getNominalCols, getNumAttributes, nominalColsTipText, numAttributesTipText, setBooleanCols, setBooleanIndices, setClassFlag, setNominalCols, setNominalIndices, setNumAttributes
-
Methods inherited from class weka.datagenerators.DataGenerator
debugTipText, defaultOutput, formatTipText, getDatasetFormat, getDebug, getNumExamplesAct, getOutput, getRandom, getRelationName, getSeed, makeData, outputTipText, randomTipText, relationNameTipText, seedTipText, setDatasetFormat, setDebug, setOutput, setRandom, setRelationName, setSeed
-
Field Detail
-
GRID
public static final int GRID
Constant set for choice of pattern (option G).
- See Also:
- Constant Field Values
-
SINE
public static final int SINE
Constant set for choice of pattern (option I).
- See Also:
- Constant Field Values
-
RANDOM
public static final int RANDOM
Constant set for choice of pattern (default).
- See Also:
- Constant Field Values
-
TAGS_PATTERN
public static final Tag[] TAGS_PATTERN
the pattern tags
-
ORDERED
public static final int ORDERED
Constant set for input order (option O).
- See Also:
- Constant Field Values
-
RANDOMIZED
public static final int RANDOMIZED
Constant set for input order (default).
- See Also:
- Constant Field Values
-
TAGS_INPUTORDER
public static final Tag[] TAGS_INPUTORDER
the input order tags
-
-
Method Detail
-
globalInfo
public java.lang.String globalInfo()
Returns a string describing this data generator.
- Returns:
- a description of the data generator suitable for displaying in the explorer/experimenter gui
-
getTechnicalInformation
public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
- Specified by:
getTechnicalInformation
in interface TechnicalInformationHandler
- Returns:
- the technical information about this class
-
listOptions
public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
listOptions
in interface OptionHandler
- Overrides:
listOptions
in class ClusterGenerator
- Returns:
- an enumeration of all the available options
-
setOptions
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Parses a list of options for this object. Valid options are:
-h Prints this help.
-o <file> The name of the output file, otherwise the generated data is printed to stdout.
-r <name> The name of the relation.
-d Whether to print debug information.
-S The seed for the random function (default 1).
-a <num> The number of attributes (default 10).
-c Class flag. If set, the cluster is listed in an extra attribute.
-b <range> The indices for boolean attributes.
-m <range> The indices for nominal attributes.
-k <num> The number of clusters (default 4)
-G Set pattern to grid (default is random). This flag cannot be used at the same time as flag I. The pattern is random if neither flag G nor flag I is set.
-I Set pattern to sine (default is random). This flag cannot be used at the same time as flag G. The pattern is random if neither flag G nor flag I is set.
-N <num>..<num> The range of number of instances per cluster (default 1..50). Lower number must be between 0 and 2500, upper number must be between 50 and 2500.
-R <num>..<num> The range of radius per cluster (default 0.1..1.4142135623730951). Lower number must be between 0 and SQRT(2), upper number must be between SQRT(2) and SQRT(32).
-M <num> The distance multiplier (default 4.0).
-C <num> The number of cycles (default 4).
-O Flag for input order ORDERED. If the flag is not set, the input order is RANDOMIZED. RANDOMIZED is currently not implemented, so the input order is always ORDERED.
-P <num> The noise rate in percent (default 0.0). Can be between 0% and 30%. (Remark: The original algorithm only allows noise up to 10%.)
- Specified by:
setOptions
in interface OptionHandler
- Overrides:
setOptions
in class ClusterGenerator
- Parameters:
options
- the list of options as an array of strings
- Throws:
java.lang.Exception
- if an option is not supported
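The rule that flags G and I exclude each other, with random as the fallback, can be sketched as follows. This is a stand-alone illustration, not the option handling of BIRCHCluster: the class PatternFlagSketch and its enum are hypothetical, and rejecting the combination with an IllegalArgumentException is an assumption (setOptions itself is only documented to throw java.lang.Exception).

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names and exception type are assumptions.
final class PatternFlagSketch {

    /** Local stand-in for the pattern choice; not the int constants of BIRCHCluster. */
    enum Pattern { GRID, SINE, RANDOM }

    /** Picks the pattern from an option array, rejecting -G combined with -I. */
    static Pattern resolvePattern(String[] options) {
        List<String> opts = Arrays.asList(options);
        boolean grid = opts.contains("-G");
        boolean sine = opts.contains("-I");
        if (grid && sine) {
            throw new IllegalArgumentException("Flags -G and -I cannot be used at the same time");
        }
        if (grid) {
            return Pattern.GRID;
        }
        if (sine) {
            return Pattern.SINE;
        }
        // Neither flag set: the pattern is random.
        return Pattern.RANDOM;
    }

    public static void main(String[] args) {
        System.out.println(resolvePattern(new String[] {"-k", "4", "-G"}));
        System.out.println(resolvePattern(new String[] {"-k", "4"}));
    }
}
```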
-
getOptions
public java.lang.String[] getOptions()
Gets the current settings of the data generator BIRCHCluster.
- Specified by:
getOptions
in interface OptionHandler
- Overrides:
getOptions
in class ClusterGenerator
- Returns:
- an array of strings suitable for passing to setOptions
- See Also:
DataGenerator.removeBlacklist(String[])
-
setNumClusters
public void setNumClusters(int numClusters)
Sets the number of clusters the dataset should have.
- Parameters:
numClusters
- the new number of clusters
-
getNumClusters
public int getNumClusters()
Gets the number of clusters the dataset should have.
- Returns:
- the number of clusters the dataset should have
-
numClustersTipText
public java.lang.String numClustersTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getMinInstNum
public int getMinInstNum()
Gets the lower boundary for instances per cluster.
- Returns:
- the lower boundary for instances per cluster
-
setMinInstNum
public void setMinInstNum(int newMinInstNum)
Sets the lower boundary for instances per cluster.
- Parameters:
newMinInstNum
- new lower boundary for instances per cluster
-
minInstNumTipText
public java.lang.String minInstNumTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getMaxInstNum
public int getMaxInstNum()
Gets the upper boundary for instances per cluster.
- Returns:
- the upper boundary for instances per cluster
-
setMaxInstNum
public void setMaxInstNum(int newMaxInstNum)
Sets the upper boundary for instances per cluster.
- Parameters:
newMaxInstNum
- new upper boundary for instances per cluster
-
maxInstNumTipText
public java.lang.String maxInstNumTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getMinRadius
public double getMinRadius()
Gets the lower boundary for the radiuses of the clusters.
- Returns:
- the lower boundary for the radiuses of the clusters
-
setMinRadius
public void setMinRadius(double newMinRadius)
Sets the lower boundary for the radiuses of the clusters.
- Parameters:
newMinRadius
- new lower boundary for the radiuses of the clusters
-
minRadiusTipText
public java.lang.String minRadiusTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getMaxRadius
public double getMaxRadius()
Gets the upper boundary for the radiuses of the clusters.
- Returns:
- the upper boundary for the radiuses of the clusters
-
setMaxRadius
public void setMaxRadius(double newMaxRadius)
Sets the upper boundary for the radiuses of the clusters.
- Parameters:
newMaxRadius
- new upper boundary for the radiuses of the clusters
-
maxRadiusTipText
public java.lang.String maxRadiusTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getPattern
public SelectedTag getPattern()
Gets the pattern type.
- Returns:
- the current pattern type
-
setPattern
public void setPattern(SelectedTag value)
Sets the pattern type.
- Parameters:
value
- new pattern type
-
patternTipText
public java.lang.String patternTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getDistMult
public double getDistMult()
Gets the distance multiplier.
- Returns:
- the distance multiplier
-
setDistMult
public void setDistMult(double newDistMult)
Sets the distance multiplier.
- Parameters:
newDistMult
- new distance multiplier
-
distMultTipText
public java.lang.String distMultTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getNumCycles
public int getNumCycles()
Gets the number of cycles.
- Returns:
- the number of cycles
-
setNumCycles
public void setNumCycles(int newNumCycles)
Sets the number of cycles.
- Parameters:
newNumCycles
- new number of cycles
-
numCyclesTipText
public java.lang.String numCyclesTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getInputOrder
public SelectedTag getInputOrder()
Gets the input order.
- Returns:
- the current input order
-
setInputOrder
public void setInputOrder(SelectedTag value)
Sets the input order.
- Parameters:
value
- new input order
-
inputOrderTipText
public java.lang.String inputOrderTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getOrderedFlag
public boolean getOrderedFlag()
Gets the ordered flag (option O).
- Returns:
- true if ordered flag is set
-
getNoiseRate
public double getNoiseRate()
Gets the percentage of noise set.
- Returns:
- the percentage of noise set
-
setNoiseRate
public void setNoiseRate(double newNoiseRate)
Sets the percentage of noise set.
- Parameters:
newNoiseRate
- new percentage of noise
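As a rough illustration of what a noise rate in percent means, the sketch below checks the documented 0% to 30% range and converts the rate into a number of noise instances. The class NoiseRateSketch is hypothetical, and the conversion rule (percentage of the clustered instances, rounded down) is an assumption, not the formula used by BIRCHCluster.

```java
// Illustrative sketch only; the noise handling of weka's BIRCHCluster may differ.
final class NoiseRateSketch {

    /** Rejects rates outside the documented range of 0% to 30% (inclusive by assumption). */
    static void checkNoiseRate(double percent) {
        if (percent < 0.0 || percent > 30.0) {
            throw new IllegalArgumentException("Noise rate must be between 0 and 30: " + percent);
        }
    }

    /** Assumed rule: noise count is the given percentage of the clustered instances, rounded down. */
    static int noiseCount(int clusteredInstances, double percent) {
        checkNoiseRate(percent);
        return (int) Math.floor(clusteredInstances * percent / 100.0);
    }

    public static void main(String[] args) {
        System.out.println(noiseCount(200, 10.0));
    }
}
```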
-
noiseRateTipText
public java.lang.String noiseRateTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getSingleModeFlag
public boolean getSingleModeFlag()
Gets the single mode flag.
- Specified by:
getSingleModeFlag
in class DataGenerator
- Returns:
- true if the method generateExample can be used
-
defineDataFormat
public Instances defineDataFormat() throws java.lang.Exception
Initializes the format for the dataset produced.
- Overrides:
defineDataFormat
in class DataGenerator
- Returns:
- the output data format
- Throws:
java.lang.Exception
- data format could not be defined
- See Also:
DataGenerator.defaultRelationName()
-
generateExample
public Instance generateExample() throws java.lang.Exception
Generate an example of the dataset.
- Specified by:
generateExample
in class DataGenerator
- Returns:
- the instance generated
- Throws:
java.lang.Exception
- if the format is not defined or if generating examples one by one is not possible, because voting is chosen
-
generateExamples
public Instances generateExamples() throws java.lang.Exception
Generate all examples of the dataset.
- Specified by:
generateExamples
in class DataGenerator
- Returns:
- the instances generated
- Throws:
java.lang.Exception
- if format not defined
-
generateExamples
public Instances generateExamples(java.util.Random random, Instances format) throws java.lang.Exception
Generate all examples of the dataset.
- Parameters:
random
- the random number generator to use
format
- the dataset format
- Returns:
- the instances generated
- Throws:
java.lang.Exception
- if format not defined
-
generateFinished
public java.lang.String generateFinished() throws java.lang.Exception
Compiles documentation about the data generation after the generation process.
- Specified by:
generateFinished
in class DataGenerator
- Returns:
- string with additional information about generated dataset
- Throws:
java.lang.Exception
- no input structure has been defined
-
generateStart
public java.lang.String generateStart()
Compiles documentation about the data generation before the generation process.
- Specified by:
generateStart
in class DataGenerator
- Returns:
- string with additional information
-
getRevision
public java.lang.String getRevision()
Returns the revision string.
- Specified by:
getRevision
in interface RevisionHandler
- Returns:
- the revision
-
main
public static void main(java.lang.String[] args)
Main method for testing this class.
- Parameters:
args
- should contain arguments for the data producer: