Package weka.core
Class Stopwords
java.lang.Object
    weka.core.Stopwords

All Implemented Interfaces:
RevisionHandler
public class Stopwords extends java.lang.Object implements RevisionHandler
Class that can test whether a given string is a stop word. All words are converted to lower case before the test. The format for reading and writing is one word per line; lines starting with '#' are interpreted as comments and are therefore skipped. The default stopwords are based on Rainbow.

Accepts the following parameters:
-i file
    loads the stopwords from the given file
-o file
    saves the stopwords to the given file
-p
    outputs the current stopwords on stdout

Any additional parameters are interpreted as words to test as stopwords.

Version:
$Revision: 1.6 $
Author:
Eibe Frank (eibe@cs.waikato.ac.nz), Ashraf M. Kibriya (amk14@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)
-
Method Summary
Modifier and Type       Method                                  Description
void                    add(java.lang.String word)              Adds the given word to the stopword list (it is automatically converted to lower case and trimmed).
void                    clear()                                 Removes all stopwords.
java.util.Enumeration   elements()                              Returns a sorted enumeration over all stored stopwords.
java.lang.String        getRevision()                           Returns the revision string.
boolean                 is(java.lang.String word)               Returns true if the given string is a stop word.
static boolean          isStopword(java.lang.String str)        Returns true if the given string is a stop word.
static void             main(java.lang.String[] args)           Accepts the parameters -i file, -o file and -p; any additional parameters are tested as stopwords.
void                    read(java.io.BufferedReader reader)     Loads the stopwords from the given reader.
void                    read(java.io.File file)                 Loads the stopwords from the given file.
void                    read(java.lang.String filename)         Loads the stopwords from the given file.
boolean                 remove(java.lang.String word)           Removes the word from the stopword list.
java.lang.String        toString()                              Returns the current stopwords in a string.
void                    write(java.io.BufferedWriter writer)    Writes the current stopwords to the given writer.
void                    write(java.io.File file)                Writes the current stopwords to the given file.
void                    write(java.lang.String filename)        Writes the current stopwords to the given file.
-
Constructor Detail
-
Stopwords
public Stopwords()
Initializes the stopwords (based on Rainbow).
-
Method Detail
-
clear
public void clear()
Removes all stopwords.
-
add
public void add(java.lang.String word)
Adds the given word to the stopword list. The word is automatically converted to lower case and trimmed.
Parameters:
word - the word to add
-
remove
public boolean remove(java.lang.String word)
Removes the word from the stopword list.
Parameters:
word - the word to remove
Returns:
true if the word was found in the list and then removed
-
is
public boolean is(java.lang.String word)
Returns true if the given string is a stop word.
Parameters:
word - the word to test
Returns:
true if the word is a stopword
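The behaviour documented for add, remove and is can be illustrated with a small stand-in class. This is only a sketch under stated assumptions, not the Weka implementation: the class name StopwordSetSketch is hypothetical, and ignoring blank input and normalising the argument of remove are assumptions made here for illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for illustration; not the Weka class itself.
class StopwordSetSketch {
    private final Set<String> words = new HashSet<>();

    // Normalise the way add() is documented to: trim, then lower-case.
    private static String normalize(String word) {
        return word.trim().toLowerCase(Locale.ROOT);
    }

    void add(String word) {
        String w = normalize(word);
        if (!w.isEmpty()) {          // assumption: blank input is ignored
            words.add(w);
        }
    }

    // Returns true only if the word was present and has been removed.
    // Assumption: the argument is normalised like in add().
    boolean remove(String word) {
        return words.remove(normalize(word));
    }

    // Lower-cases the query before the lookup, as the class description states.
    boolean is(String word) {
        return words.contains(word.toLowerCase(Locale.ROOT));
    }
}
```

With this sketch, adding "  The " stores "the", so a later test for "THE" succeeds, and a second remove of the same word returns false.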
-
elements
public java.util.Enumeration elements()
Returns a sorted enumeration over all stored stopwords.
Returns:
the enumeration over all stopwords
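One plausible way to obtain a sorted enumeration from an unordered word set is to copy it into a TreeSet and wrap it with Collections.enumeration. The sketch below assumes natural String ordering; the class and method names are hypothetical and the approach is not confirmed to be the one Weka uses.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Weka.
class SortedEnumerationSketch {
    // Copies the words into a TreeSet so they come out in natural String order.
    static Enumeration<String> sortedElements(Set<String> words) {
        return Collections.enumeration(new TreeSet<>(words));
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<>();
        words.add("the");
        words.add("a");
        words.add("of");
        Enumeration<String> e = sortedElements(words);
        while (e.hasMoreElements()) {
            System.out.println(e.nextElement());   // prints a, of, the (one per line)
        }
    }
}
```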
-
read
public void read(java.lang.String filename) throws java.lang.Exception
Loads the stopwords from the given file.
Parameters:
filename - the file to read the stopwords from
Throws:
java.lang.Exception - if reading fails
-
read
public void read(java.io.File file) throws java.lang.Exception
Loads the stopwords from the given file.
Parameters:
file - the file to read the stopwords from
Throws:
java.lang.Exception - if reading fails
-
read
public void read(java.io.BufferedReader reader) throws java.lang.Exception
Loads the stopwords from the given reader. The reader is closed automatically.
Parameters:
reader - the reader to get the stopwords from
Throws:
java.lang.Exception - if reading fails
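The documented file format (one word per line, lines starting with '#' skipped) and the automatic closing of the reader can be sketched as follows. This is a minimal illustration, not the Weka source: the class name is hypothetical, skipping blank lines is an assumption, and the sketch returns a new set instead of modifying a Stopwords object.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of Weka.
class StopwordReadSketch {
    static Set<String> read(BufferedReader reader) throws IOException {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader in = reader) {       // closes the reader when done
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;                    // skip blank lines (assumption) and comments
                }
                words.add(line.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        String text = "# sample list\nThe\n  of  \n\nAND\n";
        // prints [the, of, and]
        System.out.println(read(new BufferedReader(new StringReader(text))));
    }
}
```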
-
write
public void write(java.lang.String filename) throws java.lang.Exception
Writes the current stopwords to the given file.
Parameters:
filename - the file to write the stopwords to
Throws:
java.lang.Exception - if writing fails
-
write
public void write(java.io.File file) throws java.lang.Exception
Writes the current stopwords to the given file.
Parameters:
file - the file to write the stopwords to
Throws:
java.lang.Exception - if writing fails
-
write
public void write(java.io.BufferedWriter writer) throws java.lang.Exception
Writes the current stopwords to the given writer. The writer is closed automatically.
Parameters:
writer - the writer to write the stopwords to
Throws:
java.lang.Exception - if writing fails
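Writing in the documented format might look like the sketch below: one word per line, with the writer closed at the end. The class name and the leading '#' header line are hypothetical choices for illustration (a reader following the documented format would skip that line as a comment). Whether Weka itself emits such a header is not stated here.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Weka.
class StopwordWriteSketch {
    static void write(Set<String> words, BufferedWriter writer) throws IOException {
        try (BufferedWriter out = writer) {      // flushes and closes the writer
            out.write("# stopword list");        // hypothetical header comment
            out.newLine();
            for (String w : new TreeSet<>(words)) {   // sorted, one word per line
                out.write(w);
                out.newLine();
            }
        }
    }
}
```

Passing a BufferedWriter that wraps a java.io.StringWriter is a convenient way to inspect the output without touching the file system.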
-
toString
public java.lang.String toString()
Returns the current stopwords in a string.
Overrides:
toString in class java.lang.Object
Returns:
the current stopwords
-
isStopword
public static boolean isStopword(java.lang.String str)
Returns true if the given string is a stop word.
Parameters:
str - the word to test
Returns:
true if the word is a stopword
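Because isStopword is static, it cannot consult the word list of a particular instance. A common pattern for such a helper is a lookup against a shared default list, sketched below. This is an assumption about the design, not a statement of how Weka implements it; the class name and the five-word list are hypothetical (the real defaults, based on Rainbow, are much longer).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for illustration; not the Weka class itself.
class StaticStopwordSketch {
    // Tiny illustrative default list shared by all callers.
    private static final Set<String> DEFAULTS =
        new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

    // Lower-cases the input, then checks it against the shared defaults.
    static boolean isStopword(String str) {
        return DEFAULTS.contains(str.toLowerCase(Locale.ROOT));
    }
}
```

Under this assumption, words added to an individual instance would not affect the static test; use the instance method is when a customised list matters.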
-
getRevision
public java.lang.String getRevision()
Returns the revision string.
Specified by:
getRevision in interface RevisionHandler
Returns:
the revision
-
main
public static void main(java.lang.String[] args) throws java.lang.Exception
Accepts the following parameters:
-i file
    loads the stopwords from the given file
-o file
    saves the stopwords to the given file
-p
    outputs the current stopwords on stdout

Any additional parameters are interpreted as words to test as stopwords.
Parameters:
args - commandline parameters
Throws:
java.lang.Exception - if something goes wrong
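The command-line contract described above (-i and -o each take a file name, -p is a switch, everything else is a word to test) can be sketched with a small parser. The class, field and method names are hypothetical, and rejecting a missing file name with an IllegalArgumentException is an assumption; the sketch only parses the arguments and does not reproduce what main does with them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical option holder for illustration; not part of Weka.
class StopwordArgsSketch {
    String inputFile;            // value of -i, or null if absent
    String outputFile;           // value of -o, or null if absent
    boolean print;               // true if -p was given
    final List<String> wordsToTest = new ArrayList<>();

    static StopwordArgsSketch parse(String[] args) {
        StopwordArgsSketch opts = new StopwordArgsSketch();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-i":
                    opts.inputFile = requireValue(args, ++i, "-i");
                    break;
                case "-o":
                    opts.outputFile = requireValue(args, ++i, "-o");
                    break;
                case "-p":
                    opts.print = true;
                    break;
                default:
                    // Any additional parameter is a word to test.
                    opts.wordsToTest.add(args[i]);
            }
        }
        return opts;
    }

    // Assumption: a flag without a following file name is an error.
    private static String requireValue(String[] args, int i, String flag) {
        if (i >= args.length) {
            throw new IllegalArgumentException("Missing file name after " + flag);
        }
        return args[i];
    }
}
```

For example, the arguments -i in.txt -p the Weka would yield inputFile "in.txt", print set to true, and the two words "the" and "Weka" queued for testing.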