Interface Summary | |
Appliable | |
Processor | |
Tokenizer | A tokenizer has methods for tokenizing a String, Reader, or InputStream as words. |
Class Summary | |
AbstractTokenizer | Abstract tokenizer. |
CollectionProcessor | |
Lowercaser | Processor whose process method converts a collection of mixed-case Words to a collection of lowercase Words. |
NumberFilter | Filter which converts numbers to the word "*NUMBER*". |
PTBTokenizer | |
SentenceBoundaryDetector | A (terrible) SentenceBoundaryDetector built to test the Processor interface. |
SimpleTokenizer | A tokenizer has methods for tokenizing a String as words or sentences. |
Stemmer | Stemmer implementing the Porter stemming algorithm. The Stemmer class transforms a word into its root form. |
StopList | Simple stoplist class. |
StoplistFilter | Filter which removes stop-listed words. |
Contains classes for processing documents. The key here is the Processor
interface, whose sole method, Document process(Document), takes a document
and returns another document, which may be parsed, stoplisted, stemmed, etc.
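
The single-method contract described above lends itself to chaining: each step takes a document and hands a new one to the next step. The following is a minimal sketch of that idea, not the package's actual code. The Document class here is a hypothetical stand-in (a plain list of words), and LowercaseStep, StopwordStep, and runAll are illustrative names that do not appear in this package; only the Document process(Document) shape is taken from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ProcessorSketch {

    // Hypothetical stand-in for the package's Document type: just a list of words.
    static final class Document {
        final List<String> words;

        Document(List<String> words) {
            this.words = new ArrayList<>(words);
        }
    }

    // Mirrors the single-method shape described above: Document process(Document).
    interface Processor {
        Document process(Document in);
    }

    // Illustrative lowercasing step (assumed behavior, not the package's Lowercaser).
    static final class LowercaseStep implements Processor {
        @Override
        public Document process(Document in) {
            List<String> out = new ArrayList<>();
            for (String w : in.words) {
                out.add(w.toLowerCase(Locale.ROOT));
            }
            return new Document(out);
        }
    }

    // Illustrative stop-word removal step (assumed behavior, not the package's StoplistFilter).
    static final class StopwordStep implements Processor {
        private final Set<String> stopWords;

        StopwordStep(Set<String> stopWords) {
            this.stopWords = stopWords;
        }

        @Override
        public Document process(Document in) {
            List<String> out = new ArrayList<>();
            for (String w : in.words) {
                if (!stopWords.contains(w)) {
                    out.add(w);
                }
            }
            return new Document(out);
        }
    }

    // Applies the processors in order, feeding each output into the next step.
    static Document runAll(Document doc, Processor... steps) {
        Document current = doc;
        for (Processor step : steps) {
            current = step.process(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Document in = new Document(Arrays.asList("The", "Quick", "Fox"));
        Set<String> stop = new HashSet<>(Arrays.asList("the"));
        Document out = runAll(in, new LowercaseStep(), new StopwordStep(stop));
        System.out.println(out.words); // prints [quick, fox]
    }
}
```

Each step in this sketch returns a fresh Document rather than modifying its input, which matches the "returns another document" wording. Note that the order of steps matters: lowercasing before stop-word removal lets a lowercase stop list match "The".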