| Interface Summary | |
| Appliable | |
| Processor | |
| Tokenizer | A tokenizer has methods for tokenizing a String, Reader, or InputStream as words. |
| Class Summary | |
| AbstractTokenizer | Abstract tokenizer. |
| CollectionProcessor | |
| Lowercaser | Processor whose process method converts a collection of mixed-case Words to a collection of lowercase Words. |
| NumberFilter | Filter which converts numbers to the word "*NUMBER*". |
| PTBTokenizer | |
| SentenceBoundaryDetector | A (terrible) SentenceBoundaryDetector built to test the Processor interface. |
| SimpleTokenizer | A tokenizer has methods for tokenizing a String as words or sentences. |
| Stemmer | Stemmer implementing the Porter stemming algorithm. The Stemmer class transforms a word into its root form. |
| StopList | Simple stoplist class. |
| StoplistFilter | Filter which removes stop-listed words. |
Contains classes for processing documents. The central abstraction is the
Processor interface, whose sole method, Document process(Document),
takes a document and returns another document, which may have been
parsed, stoplisted, stemmed, and so on.
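
Because each Processor takes a document and returns a document, processors can be chained one after another. The following is a minimal sketch of that idea, not the package's actual code. The `Document` class here is a hypothetical stand-in (a plain list of words), and `LowercaseStep`, `StopwordStep`, and `runAll` are illustrative names that only loosely mirror Lowercaser and StoplistFilter. Only the `Document process(Document)` signature is taken from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only: every type below is a hypothetical stand-in,
// not one of the package's real classes.
final class ProcessorSketch {

    /** Hypothetical document type: an immutable, ordered list of words. */
    static final class Document {
        final List<String> words;

        Document(List<String> words) {
            this.words = Collections.unmodifiableList(new ArrayList<>(words));
        }
    }

    /** The single-method contract described in the package text. */
    interface Processor {
        Document process(Document in);
    }

    /** Assumed behaviour: returns a new document with every word lowercased. */
    static final class LowercaseStep implements Processor {
        @Override
        public Document process(Document in) {
            List<String> out = new ArrayList<>();
            for (String w : in.words) {
                out.add(w.toLowerCase(Locale.ROOT));
            }
            return new Document(out);
        }
    }

    /** Assumed behaviour: returns a new document without the listed stop words. */
    static final class StopwordStep implements Processor {
        private final Set<String> stopWords;

        StopwordStep(String... stopWords) {
            this.stopWords = new HashSet<>(Arrays.asList(stopWords));
        }

        @Override
        public Document process(Document in) {
            List<String> out = new ArrayList<>();
            for (String w : in.words) {
                if (!stopWords.contains(w)) {
                    out.add(w);
                }
            }
            return new Document(out);
        }
    }

    /** Applies the processors in order, feeding each output into the next step. */
    static Document runAll(Document doc, Processor... steps) {
        Document current = doc;
        for (Processor step : steps) {
            current = step.process(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Document in = new Document(Arrays.asList("The", "Quick", "Fox"));
        Document out = runAll(in, new LowercaseStep(), new StopwordStep("the"));
        System.out.println(out.words); // prints [quick, fox]
    }
}
```

Order matters in this sketch: the stop-word step compares exact strings, so it only removes "The" because the lowercasing step ran first. Each step returns a new Document instead of mutating its input, which keeps the original available to the caller.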