Package edu.stanford.nlp.process

Contains classes for processing documents.


Interface Summary
Tokenizer                 A tokenizer has methods for tokenizing a String, Reader, or InputStream as words.

Class Summary
AbstractTokenizer         Abstract tokenizer.
Lowercaser                Processor whose process method converts a collection of mixed-case Words to a collection of lowercase Words.
NumberFilter              Filter which converts numbers to the word "*NUMBER*".
SentenceBoundaryDetector  A (terrible) SentenceBoundaryDetector built to test the Processor interface.
SimpleTokenizer           A tokenizer with methods for tokenizing a String as words or sentences.
Stemmer                   Stemmer, implementing the Porter Stemming Algorithm. The Stemmer class transforms a word into its root form.
StopList                  Simple stoplist class.
StoplistFilter            Filter which removes stop-listed words.

Package edu.stanford.nlp.process Description

Contains classes for processing documents. The key abstraction is the Processor interface, whose sole method, Document process(Document), takes a document and returns another document holding the processed result. Processing may include parsing, stoplisting, stemming, etc.
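
The one-method contract described above makes processing steps easy to chain, since each step consumes the document produced by the previous one. The following is a minimal sketch of that idea, not the package's actual code: Doc, DocProcessor, LowercaseStep, StopwordStep, and ProcessorSketch are hypothetical stand-ins written for this illustration, and Doc is assumed to be nothing more than an ordered list of word strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a document: an ordered list of word strings.
final class Doc {
    final List<String> words;

    Doc(List<String> words) {
        this.words = new ArrayList<>(words); // defensive copy
    }
}

// Hypothetical interface with the same shape as the contract described
// above: one document in, one processed document out.
interface DocProcessor {
    Doc process(Doc in);
}

// Illustrative step: lowercases every word.
final class LowercaseStep implements DocProcessor {
    @Override
    public Doc process(Doc in) {
        List<String> out = new ArrayList<>();
        for (String w : in.words) {
            out.add(w.toLowerCase(Locale.ROOT));
        }
        return new Doc(out);
    }
}

// Illustrative step: drops every word found in a stop set.
final class StopwordStep implements DocProcessor {
    private final Set<String> stopWords;

    StopwordStep(Set<String> stopWords) {
        this.stopWords = new HashSet<>(stopWords);
    }

    @Override
    public Doc process(Doc in) {
        List<String> out = new ArrayList<>();
        for (String w : in.words) {
            if (!stopWords.contains(w)) {
                out.add(w);
            }
        }
        return new Doc(out);
    }
}

final class ProcessorSketch {
    // Applies the steps in order, feeding each result into the next step.
    static Doc runAll(Doc doc, DocProcessor... steps) {
        Doc current = doc;
        for (DocProcessor step : steps) {
            current = step.process(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Doc raw = new Doc(Arrays.asList("The", "Cat", "sat", "ON", "the", "mat"));
        Set<String> stop = new HashSet<>(Arrays.asList("the", "on"));
        Doc result = runAll(raw, new LowercaseStep(), new StopwordStep(stop));
        System.out.println(result.words); // prints [cat, sat, mat]
    }
}
```

Order matters in this sketch: the stop set holds lowercase entries, so the lowercasing step has to run before the stop-word step for "The" and "ON" to be removed.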

Sepandar David Kamvar
Last modified: Thu Oct 31 11:14:34 PST 2002

Stanford NLP Group