Package edu.stanford.nlp.process

Contains classes for processing documents.

Interface Summary
Appliable  
Processor  
Tokenizer A tokenizer has methods for tokenizing a String, Reader, or InputStream as words.
 

Class Summary
AbstractTokenizer Abstract tokenizer.
CollectionProcessor  
Lowercaser Processor whose process method converts a collection of mixed-case Words to a collection of lowercase Words.
NumberFilter Filter which converts numbers to the word "*NUMBER*".
PTBTokenizer  
SentenceBoundaryDetector A (terrible) SentenceBoundaryDetector built to test the Processor interface.
SimpleTokenizer A tokenizer with methods for tokenizing a String as words or sentences.
Stemmer Stemmer, implementing the Porter Stemming Algorithm. The Stemmer class transforms a word into its root form.
StopList Simple stoplist class.
StoplistFilter Filter which removes stop-listed words.
 

Package edu.stanford.nlp.process Description

Contains classes for processing documents. The key abstraction is the Processor interface, whose sole method, Document process(Document), takes a document and returns a new, processed document. The processing applied may be parsing, stoplisting, stemming, etc.
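
The document-in, document-out contract described above lends itself to chaining. The following is a minimal sketch of that idea, not the package's actual code: SketchDocument, SketchProcessor, SketchLowercaser, SketchStoplistFilter, and SketchPipeline are hypothetical stand-ins invented for illustration, and the real Document and Processor types may differ in their details.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the package's Document type: an immutable list of words.
final class SketchDocument {
    private final List<String> words;

    SketchDocument(List<String> words) {
        this.words = Collections.unmodifiableList(new ArrayList<>(words));
    }

    List<String> words() {
        return words;
    }
}

// Assumed shape of the single-method contract: takes a document, returns a processed one.
interface SketchProcessor {
    SketchDocument process(SketchDocument in);
}

// Illustrative processor that lowercases every word.
final class SketchLowercaser implements SketchProcessor {
    @Override
    public SketchDocument process(SketchDocument in) {
        List<String> out = new ArrayList<>();
        for (String w : in.words()) {
            out.add(w.toLowerCase(Locale.ROOT));
        }
        return new SketchDocument(out);
    }
}

// Illustrative processor that drops words found in a stop list.
final class SketchStoplistFilter implements SketchProcessor {
    private final Set<String> stopWords;

    SketchStoplistFilter(Set<String> stopWords) {
        this.stopWords = new HashSet<>(stopWords);
    }

    @Override
    public SketchDocument process(SketchDocument in) {
        List<String> out = new ArrayList<>();
        for (String w : in.words()) {
            if (!stopWords.contains(w)) {
                out.add(w);
            }
        }
        return new SketchDocument(out);
    }
}

// Because every stage has the same signature, a sequence of stages is itself a processor.
final class SketchPipeline implements SketchProcessor {
    private final List<SketchProcessor> stages;

    SketchPipeline(SketchProcessor... stages) {
        this.stages = Arrays.asList(stages);
    }

    @Override
    public SketchDocument process(SketchDocument in) {
        SketchDocument current = in;
        for (SketchProcessor stage : stages) {
            current = stage.process(current);
        }
        return current;
    }
}

class ProcessorSketch {
    public static void main(String[] args) {
        SketchDocument doc = new SketchDocument(Arrays.asList("The", "Quick", "Fox"));
        SketchProcessor pipeline = new SketchPipeline(
                new SketchLowercaser(),
                new SketchStoplistFilter(new HashSet<>(Arrays.asList("the"))));
        System.out.println(pipeline.process(doc).words()); // prints [quick, fox]
    }
}
```

In this sketch each stage returns a new document rather than mutating its input, so the original document stays available for other pipelines. Whether the package's own processors behave this way is not stated above.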


Sepandar David Kamvar
Last modified: Thu Oct 31 11:14:34 PST 2002



Stanford NLP Group