edu.stanford.nlp.process (Stanford JavaNLP API)

Package edu.stanford.nlp.process

Contains classes for processing documents.

Interface Summary

Appliable

Processor

Tokenizer A tokenizer has methods for tokenizing a String, Reader, or InputStream as words.

Class Summary

AbstractTokenizer Abstract tokenizer.

CollectionProcessor

Lowercaser Processor whose process method converts a collection of mixed-case Words to a collection of lowercase Words.

NumberFilter Filter that converts numbers to the word "*NUMBER*".

PTBTokenizer

SentenceBoundaryDetector A (terrible) SentenceBoundaryDetector built to test the Processor interface.

SimpleTokenizer A tokenizer with methods for tokenizing a String as words or sentences.

Stemmer Stemmer, implementing the Porter Stemming Algorithm. The Stemmer class transforms a word into its root form.

StopList Simple stoplist class.

StoplistFilter Filter which removes stop-listed words.
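
The filtering behaviour summarized for NumberFilter and StoplistFilter can be illustrated with a minimal, self-contained sketch. It does not use the package's classes: FilterSketch, replaceNumbers, and removeStopWords are hypothetical names, words are modelled as plain strings, and the rule for what counts as a "number" is an assumption, since this page does not define it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the package's NumberFilter or StoplistFilter.
final class FilterSketch {
    // Assumption: a "number" is an optionally signed run of digits
    // with an optional decimal part. The real filter may differ.
    private static final Pattern NUMBER = Pattern.compile("[-+]?\\d+(\\.\\d+)?");

    // Replace every word that looks like a number with the token "*NUMBER*".
    static List<String> replaceNumbers(List<String> words) {
        List<String> out = new ArrayList<>();
        for (String w : words) {
            out.add(NUMBER.matcher(w).matches() ? "*NUMBER*" : w);
        }
        return out;
    }

    // Drop every word that appears in the stop list (exact, case-sensitive match).
    static List<String> removeStopWords(List<String> words, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String w : words) {
            if (!stopWords.contains(w)) {
                out.add(w);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "cat", "ate", "42", "fish");
        Set<String> stop = new HashSet<>(Arrays.asList("the", "a", "of"));
        // prints [cat, ate, *NUMBER*, fish]
        System.out.println(removeStopWords(replaceNumbers(words), stop));
    }
}
```

Because the stop-list match here is case-sensitive, lowercasing the words first (as Lowercaser is described as doing) would let a lowercase stop list also catch capitalized words.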

Package edu.stanford.nlp.process Description

Contains classes for processing documents. The key type is the Processor interface, whose sole method, Document process(Document), takes a document and returns another, processed document. The returned document may have been parsed, stoplisted, stemmed, and so on.
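
The document-in, document-out shape described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's actual code: SketchDocument is a hypothetical stand-in for Document (here just a list of word strings), SketchProcessor mirrors only the single process method named above, and the then helper for chaining is an addition for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the package's Document type: an ordered list of words.
final class SketchDocument {
    final List<String> words;

    SketchDocument(List<String> words) {
        this.words = new ArrayList<>(words); // defensive copy
    }
}

// Mirrors the single-method shape described above: document in, processed document out.
interface SketchProcessor {
    SketchDocument process(SketchDocument in);

    // Assumption for illustration: chain two processors, running this one first.
    default SketchProcessor then(SketchProcessor next) {
        return in -> next.process(this.process(in));
    }
}

final class ProcessorSketch {
    // Returns a new document with every word lowercased.
    static final SketchProcessor LOWERCASE = in -> {
        List<String> out = new ArrayList<>();
        for (String w : in.words) {
            out.add(w.toLowerCase(Locale.ROOT));
        }
        return new SketchDocument(out);
    };

    // Returns a new document without tokens made up solely of ASCII punctuation.
    static final SketchProcessor DROP_PUNCTUATION = in -> {
        List<String> out = new ArrayList<>();
        for (String w : in.words) {
            if (!w.matches("\\p{Punct}+")) {
                out.add(w);
            }
        }
        return new SketchDocument(out);
    };

    public static void main(String[] args) {
        SketchDocument doc = new SketchDocument(Arrays.asList("Hello", ",", "World", "!"));
        SketchDocument result = LOWERCASE.then(DROP_PUNCTUATION).process(doc);
        // prints [hello, world]
        System.out.println(result.words);
    }
}
```

Since each step takes and returns the same type, steps such as stoplisting or stemming could be composed in any order without changing the calling code.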

Sepandar David Kamvar

Last modified: Thu Oct 31 11:14:34 PST 2002