Package edu.stanford.nlp.annotation

A simple markup annotator, originally designed for producing training data for supervised information extraction systems.


Interface Summary
HtmlLexerConstants  
 

Class Summary
Annotator A simple Java editor that makes it easy to add XML-style tags that mark off portions of text.
HtmlCleaner HtmlCleaner removes various code elements (style, script, applet, and so on) from an HTML document.
HtmlLexer  
HtmlLexerTokenManager  
PTBLexer This class is a scanner generated by JFlex 1.3.5 on 8/5/02 12:16 AM from the specification file file:/dfs/hake/0/grow/lexer/jflex/ptblexer.flex
SimpleCharStream An implementation of the CharStream interface, where the stream is assumed to contain only ASCII characters (without Unicode processing).
TaggedStreamTokenizer TaggedStreamTokenizer is similar to java.io.StreamTokenizer, except that it is better suited to documents containing HTML-style tags.
Token Describes the input token stream.
 

Exception Summary
ParseException This exception is thrown when parse errors are encountered.
 

Error Summary
TokenMgrError  
 

Package edu.stanford.nlp.annotation Description

A simple markup annotator, originally designed for producing training data for supervised information extraction systems. The main class for doing this is Annotator.
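
To make the idea of marking off portions of text concrete, here is a minimal sketch of the kind of output such annotation produces. It does not use or reproduce the Annotator class; the class name SpanTagger, the method tagSpan, the sample sentence, and the tag name "location" are all hypothetical and chosen only for illustration.

```java
public class SpanTagger {

    /**
     * Hypothetical helper (not part of edu.stanford.nlp.annotation):
     * wraps the characters in [start, end) of text in an XML-style tag.
     */
    static String tagSpan(String text, int start, int end, String tag) {
        if (start < 0 || end > text.length() || start > end) {
            throw new IllegalArgumentException("invalid span: " + start + ".." + end);
        }
        return text.substring(0, start)
                + "<" + tag + ">" + text.substring(start, end) + "</" + tag + ">"
                + text.substring(end);
    }

    public static void main(String[] args) {
        String sentence = "The seminar is in Gates Hall at noon.";
        String target = "Gates Hall";
        int start = sentence.indexOf(target);
        // Prints: The seminar is in <location>Gates Hall</location> at noon.
        System.out.println(tagSpan(sentence, start, start + target.length(), "location"));
    }
}
```

Text tagged this way can serve as labeled examples for a supervised extractor, with the tag name as the label and the enclosed span as the target.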

This package also currently contains a couple of tokenizers, which arguably belong in a different package.
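
As background for the comparison between TaggedStreamTokenizer and java.io.StreamTokenizer, the sketch below shows how the standard java.io.StreamTokenizer, in its default configuration, handles a short tagged string. It uses only the JDK class; the demo class name and the sample input are assumptions for illustration, and nothing here shows the TaggedStreamTokenizer API.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class StreamTokenizerTagDemo {

    /** Collects the tokens produced by a default-configured StreamTokenizer. */
    static List<String> tokenize(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        List<String> tokens = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    tokens.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    tokens.add(Double.toString(st.nval));
                } else {
                    // Ordinary characters (and quote characters) are reported
                    // through ttype as the character itself.
                    tokens.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // By default '/' is a comment character, so everything from the
        // slash in "</b>" to the end of the line is discarded.
        // Prints: [<, b, >, Stanford, <]
        System.out.println(tokenize("<b>Stanford</b> NLP"));
    }
}
```

With the defaults, the opening tag is split into three separate tokens and the closing tag is treated as the start of a comment, so the rest of the line (including the word NLP) is lost. This is the kind of behavior that makes a tag-aware tokenizer useful for marked-up documents.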

Since:
1.2



Stanford NLP Group