edu.stanford.nlp.trees
Class SentenceNormalizer

java.lang.Object
  |
  +--edu.stanford.nlp.trees.SentenceNormalizer
Direct Known Subclasses:
OnePerLineSentenceNormalizer, PennSentenceNormalizer

public class SentenceNormalizer
extends Object

A class for sentence normalization. Part of the job of a SentenceNormalizer is to encode what counts as a sentence end. The default implementation does no normalization, but implements the Penn Treebank rules for a sentence end. Other sentence normalizers change various node labels. A SentenceNormalizer may also intern the Strings passed to it. Used as a singleton. Designed to be overridden.


Constructor Summary
SentenceNormalizer()
           
 
Method Summary
 boolean endSentenceToken(String token, String prev, String next)
          Returns true if this token represents the end of a sentence.
 boolean eolIsSentenceEnd()
          This function can be checked by a SentenceReader so as to know whether an end-of-line is always to be treated as an end-of-sentence.
 Sentence normalizeSentence(Sentence sent, LabelFactory lf)
          Normalize a sentence -- this method assumes that the argument that it is passed is the whole (linguistic) Sentence.
 String normalizeString(String word)
          Normalizes a word that has been read in (and may intern it).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SentenceNormalizer

public SentenceNormalizer()
Method Detail

normalizeString

public String normalizeString(String word)
Normalizes a word that has been read in (and may intern it).

Parameters:
word - The word to normalize
Returns:
The normalized form
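
As an illustration of the interning mentioned above, here is a minimal sketch of an overriding normalizer. NormalizerSketch, InterningNormalizerSketch and InterningDemo are hypothetical stand-ins written for this example so that it compiles on its own; they are not classes from this package, and only the normalizeString signature is taken from this page.

```java
// Hypothetical stand-in for the documented class; only one method is modeled.
class NormalizerSketch {
    /** Default behavior as described on this page: no normalization. */
    public String normalizeString(String word) {
        return word;
    }
}

// Illustrative subclass (an assumption, not part of the library):
// it interns every word it is handed.
class InterningNormalizerSketch extends NormalizerSketch {
    @Override
    public String normalizeString(String word) {
        // String.intern() returns the canonical copy from the JVM string pool,
        // so equal words end up sharing a single object.
        return word.intern();
    }
}

class InterningDemo {
    public static void main(String[] args) {
        NormalizerSketch n = new InterningNormalizerSketch();
        String a = n.normalizeString(new String("dog"));
        String b = n.normalizeString(new String("dog"));
        System.out.println(a == b); // prints "true": both refer to the pooled copy
    }
}
```

Interning trades a pool lookup per word for lower memory use and cheap reference comparison when the same words recur many times.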

normalizeSentence

public Sentence normalizeSentence(Sentence sent,
                                  LabelFactory lf)
Normalize a sentence -- this method assumes that the argument that it is passed is the whole (linguistic) Sentence. It is normally implemented as a List-walking routine. The unnormalized sentence may be destructively modified, since the caller is assumed not to need it afterwards.

Parameters:
sent - The sentence to be normalized
lf - The LabelFactory used to create new words (if needed)
Returns:
The normalized sentence
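
The List-walking, destructive style described above might look roughly like the following sketch. It makes several assumptions: a plain List<String> stands in for Sentence, the LabelFactory argument is omitted because the words here are bare Strings, and the bracket rule in normalizeString is only an example of a per-word change. BracketNormalizerSketch is a hypothetical name, not a class from this package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class BracketNormalizerSketch {
    /** Example per-word rule (an assumption): undo Penn Treebank bracket escapes. */
    public String normalizeString(String word) {
        switch (word) {
            case "-LRB-": return "(";
            case "-RRB-": return ")";
            default:      return word;
        }
    }

    /** Walks the list once and overwrites each word in place. */
    public List<String> normalizeSentence(List<String> sent) {
        ListIterator<String> it = sent.listIterator();
        while (it.hasNext()) {
            // set() replaces the element most recently returned by next().
            it.set(normalizeString(it.next()));
        }
        return sent; // the same list object, destructively modified
    }

    public static void main(String[] args) {
        List<String> sent =
            new ArrayList<>(Arrays.asList("-LRB-", "see", "above", "-RRB-"));
        System.out.println(new BracketNormalizerSketch().normalizeSentence(sent));
        // prints "[(, see, above, )]"
    }
}
```

Because the input list is modified and returned, a caller that still needs the unnormalized words has to copy them before calling.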

eolIsSentenceEnd

public boolean eolIsSentenceEnd()
This function can be checked by a SentenceReader so as to know whether an end-of-line is always to be treated as an end-of-sentence. If this returns true, then the endSentenceToken() function is not used.

Returns:
true if an eol is always a sentence end
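
How a reader might branch on this flag can be sketched as follows. This is not the actual SentenceReader; BoundaryRulesSketch, ReaderSketch, OnePerLineRulesSketch and PeriodRulesSketch are hypothetical names invented here, input is assumed to be whitespace-tokenized, and the lookahead is limited to the current line to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the two boundary-related methods on this page.
interface BoundaryRulesSketch {
    boolean eolIsSentenceEnd();
    boolean endSentenceToken(String token, String prev, String next);
}

// Every line is a sentence; the token test must never be consulted.
class OnePerLineRulesSketch implements BoundaryRulesSketch {
    public boolean eolIsSentenceEnd() { return true; }
    public boolean endSentenceToken(String token, String prev, String next) {
        throw new IllegalStateException("not used when eolIsSentenceEnd() is true");
    }
}

// Line breaks are ignored; a bare "." ends a sentence (toy rule for the demo).
class PeriodRulesSketch implements BoundaryRulesSketch {
    public boolean eolIsSentenceEnd() { return false; }
    public boolean endSentenceToken(String token, String prev, String next) {
        return ".".equals(token);
    }
}

class ReaderSketch {
    /** Splits whitespace-tokenized lines into sentences under the given rules. */
    public static List<List<String>> read(List<String> lines, BoundaryRulesSketch rules) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        boolean eolEnds = rules.eolIsSentenceEnd();
        for (String line : lines) {
            String trimmed = line.trim();
            String[] toks = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            for (int i = 0; i < toks.length; i++) {
                current.add(toks[i]);
                if (eolEnds) continue; // boundaries come from line ends only
                String prev = i > 0 ? toks[i - 1] : null;
                String next = i + 1 < toks.length ? toks[i + 1] : null;
                if (rules.endSentenceToken(toks[i], prev, next)) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
            }
            if (eolEnds && !current.isEmpty()) {
                sentences.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) sentences.add(current); // trailing partial sentence
        return sentences;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b .", "c d . e f .");
        System.out.println(read(lines, new OnePerLineRulesSketch()).size()); // prints 2
        System.out.println(read(lines, new PeriodRulesSketch()).size());     // prints 3
    }
}
```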

endSentenceToken

public boolean endSentenceToken(String token,
                                String prev,
                                String next)
Returns true if this token represents the end of a sentence. This method arguably belongs in another class, but it is placed here alongside the other source-specific handling.

Parameters:
token - The String to be checked
prev - The previous token
next - The next token (lookahead)
Returns:
True if this token is a sentence end
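
To show why the prev and next arguments are useful, here is a small sketch of a boundary test with one token of context on each side. The rule is an assumption made for illustration and is not the Penn Treebank rule that this class implements; EndTokenSketch and its token sets are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class EndTokenSketch {
    // Assumed token inventories, chosen for the example only.
    private static final Set<String> TERMINATORS =
        new HashSet<>(Arrays.asList(".", "?", "!"));
    private static final Set<String> CLOSERS =
        new HashSet<>(Arrays.asList("''", ")", "-RRB-"));

    /** Illustrative rule: a terminator ends the sentence unless a closer follows it. */
    public static boolean endSentenceToken(String token, String prev, String next) {
        if (TERMINATORS.contains(token)) {
            // Lookahead: hold the boundary back so a closing quote or bracket
            // stays attached to the sentence it closes.
            return next == null || !CLOSERS.contains(next);
        }
        if (CLOSERS.contains(token)) {
            // Lookbehind: a closer directly after a terminator ends the sentence.
            // Only a single trailing closer is handled by this sketch.
            return prev != null && TERMINATORS.contains(prev);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(endSentenceToken(".", "left", "She")); // prints "true"
        System.out.println(endSentenceToken(".", "left", "''"));  // prints "false"
        System.out.println(endSentenceToken("''", ".", "She"));   // prints "true"
    }
}
```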


Stanford NLP Group