SentenceNormalizer (Stanford JavaNLP API)

edu.stanford.nlp.trees
Class SentenceNormalizer

java.lang.Object
  |
  +--edu.stanford.nlp.trees.SentenceNormalizer

Direct Known Subclasses: OnePerLineSentenceNormalizer, PennSentenceNormalizer

public class SentenceNormalizer
extends Object

A class for sentence normalization. Part of the job of a SentenceNormalizer is to encode what counts as a sentence end. The default implementation does no normalization, but implements the Penn Treebank rules for a sentence end. Other sentence normalizers change various node labels. A SentenceNormalizer may also intern the Strings passed to it. A singleton. Designed to be overridden.
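The override pattern described above might look like the following sketch. It does not use the real edu.stanford.nlp.trees classes: BaseNormalizerSketch and LowercasingNormalizerSketch are hypothetical stand-ins that mirror only two of the method signatures, and the set of sentence-final tokens is an assumption, not the actual Penn Treebank rule set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for SentenceNormalizer; not the real class.
class BaseNormalizerSketch {
    // Assumed sentence-final tokens; the real default rules may differ.
    private static final Set<String> ENDERS =
            new HashSet<>(Arrays.asList(".", "?", "!"));

    // Default behaviour: no normalization, the word is returned unchanged.
    public String normalizeString(String word) {
        return word;
    }

    // Default behaviour: a token is a sentence end if it is in ENDERS.
    public boolean endSentenceToken(String token, String prev, String next) {
        return ENDERS.contains(token);
    }
}

// A subclass overrides only the behaviour it wants to change.
class LowercasingNormalizerSketch extends BaseNormalizerSketch {
    @Override
    public String normalizeString(String word) {
        return word.toLowerCase(Locale.ROOT);
    }
}
```

The subclass inherits the sentence-end rule unchanged and replaces only the word normalization.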

Constructor Summary

SentenceNormalizer()


Method Summary

boolean endSentenceToken(String token, String prev, String next)
          Returns true if this token represents the end of a sentence.

boolean eolIsSentenceEnd()
          This function can be checked by a SentenceReader so as to know whether an end-of-line is always to be treated as an end-of-sentence.

Sentence normalizeSentence(Sentence sent, LabelFactory lf)
          Normalize a sentence -- this method assumes that the argument that it is passed is the whole (linguistic) Sentence.

String normalizeString(String word)
          Normalizes a word string that has been read in (and may intern it).

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

SentenceNormalizer

public SentenceNormalizer()

Method Detail

normalizeString

public String normalizeString(String word)

Normalizes a word string that has been read in (and may intern it).

Parameters: word - The word to normalize
Returns: The normalized form
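As an illustration of the interning option, a subclass could return the canonical instance of each word. The class below is a hypothetical stand-in that mirrors only the normalizeString(String) signature; String.intern() is the standard JDK method.

```java
// Hypothetical stand-in; mirrors only normalizeString(String).
class InterningNormalizerSketch {
    // Returns the canonical (interned) instance of the word, so that
    // equal words read at different times share one String object.
    public String normalizeString(String word) {
        return word.intern();
    }
}
```

With interning, two equal words read separately become the same object, which saves memory when a corpus repeats the same tokens many times.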

normalizeSentence

public Sentence normalizeSentence(Sentence sent,
                                  LabelFactory lf)

Normalize a sentence -- this method assumes that the argument that it is passed is the whole (linguistic) Sentence. It is normally implemented as a List-walking routine. It is assumed that the unnormalized sentence can be destructively modified, as it is otherwise unneeded.

Parameters: sent - The sentence to be normalized; lf - The LabelFactory used to create new words (if needed)
Returns: The normalized sentence
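A list-walking, destructive implementation could be sketched as follows. This is an assumption-laden illustration: a plain List<String> stands in for Sentence, the LabelFactory argument is omitted, and the bracket mapping in normalizeString is a made-up example of a normalization, not something this class is documented to do.

```java
import java.util.List;
import java.util.ListIterator;

// Hypothetical stand-in; List<String> replaces Sentence for illustration.
class ListWalkingNormalizerSketch {
    // Example normalization (an assumption): map bracket escapes to brackets.
    public String normalizeString(String word) {
        if ("-LRB-".equals(word)) {
            return "(";
        }
        if ("-RRB-".equals(word)) {
            return ")";
        }
        return word;
    }

    // Walks the list and overwrites each element in place, then returns
    // the same list object rather than building a copy.
    public List<String> normalizeSentence(List<String> sent) {
        ListIterator<String> it = sent.listIterator();
        while (it.hasNext()) {
            it.set(normalizeString(it.next()));
        }
        return sent;
    }
}
```

Because the input is modified in place, callers that still need the unnormalized sentence would have to copy it first.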

eolIsSentenceEnd

public boolean eolIsSentenceEnd()

This function can be checked by a SentenceReader so as to know whether an end-of-line is always to be treated as an end-of-sentence. If this returns true, the endSentenceToken() method is not used.

Returns: true if an end-of-line is always a sentence end
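The interaction between the two boundary methods can be sketched from the reader's side. Everything here is hypothetical: BoundaryPolicySketch stands in for the two methods of SentenceNormalizer, and SentenceSplitterSketch is one plausible way a reader could consult them, not the actual SentenceReader logic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two boundary methods of SentenceNormalizer.
interface BoundaryPolicySketch {
    boolean eolIsSentenceEnd();
    boolean endSentenceToken(String token, String prev, String next);
}

// Hypothetical reader logic; the real SentenceReader may work differently.
class SentenceSplitterSketch {
    static List<List<String>> split(List<List<String>> lines,
                                    BoundaryPolicySketch policy) {
        List<List<String>> sentences = new ArrayList<>();
        if (policy.eolIsSentenceEnd()) {
            // Each non-empty line is one sentence; endSentenceToken() is unused.
            for (List<String> line : lines) {
                if (!line.isEmpty()) {
                    sentences.add(new ArrayList<>(line));
                }
            }
            return sentences;
        }
        // Otherwise ignore line breaks and ask the policy about each token.
        List<String> tokens = new ArrayList<>();
        for (List<String> line : lines) {
            tokens.addAll(line);
        }
        List<String> current = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String prev = i > 0 ? tokens.get(i - 1) : null;
            String next = i + 1 < tokens.size() ? tokens.get(i + 1) : null;
            current.add(tokens.get(i));
            if (policy.endSentenceToken(tokens.get(i), prev, next)) {
                sentences.add(current);
                current = new ArrayList<>();
            }
        }
        // Trailing tokens without a final sentence end form a last sentence.
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }
}
```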

endSentenceToken

public boolean endSentenceToken(String token,
                                String prev,
                                String next)

Returns true if this token represents the end of a sentence. This method perhaps does not belong in this class, but it seemed a good place because other source-specific handling lives here.

Parameters: token - The String to be checked; prev - The previous token; next - The next token (lookahead)
Returns: true if this token is a sentence end
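To show why the prev and next arguments can be useful, here is a sketch of an override that uses both. The rule itself is invented for illustration (final punctuation ends a sentence unless a closing quote token follows, in which case the quote ends it); it is not the documented Penn Treebank behaviour, and LookaheadEndSketch is a hypothetical class.

```java
// Hypothetical stand-in; mirrors only endSentenceToken(String, String, String).
class LookaheadEndSketch {
    private static boolean isFinalPunct(String t) {
        return ".".equals(t) || "?".equals(t) || "!".equals(t);
    }

    // Assumes closing quotes are tokenized as two single quotes.
    private static boolean isCloseQuote(String t) {
        return "''".equals(t);
    }

    public boolean endSentenceToken(String token, String prev, String next) {
        if (isFinalPunct(token)) {
            // Defer the boundary when a closing quote follows (lookahead).
            return !isCloseQuote(next);
        }
        // A closing quote ends the sentence only right after final punctuation.
        return isCloseQuote(token) && isFinalPunct(prev);
    }
}
```

Both helper checks tolerate null, so the method can be called at the first token (prev is null) and at the last token (next is null).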